Fujitsu announced that it has developed a new deep learning technology that can significantly increase the efficiency of deep learning on highly parallel GPU-based systems.
The Problem With Deep Learning On Multiple GPUs
Over the past few years, there has been an explosion of interest in deep learning as a better way to train machines to do certain tasks. Because of this, GPUs, which are well suited for processing many pieces of similar data simultaneously, have also become the centerpiece technology for deep learning development.
However, even with GPUs, training a deep learning model on large amounts of data still takes too much time. One of the issues is that deep learning workloads don’t scale well across multiple GPUs.
The conventional way to accelerate training that would otherwise run on a single GPU is to link multiple computers in parallel and share the training data across them. However, as more machines are added to the mix, sharing data between them becomes progressively harder, and total performance sees diminishing returns.
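The diminishing returns described above can be illustrated with a simple Amdahl-style model (a hypothetical sketch with an assumed communication fraction, not Fujitsu's measured figures): if a fixed share of every training step goes to inter-machine communication, the achievable speedup flattens as GPUs are added.

```python
def speedup(n_gpus, comm_fraction):
    """Amdahl-style model: the compute portion of a step shrinks
    with more GPUs, but the communication share does not."""
    compute = (1 - comm_fraction) / n_gpus
    return 1 / (compute + comm_fraction)

# With an assumed 5% communication share per step:
for n in (1, 16, 64):
    print(n, "GPUs ->", round(speedup(n, 0.05), 1), "x")
```

With just 5% of each step spent on communication, 64 GPUs yield only about a 15x speedup in this model, which is why reducing communication overhead matters so much at scale.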
Fujitsu’s Efficient Data Sharing Across Multiple Machines
Fujitsu Laboratories said it has created new software technology that can more efficiently share the data between computers and applied it to Caffe, a popular open source deep learning framework.
Fujitsu’s new technology automatically controls the priority order of data transmission: it sends the data needed for the next learning step to all the machines ahead of time. This removes some of the delay seen in alternative solutions, so the operations complete in a shorter amount of time.
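One way to picture this kind of scheduling (a hypothetical sketch; Fujitsu has not published its implementation) is a priority queue that transmits the data for the layers the next iteration will need earliest, so that communication overlaps with the remaining computation. The layer names and priorities below are illustrative.

```python
import heapq

# Hypothetical per-layer transfers, tagged with when the next
# iteration will need them (lower number = needed sooner).
transfers = [(3, "conv3"), (1, "conv1"), (2, "conv2")]

queue = []
for priority, layer in transfers:
    heapq.heappush(queue, (priority, layer))

# Pop in priority order: data needed first is sent first.
send_order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(send_order)  # ['conv1', 'conv2', 'conv3']
```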
Fujitsu also improved how its software deals with various data sizes, by automatically applying the optimal operational method and thus minimizing the total operation time.
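The size-dependent optimization might work along these lines (again a sketch; the threshold and method names are assumptions, not Fujitsu's published design): small transfers are dominated by latency, large ones by bandwidth, so a different aggregation method wins in each regime.

```python
def pick_method(message_bytes, threshold=256 * 1024):
    """Choose an aggregation strategy by payload size.
    The 256 KiB threshold is an illustrative assumption."""
    # Small messages: latency dominates, so minimize round trips.
    # Large messages: bandwidth dominates, so split and pipeline.
    if message_bytes < threshold:
        return "latency-optimized"
    return "bandwidth-optimized"

print(pick_method(4 * 1024))          # latency-optimized
print(pick_method(10 * 1024 * 1024))  # bandwidth-optimized
```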
Fujitsu Tests Its New Technology
The company then tested it on AlexNet, an image classification neural network. Fujitsu’s technology managed to achieve a performance 14.7x that of a single GPU when using 16 GPUs, and a 27x improvement when it used 64 GPUs.
No software is perfectly parallel today, which is why we’re still seeing “only” a 27x improvement with 64 GPUs instead of a 64x improvement. Even so, Fujitsu’s software improved learning speeds by 46% on 16 GPUs and 71% on 64 GPUs compared to more conventional software.
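The reported figures translate into parallel efficiency, and into the implied speedups of the conventional baseline, with simple arithmetic derived directly from the numbers above:

```python
def efficiency(speedup, n_gpus):
    """Parallel efficiency: achieved speedup vs. the ideal linear case."""
    return speedup / n_gpus

print(f"16 GPUs: {efficiency(14.7, 16):.0%} efficient")  # 92% efficient
print(f"64 GPUs: {efficiency(27.0, 64):.0%} efficient")  # 42% efficient

# Conventional-software speedups implied by the reported
# 46% and 71% gains over it:
print(round(14.7 / 1.46, 1))  # ~10.1x on 16 GPUs
print(round(27.0 / 1.71, 1))  # ~15.8x on 64 GPUs
```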
The more efficient software can help academics, governments and other companies significantly shorten the time it takes to train a deep learning algorithm for various research purposes and product development.
Fujitsu aims to start commercializing this technology as part of Fujitsu’s AI technology, Human Centric AI Zinrai, by the end of 2016.