At Google I/O 2017, Google announced its next-generation machine learning chip, called the “Cloud TPU.” The new TPU is no longer limited to inference; it can also train neural networks.
First Gen TPU
Google created its own TPU to jump “three generations” ahead of the competition when it came to inference performance. The chip seems to have delivered, as Google published a paper last month in which it demonstrated that the TPU could be up to 30x faster than a Kepler GPU and up to 80x faster than a Haswell CPU.
The comparison wasn’t quite fair, as those chips were a little older, but more importantly, they weren’t intended for inference.
Nvidia was quick to point out that its inference-optimized Tesla P40 GPU is already twice as fast as the TPU for sub-10ms latency applications. However, the TPU was still almost twice as fast as the P40 in peak INT8 performance (90 TOPS vs. 48 TOPS).
The P40 also achieved its performance using more than three times as much power, so this comparison wasn’t that fair, either. The bottom line is that right now it’s not easy to compare wildly different architectures to each other when it comes to machine learning tasks.
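To make the power point concrete, here is a rough performance-per-watt sketch based on the article’s INT8 figures. The TDP values are assumptions from the vendors’ published specs (roughly 75W for the first-gen TPU and 250W for the Tesla P40), not numbers stated in this article:

```python
# Peak INT8 throughput from the article; TDP figures are assumed
# from published specs (~75 W first-gen TPU, 250 W Tesla P40).
tpu_tops, tpu_watts = 90, 75
p40_tops, p40_watts = 48, 250

tpu_eff = tpu_tops / tpu_watts   # INT8 TOPS per watt for the TPU
p40_eff = p40_tops / p40_watts   # INT8 TOPS per watt for the P40

print(f"TPU:  {tpu_eff:.2f} TOPS/W")
print(f"P40:  {p40_eff:.2f} TOPS/W")
print(f"P40 power ratio: {p40_watts / tpu_watts:.1f}x")
```

Under those assumed TDPs, the TPU comes out several times more efficient per watt, which is why raw throughput alone doesn’t settle the comparison.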
Cloud TPU Performance
In last month’s paper, Google hinted that a next-generation TPU could be significantly faster if certain modifications were made. The Cloud TPU seems to have received some of those improvements. It’s now much faster, and it can also do floating-point computation, which means it’s suitable for training neural networks, too.
According to Google, the chip can achieve 180 teraflops of floating-point performance, which is six times the FP16 half-precision performance of Nvidia’s latest Tesla V100 accelerator. Even when compared against the V100’s “Tensor Core” performance, the Cloud TPU is still 50% faster.
Google made the Cloud TPU highly scalable and noted that 64 units can be put together to form a “pod” with a total performance of 11.5 petaflops of computation for a single machine learning task.
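The figures above are internally consistent, which is worth a quick sanity check. Everything below is derived from the article’s own numbers; the implied V100 TFLOPS values are back-calculated from Google’s stated ratios, not official Nvidia specs:

```python
# All figures taken from the article's claims.
cloud_tpu_tflops = 180

# "Six times" the V100's FP16 performance implies ~30 TFLOPS for the V100.
v100_fp16 = cloud_tpu_tflops / 6
# "50% faster" than Tensor Core performance implies ~120 TFLOPS.
v100_tensor = cloud_tpu_tflops / 1.5

# 64 Cloud TPUs per pod: 64 * 180 TFLOPS = 11,520 TFLOPS = 11.52 PFLOPS.
pod_pflops = 64 * cloud_tpu_tflops / 1000

print(f"Implied V100 FP16:        {v100_fp16:.0f} TFLOPS")
print(f"Implied V100 Tensor Core: {v100_tensor:.0f} TFLOPS")
print(f"Pod total:                {pod_pflops:.2f} PFLOPS")
```

The pod total works out to 11.52 petaflops, matching Google’s quoted 11.5-petaflop figure.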
Strangely enough, Google hasn’t given numbers for the Cloud TPU’s inference performance yet, but it may reveal them in the near future. Power consumption wasn’t revealed either, unlike with the first-generation TPU.
Cloud TPUs For Everyone
Up until now, Google has kept its TPUs to itself, likely because it was still experimental technology and the company wanted to first see how it fared in the real world. However, the company will now make the Cloud TPUs available to all of its Google Compute Engine customers. Customers will be able to mix and match Cloud TPUs with Intel CPUs, Nvidia GPUs, and the rest of its hardware infrastructure to optimize their own machine learning solutions.
It almost goes without saying that the Cloud TPUs support the TensorFlow machine learning software library, which Google open sourced in 2015.
Google will also donate access to 1,000 Cloud TPUs to top researchers under the TensorFlow Research Cloud program to see what people do with them.
Update, 5/18/17, 7:52am PT: Fixed typo.