At Google I/O 2019 this week, Google announced that its Cloud TPU (tensor processing unit) pods, featuring both the 2nd and 3rd generations of its TPU chips, are now available in public beta on its Google Cloud Platform.
Cloud TPU Pods With 1,000 TPUs
TPUs are ASICs Google developed for machine learning workloads. When Google first announced its TPU chip in 2016, it was a revelation in terms of inference performance. The chip showed up to 30 times higher performance than an Nvidia Kepler GPU (which lacked any optimization for inference at the time) and 80 times the performance of an Intel Haswell CPU. In 2017, Google announced the second-generation TPU, called the “Cloud TPU.” The new chip could now be used not just for inference (running trained machine learning neural network models) but also for training.
Now, developers can access either a full 1,000-TPU pod or “slices” of this pod. Previously, Cloud TPU pods supported 256 Cloud TPUs, but it seems Google has now created a toroidal mesh network across multiple racks, so that a TPU pod can contain more than 1,000 TPUs. Developers on a budget can access slices of the pod as small as 16 cores (two TPUs).
Google showed at its I/O event that a 256 Cloud TPU v2 slice can train a standard ResNet-50 image classification model on the ImageNet dataset in 11.3 minutes, while a 256 Cloud TPU v3 slice can train it in 7.1 minutes. According to these numbers, the Cloud TPU v2 is about 59% slower than the Cloud TPU v3. On a single TPU, the same model takes 302 minutes to train.
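The arithmetic behind these comparisons is straightforward; a quick check using the quoted times:

```python
# Training times quoted by Google at I/O (minutes)
v2_minutes = 11.3          # 256 Cloud TPU v2 slice, ResNet-50 on ImageNet
v3_minutes = 7.1           # 256 Cloud TPU v3 slice
single_tpu_minutes = 302.0 # single TPU

# v2 takes roughly 59% longer than v3 on the same job.
v2_slowdown = (v2_minutes - v3_minutes) / v3_minutes
print(f"v2 is {v2_slowdown:.0%} slower than v3")  # → v2 is 59% slower than v3

# Wall-clock speedup of the 256-TPU v2 slice over a single TPU (~27x).
scaling = single_tpu_minutes / v2_minutes
print(f"{scaling:.1f}x speedup over a single TPU")
```

Note that the ~27x speedup from 256 TPUs is far from linear scaling, which is typical once communication overhead between chips enters the picture.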
Information for accessing the public beta is available on Google's blog post.
Google Cloud TPU's Evolution
When the 2nd-generation TPU debuted, Google said it could achieve 180 teraflops (TFLOPS) of floating-point performance, six times more than Nvidia’s then-latest Tesla V100 accelerator for FP16 half-precision computation. The Cloud TPU also held a 50% advantage over the V100’s Tensor Core performance. Google designed its Cloud TPU pods with 64 TPUs each, for a total peak performance of 11.5 petaFLOPS.
A year later, in 2018, the company announced version 3 of its TPU with a performance rated at 420 TFLOPS. The company also announced a new liquid-cooled pod configuration with eight times the performance of the previous one, featuring 256 TPUs and 100 petaFLOPS performance.
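The pod-level figures follow directly from the per-TPU numbers quoted above (a sketch of the arithmetic; the rounding in Google's marketing figures is an assumption):

```python
# v2 pod: 64 TPUs at 180 TFLOPS each
v2_pod_pflops = 64 * 180 / 1000
print(v2_pod_pflops)  # → 11.52, quoted as 11.5 petaFLOPS

# v3 pod: 256 TPUs at 420 TFLOPS each
v3_pod_pflops = 256 * 420 / 1000
print(v3_pod_pflops)  # → 107.52, quoted as 100 petaFLOPS
```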
Even though Google doesn’t sell the Cloud TPUs directly (it only sells an inference-optimized version called Edge TPU), by giving developers access to them in the cloud, the company is still competing with companies such as Nvidia or Intel that would like developers to buy their machine learning hardware instead. The TPUs tend to have better performance/dollar compared to the alternatives, which should put pressure on machine learning chipmakers to offer higher value.
Google Cloud TPU Use Cases
Google has clarified that not all types of machine learning applications are suited for the Cloud TPU. According to Google, the ones that make the most sense include:
- Models dominated by matrix computations
- Models with no custom TensorFlow operations inside the main training loop
- Models that train for weeks or months
- Larger and very large models with very large effective batch sizes
Additionally, Google has recommended against using TPUs for linear algebra programs that require frequent branching, workloads that access memory in a sparse manner, and workloads that require high-precision arithmetic.
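To make the "matrix-computation-dominated" guidance concrete, here is a minimal sketch of training a small dense model under TensorFlow's `TPUStrategy`. The fallback logic is an assumption for illustration: when no TPU is reachable, the script drops back to the default strategy so the same code runs locally.

```python
import numpy as np
import tensorflow as tf

# Try to attach to a Cloud TPU; fall back to the default (CPU/GPU)
# strategy so this sketch also runs on a machine without TPUs.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except Exception:  # no TPU available in this environment
    strategy = tf.distribute.get_strategy()

# A small, matmul-dominated model with no custom ops in the training
# loop -- the kind of workload Google says suits TPUs. Large effective
# batch sizes help amortize per-step overhead across the pod slice.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(256,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Synthetic data stands in for a real input pipeline.
x = np.random.rand(1024, 256).astype(np.float32)
y = np.random.randint(0, 10, size=(1024,))
history = model.fit(x, y, batch_size=256, epochs=1, verbose=0)
```

The branching-heavy, sparse-access, or high-precision workloads Google warns about would gain little here: the TPU's matrix units are built for exactly the dense, fixed-shape multiply-accumulate pattern this model exercises.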