At its DevSummit conference this week, Arm said that its next-generation GPU architecture, due in 2022, will almost double the FP32 machine learning performance of this year's Arm Mali-G710. Compared to Arm's previous-generation architecture from 2018, the new GPU will be almost five times faster in those workloads.
Speaking at DevSummit, Ian Bratt, Senior Director of Technology for Arm's ML business unit, showed a slide demonstrating how quickly the machine learning performance of Arm's GPU architectures is progressing. The company's 2022 GPU architecture is now expected to be 4.7 times faster than the Mali-G76 in FP32 ML workloads on a per-core basis. A GPU core is a cluster of execution units, texture units, raster units, and other application-specific hardware.
It is unclear how the FP32 ML performance uplift affects performance in games or overall performance-per-watt, but we do know that the latest Arm Mali-G710 provides 35% higher ML performance and 20% higher graphics performance than a Mali-G78 implementation in an iso-process-node GPU configuration. That said, it is evident how important ML performance is for Arm. Meanwhile, Arm needs to ensure that software developers have the tools to take advantage of its latest technologies.
"It's more than just adding instructions and improving hardware IP, we also have to provide the software, the tools, the libraries to enable that ML performance," said Bratt, according to The Register.
Arm has significantly accelerated its GPU architecture development in recent years. Back in the day, Arm could sit on a single GPU architecture for about five years. While that architecture evolved quite meaningfully over time, both in per-core performance and in the number of cores supported, these architectural performance enhancements were not exactly breakthroughs. Starting in 2016, Arm moved to a three-year GPU architecture cadence while continuing to introduce a new iteration of its architectures every year.
That speed-up led to a very quick evolution of capabilities and performance. For example, the Mali-G710 (based on the Valhall 3 architecture) introduced this year is two to three times faster than the Mali-G76 (Bifrost 3 architecture) announced in 2018 in terms of per-core graphics performance.
What remains to be seen is how Arm's GPU architectures will evolve if (or when) regulators around the world approve the company's acquisition by Nvidia. The Santa Clara, California-based company is the world's largest supplier of discrete GPUs as well as compute GPUs used for a variety of workloads, including machine learning, so it is unclear whether Nvidia will continue to develop Mali GPUs or will reassign Mali GPU developers to its own architectures.
Qualcomm, Samsung, Apple, and NVIDIA have their own designs, so they don't use ARM's.
200% faster would mean 3x the baseline, but Arm said it was almost twice as fast, not three times as fast. Big difference.
Also, the claim "4.7 times faster" is confusing. It's not clear what it means, since we usually say "N times as fast", not "N times faster". Arm meant 4.7x, so it's 4.7 times as fast as the baseline. As a percentage, that means (4.7 - 1) x 100 = 370% faster.
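To make the arithmetic in the two comments above concrete, here is a minimal Python sketch of the conversion between "N times as fast" and "N% faster". The function names are hypothetical and chosen only for illustration; the figures plugged in (4.7x, 200%, and 2x) are the ones discussed above.

```python
def ratio_to_percent_faster(ratio: float) -> float:
    """Convert 'ratio times as fast as the baseline' to 'percent faster'.

    A ratio of 1.0 means equal speed, i.e. 0% faster.
    """
    return (ratio - 1.0) * 100.0


def percent_faster_to_ratio(percent: float) -> float:
    """Convert 'percent faster than the baseline' to 'times as fast'."""
    return 1.0 + percent / 100.0


# 4.7 times as fast as the baseline (Arm's per-core FP32 ML figure vs. Mali-G76)
print(round(ratio_to_percent_faster(4.7)))  # 370

# "200% faster" would mean three times the baseline
print(percent_faster_to_ratio(200.0))  # 3.0

# Exactly twice as fast is only 100% faster
print(round(ratio_to_percent_faster(2.0)))  # 100
```

The "minus one" step is the whole source of the confusion: "times as fast" counts the baseline itself, while "percent faster" counts only the gain on top of it.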