Nvidia's Ampere A100 GPU has been out since May, but we had no idea how it performed until today. Jules Urbach, founder and CEO of OTOY, has tweeted what seems to be the first benchmark of the A100.
The A100 scored 446 points on OctaneBench, thus claiming the title of fastest GPU to ever grace the benchmark. The Nvidia Titan V was the previous record holder with an average score of 401 points, which puts the A100 roughly 11.2% ahead. Urbach highlighted that the A100 run was performed with RTX disabled.
It doesn't come as a complete shock that the A100 would topple the Titan V if you look closely at the A100's composition. The GA100 silicon measures 826 square millimeters and flaunts 54.2 billion transistors, which is possible thanks to TSMC's 7nm FinFET manufacturing process. The full silicon comes equipped with 128 streaming multiprocessors (SMs), amounting to 8,192 CUDA cores. The A100 doesn't leverage the full die, but its specifications are impressive nonetheless.
The A100 is equipped with 6,912 CUDA cores and 432 Tensor cores. The GPU's other imposing traits include 40GB of HBM2E memory across a 5,120-bit memory interface for a bandwidth of up to a whopping 1,555 GBps. The Titan V's 5,120 CUDA cores and 12GB of HBM2 memory look paltry beside the A100.
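For readers who want to check the math, here is a minimal Python sketch that works only from the figures quoted above. The variable names are ours, and the per-pin data rate is simply what the quoted bus width and bandwidth imply, not a number from Nvidia's spec sheet.

```python
# Figures quoted in the article.
FULL_DIE_SMS = 128
FULL_DIE_CUDA_CORES = 8192
A100_CUDA_CORES = 6912
BUS_WIDTH_BITS = 5120
BANDWIDTH_GB_PER_S = 1555

# CUDA cores per SM, derived from the full GA100 die.
cores_per_sm = FULL_DIE_CUDA_CORES // FULL_DIE_SMS

# Number of SMs the A100 must have enabled to reach its core count.
a100_enabled_sms = A100_CUDA_CORES // cores_per_sm
enabled_fraction = a100_enabled_sms / FULL_DIE_SMS

# bandwidth (GB/s) = bus width (bits) * per-pin rate (Gbps) / 8,
# so the per-pin rate implied by the quoted numbers is:
implied_pin_rate_gbps = BANDWIDTH_GB_PER_S * 8 / BUS_WIDTH_BITS

print(f"CUDA cores per SM: {cores_per_sm}")
print(f"A100 enabled SMs: {a100_enabled_sms} ({enabled_fraction:.1%} of the die)")
print(f"Implied per-pin data rate: {implied_pin_rate_gbps:.2f} Gbps")
```

By this arithmetic the A100 runs with 108 of the die's 128 SMs switched on, which is what "doesn't leverage the full die" amounts to in practice.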
OctaneBench benchmarks graphics cards using OTOY's OctaneRender, and one of its requirements is Nvidia CUDA. Therefore, you won't find any Radeon GPUs from the Red Team on the leaderboard. You will find a generous lot of GeForce, Quadro and Tesla devices on the list, though.
The GeForce RTX 2080 Ti ranks 14th on the OctaneBench leaderboard with an average score of 302, which makes the A100 about 47.7% faster. Keep in mind that GA100 silicon is tailored for Nvidia's data center products. It's delusional to think it will make its way to Nvidia's forthcoming consumer graphics cards, presumably dubbed RTX 3080 and RTX 3090. The GA100 is the successor to the GV100 (Volta), however, so it could end up in a Titan GPU.
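The percentages quoted in this article come straight from the OctaneBench scores. A quick Python sketch, with a helper name of our own choosing, reproduces them. Note that it compares a single A100 run against leaderboard averages, so treat the results as indicative.

```python
def uplift_percent(new_score: float, old_score: float) -> float:
    """Return how much faster new_score is than old_score, in percent."""
    return (new_score / old_score - 1) * 100

# OctaneBench scores quoted in the article.
A100 = 446          # single run tweeted by Jules Urbach
TITAN_V = 401       # leaderboard average
RTX_2080_TI = 302   # leaderboard average

print(f"A100 vs. Titan V: {uplift_percent(A100, TITAN_V):.1f}%")          # 11.2%
print(f"A100 vs. RTX 2080 Ti: {uplift_percent(A100, RTX_2080_TI):.1f}%")  # 47.7%
```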
Multiple rumors claim that mainstream Ampere graphics cards may employ the GA102 die. Obviously, it'll be smaller than the GA100 and will ultimately feature fewer SMs. Thus far, the GA102 die is rumored to have up to 84 SMs, which would result in 5,376 CUDA cores. This might be the silicon that Nvidia uses for the GeForce RTX 3080 Ti or GeForce RTX 3090. In any event, it's unlikely that the GA102 will outperform the GA100 given the CUDA core deficit. It could be a close fight if Nvidia gives the GA102-based GPUs some crazy clock speeds.
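To put the rumored core deficit in perspective, the back-of-the-envelope Python sketch below assumes the GA102 rumor uses the same 64 CUDA cores per SM that the GA100's figures imply (8,192 cores across 128 SMs), and that throughput scales with cores multiplied by clock speed. Both are simplifying assumptions on our part, not confirmed specifications.

```python
# Assumption: 64 CUDA cores per SM, as implied by GA100 (8,192 / 128).
CORES_PER_SM = 64
GA102_RUMORED_SMS = 84      # rumored figure, unconfirmed
A100_CUDA_CORES = 6912      # quoted in the article

ga102_cores = GA102_RUMORED_SMS * CORES_PER_SM
core_deficit = A100_CUDA_CORES - ga102_cores

# Naive model: performance ~ cores * clock. The clock advantage GA102
# would need to match the A100 is then the inverse of the core ratio.
clock_ratio_needed = A100_CUDA_CORES / ga102_cores

print(f"Rumored GA102 CUDA cores: {ga102_cores}")
print(f"Deficit vs. A100: {core_deficit} cores")
print(f"Clock advantage needed to break even: {(clock_ratio_needed - 1) * 100:.0f}%")
```

Under those assumptions, a GA102-based card would need clocks roughly 29% higher than the A100's to draw level, which is why the clock speeds matter so much here.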
Ampere will undoubtedly bring a significant performance upgrade over Turing. Many figures are being thrown around, and we won't know exactly how big the jump is until Nvidia officially launches its consumer Ampere cards. In all likelihood, the generation-over-generation uplift will be less than 47.7%.