AWS Graviton4 CPU benchmarked against AMD and Intel processors — faster than predecessors and more cost-effective

Graviton processors from Amazon Web Services can only be accessed in the cloud, but Phoronix managed to benchmark the latest Graviton4 and compare it to rivals from AMD and Intel. As it turns out, the AWS Graviton4 chip offers large generational improvements over its predecessor and can beat AMD's EPYC 'Genoa' and Intel's Xeon 'Sapphire Rapids' in a variety of benchmarks.

Amazon's Graviton4 packs 96 Arm Neoverse V2 cores with 2MB of L2 cache per core and features 12 channels for DDR5-5600 memory, making it a very powerful CPU for a variety of workloads. Amazon Web Services offers Graviton4 in its R8g instances, which provide up to three times as many virtual CPUs (vCPUs) and three times as much memory as the existing R7g instances.

The AWS Graviton4 performs impressively in high-performance computing (HPC) workloads, showing significant generational improvements and competitive performance against other processors. For example, in the miniFE finite element mini HPC benchmark, Graviton4 significantly outperformed Graviton3 and Graviton2 and even led the AMD EPYC 9R14 Genoa instance, thanks to its Neoverse V2 cores and additional memory bandwidth. Similarly, in the Xcompact3d Incompact3d benchmark involving complex fluid dynamics simulations, Graviton4 delivered substantial generational gains, surpassing the AMD EPYC 9R14 instance in both raw performance and cost efficiency.

For OpenFOAM, which simulates fluid dynamics, Graviton4 competed closely with AMD EPYC and Intel Xeon instances, with a particularly favorable cost-performance ratio. In molecular dynamics simulations using GROMACS, Graviton4 demonstrated significant performance gains over its predecessors, although AMD EPYC and Intel Xeon remained the fastest instances in these tests.   

Particularly surprising is that Graviton4 also outperformed all of its rivals in the 7-Zip compression benchmark. Although this tool is not often used in the cloud, the result demonstrates that Graviton4 is proficient in compression/decompression workloads.

In cryptographic benchmarks, the AWS Graviton4 demonstrated substantial improvements over its predecessors and offered competitive performance against processors from AMD and Intel. In the Xmrig GhostRider test, Graviton4 achieved a 2.82x improvement over Graviton3, significantly outperforming the Intel Xeon-based R7i instance and trailing only the AMD EPYC instance.

Additionally, in various OpenSSL benchmarks, Graviton4 showed significant generational improvements. For example, in the ChaCha20 algorithm test, Graviton4 achieved 308,611,390,393 bytes per second, which is nearly triple the performance of Graviton3 and substantially higher than that of Graviton2. Even so, Graviton4 remained substantially behind the AMD and Intel instances in this test. Similar generational gains were seen in the AES-128-GCM and ChaCha20-Poly1305 tests, where Graviton4 consistently outperformed its predecessors and provided competitive results against AMD EPYC but fell behind Intel Xeon processors.

Graviton4 demonstrates impressive performance in compilation workloads, significantly outperforming previous generations and competing well with high-end processors. In the Timed Compilation of the Gem5 simulator, Graviton4 completed the task in 186.77 seconds, compared to 213.29 seconds for AMD EPYC 9R14 and 244.88 seconds for Intel Xeon 8488C, showcasing faster build times. In the Timed LLVM compilation using the Ninja build system, Graviton4 completed in 182.06 seconds, surpassing Graviton3's 257.85 seconds and Graviton2's 344.43 seconds. 
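To put the timed-compilation figures above in relative terms, here is a minimal Python sketch that expresses each result as a multiple of the Graviton4 time. The timings are the ones quoted above; the helper name `slowdown_vs` is hypothetical and introduced only for illustration.

```python
# Timed compilation results quoted in the article, in seconds (lower is better).
GEM5_SECONDS = {
    "Graviton4": 186.77,
    "AMD EPYC 9R14": 213.29,
    "Intel Xeon 8488C": 244.88,
}
LLVM_NINJA_SECONDS = {
    "Graviton4": 182.06,
    "Graviton3": 257.85,
    "Graviton2": 344.43,
}


def slowdown_vs(baseline: str, times: dict) -> dict:
    """Return each entry's build time as a multiple of the baseline's time."""
    base = times[baseline]
    return {name: seconds / base for name, seconds in times.items()}


for label, times in (("Gem5", GEM5_SECONDS), ("LLVM (Ninja)", LLVM_NINJA_SECONDS)):
    for name, ratio in slowdown_vs("Graviton4", times).items():
        print(f"{label}: {name} took {ratio:.2f}x the Graviton4 time")
```

By this measure, the AMD and Intel instances needed roughly 1.14x and 1.31x the Graviton4 time for Gem5, while Graviton3 and Graviton2 needed roughly 1.42x and 1.89x the Graviton4 time for LLVM.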

When it comes to database performance, in PostgreSQL 16, Graviton4 offers significant performance improvements over Graviton3, which can be attributed to the increased core count, enhanced Neoverse V2 microarchitecture, and support for twelve-channel DDR5-5600 memory. Still, Graviton4 is slower than the AMD and Intel instances in this test. As for RocksDB 9.0, Graviton4 not only significantly increases performance compared to its predecessor, but also outperforms both AMD and Intel instances.

The AWS Graviton4 shows impressive performance in Blender 4.0.2 benchmarks, significantly improving over previous generations and competing well with other processors. Graviton4 completed the BMW27 render test in 50.64 seconds, compared to 62.77 seconds for Graviton3 and 83.01 seconds for Graviton2. The Classroom render test saw Graviton4 finishing in 105.39 seconds, outperforming Graviton3's 129.48 seconds and Graviton2's 169.86 seconds. The Fishy Cat render test had Graviton4 completing in 95.01 seconds, better than Graviton3's 115.47 seconds and Graviton2's 147.77 seconds. Graviton4 finished the Barbershop render test in 499.63 seconds, surpassing Graviton3's 644.98 seconds. Still, Graviton4 is consistently slower than AMD EPYC and Intel Xeon processors here.
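The four Blender results above can be condensed into a single generational figure. The sketch below computes the per-scene speedup of Graviton4 over Graviton3 and then their geometric mean. Note that the article does not quote an aggregate for Blender; using a geometric mean here is an assumption, chosen because it is the conventional way to average ratios. The helper name `speedup` is hypothetical.

```python
from statistics import geometric_mean

# Blender 4.0.2 render times quoted in the article, in seconds (lower is better).
BLENDER_SECONDS = {
    "BMW27": {"Graviton4": 50.64, "Graviton3": 62.77},
    "Classroom": {"Graviton4": 105.39, "Graviton3": 129.48},
    "Fishy Cat": {"Graviton4": 95.01, "Graviton3": 115.47},
    "Barbershop": {"Graviton4": 499.63, "Graviton3": 644.98},
}


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Speedup of the new chip: values above 1.0 mean the new chip is faster."""
    return old_seconds / new_seconds


per_scene = {
    scene: speedup(t["Graviton3"], t["Graviton4"])
    for scene, t in BLENDER_SECONDS.items()
}
overall = geometric_mean(list(per_scene.values()))

for scene, ratio in per_scene.items():
    print(f"{scene}: Graviton4 is {ratio:.2f}x faster than Graviton3")
print(f"Geometric mean speedup: {overall:.2f}x")
```

Across these four scenes the speedups fall between roughly 1.22x and 1.29x, with a geometric mean of about 1.24x.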

The performance-per-dollar analysis also favors Graviton4. In the Gem5 compilation, it offered the best value at approximately $0.186 per run, while the AMD EPYC and Intel Xeon instances cost around $0.288 per run. In the Godot game engine compilation, Graviton4 cost $0.155 per run, compared to $0.172 for AMD EPYC and $0.194 for Intel Xeon. These results highlight Graviton4's efficiency and cost-effectiveness for CI/CD build servers and other compilation-intensive tasks.
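As a rough illustration of what the per-run costs above mean in percentage terms, the following sketch computes how much cheaper a Graviton4 run is than each rival's. The dollar figures are the approximate values quoted above; the helper name `savings_percent` is hypothetical, and the percentages are derived here rather than taken from the source.

```python
# Approximate cost per compilation run quoted in the article, in US dollars.
COST_PER_RUN_USD = {
    "Gem5": {"Graviton4": 0.186, "AMD EPYC": 0.288, "Intel Xeon": 0.288},
    "Godot": {"Graviton4": 0.155, "AMD EPYC": 0.172, "Intel Xeon": 0.194},
}


def savings_percent(rival_cost: float, own_cost: float) -> float:
    """Percentage by which own_cost undercuts rival_cost."""
    return (rival_cost - own_cost) / rival_cost * 100.0


for workload, costs in COST_PER_RUN_USD.items():
    own = costs["Graviton4"]
    for rival, rival_cost in costs.items():
        if rival == "Graviton4":
            continue
        saved = savings_percent(rival_cost, own)
        print(f"{workload}: Graviton4 is {saved:.1f}% cheaper per run than {rival}")
```

With these inputs, a Gem5 build on Graviton4 comes out about 35% cheaper than on either rival, while the Godot build is about 10% cheaper than on AMD EPYC and about 20% cheaper than on Intel Xeon.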

To sum up, the AWS Graviton4 delivers significant generational improvements and competitive performance against AMD and Intel. It does particularly well in HPC, 7-Zip compression, and compilation workloads, often providing better performance-per-dollar ratios. Although it trails its rivals in Blender rendering and in some cryptographic and database benchmarks, it outperforms them in RocksDB 9.0, making it a competitive choice for various tasks.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • bit_user
    Edit: Oops! I assumed he was talking about today's round of benchmarks! My comments were written against this Phoronix article:
    https://www.phoronix.com/review/graviton4-96-core
    A key detail of these benchmarks is that all instances had SMT disabled. I thought it was impressive to see the (lower-clocked?) Graviton 4 manage a few decisive victories over EPYC (both with the same core count). If you excluded the benchmarks where AVX-512 plays a significant role, I think the results would've tipped even more in Graviton 4's favor.

    It's great to see Emerald Rapids and Sierra Forest in the mix, even with the former competing at a core-count disadvantage (and the latter obviously leaning far in the opposite direction, with 144 E-cores).

    BTW, the author (Anton) neglected to mention the final Geomean, which went 21.3% in favor of EPYC 9684X. The EPYC 9654 even beat it by 11.5%.


    Regarding the Phoronix tests actually discussed in the article, I have only two observations I'd like to note:
    The Intel CPU (Sapphire Rapids Xeon 8488C) was the only processor with SMT/Hyperthreading enabled, putting it at a substantial disadvantage.
    The Geomean of EPYC 9R14 was 24.9% greater than that of the Graviton 4, which is a much better win for AMD at 64 cores than the 11.5% they managed with the non-X3D 96-core CPU (above).