At re:Invent 2019 in Las Vegas on Tuesday, Amazon Web Services (AWS) announced Graviton2, the second generation of its homegrown Arm-based processors. With four times the core count of the first-generation Graviton behind the A1 instances, the 30-billion-transistor processors will power Amazon's sixth-generation EC2 instances. AWS claims the M6g instances offer a significant uplift in performance and cost efficiency over the Intel-based M5 instances.
At a high level, Graviton2 is a custom AWS-designed chip built on a 7nm process with 30 billion transistors, based on Arm's Neoverse N1 cores. Amazon claims it delivers 7x the performance of the Graviton-based A1 instances it announced at re:Invent 2018.
As a refresher, the original Graviton was based on Arm's 64-bit Armv8 Cortex-A72 microarchitecture from 2015, part of the first-generation 16nm Neoverse platform. It featured four quad-core clusters with 2MB of L2 cache each, for a total of 16 cores running at 2.3GHz.
Graviton2 is based on the second-generation, 7nm Neoverse platform, codenamed Ares. Arm detailed Ares' 4-wide Neoverse N1 microarchitecture in February, claiming a 30% increase in power efficiency (at the same frequency), a 60% uplift in IPC, and twice the floating-point SIMD performance per core. Impressive as those numbers are, this is still an architecture similar to the one found in cellphones: N1 is the counterpart of the Cortex-A76. Describing the architecture, Arm stated:
“The pipeline is an 11-stage accordion pipeline, which shortens in the presence of branch misses, and lengthens in normal operation. It uses a 4-wide front-end, with 8-wide dispatch/issue, three full 64-bit integer ALUs and a dedicated branch unit. The Neon Advanced SIMD pipeline is substantially wider, with dual 128b data paths. The ability to feed the SIMD engine is also widened substantially to dual 128-bit load / store pipeline with decoupled address/data, enabling sustained 2x128 performance.”
Arm said N1 can scale to 128 cores; Graviton2, however, features “just” 64 cores, connected by a 2TB/s mesh interconnect. Compared with the first Graviton, it also has twice the L2 cache per core and 5x faster memory thanks to eight DDR4-3200 channels, and that memory is always encrypted. It supports 64 PCIe 4.0 lanes as well as FP16 and INT8 numerics.
Most interesting are the comparisons AWS provided against the Intel-based M5 instances. Amazon claims the M6g instances deliver 20% lower cost and up to 40% higher performance. More specifically, Amazon cited the following per-vCPU performance improvements over M5:
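Some of these headline figures can be sanity-checked with simple arithmetic. The sketch below is our own back-of-envelope calculation, not anything AWS published. It assumes each DDR4 channel is 64 bits (8 bytes) wide, that "DDR4-3200" means 3200 million transfers per second, and, more loosely, that the 7x claim compares whole chips so it can be divided by the core-count ratio. All variable and function names are hypothetical.

```python
# Back-of-envelope check of the Graviton2 figures quoted in the article.
# Assumptions (ours, not AWS's): a DDR4 channel is 64 bits (8 bytes) wide,
# "DDR4-3200" means 3200 million transfers per second, and the 7x claim
# compares whole chips so it can be divided by the core-count ratio.

GRAVITON1_CORES = 16
GRAVITON2_CORES = 64
GRAVITON1_L2_PER_CLUSTER_KIB = 2048   # 2MB shared by each quad-core cluster
GRAVITON1_CORES_PER_CLUSTER = 4
CLAIMED_SPEEDUP_VS_A1 = 7.0           # AWS's headline claim


def peak_dram_bandwidth_gbs(channels: int, mega_transfers: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s (ignores refresh and other overheads)."""
    return channels * mega_transfers * 1e6 * bytes_per_transfer / 1e9


core_ratio = GRAVITON2_CORES / GRAVITON1_CORES                     # 4.0
implied_per_core_speedup = CLAIMED_SPEEDUP_VS_A1 / core_ratio      # 1.75
graviton1_l2_per_core_kib = GRAVITON1_L2_PER_CLUSTER_KIB / GRAVITON1_CORES_PER_CLUSTER  # 512.0
graviton2_l2_per_core_kib = 2 * graviton1_l2_per_core_kib          # 1024.0 ("twice the L2 per core")
graviton2_peak_bw_gbs = peak_dram_bandwidth_gbs(channels=8, mega_transfers=3200)  # about 204.8

if __name__ == "__main__":
    print(f"Core count ratio:         {core_ratio:.1f}x")
    print(f"Implied per-core speedup: {implied_per_core_speedup:.2f}x")
    print(f"L2 per core:              {graviton1_l2_per_core_kib:.0f} KiB -> {graviton2_l2_per_core_kib:.0f} KiB")
    print(f"Peak DRAM bandwidth:      {graviton2_peak_bw_gbs:.1f} GB/s")
```

The implied 1.75x per-core figure holds only if the 7x claim really scales with core count. The bandwidth number is a theoretical peak, and sustained bandwidth will be lower.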
- SPECjvm® 2008: +43% (estimated)
- SPEC CPU® 2017 integer: +44% (estimated)
- SPEC CPU 2017 floating point: +24% (estimated)
- HTTPS load balancing with Nginx: +24%
- Memcached: +43% performance, at lower latency
- X.264 video encoding: +26%
- EDA simulation with Cadence Xcellium: +54%
On the surface, these seem like hugely impressive claims: Graviton2 has both a higher core count and higher performance per virtual CPU. However, Intel's CPUs use Hyper-Threading, which exposes two vCPUs per physical core, whereas each Graviton2 vCPU is a full core. A per-vCPU comparison therefore pits an entire Graviton2 core against just one of the two hardware threads on an Intel core. Other caveats apply too, but in general it would have been better if AWS had followed the more standard approach of benchmarking full systems against each other.
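To see how much the per-vCPU framing matters, the sketch below rescales AWS's per-vCPU uplifts to a rough per-physical-core ratio. The assumptions are ours, not AWS's. We assume the per-vCPU figures were measured with every vCPU loaded, that an M5 vCPU is one of two Hyper-Threading threads on a core, and that a Graviton2 vCPU is a whole core (Neoverse N1 has no SMT). The dictionary and function names are illustrative.

```python
# Rough per-physical-core view of AWS's per-vCPU claims.
# Assumptions (ours, not AWS's): the per-vCPU numbers were measured with every
# vCPU loaded, an M5 vCPU is one of two Hyper-Threading threads on a core,
# and a Graviton2 vCPU is a whole core (Neoverse N1 has no SMT).

PER_VCPU_UPLIFT = {
    "SPECjvm 2008 (est.)": 0.43,
    "SPEC CPU 2017 integer (est.)": 0.44,
    "SPEC CPU 2017 floating point (est.)": 0.24,
    "Nginx HTTPS load balancing": 0.24,
    "Memcached": 0.43,
    "X.264 video encoding": 0.26,
    "Cadence Xcellium EDA": 0.54,
}

INTEL_VCPUS_PER_CORE = 2      # Hyper-Threading: two hardware threads per core
GRAVITON2_VCPUS_PER_CORE = 1  # one vCPU per physical core


def per_core_ratio(per_vcpu_uplift: float) -> float:
    """Graviton2 core throughput as a fraction of one Intel core running both threads."""
    per_vcpu_ratio = 1.0 + per_vcpu_uplift
    # Scale each side from "per vCPU" to "per core" by its vCPUs-per-core count.
    return per_vcpu_ratio * GRAVITON2_VCPUS_PER_CORE / INTEL_VCPUS_PER_CORE


per_core = {name: per_core_ratio(uplift) for name, uplift in PER_VCPU_UPLIFT.items()}

if __name__ == "__main__":
    for name, ratio in per_core.items():
        print(f"{name:38s} {ratio:.2f}x of an Intel core")
```

Under these assumptions, each Graviton2 core delivers roughly 62% to 77% of an Intel core's two-thread throughput. This fits the view that Graviton2 competes on core count and price rather than per-core performance.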
Graviton2 will be featured in general-purpose M6g, compute-optimized C6g and memory-optimized R6g EC2 instances, with up to 512GiB of memory and up to 25 Gbps of network bandwidth. The Graviton2 EC2 instances are available now for non-production workloads, with general availability and more instances coming in 2020.
As the architecture of the Neoverse N1-based Graviton2 makes clear, it is competing not on IPC but on core count and price.
First and foremost, as the world's leading cloud service provider by a large margin, AWS is putting itself in pole position as the leading provider of Arm-based server processors. After years of failed attempts by other companies to gain a foothold in the server market with the Arm architecture, Arm seemingly has finally found a worthy ally in AWS.
This raises the question: should Intel be threatened by AWS investing in an Arm line-up of instances?
While the data center is an important and growing market, in reality Amazon accounts for a relatively small fraction of Intel's overall business.
Running some numbers: Intel had roughly $71 billion in annual revenue, of which its data center business made up $23 billion, and based on the information Intel has provided, its cloud segment represented a roughly $11 billion business. With roughly 33% of the cloud market, Amazon would account for an estimated $3.6 billion of Intel's revenue, or about 5% of the total. Even if Graviton reached 25% adoption, only around $1 billion in revenue would be at stake for Intel.
The main obstacle to Arm adoption, however, remains software compatibility. While many server customers are likely eager for more competition in the server space, AMD already provided it this year with its credible 7nm Epyc platform, which runs x86 code just like the Intel servers it could replace.
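The estimate above can be reproduced in a few lines. This sketch uses the article's rounded figures as inputs. Its key assumption, implicit in the article, is that Intel's cloud revenue splits across providers in proportion to each provider's cloud market share. The function names are hypothetical.

```python
# Reproduces the article's back-of-envelope estimate of Intel's AWS exposure.
# Inputs are the article's rounded figures; the key assumption is that Intel's
# cloud revenue splits across providers in proportion to cloud market share.

INTEL_REVENUE_B = 71.0    # Intel revenue figure used in the article, $ billions
INTEL_CLOUD_B = 11.0      # Intel cloud segment, $ billions
AWS_CLOUD_SHARE = 0.33    # Amazon's share of the cloud market


def intel_revenue_from_aws(cloud_b: float, aws_share: float) -> float:
    """Intel cloud revenue attributed to AWS by market share."""
    return cloud_b * aws_share


def revenue_at_stake(from_aws_b: float, graviton_adoption: float) -> float:
    """Intel revenue that would move to Graviton at a given adoption rate."""
    return from_aws_b * graviton_adoption


from_aws = intel_revenue_from_aws(INTEL_CLOUD_B, AWS_CLOUD_SHARE)  # about 3.6
share_of_total = from_aws / INTEL_REVENUE_B                         # about 0.05
at_stake = revenue_at_stake(from_aws, 0.25)                         # about 0.9

if __name__ == "__main__":
    print(f"Intel revenue from AWS: ${from_aws:.2f}B ({share_of_total:.1%} of total)")
    print(f"At stake at 25% Graviton adoption: ${at_stake:.2f}B")
```

Swapping in a different adoption rate or market-share figure shows how sensitive the roughly $1 billion conclusion is to those two inputs.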
Nonetheless, with Amazon’s Graviton2, Arm possibly has its most credible solution yet to gain a foothold in the server market.