Marvell's 7nm ThunderX3 server CPU will launch in 2020 with a whopping 96 cores and 384 threads. Marvell promises more than a 25% improvement in IPC and 3x the performance per socket compared to the previous-gen TX2. The chip one-ups Amazon’s Graviton2, which has 64 cores, and Ampere’s Altra, which comes with 80 cores.
To recap, the ThunderX2 launched in 2018. It featured 32 custom, quad-issue out-of-order Arm v8.1 cores with up to four threads per core, running at 2.5GHz. Serve The Home reviewed the CPU and considered it the “de facto Arm server option” on the market.
Marvell is on a two-year cadence, aiming for an improvement of more than 2x per generation. Its newest ThunderX3 processor is scheduled for later this year, and ThunderX4 is slated for 2022.
The ThunderX3 is manufactured on TSMC’s 7nm process. This gives room for more cores: 96 to be exact. This is 3x as many as the TX2 and almost 3.5x as many as the best Intel currently offers in socketed platforms. It is also 1.5x as many as AMD’s Rome. The chip retains 4-way SMT (four hardware threads per core) to deliver 384 threads. The Arm cores are now upgraded to the Arm v8.3 instruction set. The chip supports two-socket configurations.
Marvell did not discuss frequencies, but disclosed that the TDP ranges from 100W to 240W.
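The core-count comparisons above are simple ratios, and a short Python sketch makes them easy to check. The 96-core and 32-core figures come from the article. The 28-core Intel figure and the 64-core Rome figure are assumptions inferred from the quoted ratios, not numbers stated in this paragraph.

```python
# Core counts: TX3 and TX2 are from the article; the Intel and AMD
# figures are assumptions inferred from the "almost 3.5x" and "1.5x" claims.
TX3_CORES = 96
THREADS_PER_CORE = 4  # 4-way SMT

baseline_cores = {
    "ThunderX2": 32,
    "Intel socketed (assumed)": 28,
    "AMD Rome (assumed)": 64,
}

def core_ratio(new_cores, old_cores):
    """Return how many times more cores the new chip has."""
    return new_cores / old_cores

ratios = {name: core_ratio(TX3_CORES, n) for name, n in baseline_cores.items()}
total_threads = TX3_CORES * THREADS_PER_CORE

for name, ratio in ratios.items():
    print(f"TX3 vs {name}: {ratio:.2f}x the cores")
print(f"Total hardware threads: {total_threads}")
```

Under those assumptions the ratios work out to 3x, roughly 3.43x and 1.5x, which matches the wording above.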
Further, each core features four 128-bit Neon SIMD units. In terms of total width, this is the equivalent of one x86 AVX-512 unit, although the highest-end Xeon Scalable parts contain two such units per core. Rome, for comparison, contains two 256-bit SIMD units per core.
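To put the SIMD comparison side by side, the sketch below multiplies unit count by unit width for each design mentioned. It only compares aggregate vector width per core. It says nothing about clock speed, instruction mix or real throughput, and the dictionary labels are illustrative names rather than official product names.

```python
# (units per core, bits per unit), as described in the article.
simd_units = {
    "ThunderX3": (4, 128),
    "Xeon Scalable, one AVX-512 unit": (1, 512),
    "Xeon Scalable, two AVX-512 units": (2, 512),
    "AMD Rome": (2, 256),
}

# Aggregate SIMD width per core in bits.
simd_bits_per_core = {
    name: units * width for name, (units, width) in simd_units.items()
}

for name, bits in simd_bits_per_core.items():
    print(f"{name}: {bits} bits of SIMD width per core")
```

By this measure the ThunderX3 and Rome both land at 512 bits per core, while the top Xeon Scalable parts reach 1024 bits.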
On the I/O side, the silicon supports 8-channel DDR4-3200, which is similar to Intel’s upcoming Whitley platform. There are also 64 lanes of PCIe 4.0, half as many as AMD’s Rome.
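For a sense of what 8-channel DDR4-3200 means for a 96-core chip, the sketch below computes theoretical peak memory bandwidth. It assumes the standard 64-bit (8-byte) DDR4 channel width. The result is a theoretical ceiling, not a measured figure, and the function name is hypothetical.

```python
def ddr_peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_bytes=8 assumes a standard 64-bit DDR4 channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9

TX3_CORES = 96

peak_gbs = ddr_peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
per_core_gbs = peak_gbs / TX3_CORES

print(f"Peak socket bandwidth: {peak_gbs:.1f} GB/s")
print(f"Peak bandwidth per core: {per_core_gbs:.2f} GB/s")
```

That is about 204.8 GB/s per socket, or a little over 2 GB/s per core when all 96 cores are busy.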
Marvell claims an improvement of over 25% in instructions per cycle (IPC) and over 60% in single-threaded performance. Socket-level performance comes in at over 3x that of the TX2, while floating-point performance is even higher at 5x due to the additional SIMD units.
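The gap between the IPC claim and the single-threaded claim can be sketched with simple arithmetic, assuming single-threaded performance scales roughly as IPC times clock frequency. Marvell's figures are lower bounds ("over 25%", "over 60%"), and the company has not disclosed frequencies, so the clock below is a purely hypothetical illustration based on the TX2's 2.5GHz.

```python
# Marvell's claimed lower bounds, expressed as multipliers.
IPC_GAIN = 1.25
SINGLE_THREAD_GAIN = 1.60
TX2_CLOCK_GHZ = 2.5  # TX2 frequency quoted earlier in the article

def implied_non_ipc_gain(single_thread_gain, ipc_gain):
    """Portion of the single-thread gain not explained by IPC.

    Assumes performance ~ IPC * frequency, which is a simplification.
    """
    return single_thread_gain / ipc_gain

non_ipc_gain = implied_non_ipc_gain(SINGLE_THREAD_GAIN, IPC_GAIN)

# Hypothetical: what the clock would be if frequency alone closed the gap.
hypothetical_clock_ghz = TX2_CLOCK_GHZ * non_ipc_gain

print(f"Implied non-IPC gain: {non_ipc_gain:.2f}x")
print(f"Hypothetical clock if all from frequency: {hypothetical_clock_ghz:.1f} GHz")
```

Under this simplification, roughly 1.28x of the single-threaded gain would have to come from something other than IPC, such as higher clocks.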
Adoption and competition
Marvell claims it is the “most widely-supported” Arm server processor. In terms of adoption, however, it faces strong competition this year from Amazon’s AWS Graviton2 chip, with 64 Arm Neoverse N1 cores, as well as Ampere’s recently announced 80-core Altra.
Marvell has also reiterated some key customer deployments of the ThunderX2, which include the world’s first Arm-based supercomputer.
The TX3 targets the cloud, edge and HPC.
Going up against x86
Most of the world’s data centers run on x86, and Marvell has taken some time to discuss why it sees its ThunderX series chips as capable of entering this market.
Marvell in particular points to Intel’s loss of process leadership, which is something Intel itself has acknowledged multiple times. (Marvell puts the 10nm delay at a mere two years, contrary to Intel’s admitted three-year slip.) Marvell reasons that this hinders Intel’s ability to scale up the core count per die. Meanwhile, AMD’s chiplet approach comes with higher latency, according to Marvell.
Arm, on the other hand, benefits from not carrying legacy baggage and has high power efficiency, Marvell claims. Just as important as performance, the ecosystem is ready, Marvell says.
As always, reviews will have to assess the performance, IPC and power claims, but the TX3's 96 cores and four threads per core have something going for them, although the chip will also need a good memory subsystem to support that. On the other hand, AWS’ Graviton2 benefits from being a homegrown chip, which gives AWS the flexibility to set favorable pricing on its Arm instances.
Compared to the x86 offerings, Intel’s loss of process leadership has led the company to cut prices rather than increase core counts, just as Marvell claims. The upcoming Cooper Lake processors will feature 56 cores by virtue of containing two 28-core dies.
Nevertheless, Intel is intent on reclaiming process leadership by its 5nm node, and its CFO recently said that the company expects parity during the 7nm generation. Meanwhile, Intel has pointed to other areas in which it does have leadership, such as architecture, interconnect and packaging.
It is expected that Intel’s 7nm Granite Rapids will feature (class-leading) Golden Cove (or better) cores and make use of Intel’s Foveros active interposer and 3D stacking technology.
Intel has expressed that its goal is to use its packaging lead to reduce the die size of the individual chiplets – similar to AMD’s approach – while retaining close to monolithic performance. Moreover, this will allow for beyond-monolithic die sizes, as Intel has already demonstrated via Ponte Vecchio.
While this is just speculation on our side, we expect Intel to make a comeback in the core count department as it moves to the 7nm era with process parity and beyond-monolithic die sizes via its advanced packaging technologies, but those improvements will take time to come to market.
Recently, a Lakefield die shot revealed that Intel's four-core Atom cluster is similar in size to one Sunny Cove core. It is hence possible that Intel will seek to battle the Arm CPUs with its Atom series, as it did in the first half of the last decade, when Arm was looking to take major market share via microservers.
Looking at the market commercially, most OEMs and cloud service providers have probably wanted to see more competition in the data center CPU market, but that is what AMD already delivers. As such, it remains to be seen whether the latest Arm salvo will have more success than previous ones.
Marvell will disclose more information about the architecture at Hot Chips.