AMD demoed its 64-core/128-thread dual-socket Naples server platform against a comparable Intel server at its recent Tech Day but timed the announcement to coincide with the OCP (Open Compute Project) Summit this week. AMD presented only a narrow workload, which does little to reveal Naples' broader performance profile, although Naples did compare favorably against the Intel comparison platform. More importantly, AMD also revealed more details on the architecture and design of its forthcoming server SoCs.
In the past, AMD's desktop PC market penetration has lagged woefully behind Intel's, but a quick glance at its data center share reveals an even larger disparity between the two companies. Intel currently commands ~99% of data center server sockets, and its dominance in the high-margin segment has been key to its success on many fronts. Intel is using its commanding Xeon lead as a springboard to attack other lucrative segments, such as networking with Omni-Path/silicon photonics and memory with 3D XPoint, but it relies on locked-down proprietary interconnects that have raised the ire of the broader industry.
AMD, by contrast, leverages open protocols where it can and participates in development efforts on several new open interconnects, such as CCIX, Gen-Z, and OpenCAPI, so for the broader industry, a competitive AMD represents more than just a cost advantage and a second source. Let's dive in.
The High-Level
AMD's Naples brings Southbridge functionality on-die, so there is no need for an I/O hub or chipset. AMD claimed the integrated connectivity will increase performance and reduce cost. For instance, a dual-socket server can support up to 24 NVMe drives without additional hardware, and Naples features integrated Ethernet capabilities, although AMD did not specify performance metrics. The Zen microarchitecture brings SMT to the Naples platform, and each SoC's 32 physical cores and 64 threads give AMD 45% more cores per socket than Intel's flagship E5-2699 v4, which features 22 cores.
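AMD didn't break down the lane budget behind the 24-drive claim, but a quick back-of-the-envelope check, assuming the usual x4 link per NVMe drive (an assumption AMD didn't confirm), shows it fits within the platform's lane count:

```python
# Rough sanity check, assuming each NVMe drive uses a standard x4 PCIe link.
nvme_drives = 24
lanes_per_drive = 4  # assumption, not an AMD-confirmed figure
print(nvme_drives * lanes_per_drive)  # 96 lanes, within the 128 lanes a dual-socket Naples system exposes (see below)
```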
A single Naples SoC provides 128 PCIe 3.0 lanes compared to Intel's 40, which is another impressive improvement, though Naples loses some of that advantage in dual-socket servers. The generous PCIe lane allocation allows for six x16 connections, which AMD noted is particularly useful for machine learning applications that leverage GPUs. Naples features eight DDR4 memory channels per socket without the use of a memory buffer (such as Intel's Jordan Creek Scalable Memory Buffers), whereas Intel is limited to four channels. Naples' bulky memory provisioning supports up to 170.7 GB/s of throughput and a maximum capacity of 2TB (2,048GB) of memory per processor, whereas Intel's E5-2699A v4 plateaus at 76.8 GB/s of throughput and 1.54TB of memory per processor.
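Those throughput figures line up with simple peak-bandwidth arithmetic. The sketch below assumes DDR4-2667 on Naples (a speed AMD hasn't yet confirmed) and DDR4-2400 on the E5-2699A v4, and multiplies channels by transfer rate by the 8-byte DDR4 bus width:

```python
# Theoretical peak memory bandwidth: channels x megatransfers/s x 8 bytes per transfer.
# Memory speeds are assumptions (DDR4-2667 for Naples, DDR4-2400 for Broadwell-EP).
def peak_bandwidth_gbs(channels, megatransfers_per_s, bus_bytes=8):
    return channels * megatransfers_per_s * bus_bytes / 1000

print(peak_bandwidth_gbs(8, 2667))  # Naples: ~170.7 GB/s
print(peak_bandwidth_gbs(4, 2400))  # E5-2699A v4: ~76.8 GB/s
```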
AMD will release several SKUs with varying core counts and frequencies, but it isn't revealing clock speeds, TDP, or pricing at this time. The company noted that it will provide more information on the product stack when Naples comes to market (obviously). AMD will support a broad range of out-of-band management interfaces, such as IPMI, and although OS compatibility remains undefined, the company is working with numerous operating system and hypervisor vendors.
Expanding to two sockets yields 16 memory channels (32 DIMMs) that support up to 4TB of memory. AMD's Infinity Fabric connects the processors and communicates across the PCIe interface, consuming 64 lanes from each Naples SoC in a dual-socket configuration. That leaves 128 PCIe lanes for the host system, which still outweighs Intel's 80 lanes in a dual-socket server. AMD hasn't specified Infinity Fabric's bandwidth capabilities, among many other details. The Infinity Fabric lane requirements prevent AMD from employing Naples in quad-socket configurations, which would consume all of the available PCIe lanes for inter-socket communication. It isn't clear if AMD plans to bring a quad-socket competitor to market.
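The lane math only works out if each processor dedicates 64 of its 128 lanes to the socket-to-socket link, which is our reading of AMD's figures rather than a confirmed topology detail:

```python
# Dual-socket lane accounting, assuming 64 of each SoC's 128 lanes feed the
# inter-socket Infinity Fabric link (our reading of AMD's figures, not confirmed).
lanes_per_socket = 128
fabric_lanes_per_socket = 64

total_lanes = 2 * lanes_per_socket                             # 256 lanes on paper
host_lanes = 2 * (lanes_per_socket - fabric_lanes_per_socket)  # 128 lanes left for the host, matching AMD's 2P claim
print(total_lanes, host_lanes)
```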
AMD hasn't released detailed information on its Infinity Fabric, though we do know that it's an updated version of the HyperTransport protocol. AMD did state that it has improved the protocol's QoS, security characteristics, and other key metrics, which helps with VM provisioning and management.
AMD also claimed that the Infinity Fabric provides near-linear scaling up to 64 cores, but as the footnote on its slide indicates, the company hasn't validated that scaling with working silicon.
The Demos
AMD configured two dual-socket test systems for its live demonstration, and its Naples "Speedway" reference server design enjoyed a pronounced advantage in core count, memory capacity, memory speed, and PCIe lanes over the Broadwell-based Intel server.
AMD was light on the details of its custom seismic workload; although we know it employs AVX instructions, it is impossible to compare the results to standardized workloads or provide a detailed analysis of the tests. AMD indicated that the workload is a computationally intensive analysis involving iterations of 3D wave equations that stresses the CPU, memory, and I/O subsystems. We also weren't provided with more detailed system specifications or settings, so take the results with a grain of salt.
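For a sense of what that class of workload looks like, the minimal sketch below shows a generic finite-difference 3D wave-equation kernel. It is purely illustrative and not AMD's code, but it conveys why each iteration leans heavily on both the floating-point units and the memory subsystem:

```python
import numpy as np

def wave_step(u_prev, u_curr, c2dt2_dx2):
    """Advance a scalar wave field one time step on a 3D grid (interior points only)."""
    # Seven-point finite-difference Laplacian: each updated point reads its six
    # neighbors plus itself, so large grids are as memory-bound as they are FLOP-bound.
    lap = (-6.0 * u_curr[1:-1, 1:-1, 1:-1]
           + u_curr[:-2, 1:-1, 1:-1] + u_curr[2:, 1:-1, 1:-1]
           + u_curr[1:-1, :-2, 1:-1] + u_curr[1:-1, 2:, 1:-1]
           + u_curr[1:-1, 1:-1, :-2] + u_curr[1:-1, 1:-1, 2:])
    u_next = u_curr.copy()
    u_next[1:-1, 1:-1, 1:-1] = (2.0 * u_curr[1:-1, 1:-1, 1:-1]
                                - u_prev[1:-1, 1:-1, 1:-1]
                                + c2dt2_dx2 * lap)
    return u_next

# A billion-sample grid (roughly 1000^3 points) at 4 bytes per sample is ~4GB per
# field copy, which is why memory capacity and bandwidth matter for this demo.
n = 128  # small grid so the example runs quickly
u_prev = np.zeros((n, n, n), dtype=np.float32)
u_curr = np.zeros((n, n, n), dtype=np.float32)
u_curr[n // 2, n // 2, n // 2] = 1.0  # point source
u_next = wave_step(u_prev, u_curr, c2dt2_dx2=0.1)
```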
The first workload consisted of 10 iterations of a 1-billion-sample grid. AMD restricted Naples' core count (to 44, matching the dual E5-2699 v4 setup) and memory speed to match the Intel system, yet still managed to complete the workload in roughly half the time.
For the second test, AMD ran the same workload but brought all 64 cores to bear and bumped its memory speed up to 2,400MHz while the Intel system remained at 1,866MHz. Once again, AMD's carefully selected workload completed faster on the Naples system, this time yielding a 2.5X advantage. It's impossible to derive any useful scalability comparisons between the workload completion times of the 44-core Naples configuration and the 64-core native configuration due to a lack of information on the workload.
Finally, AMD provided a demo specifically designed to highlight its memory capacity advantage. The company increased the dataset to 10 iterations of a 4-billion-sample grid, which simply couldn't run on the Intel system because it lacked the necessary memory capacity.
The Takeaway
AMD's Naples design is impressive, and although the benchmarks are obviously very limited and designed to cast Naples in a favorable light, our initial Ryzen tests indicate the Zen architecture is well-optimized for HPC workloads. It will be interesting to see the actual Naples silicon in action in a wide range of industry-standard workloads.
Frankly, AMD faces an uphill battle against Intel in the data center. Intel has the advantage of a wide range of server SKUs and a burgeoning portfolio of new technologies at its disposal. Intel is bringing FPGAs on-package with its Xeons, and we expect them to come on-die in the future, which offers a tremendous amount of flexibility. Other emerging technologies, such as silicon photonics and 3D XPoint, are also major differentiators.
Perhaps Intel's greatest advantage in the data center is its long track record of reliable platforms. Data center operators tend to shy away from new architectures until they have proven their trustworthiness in the field, so we may see a slow adoption rate as AMD deploys its solutions. Intel also benefits from a wide range of applications that are almost solely optimized for its architecture, so AMD will have to expend a considerable amount of effort on ecosystem enablement.
AMD's biggest advantage will likely come in the form of its pricing structure. Intel's Xeon portfolio is prohibitively expensive, particularly at the high end, so any competition is welcome. Unfortunately, most of the Naples story is yet to be told, so we will have to wait until the Q2 2017 launch for more information.