Intel Xeon Platinum 8176 Scalable Processor Review

Into The Core And Cache, AVX-512

The Skylake Microarchitecture

Skylake's architectural advantages over Broadwell are well-known from our desktop explorations. However, Intel added more to this particular implementation, such as AVX-512 support and a re-worked cache hierarchy.

First, a quick refresher. The Skylake design first seen on mainstream desktops is fairly similar to Broadwell, which preceded it. Intel made a number of tweaks to promote instruction-level parallelism, though. Architects widened the front end and improved the execution engine with a larger re-order buffer, scheduler, and integer register file, plus greater retire throughput. Improved decoders, a better branch predictor (with reduced misprediction penalties), and deeper load/store buffers are also on the list of enhancements. Skylake features a wider integer pipeline than its predecessor, along with a beefed-up micro-op cache (1.5x the bandwidth and a larger instruction window) that has an 80% hit rate.

While that was all well and good, the changes Intel makes to this revamped Skylake architecture are equally exciting.

The Skylake-SP Core - Bolt It On

Intel enables AVX-512 by fusing ports 0 and 1 into a single 512-bit execution unit (doubling throughput) and extending port 5 to add a second FMA unit outside of the original core. That's represented as "Extended AVX" in the slide below. This facilitates up to 32 double-precision or 64 single-precision FLOPS per cycle, per core.
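The 32 and 64 figures are per-clock peaks that fall out of simple arithmetic. Here is a minimal sketch in Python, assuming each fused multiply-add counts as two floating-point operations and that both 512-bit FMA units issue every cycle. The function name is ours, purely for illustration.

```python
def peak_flops_per_cycle(vector_bits, element_bits, fma_units):
    """Peak FLOPs per cycle for one core, counting each FMA as 2 operations."""
    lanes = vector_bits // element_bits   # elements processed per vector
    return lanes * 2 * fma_units          # multiply + add, on every FMA unit

dp = peak_flops_per_cycle(512, 64, fma_units=2)   # double precision
sp = peak_flops_per_cycle(512, 32, fma_units=2)   # single precision
print(dp, sp)
```

Plugging in 256-bit vectors instead yields half those values, which is why the wider units double the theoretical peak at a given clock rate.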

The original core design could have supported higher L1 throughput. However, that wasn't required until Intel added the second FMA unit. Doubling the core's compute capability necessitated more throughput to prevent stalls. So, Intel doubled L1-D load and store bandwidth to keep each core fed. Now we get up to two 64-byte loads and one 64-byte store per cycle.
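To put the load and store figures in bytes per cycle, here is a rough sketch. The constant names are ours, and the previous-generation total is only inferred from the statement that bandwidth doubled, not quoted from Intel.

```python
LINE_BYTES = 64         # one full 512-bit vector per access
LOADS_PER_CYCLE = 2
STORES_PER_CYCLE = 1

load_bw = LOADS_PER_CYCLE * LINE_BYTES     # bytes loaded per cycle
store_bw = STORES_PER_CYCLE * LINE_BYTES   # bytes stored per cycle
total_bw = load_bw + store_bw

# Inferred, not quoted: "doubled" implies half of this before the change.
previous_total_bw = total_bw // 2

print(load_bw, store_bw, total_bw, previous_total_bw)
```

Two 64-byte loads per cycle work out to one fresh 512-bit memory operand for each of the two FMA units every clock, which is presumably why the extra bandwidth only became necessary once the second unit arrived.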

In addition to Skylake's 256KB per-core L2 cache, Intel tacks on 768KB more outside of the core. Our diagram isn't just a lazy mock-up. Rather, Intel specifies that the added blocks are physically outside of the original core. Data stored on the external L2 cache consequently suffers a two-cycle penalty compared to the internal L2. Still, going from 256KB to 1MB of L2 is a big deal. Greater capacity and better caching algorithms more than make up for slightly higher latency in the bigger performance picture.

Rejiggering The Cache Hierarchy

And then there's the re-architected cache hierarchy. The quadrupled L2 is still private to each core, so none of its data is shared with other cores. Each core is also associated with less shared L3 cache (from 2.5MB to 1.375MB per core).

In an inclusive hierarchy, cache lines stored in L2 are duplicated in the L3. Considering the increased L2 capacity, an inclusive hierarchy would populate most of the L3 cache with duplicated data. To offset the increased L2 capacity, Intel transitioned to a non-inclusive caching scheme that doesn't require all data in the L2 to be duplicated in L3. Now the L3 serves as an overflow cache instead of the primary cache (L2 is now primary). That means only cache lines that are shared across multiple cores are duplicated in the L3 cache. The L2 cache remains 16-way associative, but the L3 drops from 20-way to 11-way.
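A back-of-the-envelope model shows why an inclusive policy stops making sense with a 1MB L2. This sketch uses the per-core capacities cited above. It is a simplification that ignores sharing between cores, and the function name is hypothetical.

```python
def distinct_capacity_mb(l2_mb, l3_mb, inclusive):
    """Upper bound on distinct data one core's L2 plus L3 slice can hold (MB)."""
    if inclusive:
        # Every L2 line also lives in L3, so L2 adds no distinct capacity.
        return l3_mb
    # Best case for non-inclusive: no line is held in both levels.
    return l2_mb + l3_mb

broadwell = distinct_capacity_mb(0.25, 2.5, inclusive=True)
skylake_if_inclusive = distinct_capacity_mb(1.0, 1.375, inclusive=True)
skylake_non_inclusive = distinct_capacity_mb(1.0, 1.375, inclusive=False)

# Share of each L3 slice that would merely mirror L2 under an inclusive policy.
duplicated_share = 1.0 / 1.375

print(broadwell, skylake_if_inclusive, skylake_non_inclusive)
print(f"{duplicated_share:.0%}")
```

Under these assumptions, an inclusive Skylake-SP would spend roughly 73% of each L3 slice mirroring L2 and hold only 1.375MB of distinct data per core. The non-inclusive design can hold up to 2.375MB, close to Broadwell's 2.5MB.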

The increased L2 capacity is a boon for virtualization workloads, largely because they don't have to share the larger private L2 cache. This also grants threaded workloads access to more data per thread, reducing data movement through the mesh.

The L2 miss-per-instruction ratio is reduced in most workloads, while the L3 miss ratio purportedly remains comparable to Broadwell-EP (this varies by workload, of course). Intel provided latency benchmarks along with SPECint*_rate data to represent its L2 and L3 hit rates in common applications.

L3 latency does increase because there are more cores in the overall design. Also, the L3 cache operates at a lower frequency to match the mesh topology.

AVX-512

Intel adds AVX-512 support, which debuted with Knights Landing, to its Skylake design. However, the company doesn't support all 11 AVX-512 instruction subsets. Instead, it targets specific feature sets for different market segments.

The vector unit goes from 256 bits wide to 512, the operand registers are doubled from 16 to 32, and eight mask registers are added, among other improvements. All of that returns solid gains in single- and double-precision compute performance compared to previous generations.
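To illustrate what the wider registers and mask registers buy, here is a small pure-Python emulation. It is not real intrinsic code. It models a 512-bit register as eight 64-bit lanes and mimics merge-masking, where a lane is only updated if its mask bit is set. The helper name `masked_add` is ours.

```python
LANES = 512 // 64   # eight double-precision lanes per 512-bit register

def masked_add(dst, mask, a, b):
    """Emulate merge-masking: lane i becomes a[i] + b[i] if bit i of mask
    is set; otherwise it keeps the old value from dst."""
    return [
        x + y if (mask >> i) & 1 else d
        for i, (d, x, y) in enumerate(zip(dst, a, b))
    ]

a = [float(i) for i in range(LANES)]
b = [10.0] * LANES
dst = [-1.0] * LANES

# Only the low four lanes are enabled by the mask.
result = masked_add(dst, 0b00001111, a, b)
print(result)

# Architectural vector register file size in bytes: before vs. after.
avx2_regfile_bytes = 16 * (256 // 8)
avx512_regfile_bytes = 32 * (512 // 8)
print(avx2_regfile_bytes, avx512_regfile_bytes)
```

Doubling both the register count and the register width quadruples the architectural vector state, from 512 bytes to 2KB. The masks let loops handle remainders and conditionals without falling back to scalar code.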

Processing AVX instructions is power-hungry work, so these CPUs operate at different clock rates based on the work they're doing. With the addition of AVX-512, there are now three sets of base and Turbo Boost frequencies (non-AVX, AVX 2.0, AVX-512), and they vary on a per-core basis. Intel's performance slide highlights that, even at lower AVX frequencies, overall compute rates increase appreciably with the instruction set. AVX-512 also dominates in both GFLOPS-per-watt and GFLOPS-per-GHz metrics.
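The trade-off between lower clocks and wider vectors can be sketched numerically. The clock rates below are hypothetical placeholders, not Intel's published AVX frequencies. The core count is the 8176's 28, and the per-cycle figures assume two FMA units at each vector width.

```python
def peak_dp_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak double-precision GFLOPS for a whole processor."""
    return cores * ghz * flops_per_cycle

CORES = 28
AVX2_DP_FLOPS_PER_CYCLE = 16     # two 256-bit FMA units
AVX512_DP_FLOPS_PER_CYCLE = 32   # two 512-bit FMA units

# HYPOTHETICAL sustained clocks, chosen only to illustrate the trade-off.
AVX2_GHZ = 2.0
AVX512_GHZ = 1.6

avx2 = peak_dp_gflops(CORES, AVX2_GHZ, AVX2_DP_FLOPS_PER_CYCLE)
avx512 = peak_dp_gflops(CORES, AVX512_GHZ, AVX512_DP_FLOPS_PER_CYCLE)

# AVX-512 keeps its peak advantage while its clock exceeds half the AVX2 clock.
break_even_ghz = AVX2_GHZ / 2

print(avx2, avx512, break_even_ghz)
```

Because the per-cycle peak doubles, the AVX-512 clock would have to fall below half the AVX2 clock before the theoretical advantage disappeared. Real-world gains depend on how well a workload vectorizes.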


31 comments
  • the nerd 389
    Do these CPUs have the same thermal issues as the i9 series?

    I know these aren't going to be overclocked, but the additional CPU temps introduce a number of non-trivial engineering challenges that would result in significant reliability issues if not taken into account.

    Specifically, as thermal resistance to the heatsink increases, the thermal resistance to the motherboard drops with the larger socket and more pins. This means more heat will be dumped into the motherboard's traces. That could raise the temperatures of surrounding components to a point that reliability is compromised. This is the case with the Core i9 CPUs.

    See the comments here for the numbers:
    http://www.tomshardware.com/forum/id-3464475/skylake-mess-explored-thermal-paste-runaway-power.html
  • Snipergod87
    983009 said:
    Do these CPUs have the same thermal issues as the i9 series? I know these aren't going to be overclocked, but the additional CPU temps introduce a number of non-trivial engineering challenges that would result in significant reliability issues if not taken into account. Specifically, as thermal resistance to the heatsink increases, the thermal resistance to the motherboard drops with the larger socket and more pins. This means more heat will be dumped into the motherboard's traces. That could raise the temperatures of surrounding components to a point that reliability is compromised. This is the case with the Core i9 CPUs. See the comments here for the numbers: http://www.tomshardware.com/forum/id-3464475/skylake-mess-explored-thermal-paste-runaway-power.html


    Wouldn't be surprised if they did, but also wouldn't be surprised if Intel used solder on these. Also, it is important to note that servers have much more airflow than your standard desktop, enabling better cooling all around, from the CPU to the VRMs. Server boards are designed for cooling as well, and not for aesthetics and stylish heat sink designs.
  • InvalidError
    983009 said:
    the thermal resistance to the motherboard drops with the larger socket and more pins. This means more heat will be dumped into the motherboard's traces.

    That heat has to go from the die, through solder balls, the multi-layer CPU carrier substrate, those tiny contact fingers and finally, solder joints on the PCB. The thermal resistance from die to motherboard will still be over an order of magnitude worse than from the die to heatsink, which is less than what the VRM phases are sinking into the motherboard's power and ground planes. I wouldn't worry about it.
  • jowen3400
    Can this run Crysis?
  • bit_user
    Quote:
    The 28C/56T Platinum 8176 sells for no less than $8719

    Actually, the big customers don't pay that much, but still... For that, it had better be made of platinum!

    That's $311.39 per core!

    The otherwise identical CPU jumps to a whopping $11722, if you want to equip it with up to 1.5 TB of RAM instead of only 768 GB.

    Source: http://ark.intel.com/products/120508/Intel-Xeon-Platinum-8176-Processor-38_5M-Cache-2_10-GHz
  • Kennyy Evony
    jowen3400 21 minutes ago
    Can this run Crysis?

    Jowen, did you just come up to a Ferrari and ask if it has a hitch for your grandma's trailer?
  • qefyr_
    W8 on ebay\aliexpress for $100
  • bit_user
    2508511 said:
    W8 on ebay\aliexpress for $100

    I wouldn't trust a $8k server CPU I got for $100. I guess if they're legit pulls from upgrades, you could afford to go through a few @ that price to find one that works. Maybe they'd be so cheap because somebody already did cherry-pick the good ones.

    Still, has anyone had any luck on such heavily-discounted server CPUs? Let's limit to Sandybridge or newer.
  • JamesSneed
    328798 said:
    Quote:
    The 28C/56T Platinum 8176 sells for no less than $8719
    Actually, the big customers don't pay that much, but still... For that, it had better be made of platinum! That's $311.39 per core! The otherwise identical CPU jumps to a whopping $11722, if you want to equip it with up to 1.5 TB of RAM instead of only 768 GB. Source: http://ark.intel.com/products/120508/Intel-Xeon-Platinum-8176-Processor-38_5M-Cache-2_10-GHz


    That is still dirt cheap for a high end server. An Oracle EE database license is going to be 200K+ on a server like this one. This is nothing in the grand scheme of things.
  • bit_user
    87433 said:
    An Oracle EE database license is going to be 200K+ on a server like this one. This is nothing in the grand scheme of things.

    A lot of people don't have such high software costs. In many cases, the software is mostly home-grown and open source (or like 100%, if you're Google).
  • bit_user
    983009 said:
    I know these aren't going to be overclocked, but the additional CPU temps introduce a number of non-trivial engineering challenges that would result in significant reliability issues if not taken into account.

    Actually, the main reason to solder these is because datacenter operators like to save energy on cooling by running their CPUs rather hot.

    I think you guys should de-lid and find out!
  • bit_user
    2497595 said:
    it is illegal and you could get in trouble for buying engineering samples when they arrive in your country if you live in USA or some countries in EU .

    Wow. Source?

    Unless they're stolen (because it's illegal to receive stolen property, regardless of whether you know it is), how on earth can it be illegal to buy any CPU?

    I can see how it might be a civil offense to sell them, if they're covered by NDA or some other sort of contract, but that would only pertain to the party breaking contract (i.e. the seller). Regardless, I wouldn't want engineering samples because they usually have significant bugs or limitations.
  • bit_user
    2497595 said:
    engineering samples are owned by Intel/AMD and if some one sells them then they are stolen .

    So, then why doesn't the owner get in trouble when Intel/AMD/etc. wants it back? Or is the ownership just a legal fiction created to establish grounds for pursuing buyers?

    2497595 said:
    as for engineering samples full of bugs and limitations ? not really they work fine .

    I have limited experience with them, but I have to disagree. Surely, some work alright. But that's not categorically true. And whenever benchmarks start to leak out about some new CPU or GPU, you always read caveats that they might be from engineering samples that aren't running at full speed.
  • none12345
    "as for bugs ? it is VERY RARE to happen in ES these days..."

    You meant to say very common. All processors have errata in them. I think you mean serious bugs, but all of them have bugs.
  • adamboy64
    This was a great read. It was good to get up to speed on the new Xeon lineup, even though I'm far from understanding all the technical details.
    Thank you.
  • GR1M_ZA
    Would like to see comparison between the new EPYC Server CPU's and these.
  • cats_Paw
    MSI Afterburner can't run on this. Too many threads to fit on the screen.
  • aldaia
    328798 said:
    Quote:
    The 28C/56T Platinum 8176 sells for no less than $8719
    Actually, the big customers don't pay that much, but still... For that, it had better be made of platinum! That's $311.39 per core! The otherwise identical CPU jumps to a whopping $11722, if you want to equip it with up to 1.5 TB of RAM instead of only 768 GB. Source: http://ark.intel.com/products/120508/Intel-Xeon-Platinum-8176-Processor-38_5M-Cache-2_10-GHz


    Adding to that, we recently renovated our supercomputer. We have almost 3500 dual-socket compute nodes. That's nearly 7000 24-core Xeon 8160s. Other than having 4 fewer cores per unit, it's identical to the Xeon 8176. I don't really know how much we paid for each Xeon; not even high management knows that, since we ordered the supercomputer as a whole from the best bidder.

    The whole supercomputer is €34 million. €4 million are devoted to the disc system, and €30 million to the compute subsystem + some work on the electrical and cooling systems. The compute system includes the racks, the interconnection network, cabling (more than 50 Km of cabling) and several months installing and testing components. I assume most of the cost is due to the compute nodes.

    As a guessing exercise, let's say that €25 million is devoted to the compute nodes. That is about €7150 per node, which includes 2 sockets, motherboard, memory, SSD, redundant power supply and router to connect to other nodes. Guessing again, I would say that each Xeon 8160 should be somewhere around €2000-2500. The Xeon 8160 is listed at $4702.
  • captaincharisma
    328798 said:
    87433 said:
    An Oracle EE database license is going to be 200K+ on a server like this one. This is nothing in the grand scheme of things.
    A lot of people don't have such high software costs. In many cases, the software is mostly home-grown and open source (or like 100%, if you're Google).


    which is why the majority of businesses are still stuck on windows XP and 7 PC's only able to use internet explorer 6 for a web browser
  • Trevor_45
    These tests are all fine and good for IT professionals. But I want to see some gaming results! Just for the entertainment value. PLEASE!

    Yes, it's a server chip not meant for gaming blah blah blah. Just run the games. k thx.
  • Rob1C
    Now we wait for 7nm Wars.
  • jimmysmitty
    983009 said:
    Do these CPUs have the same thermal issues as the i9 series? I know these aren't going to be overclocked, but the additional CPU temps introduce a number of non-trivial engineering challenges that would result in significant reliability issues if not taken into account. Specifically, as thermal resistance to the heatsink increases, the thermal resistance to the motherboard drops with the larger socket and more pins. This means more heat will be dumped into the motherboard's traces. That could raise the temperatures of surrounding components to a point that reliability is compromised. This is the case with the Core i9 CPUs. See the comments here for the numbers: http://www.tomshardware.com/forum/id-3464475/skylake-mess-explored-thermal-paste-runaway-power.html


    You mean thermal issues that will never be seen because server CPUs are never OCed? Most server CPUs will not be maxed out 24x7. A single server with this CPU will probably be cut up into at least 6 different server roles using VM.

    Either way the i9 seems to be fine at stock speeds. The biggest issues arise when overclocking, which is the same with every CPU.

    Temps are also irrelevant as they do not have a proper setup for it. Most servers in datacenters, where these will normally reside, have a hot and cold side. The cold side is normally kept in the 60s, so the air coming in is very cold, and the hot side is all the expelled air that has been pushed over the RAM and CPUs (HDDs too if you have them in your server instead of a SAN) and gets damn hot. Our server room up in North Dakota lost power about 4 months ago, when it was still very cool outside, and the backup batteries kept the servers running long enough without AC that it hit 165°F in the room.

    Anything that the consumer side is affected by won't normally affect the server market, as they are very different beasts altogether.
  • bit_user
    34444 said:
    328798 said:
    In many cases, the software is mostly home-grown and open source (or like 100%, if you're Google).
    which is why the majority of businesses are still stuck on windows XP and 7 PC's only able to use internet explorer 6 for a web browser

    I think the majority of businesses still on Win7 are just too cheap to upgrade or don't want the hassle. At this point, you might be right about the businesses still on XP.

    Anyway, that's not what I had in mind. I was talking about homegrown datacenter & cloud apps, as this is a server chip.