Intel Xeon E5-2600 v4 Broadwell-EP Review

Broadwell-EP Architecture


The Broadwell-EP line-up is based on three different die configurations with modular designs. The HCC die measures 18.1x25.2mm and comprises ~7.2 billion transistors. The architecture itself still employs two full rings per HCC die, but now it's symmetrical. In Haswell-EP, the ring on the right serviced two additional cores, creating asymmetry.

Here, Intel connects both bidirectional rings to 12 cores each, and it disables an equal number of cores per ring to create SKUs with fewer cores. As an example, the flagship 22-core Xeon E5-2699 v4 has 11 active cores per ring. As you work your way down the stack, two cores at a time are turned off, one from each side, along with their corresponding slices of last-level cache. That's how Intel creates models with less L3, too.
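
The core-disabling scheme lends itself to a quick back-of-the-envelope check. The sketch below is only an illustration, not Intel's actual fusing logic: it assumes the HCC die's 12 cores per ring and the 2.5MB of last-level cache per active core cited in this article, and the function name `hcc_config` is hypothetical.

```python
# Toy model of the HCC die as described in the article: two rings of 12 cores,
# cores disabled in pairs (one per ring), 2.5MB of L3 per active core.
CORES_PER_RING = 12      # HCC die, per the article
L3_PER_CORE_MB = 2.5     # LLC slice attached to each active core


def hcc_config(disabled_per_ring):
    """Return (active_cores, l3_mb) when the same number of cores is disabled on each ring."""
    if not 0 <= disabled_per_ring <= CORES_PER_RING:
        raise ValueError("disabled_per_ring must be between 0 and 12")
    active_per_ring = CORES_PER_RING - disabled_per_ring
    active_cores = 2 * active_per_ring
    return active_cores, active_cores * L3_PER_CORE_MB


# The 22-core Xeon E5-2699 v4 disables one core per ring.
print(hcc_config(1))  # (22, 55.0)
```

Disabling one more core per ring gives 20 cores and 50MB of L3 under the same assumptions, which shows why core count and cache size fall together as you move down the stack.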

Each active core is associated with a 2.5MB slice of last-level cache (LLC) that is shared across its ring, and any core can address any part of the cache. The advantage of two distinct rings is more efficient scheduling; everything that happens on one ring is independent and occurs without any interference from the other ring. Routing ring traffic intelligently, and in the correct direction, is naturally quite important, since a transaction on the ring can take up to 12 cycles depending on how far it has to travel. Without direction-aware routing, if a core needed information in cache to the "south" of it and the traffic went north, that request would have to make a complete loop. Instead, the scheduler routes the request south, yielding faster access to data in the cache.
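
The direction choice on a bidirectional ring can be sketched in a few lines. This is a toy model under stated assumptions, not a description of Intel's scheduler: the stop count `RING_STOPS` is hypothetical (real rings also carry non-core agents), the function name `ring_hops` is made up for illustration, and hop counts are not cycle counts.

```python
# Toy model of direction selection on a bidirectional ring.
# RING_STOPS is an assumption for illustration only.
RING_STOPS = 12


def ring_hops(src, dst, stops=RING_STOPS):
    """Return (direction, hops) for the shorter way around the ring."""
    clockwise = (dst - src) % stops   # hops if traffic travels clockwise
    counter = (src - dst) % stops     # hops if traffic travels the other way
    if clockwise <= counter:
        return "clockwise", clockwise
    return "counterclockwise", counter


# A request from stop 2 to stop 11 is 9 hops clockwise but only 3 the other way.
print(ring_hops(2, 11))  # ('counterclockwise', 3)
```

With two directions available, the worst case is half the ring rather than nearly a full loop, which is the saving the paragraph above describes.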

Balancing a workload between two rings also reduces the number of cycles that would be required to navigate one larger ring. The only caveat is that routing traffic between rings requires a trip across the buffered switches connecting them at the top and bottom, which incurs a (roughly) five-cycle delay. Each ring has access to its own memory controller (bottom), but only the ring on the left has access to the QPI links and PCIe lanes (top).


The MCC die measures 16.2x18.9mm and has ~4.7 billion transistors, while the LCC die measures 16.2x15.2mm and employs ~3.2 billion transistors.

Intel drops the number of cores per ring from 12 to 10 on the MCC and LCC configurations, but continues to employ a bidirectional ring structure. The MCC's partially severed second ring even gets an additional memory controller. For the Low Core Count (LCC) die, Intel removes the last traces of the second ring, eliminating it and the other memory controller. This also gets rid of any reason to have the buffered switches that connect the two rings on the larger dies.

LCC-based models can still address four DDR4 memory channels through the single controller, illustrated by the four arrows emanating from that piece of logic. This results in a small loss of throughput, since there isn't a second memory scheduler to help service transactions. But Intel doesn't quantify the extent of the performance impact.

Performance Boosting Technologies

Broadwell-based CPUs boast a roughly 5.5% IPC boost compared to Haswell. The most notable improvements affect floating-point instruction performance. They include a reduction in vector FP multiply latency from five cycles to three, improvements to the Radix-1024 divider, a split scalar divider and hardware assist for vector gather operations (roughly 60 percent fewer operations).

Other compelling additions include virtualization-centric features like posted interrupts, which reduce VM enter/exit latency by batching the interrupts, and page modification logging, minimizing the overhead of VM-based fault tolerance through rapid checkpointing.

Intel also employs Transactional Synchronization Extensions (TSX) to boost performance, and its new Hardware Controlled Power Management purportedly cuts power consumption. We'll put that claim to the test on page eight. 

Orchestration And Security Features

Intel's Resource Director Technology provides enhanced telemetry data, which allows administrators to automate provisioning and increase resource utilization. This includes Cache Allocation Technology (CAT), Code and Data Prioritization (CDP), Memory Bandwidth Monitoring (MBM) and enhanced Cache Monitoring Technology (CMT).

You also get a spate of enhanced security features, including faster data encryption and decryption, network security and trusted compute pools through Crypto Speedup (ADOX/ADCX), a new random seed generator (RDSEED), Supervisor Mode Access Prevention (SMAP) and Virtualization Exception (#VE) technology.

Models And Pricing

22 comments
  • utroz
    Hmm well we know that Broadwell-E chips must be coming very very soon if Intel let this info out.
  • bit_user
    Wasn't there supposed to be a 4-core 5.0 GHz SKU? Single-thread performance still matters, in many cases.
  • turkey3_scratch
    328798 said:
    Wasn't there supposed to be a 4-core 5.0 GHz SKU? Single-thread performance still matters, in many cases.


    In most server applications it doesn't matter as much as multithreaded performance. If you need single-core strength, getting a consumer chip is actually better, but you probably aren't running a server if single-threaded is your focus.
  • PaulyAlcorn
    Quote:
    Wasn't there supposed to be a 4-core 5.0 GHz SKU? Single-thread performance still matters, in many cases.

    I read the rumors on that as well, but nothing official has surfaced as of yet to my knowledge.
  • bit_user
    1712875 said:
    328798 said:
    Wasn't there supposed to be a 4-core 5.0 GHz SKU? Single-thread performance still matters, in many cases.
    In most server applications it doesn't matter as much as multithreaded performance. If you need single-core strength, getting a consumer chip is actually better, but you probably aren't running a server if single-threaded is your focus.
    Try telling that to high-frequency traders. I'm sure they want the reliability features of Xeons (ECC, for example), but the highest clock speed available.

    And the fact that Intel even released low-core high-clock SKUs is an acknowledgement of this continuing need. Clock just not as high as I'd read. With the other specs basically matching the Haswell version, the only difference is ~5% IPC improvement. Seems pretty poor improvement, for a die-shrink.
  • firefoxx04
    Would be nice to have a quad core xeon that turbos at 4.4ghz just like the 4790k. I had to go with a 4690k when building an autocad system because it only uses one core and needs that core to be fast... this means i have to sacrifice ecc support.
  • bit_user
    2074532 said:
    Quote:
    Wasn't there supposed to be a 4-core 5.0 GHz SKU? Single-thread performance still matters, in many cases.
    I read the rumors on that as well, but nothing official has surfaced as of yet to my knowledge.

    On wccftech (not the most reliable source, I know), they claimed:

    Model: Intel Xeon E5-2602 V4
    Cores/threads: 4/8
    Base clock: 5.1 GHz
    Turbo clock: TBD
    L3 Cache: 5 MB
    TDP: 165W

    Given what we know about 2.5 MB/core of L3 Cache, the 5 MB figure sounds suspicious. It's conceivable they could disable some to hit the target TDP, I guess.
  • firefoxx04
    We cant get skylake to consistently hit 5ghz... why would a xeon chip suddenly hit 5ghz?
  • JamesSneed
    211300 said:
    We cant get skylake to consistently hit 5ghz... why would a xeon chip suddenly hit 5ghz?


    I'm not saying the 5Ghz rumor is true, but Intel has always known during certification which chips can hit higher clocks, whether the chip is a top end or low end chip, cores disabled, etc. I'm sure they could cherry pick a few to sell for $$$ if they wanted. Now, are they? I have no real idea.
  • bit_user
    211300 said:
    We cant get skylake to consistently hit 5ghz... why would a xeon chip suddenly hit 5ghz?
    Well, I was surprised, too.

    There are obviously things you can do in chip design that allow one to reach different timing targets. And I was hoping they might've refined their 14 nm process, since the time the first Broadwells launched. So, I thought, with more TDP headroom afforded by this socket (roughly double what Skylake has to work with), maybe they could do it.

    I thought maybe Intel was addressing some pent-up demand for high clockspeed applications. That said, it seemed particularly odd in Broadwell, given that it generally seems oriented towards lower clockspeed / lower power applications.

    But maybe it was a typo, or even a blatant lie, in order to track down leakers.
  • alidan
    Quote:
    We cant get skylake to consistently hit 5ghz... why would a xeon chip suddenly hit 5ghz?


    proper binning and sold specifically as that because of what it hits, this could double/triple the value of the chip at least compared to other lower binned versions.
  • thor220
    Quote:
    Wasn't there supposed to be a 4-core 5.0 GHz SKU? Single-thread performance still matters, in many cases.


    A really high clock on a server platform seems like an overclocker's dream to me. Stability and performance. Not to mention that server processors use solder instead of that cheap paste Intel uses in their consumer processors.
  • RedJaron
    Doesn't sound right to me. A server chip binned that high would be ridiculously expensive, more than even the 5960X. I can't see them selling more than a couple hundred to the richest and most eccentric computer enthusiasts.
  • LudeMasta99
    How many FPS will I get in Crysis with this?
  • Adriano Bordignon
    How does Photoshop behave under this cpu?
  • bit_user
    570460 said:
    Doesn't sound right to me. A server chip binned that high would be ridiculously expensive, more than even the 5960X. I can't see them selling more than a couple hundred to the richest and most eccentric computer enthusiasts.
    FWIW, IBM introduced Power6 processors in 2007 & 2009 that were clocked up to 5 GHz. No doubt, they cost an arm and a couple legs.
  • Waldek
    Slightly off the topic, but... I was curious about the data centers' power consumption statistics. The article says 416.2 TWh per year. This is true. What the article says incorrectly, however, is that it would be more than 182 countries (of 192). The correct example would be that this gives the datacenters of the world 11th place in the power consumption ranking in the world. For example, the UK alone consumes 320 TWh (and is currently number 11 worldwide). The datacenters consume currently ca. 5% of the world's power usage...
  • sincreator
    Getting a chip to hit 5.0ghz or more stable is pretty rare to say the least. Silicon Lottery https://siliconlottery.com/collections/2011-3 specializes in picking out binned chips to sell, and they don't even have one model that is clocked that high.
  • PaulyAlcorn
    Quote:
    Slightly off the topic, but... I was curious about the data centers' power consumption statistics. The article says 416.2 TWh per year. This is true. What the article says incorrectly, however, is that it would be more than 182 countries (of 192). The correct example would be that this gives the datacenters of the world 11th place in the power consumption ranking in the world. For example, the UK alone consumes 320 TWh (and is currently number 11 worldwide). The datacenters consume currently ca. 5% of the world's power usage...


    The article does not state that it is more than the *combined* total of 182 countries, merely that it consumes more power than each of them compared individually. You are right, mentioning that it would place 11th in the world is probably a better way of stating the statistic.
  • bit_user
    248772 said:
    Getting a chip to hit 5.0ghz or more stable is pretty rare to say the least. Silicon Lottery https://siliconlottery.com/collections/2011-3 specializes in picking out binned chips to sell, and they don't even have one model that is clocked that high.
    Sure, but there's a difference between binning chips designed to run at a lower clock vs. actually designing a chip to hit higher clock speeds. There's no reason Intel can't make chips that clock higher, but they don't choose to because they think there's not sufficient market demand for something which burns so much power. AMD tried this with 225 W TDP Bulldozers, a few years back.

    I remember reading that the Pentium 4 was originally designed to scale up to 10 GHz, by the end of its production. Of course, back then, the only way they could hit those speeds was to use really long pipelines composed of very simple stages. Then, when they discovered that leakage of newer process nodes was higher than anticipated, they were left with a very inefficient architecture that was stuck below the clock speeds that would've made it competitive.

    These days, I think Intel could do it without such a drastic architectural tradeoff. But it still comes down to a power vs. clock, no matter what.
  • utroz
    328798 said:
    570460 said:
    Doesn't sound right to me. A server chip binned that high would be ridiculously expensive, more than even the 5960X. I can't see them selling more than a couple hundred to the richest and most eccentric computer enthusiasts.
    FWIW, IBM introduced Power6 processors in 2007 & 2009 that were clocked up to 5 GHz. No doubt, they cost an arm and a couple legs.


    Those IBM chips had a really long pipeline to allow clock speeds that high, as well as an SOI process node basically built from the ground up for them. I wonder what version of 14nm Intel is using for Broadwell-E/EP/EX, as I know they had one version they used for the Broadwell-U,Y,H,DT(C) and when they moved to Skylake they used an updated version of 14nm. Is it possible that Broadwell-E/EP/EX are using the updated 14nm process?
  • pastorpastor
    nice review, but I'm disappointed, there are no important 3d rendering benchmarks like cinebench 3dsmax / VRAY