Intel Xeon E5-2600 V3 Review: Haswell-EP Redefines Fast

Meet Intel's Grantley Platform

There's a lot more to Intel's Grantley platform than its new Haswell-EP-based CPUs. We were given the following quick reference guide, which covers the basics:

There are a number of evolutionary changes to account for, but perhaps the biggest is Grantley's memory support. Four generations of server platforms dating back to Nehalem-EP utilized DDR3 RAM, and we've seen efforts to further tweak that standard for lower power use or greater density. Registered DDR4 DIMMs successfully achieve those improvements, additionally increasing throughput per channel.
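To put "throughput per channel" into rough numbers, here is a minimal Python sketch of peak theoretical bandwidth. The DDR3-1866 and DDR4-2133 data rates and the four-channel-per-socket count are assumptions chosen for illustration, not figures stated in the text above, and the function names are hypothetical. Real-world throughput will land below these ceilings.

```python
# Peak theoretical memory bandwidth, as a back-of-the-envelope sketch.
# ASSUMPTIONS: data rates and channel count below are illustrative only.
BYTES_PER_TRANSFER = 8  # one 64-bit data bus per channel (ECC bits excluded)


def channel_bandwidth_gbs(mega_transfers_per_s: int) -> float:
    """Peak bandwidth of a single channel in GB/s (decimal gigabytes)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


def socket_bandwidth_gbs(mega_transfers_per_s: int, channels: int = 4) -> float:
    """Peak bandwidth of one socket, assuming every channel is populated."""
    return channel_bandwidth_gbs(mega_transfers_per_s) * channels


for name, rate in (("DDR3-1866", 1866), ("DDR4-2133", 2133)):
    per_channel = channel_bandwidth_gbs(rate)
    per_socket = socket_bandwidth_gbs(rate)
    print(f"{name}: {per_channel:.3f} GB/s per channel, {per_socket:.3f} GB/s per socket")
```

Under those assumptions, the per-channel gain is simply the ratio of the two data rates, since the bus width is unchanged.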

Servers are often loaded up with RAM to handle more VMs or even to expand the space available for in-memory storage applications like memcached or Redis. This typically requires more DIMMs per memory channel, imposing penalties on the peak data rate you're able to hit. DDR4 is designed to accommodate more DIMMs in a configuration without the performance penalty suffered by DDR3. And because it operates on a lower input voltage than even DDR3L, energy efficiency is built into the spec.

Of course, memory support is a product of the CPU's integrated memory controller. But not all system functions are built into Intel's processors yet. You still need a platform controller hub for a lot of the peripheral connectivity and I/O. The Wellsburg PCH, much like the already-reviewed X99 Express, exposes 10 SATA 6Gb/s ports. That's a significant upgrade over the Xeon E5-2600 v1 and v2 platform, where the focus was on adding optional SAS connectivity. Intel is clearly taking a different tack to coincide with the introduction of its NVMe-based SSDs. We're plenty happy with expanded SATA support, which is great for low-cost SSDs and traditional mechanical disks. High-performance storage is moving to the PCIe bus.

Other features include six USB 3.0 ports and eight USB 2.0 connectors. The faster ports are useful for quicker KVM cart access and accelerated boot from an internal VMware ESXi USB key installation. Several of the platforms we've seen in the lab are USB 3.0-only, in fact. That's a significant change from previous generations limited to USB 2.0.

The CPUs still enable 40 PCI Express 3.0 lanes, divisible into a number of different link configurations. This is a common feature on processors in the -EP range. With faster networking in this generation and a renewed focus on PCIe-based flash storage, all of that connectivity should go to good use.
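For a sense of scale, the sketch below estimates what those lanes can carry and checks whether a given mix of links fits in the 40-lane budget. The 8 GT/s signalling rate and 128b/130b encoding are standard PCIe 3.0 parameters; the example link mix and the function names are hypothetical, and the budget check is a simplification that ignores the per-port bifurcation rules of any specific CPU or motherboard.

```python
# Rough PCIe 3.0 bandwidth and lane-budget arithmetic.
# ASSUMPTION: the link mixes used below are illustrative, not a tested config.
GT_PER_S = 8.0        # PCIe 3.0 raw signalling rate per lane, in GT/s
ENCODING = 128 / 130  # 128b/130b line encoding efficiency
CPU_LANES = 40        # lanes per CPU, per the article


def lane_gbs() -> float:
    """Usable bandwidth of one lane, per direction, in GB/s."""
    return GT_PER_S * ENCODING / 8  # divide by 8 to convert bits to bytes


def link_gbs(width: int) -> float:
    """Usable bandwidth of an xN link, per direction, in GB/s."""
    return width * lane_gbs()


def lanes_used(links: list[int]) -> int:
    """Total lanes consumed by a list of link widths."""
    return sum(links)


def fits_budget(links: list[int], budget: int = CPU_LANES) -> bool:
    """True if the links fit in the lane budget (bifurcation rules ignored)."""
    return lanes_used(links) <= budget


example = [16, 16, 8]  # hypothetical: two x16 slots plus one x8 slot
print(f"per lane: {lane_gbs():.3f} GB/s")
print(f"x16 link: {link_gbs(16):.2f} GB/s per direction")
print(f"{example} uses {lanes_used(example)} lanes, fits: {fits_budget(example)}")
```

These are protocol-level ceilings; packet overhead pushes achievable throughput somewhat lower.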

Later in this article, I'll cover how power consumption and distribution change with Haswell-EP. The key is that, again, voltage regulation is on-package, and P-state control is more granular. As we saw on the desktop, this results in low idle power use. But unlike Haswell in its mainstream form, Haswell-EP packs up to 4.5x as many execution cores and more than five times the last-level cache. And in dual-CPU arrays, the effects of power savings are doubled per machine.
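The core and cache multiples quoted above can be checked with a few lines of Python. The specific parts compared here are our assumption, since the paragraph does not name them: an 18-core Xeon E5-2699 v3 with 45 MB of last-level cache against a quad-core desktop Haswell such as the Core i7-4770K with 8 MB.

```python
# Sanity check of the "4.5x cores" and "more than five times the cache" claims.
# ASSUMPTION: the two parts below are our choice of comparison points.
TOP_HASWELL_EP = {"cores": 18, "llc_mb": 45}   # Xeon E5-2699 v3
DESKTOP_HASWELL = {"cores": 4, "llc_mb": 8}    # Core i7-4770K


def ratio(key: str) -> float:
    """How many times larger the Haswell-EP figure is than the desktop one."""
    return TOP_HASWELL_EP[key] / DESKTOP_HASWELL[key]


print(f"cores: {ratio('cores')}x")
print(f"last-level cache: {ratio('llc_mb')}x")
```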

At least in my opinion, the most exciting platform change involves networking, including the 40 GbE Fortville controller...

  • CaptainTom
    Wonder how long it is until 18-core CPUs are utilized well in games... Maybe 2018 or 2020?
  • dovah-chan
    CaptainTom said:
    Wonder how long it is until 18-core CPUs are utilized well in games... Maybe 2018 or 2020?

    Actually, we should be trying to move away from traditional serial-style processing and toward parallel processing. Each core can handle only one task at a time and only utilize its own resources by itself.

    This is unlike a GPU, where many processors utilize the same resources and perform multiple tasks at the same time. The problem is that this type of architecture is not supported at all in CPUs, and Nvidia is looking for people to learn to program for parallel-style architectures.

    But this lineup of CPUs is clearly a marvel of engineering and hard work. Glad to see the server industry will truly start to benefit from the low power and finely tuned abilities of Haswell, along with the recently introduced DDR4, which is optimized for low power usage as well. This, combined with flash-based storage (aka SSDs), which also has lower power drain than the average HDD, will slash through server power bills and save companies literally billions of dollars. Technology is amazing, isn't it?
  • 2Be_or_Not2Be
    There is still a lot in games that doesn't translate well into parallel processing. A lot of gaming action only happens as a direct result of the user's input, and it usually triggers items that are dependent upon the results from another item. So parallel processing doesn't help a lot there; single-threaded performance helps more.

    However, with multiple cores, now we can have better AI and other "off-screen" items that don't necessarily always depend upon the user's direct input. There's still a lot of work to be done there, though.
  • 2Be_or_Not2Be
    The new Haswell-EP Xeons are definitely going to help with virtualization. However, I see the high price of DDR4 and its relative scarcity right now as a bit of a handicap to fast adoption, especially since memory is one of the major limiting factors in how many servers you can virtualize.

    I think all of the major server vendors are going to suck up all of the major memory manufacturers' DDR4 capacity for a while before the prices go down.
  • balister
    2Be_or_Not2Be said:
    The new Haswell-EP Xeons are definitely going to help with virtualization. However, I see the high price of DDR4 and its relative scarcity right now as a bit of a handicap to fast adoption, especially since memory is one of the major limiting factors in how many servers you can virtualize.

    I think all of the major server vendors are going to suck up all of the major memory manufacturers' DDR4 capacity for a while before the prices go down.

    Whether it helps or hinders will ultimately depend on the VM admin. What most VM admins don't realize is that HT can actually end up degrading performance in virtual environments unless the VM admin takes specific steps to use HT properly (and most do not). A lot of companies will tell you to turn off HT to increase performance because they've dealt with a lot of VM admins who don't set things up properly. Many VM admins overallocate, which is part of the reason HT can degrade performance, but there are other settings that have to be configured in the hypervisor so that the guest VMs get the resources they need.
  • InvalidError
    dovah-chan said:
    Actually, we should be trying to move away from traditional serial-style processing and toward parallel processing. Each core can handle only one task at a time and only utilize its own resources by itself.

    This is easier said than done, since there are tons of everyday algorithms, such as text/code parsing, that are fundamentally incompatible with threading. If you want to build a list or tree using threads, you usually need to split the operation so each thread works in an isolated part of the list/tree; otherwise they trip over each other and waste most of their time waiting on mutexes. Then, at the end of the build process, you have a merge step to bring everything back together, which is usually not very thread-friendly if you want it to be efficient.

    In many cases, trying to convert algorithms to threads is simply more trouble than it is worth.
  • Rob Burns
    Great to see these processors out, and overall a good article. I only wish you had used the same benchmark suite you had for the Haswell-E processors: 3DS Max, Adobe Premiere, After Effects, Photoshop. I'd also love to see Vray added to the mix. Not much useful benchmark data in here for 3D professionals. Some good detail on the processors themselves, however.
  • The3monitors
    Just take my money. Pls.
  • Drejeck
    CaptainTom said:
    Wonder how long it is until 18-core CPUs are utilized well in games... Maybe 2018 or 2020?

    Simply never.
    A game is made of sound, logic and graphics. You may dedicate these three processes to a number of cores, but they remain three. As you split the load, some of the logic must track who did what and where. Logic deals mainly with FPU units, while graphics deals with integers. GPUs are great integer number crunchers. They have to be fed by the CPU, so an extra core manages data across different memories; this is where we start failing. Keeping everything in one spot, with the same resources, reduces the need to transfer data. Implementing a whole processor with GPU, FPU, x86 and sound processor all in one package with on-board memory makes for the ultimate gaming processor. As long as we render scenes with triangles, we will keep using the legacy stuff. When the time comes to render scenes by pixel, we will need a fraction of today's performance, half of the texture memory (just scale the highest quality) and half of the model memory. Epic is already working on that.
  • pjkenned
    Rob Burns said:
    Great to see these processors out, and overall a good article. I only wish you had used the same benchmark suite you had for the Haswell-E processors: 3DS Max, Adobe Premiere, After Effects, Photoshop. I'd also love to see Vray added to the mix. Not much useful benchmark data in here for 3D professionals. Some good detail on the processors themselves, however.

    Great points. One minor complication is that the NVIDIA GeForce Titan used in the Haswell-E review would not have fit in the 1U servers (let alone been cooled well by them). Onboard Matrox G200eW graphics are too much of a bottleneck for the standard test suite.

    On the other hand, this platform is going to be used primarily in servers. Although there are some really nice workstation options coming, we did not have access in time for testing.

    One plus is that you can run the tests directly on your own machine by booting to an Ubuntu 14.04 LTS LiveCD and issuing three commands. There is a video and the three simple commands here: http://linux-bench.com/howto.html. That should give you a rough idea of your system's performance compared to the test systems.

    Hopefully we will get some workstation-appropriate platforms in the near future so we can run the standard set of TH tests. Thanks for your feedback; it is certainly on the radar.