Intel Xeon E5-2600 v2: More Cores, Cache, And Better Efficiency

Results: Sandra 2014 And 3DMark

In Intel Xeon E5-2600: Doing Damage With Two Eight-Core CPUs, we saw just how much faster a pair of Sandy Bridge-EP-based Xeon E5s was than Westmere-EP- or Nehalem-EP-based Xeons. Intel ramps up core counts more aggressively in its business-oriented products than it does on the desktop, so stepping up from four to six and then to eight cores per socket turns into big gains in threaded software.

The transition to 22 nm manufacturing allows Intel to create up to 12-core Xeon E5-2600 v2 CPUs. However, the replacement for its original Xeon E5-2687W is another eight-core model. Instead of adding more processing resources, Intel increases shared L3 cache to 25 MB and bumps up clock rates. Those alterations, folded in on top of the architectural changes to Ivy Bridge, result in a minor improvement to Sandra’s integer math benchmark, and a more marked speed-up in double-precision calculations.

Of course, both dual-processor setups demonstrate a significant advantage in raw processing power compared to one Core i7-4960X.

As we know from Intel Core i7-3770K Review: A Small Step Up For Ivy Bridge, the company didn’t make a ton of compelling architectural changes to its IA cores. The Xeon E5-2687W v2 does enjoy the advantage of more aggressive clock rates compared to its predecessor, though AVX support across the board means all three configurations benefit.

Even in single-processor configurations, Intel’s quad-channel memory controller facilitates lots of bandwidth. The Core i7-4960X manages more than 40 GB/s at DDR3-1866. Two Xeon E5-2687W CPUs almost double that number using DDR3-1600, achieving 74 GB/s. The Xeon E5-2687W v2s increase maximum throughput almost 10%, cresting 80 GB/s.
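To put those measured numbers in context, here is a rough sketch of the theoretical peak each platform could reach. It assumes the standard 64-bit (8-byte) DDR3 channel width and that every channel is populated; the function names are ours, purely for illustration. Because the data rate used by the Xeon E5-2687W v2 setup isn't stated here, the sketch only computes its gain over the older pair.

```python
# Theoretical peak DRAM bandwidth = transfers/s x bytes per transfer x channels x sockets.
# Assumption (not from the benchmark itself): each DDR3 channel is 64 bits (8 bytes) wide.

def peak_bandwidth_gbs(mega_transfers, channels=4, sockets=1, bus_bytes=8):
    """Return theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers * 1e6 * bus_bytes * channels * sockets / 1e9

def efficiency(measured_gbs, peak_gbs):
    """Fraction of the theoretical peak that a measured result achieves."""
    return measured_gbs / peak_gbs

# Quad-channel DDR3-1866 on one Core i7-4960X.
i7_peak = peak_bandwidth_gbs(1866)
# Quad-channel DDR3-1600 on each of two Xeon E5-2687W CPUs.
xeon_peak = peak_bandwidth_gbs(1600, sockets=2)

# Measured figures quoted in the text (40 GB/s is a lower bound: "more than 40").
i7_eff = efficiency(40, i7_peak)
xeon_eff = efficiency(74, xeon_peak)

# Relative gain of the v2 pair (about 80 GB/s) over the original pair (74 GB/s).
v2_gain = 80 / 74 - 1

print(f"Core i7-4960X peak: {i7_peak:.1f} GB/s, efficiency >= {i7_eff:.0%}")
print(f"2x Xeon E5-2687W peak: {xeon_peak:.1f} GB/s, efficiency {xeon_eff:.0%}")
print(f"v2 gain over original: {v2_gain:.1%}")
```

Under these assumptions, the single-socket platform tops out near 59.7 GB/s and the dual-socket one at 102.4 GB/s, so both measured results land at roughly two-thirds to three-quarters of the theoretical peak.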

We also know that the inclusion of AES-NI in all three of these workstations means that AES instructions are executed as fast as data is fed from RAM, making encryption a bandwidth-constrained task. As we’d expect, performance scales with memory bandwidth.

The hashing benchmark is handled by the x86 cores, so the six-core Core i7-4960X understandably manages less than half of the throughput posted by either 16-core configuration.
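A back-of-the-envelope estimate shows why "less than half" is the expected outcome. The sketch below multiplies core count by clock rate as a crude proxy for compute-bound throughput. The base clocks are taken from Intel's published specifications rather than from this page, and the model ignores Turbo Boost, Hyper-Threading, and per-clock architectural differences, so treat it as an approximation only.

```python
# Crude compute-throughput proxy: cores x base clock x sockets.
# Assumption: base clocks below come from Intel's spec sheets, not from this article,
# and Turbo Boost, Hyper-Threading, and IPC differences are ignored.

def aggregate_ghz(cores_per_socket, base_ghz, sockets=1):
    """Sum of base clock rates across every physical core in the system."""
    return cores_per_socket * base_ghz * sockets

configs = {
    "Core i7-4960X": aggregate_ghz(6, 3.6),
    "2x Xeon E5-2687W": aggregate_ghz(8, 3.1, sockets=2),
    "2x Xeon E5-2687W v2": aggregate_ghz(8, 3.4, sockets=2),
}

# Ratio of the six-core desktop chip to each 16-core workstation.
ratio_vs_v1 = configs["Core i7-4960X"] / configs["2x Xeon E5-2687W"]
ratio_vs_v2 = configs["Core i7-4960X"] / configs["2x Xeon E5-2687W v2"]

for name, ghz in configs.items():
    print(f"{name}: {ghz:.1f} aggregate GHz")
print(f"i7 vs. original Xeons: {ratio_vs_v1:.2f}")
print(f"i7 vs. v2 Xeons: {ratio_vs_v2:.2f}")
```

Both ratios come out below 0.5 under these assumptions, which lines up with the desktop chip posting less than half of the dual-Xeon hashing throughput.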

Given the older workstation-oriented GPU in our test system, the only data point worth looking at from 3DMark is the threaded Physics test outcome. Clearly the benchmark doesn't scale according to core count. But the newer Xeon E5-2687W v2 does appear to gain from its larger shared L3 cache and higher stock clock rates.

Chris Angelini
Chris Angelini is an Editor Emeritus at Tom's Hardware US. He edits hardware reviews and covers high-profile CPU and GPU launches.
  • GL1zdA1
    Does this mean that the 12-core variant with 2 memory controllers will be a NUMA CPU, with cores having different latencies when accessing memory depending on which MC is near them?
  • Draven35
    The Maya playblast test, as far as I can tell, is very single-threaded, just like the other 3d application preview tests I (we) use. This means it favors clock speed over memory bandwidth.

    The Maya render test seems to be missing O.o
  • Cryio
    Thank you Tom's for this Intel Server CPU. I sure hope you'll make a review of AMD's upcoming 16 core Steamroller server CPU
  • Draven35
    Tell AMD that.
  • cats_Paw
    Dat Price...
  • voltagetoe
    If you've got 3ds max, why don't you use something more serious/advanced like Mental Ray? The default renderer tech represents the distant past, like year 1995.
  • lockhrt999
    "Our playblast animation in Maya 2014 confounds us." @canjelini: Apart from rendering, most of the tools in Maya are single-threaded (most of the functionality has stayed the same in this two-decade-old software). So benchmarking Maya playblast is much the same as benchmarking an iTunes encode.
  • daglesj
    I love Xeon machines. As they are not mainstream you can usually pick up crazy spec Xeon workstations for next to nothing just a few years after they were going for $3000. They make damn good workhorses.
  • InvalidError
    @GL1zdA1: the ring-bus already means every core has different latency accessing any given memory controller. Memory controller latency is not as much of a problem with massively threaded applications on a multi-threaded CPU since there is still plenty of other work that can be done while a few threads are stalled on IO/data. Games and most mainstream applications have 1-2 performance-critical threads and the remainder of their 30-150 other threads are mostly non-critical automatic threading from libraries, application frameworks and various background or housekeeping stuff.
  • mapesdhs
    Small note, one can of course manually add the Quadro FX 1800 to the relevant file
    (raytracer_supported_cards.txt) in the appropriate Adobe folder and it will work just
    fine for CUDA, though of course it's not a card anyone who wants decent CUDA
    performance with Adobe apps should use (one or more GTX 580 3GB or 780Ti is best).

    Also, hate to say it but showing results for using the card with OpenCL but not
    showing what happens to the relevant test times when the 1800 is used for CUDA
    is a bit odd...

    Ian.

    PS. I see the messed-up forum posting problems are back again (text all squashed
    up, have to edit on the UK site to fix the layout). Really, it's been months now, is
    anyone working on it?
