Intel Xeon E5-2600 v2: More Cores, Cache, And Better Efficiency
1. All About Intel's Ivy Bridge-EP-Based Xeon CPUs

Intel has a lot of irons in its proverbial fire. The company is merrily hammering away on the mobile space with its Haswell and Silvermont architectures. It’s targeting everything from tablets to notebooks with very powerful integrated graphics, and doing a really good job, in my opinion. We have Bay Trail-based tablets in the lab that look promising, and I have a bunch of data from the Iris Pro 5200 graphics engine that hasn’t even been published yet.

However, the desktop portfolio is cooling off to the side after more than two years of fairly modest evolution. Because the Tom’s Hardware team spends so much time at work and play on stationary workstations, our disappointment with Intel’s efforts in this space is woven through much of the site’s content, even if Intel does sell the fastest CPUs. Without AMD competing at a high enough level, we’ve had very little reason to recommend upgrading your host processor since the Sandy Bridge days.

Not so in the server and workstation space. There, Intel leverages its manufacturing strength to build CPUs that deftly cut through professional applications and carve up our efficiency measurements. A few months back, we previewed the Xeon E5-2697 v2 in “Intel's 12-Core Xeon With 30 MB Of L3: The New Mac Pro's CPU?” and determined the chip to be more efficient than either the eight-core Xeon E5-2687W or Core i7-3970X.

And now I have a pair of Xeon E5-2687W v2 processors based on the same Ivy Bridge-EP design. Although they’re 150 W CPUs (the -2697 v2 is a 130 W part), the workstation-specific chips sport the same eight cores as the first-gen -2687W. Intel added more shared L3 cache, though. It also raised both the base and peak Turbo Boost frequencies at a lower peak VID. And although both generations bear the same TDP, the newer design simply draws less power.

Meet Ivy Bridge-EP

After today, we’ll have benchmarked the 12- and eight-core Xeon E5-2600-series CPUs. But Intel also sells four-, six-, and 10-core models. In fact, there are 18 total SKUs up and down the v2 stack. The diverse line-up is derived from three physical dies sporting six, 10, and 12 cores. As you can imagine, each set of resources is intentionally modular to help simplify the creation of these different designs.

12-core die

The most complex version, which is what we previewed in the Xeon E5-2697 story, employs three columns of building blocks, consisting of cores and 2.5 MB last-level cache slices, and four rows of those resources. Multiple ring buses facilitate communication across the die, and multiplexers ensure information gets to the stop where it’s needed. There’s a single QPI agent communicating at up to 9.6 GT/s (though existing models are capped at 8 GT/s), and 40 lanes of third-gen PCI Express connectivity split into two x16 links and an additional eight-lane link. The 12-core die utilizes two memory controllers, each responsible for two channels of up to DDR3-1866.

10-core die

Stepping back to 10 cores reduces complexity quite a bit. The configuration shrinks to two columns, but is now five rows long. The QPI agent remains intact, but maxes out at 8 GT/s, while the PCI Express controller doesn’t change at all. Intel’s 10-core configuration sports a single memory controller that hosts all four channels. And there’s just one ring bus to shuttle data between the various stops, too.

Eight-core Xeon E5s are sourced from this same die, so a couple of the core/cache slices are disabled, leaving everything else functional. Incidentally, that’s how the Xeon E5-2687W v2 we’re testing today can include eight cores but also a 25 MB shared L3 cache—two cores are disabled, but the corresponding L3 remains active.

Six-core die

Once you drop to six cores, it’s cheaper to create a third die configuration than to create higher-volume parts from the pricey 10-core arrangement. Also a two-column part, the six-core CPU is three rows long with an 8 GT/s-capable QPI agent and the same PCI Express connectivity. Again, one memory controller is responsible for all four 64-bit DDR3 channels.
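
The three layouts described above reduce to simple arithmetic: columns times rows gives the number of core/cache slices, and each slice carries 2.5 MB of last-level cache. The short Python sketch below works that out. The function and dictionary names are purely illustrative, and it assumes every slice on a die is populated before any fusing.

```python
# Sketch only: slice counts follow the column x row layouts described in the text.
SLICE_MB = 2.5  # last-level cache per core/cache slice, per the article

# Die name -> (columns, rows) of core/cache building blocks
DIES = {"12-core": (3, 4), "10-core": (2, 5), "six-core": (2, 3)}


def die_resources(columns, rows):
    """Return (cores, LLC in MB) for a die laid out as columns x rows of slices."""
    slices = columns * rows
    return slices, slices * SLICE_MB


for name, (cols, rows) in DIES.items():
    cores, llc = die_resources(cols, rows)
    print(f"{name} die: {cores} cores, {llc:g} MB LLC")

# The Xeon E5-2687W v2 case: two cores disabled on the 10-core die,
# while all ten cache slices stay active.
cores, llc = die_resources(*DIES["10-core"])
print(f"Xeon E5-2687W v2: {cores - 2} cores, {llc:g} MB LLC")
```

That last line is why an eight-core part can ship with 25 MB of L3: cores and cache slices can be disabled independently.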

Intel uses those three dies to create a stack broken up into Advanced, Standard, Basic, and Segment-Optimized models ranging from 60 up to 150 W, and base clock rates from 1.7 up to 3.5 GHz. From the top to the bottom, the entire portfolio is compatible with the same LGA 2011 (Socket R) interface as before. That means upgrading an existing server or workstation is as easy as updating the platform’s firmware. On our Intel W2600CR2 motherboard, we simply did this with the Sandy Bridge-EP-based Xeon E5-2687Ws installed.  


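| Segment | Model | Cores | LLC | QPI | Memory | Base Clock | TDP | Price |
|---|---|---|---|---|---|---|---|---|
| Advanced | Xeon E5-2690 v2 | 10 | 25 MB | 8 GT/s | DDR3-1866 | 3.0 GHz | 130 W | $2057 |
| Advanced | Xeon E5-2680 v2 | 10 | 25 MB | 8 GT/s | DDR3-1866 | 2.8 GHz | 115 W | $1723 |
| Advanced | Xeon E5-2670 v2 | 10 | 25 MB | 8 GT/s | DDR3-1866 | 2.5 GHz | 115 W | $1552 |
| Advanced | Xeon E5-2660 v2 | 10 | 25 MB | 8 GT/s | DDR3-1866 | 2.2 GHz | 95 W | $1389 |
| Advanced | Xeon E5-2650 v2 | 8 | 20 MB | 8 GT/s | DDR3-1866 | 2.6 GHz | 95 W | $1166 |
| Standard | Xeon E5-2640 v2 | 8 | 20 MB | 7.2 GT/s | DDR3-1600 | 2.0 GHz | 95 W | $885 |
| Standard | Xeon E5-2630 v2 | 6 | 15 MB | 7.2 GT/s | DDR3-1600 | 2.6 GHz | 80 W | $612 |
| Standard | Xeon E5-2620 v2 | 6 | 15 MB | 7.2 GT/s | DDR3-1600 | 2.1 GHz | 80 W | $406 |
| Basic | Xeon E5-2609 v2 | 4 | 10 MB | 6.4 GT/s | DDR3-1333 | 2.5 GHz | 80 W | $294 |
| Basic | Xeon E5-2603 v2 | 4 | 10 MB | 6.4 GT/s | DDR3-1333 | 1.8 GHz | 80 W | $202 |
| Segment-Optimized | Xeon E5-2697 v2 | 12 | 30 MB | 8 GT/s | DDR3-1866 | 2.7 GHz | 130 W | $2614 |
| Segment-Optimized | Xeon E5-2695 v2 | 12 | 30 MB | 8 GT/s | DDR3-1866 | 2.4 GHz | 115 W | $2336 |
| Segment-Optimized | Xeon E5-2687W v2 | 8 | 25 MB | 8 GT/s | DDR3-1866 | 3.4 GHz | 150 W | $2108 |
| Segment-Optimized | Xeon E5-2667 v2 | 8 | 25 MB | 8 GT/s | DDR3-1866 | 3.3 GHz | 130 W | $2057 |
| Segment-Optimized | Xeon E5-2643 v2 | 6 | 25 MB | 8 GT/s | DDR3-1866 | 3.5 GHz | 130 W | $1552 |
| Segment-Optimized | Xeon E5-2637 v2 | 4 | 15 MB | 8 GT/s | DDR3-1866 | 3.5 GHz | 130 W | $996 |
| Segment-Optimized | Xeon E5-2650L v2 | 10 | 25 MB | 8 GT/s | DDR3-1600 | 1.7 GHz | 70 W | $1219 |
| Segment-Optimized | Xeon E5-2630L v2 | 6 | 15 MB | 7.2 GT/s | DDR3-1600 | 2.4 GHz | 60 W | $612 |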

CPUs in the Advanced bin are mostly 10-core models with 25 MB of L3, though there’s an eight-core CPU with 20 MB in there as well. They all feature 8 GT/s QPI links, Hyper-Threading and Turbo Boost support, and a memory controller capable of 1866 MT/s transfer rates.

The Standard stack is smaller, with one eight-core SKU boasting 20 MB of LLC and two six-core chips complemented by 15 MB. Intel’s QuickPath Interconnect is deliberately slowed to 7.2 GT/s, as is the quad-channel memory controller’s maximum speed (all three processors accommodate up to DDR3-1600 modules). Hyper-Threading and Turbo Boost are both retained, though.

Both members of the Basic segment are quad-core CPUs with 10 MB of shared L3. QPI performance is pared back to 6.4 GT/s, while the memory controller tops out at DDR3-1333. That’s still arguably plenty for the lower-power applications those processors will find themselves in, though Intel does deactivate Hyper-Threading and Turbo Boost, unfortunately.


Xeon E5-2687W v2: Bringing Out The Big Guns

Of course, Intel’s Xeon E5-2687W v2 doesn’t fit into any of those three categories. It’s a workstation-specific member of the Segment-Optimized line-up, purpose-built for roomy pedestal/4U enclosures where dissipating 2 x 150 W isn’t a problem, and a balance between parallelism and clock rate takes precedence over a larger number of lower-frequency cores.

Like the Xeon E5-2687W before it, -2687W v2 is an eight-core part. Its base clock rate increases from 3.1 GHz the generation prior up to 3.4 GHz, and the maximum Turbo Boost frequency similarly jumps from 3.8 to 4 GHz. An extra 5 MB of shared L3 cache typically won’t confer significant gains. However, you will see benchmark situations where it makes a difference.

Each of the -2687W v2’s QPI links operates at a full 8 GT/s. And the processor’s quad-channel memory controller supports 1866 MT/s data rates. In theory, that’s up to 59.7 GB/s per processor, though real-world throughput is always going to be lower.
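
For anyone who wants to check that theoretical figure, here is a minimal Python sketch of the arithmetic. It assumes 64-bit (8-byte) channels and decimal gigabytes, and it uses DDR3-1866’s nominal 1866.67 MT/s data rate. The function name is just for illustration.

```python
def peak_bandwidth_gbs(channels, transfer_rate_mts, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Four channels of DDR3-1866 (nominally 1866.67 MT/s) per processor
print(round(peak_bandwidth_gbs(4, 1866.67), 1))  # 59.7

# The same controller populated with DDR3-1600 modules
print(round(peak_bandwidth_gbs(4, 1600), 1))  # 51.2
```

These are ceilings, not expectations. Refresh cycles, bank conflicts, and read/write turnarounds all eat into them.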

The 10-core die on which our Xeon E5-2687W v2 is based

Of course, the second-gen Xeon E5 is built using Intel’s Ivy Bridge architecture, so it gets the subtle tweaks introduced back in April 2012 alongside the company’s desktop Core CPUs, including a handful of adjustments to the core, cache, and memory controller that improve IPC throughput by a few percent compared to Sandy Bridge.

When you combine the architectural evolution, higher clock rates, and more shared L3 cache, you know what to expect going from Xeon E5-2687W to -2687W v2. But that’s not the whole story. When Intel made the switch from Sandy to Ivy Bridge, its emphasis was on transitioning from 32 to 22 nm manufacturing. The company does successfully push the Xeon E5 family’s performance story forward. However, it also cuts power consumption. That combination is great for boosting efficiency. So, we’re going to start by digging into the benchmarks, fold in power consumption, and then wrap with an energy comparison.

2. Test Setup And Benchmarks

We had Intel’s P4000 enclosure on-hand from a previous story, and used the chassis to house our dual Xeon configurations. We were able to update our Intel W2600CR2 motherboard to the latest firmware as well, adding support for the company’s Ivy Bridge-EP-based processors.

The Xeon E5-2687W v2 officially supports up to 256 GB of DDR3-1866 memory. The kits currently available operate at 1.5 V with CAS 13 timings, though. It was easiest for us to stick with the 64 GB of DDR3L-1600 at CAS 11—none of these workloads should be bandwidth-limited, after all.

Special thanks to Crucial for supplying the RAM and Intel for the platform we’ve been using for almost two years now. 

Test Hardware
Processors:
  2 x Intel Xeon E5-2687W v2 (Ivy Bridge-EP) 3.4 GHz, Eight Cores, LGA 2011, 8 GT/s QPI, 25 MB Shared L3, Hyper-Threading enabled, Power-savings enabled
  2 x Intel Xeon E5-2687W (Sandy Bridge-EP) 3.1 GHz, Eight Cores, LGA 2011, 8 GT/s QPI, 20 MB Shared L3, Hyper-Threading enabled, Power-savings enabled
  1 x Intel Core i7-4960X (Ivy Bridge-E) 3.6 GHz, Six Cores, LGA 2011, 15 MB Shared L3, Hyper-Threading enabled, Power-savings enabled
Motherboards:
  Intel W2600CR2 (LGA 2011) Intel 5520/ICH10R, BIOS 02.01.0002
  MSI X79A-GD45 Plus (LGA 2011) Intel X79 Express, BIOS 17.5
Memory:
  Crucial 64 GB (8 x 8 GB) DDR3-1600 Registered ECC, MT36KSF1G72PZ-1G6M1HF
  G.Skill 32 GB (4 x 8 GB) DDR3-1600 Unbuffered, F3-12800CL9Q2-32GBZL
Hard Drive:
  Intel SSDSA2BZ200G3 200 GB SATA 3 Gb/s (SSD 710)
Graphics:
  Nvidia Quadro FX 1800
Power Supply:
  Intel DPS-750XB A 750 W
  Chicony CPB09-003A 1000 W
System Software And Drivers
Operating System: Windows 8 Professional 64-bit
DirectX: DirectX 11
Graphics Driver: Nvidia Quadro Driver 331.87
Benchmark Configuration
Adobe Creative Suite
  Adobe After Effects CC: Version 12.0.0.404 x64, Create Video which includes three Streams, 210 Frames, Render Multiple Frames Simultaneously
  Adobe Photoshop CC: Version 14.0 x64, Filter 15.7 MB TIF Image: Radial Blur, Shape Blur, Median, Polar Coordinates
  Adobe Premiere Pro CC: Version 7.0.0, 6.61 GB MXF Project to H.264 to H.264 Blu-ray, Output 1920x1080, Maximum Quality
Audio/Video Encoding
  iTunes: Version 11.0.4.4 x64, Audio CD (Terminator II SE), 53 minutes, default AAC format
  LAME MP3: Version 3.98.3, Audio CD "Terminator II SE", 53 min, convert WAV to MP3 audio format, Command: -b 160 --nores (160 Kb/s)
  HandBrake CLI: Version 0.9.9, Video from Canon EOS 7D (1920x1080, 25 FPS) 1 Minute 22 Seconds, Audio: PCM-S16, 48,000 Hz, Two-Channel, to Video: AVC1 Audio: AAC (High Profile)
  TotalCode Studio 2.5: Version 2.5.0.10677, MPEG-2 to H.264, MainConcept H.264/AVC Codec, 28 sec HDTV 1920x1080 (MPEG-2), Audio: MPEG-2 (44.1 kHz, 2 Channel, 16-Bit, 224 Kb/s), Codec: H.264 Pro, Mode: PAL 50i (25 FPS), Profile: H.264 BD HDMV
Productivity
  ABBYY FineReader: Version 11.0.102.583, Read PDF save to Doc, Source: Political Economy (J. Broadhurst 1842) 111 Pages
  Adobe Acrobat XI: Version 11.0.0, Print PDF from 115 Page PowerPoint, 128-bit RC4 Encryption
  Autodesk 3ds Max 2012 and 2013: Version 14.0 x64, Space Flyby Mentalray, 248 Frames, 1440x1080
  Blender: Version 2.68a, Cycles Engine, Syntax blender -b thg.blend -f 1, 1920x1080, 8x Anti-Aliasing, Render THG.blend frame 1
  Visual Studio 2010: Version 10.0, Compile Google Chrome, Scripted
  Cinebench: Cinebench R15.0 CPU Component
  Euler3D: CFD simulation over AGARD 445.6 aeroelastic test wing at Mach 0.5
  Autodesk Maya 2014: Tom’s Hardware Logo render in mental ray, 1920x1080, global illumination, photo-realistic motion blur, ray-traced shadows, OpenGL Test: Generate Playblast (OpenGL preview) animation to Y: RAM drive
  e-on Software Vue 2014 PLE: Custom workload: Landscape (generated in Vue 8 full version and imported into PLE)
File Compression
  WinZip: Version 18.0 Pro, THG-Workload (1.3 GB) to ZIP, command line switches "-a -ez -p -r"
  WinRAR: Version 5.0, THG-Workload (1.3 GB) to RAR, command line switches "winrar a -r -m3"
  7-Zip: Version 9.30 Alpha, THG-Workload (1.3 GB) to .7z, command line switches "a -t7z -r -m0=LZMA2 -mx=5"
Synthetic Benchmarks and Settings
  3DMark: Version 1.1, Benchmark Only
  SiSoftware Sandra 2014: Version 2014.02.20.10, CPU Test = CPU Arithmetic / Multimedia / Cryptography / Memory Bandwidth
3. Results: Sandra 2014 And 3DMark

In “Intel Xeon E5-2600: Doing Damage With Two Eight-Core CPUs,” we saw just how much faster a pair of Sandy Bridge-EP-based Xeon E5s were than Westmere-EP- or Nehalem-EP-based Xeons. More so than on the desktop, Intel is aggressive with ramping up the core count of its business-oriented products. So, stepping up from four to six and then to eight cores per socket turns into big gains in threaded software.

The transition to 22 nm manufacturing allows Intel to create up to 12-core Xeon E5-2600 v2 CPUs. However, the replacement for its original Xeon E5-2687W is another eight-core model. Instead of adding more processing resources, Intel increases shared L3 cache to 25 MB and bumps up clock rates. Those alterations, folded in on top of the architectural changes to Ivy Bridge, result in a minor improvement to Sandra’s integer math benchmark, and a more marked speed-up in double-precision calculations.

Of course, both dual-processor setups demonstrate a significant advantage in raw processing power compared to one Core i7-4960X.

As we know from “Intel Core i7-3770K Review: A Small Step Up For Ivy Bridge,” the company didn’t make a ton of compelling architectural changes to its IA cores. The Xeon E5-2687W v2 does enjoy the advantage of more aggressive clock rates compared to its predecessor, though AVX support across the board means all three configurations benefit.

Even in single-processor configurations, Intel’s quad-channel memory controller facilitates lots of bandwidth. The Core i7-4960X manages more than 40 GB/s at DDR3-1866. Two Xeon E5-2687W CPUs almost double that number using DDR3-1600, achieving 74 GB/s. The Xeon E5-2687W v2s increase maximum throughput almost 10%, cresting 80 GB/s.
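
To put those measurements in context, the sketch below compares them against the theoretical ceiling of two sockets, each with four 64-bit channels of DDR3-1600. It’s a rough estimate under two assumptions: that Sandra reports aggregate bandwidth across both processors, and that the rounded 74 and 80 GB/s figures quoted above are close enough for a percentage.

```python
def peak_gbs(sockets, channels, mts, bytes_per_transfer=8):
    """Theoretical aggregate DRAM bandwidth in GB/s (decimal gigabytes)."""
    return sockets * channels * bytes_per_transfer * mts / 1000


# Two sockets, four channels each, DDR3-1600 as installed in our Xeon platform
peak = peak_gbs(sockets=2, channels=4, mts=1600)

# Measured values are the rounded numbers quoted in the text
for label, measured in (("2 x Xeon E5-2687W", 74.0), ("2 x Xeon E5-2687W v2", 80.0)):
    print(f"{label}: {measured / peak:.0%} of {peak:.1f} GB/s theoretical peak")
```

By this estimate, the older Xeons reach a little over 70% of the theoretical maximum and the newer ones a little under 80%, without any change to the memory modules themselves.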

We also know that the inclusion of AES-NI in all three of these workstations means that instructions are executed as fast as they’re fed from RAM, making this a bandwidth-constrained task. As we’d expect, performance scales accordingly.

The hashing benchmark is handled by the x86 cores, so the six-core -4960X understandably manages less than half of the throughput posted by both 16-core configurations.

Given the older workstation-oriented GPU in our test system, the only data point worth looking at from 3DMark is the threaded Physics test outcome. Clearly the benchmark doesn't scale according to core count. But the newer Xeon E5-2687W v2 does appear to gain from its larger shared L3 cache and higher stock clock rates.

4. Results: Adobe CC

Today’s story forces us to consider one consequence of a growing emphasis on heterogeneous computing. As we offload parallel tasks to on-die or discrete graphics engines, there’s less for many-core CPUs to do.

Although it’s tempting to look at our results and assume that CUDA acceleration is helping normalize performance as the Quadro FX 1800 becomes a bottleneck, Nvidia’s older pro board isn’t on Adobe’s list of supported add-in cards. We double-checked and verified that there is no GPU activity during the test; it’s CPU-only.

We also know from past stories that our Premiere Pro rendering tasks do utilize many cores. It’s probable that our benchmark isn’t complex enough to fully demonstrate what two eight-core processors can do. The Paladin test we used previously was intensive, but designed for Premiere Pro CS5. Two generations later, our Hollywood sequence just isn’t the same.

The same goes for After Effects, which can be accelerated by CUDA/OpenCL-compatible cards, but doesn’t natively support our Quadro FX 1800. In the past, this test was actually bottlenecked by three QuickTime clips, which couldn’t be threaded. We replaced those with PNG sequences to address that limitation. Now we see 100% utilization, though scaling is not evident based on host processor performance.

Finally, by the time we get to Photoshop CC, OpenCL support is enabled on our Quadro FX 1800. Interestingly, though, backing the Nvidia card with more x86 cores doesn’t help improve the performance of accelerated filters. In fact, the opposite is true: both dual-CPU workstations are slower than the Core i7-based box.

The situation reverses when we execute a series of threaded filters. The two Xeon E5-2687W v2s do their job in half the time of one Core i7-4960X. Chalk this up as an application where it pays to know where to spend money on hardware. Certain filters are going to push mainstream CPUs with high clock rates. Others will favor massively parallel configurations. And a few more are optimized for OpenCL.

5. Results: Media Encoding

Rovi’s TotalCode Studio certainly doesn’t scale according to core count or cost. However, two Xeon E5-2687W v2 CPUs are quicker than last generation’s Xeon E5-2687Ws, which are in turn faster than a single Core i7-4960X. Because the gains are so small, though, you probably won’t rush to add cores if you’re encoding video with TotalCode.

Converting video clips to H.264 in HandBrake scales far better. This is particularly interesting because HandBrake employs the x264 encoder, which is really well-optimized for many-core CPUs. Beyond that, there are builds of HandBrake that support Intel’s Quick Sync technology and OpenCL (which offloads cropping and down-scaling to GPUs).

Also, we know from our early work with x265 (Next-Gen Video Encoding: x265 Tackles HEVC/H.265) that next-gen encoders are going to be very performance-hungry as they facilitate higher quality at the same bit rates or the same quality at lower bit rates compared to H.264. When quality necessitates a software encoder, expect the very fastest host processors to deliver the best experience.

Like Photoshop, Sony Vegas uses OpenCL acceleration to speed up this workload. Our Quadro FX 1800 sits around 82% utilization, while IA cores hover under 25% on the Core i7. And also like Photoshop, performance doesn’t improve on a platform with more cores. Instead, the Core i7 is fastest, while the Xeon setups essentially tie.

LAME and iTunes, both single-threaded metrics, reflect the same thing: Ivy Bridge at high clock rates is quicker than Sandy Bridge at lower frequencies. Much of this is owed to Intel’s transition from 32 to 22 nm manufacturing, facilitating more aggressive settings within the same thermal envelope.

6. Results: Rendering

Maxon’s Cinebench R15 release (based on the company's Cinema 4D product) is a bit different from past versions of the benchmark. It’s able to utilize up to 256 cores (physical or logical) to render a scene with around 2000 objects made up of more than 300,000 polygons. Maxon altered the scale significantly so that results from previous versions can’t be compared—that’s why the numbers are so much higher than Cinebench results we’ve presented in the past.

The single-core numbers reflect the difference between Intel’s Sandy and Ivy Bridge architectures. Meanwhile, the multi-core component illustrates the difference between six and 16 cores. Moreover, the Ivy Bridge-EP-based Xeon E5-2687W v2 enjoys an extra advantage due to its tuned architecture and higher operating frequency.

Our 3ds Max workload is real-world, so you don’t get the same sort of scaling that comes from a synthetic designed to extract maximum performance. With that said, we see a massive speed-up going from the single Core i7 to the dual-processor workstations. Ivy Bridge-EP is marginally faster than Sandy Bridge-EP, but that’s what we would have expected given a comparable core count and small clock rate increases. Where we’re really hoping for big gains is the efficiency measurement, where performance and power get factored together.

Or maybe it really is possible to maximize performance using a real-world workload. Our Blender test makes a clear distinction between the fastest desktop processor you can buy and Intel’s workstation-oriented Xeon E5s in dual-processor configs.

Again, comparing the Xeon E5-2687W and -2687W v2 reveals relatively minor performance differences, as expected. Power is where these two should stand apart.

e-on Software’s Vue 2014 gives us another stark comparison between the very best you can do on the desktop-oriented LGA 2011 platform and what becomes possible as you step into the realm of Xeon-powered workstations. Our custom landscape test takes more than 22 minutes to render on the Core i7. Stepping up to a pair of Xeon E5-2687Ws cuts that under 10 minutes. And the newer -2687W v2s fall under nine.

Our playblast animation in Maya 2014 confounds us. Our best theory is that the same GPU utilization issue that keeps OpenCL-accelerated titles like Vegas and Photoshop from favoring the dual-CPU workstations is in effect here as well, giving Intel’s Core i7 the lead.

7. Results: Productivity

Compiling Google’s Chrome Web browser in Visual Studio 2010 shows off another strength of our dual-CPU machines. Not all development projects are going to benefit as profoundly; however, in this particular test, Intel’s Core i7-4960X needs more than 15 minutes to finish the job. Last generation’s Xeon E5-2687W wraps up in less than 10 minutes. Two Intel Xeon E5-2687W v2s get back to idle in under nine minutes.

Based on the STARS Euler3D computational fluid dynamics production code, Euler3D’s workload is described as follows:

“The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, a taper ratio of 0.66, and a 45 degree quarter-chord sweep angle. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes…The benchmark CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes. The benchmark executable advances the Mach 0.50 AGARD flow solution. Our benchmark score is reported as a CFD cycle frequency in Hertz.”

Because each Xeon E5-2687W v2 sports eight cores, the Ivy Bridge-EP-based setup is easily more than twice as fast as a six-core Core i7-4960X. The current-gen Xeons are also quite a bit quicker than their predecessors thanks to higher clock rates.
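
As a sanity check on that “more than twice as fast” claim, here’s a crude Python estimate of each configuration’s aggregate core-GHz at base clocks, taken from our test hardware list. It’s only a back-of-the-envelope bound: it assumes perfect scaling across cores and ignores Turbo Boost, per-clock differences between Sandy Bridge and Ivy Bridge, cache, and memory effects.

```python
def core_ghz(sockets, cores_per_socket, base_ghz):
    """Aggregate base-clock throughput proxy: total cores x base frequency."""
    return sockets * cores_per_socket * base_ghz


i7 = core_ghz(1, 6, 3.6)  # Core i7-4960X

for label, total in (
    ("2 x Xeon E5-2687W", core_ghz(2, 8, 3.1)),
    ("2 x Xeon E5-2687W v2", core_ghz(2, 8, 3.4)),
):
    print(f"{label}: {total / i7:.2f}x the Core i7-4960X's core-GHz")
```

The older pair works out to roughly 2.3x and the newer pair to roughly 2.5x, so a result better than 2x is about what the raw resources would predict for a well-threaded solver.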

Software developer ABBYY puts a lot of effort into optimizing for threading, and the latest version of FineReader continues utilizing all of the host processing resources we throw at it, so long as each core gets 512 MB of RAM. You might not consider optical character recognition to be a compute-intensive operation, but the Xeon E5-2687W v2s finish our benchmark workload in half the time of a flagship Core i7.

In contrast, printing a PowerPoint presentation to PDF is a decidedly single-threaded operation that doesn’t benefit from many cores. But because of Intel’s shift to 22 nm manufacturing and its effect on power, the company can set its Xeon E5-2687W v2 to run at 4 GHz in situations where only one core is active. As a result, the new Xeon is just about as fast as the six-core -4960X also based on the Ivy Bridge architecture, and almost 10% faster than the original Xeon E5-2687W.

8. Results: Compression

I typically think of 7-Zip as our best-threaded file compression benchmark. However, the fact that two Xeon E5-2687Ws finish first suggests that something else is limiting performance. All else being equal, we’d expect the Ivy Bridge-based version to win: it runs at higher clock rates, has more cache, and offers additional memory bandwidth.

In any case, the dual-processor workstations are at least notably quicker than one Core i7-4960X.

WinRAR is better known for favoring architectural tweaks that improve efficiency per clock cycle. Not surprisingly, the two Ivy Bridge-based CPUs finish in the lead, ahead of two Sandy Bridge-EP-based processors.

Our WinZip chart includes three separate benchmarks, and the very latest from Intel makes them difficult to interpret.

Let’s start with the longest bar, corresponding to the EZ test. This represents maximum compression. Our Core i7 and dual Sandy Bridge-EP-based Xeons score similarly. Meanwhile, the -2687W v2 crushes this test. We actually saw the same thing in “Intel's 12-Core Xeon With 30 MB Of L3: The New Mac Pro's CPU?”, and the benchmark is consistent.

Then there’s the general CPU benchmark, which is well-threaded in WinZip 18.0, and appears to reward both dual-processor workstations compared to the Core i7.

Finally, we have the OpenCL-accelerated test, which does run faster on the Core i7, but slows down on the dual-socket systems versus CPU-only processing. Even those slower results remain faster than the Core i7’s finish, though. Here’s my stab at an explanation: WinZip only offloads files larger than 8 MB to the graphics card for compression. Because our workload is a blend of file sizes, the OpenCL-accelerated files slow down the 16-core setups. Meanwhile the six-core -4960X does enjoy some speed-up from Nvidia’s Quadro FX 1800. Ultimately, though, the well-threaded compression engine still runs everything else through the Xeons faster.

9. Power Consumption And Efficiency

Our benchmark suite is automated so that tests run in the same order each time, with the same delays between commands. There is even a period of idle time injected at the end to capture the reality that even high-end workstations aren’t under load 24x7. At the end of that idle period, the workstation shuts itself down automatically.

As that’s happening, we log power consumption. The above chart represents power use through the run. We also get a sense for how long each configuration takes to finish the batch file and turn itself off, given the length of each line. Right away it’s clear that two Xeon E5-2687W v2s complete our battery of benchmarks faster than first-gen -2687Ws, and they do it using less energy.

Averaging the data points together shows that, indeed, the newer Xeons use 20 W less through our suite. That’s pretty remarkable considering:

  1. The new Xeons operate at higher clock rates under load and in lightly-threaded apps.
  2. The new Xeons have 5 MB more of shared L3 cache each.
  3. The average results have a ton of single-threaded work and idle time factored in; considering threaded workloads only would widen the difference.

Of course, the averages themselves don’t take into account how quickly a given platform got its job done, dropped to idle, and stopped using power. For that, we need to create a unit of energy by multiplying wattage by the time it takes to finish our workload.
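
The difference between average power and energy is easy to sketch in Python. The snippet below integrates evenly spaced power samples into watt-hours. The sample values are hypothetical, chosen only to show how a box with higher average draw can still use less energy if it finishes sooner. They are not our logged data, and the one-second sampling interval is an assumption.

```python
def energy_wh(samples_w, interval_s):
    """Integrate evenly spaced power samples (watts) into watt-hours."""
    return sum(samples_w) * interval_s / 3600.0


def average_w(samples_w):
    """Mean power draw across the run, in watts."""
    return sum(samples_w) / len(samples_w)


# Hypothetical logs, one sample per second (illustrative values only)
slow_box = [200.0] * 7200  # lower draw, but takes two hours to finish
fast_box = [250.0] * 3600  # higher draw, done in one hour

for label, log in (("slow", slow_box), ("fast", fast_box)):
    print(f"{label}: {average_w(log):.0f} W average, {energy_wh(log, 1):.0f} Wh total")
```

In this made-up case the faster machine averages 50 W more but consumes 250 Wh instead of 400 Wh, which is the effect an energy comparison is meant to capture.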

Those single-threaded tasks and that idle time give Intel’s Core i7 a big advantage when it comes to average power consumption. However, because the two Xeon E5-2687W v2s are so much faster, they gain quite a bit of ground when we factor performance into the equation.

Compared to first-gen E5s, the new -2687W v2s use less power and are faster. That’s a recipe for an efficiency sweep, reflected in a 42 Wh advantage in our benchmark suite.

10. Ivy Bridge-EP: Faster And More Efficient On The Same Platform

It’s uncommon for professionals to pull one-generation-old CPUs out of their workstations and upgrade, but that’s technically what Intel’s Xeon E5-2600 v2 line-up lets you do. The company successfully shifted from 32 to 22 nm manufacturing, simultaneously enabling more complex processors (with up to 12 physical cores and 30 MB of shared L3 cache) that fit within previously-established thermal envelopes and drop into existing LGA 2011-equipped motherboards, after a firmware update, of course.

Beyond the increases to core count, cache, and clock rates, the Xeon E5-2600 v2s also center on the Ivy Bridge architecture. So, there is a handful of tweaks that improve per-cycle performance compared to Sandy Bridge as well. Finally, certain SKUs feature more aggressive data rates, pushing memory support to DDR3-1866 in some cases.

None of the workloads we ran need that much bandwidth. However, our benchmarks have no trouble illustrating where the Xeon E5-2687W v2 is better than its predecessor. Higher Turbo Boost frequencies mean the second-gen model wins in single-threaded tests. Even the clock rates in fully-loaded situations are an improvement, so you get more performance there, too. And regardless of the benchmark, power consumption is lower on the system with Ivy Bridge-EP-based CPUs, despite the consistent 150 W TDP.

Sure, you could save a ton of money and use even less energy by going with Intel’s Core i7-4960X. And in some cases, that actually makes sense. An increasing number of applications are being optimized for heterogeneous computing, which might exploit a highly parallelized graphics processor for massive performance gains in specific tasks. In those titles, throwing more money at a faster GPU will yield bigger gains than a second CPU. Then again, we just saw several examples of two Xeon E5s cutting the processing time of compile jobs, OCR workloads, and renders in half (or better).

I haven’t been very nice to Intel’s desktop team for a couple of successive generations. The step from Sandy Bridge to Ivy Bridge was disappointing for enthusiasts. Similarly, Haswell didn’t give us much more to be excited about. Same four cores, same 8 MB of shared L3, same 16 lanes of PCIe, and minor speed-ups attributable to architectural tweaks. Ho hum.

But in the Xeon world, Intel takes the thermal headroom freed up by its advanced manufacturing and more thoroughly utilizes it, leaving customers to choose whether they need more cores, higher clocks, or simply comparable performance at reduced power consumption. That’s the kind of innovation enthusiasts want to see more of.