After almost 14 years of writing about technology, I think it’s safe to say that I’ll perpetually enjoy getting my hands on the latest gear, testing it this way and that, and conveying my own impressions to folks who share my passion.
Although gaming-oriented components garner the most attention on this site by far, enthusiasts can’t help but get excited about more IT-oriented hardware, too. You might have a Phenom II X6 in your gaming box, but there’s a fair chance you also joined millions of readers who were curious about the water-cooled quad-Opteron rig that Puget Systems built in What Does A $16 000+ PC Look Like, Anyway?
Today’s story takes us down a similar path. We already evaluated the entire family of Sandy Bridge-E based Core i7-3000-series CPUs in Intel Core i7-3960X Review: Sandy Bridge-E And X79 Express and Intel Core i7-3930K And Core i7-3820: Sandy Bridge-E, Cheaper. We know that Intel neutered all of those desktop-oriented processors to some degree—whether to hit certain power targets at client-friendly clock rates or more easily differentiate its server parts, we may never know for sure.
But now we have access to the full Monty, branded as Xeon E5 for single-, dual-, and quad-socket servers.
Meet Sandy Bridge-EP
Intel uses the same piece of silicon to enable its Xeon E5s and Core i7-3000-series CPUs. As we know, Core i7s top out with six cores and 15 MB of shared L3. But the die actually hosts eight cores and 20 MB of last-level cache.
The modularity of this design is enabled by the same ring bus concept first introduced in Intel’s Second-Gen Core CPUs: The Sandy Bridge Review more than a year ago (more accurately, Xeon 7500s were Intel's first CPUs with ring buses, but we never tested them). You have cores, PCI Express control, QPI links, and a quad-channel memory controller all with stops around the ring. Because each core is tied to a 2.5 MB slice of L3 cache, it’s relatively easy to manipulate the die’s specifications to create a large number of derivative products with performance that scales up and down in a predictable way.
For a product like Core i7-3960X, Intel simply snipped two cores and their respective 2.5 MB cache slices. But the L3 can be tweaked even more granularly than that. A few Xeon E5 models present 2 MB/core, demonstrating granularity down to 512 KB chunks.
Today we’re able to test Sandy Bridge-EP (for Efficient Performance) in its most potent form: Xeon E5-2687W—a 150 W workstation-only processor boasting all eight of the die’s physical cores, its full 20 MB cache, twin 8 GT/s QPI links, 40 lanes of on-die third-gen PCIe, and a quad-channel memory controller capable of DDR3-1600. Manufactured at 32 nm, this highly-integrated SoC is composed of 2.27 billion transistors packed onto a portly 434 mm² die.

A maximum Turbo Boost frequency of 3.8 GHz makes the Xeon E5-2687W a little slower than Core i7-3960X, which hits 3.9 GHz, in lightly-threaded applications. However, despite a base frequency of 3.1 GHz versus the -3960X’s 3.3 GHz, the Xeon’s two extra cores give it the advantage in more taxing workloads.
Although the Xeon includes more cache, it maintains the same one-core-to-2.5 MB ratio as the Core i7, and indeed most of the other Xeon E5 models.

The other notable difference between single-socket Core i7s/Xeon E5-1600s and Intel’s multi-socket platforms is the exposure of QPI. When Intel replaced the Gulftown-based processors with Sandy Bridge-E, it simultaneously shifted from three-piece platforms (CPU, northbridge, and southbridge) to a two-chip layout (CPU, platform controller hub), eliminating the I/O hub responsible for hosting PCI Express connectivity. The link between processor and northbridge, previously facilitated by QPI, was severed. With PCIe built right into Sandy Bridge-E, the southbridge component could be hitched right up to the CPU through a PCI Express-like Direct Media Interface. Thus, QPI is completely inactive on Sandy Bridge-E.
Multi-socket systems still need it for inter-processor communication, though. Sandy Bridge-EP CPUs feature two QPI links. In 2S configurations, they’re both used to shuttle data back and forth between sockets. With four processors in play, they create more of a circle, connecting each chip to the right and left. Intel tinkers with the QPI data rate as a differentiating feature, but whereas the Xeon 5600s topped out at 6.4 GT/s, yielding 25.6 GB/s per link, the highest-end Xeon E5s host 8 GT/s links, pushing bandwidth to 32 GB/s per link. Obviously, in a 2S workstation like ours, 64 GB/s of aggregate QPI bandwidth is super-duper overkill. But we’re happy to know that the days of front-side bus-based bottlenecks are over.
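The per-link figures above follow from QPI’s two-byte payload per transfer, in each direction. A quick sketch of the arithmetic (the function name is just for illustration):

```python
def qpi_link_bandwidth(gt_per_s):
    """Aggregate QPI link bandwidth in GB/s.

    Each QPI link carries a 2-byte payload per transfer, in each of
    two directions: GT/s x 2 bytes x 2 directions.
    """
    return gt_per_s * 2 * 2

print(qpi_link_bandwidth(6.4))  # Xeon 5600-class link: 25.6 GB/s
print(qpi_link_bandwidth(8.0))  # top-end Xeon E5 link: 32.0 GB/s
```

Two 8 GT/s links per CPU is where the 64 GB/s aggregate figure for our 2S workstation comes from.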
Aside from core count, last-level cache, and QPI, Sandy Bridge-EP is architecturally similar to Sandy Bridge-E. AVX support, AES-NI, second-gen Turbo Boost, Hyper-Threading—all of those familiar capabilities are included.
The only other difference of note is that Sandy Bridge-EP’s quad-channel memory controller supports mirroring, single device data correction, and lockstep. All three were available from Xeon 5500/5600 as well, but the whole triple-channel memory controller arrangement necessitated compromises. Now, you can mirror two channels and recover from a failure in each. Hooray for nice, round numbers.
An SoC architected for scalability paves the way for a very diverse portfolio of processors. There are actually four distinct families of Xeon E5 CPUs, each packaged up for a slightly different purpose.
Intel’s previous naming scheme allowed very little room to distinguish a large line-up, so it was forced to revamp its nomenclature. Xeon E7s are already available, as are the entry-level Xeon E3s. Xeon E5 sits in the middle, with a fair bit of overlap on both ends. Now, from the bottom to top, we should see some degree of consistency used in assigning model numbers. Let’s break it down:
First, you have the brand, Xeon. Easy enough. Then there’s the product line: E3, E5, or E7. Again, we get the general sense that E3 is intended for entry-level single-socket workstations and servers, while E5 now spans a broader range from single- to quad-socket systems. The E7s cover two-, four-, and eight-socket servers.
The first digit you encounter specifies wayness, or the maximum number of CPUs in a node (that’s 1, 2, 4, or 8).
The second is indicative of socket type. Somewhat confusingly, Intel plans to use the numbers 2, 4, 6, and 8 moving forward. However, the actual interface corresponding to each digit may change. At least for 2012, we end up with the following associations:
2 = LGA 1155
4 = LGA 1356
6 = LGA 2011
8 = LGA 1567
The last two numbers are SKU designators like 10, 20, 30, and so on. Although there’s no formula to tell you why one chip might be a 50 and another a 70, Intel says it uses a combination of core count, cache size, clock rate, QPI data rates, and so on to classify each chip.
Certain models might also receive a single-letter suffix. For example, a model ending in L is meant as a low-power part. The CPUs we’re testing today are flagged as workstation models with a W suffix.
Finally, in the future, Intel plans to use a version number after the model name like v2 or v3 to identify generational progression. Ivy Bridge-based CPUs will be the first to employ those.
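Putting the whole scheme together, a toy decoder (using the 2012 socket associations listed above; Intel warns these digits can map to different interfaces in later generations) shows how a model number like E5-2687W breaks down:

```python
# Toy decoder for Intel's 2012-era Xeon model numbers, following the
# nomenclature described above. The socket map reflects 2012 parts only.
SOCKETS = {"2": "LGA 1155", "4": "LGA 1356", "6": "LGA 2011", "8": "LGA 1567"}

def decode_xeon(name):
    line, number = name.split("-")           # e.g. "E5", "2687W"
    suffix = number[4:] if len(number) > 4 else ""
    return {
        "product line": line,                # E3, E5, or E7
        "max sockets": int(number[0]),       # wayness: 1, 2, 4, or 8
        "socket": SOCKETS[number[1]],
        "sku": number[2:4],                  # relative positioning within the line
        "suffix": suffix,                    # e.g. W = workstation, L = low power
    }

print(decode_xeon("E5-2687W"))
```

Running it on our review chip yields a two-socket, LGA 2011, SKU-87 workstation part, exactly as advertised.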
Xeon E5-1600
Based on the aforementioned information, we know that a Xeon E5-1600 processor is designed for single-socket LGA 2011-based configurations. And would it surprise you to learn that the three available models mirror the trio of desktop Core i7-3000s we’ve already reviewed? Specification-wise, they match the Core i7-3960X, -3930K, and -3820 exactly, adding ECC memory support as a principal differentiator. The Xeons also support up to 384 GB of memory, according to Intel, along with vPro technology.
Xeon E5-2600
Now we’re talking about hardware you can’t already get on the desktop side, since the -2600s support two-socket arrangements. The largest family of Xeon E5s, the -2600s are 17-strong, ranging from an 80 W dual-core model to a workstation-specific eight-core 150 W flagship. In between, you’ll find four- and six-core models at 80 and 95 W. A pair of low-power SKUs even dips down to 60 W.
| Model | Cores/Threads | Cache | TDP | QPI | Memory Support |
|---|---|---|---|---|---|
| Advanced | |||||
| Xeon E5-2690 | 8/16 | 20 MB | 135 W | 8 GT/s | DDR3-1600 |
| Xeon E5-2680 | 8/16 | 20 MB | 130 W | 8 GT/s | DDR3-1600 |
| Xeon E5-2670 | 8/16 | 20 MB | 115 W | 8 GT/s | DDR3-1600 |
| Xeon E5-2665 | 8/16 | 20 MB | 115 W | 8 GT/s | DDR3-1600 |
| Xeon E5-2660 | 8/16 | 20 MB | 95 W | 8 GT/s | DDR3-1600 |
| Xeon E5-2650 | 8/16 | 20 MB | 95 W | 8 GT/s | DDR3-1600 |
| Standard | |||||
| Xeon E5-2640 | 6/12 | 15 MB | 95 W | 7.2 GT/s | DDR3-1333 |
| Xeon E5-2630 | 6/12 | 15 MB | 95 W | 7.2 GT/s | DDR3-1333 |
| Xeon E5-2620 | 6/12 | 15 MB | 95 W | 7.2 GT/s | DDR3-1333 |
| Basic | |||||
| Xeon E5-2609 | 4/4 | 10 MB | 80 W | 6.4 GT/s | DDR3-1066 |
| Xeon E5-2603 | 4/4 | 10 MB | 80 W | 6.4 GT/s | DDR3-1066 |
| Additional LGA 2011 SKUs | |||||
| Xeon E5-2687W | 8/16 | 20 MB | 150 W | 8 GT/s | DDR3-1600 |
| Xeon E5-2667 | 6/12 | 15 MB | 130 W | 7.2 GT/s | DDR3-1333 |
| Xeon E5-2643 | 4/8 | 10 MB | 130 W | 6.4 GT/s | DDR3-1066 |
| Xeon E5-2637 | 2/4 | 5 MB | 80 W | ||
| Low Power | |||||
| Xeon E5-2650L | 8/16 | 20 MB | 70 W | 8 GT/s | DDR3-1600 |
| Xeon E5-2630L | 6/12 | 15 MB | 60 W | 7.2 GT/s | DDR3-1333 |
We got our hands on a pair of Xeon E5-2687Ws, the aforementioned 150 W parts set aside explicitly for workstation configs. Armed with eight cores, a 3.1 GHz base clock rate (3.8 GHz at its highest Turbo Boost frequency), 20 MB of L3 cache, and 8 GT/s QPI links, this is pretty much top of the line, so long as you’re able to keep it cool.
Xeon E5-4600
Past-generation Xeon 5500 and 5600s were limited to dual-socket systems. So, it might seem strange that there’s an entire line of Xeon E5s built to drop into glueless quad-socket platforms. But as we already saw with the Xeon E7s, Intel doesn’t seem to be trying to segment its server CPUs based on processor count anymore. As a result, we have the Xeon E5-4600 series.
Spanning four- to eight-core models with two QPI links each, the E5-4600s are less expensive than the E7s, which employ four QPI links and up to 10 cores per CPU. On a sliding scale, Xeon E7s have an upper hand in enterprise performance, memory expandability, and RAS functionality, while the E5s rule in performance/watt and density-oriented HPC environments.
| Model | Cores/Threads | Cache | TDP | QPI | Memory Support |
|---|---|---|---|---|---|
| Advanced | |||||
| Xeon E5-4650 | 8/16 | 20 MB | 130 W | 8 GT/s | DDR3-1600 |
| Xeon E5-4640 | 8/16 | 20 MB | 95 W | 8 GT/s | DDR3-1600 |
| Standard | |||||
| Xeon E5-4620 | 8/16 | 16 MB | 95 W | 7.2 GT/s | DDR3-1333 |
| Xeon E5-4610 | 6/12 | 15 MB | 95 W | 7.2 GT/s | DDR3-1333 |
| Basic | |||||
| Xeon E5-4607 | 6/12 | 12 MB | 95 W | 6.4 GT/s | DDR3-1066 |
| Xeon E5-4603 | 4/8 | 10 MB | 95 W | 6.4 GT/s | DDR3-1066 |
| Low Power | |||||
| Xeon E5-4650L | 8/16 | 20 MB | 115 W | 8 GT/s | DDR3-1600 |
| Frequency-Optimized | |||||
| Xeon E5-4617 | 6/12 | 15 MB | 130 W | 7.2 GT/s | DDR3-1600 |
You’ll find eight -4600 SKUs sporting between four and eight cores, and with TDPs that range from 95 to 130 W.
Xeon E5-2400
All of the Xeon E5-x600 processors drop into the LGA 2011 interface with which we’re already familiar. But Intel is introducing another socket for premium 1S and entry-level 2S systems called LGA 1356. Although it’s the true successor to LGA 1366, the 1356-pin socket isn’t compatible (likely as a result of power changes and the on-die PCI Express control). Like its precursor, though, LGA 1356 accommodates processors with three memory channels and a single QPI link connecting CPUs in a 2S configuration. They also offer fewer third-gen PCI Express lanes: 24 rather than 40.
A second new interface is less of a big deal in the server space than it would be for desktop users, since the enterprise guys don’t spend a lot of time popping new CPUs into rack-mounted machines. As a result, the Xeon E5-2400s are simply Intel’s way to get more mileage out of its architecture and bridge the gap between its single-socket E5s and the more performance-oriented E5-x600s.
| Feature | Xeon E5-2600 Family | Xeon E5-2400 Family |
|---|---|---|
| Processor Interface | LGA 2011 | LGA 1356 |
| Memory Channels | 4 Per CPU | 3 Per CPU |
| Max DIMM Slots | 24 | 12 |
| Max Memory | 768 GB | 384 GB |
| PCIe Lanes/Controllers | 80 / 20 | 48 / 12 |
| Thermal Targets | 150, 135, 130, 115, 95, 80, 70, 60 W | 95, 80, 70, 60 W |
| Usage | Server/Workstation | Server |
Keeping Them Cool
Intel’s Core i7-3000 processors are its first desktop models to ship without any bundled cooling, leaving power users to pick their own solution (fortunately, we have you covered there with Big Air: 14 LGA 2011-Compatible Coolers For Core i7-3000, Reviewed). That was a controversial decision; after all, most enthusiasts use pedestal enclosures with fairly similar dimensions, so a single bundled cooler would have served the vast majority of builds.

The server and workstation spaces aren’t as general, though. Some of these chips might find their way into freestanding small business boxes, while others go into narrow 1U chassis. It’s a little more understandable that you buy cooling for these Xeon E5s separately, based on your application.

Three heat sinks cover all 37 of the processors being introduced. Two of them, STS200P and STS200PNRW, are 25.5 mm tall for rack-mounted environments. The former is a square 91.5x91.5 mm, while the latter is 70 mm wide and 106 mm long to accommodate the narrower sockets typical of HPC-oriented blades. Both are passive and rated for TDPs of up to 130 W. The third cooler, STS200C, includes a removable fan and is able to cope with thermal ceilings of up to 150 W.
Thus far, our only experience with Intel’s platform controller hub code-named Patsburg is X79 Express. However, the same piece of silicon is also used as a foundation for the C600 chipset family.
We’ve long known that X79 didn’t expose all of the core logic’s integrated functionality. It comes close, but there’s an entire Storage Controller Unit that goes unused. Actually, that’s not entirely true. We recently saw ECS’ X79R-AX enable four SAS ports in Seven $260-$320 X79 Express Motherboards, Reviewed.
The PCH that ECS employs corresponds to the –B variant of C600. Otherwise identical to X79 (including the same 14 USB 2.0 ports, an integrated gigabit Ethernet MAC, eight lanes of second-gen PCIe, and HD Audio), the –B model officially adds four 3 Gb/s SAS ports to the four 3 Gb/s and two 6 Gb/s SATA connectors. Intel’s Rapid Storage Technology enterprise driver facilitates RAID 0, 1, 10, and, with the addition of a BIOS update, RAID 5 support with hardware-based XOR across the SATA ports. SAS is limited to RAID 0, 1, and 10, though you can add an upgrade ROM to get RAID 5 as well.
| Intel C600 Chipset | -A | -B | -D | -T |
|---|---|---|---|---|
| PCH-Based SATA 3Gb/s Ports | 4 | 4 | 4 | 4 |
| PCH-Based SATA 6Gb/s Ports | 2 | 2 | 2 | 2 |
| SCU-Based Ports | 4 x SATA | 4 x SAS | 8 x SAS | 8 x SAS |
| RSTe SATA RAID Support | RAID 0/1/10/5 | RAID 0/1/10/5 | RAID 0/1/10/5 | RAID 0/1/10/5 |
| RSTe SAS RAID Support | No | RAID 0/1/10 | RAID 0/1/10 | RAID 0/1/10 |
| RSTe SAS RAID 5 Support | No | No | No | Yes |
| Silicon-Based RAID 5 XOR | Yes | Yes | Yes | Yes |
| PCI Express 3.0 x4 Uplink | No | No | Yes | Yes |
Stepping up to the –D SKU doubles SAS connectivity to eight ports. Add that to the PCH’s native SATA and you end up with two 6 Gb/s and twelve 3 Gb/s ports. Now, consider that C600 connects to one Xeon E5 processor via DMI 2.0—a four-lane PCIe 2.0-like link with 20 Gb/s of bidirectional throughput. That's a bottleneck just waiting to happen. So, Intel connects the PCH's SCU directly to four PCIe lanes hijacked from one of the processors, alleviating traffic from the storage controller.
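To put that bottleneck in perspective, here’s a rough tally of the –D model’s storage ports versus DMI 2.0 alone. These are raw line rates, ignoring 8b/10b encoding and protocol overhead, and the variable names are just for illustration:

```python
# Rough sanity check on why the -D and -T chipsets get a dedicated
# PCIe 3.0 x4 uplink for the Storage Controller Unit.
dmi_gbps = 20                     # DMI 2.0: four PCIe 2.0-class lanes
storage_gbps = 12 * 3 + 2 * 6     # twelve 3 Gb/s plus two 6 Gb/s ports

print(storage_gbps)               # 48 Gb/s of raw storage bandwidth
print(storage_gbps / dmi_gbps)    # 2.4x what DMI alone could carry
```

Even granting that mechanical disks rarely saturate their links, a fully-populated SCU could clearly overwhelm DMI on its own.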
The flagship –T version is functionally identical (including the eight SAS ports and four-lane uplink), only it includes RAID 5 support for the SATA and SAS ports, too. It’s not clear how much of a premium stepping up through the C600 hierarchy adds to Xeon E5-ready motherboards. However, if you were planning on buying an add-in HBA or RAID controller anyway, the option to get much of that functionality on-board is certainly convenient.
If you don’t need any of that fancy stuff, there’s a baseline –A model with four SATA 3Gb/s ports and six SATA 6Gb/s ports, four of which are tied to the SCU. It still supports RAID 0, 1, 10, and 5, and it includes hardware-based XOR, too. There’s just no SAS connectivity.
Intel sent us one of its P4000 enclosures and a W2600CR motherboard to test with. The roomy enclosure was much more acoustically-friendly than some of the Intel workstations we've tested in the past.
Crucial's team also stepped up to help us with this story, sending over 96 GB of very hard-to-find registered DDR3-1600 memory to populate our Xeon E5 and 5500/5600 platforms. DDR3-1333 is far more common, but it's unable to push our Xeon E5s to their highest supported data rate with one module installed per channel. Although registered modules are inherently slower than unbuffered memory, we actually realized better memory bandwidth using Crucial's hardware than a 32 GB quad-channel desktop kit that stood in while we set these workstations up.
| Test Hardware | |
|---|---|
| Processors | 2 x Intel Xeon E5-2687W (Sandy Bridge-EP) 3.1 GHz, Eight Cores, LGA 2011, 8 GT/s QPI, 20 MB Shared L3, Hyper-Threading enabled, Power-savings enabled |
| 2 x Intel Xeon X5680 (Westmere-EP) 3.33 GHz, Six Cores, LGA 1366, 6.4 GT/s QPI, 12 MB Shared L3, Hyper-Threading enabled, Power-savings enabled | |
| 2 x Intel Xeon W5580 (Nehalem-EP) 3.2 GHz, Four Cores, LGA 1366, 6.4 GT/s QPI, 8 MB Shared L3, Hyper-Threading enabled, Power-savings enabled | |
| 1 x Intel Core i7-3960X (Sandy Bridge-E) 3.3 GHz, Six Cores, LGA 2011, 15 MB Shared L3, Hyper-Threading enabled, Power-savings enabled | |
| Motherboards | Intel W2600CR (LGA 2011) Intel C600, BIOS 50;53;28;112 |
| Intel S5520SCR (LGA 1366) Intel 5520/ICH10R, BIOS 50;53;28;112 | |
| Gigabyte X79-UD5 (LGA 2011) Intel X79 Express, BIOS F9 | |
| Memory | Crucial 64 GB (8 x 8 GB) DDR3-1600 Registered ECC, MT36KSF1G72PZ-1G6M1HF |
| G.Skill 32 GB (4 x 8 GB) DDR3-1600 Unbuffered, F3-12800CL9Q2-32GBZL | |
| Hard Drive | Intel SSDSA2BZ200G3 200 GB SATA 3 Gb/s (SSD 710) |
| Graphics | AMD FirePro V5900 |
| Power Supply | Intel DPS-750XB A 750 W |
| Chicony CPB09-003A 1000 W | |
| System Software And Drivers | |
| Operating System | Windows 7 Ultimate 64-bit |
| DirectX | DirectX 11 |
| Graphics Driver | FirePro Driver 8.911.3.1 |

Although we’ve seen AMD’s share of the workstation CPU market grow to nearly five percent (in 2006, according to Jon Peddie Research), it’s now essentially zero. We’ve repeatedly invited AMD to participate in our workstation-oriented coverage, but it concedes that it’s no longer a player in this space.
Unlike the last time we looked at a pair of Xeon processors, AMD does have a suitable workstation chipset available in the SR5690, so we’d still very much like to see the company get more involved in courting professional customers (especially since it has all of those FirePro cards to sell them...). For now, we have four different configurations spanning three generations of Intel hardware.
| Benchmarks and Settings | |
|---|---|
| Audio/Video Encoding | |
| MainConcept 2.2 | Version: 2.2.0.5440 Video: MPEG-2 to H.264, MainConcept H.264/AVC Codec, 28 sec HDTV 1920x1080 (MPEG-2) Audio: MPEG-2 (44.1 kHz, Two-Channel, 16-Bit, 224 Kb/s) Codec: H.264 Pro, Mode: PAL 50i (25 FPS), Profile: H.264 BD HDMV |
| HandBrake CLI | Version: 0.9.5 Video: Big Buck Bunny (720x480, 23.972 frames) Five Minutes Audio: Dolby Digital, 48 000 Hz, Six-Channel, English to Video: AVC1 Audio1: AC3 Audio2: AAC (High Profile) |
| Lame MP3 | Version: 3.98.3 Audio CD "Terminator II SE", 53 min, convert WAV to MP3 audio format, Command: -b 160 --nores (160 Kb/s) |
| Applications | |
| Adobe After Effects | Version: CS5.5 Tom's Hardware Workload, SD project with three picture-in-picture frames, source video at 720p, Render Multiple Frames Simultaneously |
| Adobe Photoshop | Version: CS5 Tom's Hardware Workload, Radial Blur, Shape Blur, Median, Polar Coordinates filters |
| Adobe Premiere Pro | Version: CS5.5 Paladin Workload, Maximum Render Quality, H.264 Blu-ray profile |
| e-on Software Vue 8 PLE | 1920x1080 landscape render, Global Illumination enabled |
| SolidWorks 2010 | PhotoView 360 Render 01-Lighter Explode.SLDASM (SolidMuse.com) Image Output Resolution: 1920x1080, Render: Preview Quality “Good”, Final Render Quality “Best” |
| Euler3D | CFD simulation over NACA 445.6 aeroelastic test wing at Mach .5 |
| 3ds Max 2012 | Version: 10 x64 Rendering Space Flyby Mentalray (SPECapc_3dsmax9), Frame: 248, Resolution: 1440 x 1080 |
| Blender | Version: 2.62 Syntax blender -b thg.blend -f 1, Resolution: 1920x1080, Anti-Aliasing: 8x, Render: THG.blend frame 1, Cycles renderer and internal tile renderer (9x9) |
| Visual Studio 2010 | Compile Chrome project (1/31/2012) with devenv.com /build Release |
| ABBYY FineReader 10 | Version: 10 Professional Build (10.0.102.82) Read PDF, save to Doc, Source:Political Economy (J. Broadhurst 1842) 111 Pages |
| 7-Zip | Version 9.22 beta LZMA2, Syntax "a -t7z -r -m0=LZMA2 -mx=5", Benchmark: 2010-THG-Workload |
| WinRAR | Version: 4.11 RAR, Syntax "winrar a -r -m3", Benchmark: 2010-THG-Workload |
| WinZip | Version: 16.0 Pro WinZip CLI, Benchmark: 2010-THG-Workload |
| Synthetic Benchmarks and Settings | |
| SiSoftware Sandra 2012 SP2 | CPU Test=CPU Arithmetic/Multimedia, Memory Test=Bandwidth Benchmark, Cryptography, Cache Performance |
| Cinebench 11.5 | CPU Test, Built-in benchmark |

The theoretical gains moving from two prior-generation Xeon 5600s to a pair of Xeon E5s are impressive, just as the shift from Xeon 5500 to Xeon 5600 was.
A single Core i7-3960X does extremely well compared to a pair of Xeon W5580s. However, the Xeon E5-2687Ws, based on the same Sandy Bridge architecture, benefit from an additional two cores each.

Sandra 2012’s multimedia suite similarly shows the Xeon E5s dominating. We even turned AVX instructions off to make the results more comparable. Applications optimized for the x86 extensions enjoy even greater throughput.
I didn’t bother running standalone AVX numbers this time around because the core architecture we’re dealing with here is identical to the desktop implementation. If you’d like a comparison of Intel’s AVX implementation compared to AMD’s, check out this page in AMD Bulldozer Review: FX-8150 Gets Tested, where Cakewalk’s CTO Noel Borthwick gave us access to AVX-optimized routines from Sonar X1 for testing.

Three of the CPUs in this test should support AES-NI. As I discovered when I wrote Intel Xeon 5600-Series: Can Your PC Use 24 Processors?, the company’s Xeon 5600 engineering samples didn’t yet support the feature, though. As a result, only the Core i7-3960X and Xeon E5s reflect acceleration.
Why the huge performance gap? Well, we have two processors cranking on cryptography versus one, for starters. What might you expect to see from a pair of retail Xeon 5600s in the same test? Lower performance than the E5s, almost certainly. A hardware-based feature like AES-NI is incredibly easy to execute, and we know from tests that I ran in Intel Core i7-3960X Review: Sandy Bridge-E And X79 Express that memory bandwidth is actually the bottleneck in measures of AES256 performance. Thus, a quad-channel memory controller with support for DDR3-1600 has an inherent advantage over a triple-channel controller limited to DDR3-1333.

And here’s a perfect illustration. Although registered DDR3-1600 modules are hard to come by, as mentioned on the previous page, Crucial sent over 64 GB (8 x 8 GB) of PC-12800 memory for our E5-based workstation, enabling close to two times the effective bandwidth on Xeon E5 compared to the Xeon 5600s.
Interestingly, the Core i7-3960X, armed with unbuffered DDR3-1600, is the second-place finisher, even though its four memory channels are theoretically less capable than the six channels of DDR3-1333 on a pair of triple-channel Xeons.
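For reference, the theoretical per-platform peaks work out as follows. This is a sketch counting eight bytes per channel per transfer; sustained throughput is always lower:

```python
def peak_dram_gbps(channels, mt_per_s):
    """Theoretical peak DRAM bandwidth: channels x MT/s x 8 bytes/transfer."""
    return channels * mt_per_s * 8 / 1000

print(peak_dram_gbps(4, 1600))      # one quad-channel DDR3-1600 CPU: 51.2
print(peak_dram_gbps(2 * 4, 1600))  # two Xeon E5s: 102.4
print(peak_dram_gbps(2 * 3, 1333))  # two triple-channel Xeons, DDR3-1333: ~64
```

On paper, then, the six channels of a dual-Xeon 5600 setup outrun a single quad-channel chip, which makes the -3960X’s measured second-place finish all the more notable.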

After back-and-forth emails with Adrian Silasi of SiSoftware, we couldn’t figure out why the cache performance results for the Xeon 5600-series processors were turning out so low (particularly L2 cache bandwidth, which we'd expect to be far higher). One suspicion is that this routine is tripping a throttle due to repeated use of the cache and rapidly-escalating temperatures, though Intel's engineers claim the Xeon 5500s and 5600s don't have this mechanism in place.
It’s clear, however, that the Sandy Bridge-E and Sandy Bridge-EP architectures make big improvements to L3 cache throughput by virtue of their ring buses.

Our Premiere Pro workload comes from Adobe’s Creative Suite launch. It’s a professional-grade trailer for a new TV series, and we’ve seen it take anywhere from under a minute to over an hour to render. Generally, the difference is attributable to hardware support for the Mercury Playback Engine, enabled exclusively through Nvidia’s CUDA. So, I picked a FirePro card for all of our testing, allowing a closer look at CPU performance without GPU interference.
The results are compelling. You can use a single Core i7-3960X for this task, but it takes more than two times longer to render than a pair of Xeon E5-2687Ws. Even the Xeon 5500s get destroyed—and those were supposed to be the most significant server processors in history according to Pat Gelsinger back in 2009.
Of course, context is critical. Check out all of the processors we tested on page seven of Intel Core i7-3930K And Core i7-3820: Sandy Bridge-E, Cheaper. If you’re using a desktop card like Nvidia’s GeForce GTX 580, even a Phenom II X6 1100T can get this job finished in half the time of two pricey Xeon E5s. I’m no fan of locking out the competition, but when there’s money on the line, professionals working in CS5 simply owe it to themselves to use a CUDA-enabled card.
CPU Utilization during After Effects

The results in our After Effects rendering test don’t look anything like Premiere Pro. The Core i7-3960X—twelve threads with access to 32 GB of memory—fares best. The Nehalem- and Westmere-based architectures, with 16 and 24 threads, respectively, and 48 GB of memory roughly match each other. The Xeon E5s fall somewhere in between.

The scores in Photoshop get us back to the performance picture we’d expect. Though the Xeon 5600s and 5500s yield fairly similar results, they both outperform a Core i7-3960X. In turn, Intel’s new Xeon E5-2687Ws make quick work of previous-generation dual-socket platforms.
CPU Utilization during MainConcept

Although we typically consider media encoding workloads to be ideal for showing off the benefits of multi-core processors, there’s a limit to these more desktop-oriented applications’ parallelism. MainConcept takes advantage of the physical cores on our Xeon 5500 and 5600 platforms, but still doesn’t fully tax each one. As a result, scaling isn’t particularly aggressive. Moreover, the Core i7-3960X’s improved architecture helps it out-maneuver two Xeon W5580s.
CPU Utilization during HandBrake

A similar situation transpires in HandBrake, though now the Core i7-3960X also overtakes two Xeon X5680s as well. At least for this type of task, a dual-processor workstation is pretty clearly overkill.

So why the heck would you run Lame, then? We already know this is a single-threaded test (at least when you run one instance of it). For our purposes, we’re really just demonstrating single-core per-clock performance and the impact of Turbo Boost on these flagship processors.
Core i7-3960X spins up to 3.9 GHz with a single core active. Combining the benefits of high frequency with the Sandy Bridge architecture, a first-place finish is no surprise. Xeon E5-2687W, an eight-core beast dissipating up to 150 W, runs at up to 3.8 GHz with one active core. As expected, it falls in just behind the desktop CPU. A max Turbo Boost frequency of 3.6 GHz earns the Xeon X5680 third place.

Although I generally don’t use the Cinebench OpenGL-based graphics test, it’s nice that the benchmark’s CPU component is able to utilize up to 64 threads.
The roughly 2000-object scene with somewhere around 300 000 polygons renders very quickly on a pair of Xeon E5-2687W processors, which execute 32 threads concurrently. The Xeon X5680s are quite a ways behind. A single Core i7-3960X almost manages to catch the two Xeon W5580s—a testament to its higher clock rates and more efficient Sandy Bridge architecture.
CPU Utilization during SolidWorks

Our SolidWorks PhotoView 360 workload caught me off guard. This render fully taxed each configuration we threw at it, regardless of core count or memory. And while the Xeon E5s finish first, their improvement over two Xeon X5680s is almost negligible.
The Xeon W5580s trail a ways back, and are actually beaten by a single Core i7-3960X. Based on past reviews, we know SolidWorks responds well to overclocking, but that’s simply not in the cards for these CPUs.
CPU Utilization during 3ds Max

Autodesk’s 3ds Max also taxes available compute resources. However, it demonstrates significant gains shifting from Xeon 5500 to 5600 and finally to E5, as we might expect. The Core i7-3960X almost manages to catch the two Xeon 5500s—again, a testament to the per-clock advantages of Sandy Bridge compared to the Nehalem architecture.

Although iray really delivers the best performance when it’s able to exploit GPU resources, our benchmark is limited to CPU-based rendering. Here, scaling is nothing short of amazing. A single Core i7-3960X at 3.3 GHz gets the job done in just over 10 minutes. Meanwhile, two eight-core Xeon E5-2687Ws at 3.1 GHz finish in about four and a half minutes. The 5600s and 5500s are in-between.
CPU Utilization during Blender

Introduced in Blender 2.61, the Cycles render engine is ray tracing-based, with support for interactive rendering, a new shading node system, a new texture workflow, and, of course, GPU acceleration. Our Cycles-based test sticks to processor-based rendering for now, and will evolve to include OpenCL testing moving forward.
Unfortunately, although they’re consistent, the results from the Cycles engine aren’t very easy to break down. CPU utilization is always much higher using the new renderer than the old tile-based one, and yet the Xeon 5600s manage to outmaneuver the Xeon E5s. Core i7-3960X bests two Xeon 5500s, but again, it’s not clear why.

Our older Blender rendering test, configured to use the default 4x4 tile setting, tended to leave cores underutilized as it finished (you can see this by watching Windows’ task manager—busy time drops off very gradually). Reader Greg Wereszko let us know that we could potentially get significant gains by breaking the scene up more granularly using more tiles, keeping processor cores active as the test winds down. A 10x10 setting does, in fact, yield measurable improvements, though utilization never hits 100%, even at the start of the test when all cores should be active.
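A crude scheduling model shows why tile count matters on a 32-thread machine. Treating every tile as equal-cost and handing them out one per thread is an idealization (real tiles vary widely in cost), but it captures the idle-tail effect:

```python
import math

def avg_utilization(tiles, threads):
    """Average thread utilization for equal-cost tiles: the render takes
    ceil(tiles/threads) rounds, but only `tiles` units of work exist."""
    rounds = math.ceil(tiles / threads)
    return tiles / (threads * rounds)

print(f"{avg_utilization(4 * 4, 32):.0%}")    # 4x4 grid:   50%
print(f"{avg_utilization(10 * 10, 32):.0%}")  # 10x10 grid: 78%
```

A 4x4 grid can’t even occupy all 32 threads at once, while a finer grid keeps more cores busy through the end of the render, which is consistent with the gains Greg pointed out.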

Vue is used to create, animate, and render 3D environments. Our custom scene fully taxes even the 32-thread dual Xeon E5 configuration.
As a result, performance improves significantly as you move from the 12-thread Core i7, to the 16-thread Xeon 5500s, to the 24-thread 5600s, and finally the new Xeons.
Because this workload takes a while, and yields a consistent 100% utilization, we’re using Vue 8 for our power analysis, too.

We also made a big change to our Visual Studio 2010 benchmark in anticipation of today’s launch. Gone is the Miranda IM client compile workload. In its place, we’re compiling Google Chrome—a task that takes more than 10 minutes with 16 cores at 100% utilization.
As with some of the benchmarks on the previous page, the biggest performance improvement happens between Intel’s Xeon 5500 and 5600 processors. Nevertheless, the new Xeon E5s serve up a significant boost as well.

Euler3D is based on the STARS computational fluid dynamics production code, and its workload is described as follows:
“The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, a taper ratio of 0.66, and a 45 degree quarter-chord sweep angle. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes…The benchmark CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes. The benchmark executable advances the Mach 0.50 AGARD flow solution. Our benchmark score is reported as a CFD cycle frequency in Hertz.”
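Because the score is a cycle frequency, it’s just solver throughput: iterations completed divided by wall-clock time. A trivial sketch (the cycle and time figures here are invented, not Euler3D’s actual numbers):

```python
def cfd_score_hz(solver_cycles, wall_seconds):
    """Benchmark score as a cycle frequency: solver iterations per second."""
    return solver_cycles / wall_seconds

# A machine advancing 500 solver cycles in 100 seconds scores 5 Hz;
# halving the runtime doubles the score.
print(cfd_score_hz(500, 100.0))  # 5.0
```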

Euler3D reports that it recognizes and employs all 32 of the Xeon E5 system’s available threads, and the result is a score that blows away the Xeon X5680s. The fact that Core i7-3960X beats two Xeon 5500s and comes close to the 5600s suggests a big advantage for Sandy Bridge-based CPUs.

The rest of our productivity-oriented tests are decidedly less workstation-specific, though ABBYY’s FineReader 10 OCR app does a better job of taxing two Xeon E5s than video workloads like Adobe After Effects.
FineReader 10 shows Intel’s Core i7-3960X nearly matching two Xeon W5580s. A pair of Xeon X5680s yield a sizable 29% speed-up, while the Xeon E5s improve 21% compared to the 5600s.



None of our compression workloads are able to fully utilize these workstation-oriented configurations. 7-Zip comes the closest, yielding a slight advantage to the Xeon E5s over the 5600s and 5500s. However, even a six-core Core i7 is fast enough to surpass the Xeon 5500s.
Because WinRAR employs fewer threads than a Core i7-3960X offers to it, anything in excess (read: every two-chip platform) goes unused, resulting in a performance chart defined by architecture and clock rate. Clearly, Intel’s Sandy Bridge design is favored over Nehalem, which is why the Xeon E5s and Core i7 excel.
The i7 actually scores a win in WinZip 16. We’ve been critical of this software’s single-threaded versions in the past, since they always took significantly longer than similar tasks in WinRAR or 7-Zip. WinZip 16 is a 64-bit app that shows activity on four cores, but it’s still far slower than our other compression tests. Low utilization suggests that the Core i7 is enjoying a higher Turbo Boost multiplier, giving it the edge over Xeon E5.

When we compile the results from all of our tests and compare two Xeon E5-2687Ws to two Xeon X5680s, we see that the E5s are, on average, about 21% faster.
Some of those tests aren’t good representations of what a professional would do on a workstation, though. Lame is in there explicitly to show the difference between these CPUs with a single core active, for example. The compression tests are pretty lightweight, and the transcoding tests don’t really necessitate a dual-processor machine. So, let’s take all of that out and see where we end up:

Now we’re closer to a 23% improvement. Euler3D skews the E5s’ advantage quite a bit, but so do curiously low numbers from Blender’s new Cycles rendering engine and the SolidWorks 2010 render.
Regardless, more than 20% is significant for money-making applications.
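For reference, those “X% faster on average” figures come from averaging per-test speedup ratios. A sketch with invented runtimes (not our measured data; `average_speedup` is a hypothetical helper):

```python
def average_speedup(baseline_times, new_times):
    """Arithmetic mean of per-test speedups of new_times over
    baseline_times (runtimes in seconds; lower is faster)."""
    ratios = [b / n for b, n in zip(baseline_times, new_times)]
    return sum(ratios) / len(ratios)

# Hypothetical runtimes for three tests (Xeon X5680 vs. Xeon E5-2687W):
x5680 = [600.0, 300.0, 120.0]
e5 = [480.0, 250.0, 100.0]
advantage_pct = (average_speedup(x5680, e5) - 1.0) * 100.0  # ~21.7%
```

A geometric mean is often preferred for ratio data, but a simple arithmetic mean matches the plain-English “faster on average” framing used here.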

e-on’s Vue 8 turned out to be the best candidate for measuring system load power because it’s a nice, long workload, and it fully taxes each of these configurations.
We already knew the finishing order from our performance tests. Now we can add to that power use over time, thanks to our Extech logger.

As the line graph suggested, Intel’s Xeon E5-2687Ws average the highest power consumption in this workload (then again, we might have guessed that would be the case, given two CPUs with 150 W TDPs).
Surprisingly, the Xeon W5580s are the second-worst offenders. Remember that Intel switched to 32 nm manufacturing for its Xeon 5600 series, so even though those processors sport an additional two cores each, they’re able to outperform 5500s in threaded apps while using less power.
Naturally, a single Core i7-3960X offers the absolute lowest average power numbers, albeit with the worst performance.

Multiply the average power by the fraction of an hour each configuration took to finish its rendering task and you end up with energy use in Watt-hours.
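That Watt-hour arithmetic looks like this (the power and runtime figures below are made up for illustration):

```python
def energy_wh(avg_power_w, runtime_s):
    """Energy consumed: average power (W) times runtime expressed in hours."""
    return avg_power_w * (runtime_s / 3600.0)

# Hypothetical: a 400 W average draw over a 270-second render...
fast_hot = energy_wh(400.0, 270.0)   # ~30 Wh
# ...beats a 250 W machine that needs 540 seconds for the same job.
slow_cool = energy_wh(250.0, 540.0)  # ~37.5 Wh
```

This is exactly why a fast, power-hungry configuration can still win on energy: halving the runtime more than offsets a higher average draw.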
High power use and mediocre performance really hurt the old Xeon 5500s here. In comparison, the 5600s are much more attractive (though they use marginally more energy than a single Core i7-3960X, which is slow but draws a lot less from the wall).
The real winners are Intel’s Xeon E5s, though. Despite averaging the highest power consumption, stellar performance under a full load translates to the lowest energy use.
The Sandy Bridge architecture was a really big deal on the desktop. More than a year after its introduction, Core i5-2500K is still the processor I recommend to friends who ask for buying advice. And although it took Intel a long time to incorporate Sandy Bridge into its server and workstation portfolio, the resulting effort is complex, and yet scalable in a way that only the Xeon E7s can rival.
There are 37 different Xeon E5s. We only got our hands on one. But the Xeon E5-2687W is the fastest model, and we were able to benchmark it against three other flagships in their respective families: Xeon W5580, Xeon X5680, and Core i7-3960X. Obviously, the performance you get from any dual-processor platform is wholly dependent on the tasks you throw at it. Our test suite is predominantly workstation-oriented. But even with lightly-threaded benchmarks folded in, the Xeon E5s were about 21% faster, on average, than the Xeon 5600s. After factoring out the tests you typically wouldn’t see on a workstation, the advantage grew just a hair to almost 23%.
But while comparing the Xeon E5s to the Xeon 5600s was interesting, I was more impressed by the efficiency calculation than any other piece of data. The Xeon E5-2687W is etched on the same 32 nm node as the Xeon X5680, its die is considerably larger, and its TDP is 20 W higher per processor. Indeed, you can clearly see that, under full load, two Xeon E5-2687Ws draw more power from the wall than two Xeon X5680s. But the speed-up attributable to Intel’s Sandy Bridge architecture and two additional cores per socket outweighs the extra power draw, yielding better efficiency.
Consider also that the E5’s strengths are more accessible across a wider range of segments. There are now eight-core processors available for entry-level dual-socket servers in the Xeon E5-2400 family. A single-socket Xeon E5-1600 workstation line-up offers similar functionality as the Core i7-3000 series, adding RAS functionality important to some folks. The Xeon E5-2600s cover a broad range of 2S servers and workstations. And a line of Xeon E5-4600 processors introduces the idea of more commoditized quad-socket configurations that maximize performance/watt in HPC environments.
Obviously, if you’re a professional working in a data center, the prospect of improving efficiency is a head-turner. Similarly, engineers and artists looking at next-gen workstations have to appreciate a platform that averages 20% better performance. But even if you’re a hardware enthusiast with no reason to use any of this gear, it’s still pretty cool to pop open Windows’ Task Manager and watch 32 threads go to town rendering a scene that could end up in the next game you enjoy.