Intel Xeon E5-2600 V3 Review: Haswell-EP Redefines Fast

Tags:
  • Workstations
  • Intel
  • CPUs
  • Servers
  • Hardware
  • Components
  • Enterprise
  • Processors
September 8, 2014 9:26:01 AM

We compare three generations of Intel's Xeon E5-2690 processors, plus the flagship Xeon E5-2699 v3 to see how Haswell-EP affects the datacenter.



September 8, 2014 9:44:29 AM

Wonder how long it will be until 18-core CPUs are utilized well in games... maybe 2018 or 2020?
September 8, 2014 10:23:01 AM

Quote:
Wonder how long it will be until 18-core CPUs are utilized well in games... maybe 2018 or 2020?


Actually we should be trying to move away from traditional serial-style processing and move towards parallel processing. Each core can handle only one task at a time and can only utilize its own resources.

This is unlike a GPU, where many processors share the same resources and perform multiple tasks at the same time. The problem is that this type of architecture is not natively supported by CPUs, and Nvidia is encouraging people to learn to program for parallel architectures.

But this lineup of CPUs is clearly a marvel of engineering and hard work. Glad to see the server industry will truly start to benefit from the low power draw and finely tuned abilities of Haswell, along with the recently introduced DDR4, which is optimized for low power usage as well. Combined with flash-based storage (aka SSDs), which also draws less power than the average HDD, this will slash server power bills and save companies literally billions of dollars. Technology is amazing, isn't it?
September 8, 2014 10:55:31 AM

There is still a lot in games that doesn't translate well into parallel processing. A lot of gaming action only happens as a direct result of the user's input, and it usually triggers events that depend on the results of other events. Parallel processing doesn't help much there; single-threaded performance helps more.

However, with multiple cores, now we can have better AI and other "off-screen" items that don't necessarily always depend upon the user's direct input. There's still a lot of work to be done there, though.
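The split described above can be sketched in Python (purely illustrative names, not real engine code): the per-frame pipeline runs serially because each stage consumes the previous stage's output, while independent AI agents can fan out to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Serial chain: each stage needs the previous stage's output,
# so extra cores cannot overlap these steps.
def handle_input(raw):          return {"move": raw}
def update_physics(cmd):        return {"pos": len(cmd["move"])}
def resolve_collisions(state):  return {**state, "hit": state["pos"] > 3}

def game_tick(raw_input):
    cmd = handle_input(raw_input)     # depends on user input
    state = update_physics(cmd)       # depends on cmd
    return resolve_collisions(state)  # depends on state

# Parallel-friendly part: AI agents that don't depend on this
# frame's input can be updated concurrently.
def update_agent(agent_id):
    return agent_id, f"agent {agent_id} planned a path"

def update_ai(agent_ids):
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(update_agent, agent_ids))
```

The tick stays single-threaded no matter how many cores exist; only the off-screen work scales.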
September 8, 2014 10:57:23 AM

The new Haswell-EP Xeons are definitely going to help with virtualization. However, I see the high price and relative scarcity of DDR4 right now as a bit of a handicap to fast adoption, especially since memory is one of the major limiting factors on how many servers you can virtualize.

I think all of the major server vendors are going to suck up the major memory manufacturers' DDR4 capacity for a while before prices go down.
September 8, 2014 1:00:56 PM

Quote:
The new Haswell-EP Xeons are definitely going to help with virtualization. However, I see the high-price of DDR4 and the relative scarcity of it now as being a bit of a handicap to fast adoption, especially since that is one of the major limiting factors to how many servers you can virtualize.

I think all of the major server vendors are going to suck up all of the major memory manufacturers DDR4 capacity for a while before the prices go down.


Whether it helps or hinders will ultimately depend on the VM admin. What most VM admins don't realize is that HT can actually end up degrading performance in virtual environments unless specific steps are taken to use it properly (and most don't take them). A lot of companies will tell you to turn off HT to increase performance because they've dealt with a lot of VM admins who don't set things up properly. Over-allocation is part of the reason HT can degrade performance, but there are other hypervisor settings that also have to be configured so that the guest VMs get the resources they need.
September 8, 2014 1:38:45 PM

dovah-chan said:
Actually we should be trying to move away from traditional serial-style processing and move towards parallel processing. Each core can handle only one task at a time and can only utilize its own resources.

This is easier said than done, since there are tons of everyday algorithms, such as text/code parsing, that are fundamentally incompatible with threading. If you want to build a list or tree using threads, you usually need to split the operation so each thread works on an isolated part of the list/tree; otherwise they trip over each other and waste most of their time waiting on mutexes. At the end of the build you then need a merge step to bring everything back together, which is usually not very thread-friendly if you want it to be efficient.

In many cases, trying to convert algorithms to threads is simply more trouble than it is worth.
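The split-then-merge pattern InvalidError describes can be sketched in Python (an illustrative toy, not production code; in CPython the GIL would keep this CPU-bound build serial anyway, which rather underlines the point). Each worker builds its own sorted sublist in isolation, so no mutex is needed during the build, and the serial merge at the end brings everything back together.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def build_sorted_sublist(chunk):
    # Each worker builds its own structure in isolation,
    # so no locking is needed during the build phase.
    return sorted(chunk)

def parallel_build(items, workers=4):
    # Split the input so threads never touch the same data.
    chunks = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_sorted_sublist, chunks))
    # The merge phase is the serial bottleneck mentioned above.
    return list(heapq.merge(*partials))
```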
September 8, 2014 3:44:48 PM

Great to see these processors out, and overall a good article. I only wish you had used the same benchmark suite as for the Haswell-E processors: 3ds Max, Adobe Premiere, After Effects, Photoshop. I'd also love to see V-Ray added to the mix. There's not much useful benchmark data in here for 3D professionals, but some good detail on the processors themselves.
September 8, 2014 4:27:29 PM

Just take my money. Pls.
September 8, 2014 5:24:07 PM

Quote:
Wonder how long it will be until 18-core CPUs are utilized well in games... maybe 2018 or 2020?

Simply never.
A game is made of sound, logic, and graphics. You can dedicate these three processes to a number of cores, but they remain three. As you split the load, some of the logic must track who did what and where. Logic deals mainly with FPU work, while graphics deals with integers, and GPUs are great integer number-crunchers. They have to be fed by the CPU, so an extra core manages data across different memories, and this is where we start failing. Keeping everything in one spot, with the same resources, reduces the need to transfer data. A whole processor combining GPU, FPU, x86 cores, and a sound processor in one package with on-board memory would make the ultimate gaming processor. As long as we render scenes with triangles we will keep using the legacy stuff. When the time comes to render scenes per pixel, we will need a fraction of today's performance, half the texture memory (just scale the highest quality), and half the model memory. Epic is already working on that.
September 8, 2014 6:39:38 PM

Quote:
Great to see these processors out, and overall good article. I only wish you used the same benchmark suite you had for the Haswell-E processors: 3DS Max, Adobe Premiere, After Effects, Photoshop. I'd also love to see Vray added to the mix. Not much useful benchmark data in here for 3D professionals. Some good detail on the processors themselves however.


Great points. One complication is that the NVIDIA GeForce Titan used in the Haswell-E review would not have fit in the 1U servers (let alone been cooled well there). The onboard Matrox G200eW graphics are too much of a bottleneck for the standard test suite.

On the other hand, this platform is going to be used primarily in servers. Although there are some really nice workstation options coming, we did not have access in time for testing.

One plus is that you can run the tests directly on your own machine by booting an Ubuntu 14.04 LTS LiveCD and issuing three commands. There is a video and the three simple commands here: http://linux-bench.com/howto.html That should give you a rough idea of how your system's performance compares to the test systems.

Hopefully we will get some workstation-appropriate platforms in the near future where we can run the standard set of TH tests. Thanks for the feedback; it is certainly on the radar.
September 8, 2014 7:30:19 PM

Makes sense, hope to see more about these in the future, good work.
September 8, 2014 11:08:49 PM

Quote:
Great to see these processors out, and overall good article. I only wish you used the same benchmark suite you had for the Haswell-E processors: 3DS Max, Adobe Premiere, After Effects, Photoshop. I'd also love to see Vray added to the mix. Not much useful benchmark data in here for 3D professionals. Some good detail on the processors themselves however.


Agreed; the conclusion says "Server and Workstation" while the benchmarks only show server applications. I came here to see workstation performance, especially 3ds Max rendering (and I hope to see V-Ray and Mental Ray benchmarks too), as well as the Adobe applications mentioned above.
September 9, 2014 9:33:17 AM

Quote:
dovah-chan said:
Actually we should be trying to move away from traditional serial-style processing and move towards parallel processing. Each core can handle only one task at a time and can only utilize its own resources.

This is easier said than done since there are tons of everyday algorithms, such as text/code parsing, that are fundamentally incompatible with threading. If you want to build a list or tree using threads, you usually need to split the operation to let each thread work in isolated parts of the list/tree so they do not trip over each other and waste most of their time waiting on mutexes and at the end of the build process, you have a merge process to bring everything back together which is usually not very thread-friendly if you want it to be efficient.

In many cases, trying to convert algorithms to threads is simply more trouble than it is worth.



Then wouldn't a smart move be to move towards an HSA-oriented architecture that combines parallel compute with serial-oriented task management? I believe that is essentially what AMD did with Kaveri. It is better suited to consumer/workstation workloads that can utilize OpenCL.

Although that wouldn't really be the best option in a server setting. There are usually two scenarios: you either need huge amounts of raw compute for services such as OnLive, or the streamlined style of multiple CPUs performing general server tasks, such as accepting large numbers of packet requests and ping queries, which is what the run-of-the-mill server is built for.

In relation to what I was speaking about concerning Nvidia, it was this little piece:

http://www.nvidia.com/object/what-is-gpu-computing.html

Forgive me if I am incorrect about anything. I'm certainly not an engineer or a talented programmer by any means.
September 9, 2014 10:14:47 AM

dovah-chan said:
Then wouldn't a smart move be to move towards an HSA oriented architecture that combines the parallel compute abilities with the serial-oriented task managing? I believe that is essentially what AMD did with Kaveri actually. It is more befitting towards consumer/workstation workloads that can utilize OpenCL.

In theory, yes. In practice, not necessarily: algorithms like parsing are full of non-linear, highly context-sensitive, branch-driven code, which makes them effectively impossible to thread no matter how close you bring the extra compute power. That is what I mean by fundamental algorithms that are also fundamentally non-threadable.
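A toy illustration of that context sensitivity (hypothetical Python, not any real parser): the meaning of each character depends on state accumulated from everything before it, so a second thread cannot safely start in the middle of the input.

```python
def tokenize(text):
    # Whether a character is data or a delimiter depends on whether
    # we are currently inside a quoted string, and that is only
    # knowable by having read every character before it.
    tokens, buf, in_string = [], "", False
    for ch in text:
        if ch == '"':
            in_string = not in_string
            if not in_string:
                tokens.append(("STR", buf))
                buf = ""
        elif in_string:
            buf += ch
        elif ch == " ":
            pass  # whitespace only separates tokens outside strings
        else:
            tokens.append(("CHAR", ch))
    return tokens
```

Splitting the input at an arbitrary offset would hand a thread a stream whose initial `in_string` state is unknown, which is exactly the threading problem described above.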
September 9, 2014 10:29:31 AM

balister said:
Quote:
The new Haswell-EP Xeons are definitely going to help with virtualization. However, I see the high-price of DDR4 and the relative scarcity of it now as being a bit of a handicap to fast adoption, especially since that is one of the major limiting factors to how many servers you can virtualize.

I think all of the major server vendors are going to suck up all of the major memory manufacturers DDR4 capacity for a while before the prices go down.


Whether it helps or hinders will ultimately depend on the VM admin. What most VM admins don't realize is that HT can actually end up degrading performance in virtual environments unless the VM admin took specific steps to use HT properly (and most do not). A lot of companies will tell you to turn off HT to increase performance because they've dealt with a lot of VM admins that don't set things up properly (a lot of VM admins over allocate which is part of the reason using HT can degrade performance, but there are other settings as well that have to be set in the Hypervisor so that the guest VMs get the resources they need).


Ummm... buddy... I didn't mention HT (hyper-threading) at all... just memory.
September 9, 2014 1:39:34 PM


Xajel,

An AE test will only be useful if the system has a lot of RAM (64GB+),
and that could be hard to set up atm, given the cost involved (unless a
kind RAM maker can provide a whole pile of kits to toms).

Ian.

September 9, 2014 2:07:49 PM

Patrick,

"c-ray 1.1 is a popular and simple ray-tracing benchmark for Linux systems ..."

Blimey, wasn't expecting to see that in the benchmark list. :D  Funny how
c-ray has taken on a life of its own (I didn't know it was being so widely used
until about a year ago). I took it over from John because he didn't have time
for it anymore.

One thing though, can you change the link to the results page please? The
Blinkenlights site is a mirror (I have no control over its persistence) and may
not be around in the future. The primary location is here, my own domain.

I'm glad you didn't use the simple test, it is indeed really small, and on any
kind of modern hardware it completes way too fast for useful measurement.
It's a pity though that the other tests don't use the settings I've used, since
the results can't be compared, but never mind.

The other tests do impose a degree of main memory access, but not much.
I created them mainly to have something which lasted long enough to be
useful for testing multicore systems. Even then, the slowest test takes just
11s to complete on an old 8-core XEON. Maybe I should start a separate
new table for something like 'sphfract' at 7500x3500 with 8X oversampling...

Btw, c-ray's threading is by scanline, so there's no gain from having more
threads than the no. of lines in an image.

Ian.

PS. Just a thought - any chance you could manually run the C-ray tests
using the settings on my page? I'll add them to the tables. 8) Include the
'simple' test as well, just for the hell of it.
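The scanline decomposition Ian describes can be sketched like this (illustrative Python, not c-ray's actual C code): one task per image row, so thread counts beyond the row count buy nothing.

```python
from concurrent.futures import ThreadPoolExecutor

def render_scanline(y, width):
    # Stand-in for the per-row ray-tracing work; each row is
    # independent of the others, which is why this parallelizes.
    return [(x * y) % 256 for x in range(width)]

def render(width, height, threads=4):
    # One task per scanline: with more threads than rows,
    # the extra threads simply have nothing to do.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        rows = pool.map(lambda y: render_scanline(y, width), range(height))
    return list(rows)
```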



September 9, 2014 3:56:02 PM

Quote:
balister said:
Quote:
The new Haswell-EP Xeons are definitely going to help with virtualization. However, I see the high-price of DDR4 and the relative scarcity of it now as being a bit of a handicap to fast adoption, especially since that is one of the major limiting factors to how many servers you can virtualize.

I think all of the major server vendors are going to suck up all of the major memory manufacturers DDR4 capacity for a while before the prices go down.


Whether it helps or hinders will ultimately depend on the VM admin. What most VM admins don't realize is that HT can actually end up degrading performance in virtual environments unless the VM admin took specific steps to use HT properly (and most do not). A lot of companies will tell you to turn off HT to increase performance because they've dealt with a lot of VM admins that don't set things up properly (a lot of VM admins over allocate which is part of the reason using HT can degrade performance, but there are other settings as well that have to be set in the Hypervisor so that the guest VMs get the resources they need).


Ummm... buddy... I didn't mention HT (hyper-threading) at all... just memory.


You mentioned virtualization in the very first line; that's the crux of my post.
September 9, 2014 11:35:30 PM

dovah-chan said:
Quote:
dovah-chan said:
Actually we should be trying to move away from traditional serial-style processing and move towards parallel processing. Each core can handle only one task at a time and can only utilize its own resources.

This is easier said than done since there are tons of everyday algorithms, such as text/code parsing, that are fundamentally incompatible with threading. If you want to build a list or tree using threads, you usually need to split the operation to let each thread work in isolated parts of the list/tree so they do not trip over each other and waste most of their time waiting on mutexes and at the end of the build process, you have a merge process to bring everything back together which is usually not very thread-friendly if you want it to be efficient.

In many cases, trying to convert algorithms to threads is simply more trouble than it is worth.



Then wouldn't a smart move be to move towards an HSA oriented architecture that combines the parallel compute abilities with the serial-oriented task managing? I believe that is essentially what AMD did with Kaveri actually. It is more befitting towards consumer/workstation workloads that can utilize OpenCL.

Although that wouldn't really be the best option for a server setting. There are usually two scenarios: you'll either need huge amounts of raw compute ability for services such as OnLive

Or the streamlined style of multiple CPUs performing just general server tasks such as accepting a large amount of packet requests and ping queries which is what the run of the mill server is built for.

In relation to what I was speaking about concerning Nvidia, it was this little piece:

http://www.nvidia.com/object/what-is-gpu-computing.html

Forgive me if I am incorrect about anything. I'm certainly not an engineer or a talented programmer by any means.


It's a great concept, and AMD was smart to think of it. The two main problems I saw at the time of Kaveri's launch were the not-so-great serial performance and the near-total lack of software support, not to mention the weak OpenCL implementation. I mean weak compared to CUDA; don't think that I prefer CUDA over OpenCL. In fact I like OpenCL more (I've always loved open standards, where every company can adopt them and users have the choice of whatever hardware they want).

Most pro applications, like 3D renderers, prefer CUDA over OpenCL for the time being. I saw GPU-accelerated betas of V-Ray, but they only support CUDA due to the weak OpenCL implementations (on both the NVIDIA and AMD sides). I'm not saying OpenCL itself is weak, but from a software standpoint there's still a lot of work to do. OpenCL is very promising, but it has to work well first.


mapesdhs said:

Xajel,

An AE test will only be useful if the system has a lot of RAM (64GB+),
and that could be hard to set up atm, given the cost involved (unless a
kind RAM maker can provide a whole pile of kits to toms).

Ian.



RAM manufacturers really need to promote their products, and one way to do that is to give RAM to hardware sites... so if the site asked, I bet some companies would be interested in this promising new market (DDR4 + ECC DDR4).
September 10, 2014 7:33:51 AM

Quote:
I think all of the major server vendors are going to suck up all of the major memory manufacturers DDR4 capacity for a while before the prices go down.


At UK wholesale prices there's less than a 15% difference between DDR3 and DDR4 at the same speed/size. This has as much to do with DDR3 having gone up ~50% in the last 12 months as with the ready availability of DDR4, if you look in the right places.

September 10, 2014 8:18:35 AM

That doesn't really mean very much when there's no availability. Typical places I've checked have
almost no DDR4 in stock, or if they do then it's only some combination of 4GB modules. The one site
which does have a 32GB kit (ie. 4x8, which is what I'd need) has it priced at 443 UKP (2666MHz
GSkill Ripjaws); that's 77% more expensive than a new 32GB DDR3/2400 kit I bought just a few
weeks ago (GSkill TridentX). DDR4 RAM pricing is effectively killing the option of getting an 8-core
in some cases, ie. 8x8GB would be an extra 400 UKP, a hefty bit more than the difference between
a 5930K and a 5960X. It's a sucky choice for those who need a lot of RAM (AE users moving up to 4K
are a typical example) and had been hoping to upgrade from SB-E.

More cores needs more RAM. Without a ready supply of 8GB-based kits at a sensible price, X99 is
just too expensive atm for some.

Higher-capacity server-type RAM for XEONs is available, but it's not exactly cheap either - 318 UKP
for 4x8GB, 610 UKP for 4x16GB, both just at 2133 of course.

I'm more certain than ever now that DDR3 prices have been deliberately raised in the last 18 months
to make DDR4 pricing look less crazy at launch. Funny how price fixing is not tolerated in other consumer
products/services, but when it comes to computer components, nobody seems to care.

Never mind... I've just bagged some more DDR3 off eBay (171432748826 and 321508807563).

Ian.

September 10, 2014 8:42:16 AM

mapesdhs said:
I'm more certain than ever now that DDR3 prices have been deliberately raised in the last 18 months to make DDR4 pricing look less crazy at launch. Funny how price fixing is not tolerated in other consumer products/services, but when it comes to computer components, nobody seems to care.

Most of the price rise on DDR3 is because memory manufacturers were losing money selling DDR3 under $100/16GB until Elpida's bankruptcy and Hynix's factory fire. Now the surviving manufacturers are trying to recover their losses so they can survive the next dip, which will probably happen about four years from now.

This cycle of profits, stagnation, losses, bankruptcies then profit again occurs once per DRAM generation almost like clockwork.
September 10, 2014 9:13:58 AM


That may be their excuse, or what people on forums choose to accept, but I don't believe it;
the increase since early Feb/13 has just been far too large.

Ian.

September 10, 2014 12:13:05 PM

mapesdhs said:

That may be their excuse, or what people on forums choose to accept, but I don't believe it;
the increase since early Feb/13 has just been far too large.

You want to talk about large? The lowest low shortly after Elpida's bankruptcy was around $65/16GB; manufacturers were sitting on several months' worth of unsold inventory, so prices did not really start bouncing back up until late 2012. By early 2013, prices were up to about $120/16GB... almost double.

Today's prices are only 10-20% higher.

Back when DDR3 was worthless, many memory manufacturers switched some of their production lines to other memory types, which obviously helps with raising prices once the market recovers. You do not survive in the DRAM manufacturing business without long-term planning... Hynix almost kicked the bucket a few times too.
September 10, 2014 2:39:53 PM

Quote:
One thing though, can you change the link to the results page please? The
Blinkenlights site is a mirror (I have no control over its persistence) and may
not be around in the future. The primary location is here, my own domain.

I'm glad you didn't use the simple test, it is indeed really small, and on any
kind of modern hardware it completes way too fast for useful measurement.
It's a pity though that the other tests don't use the settings I've used, since
the results can't be compared, but never mind.

The other tests do impose a degree of main memory access, but not much.
I created them mainly to have something which lasted long enough to be
useful for testing multicore systems. Even then, the slowest test takes just
11s to complete on an old 8-core XEON. Maybe I should start a separate
new table for something like 'sphfract' at 7500x3500 with 8X oversampling...

Btw, c-ray's threading is by scanline, so there's no gain from having more
threads than the no. of lines in an image.

Ian.

PS. Just a thought - any chance you could manually run the C-ray tests
using the settings on my page? I'll add them to the tables. 8) Include the
'simple' test as well, just for the hell of it.


Ian - your site as linked did not work. If you want to get in touch with me using my first name at servethehome.com I am happy to help out.
September 10, 2014 4:11:36 PM

pjkenned said:
Ian - your site as linked did not work. If you want to get in touch with me using my first name at servethehome.com I am happy to help out.


I'm confused... even the hot link from my post you've quoted works fine for me...
I'll send an email though.

Ian.



September 17, 2014 8:33:27 AM

I love your benchmarks; 2x sounds more realistic than some manufacturers' claims of 4x performance. It would be nice if you tested some server business apps like SQL Server Standard and Enterprise, in-RAM vs. on-disk, comparing v2 vs. v3 CPUs and DDR3 vs. DDR4. Or some other business apps like VMware vSphere, Oracle, SAP, etc. SQL would be the easiest to test, I would imagine.
September 17, 2014 9:59:12 AM

"productized?"
September 19, 2014 3:35:28 PM

I don't need this Xeon processor. I'll do 4-6 extreme workstation builds using Asus X99-E WS boards. But that board supports only one Xeon! :(

The Xeon E5-1620 v3's spec is not that interesting, with 4 cores and 10MB cache. And there's the Core i7-5960X, with 8 cores and 20MB cache.

Those builds will each have a Quadro K6000 or K5200 GPU, 64GB of DDR4 RAM, and a lot of other stuff. But what processor should I go with?
September 22, 2014 11:54:21 PM

Yeah, 18-core CPUs for gaming will probably never take off. The reason is that GPU processing keeps getting better and already handles a lot of the massively parallel work. If you look at the latest OpenGL and Direct3D, most things are shader-based, and shaders are matrix-processing powerhouses.

Lots of tasks aren't easy to split up. You can divide up the work, but a 3 GHz CPU core can handle quite a few small tasks by itself. In fact, you can cause more processing lag by breaking work into chunks that are too small, and multithreading makes it easy to introduce bugs.

However, if the vector units in a CPU were made good enough, an 18-core CPU might remove the need for a GPU for lower-end graphics (Direct3D has something called WARP, which is a software renderer). Then again, GPUs will probably evolve as well.
September 23, 2014 7:33:48 AM

None of the builds is for gaming. There will be ten high-end workstations in total: five 5960X and five Xeon machines.

All machines will be used for design, modeling, and simulation tasks; multiple computing apps; and two Linux VMs on a Windows host, all running on triple displays.

I'm either unsure about some parts or still waiting for them to arrive to finish these extreme workstation builds. Example:

Intel Core i7-5960X - $899.99 (Micro Center)
Cooler Master Glacer 240L
ASRock X99 WS EATX (awaiting for Asus X99-E WS)

G.Skill Ripjaws 64GB DDR4-2400 (2666/2800? or Corsair?)
Samsung 850 Pro Series 1TB SSD
WD Black 4x 4TB (considering WD Red Pro 8x 4TB NAS)

2x EVGA GTX Titan Black 6GB Superclocked (or 2x 980?)

Cooler Master HAF X Blue (or Corsair 780T or Phanteks Enthoo Primo?)
Cooler Master 1200W ATX12V (or Corsair AX1200i?)

3 x Asus PB278Q monitors (each build)

Misc. parts under consideration:

Pioneer BDR-2209 Blu-Ray/DVD/CD Writer
D-Link DWA-182 802.11a/b/g/n/ac
Logitech MK710 w/Optical Mouse
.....

I'll wait a while longer until I get a clear idea on Xeon support on the Asus X99-E WS, parts, quality, and prices.

Anyway, what do you think?