Dual processor scaling

October 22, 2010 4:42:32 PM

I'm looking at the possibility of a two-processor workstation post-Sandy Bridge. I've seen some benchmarks showing excellent scaling on server-like jobs (not a whole lot of cross-talk between processes) with two-processor systems in general, but I'm curious about their performance characteristics on more fine-grained jobs.

Its primary purpose will be multithreaded, non-distributed, real-time physical simulations where all resources are devoted to a common simulation.

Does anyone have any thoughts or benchmarks about how a two processor system might behave in that kind of situation? Thanks!

October 23, 2010 1:18:18 PM

Well I will give this a try.
I am familiar with older arch, but a lot of the same theories apply.
The situation is running 2 physical CPUs with 4 cores per CPU (for now let's forget about HyperThreading).
So now you have 8 cores/8 threads.
On a duallie setup your FSB will be shared between both CPUs.
The purpose you need it for (simulation) is something that the dual workstations were designed for.
I would think that the only issue would be if more than one of the 8 threads were to share the same resources (I/O, memory addresses, etc.); then that would create a slowdown.
I have heard of the Opterons using a NUMA arch, which has separate memory banks per processor; that would reduce the conflicts among threads since they would have dedicated memory.
Like I said, I am not familiar with newer arch, but I do know that newer Xeons are using HyperThreading, which would be 8 cores/16 threads with 8 of the threads being logical. Works great if all the threads are not using the same resources.
They did reduce cache thrashing on newer Xeons by using cache queuing, so HyperThreading is more efficient.
This link might help:
http://en.wikipedia.org/wiki/Symmetric_multiprocessing
I am still learning about much of this, so these are just amateur observations.
I would definitely say that a dual processor workstation is designed for tasks like the simulation you need it for.
I would look into NUMA arch, though I am not sure of pricing and availability.
Historically AMD has had an edge over Intel in benchmarks (Opterons usually crushed Xeons), though in the past there was better support for Intel solutions (better chipsets and mobos).
This site is dedicated to duallie rigs:
http://www.2cpu.com/


Best solution

October 23, 2010 3:54:29 PM

The scaling largely depends on your application, OS, and how you have your platform set up. Some applications scale poorly simply because they are not very well-threaded, such as many games. Some scale poorly because the OS can't handle the thread and memory allocation very well, such as running Windows XP on a NUMA system. Others are waiting on external I/O, such as some database setups and video encoders if your disks aren't fast enough.
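One rough way to put numbers on the "not very well-threaded" case is Amdahl's law, which is not something the posts above cite but is the standard back-of-the-envelope bound. The sketch below is only an illustration: the function name and the parallel fractions are made up for the example, and it ignores NUMA and I/O effects entirely.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work runs in parallel.

    parallel_fraction is the share of single-threaded run time that can be
    spread across cores (an assumed input, not a measured one).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only; measure your own application.
    for p in (0.50, 0.90, 0.99):
        print(f"parallel fraction {p:.2f}: "
              f"8 cores -> {amdahl_speedup(p, 8):.2f}x, "
              f"16 cores -> {amdahl_speedup(p, 16):.2f}x")
```

With half the work serial, 8 cores give less than a 2x gain, while a 90% parallel job gets roughly 4.7x, so the second socket only pays off when the simulation step itself is heavily parallel.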

king smp said:
Well I will give this a try.
I am familiar with older arch, but a lot of the same theories apply.
The situation is running 2 physical CPUs with 4 cores per CPU (for now let's forget about HyperThreading).
So now you have 8 cores/8 threads.
On a duallie setup your FSB will be shared between both CPUs.

Not any more. The last FSB dual CPU setups were Xeon 5200/5400 units, which are two generations behind current. The Xeon 5500s and 5600s use a NUMA setup with independent memory controllers per CPU and a high-speed point-to-point link between the two CPUs, just like how Opterons communicate.

Quote:
The purpose you need it for (simulation) is something that the dual workstations were designed for.
I would think that the only issue would be if more than one of the 8 threads were to share the same resources (I/O, memory addresses, etc.); then that would create a slowdown.
I have heard of the Opterons using a NUMA arch, which has separate memory banks per processor; that would reduce the conflicts among threads since they would have dedicated memory.
Like I said, I am not familiar with newer arch, but I do know that newer Xeons are using HyperThreading, which would be 8 cores/16 threads with 8 of the threads being logical. Works great if all the threads are not using the same resources.
They did reduce cache thrashing on newer Xeons by using cache queuing, so HyperThreading is more efficient.
This link might help:
http://en.wikipedia.org/wiki/Symmetric_multiprocessing
I am still learning about much of this, so these are just amateur observations.
I would definitely say that a dual processor workstation is designed for tasks like the simulation you need it for.
I would look into NUMA arch, though I am not sure of pricing and availability.
Historically AMD has had an edge over Intel in benchmarks (Opterons usually crushed Xeons), though in the past there was better support for Intel solutions (better chipsets and mobos).
This site is dedicated to duallie rigs:
http://www.2cpu.com/


The bus contention issues are largely gone as both AMD and Intel now use a NUMA setup and there isn't enough coherency traffic to swamp the QPI/HT links. Thus the Xeons (5500s and 5600s) are very competitive with the Opterons today. The only real contention issues I've seen are with Xeon 5500s and 5600s using HyperThreading, where two threads on the same core are fighting over the same core resources at the same time (FPUs, ALUs, SIMD units, cache access, etc.) and thus you get little gain or even some slowdown compared to keeping HyperThreading off. AMD doesn't use SMT so all of its threads get a full core's worth of resources. However, both newer Intel and AMD units can have slowdowns if the OS doesn't schedule for NUMA very well. You'd then have to deal with added latency of going out to the other CPU, pulling data through its memory controller, and then shipping it back to the CPU that has the running thread instead of just going to the on-die memory controller.
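To see why poor NUMA scheduling hurts, here is a toy model of average memory latency as a function of how many accesses have to cross to the other socket. The latency figures in the example are placeholders chosen for illustration, not measurements of any Xeon or Opteron, and the function name is hypothetical.

```python
def effective_latency_ns(local_ns: float, remote_ns: float,
                         remote_fraction: float) -> float:
    """Average memory latency when some accesses go to the other socket.

    remote_fraction is the share of accesses served by the other CPU's
    memory controller (0.0 = perfectly NUMA-local, 1.0 = all remote).
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    # ASSUMED numbers for illustration only: 60 ns local, 100 ns remote.
    LOCAL_NS, REMOTE_NS = 60.0, 100.0
    for frac in (0.0, 0.25, 0.5, 1.0):
        avg = effective_latency_ns(LOCAL_NS, REMOTE_NS, frac)
        print(f"{frac:.0%} remote accesses -> {avg:.0f} ns average")
```

On a two-socket box with memory spread evenly and no NUMA awareness, something like half the accesses could end up remote, which is the case the 0.5 line approximates.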

Both AMD's and Intel's chipsets are pretty comparable today and both have dedicated server platforms made in-house. There are more dual Xeon motherboards out there than dual Opteron motherboards, but there are enough of both such that you can usually make a good server or workstation on either platform. The one real advantage that the Xeons have is that the top models have much better per-thread performance than the Opterons, mostly because of a much higher top clock speed when Turbo Boost is active. AMD's fastest dual-CPU processor runs at 2.8 GHz (Opteron 4184) while the top Xeon (X5680) can hit 3.60 GHz on one core. The Xeon cores are also a little faster clock-for-clock than the Opteron cores. However, AMD offers a lot more cores and generally more performance for the dollar than the Xeons, so heavily multithreaded applications tend to run somewhat better on Opterons, and the Opterons are notably less expensive as well.

In my opinion, you really can't go wrong with either Opterons or Xeons, as long as you stay away from the horribly crippled Xeon 550x series (E5502, E5503, E5504, E5506, L5506, E5507). They have a bunch of their L3 cache disabled, severely restricted memory and bus speeds, abnormally high idle power requirements, no HyperThreading, and no Turbo Boost. I'd probably lean toward the Xeons if you have a decent number of applications that do not scale very well, as the Xeons have higher clock speeds and single-threaded performance, plus there are no dual-socket workstation motherboards out there in the U.S. for the higher-clocked, 4- and 6-core Opteron 4100s. If your applications do scale well to a lot of cores, you'd do best to get a dual Socket G34 motherboard, as the 8- and 12-core Opteron 6100s outperform similarly-priced Xeons by a pretty notable margin in heavily multithreaded applications.
October 23, 2010 5:02:15 PM

Probably hard to say without a close look at the software environment.

I suspect that without optimization this could be a recipe for NUMA Hell. If I understand what you are describing, piling on the threads is less of an issue for the CPUs than the page faults are -- though either you or I might be confused here.

You may well end up with low CPU utilization as needed data gets flushed from one DIMM bank to the other, ad nauseam.
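One common kind of optimization here is to keep a worker and its data on the same NUMA node. The sketch below assumes Linux: it reads the node's CPU list from sysfs and restricts the calling process to those CPUs with `os.sched_setaffinity`. The helper names `parse_cpulist` and `pin_to_node` are made up for this example, and pinning CPUs does not by itself bind memory; it relies on the usual Linux default of allocating pages on the node where they are first touched.

```python
import os


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node: int) -> set:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only).

    Reads /sys/devices/system/node/node<N>/cpulist, which exists only on
    Linux systems that expose NUMA topology through sysfs.
    """
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as handle:
        cpus = parse_cpulist(handle.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the caller
    return cpus


if __name__ == "__main__":
    # Example: keep this worker on socket 0, then allocate and touch its
    # working set so the pages land in that socket's memory.
    print("pinned to CPUs:", sorted(pin_to_node(0)))
```

Running one such worker per node, each owning its own slice of the simulation state, is one way to limit the cross-socket traffic described above; whether it helps depends on how much the slices need to talk to each other.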

Something like a single G34 socket (or Intel equivalent) with 8/12-core processor and single DIMM bank may well be more efficient overall -- with an upgrade path to Interlagos Bulldozer 16 core on G34, I presume.

Not sure about how the Sandy Bridge-EP sR will play out as of yet -- though I have heard you can move your HSF from 1366 to 2011 :lol:  but I think that is the least of your problems.
November 1, 2010 4:08:45 PM

Best answer selected by RDunningo.
November 1, 2010 4:46:15 PM

This topic has been closed by Mousemonkey