aakks

Distinguished
May 19, 2008
13
0
18,510
Basically I need to build a Windows machine with the most/fastest processors possible for a specific engineering simulation application written to take advantage of multiple processors. The app needs as much processing power as I can possibly throw at it. To give you an idea, our sims take weeks on a quad core (with processor performance being the bottleneck).

What should I be looking at? How far can I push this?
 


This solution is much more practical/affordable.



How many cores can your program take advantage of?
 

Devastator_uk

Distinguished
Jan 11, 2009
649
0
19,010
Well, you really want a system with HyperTransport (HT) or QPI, because the old shared "FSB" design used by the Pentium/Core 2 based Xeons is pretty crap. So go for either a Nehalem-based Xeon system or an Opteron system.

From what I can see (only checked one site), the Socket 1366 (Xeon) boards are only readily available in dual-socket, but they take quad-core CPUs (up to 8 cores total @ 3.33GHz).
Whereas Socket 1207 (Opteron) can easily be found in quad-socket, and you can get 6-core CPUs (up to 24 cores total @ 2.6GHz).

You can get better (like 8-socket Opteron boards), I'm just not sure how easy they are to find.
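To put the two options side by side, here is a rough sketch that multiplies out the socket, core, and clock figures quoted above. The "core-GHz" total is only a crude proxy I am assuming for illustration: it ignores per-clock performance, memory bandwidth, and how well the app actually scales, so treat it as a starting point rather than a benchmark.

```python
# Crude comparison of the two configurations quoted above.
# "core-GHz" is an assumed rough proxy; it ignores IPC, memory
# bandwidth, and any scaling losses in the real application.

def aggregate_core_ghz(sockets, cores_per_socket, clock_ghz):
    """Return (total cores, total core-GHz) for one machine."""
    total_cores = sockets * cores_per_socket
    return total_cores, total_cores * clock_ghz

configs = {
    "2 x quad-core Xeon @ 3.33 GHz": (2, 4, 3.33),
    "4 x six-core Opteron @ 2.6 GHz": (4, 6, 2.6),
}

for name, (sockets, cores, ghz) in configs.items():
    total_cores, core_ghz = aggregate_core_ghz(sockets, cores, ghz)
    print(f"{name}: {total_cores} cores, {core_ghz:.1f} core-GHz")
```

On paper the quad-socket Opteron box has roughly 2.3 times the raw core-GHz, but how much of that shows up in run time depends entirely on the application.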
 

werxen

Distinguished
Sep 26, 2008
1,331
0
19,310
if you can wait - the new Gulftown is 6 cores with hyperthreading, so 6 physical cores x 2 threads each = 12 threads total.

other than that i would go with the cheaper AMD opteron solutions for sure - many supercomputers use opterons as a cheap, viable option for engineering problems.
 


The AMD systems do seem to be significantly cheaper. Depending on how your program runs, the performance sacrifice might not even be that big.
 

Devastator_uk

Distinguished
Jan 11, 2009
649
0
19,010
Based on what TC said it seems Intel can also give you 24 cores (although that Dell one only allows 2*6-cores or 4*4 cores so seems the maximum on that deal is 16 cores).
 

werxen

Distinguished
Sep 26, 2008
1,331
0
19,310
also consider looking at a GPU based super computer - depending on what kind of problems you are trying to solve via computers.

www.google.com for more info on gpu vs. cpu supercomputing.
 


You can do 4x6, you just have to go to the "additional processor" option and double it (but unlike Sham-WOW doubling the offer causes an increase in the price).
 

aakks

Distinguished
May 19, 2008
13
0
18,510
Thanks guys, this is exactly the stuff I wanted. Hours of research here.

As far as how many cores it could use, I assume it's more than I can provide. The fastest we run on now is a Q6600 (which isn't even close to enough horsepower). It needs to run hundreds of thousands of (beefy) floating point calcs that can be done simultaneously, over and over and over (with brief periods of single threaded syncing).
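A workload like this (large parallel phases with brief single-threaded syncing) is the textbook case for Amdahl's law, which estimates how much extra cores can help. The sketch below is only illustrative: the parallel fractions of 90% and 99% are assumed values, not measurements from this app.

```python
# Amdahl's law: speedup = 1 / (serial + parallel / cores).
# The parallel fractions used below are assumptions for illustration,
# not figures measured on the actual simulation.

def amdahl_speedup(parallel_fraction, cores):
    """Estimated speedup over one core for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for parallel_fraction in (0.90, 0.99):
    for cores in (4, 8, 24):
        speedup = amdahl_speedup(parallel_fraction, cores)
        print(f"{parallel_fraction:.0%} parallel, {cores:2d} cores: "
              f"{speedup:.1f}x over one core")
```

The point is that the "brief" sync periods matter more as the core count grows, so it is worth measuring how much of a run is really spent single threaded before sizing the machine.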

Question on hyperthreading - doesn't that share registers? I know we tried an (old) hyperthreaded intel cpu and got worse performance than just running it single threaded, but I'm not terribly informed on them.
 

aakks

Distinguished
May 19, 2008
13
0
18,510
As for GPU, that is distributed computing, right? If I am correct, it is not appropriate for our app.
 

aakks

Distinguished
May 19, 2008
13
0
18,510
Or are you talking about using graphics processors for computations? I'm pretty unfamiliar with this.
 


No, it doesn't have to be. You can write an app to take advantage of GPUs. In some cases the GPUs can process WAY more data than a CPU can. It all depends on the circumstances and all of that is beyond what I know.
 

werxen

Distinguished
Sep 26, 2008
1,331
0
19,310


not necessarily. visual computations/calculations are vastly faster when performed on a GPU rather than a cpu. i am not sure what kind of engineering calculations you are attempting to perform - can you be a little bit more specific? "floating point calculations" does not say much about what you are specifically doing.
 

aakks

Distinguished
May 19, 2008
13
0
18,510
I cannot get terribly specific without violating non-disclosure agreements, unfortunately. It is stormwater modeling.
 

aakks

Distinguished
May 19, 2008
13
0
18,510
Budget is about $30k.

I looked a little more into the GPU. Interesting stuff. Looks like it would require a rewrite though (if it's even suitable), which is definitely NOT in budget.
 


I would suggest putting together a cluster of inexpensive quad-core desktop CPUs and motherboards in 1U or 2U rack cases if you can get away with it. You can get a bunch more cores, GHz, and RAM for the money by chaining together cheap desktop parts rather than building a single huge multi-socket server. I have done work in this area, and if you can say "yes" to all of the following questions, you may be well-suited to getting a cluster:

1. Does your program have very little data being shared between threads?
2. Does your program run on Linux or one of the BSD UNIX variants?
3. Does your program need less than 4 GB RAM per core?
4. Can your program legally be run on a cluster? (I don't want to suggest violating license terms, dumb as they may be.)

If you answered "yes" here, I would suggest that you make a little trial run by installing a cluster manager on your Q6600 box and then borrowing a few desktop machines from somewhere and installing the cluster client on them and then letting the program have a run on the cluster. I'd track network usage and CPU/RAM utilization to see if clustering will work. If it does, get those desktop boards and rack cases and go. If the program doesn't run well on the cluster for some reason, then you will need to get a big single machine.
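The four questions above amount to an all-or-nothing checklist, which can be sketched as a few lines of Python. The function name and the example answers are hypothetical, made up purely for illustration.

```python
# The four cluster-suitability questions as a simple checklist.
# cluster_check and the example answers are hypothetical illustrations.

CLUSTER_QUESTIONS = [
    "very little data shared between threads",
    "runs on Linux or a BSD UNIX variant",
    "needs less than 4 GB RAM per core",
    "can legally be run on a cluster",
]

def cluster_check(answers):
    """Return the questions answered 'no' (an empty list means all 'yes')."""
    if len(answers) != len(CLUSTER_QUESTIONS):
        raise ValueError("expected one yes/no answer per question")
    return [q for q, ok in zip(CLUSTER_QUESTIONS, answers) if not ok]

example_answers = [True, True, False, True]  # made-up answers
failed = cluster_check(example_answers)
if failed:
    print("A cluster is probably a poor fit:", "; ".join(failed))
else:
    print("Worth a trial run on a few borrowed desktops")
```

A single "no" is enough to point back toward one big machine, which is why the checklist returns the failing items rather than a score.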
 

aakks

Distinguished
May 19, 2008
13
0
18,510
1. No - very tightly intertwined
2. No - Windows app
3. Yes
4. Yes - no legal issues, we are in partnership with the developers for this project

Basically distributed computing is out. We need something like the 24 core Dell linked above.
 
I think someone came through here in the last 3-4 months with the same interests.

If that's your competition we'll take a little pay-ola to hook you up with a better deal :lol:

A year or so ago you could snag an 8P Opty kit with 4P mainboard, HT risers and 4P daughter board for around $1,300. When Shanghai came out the price nearly doubled and they started getting scarce - with Istanbul out I'd say: No chance.

If you ran with a 'Nix I'd guess Power6 would be your best move - too bad a cluster like MU-E suggested is out. I reckon that kills a SuperMicro SuperBlade.

Does anybody know anything about Windows HPC Server? Depending upon your database and software maybe it could queue up 48 Istanbul cores across 2 modules in that SuperBlade for $30k ...
 


Supposedly it is the version of Windows Server that can do clustering, IIRC. You would need to have a VERY stout connection between the two modules to get much of any performance boost out of the second module, since the processes are so intertwined. I don't know if the FC connection between the modules would be sufficient.

I would recommend the OP get a system with 8 six-core Opterons since his app won't cluster well. The six-core "Dunnington" Xeons are going to take a huge hit because the application has a lot of inter-thread communication and the Xeons are using the old shared FSB setup.
 
Although, if the OP can wait, Intel will be coming out with the Nehalem 4p and 8p setups, which can have up to 8 cores per CPU and 8 CPUs and 2 threads per core for up to 128 threads in a single server.
 


Here's what's coming up by early 2010:

1. The "Beckton" Xeons you referred to, which are 8-core, 16-thread CPUs used in machines with 4-32 sockets. These are 45 nm chips with 8 cores and 24 MB L3 cache all on one huge die, so they will be very expensive. I'd say at least $3000 each since the "only" six-core 45 nm "Dunnington" Xeons with 16 MB L3 cache start at $2000, and the Becktons are going to be even bigger with lower yields.

2. AMD's 12-core, 12-thread "Magny-Cours" Socket G34 Opteron CPUs, which are an MCM composed of two six-core dies, each with 6 MB L3 cache. These are available in two and four-CPU setups, so you could get 48 cores/threads in a four-socket machine. These will probably be a bit less expensive than the Beckton Xeons because of the smaller dies used to make the CPUs.

3. Intel upgrading its current stable of 45 nm quad-core Nehalem-based Xeon DPs to six-core 32 nm units. These will give you 12 cores/24 threads in a two-socket box, which is nice but probably a bit small for the OP's needs.

4. AMD refreshing its Socket F platform and then migrating the single- and two-socket Opterons to the new Socket C32. C32 is basically Socket F with DDR3 and can support one or two quad- or six-core CPUs. This is also going to be a bit small for the OP.

In short, it depends on exactly how much power the OP wants. A quad 12-core Magny-Cours Opteron setup will most likely be considerably less expensive than a quad Beckton Xeon MP and have a 50% core count advantage, although the Opterons will top out at only four CPUs and 48 cores. If he needs as much power as he can get, he'd have to go with the Xeon MPs since they can be run in an up to 32-socket configuration.
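For reference, the core and thread totals quoted in this thread can be multiplied out with a quick sketch. The socket, core, and thread counts are the ones stated above; the helper function is just an illustrative name.

```python
# Multiply out the socket/core/thread figures quoted in this thread.
# totals() is an illustrative helper, not part of any real tool.

def totals(sockets, cores_per_cpu, threads_per_core):
    """Return (total cores, total hardware threads) for one server."""
    cores = sockets * cores_per_cpu
    return cores, cores * threads_per_core

magny_cours_4p = totals(4, 12, 1)   # 48 cores, 48 threads
beckton_4p = totals(4, 8, 2)        # 32 cores, 64 threads
beckton_8p = totals(8, 8, 2)        # 64 cores, 128 threads
beckton_32p = totals(32, 8, 2)      # 256 cores, 512 threads

# Core count advantage of quad Magny-Cours over quad Beckton.
advantage = (magny_cours_4p[0] - beckton_4p[0]) / beckton_4p[0]
print(f"Quad Magny-Cours vs quad Beckton: {advantage:.0%} more cores")
print(f"8-socket Beckton: {beckton_8p[1]} threads")
print(f"32-socket Beckton: {beckton_32p[0]} cores")
```

Note that the Beckton thread counts include Hyper-Threading, so the core totals are the fairer comparison for a floating-point-heavy app.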