Itanium 2, Xeon 5XXX or Opteron 2XX?

October 3, 2006 9:02:24 AM

Hi

I'm using a microcluster of 2 HP workstations (P4 3GHz 2GB RAM) to run scientific software (ANSYS, NASTRAN, Fluent, Starcd, etc...), mainly CFD (Computational Fluid Dynamics) and finite element method.
The problem is that my system is getting too old and too slow. What was once acceptable is nowadays so damn slow (sometimes it takes up to 4 days to produce converged results).
I need to change my machine, so I was thinking of a 2x dual-core system with at least 8GB RAM (and it has to be very upgradable). The new Xeons seem to be competing quite nicely with the Opteron processors, but I'm afraid Intel will change its socket in the near future whilst AMD will keep Socket F for a long time. As for Itanium, I don't really know the CPU, but I think it's not x86 native, is it?

Summing up, I need a very upgradable 2x dual-core workstation, but I don't know which CPU to choose. The budget is not a big issue, so if I can get some opinions, I would be grateful.

Thx and be cool
October 3, 2006 10:14:18 AM

Itanium is about as fast as an 80486SX on x86 software, so forget about it.
A 2P Xeon 5xxx is a better choice than a 2P Opteron. Maybe you should wait for the quad-core Clovertown if you really need the best performance from a 2P server.
October 3, 2006 10:23:12 AM

I've been working with Fluent for a couple of years (I have no experience with the other software packages), so my advice applies only to Fluent simulations:

-The best bang for your buck comes from buying desktop computers and putting them in a cluster.
I am currently working on a cluster of 26 Pentium 4 computers (3.2 GHz, 2 GB RAM). For the amount of money we put into that cluster, we could have bought one system with 4 Opteron processors and 8 GB (1.5 years ago), while the P4 cluster gives us more calculation power. I still have benchmarks lying around for:

-pentium 4 computers
-dual athlon machine
-dual xeon (old ones, with i think 333 single channel DDR and 2Ghz processor)
-dual xeon (3.4 Ghz with double channel DDR)
-AMD 64
-pentium D
-AMD X2

My conclusions were: the Pentiums were faster, I think because of their higher clock speed. The Xeons were crap because they were limited by their memory. The AMD machines all had lower time-steps/hour, and this seemed to be caused by their lower clock speed.
2 Pentium 4 computers in parallel lose around 20% efficiency per computer, so you get 1.6 times the calculation power.
Pentium D machines lose 10% efficiency when using 2 cores in parallel.
4 P4s = 2.5 times more time-steps/hour.
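
For anyone who wants to redo this arithmetic on their own benchmarks, here is a minimal Python sketch. It assumes the simplest possible model, where speedup is just the node count times the per-node efficiency. The function names are made up for illustration and have nothing to do with Fluent itself.

```python
def parallel_speedup(nodes: int, efficiency: float) -> float:
    """Speedup over a single node when every node runs at `efficiency` (0..1)."""
    if nodes < 1 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need nodes >= 1 and 0 < efficiency <= 1")
    return nodes * efficiency


def parallel_efficiency(nodes: int, speedup: float) -> float:
    """Per-node efficiency implied by a measured speedup."""
    if nodes < 1 or speedup <= 0.0:
        raise ValueError("need nodes >= 1 and speedup > 0")
    return speedup / nodes


if __name__ == "__main__":
    # 2 P4s over the network, each losing 20%
    print("2x P4 speedup:", round(parallel_speedup(2, 0.80), 2))
    # 1 Pentium D using both cores, each losing 10%
    print("Pentium D speedup:", round(parallel_speedup(2, 0.90), 2))
    # 4 P4s measured at 2.5x the time-steps/hour of one machine
    print("4x P4 efficiency:", parallel_efficiency(4, 2.5))
```

Read this way, the 4-machine figure works out to 62.5% efficiency per machine, so the loss grows noticeably between 2 and 4 nodes.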


If you contact me tomorrow I can look into this data again. I don't have data on Itaniums, but they are awfully expensive and I don't think I'd advise you to buy them. New Xeons or new Opterons... hmmm, what's your budget?

Some other remarks:
-We usually work on smaller domains and, if possible, in 2D, so 1 processor can calculate the job.
-When going to 4 and 6 processors I had stability issues, leading to Fluent freezing. To work around this I just let it save data and log in once a day to restart if needed. Usually I had a frozen case every 5-6 days, so it isn't that difficult.
-I work in a Windows environment because I'm not adequately skilled in Unix/Linux. Another OS might improve stability.
-Usually time-dependent simulations with around 1 GB RAM used per processor, species+reaction. Our velocity fields are usually solved in a couple of hours, while the species/reaction takes a couple of days, and sometimes even weeks.
-As I said earlier, I have no experience with other CFD packages. Are they as easy to run in parallel as Fluent?

What is your budget? How many licences do you have? Parallel processing only?
I'm not sure, but I don't think a new rig will go THAT much faster. Maybe if you buy another 2 P4s you'd get the same results? Or with your budget you could buy 6 P4s, allowing you to have 2 clusters of 4 CPUs. Your simulation will probably go faster (3 days instead of 4?) AND you can run 2 simulations next to each other. A workstation with 2 dual cores might work faster (2.5 days instead of 4?) and will offer more stability, less power consumption, less space,... but I think in the end you'll get less calculated.

(edit) Another option: Pentium Ds. One of their cores offers the same performance as a P4, and they are cheaper. If you combine their 2 cores you'll lose around 10% of their calculation power (compared to 20% for P4s). They are cheap and you can equip every Pentium D with 4 GB RAM.
Upgrade? Just buy some new computers for your cluster.
October 3, 2006 10:35:13 AM

If I were you I'd go dual 51XX. They're quite powerful for the money, and the true quad-core (45nm) upgrades aren't going to be available until Q3 2007 at the earliest. My suggestion: go with a dual 5140 or higher machine. You won't be disappointed.
October 3, 2006 10:39:52 AM

As I don't know any of the applications I'm going to give a broad answer.

Itanic... oops, Itanium: terribly expensive, and awful with x86 code (it runs in emulation mode). Excels with native code (but support isn't that good).

Xeon series 5000: Pentium 4 derived. Not good. Being EOL'ed.

Xeon series 5100: (finally) a very good processor from Intel. All are dual core. Being a new "architecture" from Intel, it suffers from teething problems (e.g. the RAID bug, where CPU usage goes to 30-50% when using RAID). They excel at int calculations (usually very good at web serving, database and similar workloads). With quad-core I believe we'll start seeing the problems of the FSB again (now they have DIB, dual independent bus, which relieves the FSB problems), but as it is an FSB-dependent arch the memory bandwidth stays the same.

Opteron series 200: has single and dual core. Has been king for 3 years now. Very stable now, and with broad support (platforms/chipsets). They excel at fp calculations (usually very good at scientific calculations, encryption and similar workloads). With "direct connect" (NUMA platform) it has no FSB thrashing problems, and being NUMA, memory scales almost linearly. It's being EOL'ed.

Opteron series 2000: all are dual core. Continues in the footsteps of the stable 200 series, and with the same arch it has the same advantages.

servers with 2 sockets aren't the same as a uniprocessor pc, even though there are some similarities.

If I were you, I would look at Xeon 5100 or Opteron 2000.

Depending on what you really need, if I were you I'd look at a single CPU machine (they are usually cheaper).

As for dual sockets, right now I'm buying Opteron 200s (it takes me 4 months to validate a platform), and from what I've seen of Intel's 5100 and AMD's 2000 we'll continue to buy AMD (the RAID bug is really a very bad thing for me). But our 2S servers are for web serving, and we only serve HTTPS (SSL uses a lot more CPU than generating the page).
October 3, 2006 10:41:35 AM

Quote:
and the true quad-core (45nm)

What's wrong with the untrue quad-cores (65nm)?
October 3, 2006 10:44:10 AM

There's nothing wrong with them, but a true quad-core would get better performance from the lack of FSB load (even though it's not slowing down kentsfield!). That and it lowers power consumption due to the smaller process.
October 3, 2006 10:44:57 AM

Quote:
and the true quad-core (45nm)

What's wrong with the untrue quad-cores (65nm)?

Much more FSB thrashing (all inter-core communication goes through the FSB to the chipset).
October 3, 2006 10:59:35 AM

Quote:
and the true quad-core (45nm)

What's wrong with the untrue quad-cores (65nm)?

Much more FSB thrashing (all inter-core communication goes through the FSB to the chipset).
1. What is FSB thrashing?
2. Have you seen how C2Q scales in performance compared to C2D for SMT software?
3. Do you know how the 2 "glued" C2Ds on the C2Q communicate?
October 3, 2006 11:04:09 AM

To answer what you asked, aladar:

FSB thrashing is the unnecessary communication across the FSB between cores or dies.

I think we have all seen how well it scales, but if the separate Conroes didn't have to communicate via the FSB, performance would increase a little (not as much as from Presler to Conroe).

C2Ds aren't "glued"; the cores sit on the same die, so they communicate via the L2 cache. On the C2Q you have 2 Conroes in the same package but on separate dies. Thus you have to use the FSB if one Conroe wants to communicate with the other. If a core wants to communicate with the core adjacent to it (on the same die), it just uses the L2 cache.
October 3, 2006 11:04:14 AM

Something else: when running a cluster in parallel you shouldn't see much difference from using Gb switches and network connections. Our cluster runs on 10/100 Mb lines/switches, and when monitoring traffic I hardly use 10-20% of the max bandwidth.
October 3, 2006 11:10:20 AM

Quote:
1. What is FSB thrashing?

Complete usage of the FSB by cache coherency and inter-core communication (everything except memory<->CPU traffic).
Quote:
2. Have you seen how C2Q scales in performance compared to C2D for SMT software?

I haven't seen C2Q. However, I've been looking at the Xeon 5100, and my code is massively multithreaded (sometimes 1000+ threads). With these workloads the inter-core comm is usually very high (Xeon MP sometimes had lower performance than Xeon DP due to FSB thrashing).
Quote:
3. Do you know how the 2 "glued" C2Ds on the C2Q communicate?

Intel says it is through the FSB. As that is worse than direct communication, I believe Intel (remember Intel DC performance with 2+ sockets, before the Intel 5100 or C2D?).
October 3, 2006 11:18:40 AM

I don't think FSB thrashing is an issue for Fluent (or, I think, for any other CFD software package). If it were, I'd see a lot more performance loss between:
2 Pentium 4s in parallel (communicating over the network)
1 Pentium D using both its cores (using its FSB)

Fluent (and other CFD software should do the same) divides its calculation domain into 2 equal zones. Each processor calculates its dedicated zone independently of the other (and uses its own memory to store it), and the only data that needs to go from one processor to the other is the interface between those 2 zones, which only makes up 1% of the total.
The FSB is limiting, however, because the complete case has to be stored in memory. If bandwidth to the memory is too low (as is the case with older Xeons), your calculation will slow down because your processor can't read/write fast enough. It will not limit inter-processor data traffic.
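
To put a number on the "1% of the total" for the zone split described above, here is a rough Python sketch. It is not how Fluent partitions a case: it assumes a regular nx x ny x nz block of cells cut into equal slabs along x, with one layer of cells exchanged per cut. The function name and the 100x100x100 example grid are my own assumptions for illustration.

```python
def interface_fraction(nx: int, ny: int, nz: int, zones: int = 2) -> float:
    """Fraction of all cells that lie on a cut between neighbouring zones.

    Assumes a structured block split into `zones` slabs along x, with a
    single ny*nz layer of cells exchanged at each cut.
    """
    if min(nx, ny, nz) < 1 or not 1 <= zones <= nx:
        raise ValueError("need positive dimensions and 1 <= zones <= nx")
    total_cells = nx * ny * nz
    interface_cells = (zones - 1) * ny * nz  # one layer per cut
    return interface_cells / total_cells


if __name__ == "__main__":
    # 1 million cells split in two: the exchanged layer is 1% of the case
    print(interface_fraction(100, 100, 100, zones=2))
    # the same case on 4 processors has 3 cuts
    print(interface_fraction(100, 100, 100, zones=4))
```

Under these assumptions the exchanged data grows with the number of cuts while the work per processor shrinks, which is one reason network traffic matters more as you add nodes.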

Not sure if this is clear.


Fluent (not sure about the other packages wareva mentioned) is single threaded, or double threaded if you start it in parallel processing...
I think this is also one of the reasons why pentium 4's performed better than AMD 64's, despite the amd 64 being a far superior processor.
October 3, 2006 11:19:13 AM

Quote:
Something else: when running a cluster in parallel you shouldn't see much difference from using Gb switches and network connections. Our cluster runs on 10/100 Mb lines/switches, and when monitoring traffic I hardly use 10-20% of the max bandwidth.

Clustering was designed for a huge number of nodes (at the uni I went to there was a cluster with 5k nodes), so the comms have to be highly optimized. Of course, more bandwidth and lower latency are always better. But clusters tend to send as little as possible.
October 3, 2006 11:21:35 AM

Quote:
1. What is FSB thrashing?

Complete usage of the FSB by cache coherency and inter-core communication (everything except memory<->CPU traffic).
Quote:
2. Have you seen how C2Q scales in performance compared to C2D for SMT software?

I haven't seen C2Q. However, I've been looking at the Xeon 5100, and my code is massively multithreaded (sometimes 1000+ threads). With these workloads the inter-core comm is usually very high (Xeon MP sometimes had lower performance than Xeon DP due to FSB thrashing).
Quote:
3. Do you know how the 2 "glued" C2Ds on the C2Q communicate?

Intel says it is through the FSB. As that is worse than direct communication, I believe Intel (remember Intel DC performance with 2+ sockets, before the Intel 5100 or C2D?).
If the software that Wareva uses scales well enough on 2P Netburst and compares well to 2P Opteron, do you think that the FSB will bottleneck the performance of C2Q? I don't think so.
I think that the native quad-core will bring performance improvements not because there will be no FSB thrashing, but because there will be "glued" octo-cores.
IMO if he needs performance now, Clovertown is the best 2P choice for his purposes, with a reasonable price/performance ratio. If he needs more performance next year, he can replace the "glued" quad-cores with "glued" octo-cores.
October 3, 2006 11:27:15 AM

I work with Fluent and until now I have limited parallel processing to 6 nodes (could go to 8 nodes), because of stability AND because of the limited number of licences. Wareva will not use 5000 nodes for his CFD calculations.
Fluent charges around 1000 euro per licence each year. For every processor you need 1 licence (or you can use a parallel licence allowing you to use 8 cores on one single case, hence my 8 computer limitation).

I don't think he will need Gb connectivity for his cluster.
October 3, 2006 11:39:24 AM

Quote:

Xeon series 5100: (finally) a very good processor from Intel. All are dual core. Being a new "architecture" from Intel, it suffers from teething problems (e.g. the RAID bug, where CPU usage goes to 30-50% when using RAID). They excel at int calculations (usually very good at web serving, database and similar workloads). With quad-core I believe we'll start seeing the problems of the FSB again (now they have DIB, dual independent bus, which relieves the FSB problems), but as it is an FSB-dependent arch the memory bandwidth stays the same.

Has anybody, other than one article on theinquirer shown that there is such a RAID5 problem?

Quote:

Opteron series 200: has single and dual core. Has been king for 3 years now. Very stable now, and with broad support (platforms/chipsets). They excel at fp calculations (usually very good at scientific calculations, encryption and similar workloads). With "direct connect" (NUMA platform) it has no FSB thrashing problems, and being NUMA, memory scales almost linearly. It's being EOL'ed.

It doesn't have the FSB trashing problem, but it does have the HyperTransport trashing problem which starts showing up at 4S and the problem increases quadratically as you increase sockets.
October 3, 2006 12:28:34 PM

Who actually uses mb raid on a server ? I certainly would not unless it is a very lightly loaded server.
Buy a real raid card.

As far as the HT thrashing anything up to 8 is fine, after 8 it is a problem.

And this doesnt apply to a cluster anyway which is what they have been talking about.

The people who only suggest intel for a server tend to be fanboys.

Anyway if you want 1 single system or if you want a cluster it doesnt really matter that much.

An opteron or woodcrest system will do nicely either way.
However, if you want 2 or more sockets you should definitely go with the AMD Opterons.

If you do go the xeon route be prepared to spend quite a bit more and do make sure you get woodcrest and not a netburst pos.
October 3, 2006 12:59:40 PM

Quote:

As far as the HT thrashing anything up to 8 is fine, after 8 it is a problem.

You can't go past 8 sockets gluelessly period. And it's not fine at 4S, since big cache and better logic already enables Xeon MP to outscale Opteron in important server tasks.

Quote:

An opteron or woodcrest system will do nicely either way.
However, if you want 2 or more sockets you should definitely go with the AMD Opterons.

If you do go the xeon route be prepared to spend quite a bit more and do make sure you get woodcrest and not a netburst pos.

Tulsa-based Xeon MP systems are the fastest x86 servers for commercial applications. And unlike Opteron, they scale to 32S.
October 3, 2006 1:32:42 PM

1. My budget is around 7500€ (9000$ US)

2. Clustering is not really an option (we already have a 50 comp cluster, room is an issue here and so is licencing), although i know it's the ideal solution for starcd and fluent.

3. This has to be a single machine, Windows based. It's for those applications that take too long on a personal computer but aren't valid candidates (yet) for the cluster (booking the cluster is a real problem).

4. From what I read, Xeon 51XX or dual-core Opterons are the answer. The software needs are, in order of importance: CPU/RAM bandwidth, CPU floating-point capability, amount/speed of RAM.

5. I need this machine to be up and running by late october, and i need to be able to upgrade it (CPUs and RAM) for a period of 3/4 years.

Thank you all for replying
October 3, 2006 1:40:53 PM

Quote:

You can't go past 8 sockets gluelessly period. And it's not fine at 4S, since big cache and better logic already enables Xeon MP to outscale Opteron in important server tasks.


This sounds like typical fanboyism...

Quote:


Tulsa-based Xeon MP systems are the fastest x86 servers for commercial applications. And unlike Opteron, they scale to 32S.


On 4S, Xeon can get close to Opteron thanks to its large L3 cache (up to 16MB) and QIB (quad independent bus), but those systems are really much more expensive.

for 4S or 8S, only opteron is good.

I won't even mention temps/power usage.

1S or 2S depends on your typical workload (int is better intel, fp is better amd).

4S+ amd, always!
October 3, 2006 1:47:05 PM

Quote:
1. My budget is around 7500€ (9000$ US)

2. Clustering is not really an option (we already have a 50 comp cluster, room is an issue here and so is licencing), although i know it's the ideal solution for starcd and fluent.

3. This has to be a single machine, Windows based. It's for those applications that take too long on a personal computer but aren't valid candidates (yet) for the cluster (booking the cluster is a real problem).

4. From what I read, Xeon 51XX or dual-core Opterons are the answer. The software needs are, in order of importance: CPU/RAM bandwidth, CPU floating-point capability, amount/speed of RAM.

5. I need this machine to be up and running by late october, and i need to be able to upgrade it (CPUs and RAM) for a period of 3/4 years.

Thank you all for replying


(contains references to brands/products)

What you ask for is possible. And for that price I'd suggest you look at Sun's X2200 M2 or X4200 M2, depending on what exactly you want.

PS: I have no interest in Sun, I'm just a happy customer. This is just my personal opinion.
October 3, 2006 1:52:50 PM

Quote:
Who actually uses mb raid on a server ? I certainly would not unless it is a very lightly loaded server.
Buy a real raid card.

I use Linux and Linux software RAID, especially after some problems with a RAID controller (a complete hardware solution). Long story short: even with an identical controller (different firmware) I wasn't able to recover the data (we lost only about 1 hour thanks to backups; this was an online shop).

Since switching to Linux software RAID I haven't had any problems (with recovery, that is). The backup controller is any PC/server that supports the hard drive.

PS: I always use RAID 1, even with the controller (data security is really important).
October 3, 2006 2:22:57 PM

I was reading this thread and I was just curious: how do you utilize or access the processing power of different machines? I know they use this for several industrial programs instead of using supercomputers, but I never heard of anyone doing it on a low-level basis. How is this done? I assume it can't be done with applications, or can it?
October 3, 2006 3:28:14 PM

As you are dealing with Nastran and Ansys, both Xeon 5100 (and up) and Opteron 2000 will be great choices.
The Opteron 200 series would also be a great choice if you can find a nice deal on them for their older socket...
You should consider the total system cost when making your plans.

Xeon will require more expensive memory but has more horsepower in general. It may also have stability issues, which an Ansys analysis with thousands of nodes will not tolerate...
Opterons work with cheaper memory and the system should be more stable, especially on the 200 series... Performance will be lower... but check whether you can live with less horsepower by closing the other gaps in your system (Quadro cards, more memory, SCSI HD, etc.).
Maybe you can run your system with a SATA II HD and convert a 6800 Ultra VGA into a Quadro... so you can have a powerful system for reasonable money... depending on your company's demands, of course...
Here we use our PCs for Windows-based Nastran and Catia analysis, and we are starting to use Ansys as well. Our tests with the Opteron 165 helped us a lot in our experiments with moving from Unix servers and Silicon Graphics workstations to Windows-based workstations... so maybe you can do the same... (that's why my signature has so many 6800 Ultras, to save some money on Quadro cards)...
Of course... with a higher budget... you will be able to assemble systems like Apple did with Xeon (which, by the way, is a good reference for your new system if you choose Xeon, as Dell has a similar workstation for Windows)... As a Brazilian company... we had to improvise to get our projects done... but we did, and Opteron helped us a lot with this... so for a limited budget... I would go for a maxed-out system built on a good deal on the Opteron 200 series... (hope to get a Xeon to test soon...)
We are starting to assemble an Opteron 280 workstation with a Tyan MB and 8 GB of memory... so we made our choice... good luck with yours...
October 3, 2006 4:20:42 PM

Quote:
Hi

I'm using a microcluster of 2 HP workstations (P4 3GHz 2GB RAM) to run scientific software (ANSYS, NASTRAN, Fluent, Starcd, etc...), mainly CFD (Computational Fluid Dynamics) and finite element method.
The problem is that my system is getting too old and too slow. What was once acceptable is nowadays so damn slow (sometimes it takes up to 4 days to produce converged results).
I need to change my machine, so I was thinking of a 2x dual-core system with at least 8GB RAM (and it has to be very upgradable). The new Xeons seem to be competing quite nicely with the Opteron processors, but I'm afraid Intel will change its socket in the near future whilst AMD will keep Socket F for a long time. As for Itanium, I don't really know the CPU, but I think it's not x86 native, is it?

Summing up, I need a very upgradable 2x dual-core workstation, but I don't know which CPU to choose. The budget is not a big issue, so if I can get some opinions, I would be grateful.

Thx and be cool


I'm a CFD specialist. My company is busy developing several new biomedical products relying heavily on Virtual Prototyping e.g. mathematical optimisation via FSI.

My current "desktop" machine on which I'm doing the initial model development work is a Tyan K8WE with two 275 Opterons.

Due to the physical characteristics of the product I'm modelling, I need to use the DES hybrid turbulence model, rather than traditional URANS, to get results that validate well against experimental LDA data.

A full DES analysis on a coarse 3.5 million cell problem will take 60 days on this box (4 cores @ 24/7). As a result we are currently designing a new compute cluster. As it happens, CFX (the code I use) scales extraordinarily well on commodity hardware and gige interconnects:
http://www.ansys.com/assets/white-papers/wp-amd-cfx-par...
http://www.ansys.com/assets/tech-briefs/cfx-parallel.pd...

In fact it scales much better than Fluent due to its coupled solver.

Given CFX's outstanding scaling on commodity hardware, our provisional plan for the cluster is as follows:
20 QE6600 Kentsfield boxes with dual gige interconnects.

This will provide 80 cores with near linear scaling at a ridiculously low cost ~ £10k ~ $20k.
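
As a back-of-the-envelope check on what 80 cores would buy for the 60-day job above, here is a small Python sketch. It assumes the new cores are no faster than the current ones (so it ignores any per-core gain from Kentsfield over the Opteron 275), and the 0.9 efficiency is only a placeholder for "near linear". The function name is hypothetical.

```python
def estimated_days(baseline_days: float, baseline_cores: int,
                   target_cores: int, efficiency: float = 1.0) -> float:
    """Scale a measured runtime to a larger core count.

    Assumes identical per-core speed on both systems; `efficiency` (0..1)
    is the parallel efficiency on the target relative to the baseline run.
    """
    if baseline_cores < 1 or target_cores < 1 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need positive core counts and 0 < efficiency <= 1")
    core_days = baseline_days * baseline_cores  # total work in core-days
    return core_days / (target_cores * efficiency)


if __name__ == "__main__":
    # 60 days on 4 cores, moved to 80 cores
    print(round(estimated_days(60, 4, 80, efficiency=1.0), 2))  # perfect scaling
    print(round(estimated_days(60, 4, 80, efficiency=0.9), 2))  # assumed 90%
```

So even with some scaling loss the job would drop from about two months to a few days, before counting any per-core speedup.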

However, given Fluent's less than stellar scaling on commodity hardware you will probably need to go for a more upmarket solution. This will help you judge the performance of Woodcrest based servers vs. Opteron servers:
http://www.sstc.co.jp/biz/projects/HP2C/PDFs/VXTECH_Fus...

Bottom line: Woodcrests whip the Opterons by a country mile. On CFX, the 3 Ghz Woodcrest is almost exactly 100% faster than the Opteron 275.

I would generally recommend you look at the following options (in order of increasing cost):
- Dual-socket Woodcrest servers with Clovertown CPUs (8 cores/box) linked via high speed interconnects, e.g. Myrinet. You can buy and link as many of these boxes as needed.
- A pre-integrated rackmount server cluster like the Fusion box linked to above.

Which one you choose depends largely on your budget and whether you want a turn-key solution.

Above all else the most important thing you need to find out is what type of interconnects Fluent needs to scale well to a large number of cores e.g. 16+. I can recommend two sources here:
Fluent's own scaling reports: http://www.fluent.com/software/fluent/fl5bench/flbench_...
Fluent user forum: http://www.cfd-online.com/Forum/fluent.cgi (Ask them to describe the clusters they work on ...)

Don't bother with the Itanium; Woodcrests are just as fast at a fraction of the cost.

Good luck ...
October 3, 2006 4:29:24 PM

Quote:
There's nothing wrong with them, but a true quad-core would get better performance from the lack of FSB load (even though it's not slowing down kentsfield!). That and it lowers power consumption due to the smaller process.


CFX, which I work on and which is an equivalent product to Fluent, scales linearly with 90+% efficiency on the desktop Kentsfield. Given Woodcrest's DIB it should scale even better.

CFD codes (CFX, Fluent) are coded from the ground up to scale well to thousands of cores.
October 3, 2006 4:41:25 PM

Quote:
1. My budget is around 7500€ (9000$ US)

2. Clustering is not really an option (we already have a 50 comp cluster, room is an issue here and so is licencing), although i know it's the ideal solution for starcd and fluent.

3. This has to be a single machine, Windows based. It's for those application who take two long on a personal computer, but aren't valid candidates (yet) for the cluster (booking the cluster is a real problem).

4. For what i read, Xeon 51XX or Opterons dual core are the answer. The software needs are, in order of importance CPU/RAM bandwidth, CPU floating point calculations capabilities, amount/speed of RAM.

5. I need this machine to be up and running by late october, and i need to be able to upgrade it (CPUs and RAM) for a period of 3/4 years.

Thank you all for replying


Well, with that set of constraints it's a no-brainer: dual-socket Woodcrest mobo with two Clovertown CPUs = 8 cores.

Aladar,
You'd best not pontificate about something you know little about (CFD cluster computing). Intel's Core architecture based CPUs are NOT limited by the FSB as far as commercial CFD codes are concerned. First rule of cluster design: understand your application.
October 3, 2006 6:45:32 PM

Fluent is an industrial application program, drifter, and it is designed to use several CPUs on a single problem.
October 3, 2006 7:32:00 PM

Is Clovertown already available? He wants it this month.
And will the motherboard he buys today support it?

@Wareva

1) 7500 will buy you 6 dual-core machines with 4 GB memory each (=12 calculation nodes).
Or 1 machine with 2 Xeon 5xxx CPUs = 4 calculation nodes (but I'm not sure about that, I'm not familiar with pricing anymore) and around 8 GB memory.
Please correct me if I am mistaken about this. I am just guessing and would like to know if anyone has a better idea of how much the hardware costs. I think a quad-CPU machine will cost too much for your budget.
You have the budget for a small cluster, not for a high-end workstation. A cluster also allows better downscaling or dividing of computing power (between different programs, simulations or users). For example: you can use 6 CPUs on a Fluent calculation, still run a 4-node StarCD calculation, and still be able to check out something else on that last dual-core machine.

2) Not enough room? Go for the dual Xeon machine or dual Opterons.

3) Booking the cluster is a real problem? Not if you own your own cluster. Finding room should not be that hard; take into account that a workstation needs its own room too, as they usually make a lot of noise and you don't want to sit next to one. Besides, a small cluster shouldn't take that much room.

4) Xeon 5xxx or Opterons will give you similar bang for the buck, but they are expensive. You have the budget for a small cluster, not for a high-end workstation.

5) Upgrading a cluster is easy: add more nodes. Whatever you choose, you will not be able to upgrade a workstation for 3-4 years to come. I think I heard people saying the Opterons are better because they will not change socket for the next few years, while Xeons were supposed to change anyway (or am I wrong again? just correct me). But I think buying a new processor for that Xeon/Opteron machine would cost the same as a new cheap desktop computer which, by then, will outperform your outdated rig. Other upgrades? RAM? You can add more RAM and calculate larger cases, but this will not make your simulations go faster. If you upgrade you need more RAM (larger cases) and more calculation power (to calculate those larger cases). Workstations are limited in the upgrades you can do and will be outdated soon.


I am still biased towards clustering, as it is cheaper and will give you more flexibility, upgradeability and processing power than whatever workstation you buy. Your budget is not big enough to buy a high-end workstation that could outperform some cheap computers, and the software you use is perfectly suited for parallel processing on a small cluster. But it's your choice and your money.

On the Xeon vs Opteron problem:
Xeons are faster and better for your calculations.
Opterons will be easily upgradeable to newer CPUs in the future.
But that's my limited knowledge of high-end CPUs.
October 3, 2006 7:37:41 PM

And if I'm in favour of clustering, it's because we bought several workstations which were outdated very soon:

A Compaq Alpha 64 workstation cost us 7500 euro and was outperformed by a Pentium 4 1 year later (costing 1/3 of the Alpha).

A dual Athlon machine costing 2500 was outperformed by a Pentium 4 after 1 year.

A dual Xeon workstation (cost was 3000 euro) was outdated 6 months after purchase.

After that we always bought desktop computers and put them in a cluster. Room and power are easily obtained at a university; money for computers is harder to come by. Even high-end machines are outperformed after a short while. Only buy them if you absolutely need the best and money is not an issue.
Just my 2 cents.
October 3, 2006 7:45:57 PM

Xeon 5160 3Ghz @ $850
Opteron 2220 2.8 Ghz @ $1200

Xeon is ~50-75% faster and 30% cheaper. When you are paying serious licence fees per core you want to run the fastest CPUs possible!

2 Clovertowns for 8 core system will be ~$1200 each = $2400 total for CPUs.

AMD 8 core system will cost ~$2400 per CPU = $9600 total for 4 CPUs.

It's a no-brainer... running hellishly expensive per-core-licensed software on slow-ass cores is a very stupid thing to do. You need to consider the TOTAL cost of the hardware and software in order to minimise your expenditure.
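
The total-cost argument can be put into numbers with a short Python sketch. The CPU prices are the ones quoted in this post; the licence fee of 1000 per core per year is only a placeholder (treated as the same currency for simplicity), and the relative core speed of 1.5 is the low end of the "50-75% faster" estimate. The function name is made up for illustration.

```python
def cost_per_unit_throughput(cpu_price: float, cores: int,
                             licence_per_core: float,
                             relative_core_speed: float) -> float:
    """First-year cost (CPU + per-core licences) per unit of throughput.

    Throughput is cores * relative_core_speed, with the slower CPU's
    core defined as speed 1.0. Scaling losses are ignored.
    """
    if cores < 1 or relative_core_speed <= 0.0:
        raise ValueError("need cores >= 1 and relative_core_speed > 0")
    total_cost = cpu_price + cores * licence_per_core
    throughput = cores * relative_core_speed
    return total_cost / throughput


if __name__ == "__main__":
    LICENCE = 1000.0  # assumed per-core yearly fee, placeholder value
    xeon_5160 = cost_per_unit_throughput(850.0, 2, LICENCE, 1.5)
    opteron_2220 = cost_per_unit_throughput(1200.0, 2, LICENCE, 1.0)
    print("Xeon 5160:   ", xeon_5160)
    print("Opteron 2220:", opteron_2220)
```

With these placeholder numbers the licences cost more than the CPU itself, so the faster core wins even before the lower CPU price is counted.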
October 3, 2006 8:30:30 PM

Quote:
Much more FSB thrashing (all inter-core communication goes through the FSB to the chipset).
Benchmarks from engineering sample Kentsfield chips have shown that even with 4 cores, the 1066MHz bus isn't suffering from much "FSB trashing".
October 3, 2006 8:38:53 PM

Quote:
Much more FSB thrashing (all inter-core communication goes through the FSB to the chipset).
Benchmarks from engineering sample Kentsfield chips have shown that even with 4 cores, the 1066MHz bus isn't suffering from much "FSB trashing".

Yup. Kentsfield benchmarks on 3D studio max running with 4 threads gave identical results for 1066 and 1333 MHZ FSB settings.

Of course, for certain applications FSB performance is important. Commercial CFD codes are not among them.

I don't know how Intel managed to sidestep the crippling effect FSB thrashing had on the netshit arches, but I'm happy to say they did :trophy:
October 3, 2006 8:40:14 PM

Quote:

on 4S Xeon can get to the heels of opteron thanks to large L3 cache (up to 16MB) and QIB (quad independent bus) and are really much more expensive.

for 4S or 8S, only opteron is good.

None of the Xeon MP platforms use more than 2 FSBs per 4 sockets. Yet Tulsa-based Xeon MP outscores Opteron in TPC-C, SAP-SD and SPECjbb, all important commercial benchmarks.
October 3, 2006 9:16:53 PM

LOL. I love the idiot that says the xeon (a retro p4 cpu) beats the opteron. Funny he didn't bother to look on tomshardware for the true answer.
http://www.tomshardware.com/2003/04/22/duel_of_the_tita...

I would suggest looking for the same or similar program benchmarked on cpus to find the real performer. So far, it's the 4s Opteron by a large margin on many applications.
A RISC cpu probably would be better (if you can use a RISC OS), but may not be an option for ya.
October 3, 2006 9:21:31 PM

Since clustering is out and your biggest priority is FPU consider looking into an FPU accelerator on a 2P machine (Opteron 22xx or Xeon 51xx)

Toms has featured several articles recently on FPU acceleration, either as a separate card or as a software solution that utilizes your GPU.

Peakstream GPU supercomputer ~$2000 per node for up to 750GFlops using dual ATI X1950XTX

Clearspeed FPU accelerator ~$7000 per card for +50 GFlops

If it will work with the CFD packages the Peakstream solution will trounce a good sized 4P Woodcrest cluster in a single box for way less $$$.

Definitely would want to check how per-processor licensing would work in this setup though.
October 3, 2006 9:30:46 PM

Quote:
Since clustering is out and your biggest priority is FPU consider looking into an FPU accelerator on a 2P machine (Opteron 22xx or Xeon 51xx)

Toms has featured several articles recently on FPU acceleration, either as a separate card or as a software solution that utilizes your GPU.

Peakstream GPU supercomputer ~$2000 per node for up to 750GFlops using dual ATI X1950XTX

Clearspeed FPU accelerator ~$7000 per card for +50 GFlops

If it will work with the CFD packages the Peakstream solution will trounce a good sized 4P Woodcrest cluster in a single box for way less $$$.

Definitely would want to check how per-processor licensing would work in this setup though.


None of the commercial CFD codes support hardware FPU accelerators AFAIK. CFX certainly doesn't. A real pity ... using ATI 1900s as cheap hardware FPU accelerators would rock. They probably will in future though, e.g. via AMD's Torrenza and Intel's equivalent.

Of course, for REALLY massive performance increases you can implement a CFD code on FPGAs. These offer ~1000X performance increases. The thing is, those of us who work in industry don't have the time to implement a general purpose CFD code in an FPGA :cry:
October 4, 2006 3:49:42 AM

Quote:
LOL. I love the idiot that says the xeon (a retro p4 cpu) beats the opteron. Funny he didn't bother to look on tomshardware for the true answer.
http://www.tomshardware.com/2003/04/22/duel_of_the_tita...

What relevance does a test from 3.5 years ago have?

That is what I was thinking. That test used the old Paxville cores, didn't it? The new Woodcrest Xeons are much more efficient and are generally faster than the older Xeons.

It's like comparing C2D to P4 Prescott.
October 4, 2006 8:03:11 AM

Where do you get your prices, wombat? I googled around and found this:

http://www.macnn.com/articles/06/09/25/quad.core.intel.xeons/

Quote:
Intel Clovertown Xeon DP processors will be $1172, $851, $690 and $455 for the CPU in 1,000-unit lots (respectively).


Quote:
20 QE6600 Kentsfield boxes with dual gige interconnects.

This will provide 80 cores with near linear scaling at a ridiculously low cost ~ £10k ~ $20k.


So you are able to buy a dual Xeon box with some RAM for 1000$???

Clovertowns will take some time before they are available, and they will be more expensive. I don't think upgrading the Xeons he buys today to Clovertowns will fit into his budget.

Licensing should not be a problem, he uses different codes and different programs so he should be able to work them next to each other. Besides, if he buys 8 cores (cluster or not) he can get a cheaper 'parallel' license which allows him to use his cores.
For Fluent, academic pricing is around 1000€/license, and I think it was 1200€ for a parallel license (up to 8 cores). Other packages should have something similar.

Again: 7500€ will buy him either a dual Xeon machine (4 cores), and I doubt a license will fit into that, OR a small cluster of 4 dual-core desktop machines (slow ass cores) + license for 1 year.
Those 4 dual core machines will grind the xeon platform to dust.
7500 should allow you to even get 12 'slow ass cores'.

(Sidenote: I think Pentium Ds, which are pretty inexpensive, should be able to keep up with Core 2 Duos. Not sure, I am just extrapolating this from previous tests between P4s and AMD 64s: a P4 3 GHz was 35% faster than an AMD 64 2.2 GHz on Fluent, even though the AMD 64 was the better and more efficient processor.)



CFD packages are all designed with clustering in mind and so is their licensing policy .
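
The budget argument above can be checked with a few lines of Python. The 7500€ budget and the 1200€ parallel license come from the post; the 1000€ price per dual-core desktop is only an assumption for illustration, so replace it with real quotes.

```python
# How many dual-core desktops fit in the budget once the license is paid?
BUDGET_EUR = 7500            # from the post
PARALLEL_LICENSE_EUR = 1200  # academic parallel license, from the post
NODE_PRICE_EUR = 1000        # ASSUMED price of one dual-core desktop
CORES_PER_NODE = 2


def affordable_cores(budget, license_cost, node_price, cores_per_node):
    """Return (nodes, cores) that can be bought after paying the license."""
    nodes = (budget - license_cost) // node_price
    return nodes, nodes * cores_per_node


nodes, cores = affordable_cores(
    BUDGET_EUR, PARALLEL_LICENSE_EUR, NODE_PRICE_EUR, CORES_PER_NODE
)
print(f"{nodes} nodes, {cores} cores")
```

Under that assumed node price the budget stretches to 6 desktops, i.e. 12 cores. Keep in mind the parallel license mentioned above was quoted as covering up to 8 cores, so check what a larger license costs before buying more nodes than that.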
October 4, 2006 9:00:54 AM

I have no doubt in my mind that clustering is a more cost-efficient tool than a multi-processor single server.

The problem here is licencing (assuming space is available). Not only software licencing but also the OS. I checked the availability of Windows 2003 Cluster Server Edition, and it would cost around 2,500€ (3000$) for 6 machines. I would gladly run a Linux/Unix based system, but this has to be a Windows system.

I'll have another look at clustering though.

Thx
October 4, 2006 10:27:51 AM

Maybe I am confusing things, and I don't know about your other applications. Fluent can run in parallel across your different cores without needing the OS to do it. I think your other applications should do the same.

Our 'cluster' is just a bunch of computers put together in a rack; each computer is equivalent to a desktop computer and can operate on its own. No special OS, just Win XP running on it. So I don't think you actually need Win 2003 Server Edition. Check your other applications.
I think they should support parallel processing; Win 2003 Server Edition will just make administration a little easier and tie your computers together into 1 cluster that acts as 1 machine (like a dual-core, dual-processor machine in ordinary Windows). Maybe someone working with servers can tell you more about the advantages of Win 2003 Server Edition.

Our cluster is a cheap ass configuration but it works pretty well: just computers connected together through a 100Mb network connection, a VPN router (for security) to log in to the cluster, and remote administration software installed on all of the machines. You log in to one of the machines, start Fluent and tell Fluent which other computers in the cluster it can use for the calculation you want to start. Pretty easy, very cheap. You can even do without the VPN router if you don't need people working from home to access it. Just your computers, a router and cables. A KVM switch if you don't want to access them remotely.
Anyway, you can live without the server edition of windows.
October 4, 2006 10:42:55 AM

Go for Opteron 2XXX. It is better than Woodcrest in all CAD applications, mainly for Fluent. As for your doubt about the Intel quad-core upgrade: yes, it will be on the same socket as the dual core, but there will be a 40W thermal jump per socket. So you need to plan your power supply to take the future load.

My suggestion is to go for Opteron, as it is going to keep the same power envelope and socket for quad core. It is also a native quad core, unlike Intel's non-native quad core. AMD is making a lot of changes to the core to improve IPC, so you can expect better performance than Intel's quad core.
October 4, 2006 4:22:12 PM

Quote:
Go for Opteron 2XXX. It is better than woodcrest in all CAD applications.

This is a lie!

Quote:
In your case the doubt of quad core upgrade in Intel, yes it will be on the same socket as dual core, but there will be 40W thermal jump per socket. So you need to plan your power supply to take future load.
This is BS!

Quote:
My suggestion to go for Opteron as it is going to maintain the same momentum for power and socket in quad core.
This is a lie also.

Quote:
Also it is native quadcore compared to non native of intel quad core.
This is a meaningless BS!
Quote:
AMD is doing lot of modifications to the IPC and CORE so you can expect better performance than Intel quad core.
This is a BS and most likely will never happen!

CoolChil, can you provide any link with reliable data, arguments and facts?
P.S. My own list is long. You can see it in the sticky thread about Core2 on this forum.
October 4, 2006 5:48:17 PM

The thing is, when i talked about building a mini-cluster at work, the boss said we could get the room, but all computers should be independent i.e. could be used as a stand-alone computer. Our "big" cluster is just a bunch of boxes with motherboard (with LAN), CPU and RAM, only 1 computer is complete (terminal), but then again, it's a UNIX system.

So I guess my knowledge of Windows-based clusters is pretty shitty; I just assumed they all needed to run 2003 Cluster Server Edition. So what you are saying is that I can run XP on all of them, connect them through LAN and enjoy the benefits of clustering (apart from some messaging configuration)?
October 4, 2006 5:49:13 PM

Wait until Clovertown arrives, that's the best you can do.
October 4, 2006 9:28:09 PM

Quote:
CoolChill wrote:
Go for Opteron 2XXX. It is better than woodcrest in all CAD applications.

This is a lie!

Quote:
In your case the doubt of quad core upgrade in Intel, yes it will be on the same socket as dual core, but there will be 40W thermal jump per socket. So you need to plan your power supply to take future load.
This is BS!

Quote:
My suggestion to go for Opteron as it is going to maintain the same momentum for power and socket in quad core.
This is a lie also.

Quote:
Also it is native quadcore compared to non native of intel quad core.
This is a meaningless BS!
Quote:
AMD is doing lot of modifications to the IPC and CORE so you can expect better performance than Intel quad core.
This is a BS and most likely will never happen!

CoolChil, can you provide any link with realible data, arguments and facts?
P.S. My list is so long. You can see on the sticky thread about Core2 on this forum


WORD
October 5, 2006 7:28:46 AM

First make sure you can run ALL of your software on Windows and run it in parallel, so you can use the different nodes.

Our cluster consists of:
Pentium 4 3.2 Ghz, 2 Gb DDR 400 Mhz, an intel MoBO with on board lan, sound and VGA. 80Gb hard disc (7200 rpm) and a floppy disc reader+CD player. An ordinary desktop computer.

OS: windows XP

Fluent can be run in parallel between different computers if you install one of their communicators (i use RSHD-remote shell daemon because it was free).
Connection: 40 port 10/100 Mb switch and 100Mb cabling

Because 4 of us work on this cluster, we usually work remotely. For connections from outside the university and for security we have a VPN router/firewall in front of the cluster, and we use Remote Administrator to connect to the computers, but I think you can just as well use the Remote Desktop built into Win XP.
All computers are also connected through a KVM switch so we can access them non-remotely.
The KVM switch is not needed if you can live with only working remotely.
The VPN router is not needed either. The only things you need are the computers and a connection to the network.


This setup is cheap and gives us plenty of processing power for our calculations. One drawback: instability, caused by either (or a combination) of using desktop computers and using Windows. High-end systems + Unix/Linux should give you more stability. But for small clusters this should pose no problem:
2 computers in parallel run without any problems at all (=4 cores)
4 computers in parallel tend to freeze in our simulations (=Fluent needs a restart) every now and then, but that's only once every 2 weeks to once a month
6 computers in parallel are fragile: sometimes I can simulate for months in a row, sometimes it crashes every day.

Solution: save your data every day so you can easily restart.
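
The same save-and-restart idea can be sketched in Python for any long iterative job. This is not Fluent's own autosave, just a toy loop to show the pattern; the function names, the file layout and the `crash_at` parameter are all made up for illustration.

```python
# Generic checkpoint/restart pattern for a long iterative run.
# Toy example only: the "solver" is Newton's iteration for sqrt(2).
import os
import pickle


def save_checkpoint(path, state):
    """Write to a temp file and rename it, so a crash in the middle
    of a write never corrupts the last good checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or None if there is no checkpoint yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_iters, save_every, crash_at=None):
    """Iterate x <- (x + 2/x) / 2, resuming from the checkpoint if one
    exists. crash_at (hypothetical) simulates a freeze at that iteration."""
    state = load_checkpoint(path) or {"iter": 0, "x": 1.0}
    while state["iter"] < total_iters:
        if crash_at is not None and state["iter"] == crash_at:
            raise RuntimeError("simulated freeze")
        state["x"] = (state["x"] + 2.0 / state["x"]) / 2.0
        state["iter"] += 1
        if state["iter"] % save_every == 0:
            save_checkpoint(path, state)
    return state
```

Calling `run("ckpt.pkl", 20, 5)` again after a crash picks up from the last saved iteration instead of starting from zero. How often to save is a trade-off between the time lost when a run freezes and the time spent writing data.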