Home-Built Supercomputer for Scientific Computing

bgood44

Distinguished
Mar 12, 2009
6
0
18,510
Hello all!

I am an aerospace engineer looking to build a homemade "supercomputer", for lack of a better name. The computer's sole purpose will be to perform computational fluid dynamics (CFD). As you may know, CFD is very demanding when it comes to CPU performance. I would like the system to run a Linux OS. It will primarily be running FORTRAN 90, C, and C++. It will also be running MATLAB, but in a less demanding manner. Along with Fortran and C++, GAMBIT will be used for CFD grid generation and Fluent will be used for its CFD codes.
Complex CFD programs can produce data files in the terabyte range, although I don't plan on getting that in depth. Speed is the primary concern. What can I build for under $10k USD?

Thanks Guys
 

ShadowFlash

Distinguished
Feb 28, 2009
166
0
18,690
How about a VX50? I know it's eBay, but this guy has some real high-end stuff with warranty. Check out the double-wide chassis S4985 he's got listed if you need a lot of drive space too. About as close as you'll get to a "personal supercomputer".

http://shop.ebay.com/merchant/marsbrode_W0QQ_nkwZQQ_armrsZ1QQ_fromZQQ_mdoZ

If you have your heart set on building one yourself, you can always pick up a barebones VX50 or the "cheaper" FT48. I don't know if this is the type of thing you have in mind, or if you're looking for something a little more traditional.
 

bgood44

Distinguished
Mar 12, 2009
6
0
18,510
To be honest, I'm not exactly sure what I'm looking for. I'm not really familiar with the VX50, although I will look into it. My knowledge of computer hardware is quite limited; I just know how to use them :). I am not opposed to buying a system like this, I just figured that I could build something tailored specifically to my needs, and do it more cost-effectively. Oh, and I don't know if this is important, but the system needs to be able to perform double precision calculations. Again, speed speed speed...
 

ShadowFlash

Distinguished
Feb 28, 2009
166
0
18,690
I actually don't know anything about that type of software :( I've used Tyan stuff for years for CAD. I'm actually in the process of building an FT48 (16-core) right now as a virtualization/CAD rig. I do know that the advertised primary use for this type of box is CFD, and they definitely will run Linux. From what I do understand, this type of work usually requires a good number of cores, a large amount of RAM, and some fast HDDs or some sort of clustering set-up using InfiniBand networking. I did some quick googling, and apparently results can differ drastically between slightly different types of calculation, depending on how you set them up to run. Most of the info I've found is waaayyyy over my head. Sorry I can't be of more help....
 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
I would start with a careful inventory of all production software
you intend to run on a regular basis.

Let me give you a good example from our daily workload:
after we installed COPERNIC desktop search software
on a quad-core Q6600 workstation, we compared it
head-to-head with a dual-core D 945 in a similar workstation.

The Q6600 finishes just about twice as fast as the D 945
updating the very same 5GB database.

That can only be the result of parallel programming in COPERNIC.

If your software is not coded to exploit multiple cores,
you're wasting your time bulking up on multi-socket systems.
Those are intended chiefly for busy servers with lots of
multi-tasking to process.
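
A rough way to put numbers on this point is Amdahl's law, which the post does not name. The sketch below is only an illustration: the function name and the parallel fractions are assumptions, not measurements of COPERNIC or of any CFD code.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup over one core when only `parallel_fraction` of the
    run time can be spread across `cores` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, not measured from any real program.
    for fraction in (0.0, 0.5, 0.9, 1.0):
        for cores in (2, 4, 8):
            print(f"parallel={fraction:.1f} cores={cores}: "
                  f"{amdahl_speedup(fraction, cores):.2f}x")
```

With a parallel fraction of 0.0 the extra cores buy nothing, which is the "wasting your time" case. Note also that the Q6600 and the D 945 differ in clock speed and microarchitecture as well as core count, so a head-to-head timing does not isolate the effect of the extra cores on its own.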

After you've completed your analysis of your production software,
the real choice for you is between a single-socket Core i7 machine,
or a twin-socket Xeon with Core i7 architecture.

A single-socket Core i7 machine has 4 CPU cores with hyperthreading:
each socket is THUS capable of running 8 threads simultaneously.

That should be enough for any software you want to throw at it,
particularly if you are only running one copy of your fluid dynamics model
at any given point in time.

Next decision is the clock speed of the Core i7: go for the fastest
because you've got plenty of budget for it, and it was just reported
on the Internet that all Intel Core i7 CPUs are "unlocked" --
meaning they can all be overclocked.

So, buy a motherboard that makes overclocking easy, e.g. ASUS P6T.

As for RAM, X58 chipsets now support either 12 or 24GB of DDR3 RAM:
look into Corsair's high-end Dominator series, with the memory module cooler.
Kingston make engineering samples of 4GB DIMMs x 6 = 24GB total e.g.:

http://www.hexus.net/content/item.php?item=17187


The Corsair power supplies w/ 850 Watts and up should be enough:
they are very highly regarded for high-end workstations like the
one you want to build:

http://www.newegg.com/Product/Product.aspx?Item=N82E16817139009&Tpk=N82E16817139009

I'll leave your graphics hardware choices up to your own good research.

If you are really serious about writing very large data files,
then be sure to give serious consideration to motherboards
and/or RAID controllers that can pump data quickly to and from
SAS (Serial Attached SCSI) hard drives: these come in both
10,000 and 15,000 rpm e.g. Seagate.

Despite what lots of amateurs claim, in ignorance,
a RAID 0 with multiple (4 x or 8 x) HDDs can really move a lot of raw data
very fast: 250-300MB/second is quite easy, and 500MB/second
is within reach, if you know what to buy and how to configure
your storage subsystem. The key is choosing a RAID controller
that does its work in hardware (including the parity computations,
should you later move to RAID 5 or 6; RAID 0 itself has no parity),
e.g. Areca, 3Ware or Highpoint's "enterprise" class controllers
(there are others).
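
The throughput figures above follow from simple striping arithmetic. Here is a minimal sketch, assuming a hypothetical sustained rate of 75 MB/second per drive (the post gives no per-drive figure) and an optional controller ceiling; both the function name and the numbers are illustrative only.

```python
def raid0_sequential_mb_per_s(drive_count, per_drive_mb_per_s,
                              controller_limit_mb_per_s=None):
    """Best-case sequential throughput of a RAID 0 stripe set.

    Striping spreads each large transfer across all drives, so the ideal
    rate is the sum of the drives' rates, capped by the controller or bus.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    total = drive_count * per_drive_mb_per_s
    if controller_limit_mb_per_s is not None:
        total = min(total, controller_limit_mb_per_s)
    return total


if __name__ == "__main__":
    # 75 MB/s per drive is an assumed figure, not taken from the post.
    print(raid0_sequential_mb_per_s(4, 75))        # four drives, no cap
    print(raid0_sequential_mb_per_s(8, 75, 500))   # eight drives, capped
```

Real arrays fall short of this ideal, and RAID 0 offers no redundancy: losing one drive loses the whole array.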

Out on the "bleeding" edge you will find Fusion-io's ioDrive Duo
(Google that one), or OCZ's new Z-drive (actually just a
Highpoint RocketRAID with 4 x OCZ MLC SSDs inside the
plastic shell).

Lastly, we've had enormous success with ramdisks
by installing RamDisk Plus from SuperSpeed, LLC
in Sudbury, Massachusetts: www.superspeed.com

A little bit of intelligent memory management, tailored
to your application software, can pay enormous dividends:
Core i7 memory bandwidth BEGINS at 25,000 MB/second
(25GB/second), and goes up from there when triple-channel
DDR3 is overclocked.
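
The 25GB/second starting figure is consistent with three 64-bit channels of DDR3-1066, although the post does not say which memory speed it assumes. A small sketch of that arithmetic, with the faster speed grades included as assumptions about what overclocked memory might run at:

```python
def peak_bandwidth_mb_per_s(channels, megatransfers_per_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in MB/second.

    Each channel moves bus_width_bits / 8 bytes per transfer, so the peak
    is channels x transfers per second x bytes per transfer.
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * megatransfers_per_s * bytes_per_transfer


if __name__ == "__main__":
    # Speed grades are assumptions for illustration; the post names none.
    for grade in (1066, 1333, 1600):
        print(f"triple-channel DDR3-{grade}: "
              f"{peak_bandwidth_mb_per_s(3, grade)} MB/s")
```

These are theoretical peaks; sustained bandwidth seen by real code is lower.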


This is THE hottest computer hardware available
at the present time for workstation-class machines.
AMD still can't even come close.


hope this helps


MRFS
 

bgood44

Distinguished
Mar 12, 2009
6
0
18,510
MRFS, WOW! What a reply! It will take a while for all that information to settle into my mind; like I said, I'm not too knowledgeable on computer hardware. I will have to do some more research on your post. If you don't mind me asking, what field are you in?
 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
I first began working with computers during the Summer
before my first year of grad school at U.C. Irvine in 1971.

I've been using and developing advanced computer systems
ever since then, now 38+ years.

I build about one new workstation every year or so,
with the "trickle-downs" going to friends and neighbors.

I have also submitted 3 patent applications to the U.S.
Patent Office, with a fourth ready to submit, in the area
of high-speed solid-state data storage subsystems.

(I've also been banned by certain websites merely
for asking what "VISTA" stands for: nevertheless,
witness how fast MS is now pushing Windows 7.)


MRFS (= Memory Resident File Systems)
 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
http://www.tomshardware.com/news/Tesla-C1060-S1070,5672.html

[begin quote]

Besides performance improvements, the T10P also delivers 64-bit or double-precision capability, which is required for most fluid dynamics and financial stream processing applications. Double precision is substantially more intensive than single precision calculations and will decrease the performance of the card dramatically. Nvidia told us that double-precision calculations will result in a 90% speed penalty and deliver only 100 GFlops per T10P processor.

[end quote]
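
Taking the quoted figures at face value, a 90% penalty that leaves 100 GFlops implies roughly 1,000 GFlops in single precision. A quick sketch of that arithmetic (the function names are made up for illustration, and the inputs are only the two numbers in the quote):

```python
def double_precision_gflops(single_gflops, penalty_percent):
    """Throughput left after a given percentage speed penalty."""
    return single_gflops * (100 - penalty_percent) / 100


def implied_single_precision_gflops(double_gflops, penalty_percent):
    """Single-precision rate implied by a double-precision rate and penalty."""
    if penalty_percent >= 100:
        raise ValueError("penalty_percent must be below 100")
    return double_gflops * 100 / (100 - penalty_percent)


if __name__ == "__main__":
    # 100 GFlops and 90% are the figures from the quoted article.
    print(implied_single_precision_gflops(100, 90))
    print(double_precision_gflops(1000, 90))
```

For CFD codes that require double precision, it is the smaller number that matters when comparing against a CPU.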


Also:

http://www.tomshardware.com/reviews/nvidia-cuda-gpu,1954.html


MRFS


 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
http://www.supermicro.com/products/motherboard/Xeon1333/


I'd call SuperMicro to inquire about dual-socket Core i7
motherboards they are designing for imminent release.


MRFS
 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
http://news.softpedia.com/news/Intel-Nehalem-EP-Gets-Early-Benchmark-98425.shtml

http://www.techradar.com/news/computing-components/processors/world-exclusive-intel-s-dual-socket-nehalem-ep-platform-benchmarked-487131


[begin quote]

Just to confirm that our eyes did not deceive us, we also gave Nehalem EP a quick going over with the Stars Euler3D benchmark. It's a computational fluid dynamics simulation that majors on floating point performance.

Sure enough, Nehalem EP roasts all comers in this benchmark, too – it's twice as quick as a pair of 2.7GHz Shanghai processors (14.34 seconds to complete five instances versus 30.32 seconds).

[end quote]
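
For reference, the "twice as quick" claim can be checked directly from the two timings in the quote. This is only a sketch of the arithmetic; the variable and function names are illustrative.

```python
# Timings quoted for five instances of the Stars Euler3D benchmark.
SHANGHAI_SECONDS = 30.32     # pair of 2.7GHz Shanghai processors
NEHALEM_EP_SECONDS = 14.34   # dual-socket Nehalem EP


def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / new_seconds


if __name__ == "__main__":
    ratio = speedup(SHANGHAI_SECONDS, NEHALEM_EP_SECONDS)
    print(f"Nehalem EP speedup over Shanghai: {ratio:.2f}x")
```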


MRFS
 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
Re: Nehalem EP (multi-socket Core i7)

http://www.tweaktown.com/news/10575/clearer_picture_of_intel_s_nehalem_ep_lineup/index.html

http://www.digitimes.com/news/a20081112PD218.html

[begin quote]

Intel is planning to launch Xeon 5500 (Nehalem-EP) and Xeon 3500 series (Nehalem-WS) server CPUs in the first quarter of 2009, according to sources at server makers.

Intel will launch ten CPUs for the Xeon 5500 series: quad-core W5580 (3.2GHz), X5570 (2.93GHz), X5560 (2.8GHz), X5550 (2.66GHz), E5540 (2.53GHz), E5530 (2.4GHz), E5520 (2.26GHz), E5506 (2.13GHz), E5504 (2GHz) and dual-core E5502 with prices at US$1,600, US$1,386, US$1,172, US$958, US$744, US$530, US$373, US$266, US$224 and US$188 in thousand-unit tray quantities.

For the Xeon 3500 series, Intel will launch three CPUs: quad-core W3570, W3540 and W3520 priced at US$999, US$562 and US$284.

In additional news, Intel is planning to phase out seven notebook CPUs including the Core 2 Extreme X7900 and X7800, and Core 2 Duo T7800 and L7700 in January next year.

[end quote]


http://www.theregister.co.uk/2009/03/09/intel_nehalem_mar31/

Intel 'Nehalem' Xeons poised for March 31 launch

[begin quote]

With the quad-core Nehalem EP processors and their QuickPath Interconnect offering between three and four times the memory bandwidth of the current quad-core Xeon 5400 series processors and their antiquated front side bus architecture, it will be easy to make the technical case for an upgrade to the new chips, code-named "Gainestown" and paired with the "Tylersburg" chipset. The Nehalem EPs will be sold as the Xeon 5500 series. They were outed last week by Apple, which plunked the Nehalem 3500 (for single-socket machines) and 5500 (for dual-socket boxes) into its Mac lineup. It is not clear how much more raw oomph the Nehalems will have, but on some early benchmarks, system performance has increased by nearly 80 per cent.

[end quote]


http://www.hardware.info/en-US/news/ymiclZqXwpeaaZY/15_new_Intel_Xeons_in_March/


Excellent photos and block diagrams here:

http://forum.***.com/mainboards-chipsets/16074-nehalem-ep-discussion-thread.html
("***" should be "*** above (no spaces): don't know why the Forum is hacking this URL ??)

So, Google "With the impending release of the Dual Socket Nehalem EP platform I figured we should start a discussion of what is coming down the pike."


http://www.ewiz.com/detail.php?name=MB-Z8PD12X#


Google "ASUS Z8PE-D12X"


MRFS
 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
http://www.crn.com/white-box/215900275


Intel To Launch Nehalem Server Chips March 30

By Damon Poeter, ChannelWeb
8:08 PM EDT Fri. Mar. 13, 2009

[begin quote]

Intel (NSDQ:INTC) is prepping its channel for the wide release of its Nehalem-class Xeon server microprocessors and platforms to whitebox partners on March 30, Channelweb.com has learned.

Code-named Gainestown, the first quad-core Xeon parts featuring the company's next-generation Nehalem microarchitecture hit the market in early March with the launch of Apple's new Mac Pro workstations.

[end quote]


MRFS
 

bgood44

Distinguished
Mar 12, 2009
6
0
18,510
Two things, one observation and one question.

First and foremost, I definitely need to do a TON of research and get a better understanding of all of this hardware.

And the question: with all of the hardware that you are presenting, would I need to alter my code to run on these platforms, or will it compile and run in the same way that it would on a desktop (hopefully a lot faster)?
 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
"Nehalem EP" is the code phrase for multiple Core i7 CPUs
running in the same motherboard e.g. dual-socket
(e.g. see photos above of the ASUS MP version).

"MP" means multi-processor (i.e. multiple CPUs).

As far as I know, the instruction sets are the same,
with extensions.


You may need to re-compile, but the languages you
mentioned fully support the x86 instruction set by now.

AMD's 64-bit implementation was designed from the outset
natively to support the 32-bit instructions of that x86 set.


What machine(s) are your fluid dynamics codes
running on presently?


I used to do a LOT of FORTRAN conversions (long time ago):
the main problems resulted from system-specific
SUBROUTINE calls, and different compiler defaults
e.g. INTEGER defaulted to *2 on some machines
and *4 on other machines.

In your case, you should look into the
defaults for DOUBLE PRECISION:
is the default REAL*8 or REAL*16?

You may need to hard-code REAL*8 to achieve maximum performance,
because REAL*16 may cause a severe performance penalty.
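
To make the width issue concrete, here is a small sketch of what 2-byte versus 4-byte integers and 4-byte versus 8-byte reals can hold. Python is used only to keep the example self-contained; it stands in for the Fortran declarations and says nothing about what any particular compiler defaults to. The helper names are hypothetical.

```python
import struct


def fits_in_signed_integer(value, byte_width):
    """True if `value` fits a two's-complement integer of `byte_width`
    bytes (byte_width=2 is like INTEGER*2, byte_width=4 like INTEGER*4)."""
    bits = 8 * byte_width
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


def round_trip_as_real4(value):
    """Store `value` as a 4-byte IEEE 754 float (like REAL*4) and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_trip_as_real8(value):
    """Store `value` as an 8-byte IEEE 754 double (like REAL*8) and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


if __name__ == "__main__":
    print(fits_in_signed_integer(40000, 2), fits_in_signed_integer(40000, 4))
    print(repr(round_trip_as_real4(0.1)))  # precision is lost at 4 bytes
    print(repr(round_trip_as_real8(0.1)))  # preserved at 8 bytes
```

A loop counter of 40,000 overflows a 2-byte integer but not a 4-byte one, which is the kind of silent difference the conversions above ran into. As for REAL*16, x86 CPUs generally have no hardware quad-precision arithmetic, so compilers that offer it typically fall back on much slower software routines.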


Do you have any colleagues currently running Core i7 machines?


From what I've gleaned after a lot of reading,
ASUS and Gigabyte are the only games in town
for a serious single-socket Core i7 machine.

"Core i7" is synonymous with "Nehalem".

With your budget, I would strongly suggest that
you start with a modest "test" machine, e.g.:

http://enthusiast.hardocp.com/news.html?news=MzgzNTIsLCxoZW50aHVzaWFzdCwsLDE=

http://www.asus.com/Product.aspx?P_ID=6i86Hj0lGriFHfY9

Then, when your codes are running, you can either demote
this mATX motherboard to the role of a backup server, via a Gigabit switch;
or, cannibalize the parts for your dream workstation e.g. dual-socket.
The single-socket Core i7 CPU will need to stay in the mATX motherboard, however.

SDRAM vendors are competing fiercely right now for this Core i7 market,
and you wouldn't need to buy THE fastest DDR3 for such a "test machine".
Start with 6GB of triple-channel RAM, 1 x VelociRaptor for your OS,
and 1 x 1TB SATA/3G hard drive for data storage e.g. Western Digital.


Dual-socket Core i7 machines are now or will very soon
be available from ASUS, Tyan, Supermicro and Intel.
I don't know enough about these MP motherboards
to make a recommendation to you, however.


If you like, I can put you in touch with the ASUS
Marketing Manager for North America. Let me know.


MRFS
 

bgood44

Distinguished
Mar 12, 2009
6
0
18,510
Well right now, I am a graduate aerospace engineering student at Arizona State. Everything I have done thus far has been on my personal laptop (not a high performance machine at all...) and on standard desktop computers. Although ASU does have a very good HPC facility, I haven't had the privilege to use it yet (I think the Saguaro supercomputer is number 150 or so in the world).

Anyway, I'm not immediately in the market for this hardware as I won't be graduating for another year or two. But I'm starting to look into formulating a business plan for a CFD outsourcing business. For this type of thing, computing power is a major consideration, and I'm trying to figure out if it is feasible for me to acquire the necessary hardware at a decent price. I'll admit, I'm still new to this type of computing and there are still large gaps in my knowledge. What I do know is that I enjoy the coding involved with CFD and I enjoy the results even more! That being said, I think there is a market for CFD simulations.

Sorry, kind of got off topic there, but you should know what my intentions are and a little bit about my background.

answering your questions...

Machine Currently being used:

Gateway tablet PC, Windows XP, 1.5 GB RAM, 1.74GHz Intel mobile CPU.
Like I said, not a great machine, but it does the job for very simple code. I run MATLAB most of the time, and in my experience it is a very slow-running language. It would be completely inadequate for anything really meaningful.

REAL*8 or REAL*16:
No clue! But I'll look into it. I'm not a Fortran guru (yet), but if it's as simple as hard-coding REAL*8, no problem, right?

Colleagues Running i7?:
Right now my colleagues are fellow students, so no, they are running the same type of garbage I am.

No need to put me in contact with the Marketing Manager as I'm not immediately in the market, but thanks for the offer!

Stupid Question for you. So for this type of system, would it consist of multiple i7's or just 1? I know they are multi core processors but am confused if more than one processor is actually used.

I very much appreciate all of the help and insight you are providing. You are helping me learn a lot.

 

MRFS

Distinguished
Dec 13, 2008
1,333
0
19,360
The Intel Core i7 CPU aka "Nehalem" is a quad-core chip
with hyperthreading, that installs into a single LGA1366 socket.

Dual-socket motherboards are being called Nehalem EP.

I think you should start out with a single socket motherboard
like the ones built by ASUS or Gigabyte (see above).


MRFS


 

SuperCruise

Distinguished
Apr 12, 2009
4
0
18,510
Guys, I understand the traditional interest in a hotter CPU, but that misses the point of the last two years of personal supercomputing innovation: all of it revolves around NVIDIA CUDA. Using hundreds or thousands of stream processors within NVIDIA G200 series GPUs costs almost nothing compared to the high price of Intel CPUs. It also makes it clear that old-fashioned silicon like Nehalem CPUs has very little role in computation anymore. They are so incredibly slow that they mostly get in the way, even if all you ask them to do is be the modern equivalent of a keyboard or disk controller.

Intel no doubt will get around to getting competitive sooner or later (only a fool thinks the Empire won't try to strike back) with things like Larrabee, but for now they are very, very far behind NVIDIA.

To get an idea of what can be accomplished for far under $10,000, see http://www.manifold.net/info/pr_gpu_record.shtml where there are 1440 stream processors in use and the Core i7 is used mainly as a disk controller. The eight hyperthreads in the Core i7 are just loading up the GPUs, where the real computation is being done.

As a cost-effective strategy, it doesn't even make sense to buy a Core i7 Extreme. Get a Core i7 920 and take the $750 you save and you have one and a half times the cost of a GTX 295, that is, about 720 stream processors - in other words, over a teraflop of computational power. How many Core i7 Extremes would you have to buy to get a teraflop of computational power?

Buy a Core i7 for around $250, put it into a $250 mobo like the ASRock X58 WS Supercomputer, buy 12 GB of RAM for $130 (for the Corsair 1333MHz DDR3 triple-channel memory), get a couple of 1 TB disks for a total of around $150 and then buy four GTX 295s for $2000. You've only spent $2780 so far. Spend another $300 for a couple of power supplies (the cheapest way for big rigs, like that at http://estoniadonates.wordpress.com/) and in round numbers you've only spent around $3100 and you have 1920 stream processors and about four and a half teraflops of computational power.

If you really have $10,000 to spend and you don't mind writing the code to cluster out, heck, you could configure three such systems with money left over for a cool rackmount case and some bucks to pay your local electrician to add a few extra 20A circuits to your house wiring. That would be 5760 stream processors and over 13 teraflops of computational power.
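
The budget arithmetic in this post can be checked with a short sketch. All prices are the round figures quoted above; the 480 stream processors per card is inferred from the post's own total (1920 across four cards), and the names are illustrative.

```python
# Prices in USD as quoted in the post.
PARTS_USD = {
    "Core i7 CPU": 250,
    "X58 motherboard": 250,
    "12 GB DDR3": 130,
    "two 1 TB disks": 150,
    "four GTX 295 cards": 2000,
    "two power supplies": 300,
}
CARDS_PER_SYSTEM = 4
STREAM_PROCESSORS_PER_CARD = 480  # inferred: 1920 processors / 4 cards


def system_cost(parts):
    """Total cost of one system built from the given parts."""
    return sum(parts.values())


def stream_processors(systems):
    """Stream processors across `systems` identical four-card machines."""
    return systems * CARDS_PER_SYSTEM * STREAM_PROCESSORS_PER_CARD


def systems_within(budget_usd, parts):
    """How many whole systems fit in the budget."""
    return budget_usd // system_cost(parts)


if __name__ == "__main__":
    count = systems_within(10_000, PARTS_USD)
    print(system_cost(PARTS_USD), count, stream_processors(count))
    print("left over:", 10_000 - count * system_cost(PARTS_USD))
```

The exact total is $3080 per system, which the post rounds to $3100. The teraflop figures are not reproduced here because they depend on a per-card rating the post does not state, and they are theoretical single-precision peaks in any case.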

If you are serious about personal supercomputing, go CUDA, my son. :)
 

RazvanP

Distinguished
Apr 14, 2009
1
0
18,510
SuperCruise, I am also a CUDA enthusiast (I bought an 8-series video card just to see how CUDA works in practice), but keep in mind that what you are talking about is theoretical computational power. Not only does his software need to be ported to CUDA, but the algorithms also have to be redesigned in order to use the large number of cores effectively. You also need to understand the CUDA constraints on synchronization, memory access, etc. The incentive for this sustained effort is the tremendous speed improvement on offer.
bgood44, what you have right now is a very mobile computer. Its performance is measured in pounds and in battery life when idling; it was clearly not designed for extensive computational work. If you purchase a decently priced i7 processor, with 4-8 GB of RAM and a regular 1 TB hard drive, you will be on a budget of less than $800 and you will observe a significant improvement in performance, I would say x5 in computations and x2 in data storage. Later on, when you have a clear idea of which algorithms you have to implement, how many independent sessions you have to support, and how long a session is supposed to last, you can decide between a large computer with 2 or even 4 processors, a CUDA approach based on NVIDIA hardware, a cluster of inexpensive one-processor nodes, or a mixture of these three.
 

stefanbanev

Distinguished
Jul 7, 2009
5
0
18,510
Well, I have tried using CUDA (9800GTX) for volumetric ray tracing and could barely match a single-core i7 counterpart (a single Nehalem core at 3.0GHz). The software ray tracer I used is definitely a top performer, and the CUDA version demonstrated around the same performance as what is probably today's best GPU volumetric ray tracer, ImageVis3D 1.1.1 (I'm not sure, but it is unlikely they used CUDA).

Anyway, from my experience with CUDA it is clear that the GPU's SIMD architecture makes it effectively usable only for a specific class of SIMD-friendly algorithms: texture mapping, back-projection, many linear algebra tasks, etc. Once threads run asynchronously, speed goes down dramatically; for example, a volumetric ray tracer running on a dual E5540 outperforms ImageVis3D 1.1.1 on a 9800GTX by a factor of 4 to 6 for the majority of rendering scenarios relevant to medical applications (CT data).

The hysterical marketing brainwash about the GPU as a general computing device is really just meant to make you invest your time in this GPU "crap", so rationales like the one above are understandable. That does not mean there is no area where the GPU is really great, but it is definitely not a universal computational device. Besides, the coming GPUs (Larrabee is one of them) are going to be MIMD, so all the SIMD limitations, and the skills needed to work around them, are becoming irrelevant. By the way, a dual E5540 machine is an example of a perfect implementation of MIMD architecture; I would just increase the number of cores by a factor of 100 to satisfy my appetite.

Stefan
 
Guest

Guest
NVIDIA Tesla HPC cards do 1 teraflop of single precision floating point operations per second and 280 GFLOPS of double precision operations per second. They are completely scalable to many teraflops just by using additional cards. The cards are about $1399 at Tiger Direct and fit in any PCIe x16 slot. They do, however, require lots of system RAM. They use CUDA and are well suited to fluid dynamics. They are not video cards, strictly massive number-crunching machines.
 

stefanbanev

Distinguished
Jul 7, 2009
5
0
18,510
>NVidiA Tesla HPC cards do 1 Teraflop of single precision floating point operations per second

That is very true for the SIMD computational model. If an algorithm can be formulated as a single instruction stream crunching a wide array of numbers, you may get 1 teraflop from a Tesla. If you need to run thousands of totally independent threads processing independent data (the MIMD model), you will probably get performance slower than an i7 can provide.
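
The SIMD versus MIMD point can be illustrated with a deliberately simplified toy model. It assumes that a group of lanes runs in lockstep and must step through every distinct branch taken by any lane while the other lanes idle. The group size of 32, the branch names and the costs are all assumptions for illustration; this is not a model of any specific GPU.

```python
def lockstep_time(branch_per_lane, branch_cost):
    """Time for one lockstep group: each distinct branch taken by any lane
    is executed in turn, so the costs of distinct branches add up."""
    return sum(branch_cost[branch] for branch in set(branch_per_lane))


def lane_utilization(branch_per_lane, branch_cost):
    """Fraction of lane-time spent on useful work in the lockstep group."""
    if not branch_per_lane:
        raise ValueError("need at least one lane")
    useful = sum(branch_cost[branch] for branch in branch_per_lane)
    busy = len(branch_per_lane) * lockstep_time(branch_per_lane, branch_cost)
    return useful / busy


if __name__ == "__main__":
    cost = {"a": 10, "b": 10, "c": 10, "d": 10}  # hypothetical branch costs
    uniform = ["a"] * 32                  # every lane follows the same path
    divergent = ["a", "b", "c", "d"] * 8  # lanes split across four paths
    print(lane_utilization(uniform, cost))
    print(lane_utilization(divergent, cost))
```

In this toy model the uniform group keeps every lane busy, while the divergent group spends three quarters of its lane-time idle. Fully independent cores would not pay that penalty, which is the gist of the MIMD argument above.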

Stefan