Quote:
Hello Particleman, FDTD-
I am in the very early planning stages of a build that sounds like what you two have constructed. My aim is ultimately a four-way Tesla Fermi crunching machine for "extra-curricular" projects where I can't justify using my workplace's real supercomputer. I will be able to find a use for however much power I can cram into one box, but I don't have a lot of experience in high-performance computing from the hardware end. Therefore, my concerns are where I need to spring for server-grade components and whether I am bottlenecking the system somewhere or overkilling in one area. I have a lot of questions on the subject to which I haven't been able to find adequate answers, and I realize you probably don't have all of them, either.
I was looking at building from the EVGA Classified SR-2 (1), since it's the only board I've seen that seems to have the full 64+ PCI Express lanes needed for the Teslas as well as SATA 6 Gb/s. If I mod the Teslas to fit in a single slot, would the in-between slots be able to use the four remaining PCIe lanes on each 5520 chipset?
Or should I take a server-grade board and just use SATA 3 Gb/s drives? I haven't been able to find any news on when 6 Gb/s will appear in that niche. The EVGA also has dual Gigabit and USB 3.0, both of which are nice future-proofing. With that as a foundation, do I need to populate both Xeon sockets to feed the Teslas properly? Any advice on choosing between the different 5600s? Is ECC memory necessary for that kind of workload? I was planning on using two of Crucial's new SATA 6 Gb/s SSDs (2) in RAID 0, but is RAID 1 more important? There are only two SATA 6 Gb/s ports, so higher-order RAIDs are not an option.
Finally, with so much invested in the hardware, I am considering OCing and possibly WCing. Will the 20-series Teslas be compatible with, e.g. the Swiftech waterblocks for the related GeForce cards? (3) Or should I stick to stock speeds for equipment longevity and stability? Thanks.
Andrew
(1) http://www.evga.com/articles/00537/
(2) http://www.newegg.com/Product/Product.aspx?Item=N82E168...
(3) http://hothardware.com/News/Wet-and-Wild-EVGA-Releases-...
Hi Andrew,
As for motherboards, I’ve been evaluating the EVGA Classified SR-2 as well. For this board, you’re looking at 2 Xeon chips costing around $1780 apiece (http://www.8anet.com/ShowProduct.aspx?pid=7887).
So you’ve got about $3500 invested in your CPUs, for 2 x 6 physical cores. I’m a little concerned about maintaining parity with a 2-chip system when overclocked. EVGA indicates that the board has great overclocking features, but I’ve not yet seen anybody who has actually done this, at least for the type of 24 x 7 processing that I need to maintain.
The Tesla cards run really hot, when they run. I’m concerned about being able to keep enough air flow through my system to keep the cards running. I received my C1060 card in December, and it died immediately. I RMA’d the card, and only today (nearly 5 months later) received a replacement. I’ve not installed it yet. I ordered the new Fermi C2050, and am (supposedly) one of the first in line to get the card. I’ve not even gotten a serious estimate from the supplier as to when they’ll ship the unit(s). The C2050 was supposed to be available in early Q2. Well, that’s come and gone, and there’s still no word.
As an alternative, I’m considering just using a single CPU (the i7 980), with either the new ASUS Rampage III motherboard or the supercomputer motherboard. With this setup, I’d have 4 SLI slots for Fermi / Tesla cards. I run intensive graphics on 2 30-inch Dell monitors, so I need at least one slot for one of the new Fermi graphics cards (water cooled). The remaining 3 SLI slots I can use for 3 C2050s. That gives me a single CPU with 6 physical cores, which I’ll be more confident about overclocking.
For the Tesla (C1060) and Fermi (C2050) cards, NVIDIA firmly states that the cards run ‘cool enough’ that they don’t require any cooling modifications. My experience thus far does not support their contention. If you’re running 3 or 4 Teslas or C2050s, you’re sure to have heating problems.
For SATA 6 Gb/s availability, your options for existing ports are limited. You can purchase a 6 Gb/s add-on card for your ASUS board that will work. Here is what I’ve done to make use of 3 Gb/s SSDs: I keep one SSD (6 Gb/s transfer, 500 GB storage) loaded with my OS and primary software. That SSD sits in front of my RAID 10 array, which is populated by 4 x 500 GB Raptor drives. I’m reluctant to use 4 SSDs in a RAID array at this time, because of cost and possible reliability problems.
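To put rough numbers on the RAID 0 / RAID 1 / RAID 10 trade-off, here is a minimal sketch of the usable-capacity arithmetic. The function name is made up for illustration, and the 256 GB size for the two-SSD case is an assumed placeholder, not the size of any particular drive; only the 4 x 500 GB RAID 10 figure comes from the setup described above.

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Usable capacity in GB for a few common RAID levels.

    Hypothetical helper for illustration; assumes identical drives.
    """
    if level == 0:
        # Striping: all capacity is usable, but one failure loses the array.
        return drives * drive_gb
    if level == 1:
        # Mirroring: every drive holds the same data.
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_gb
    if level == 10:
        # Striped mirrors: half the raw capacity is usable.
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return (drives // 2) * drive_gb
    raise ValueError("unsupported RAID level in this sketch")


# The RAID 10 array described above: 4 x 500 GB drives.
print("RAID 10, 4 x 500 GB:", usable_capacity_gb(10, 4, 500), "GB usable")

# Two-drive options, using an assumed 256 GB per drive.
print("RAID 0, 2 x 256 GB:", usable_capacity_gb(0, 2, 256), "GB usable")
print("RAID 1, 2 x 256 GB:", usable_capacity_gb(1, 2, 256), "GB usable")
```

With only two ports, the choice comes down to doubling the capacity (RAID 0) or keeping a full copy (RAID 1); the sketch only covers capacity, not speed or failure odds.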
For your Teslas and the C2050s or above, you’ll need 4 GB of RAM per card. You will need and use this memory. I use Mushkin Blackline 2 x (3 x 4 GB) kits. Mushkin doesn’t have any of the Redline kits for the 4 GB RAM. The Mushkin RAM is rock solid, and can be tuned to accommodate your Tesla / Fermi system. Depending upon your applications, there is a strong possibility that you’ll be pounding your RAM.
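As a quick sanity check on the memory figures, here is a small sketch of the arithmetic. The function name is hypothetical, and the 4 GB per card figure is just the rule of thumb from this post, not an official requirement.

```python
def host_ram_needed_gb(num_cards, gb_per_card=4):
    """Host RAM to budget for the GPU cards, using the 4 GB per card rule of thumb."""
    return num_cards * gb_per_card


# Two triple-channel kits of 3 x 4 GB each, as described above.
installed_gb = 2 * (3 * 4)

for cards in (3, 4):
    needed = host_ram_needed_gb(cards)
    headroom = installed_gb - needed
    print(f"{cards} cards: {needed} GB for the cards, "
          f"{headroom} GB left of {installed_gb} GB installed")
```

Whatever is left over has to cover the OS and the host side of your applications, so the headroom matters as much as the per-card figure.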
I use a hybrid cooling system. The C1060s are not amenable to water cooling (NVIDIA will void your warranty, and given my sketchy experience with the cards, that probably isn’t a good idea). I water cool my CPU, motherboard, and video card. For my new system, I’ll be using a phase-change system to cool my CPU (I won’t mention the name, but I know a person who builds a lot of these single / double phase-change cooling systems; e-mail me and I will send you his contact information). I’ve had some issues with the motherboard overheating, and will be using a water-cooled system for that. Water cooling seems to work just fine for the video card.
Stay in touch regarding your build; I'm interested in your results.
Particleman529