I work as a Computer Scientist at an academic image processing department. In my research, I regularly work with NVIDIA CUDA (http://www.nvidia.com/object/cuda_home.html) for high performance computing on the graphics card (GPGPU). These are general computations that have nothing to do with rendering 3D scenes. For GPGPU, SLI is not required. Therefore, it should be possible to put four 9800GX2 cards on a single motherboard, for example the MSI K9A2 Platinum V2 (http://global.msi.com.tw/index.php [...] incat_no=1), which has four (physical) PCI-Express X16 slots with double spacing between the slots.
When using CUDA, SLI is not used and each 9800GX2 is "seen" by the system as two separate GPUs. Therefore I am hoping to get access to eight separate GPUs by using four 9800GX2 cards. My computations can be distributed over these eight GPUs without requiring communication between the different cards (or between the two GPUs on each card).
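Because the work needs no communication between GPUs, distributing it mostly comes down to splitting the input into independent chunks, one per GPU. A minimal sketch, assuming the work is a set of independent items (for example, slices of a reconstruction); `split_evenly` is a hypothetical helper, not part of CUDA:

```python
def split_evenly(n_items: int, n_gpus: int) -> list[tuple[int, int]]:
    """Return one half-open [start, stop) range per GPU.

    The first (n_items % n_gpus) GPUs get one extra item, so chunk sizes
    differ by at most one and every item is assigned exactly once.
    """
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    base, extra = divmod(n_items, n_gpus)
    ranges = []
    start = 0
    for gpu in range(n_gpus):
        size = base + (1 if gpu < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical workload: 1000 independent slices over the eight GPUs
# of four 9800GX2 cards.
chunks = split_evenly(1000, 8)
print(chunks)
```

Contiguous ranges keep each GPU's input in one block, which keeps host-to-device copies simple; any split works as long as no item depends on another GPU's results.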
So far, I have not found any information from others who have tried this before. Cooling and power are obvious problems that can probably be dealt with. Any ideas on problems that I can expect on the software side? For example, will the (non-SLI) driver see the four 9800GX2 cards as eight independent GPUs? Or is there a maximum limit built into the current WinXP drivers for the non-SLI mode?
Interesting questions - I cannot help here and I'm not sure this is the right forum to ask (see previous posts :-/), try talking to NVIDIA
Questions this forum might be able to help with are:
Do the non-SLI drivers 'see' a SINGLE 9800GX2 as two independent GPUs?
and
Does anyone out there know of a motherboard that will actually TAKE four x16 PCIe cards?
Could you by any chance install 3DMark06 and Crysis on that machine?
Yeah, I'm curious how it would perform. But since he isn't going to use SLI, it won't really be any different from having a single card, unless he ran the game across multiple monitors.
I believe the system will see all 8 GPUs: 2 for every card you put in. I have not tested this, but I am a computer repair technician and work with computer hardware a lot. From my knowledge of how they work, it will see all 8 GPUs and you can assign the driver for each of them. They will not function in unison because there is no SLI support, but they will be operable. So what you plan to do should, in theory, work just fine.
Oh, and yes, I would recommend a 790FX board like the MSI one you posted. It doesn't matter that it is AMD and only does CrossFireX, because you are not looking for CrossFire or SLI.
Message edited by xpyrofuryx on 03-19-2008 at 07:26:14 PM
Thanks a lot for all your input! I have in fact contacted NVIDIA about this, but as I have not yet received a reply and I am quite anxious to start ordering components, I decided to post the question here too.
I work in Belgium, where the most powerful supercomputer is capable of 2 TFLOPS. If this works, the single box should have a peak performance beating that (NVIDIA claims ~500 GFLOPS for a single 8800GTX). I know this only holds for very specific applications, but it would still be quite remarkable.
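The comparison can be made explicit under one loud assumption: that each of the eight GPUs reaches roughly the ~500 GFLOPS that NVIDIA quotes for a single 8800GTX. The GPUs on a 9800GX2 are not 8800GTX GPUs, so this is only an order-of-magnitude estimate of theoretical peak, not sustained, throughput:

```latex
P_{\text{peak}} \approx N_{\text{GPU}} \times P_{\text{GPU}}
  = 8 \times 500\ \text{GFLOPS}
  = 4000\ \text{GFLOPS}
  = 4\ \text{TFLOPS}
  \approx 2 \times 2\ \text{TFLOPS}
```

So, on paper, the box would offer about twice the peak of the 2 TFLOPS machine, provided the workload keeps all eight GPUs busy.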
About running 3DMark: the 790FX board is not capable of SLI, so it doesn't make much sense to benchmark the system's 3D capabilities. Has anyone ever tried the MSI K9A2 Platinum board with four graphics cards at all (for example, four ATI cards in a CrossFire setup)?
You will have one limitation IF it works. The two GPUs on a 9800GX2 share 1 PCIe slot, so even on a PCIe 2.0 board each card will be limited to PCIe 1.x speeds, and effectively to x8 per GPU.
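To put numbers on this, here is a back-of-the-envelope estimate using the nominal PCIe per-lane rates (about 250 MB/s per direction for PCIe 1.x and 500 MB/s for PCIe 2.0), assuming both GPUs on a card transfer data at the same time and split one x16 link evenly:

```latex
B_{\text{PCIe 1.x},\,\times 16} = 16 \times 250\ \text{MB/s} = 4\ \text{GB/s}
\qquad
B_{\text{per GPU}} \approx \frac{4\ \text{GB/s}}{2} = 2\ \text{GB/s} = 8 \times 250\ \text{MB/s}
\qquad
B_{\text{PCIe 2.0},\,\times 16} = 16 \times 500\ \text{MB/s} = 8\ \text{GB/s}
```

These are theoretical per-direction peaks; measured host-to-device throughput will be lower. Whether this matters depends on how much data each GPU exchanges with the host relative to how long it computes on it.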
Message edited by jaydeejohn on 03-20-2008 at 01:06:22 AM
---------------
Every artist is a cannibal,every poet is a thief,they all kill their inspiration then sing about their grief
There's no reason why M$ shouldn't see all 8 VPUs; the only issue is whether NVIDIA has opened up CUDA to recognize them.
Support for multi-VPU GPGPU work long preceded its gaming counterpart; both IHVs got it working before any gaming support existed.
If nV doesn't get you a quick answer, then get in touch with some of the large software devs like RapidMind or Acceleware, or even their competitor, PeakStream.
Not sure if the Tesla experiences would apply as easily because of its more focused purpose, but they are into multi-rack rigs there.
Probably a good place to ask would be the GPGPU.org forums, I check there every once in a while for F@H updates.
--------------- You need a license to buy a gun, but they'll sell anyone a stamp (or internet account) - REDGREEN. GA to SK HD Freedom: 45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
kjoost, I see a problem with your setup. As I remember, CUDA needs at least one CPU core per GPU, at least to work effectively. The MSI motherboard only takes one quad-core CPU, so you should consider something like Intel Skulltrail or another dual-Xeon motherboard.
jaydeejohn: You are absolutely right about the bandwidth. For my applications (tomographic image reconstruction), bandwidth from CPU to GPU is not the bottleneck, but in general this will be a problem.
ahu1: I am aware of that problem, but I was under the impression that this was only an issue for early versions of CUDA, for which the CPU thread blocked on execution of a kernel on the GPU. In any case, I will check it out first with nvidia.
The one-core-per-GPU requirement is not a hard constraint, more like advice from NVIDIA.
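The one-core-per-GPU advice follows from how multi-GPU CUDA host code was usually structured at the time: a CUDA context is bound to a host thread, so each GPU gets its own host thread, which selects its device once (e.g. with `cudaSetDevice`) and then drives it. If that thread blocks or spin-waits while its kernel runs, as described above for early CUDA versions, eight such threads on four cores compete for CPU time. A minimal, pure-Python sketch of that structure, assuming independent chunks per GPU; `run_on_gpu` is a hypothetical placeholder that squares numbers on the CPU instead of calling CUDA:

```python
import threading


def run_on_gpu(device_id: int, start: int, stop: int) -> list[int]:
    """Hypothetical stand-in for the real per-GPU work.

    In CUDA host code this thread would first select its device
    (cudaSetDevice(device_id)), copy its slice of the data to the GPU,
    launch kernels and copy results back. Here we only square the
    indices on the CPU so the sketch runs anywhere.
    """
    return [i * i for i in range(start, stop)]


def process_on_all_gpus(n_items: int, n_gpus: int) -> list[int]:
    """Run one host thread per GPU, each on its own contiguous chunk."""
    base, extra = divmod(n_items, n_gpus)
    results: list[list[int]] = [[] for _ in range(n_gpus)]

    def worker(device_id: int, start: int, stop: int) -> None:
        # Each thread writes only its own slot, so no lock is needed.
        results[device_id] = run_on_gpu(device_id, start, stop)

    threads = []
    start = 0
    for gpu in range(n_gpus):
        stop = start + base + (1 if gpu < extra else 0)
        t = threading.Thread(target=worker, args=(gpu, start, stop))
        threads.append(t)
        t.start()
        start = stop

    for t in threads:
        t.join()

    # Concatenate the per-GPU results in device order.
    return [x for chunk in results for x in chunk]


print(process_on_all_gpus(20, 8))
```

Python's GIL serializes CPU-bound work, so this only illustrates the thread-per-device layout; real host code would be C/C++ against the CUDA runtime, where each thread mostly waits on its own GPU.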
It seems that no one can say for certain whether it will work, since no one has ever tried it. MSI say that their board should work, "in theory", with 8 GPUs, but they don't want to give any guarantees either...
Anyway, I have managed to get my boss behind this effort and we will now start ordering some components. I will post pictures as soon as anything is running (it may take a while, though).
Heh, I wouldn't trust MSI for anything important.
Intel Skulltrail will give you reliability and full PCIe bandwidth, and more memory bandwidth for that matter - the better way to go.
I agree that Skulltrail would be a better choice, but... Do you know of any Skulltrail motherboard that has double slot spacing between all four PCI-Express X16 slots?
I believe there is only one Skulltrail motherboard, and one of its PCIe slots is single-spaced. But there are also some dual-socket Opteron motherboards, e.g. from Tyan, with four PCIe x16 slots, so check other options too.
About the one-core-per-GPU requirement: AFAIK there are some latency issues with shorter CUDA kernels if you don't have enough CPU cores.
Anyway, keep us informed... Interesting project you have there.
Just a quick update: I have ordered several components, including the case (Lian Li PC-P80) and PSU (Thermaltake Toughpower 1500W). The ETA for the PSU is still unclear, so I hope it works out. I will keep you posted.