I don't intend to connect these cards to a display, but I will run them at top load for extended periods of time. Many ATX motherboards do not have the number of PCIe 2.0 slots required for a 2x quadfire. I'm aware of only one ATX mobo that has four PCIe 2.0 slots, and they are rated 16x/16x/8x/1x, which means a drastic decrease in the rate of performance gain in bandwidth-heavy applications per added card. I'm still willing to purchase these cards because I need to take advantage of cheap, heavily parallel processing, and bandwidth between card and motherboard isn't as critical. My case will be a Xaser from Thermaltake, and I was wondering if the fans included in that case would be enough, or if I need to invest in some additional cooling. I will not be overclocking my AMD Phenom II 955 or the graphics cards. I will only buy ATI graphics cards for parallel processing, because the architecture is familiar to me, and I don't feel like coding another assembler. I don't use the high-level environments, so CUDA isn't a mitigating afterthought. I haven't named which ATI GPU I intend to use, but the individual cards must have dual GPUs if I expect to make a dual quadfire arrangement. I'm aware that I can't set up an octofire without modifications to firmware and hardware, so I will use a soft approach.
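For reference, the 16x/16x/8x/1x ratings mentioned above can be turned into rough per-slot bandwidth figures. This is only a back-of-the-envelope sketch: it assumes the standard PCIe 2.0 figures of 5 GT/s per lane with 8b/10b encoding and ignores protocol overhead, and the function name is made up for illustration.

```python
# Rough theoretical PCIe 2.0 bandwidth per slot, one direction only.
# Assumptions: 5 GT/s per lane, 8b/10b line encoding, no protocol overhead.

TRANSFERS_PER_SECOND = 5_000_000_000  # PCIe 2.0 raw signalling rate per lane


def slot_bandwidth_mb_s(lanes):
    """Return the theoretical one-way bandwidth of a slot in MB/s."""
    # 8b/10b encoding: only 8 of every 10 transferred bits carry data.
    data_bits_per_second = TRANSFERS_PER_SECOND * 8 // 10
    bytes_per_second_per_lane = data_bits_per_second // 8
    return lanes * bytes_per_second_per_lane // 1_000_000


if __name__ == "__main__":
    for lanes in (16, 16, 8, 1):
        print(f"x{lanes}: {slot_bandwidth_mb_s(lanes)} MB/s")
```

On these assumptions the 1x slot gets one sixteenth of what a 16x slot gets, which is why the fourth card is the one most likely to starve in bandwidth-heavy work.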
You will not be able to support octofire without creating your own drivers. Perhaps your own BIOS for MB and GPU as well. I am rather certain, regardless of how much of a programming guru you may be, that this will never ever work... Beyond that.. what CPU could you possibly get that could run 8 GPUs without choking itself to ash? 4 barely works as it is...
I can sort of understand what you are trying to do.. but it would take you hundreds of man-hours to even get close to working it... Why not just buy a server with a dozen Opterons?
You will not save yourself any money going this route.. in fact it would probably cost more in the long run when you end up discovering that what you want to do is impossible without AMD's development team helping you for months..
I don't have my own development team. Your assumption was partially correct; I intend to run a server wholly compressed in RAM, and support my own server-side search engine. I am trying to pioneer a new web interface that is somewhat like a forum but easier to navigate and will be search oriented. It will be advantageous to support this on a single motherboard without the excessive overhead of maintaining expensive server equipment. I don't need a dozen hard drives, or a dozen CPUs. The choking problem you mentioned can be circumvented with proper coding. When adding GPU cards yields smaller performance gains than the hardware limits alone would explain, that only signifies inefficient coding, which is typical for programs that must port easily to other hardware configurations, even after recompiling. I'm not a guru, but I am certain that if I spend personal effort I can integrate these cards the way I indicated. I will not be using third-party code or software; I have my own assembler. I won't need excessive libraries of functions that I will never use, or high-level abstractions of something that can easily be done at a low level with faster and (most of the time) fewer operations, which compilers and code optimizers seem to miss. There was a time when servers could run in less than 20 KB of RAM, and I'm desperate enough to pursue this as much more than a hobby. You didn't answer my first question though. I apologize if my attitude seems impatient or imperative. I don't converse with others often.
I'll be more specific about the software routine if you're interested, but it pertains little to the hardware question I stated in the first post. The data structure will be a hash that transforms a 64-bit integer into another 64-bit integer, and there will be no collisions; this is referred to as a perfect hash, but it has to be recompiled if additional output values are added to the structure. Traditionally a hash's output is taken as a pointer and is used as a substitute for slower indexing methods. I won't be using the hash algorithm for indexing purposes. The output will be interpreted as data instead of a pointer to the address of data. This 64-bit output value will be broken into 8 segments, each 8 bits wide, and will be passed into a parser that modifies a generic hash algorithm based upon what each of those 8 values is. The generic hash's operators will not be modified, but the operands will be. It's unlikely that I would need 256 different operands, or that I could develop a sufficiently complex algorithm in 8 phases, so I'm not certain about the number of segments I will have to break the 64-bit output into. For simplicity I'll assume that I'll need 16 phases, which leaves a half-byte to each operand if the operands have parallel significance to the parser. That would mean 16 different possible operands, and each can be executed in succession, though it isn't necessary for each operand to be different. The operands are determined by the output of the perfect hash I described above. In practice this would be a hash of algorithms, the logic of which is offset from a generic algorithm. I use the final resultant algorithm as a hash, so that the final output is a 64-bit value. This value is interpreted as both a pointer to data and an input that is returned into the entire data structure. This creates a cyclic effect to an extent, but it wouldn't be wise to compile an infinite loop.
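To make the 16-phase idea a bit more concrete, here is a rough sketch in Python (not the assembler I would actually use). Everything in it is an assumption for illustration: the operand table, the choice of xor/rotate/multiply as the fixed operators, and the names split_nibbles and phased_hash are all made up. The selector stands in for the output of the perfect hash, which is not implemented here.

```python
# Sketch: a 64-bit selector is split into 16 half-bytes, and each half-byte
# picks the operand for one phase of a generic hash. The operators stay fixed;
# only the operands change. All constants and names here are hypothetical.

MASK64 = (1 << 64) - 1

# Hypothetical operand table: 16 odd 64-bit constants, one per nibble value.
OPERANDS = [(0x9E3779B97F4A7C15 * (2 * i + 1)) & MASK64 for i in range(16)]


def split_nibbles(value):
    """Break a 64-bit value into 16 four-bit segments, lowest segment first."""
    return [(value >> (4 * i)) & 0xF for i in range(16)]


def rotl64(x, r):
    """Rotate a 64-bit value left by r bits (0 < r < 64)."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def phased_hash(key, selector):
    """Run 16 phases over key; the selector's nibbles choose each operand."""
    h = key & MASK64
    for nibble in split_nibbles(selector & MASK64):
        operand = OPERANDS[nibble]
        h = rotl64(h ^ operand, 13)   # fixed operators: xor, then rotate
        h = (h * operand) & MASK64    # fixed operator: multiply mod 2**64
    return h


if __name__ == "__main__":
    selector = 0x0123456789ABCDEF     # stand-in for a perfect-hash output
    print(hex(phased_hash(42, selector)))
```

Because every operand in this sketch is odd, each phase is reversible, so for a fixed selector two different keys never give the same output. Feeding the result back in as the next key or selector gives the cyclic behaviour described above.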
The pointer to data actually points to a simple procedure that takes the place of a single low-level instruction along with minor overhead, executable by the CPU. Because the graphics cards are SIMD (single instruction, multiple data), the generic format of the algorithm that the perfect hash modifies can be used on every stream processor in parallel. The instructions can be handled by the CPU if the load is distributed evenly over each clock cycle. It's unlikely that I could develop the hash phasing required to do that all by myself, especially with the other complications, but it is worth a try. I expect that the CPU will be a limiting factor, and memory latency will be even worse. Perhaps instead of the final output being a pointer to a simple procedure it could be both a 64-bit return value (as described above) AND have either its left or right 32-bit portion serve as a 32-bit instruction! The AMD processors support simultaneous execution of both 64-bit and 32-bit code. This explanation has little to do with the cooling question that I originally asked, but I hope it seems interesting to you.
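The dual interpretation of the final output can be sketched like this. It is only an illustration in Python: the name split_halves is made up, and which half would serve as the 32-bit instruction is left open, as it is above.

```python
# Sketch: treat one 64-bit output both as a whole return value and as two
# 32-bit halves, either of which could be read as a 32-bit instruction word.

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def split_halves(value):
    """Return the (left, right) 32-bit portions of a 64-bit value."""
    value &= MASK64
    left = value >> 32       # upper 32 bits
    right = value & MASK32   # lower 32 bits
    return left, right


if __name__ == "__main__":
    output = 0x1122334455667788
    left, right = split_halves(output)
    print(hex(output), hex(left), hex(right))
```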
Cooling is easy enough, if you get a large enough MB.
The ASUS 'super computer' MB (the one designed for multiple nvidia GPUs for CUDA applications, not an actual super computer) is the only one I know of that for sure has a PCIe slot every second slot. Most that have 4 put a couple right next to each other. Provided the cards are spaced every other slot you can get away with stock cooling. (If you have a well ventilated case.. but I'm sure you will). If the cards have to fit into a single slot you will either have to have water cooling, or be out of luck.
Picking the right MB will be key, as having enough room will be hard with 4 cards as massive as the ones you are looking into.
As for power. You will need 2 PSU's, obviously..
Getting this to "work" won't be hard. It is simple enough to install 4 4870x2's and have them all run. Getting it to do what you want will be another thing entirely. Will be interesting though.
Thanks for sharing your project with us. Good luck, even though it may seem slightly nuts.
You will notice the fourth one is right next to the third one. Fitting 3 with stock cooling would be 'easy.' A fourth would not fit though. I'm not aware of any AMD MB that would actually fit 4 double slot cards. As I stated before, this is the only one I know of that would allow it:
Which brings you to water cooling.. perhaps the only option, at least for the bottom two cards. But even a water cooled 4870x2 might take up a bit more than a single slot as the coupling is always pretty sizable. Creative piping would be required.
Thank you for the kind response daedalus685! It will probably take a year of development, and by then the graphics cards I use will be outdated, but I promise to share it when (if) I complete it. My dream is to support a perpetual compression on a desktop machine so that it can store such a vast amount of data that a server loaded with even hundreds of hard drives would strain to compete with the simplicity and speed offered by this solution (I hope that didn't sound like a commercial).
I have these options ordered from least effective to most effective: fans, passive heatsinks, active heatsinks, peltier (TEC) cooling, water cooling, submersion in a dielectric fluid, phase change (refrigeration), liquid nitrogen, liquid helium, or complete depressurization with protection from thermal emission.
I know you said you prefer AMD/ATI, but Intel has the supercomputer platform that is made exactly for what you are trying to do. There are a couple of readily available X58 supercomputer MBs, and of course the graphics cards you need are already set up for this and would need no alteration.
Forget all that complaining! I found my dream motherboard! ASUS P6T7 Supercomputer with two nForce 200 chips! It can support both 3-way SLI and 4-way CrossFire, but I wonder if it can do both at the same time. According to the specifications it has 7 PCIe 2.0 16x slots! (not double spaced though) Perhaps I could circumvent the driver trouble of running multi-GPU solutions above the officially supported 4 (for ATI) and 3 (for nvidia) by doing both at the same time!
The tesla (supercomputer) platform allows for one controller graphic card, and 3 additional graphics cards for computation. If you get the supercomputer platform you might as well stick to what it was designed for....if you need some links and are not familiar with the tesla supercomputer platform let me know, and I'll try to find something that can explain it better than I can.
Well, I am glad to get such a good response. Hopefully I won't get negative feedback on this next post, but it would be dishonest to leave you under false pretenses. I am developing the ideas in this thread, but it's unlikely that I will be able to implement them soon. I'm unable to invest in these ideas for personal reasons, and I'm not interested in money donations. ANYTHING that you could post to help me develop the ideas here would be helpful. I will not be able to start saving money for this project until I find employment, but please don't let that discourage you from contributing! I really do intend to follow through.