I'll be more specific about the software routine if you're interested, but it has little to do with the hardware question I asked in the first post. The data structure will be a hash that maps one 64-bit integer to another 64-bit integer with no collisions. This is referred to as a perfect hash, but it has to be recompiled if additional output values are added to the structure. Traditionally a hash's output is taken as a pointer and used as a substitute for slower indexing methods. I won't be using the hash algorithm for indexing purposes. The output will be interpreted as data instead of as a pointer to the address of data.

This 64-bit output value will be broken into 8 segments, each 8 bits wide, and passed into a parser that modifies a generic hash algorithm according to the value of each segment. The generic hash's operators will not be modified, but the operands will be. It's unlikely that I would need 256 different operands, or that I could develop a sufficiently complex algorithm in 8 phases, so I'm not certain how many segments I will have to break the 64-bit output into. For simplicity I'll assume that I'll need 16 phases, which leaves a half-byte (4 bits) for each operand if every segment carries equal significance to the parser. That would mean 16 possible operands, executed in succession, though the operands need not all be different.

The operands are determined by the output of the perfect hash I described above. In practice this would be a hash of algorithms, each of whose logic is offset from a generic algorithm. I use the final resulting algorithm as a hash, so that the final output is a 64-bit value. This value is interpreted as both a pointer to data and an input that is fed back into the entire data structure. This creates a cyclic effect to an extent, but it wouldn't be wise to compile an infinite loop.
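To make the segment idea concrete, here is a rough Python sketch of splitting a 64-bit value into equal-width selectors and using them to pick operands for a hash whose operators stay fixed. Everything named here is an assumption for illustration only: the `OPERANDS` table, the xor/multiply/rotate mix, and the function names are placeholders, not the actual algorithm I have in mind.

```python
MASK64 = (1 << 64) - 1

# Hypothetical operand table: 16 odd 64-bit constants, one per 4-bit selector.
OPERANDS = [(0x9E3779B97F4A7C15 * (2 * i + 1)) & MASK64 for i in range(16)]


def split_segments(value, count=16):
    """Split a 64-bit value into `count` equal-width segments, most significant first."""
    if count <= 0 or 64 % count != 0:
        raise ValueError("count must divide 64 evenly")
    width = 64 // count
    mask = (1 << width) - 1
    return [(value >> (64 - width * (i + 1))) & mask for i in range(count)]


def rotl64(x, r):
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def generic_hash(key, selectors):
    """Fixed operators (xor, multiply, rotate); only the operands vary.

    Each selector picks one operand from OPERANDS, one phase per selector.
    """
    h = key & MASK64
    for s in selectors:
        operand = OPERANDS[s]
        h ^= operand                    # operator 1: xor with the chosen operand
        h = (h * operand) & MASK64      # operator 2: multiply, truncated to 64 bits
        h = rotl64(h, 13)               # operator 3: fixed rotation
    return h


def specialised_hash(selector_word, key):
    """Use a 64-bit selector word (e.g. a perfect-hash output) to shape the hash of `key`."""
    return generic_hash(key, split_segments(selector_word, 16))
```

With 16 segments, `split_segments(0x0123456789ABCDEF)` gives the selectors 0 through 15 in order; with `count=8` it gives the eight bytes instead, which would need a 256-entry operand table rather than the 16-entry one assumed here.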
The pointer to data actually points to a simple procedure, executable by the CPU, that takes the place of a single low-level instruction plus minor overhead. Because graphics cards are SIMD (single instruction, multiple data), the generic form of the algorithm that the perfect hash modifies can be run on every stream processor in parallel. The instructions can be handled by the CPU if the load is distributed evenly over each clock cycle. It's unlikely that I could develop the hash phasing required to do that all by myself, especially with the other complications, but it is worth a try. I expect the CPU to be a limiting factor, and memory latency to be even worse. Perhaps, instead of the final output being a pointer to a simple procedure, it could be both a 64-bit return value (as described above) AND, in either its left or right 32-bit half, a 32-bit instruction! The AMD processors support running 32-bit code alongside 64-bit code. This explanation has little to do with the cooling question that I originally asked, but I hope it seems interesting to you.
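The dual interpretation of the final output could look something like the sketch below. It only shows the bit slicing; the function names and the `instruction_in_left` flag are made up for illustration, and nothing here decodes or executes a real instruction.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def split_halves(value):
    """Return (left, right): the upper and lower 32 bits of a 64-bit value."""
    value &= MASK64
    return (value >> 32) & MASK32, value & MASK32


def interpret_output(value, instruction_in_left=True):
    """Treat one 64-bit output as both a feedback value and a 32-bit instruction word.

    The whole value is returned for feeding back into the structure; one half
    (chosen by the hypothetical `instruction_in_left` flag) is the instruction word.
    """
    left, right = split_halves(value)
    instruction_word = left if instruction_in_left else right
    return {"feedback": value & MASK64, "instruction_word": instruction_word}
```

For example, `split_halves(0x0123456789ABCDEF)` returns `(0x01234567, 0x89ABCDEF)`, so the same output supplies the full 64-bit feedback value and a 32-bit word taken from whichever half is chosen.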