Hi Y'all,
I write scientific programs that do recursive addition of large arrays.
Literally: take a 421x421x421 array, add it element-wise to another 421x421x421 array, store the result back to memory, and repeat. All in double-precision floating point, of course.
Obviously, main memory bandwidth is my bottleneck. More cores don't directly help, because each core would just be pulling more data through the same memory interface.
I would love a $3000 double-precision CUDA card, but that isn't going to happen.
Now, a Core i7 with the fastest DDR3 memory would work, but I want to build a small army of these machines and can't afford the high unit cost; plus, most of the cores wouldn't be used.
I've been eyeing low-cost 3.3 GHz dual-core AMD processors with a 2.6 GHz HyperTransport link and 16 GB of DDR3-1066 memory, but I've run into some questions.
First thing: is my bottleneck the memory bandwidth itself, or the bus between the CPU and memory? I can easily write my code for more cores if the main memory subsystem can take the stress.
Clearly, the L2 and L3 caches mean very little when streaming through large arrays, so how cheap a processor can I get away with? Will any modern processor add and store main memory at full bus speed?
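By "write my code for more cores" I mean something as simple as this OpenMP sketch (hypothetical; if compiled without OpenMP, the pragma is ignored and it runs on one core with the same result):

```c
#include <stddef.h>

/* Split the element-wise add c = a + b across cores with OpenMP.
 * Each thread gets a contiguous chunk of the index range, so every
 * core is streaming its own slice of the arrays from main memory. */
void add_arrays_mt(double *c, const double *a, const double *b, size_t n)
{
    #pragma omp parallel for
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++)
        c[i] = a[i] + b[i];
}
```

The question is whether two cores hammering the same memory controller actually finish the pass any faster than one.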
In other words, with that 2.6 GHz HyperTransport AMD part, what is the minimum processor speed that can add arrays at full DDR3 speed?
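One wrinkle I've read about even when caches don't matter: a normal store to the output array first reads the old cache line from memory, so each pass actually moves four streams of data, not three. SSE2 non-temporal stores are supposed to skip that read. A sketch of the technique (x86-specific; assumes 16-byte-aligned arrays, and I haven't benchmarked it myself):

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/* c = a + b using non-temporal (cache-bypassing) stores.
 * Avoids the read-for-ownership on c's cache lines, cutting memory
 * traffic per pass from ~4n to ~3n doubles.
 * a, b, and c must be 16-byte aligned. */
void add_arrays_nt(double *c, const double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d s = _mm_add_pd(_mm_load_pd(a + i), _mm_load_pd(b + i));
        _mm_stream_pd(c + i, s);      /* store bypasses the cache */
    }
    for (; i < n; i++)                /* scalar tail for odd n */
        c[i] = a[i] + b[i];
    _mm_sfence();                     /* flush write-combining buffers */
}
```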
What's the lowest-cost memory I can go with? For example, my DDR2-800 memory has a CAS latency of 5, while the DDR3-1066 memory I've been looking at has a CAS latency of 9.
Because of the different CAS latencies, I'm not sure which memory actually has more bandwidth.
In other words, I don't care about the fastest access times; I care about memory bandwidth per dollar. Likewise, I don't care about processor speed, as long as the CPU can add and store memory at bus speed.
So, with these design requirements, what build would you use for a small army of these machines?
AMD or Intel is fine; I assumed AMD is the better deal.
Thanks,
Michael