"SIMD" is an acronym for "Single Instruction, Multiple Data."
An example of where this might help is when you have two packed arrays of 16-bit integers. A general-purpose x86 register can hold a 32-bit integer, and with some fancy bitwork you can even fit two 16-bit integers into one register. When you want to add two 16-bit integers, though, the only way to do it is to put one in a 32-bit register and the other in another 32-bit register (or in memory) and do the addition.
It would be nice to be able to fit two 16-bit integers in each register and then issue a single instruction that does both 16-bit adds at once. With the two packed arrays of 16-bit integers mentioned above, such an instruction would be very handy. But with traditional x86 registers it can't happen that way--you have to issue an addition instruction for every pair of corresponding elements from the two arrays. That's "Single Instruction, Single Data" (SISD).
MMX was actually Intel's first SIMD extension. It provided extra 64-bit registers and instructions that perform several add operations in one instruction. In the above example (two packed arrays of 16-bit integers), MMX could do four additions with one instruction. Instead of adding one integer to one integer, the instruction adds four integers to four integers, element by element. That's why it's called "multiple data" instead of "single data".
MMX was nice, but one of its problems was that it didn't apply to floating-point (real) numbers--only to integers (numbers with no decimal point). AMD created 3DNow! to cover that shortcoming and compete with Intel a bit better; Intel then created SSE as a "me too" answer to 3DNow!. Both extend the concept of SIMD to floating-point numbers. SSE2 is based on SSE and uses the exact same register set, but it adds quite a few more instructions.
"/join #hackerz. See the Web. DoS interesting people."
SSE stands for Streaming SIMD Extensions. SIMD means: Single Instruction, Multiple Data. I am not 100% sure about this, but this is what I think it is... When a program is written and compiled with SSE optimizations, the CPU can perform one operation on multiple pieces of data at once. Because it gets more done per instruction, it doesn't have to execute as many instructions and therefore runs the program faster. Again, I am not 100% sure about this... Anyone know for sure?
Oh wow, Kelledin and I posted our responses at the exact same time... His answer's a bit more detailed, so go off that.
"Trying is the first step towards failure."<P ID="edit"><FONT SIZE=-1><EM>Edited by ksoth on 06/29/01 10:06 PM.</EM></FONT></P>