Remember when the P4 came out? Tom's Hardware reviewed it, ran LAME on it, and the results were less than spectacular. Overnight they received a recompiled version of the program (from an Intel engineer, but that aside) and performance jumped by, what, 30% or something? It was spectacular.
Now we're at the next step, a new architecture, and a couple of things come to mind:
How much of today's software is optimized for the P4? I know Intel works with ISVs to get their code optimized for the P4, covering both SSE1/2/3 and the pipeline length, etc. They have undoubtedly been working with them in recent months to re-optimize that code for Conroe, but I'm sure we haven't seen it yet. Little things can make a big difference. On the P4 you wanted fewer compare-and-branch instructions, so loops got unrolled to cut down on branch mispredictions, each of which flushes the long pipeline. With Conroe you want (relatively) more of them, especially since it can macro-fuse a cmp and the conditional jump that follows it into a single micro-op. When these changes reach programs currently optimized for the P4, we will surely see at least a few percent more improvement.
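To make the unrolling point concrete, here is a minimal C sketch of the same array sum written both ways. The function names are hypothetical (mine, not anything from Intel or LAME). The rolled loop runs one compare-and-branch per element; the unrolled-by-4 loop runs one per four elements, which is the style P4-era tuning favoured. Whether the compiler actually emits a fusable cmp/jcc pair depends on the compiler and flags, so treat this as an illustration, not a benchmark.

```c
#include <stddef.h>

/* Rolled loop: one compare-and-branch per element. The loop-closing
   compare plus conditional jump is the kind of pair Conroe can
   macro-fuse (if the compiler emits it that way). */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: one compare-and-branch per four elements, the style
   favoured for the P4's long pipeline. A cleanup loop handles the
   n % 4 leftover elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Called on the array {1, 2, ..., 10}, both functions return 55; only the loop structure (and so the number of branches executed) differs.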
Conroe also has a batch of new instructions still to come, which they will call SSE4, yet there is virtually no talk about them on any site. The same thing holds true here as when SSE3 came out on Prescott and SSE2 on Willamette: Intel put in hardware that lets programmers work on large chunks of vectorized data all at once, and if you don't use the instructions, that hardware isn't getting used, or isn't used fully. Expect to see ridiculous gains on some programs, and at least another few percent overall because of this. Also expect AMD to follow suit and copy Intel's instruction set yet again in their next processor rev. (Note how everyone loves to call out Intel for copying AMD's 64-bit instruction set, but nobody mentions how many times AMD has adopted Intel's various SSE sets, let alone the base IA-32 architecture.)
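Here is a rough sketch of what "working on chunks of vectorized data all at once" means in code. It uses the original SSE intrinsics from `<xmmintrin.h>` rather than the new instructions, since the point is the general idea: a single `_mm_add_ps` adds four floats held in one 128-bit register. The function names are hypothetical, and this only compiles for x86 targets.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Scalar version: one float addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SSE version: each _mm_add_ps adds four floats at once. Unaligned
   loads/stores are used so the arrays need no special alignment;
   leftover elements (n % 4) are handled one at a time. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions produce the same results; the SSE one just issues roughly a quarter as many add instructions, which is the hardware going unused if code sticks to the scalar path.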
I won't even get into multi-threading, as that has been talked to death.
Bottom line: Conroe has proven to be just THAT good, and I would expect it to get better and better as programmers learn to take advantage of its full capacity.