All this and SSE4/recompiling still to come..but no mention?

morttt

Distinguished
Feb 25, 2006
6
0
18,510
Remember when P4 came out? Tom's Hardware reviewed it, ran LAME on it, and the results were less than spectacular. Overnight they received a recompiled version of the program (from an Intel engineer, but that aside) and the performance jumped what, 30% or something? It was spectacular.

Now we're at the next step - the next new architecture, and a couple of things come to mind:

How much of today's software is optimized for P4? I know Intel works with ISVs to get their code optimized for P4: SSE1/2/3, the pipeline length, etc. They have undoubtedly been working with them in recent months to re-optimize that code for Conroe, but I'm sure we haven't seen it yet. Little things can make a lot of difference. On the P4 you wanted fewer compare-and-branch instructions, so loops got unrolled to avoid mispredictions and flushes of the long pipeline. Now with Conroe, you want (relatively) more compare instructions, especially with its ability to macro-fuse a cmp with the conditional jump that follows it. When these changes show up in programs currently optimized for P4, we will surely see at least a few percent more improvement.
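For anyone who hasn't seen the unrolling trade-off in code, here is a minimal C sketch. The function names (sum_rolled, sum_unrolled4) are made up for illustration and aren't from any real benchmark, and a modern compiler may well unroll or vectorize either form on its own, so treat it as a picture of the idea rather than a tuning recipe.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by four: one compare-and-branch per four elements,
   plus a short cleanup loop for the leftover 0..3 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

The rolled loop executes one compare-and-branch per element, the unrolled one roughly a quarter as many. On a chip that can fuse the compare and the branch into a single operation, the rolled form should cost less than it did on P4, which is the point being made above.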

Conroe has a bunch of new instructions yet to come; SSE4, they will call them. But there is virtually no talk about them on any site. The same thing holds true here as when SSE3 came out on Prescott and SSE2 on Willamette. Intel put in hardware to give programmers the ability to work on large chunks of vectorized data all at once. If you don't use the instructions, that hardware isn't getting used, or at least not fully. Expect to see ridiculous gains on some programs, and at least another few percent overall because of this. Also expect AMD to follow suit and copy Intel's instruction set yet again in their next processor rev. (Note how everyone loves to call out Intel for copying AMD's 64-bit instruction set, but they never mention how many times AMD has adopted Intel's various SSE sets, let alone the base IA.)
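To show what "using the instructions" means in practice, here is a small C sketch that adds two float arrays the plain way and with original SSE intrinsics. The names add_scalar and add_sse are hypothetical, and the SSE path is only compiled when the compiler defines __SSE__ (GCC and Clang do on x86-64); anywhere else it quietly falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Plain scalar version: one float add per iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SSE version: four float adds per ADDPS, then a scalar tail.
   Without SSE support the vector loop is compiled out entirely. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions give the same answers; the only difference is how many elements each instruction touches. Code that was never rewritten (or recompiled with vectorization on) stays on the first form, and the vector hardware sits idle.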

I won't even get into multi-threading, as that has been talked to death.

End point is that Conroe has shown to be just THAT good, and I would just expect it to get better and better as programmers take advantage of its full capacity.
 

mesarectifier

Distinguished
Mar 26, 2006
2,257
0
19,780
morttt said:
End point is that Conroe has shown to be just THAT good, and I would just expect it to get better and better as programmers take advantage of its full capacity.

...and your point is? Conroe is better than everything else already. If it gets even better, so be it. People are going to buy it on what it performs like now and not what it will perform like in a few months.
 

morttt

MesaRectifier said:
...and your point is? Conroe is better than everything else already. If it gets even better, so be it. People are going to buy it on what it performs like now and not what it will perform like in a few months.

2 Points:

If anybody had the slightest reservation, just expect even more.

Why is there still no talk of SSE4, even from Intel? Or if there is, does anyone have a link?
 

ltcommander_data

Distinguished
Dec 16, 2004
997
0
18,980
Yes, the Core 2 results we've seen are impressive considering most of the software is unoptimized.

The reason we don't see talk about SSE4, even from Intel, is that it actually isn't a significant feature, especially in comparison with all the other microarchitecture features. SSE4 was actually developed for Tejas, which was a NetBurst successor to Prescott. This means those instructions were designed around a super-long pipeline (probably longer than Prescott's) as well as other NetBurst features that Core 2 doesn't have, like double-pumped integer units. I think some of the instructions were probably further improvements for HT. This means that SSE4 won't bring much performance improvement to Core 2.

Intel is working on a Core 2-specific instruction set (SSE5?), which will arrive with the 45nm shrink of Conroe in early 2008. That will be more significant, combined with the other improvements and higher clock speeds those processors will bring.
 

spud

Distinguished
Feb 17, 2001
3,406
0
20,780
morttt said:
MesaRectifier said:
...and your point is? Conroe is better than everything else already. If it gets even better, so be it. People are going to buy it on what it performs like now and not what it will perform like in a few months.

2 Points:

If anybody had the slightest reservation, just expect even more.

Why is there still no talk of SSE4, even from Intel? Or if there is, does anyone have a link?

It isn't something that will raise the bar by a large amount. It will help, but technically SSE3 and SSE4 are revisions to the SSE instruction set, more or less algorithms to deal with complex arithmetic (SSE generally isn't all that complicated math-wise), allowing much more robust math to be vectorized.
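To make the "complex arithmetic" remark concrete: SSE3 added ADDSUBPS, MOVSLDUP and MOVSHDUP, which line up nicely with a complex multiply. Below is a rough C sketch. cmul2 is a hypothetical helper name, the interleaved re/im layout is an assumption, and the SSE3 path only compiles when the compiler defines __SSE3__ (e.g. GCC or Clang with -msse3); otherwise it uses the plain scalar formula.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3 intrinsics */
#endif

/* Multiply two pairs of complex floats stored as {re0, im0, re1, im1}.
   (x + yi)(c + di) = (xc - yd) + (xd + yc)i */
void cmul2(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE3__)
    __m128 va  = _mm_loadu_ps(a);       /* x0 y0 x1 y1 */
    __m128 vb  = _mm_loadu_ps(b);       /* c0 d0 c1 d1 */
    __m128 bre = _mm_moveldup_ps(vb);   /* c0 c0 c1 c1 */
    __m128 bim = _mm_movehdup_ps(vb);   /* d0 d0 d1 d1 */
    /* swap re/im within each complex number: y0 x0 y1 x1 */
    __m128 asw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 t1  = _mm_mul_ps(va, bre);   /* xc yc ... */
    __m128 t2  = _mm_mul_ps(asw, bim);  /* yd xd ... */
    /* ADDSUBPS subtracts in even lanes and adds in odd lanes */
    _mm_storeu_ps(out, _mm_addsub_ps(t1, t2));
#else
    for (size_t i = 0; i < 4; i += 2) {
        float x = a[i], y = a[i + 1];
        float c = b[i], d = b[i + 1];
        out[i]     = x * c - y * d;
        out[i + 1] = x * d + y * c;
    }
#endif
}
```

Before SSE3 the subtract-in-one-lane, add-in-the-next step needed extra shuffles or sign flips, so the new instructions are a convenience for this kind of math more than a new capability.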

This little program shows off the efficiencies attained from normal C code, x87 code, and SSE code (it allows enhancements if your processor supports SSE3, though). It works with single data sets and double data sets. Just something to play with if you are genuinely interested in code performance using different instruction sets.
 

spud

There actually is an SSE4 set of instructions...

http://www.dailytech.com/article.aspx?newsid=788

It can actually help up to 25% or so in a very, very limited set of situations (mainly streaming-video-type data flows).

Not a big deal, but there nonetheless.. :)

I never said they weren't instructions, but they are algorithms nonetheless. But like you said, a small, limited set of situations is where complex arithmetic comes into play code-wise.