Intel four-cores equal two plus two! Already!!

GiDDY_SOUL

Distinguished
Apr 11, 2006
80
0
18,630
[Image: Intel quad-core]


Beyond Godlike? 8O

Click here if you want to see Intel's quad core!! Someone already has it! (An engineering sample)

And thanks to coolaler.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
Will 4x4 be able to beat it?
This is an interesting question if you think about it for a sec. To answer it, one first has to answer this question: "Will Intel's quad core be FSB-bottlenecked, or will they manage to squeeze something out of the old FSB yet again?"

The memory pressure of four cores accessing memory through a single point might be a little too much this time. On the other side, AMD already has better memory transfer rates than Intel using AM2. With two processors, each with its own on-die memory controller and linked by HyperTransport, 4x4 will see an improvement in memory bandwidth, that's for sure. But higher bandwidth doesn't guarantee higher performance, especially with high memory latency. To get higher performance out of high-latency memory, you need either a damn good prefetch mechanism that doesn't miss or a rather large cache to hide the latency.
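A rough way to see why more bandwidth alone doesn't guarantee more performance is the textbook average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. Bandwidth doesn't appear in it directly: a bigger cache or a better prefetcher lowers the miss rate, while lower DRAM latency lowers the miss penalty. This is a minimal sketch; the cycle counts and miss rates are hypothetical round numbers, not measurements of K8, Conroe, or any other real chip.

```python
def amat(hit_time_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time: hit time plus the expected cost of misses."""
    return hit_time_cycles + miss_rate * miss_penalty_cycles

# Hypothetical numbers for illustration only (not measured on any real CPU).
HIT_TIME = 14        # cycles for a last-level cache hit (assumed)
MISS_PENALTY = 200   # cycles to reach DRAM (assumed)

baseline = amat(HIT_TIME, 0.05, MISS_PENALTY)      # 5% of accesses miss the cache
bigger_cache = amat(HIT_TIME, 0.02, MISS_PENALTY)  # larger cache / better prefetch: 2% miss
faster_dram = amat(HIT_TIME, 0.05, 150)            # same miss rate, lower DRAM latency

print(f"baseline:     {baseline:.1f} cycles")
print(f"bigger cache: {bigger_cache:.1f} cycles")
print(f"faster DRAM:  {faster_dram:.1f} cycles")
```

With these made-up inputs, cutting the miss rate from 5% to 2% helps more than shaving 50 cycles off DRAM latency, which is the "large cache to hide the latency" argument in numeric form.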

As you see, I haven't actually answered your question, but what I think will happen is that performance will be highly benchmark-sensitive in the future.
 

GiDDY_SOUL

Are the current Core 2s FSB-bottlenecked?

And don't AMD's processors use the extra bandwidth of DDR2?
So 4x4, as it has two IMCs, won't use the extra bandwidth!
I think AMD's quad core will use that bandwidth.

I heard Intel is going to use two FSBs for their quad core. Is that true?

And what about K8L?
 

cryogenic

Everyone mentions the FSB is getting old for Intel, and they don't really understand the real bottleneck. AMD uses the FSB also. Go read about it and stop posting this stuff. In fact, now you can see Intel's bench scores using their FSB:
http://www.xtremesystems.org/forums/showthread.php?p=1527913

How does this compare to AMD's 4x4 benchmarks? Please explain these benchmark scores to me and how the FSB bottlenecked them.
If you really look at how AMD and Intel use the FSB, you will see they hit the exact same bottleneck, just by a different path.

The FSB DOES offer LOWER memory throughput compared to AMD's memory access infrastructure. The problem is that current AMD processors can't hide DDR2 latency (they lack an advanced memory prefetcher and memory access reordering) and also can't chew through all that data because of lower pipeline parallelism compared to Conroe. So basically, AMD right now can't take full advantage of a superior architecture, and its real-world performance falls below that of Intel's FSB-based chips purely because of the processor core architecture. At these speeds, with the high cost of non-sequential memory access, you really don't want any cache misses, and this is Conroe's strong point with DDR2 memory. These are problems the K8 architecture wasn't designed for. All those issues will be solved with the K8L architecture, and more.

I do understand that Conroe is an amazing architecture, able to squeeze out every bit of performance using an extended array of tricks and technologies.
All of this comes from Intel's early adoption of DDR2, which gave them some extra R&D time to solve all of the problems related to high-bandwidth but also high-latency memory.
 
Probably get flamed for the following comments, but articles like this are fanboy fodder... I mix articles like this into my compost pile to fertilize my lawn.

I've said it before and I'll say it again: meh, who cares. As impressive as it may be, Intel is following the same implementation with quad core as they did with the dual-core Pentium D... with this model, they could cram 4, 8, or even 16 dual cores onto a die... it's not gonna make apps load or run any faster or get more fps out of them... looks pretty, though, when you open Task Manager...
 

MatTheMurdera

Distinguished
Mar 19, 2006
366
0
18,780
Memory bandwidth doesn't really matter all that much for desktop stuff, but the lower latency of HTT helps. The FSB is old, but so is AGP, and if you compare the same card class on PCI Express, there's not much of a difference. I'm not saying the FSB should stay, just that maybe it's not as much of a bottleneck as is believed. However, with 4 cores it may be cutting it very close. Judging from benchmarks, the new unified cache helps a lot, but with Kentsfield it's more like 4 cores across 2 caches. In the end all that matters is results, and without those no questions can be answered.
 

MatTheMurdera

From the benchmarks, current Core 2s are NOT FSB-bottlenecked, and it's not that AMD doesn't use the bandwidth, it's that almost no programs do. Yes, they will use 2 FSBs, and as a matter of fact I think they already have some out.
 

MatTheMurdera

Any word on the price tag of this big boy?

Sorry, should have been more specific: they have 2 FSBs for servers. As for prices of Kentsfield, no one knows, not even the classification (Extreme Edition, value, a new class altogether... for example).
 

ElMoIsEviL

Distinguished
Did you not click the link and check out the benchmark results? It does make apps run faster... MUCH faster.

19 seconds in Cinebench at stock clocks is amazing! That was using a Kentsfield clocked at 2.4GHz. Now compare that to an Athlon 64 FX-62 overclocked to 3GHz, which gets 33.3s... HUGE performance boost.
 
Right now there isn't much use for Quad-core unless you use software that takes advantage of it.
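The point that quad core only pays off with software written for it can be illustrated with Amdahl's law: if a fraction p of a program's run time can be split across n cores and the rest stays serial, the best-case speedup is 1 / ((1 - p) + p / n). This is a sketch under that idealized model; the parallel fractions below are made-up examples, not figures for Cinebench or any particular game.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Hypothetical workloads: a mostly single-threaded app vs. a well-threaded renderer.
workloads = [("mostly serial app", 0.10), ("half-threaded app", 0.50), ("threaded renderer", 0.95)]
for name, p in workloads:
    dual = amdahl_speedup(p, 2)
    quad = amdahl_speedup(p, 4)
    print(f"{name:18s} dual: {dual:.2f}x  quad: {quad:.2f}x")
```

Under these assumptions a program that is only 10% parallel gains about 8% going to four cores, while a 95%-parallel renderer gains roughly 3.5x; real results also depend on memory traffic, which this model ignores.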

Intel is going to be hurting in memory bandwidth benchmarks; however, in real-world benchmarks using programs that take advantage of multiple cores, it will be spanking. It will spank 4x4, hands down.

4x4 is just a duct-tape way to make a quad-core processor system. If AMD were able to produce a quad core right now, you wouldn't be hearing about 4x4.

An Intel quad-core system will be much easier and cheaper to implement than a 4x4 system.

Although I do not like them gluing together two dual-core chips, Intel is doing this as a step. The next step will be a true quad-core chip. Then they'll glue two of those together for 8 cores, and then they'll make a true 8-core chip, etc...

Adding sockets onto a motherboard is NOT the answer for home users and even most gamers.
 
Did you not click the link and check out the benchmark results? It does make apps run faster... MUCH faster.

Actually, I did... and at the very least I'm glad to see Intel produce a quad-core CPU that runs well on a multithreaded benchmark... as noted:
Right now there isn't much use for Quad-core unless you use software that takes advantage of it.

But as of today: meh, who cares! I'll give props to Wusy for noting one of the more productive uses for quad core... a worthy cause and use of idle cycles...
It gets you more F@H points! :D

Don't get me wrong, I think the recent advances in CPU microarchitecture have been jumps forward in technology... but I stopped seeing real-world, everyday performance gains once procs hit 2.4GHz... I am far more impressed by PCIe than by Core Duo, AM2, and especially Kentsfield... I want Fibre Channel-equivalent I/O subsystems... start producing flash drives so they're as cheap as a 320GB SATA drive... all this processing power is just about pointless when I still have to load applications onto my machine from a CD or DVD!!!!

But that's just me...
 

cryogenic

So if, as you state, the FSB does lower memory latency, AMD doesn't do prefetch, AMD right now can't take full advantage of a superior architecture, and the reason AMD is behind the FSB right now is processor architecture, then:

1. What is superior about AMD's architecture?

2. What is AMD changing in the newer processors to address these issues?
A. Adding a prefetcher? B. Changing the processor architecture? C. More pipeline parallelism?

3. I would not claim that early adoption of DDR2 is the reason Conroe dominates. You said it yourself: the FSB is slower than AMD's superior technology.

4. You never figured out or answered my question: how can a "bottlenecked" FSB outperform AMD's "superior architecture"?
The answer is right in front of you. If I see that you try to answer it, then I will tell you what it is.

1. AMD's architecture is superior because, even with the current K8 processor, it allows higher memory throughput and lower latencies thanks to the integrated memory controller. But this isn't enough to deliver superior performance compared to Conroe, because that is simply too big a bite for K8 to swallow. K8 isn't memory-bottlenecked, and neither is C2D.

But in the case of quad core, the performance increase over dual core is 54%, as shown by the aforementioned tests on XtremeSystems. The dual-core increase over single core was 80%. This is where the FSB bottleneck begins to show up.

2. A, C, and a little of B.

3. That's exactly the reason C2D dominates: it was built to chew through high bandwidth with low latency. All the architectural improvements in C2D were for this purpose.

4. Even if AMD's K8 had used an FSB, it wouldn't have been bottlenecked by it until quad core arrived. The problem is that K8 is from the DDR era, while Core 2 Duo is a DDR2-era processor; this is where the performance difference comes from.

How can you explain Core 2 Duo's performance increase over NetBurst with the same bandwidth available?
 

ElMoIsEviL

But in the case of quad core, the performance increase over dual core is 54%, as shown by the aforementioned tests on XtremeSystems. The dual-core increase over single core was 80%. This is where the FSB bottleneck begins to show up.
Umm, no, that guy made a mistake in his calculations.

See, a single-core CPU attains 63s in Cinebench, a dual core attains 33s, and a quad core attains 19s.

You realize that you are calculating that 54% performance increase between chips with two different clock speeds?

The scaling is almost perfectly linear:

Single CPU: 63 seconds
Quad core: 19 seconds

63 / (19 × 4) = 63 / 76 ≈ 0.83

So the quad core falls only about 17% short of the theoretical maximum.

83% of theoretical max, not 54% ;)
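The arithmetic in this post can be checked with a few lines of code. Speedup is the single-core time divided by the multi-core time, and parallel efficiency is speedup divided by core count. The times (63 s, 33 s, 19 s) are the ones quoted above; the sketch assumes all three runs are directly comparable (same scene, same clock), which the thread doesn't confirm for the 33 s dual-core figure.

```python
# Cinebench render times quoted in the thread, in seconds (lower is better).
# Assumption: all three runs are comparable (same scene, same clock speed).
times = {1: 63.0, 2: 33.0, 4: 19.0}

def speedup(t_single: float, t_multi: float) -> float:
    """How many times faster the multi-core run finished."""
    return t_single / t_multi

def efficiency(t_single: float, t_multi: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(t_single, t_multi) / cores

for cores in (2, 4):
    s = speedup(times[1], times[cores])
    e = efficiency(times[1], times[cores], cores)
    print(f"{cores} cores: {s:.2f}x speedup, {e:.0%} of linear scaling")

# The step from dual to quad on its own: 33 s / 19 s.
step = speedup(times[2], times[4])
print(f"dual -> quad: {step:.2f}x ({step - 1:.0%} faster)")
```

With these inputs the four-core run reaches about 83% of linear scaling, and the dual-to-quad step on its own is about 1.74x.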
 

cryogenic

Imagine all the cores on your system using the same cache, and the cache is huge. Then add the ability to execute more than one instruction per core at a time (and there are 4 of them). If you look at the bench data, it is pretty much tied to the number of cores. And C2D did this while making it power-efficient. Now do you see the bottleneck was never the FSB?
Also think about this:
with an AMD processor, if a needed page of memory is not in the processor cache, where does the memory controller get the page from? In this situation, wouldn't you want a larger processor cache? So even though AMD's IMC is good, shouldn't you always want to get your data from the processor cache? Intel has eliminated the advantage AMD used to have by pairing a great prefetcher with a large processor cache. AMD simply needs to work on this.

Some benchmarks simply don't put any memory pressure on the processor, and if you run the same application on all 4 cores you will definitely get a lot of cache hits, so the bottleneck is practically hidden.

As always, these tests are made to show the processor in a very good light and hide any possible bottlenecks.

But when you run different memory-hungry applications on each core, I can bet my money (all of it) that Intel's quad core will be highly bottlenecked by the FSB, and AMD's 4x4 or quad core won't be!
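The disagreement here is about peak bandwidth per core when every core streams from memory at once. A back-of-the-envelope sketch: peak bus bandwidth is transfers per second times bytes per transfer (times channels), and in the worst case that peak is split evenly among the cores sharing it. The configurations are assumptions chosen for illustration (a 1066 MT/s, 64-bit FSB treated as the limiting link for the Intel quad core; dual-channel DDR2-800 behind each of two AMD 4x4 sockets), and real sustained bandwidth is always lower than these theoretical peaks.

```python
from dataclasses import dataclass

@dataclass
class MemoryPath:
    """One path to memory shared by some number of cores (all figures assumed)."""
    name: str
    transfers_per_sec_millions: float  # MT/s
    bytes_per_transfer: int            # bus width in bytes
    channels: int
    cores_sharing: int

    def peak_gb_s(self) -> float:
        # Theoretical peak: MT/s * bytes * channels gives MB/s; divide by 1000 for GB/s.
        return self.transfers_per_sec_millions * self.bytes_per_transfer * self.channels / 1000.0

    def per_core_gb_s(self) -> float:
        # Worst case: every core streams at once and the peak is split evenly.
        return self.peak_gb_s() / self.cores_sharing

# Hypothetical configurations for illustration, not measured systems.
intel_fsb = MemoryPath("shared FSB, 4 cores", 1066, 8, 1, 4)
amd_socket = MemoryPath("one 4x4 socket, 2 cores", 800, 8, 2, 2)

for path in (intel_fsb, amd_socket):
    print(f"{path.name}: {path.peak_gb_s():.2f} GB/s peak, "
          f"{path.per_core_gb_s():.2f} GB/s per core")
```

This only models the worst case the post describes, with every core streaming from memory at the same moment; it ignores latency, cache hit rates, and prefetching, which is exactly what the other side of the argument points to.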
 

spud

Distinguished
Feb 17, 2001
3,406
0
20,780
But when you run different memory-hungry applications on each core, I can bet my money (all of it) that Intel's quad core will be highly bottlenecked by the FSB, and AMD's 4x4 or quad core won't be!

Intel's prefetch technologies will counter that quite well. You obviously have little knowledge of how software is executed on the Core 2s; wait till the recompiled software starts to show up.
 

cryogenic

Intel's prefetch technologies will counter that quite well. You obviously have little knowledge of how software is executed on the Core 2s; wait till the recompiled software starts to show up.

Prefetch technologies help to hide latency, but they can't help in the case of a bandwidth bottleneck!

And I have lots of knowledge regarding the most intimate parts of a CPU from the software's point of view, starting with assembly code (32- and 64-bit) and continuing with multithreaded software design in high-level languages. It's what I do ;).

Core 2 Duo doesn't have any special or new instruction sets that will add performance gains from recompiling software. Today's software hardly ever uses SSE2, or SSE for that matter, not to mention SSE3, so if you're waiting for (non-specialized) software recompiled for Conroe, better make sure you'll live long enough to see it happen (I recommend some cryogenic sleep).
 

spud

Prefetch technologies help to hide latency, but they can't help in the case of a bandwidth bottleneck!

Bandwidth has never been an issue for the Core uArch. The Pentium Ms are a perfect example of the Core uArch being extremely efficient with memory and the available memory bandwidth, and this holds true for the current implementation in the form of the Core 2 Duo. Increasing FSB speed, and with it bandwidth, doesn't push performance up very much. That illustrates that the uArch isn't close to being bandwidth-constrained even under heavy load; the only thing I can attribute the small performance increase to is the lower latency that higher-speed DDR2 offers.
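The argument that a faster FSB barely moves performance unless the chip is bandwidth-constrained can be sketched with an Amdahl-style estimate: if only a fraction f of run time is spent waiting on bus bandwidth, raising bandwidth by a factor k shortens only that fraction. The fractions and the assumed 800 to 1066 MT/s FSB step are illustrative, not measurements of Core 2.

```python
def overall_speedup(bandwidth_bound_fraction: float, bandwidth_gain: float) -> float:
    """Whole-program speedup when only the bandwidth-bound share of run time gets faster."""
    f = bandwidth_bound_fraction
    return 1.0 / ((1.0 - f) + f / bandwidth_gain)

# Assumed: moving from an 800 MT/s to a 1066 MT/s FSB raises peak bus bandwidth by 1066/800.
gain = 1066 / 800  # about 1.33x

# Hypothetical shares of run time spent waiting on bus bandwidth.
for f in (0.05, 0.20, 0.60):
    print(f"{f:.0%} bandwidth-bound: {overall_speedup(f, gain):.3f}x overall")
```

Under these assumptions, a workload that spends only 5% of its time waiting on the bus gains about 1% from the faster FSB; the gain only approaches the full 1.33x as the bandwidth-bound fraction approaches 100%.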

Now, yes, prefetching drastically reduces FSB activity. Because the data flow is more constant and predictable, the CPU is able to properly gauge and manage FSB transactions, minimizing not only latency, as you said, but overall usage of the FSB altogether. This in turn lowers the overall bandwidth needed for normal and high-end usage of the processor.

And I have lots of knowledge regarding the most intimate parts of a CPU from the software's point of view, starting with assembly code (32- and 64-bit) and continuing with multithreaded software design in high-level languages. It's what I do ;).

Really? Then why would you even be so bold as to say:
Today's software hardly ever uses SSE2, or SSE for that matter, not to mention SSE3

A real programmer would be well aware of the superiority of SSE scalar and vectorized instructions over x87, and that's a FACT regardless of your pseudo-occupation.

Secondly, you don't need specialized software to bring SSE into your code base. It's an everyday replacement for x87, as I have stated; it is used in Windows, games, multimedia software, and productivity software. In fact, it can be used in every single software environment if the programmer is willing to take the time to work with it. Additionally, it will give a performance increase over x87 code, even if it is only 2 system ticks.

so if you're waiting for (non-specialized) software recompiled for Conroe, better make sure you'll live long enough to see it happen (I recommend some cryogenic sleep).

How about we wait for Unreal 3, Quake Wars, and Vanguard, to name a few games that will be compiled with the Core uArch in mind? I also did not state anywhere that there was a new instruction set; the Core uArch is radically different from the NetBurst uArch. There is a great deal of software out there coded and compiled with NetBurst's longer pipeline, weaker ALU performance, smaller cache sizes, and excessive SSE optimizations in mind.

That software gets a quick recompile and a few days of debugging, and if everything goes well you will have a code base better tuned for the processor.