On this point I have to disagree. Pipeline efficiency is as important as, or more important than, the number of issues per clock. Look at the comparison between PD and X2: they are both 3-issue, but PD needs 500+ MHz more to be on par.
This says that if AMD can increase the efficiency of the decoders with preschedulers (or the like), they will overtake Intel EASILY. Even Core 2 is limited to around 2.5-3.5 IPC on average.
Agreed on this; in fact, I'm not really considering it a main issue.
Most people are overestimating the impact of the issue rate on Conroe's performance.
I have no data on the actual IPC of Core 2 or K8, but if Core 2 could really sustain 3.5 IPC, then of course K8L would never be able to match it with just a 3-issue design; however, I believe the real number is far lower.
Unless we're talking about IPC under ideal conditions.
Just because a CPU can issue 4 or 5 instructions per clock, it does not mean it can also process and retire that many under most common conditions. (In fact, Core can only dispatch and retire 4 instructions per clock; macro-op fusion is a trick that treats a compare and a jump as a single instruction. That makes sense logically, since conditional branches are very common and the jump itself does not really need to be executed in an ALU.)
Especially when a CPU has to access main memory, up to a hundred clocks are wasted sitting idle, or about a dozen in the case of a branch misprediction. In a way, a high issue rate helps with memory latency and branch mispredictions, because you refill the buffers/schedulers quickly.
But to achieve a really higher IPC, a wide front end (issue, decode, dispatch, schedule) is not enough; you also need a wider back end (execution units, ALUs).
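To put rough numbers on this, here is a minimal sketch of the usual "ideal CPI plus stall cycles" estimate. The 100-clock and 12-clock penalties come from the figures above; the miss and mispredict rates (0.005 per instruction) and the function name are purely hypothetical, and the model ignores the fact that a wider issue stage refills the buffers faster after a stall.

```python
def effective_ipc(peak_ipc, mem_misses_per_instr, mispredicts_per_instr,
                  mem_penalty=100, mispredict_penalty=12):
    """Average IPC once stall cycles are added to the ideal cycles per instruction.

    All rates are per instruction; penalties are in clocks.
    The default penalties are the rough figures from the post, not measurements.
    """
    cpi = (1.0 / peak_ipc
           + mem_misses_per_instr * mem_penalty
           + mispredicts_per_instr * mispredict_penalty)
    return 1.0 / cpi


if __name__ == "__main__":
    # Hypothetical workload: 5 main-memory accesses and 5 mispredicts per 1000 instructions.
    for width in (3, 4):
        ipc = effective_ipc(width, 0.005, 0.005)
        print(f"{width}-issue peak -> effective IPC of about {ipc:.2f}")
```

With these made-up rates, going from a 3-wide to a 4-wide peak raises the peak by a third but the effective IPC by only about a tenth, because the stall terms dominate.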
If we look at the K8 and Core architectures (anandtech), we can notice the following:
* Core has 2 integer ALUs + 1 branch/integer ALU, K8 has 3 general purpose integer ALUs
* Core has 2 address generation units, K8 has 3; this can give
* Both have the same number of floating point units, 2; however, Core has a huge advantage thanks to its enhanced SSE engine, which can process two 128-bit ALU instructions per clock (and yes, it can also do SSE loads/stores in parallel with that)
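The front end vs. back end point can be sketched as a simple bottleneck bound using the unit counts listed above. This is only a toy model under loud assumptions: every instruction occupies exactly one unit for one clock, memory operations are charged to the AGUs, the issue widths are taken as 4 (Core) and 3 (K8), and the two instruction mixes are invented for illustration.

```python
def sustainable_ipc(issue_width, units, mix):
    """Upper bound on IPC: the narrowest of the front end and each unit class.

    units: number of execution units per class, e.g. {"int": 3, "agu": 2, "fp": 2}
    mix:   fraction of instructions going to each class (should sum to 1)
    """
    bounds = [issue_width]
    for kind, fraction in mix.items():
        if fraction > 0:
            # A class with N units can absorb at most N / fraction instructions per clock.
            bounds.append(units[kind] / fraction)
    return min(bounds)


# Unit counts from the list above; issue widths as discussed in the thread.
CORE = {"issue_width": 4, "units": {"int": 3, "agu": 2, "fp": 2}}
K8 = {"issue_width": 3, "units": {"int": 3, "agu": 3, "fp": 2}}

# Hypothetical instruction mixes, not measured data.
ALU_HEAVY = {"int": 0.5, "agu": 0.35, "fp": 0.15}
MEM_HEAVY = {"int": 0.3, "agu": 0.6, "fp": 0.1}

if __name__ == "__main__":
    for name, cpu in (("Core", CORE), ("K8", K8)):
        for label, mix in (("ALU-heavy", ALU_HEAVY), ("memory-heavy", MEM_HEAVY)):
            bound = sustainable_ipc(cpu["issue_width"], cpu["units"], mix)
            print(f"{name}, {label} mix: at most {bound:.2f} IPC")
```

In this toy model Core's extra issue slot only pays off when the mix does not lean on the 2 AGUs; on the memory-heavy mix the back end caps it well below 4.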
Now K8L will have SSE processing power similar to Core's, and it seems to have a slight advantage in terms of buffers, schedulers and integer units.
It is also still unclear whether K8L's out-of-order (OOO) loads will match the reordering flexibility of Core's memory disambiguation; however, K8L should enjoy lower latency thanks to the integrated memory controller and 3 levels of cache.
Of course, there are still too few details available about K8L's architecture, but from what we can see now, the 2 architectures should perform very similarly, at least on single-threaded applications.
Given this data, I'd be really surprised to see K8L outperform Core 2 by a margin higher than 10% clock for clock; I think it's more likely that both CPUs will be within a 5% delta, and it's even possible that Core will still be the better performer.
The area where Core still seems to have the advantage, though, is clock speed, where it's not unreasonable to predict a 20% advantage for Intel.
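As a quick sanity check on what those percentages would mean together, here is a small sketch that treats single-threaded performance as IPC times clock. The IPC deltas and the 20% clock gap are just the guesses from above, and the function name is made up for illustration.

```python
def relative_performance(ipc_ratio, clock_ratio):
    """Performance of chip A relative to chip B, modelled as IPC x clock."""
    return ipc_ratio * clock_ratio


if __name__ == "__main__":
    # K8L clocked 20% lower than Core, i.e. a clock ratio of 1 / 1.20.
    clock_ratio = 1 / 1.20
    for ipc_ratio in (0.95, 1.00, 1.05, 1.10):
        perf = relative_performance(ipc_ratio, clock_ratio)
        print(f"K8L IPC at {ipc_ratio:.2f}x Core -> overall {perf:.3f}x Core")
```

Under this simple model, even a 10% clock-for-clock lead would not make up for a 20% clock deficit in single-threaded work.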
AMD will have a better platform though, with much better interconnects between the 4 cores, so on multithreaded applications I guess K8L will be very competitive.
This is all wild speculation though.