INQ says rev. A1 K8L ES Demo close

spanishfleee

Distinguished
Jul 7, 2006
111
0
18,680
I'm hoping it beats Intel back to the stone age, because this will mean even more competition (if that's possible) and cheap CPUs for us all :D
 

Pippero

Distinguished
May 26, 2006
594
0
18,980
Hmm, according to the specs, i find it hard to believe.
At least in single threaded performance, i'd expect them to be quite evenly matched.
Maybe as a quad core with shared L3 cache it will perform significantly better on heavily multithreaded applications against Kentsfield.
But as of now, it seems that Intel will enjoy a healthy clock speed advantage, and this will sadly widen when Intel transitions to 45nm.
That is, unless AMD has somehow revised the K8L pipeline for more clock speed headroom, but nothing has been announced concerning that :/
 

BaronMatrix

Splendid
Dec 14, 2005
6,655
0
25,790
Hmm, according to the specs, i find it hard to believe.
At least in single threaded performance, i'd expect them to be quite evenly matched.
Maybe as a quad core with shared L3 cache it will perform significantly better on heavily multithreaded applications against Kentsfield.
But as of now, it seems that Intel will enjoy a healthy clock speed advantage, and this will sadly widen when Intel transitions to 45nm.
That is, unless AMD has somehow revised the K8L pipeline for more clock speed headroom, but nothing has been announced concerning that :/

On this point I have to disagree. Pipeline efficiency is as important as, or more important than, the # of issues. Look at the comparison between PD and X2: both are 3-issue, but PD needs 500+ MHz more to be on par.

This says that if AMD can increase the efficiency of the decoders with preschedulers (or the like), they will overtake Intel EASILY. Even Core 2 is limited to around 2.5-3.5 IPC on average. If AMD can get higher peak numbers by adding bandwidth and buffer size to the L1, integer will be a piece of cake without a 4th issue. By using superscalar techniques it is possible to execute 2 loads and retires per cycle of FP or SSE3.

That has the potential to once again "poke Intel in the eye and beat them with a stick" (quote from Rahul Sood of VooDooPC).

I just hope that it's faster than K8, then I have an upgrade path for my soon to be purchased 4x4 system.

P.S.

Anybody want a decked out 4400+ with 4GB RAM and 7800GT?
:p
 

GiDDY_SOUL

Distinguished
Apr 11, 2006
80
0
18,630
I like how the L3 cache works in K8L. It's neither inclusive nor exclusive!

I would like to know the results of K8L's small cache (L2+L3 = 4MB) vs Intel's huge quad-core cache (8MB).

How does Intel's server L3 work? Inclusive or exclusive?
 

Julian33

Distinguished
Jun 23, 2006
214
0
18,680
AMD has never been too dependent upon cache, so I can't see the extra cache on its own having a huge impact (maybe a 5% improvement in some situations, based on the difference between the 512 KB and 1 MB X2s).

We will just have to wait and see, I guess; it shouldn't be too long now before we get some first benchies. I just hope for the sake of the consumer that AMD can become competitive again at the high end.
 

ElMoIsEviL

Distinguished
Hmm, according to the specs, i find it hard to believe.
At least in single threaded performance, i'd expect them to be quite evenly matched.
Maybe as a quad core with shared L3 cache it will perform significantly better on heavily multithreaded applications against Kentsfield.
But as of now, it seems that Intel will enjoy a healthy clock speed advantage, and this will sadly widen when Intel transitions to 45nm.
That is, unless AMD has somehow revised the K8L pipeline for more clock speed headroom, but nothing has been announced concerning that :/

On this point I have to disagree. Pipeline efficiency is as important as, or more important than, the # of issues. Look at the comparison between PD and X2: both are 3-issue, but PD needs 500+ MHz more to be on par.

This says that if AMD can increase the efficiency of the decoders with preschedulers (or the like), they will overtake Intel EASILY. Even Core 2 is limited to around 2.5-3.5 IPC on average. If AMD can get higher peak numbers by adding bandwidth and buffer size to the L1, integer will be a piece of cake without a 4th issue. By using superscalar techniques it is possible to execute 2 loads and retires per cycle of FP or SSE3.

That has the potential to once again "poke Intel in the eye and beat them with a stick" (quote from Rahul Sood of VooDooPC).

I just hope that it's faster than K8, then I have an upgrade path for my soon to be purchased 4x4 system.

P.S.

Anybody want a decked out 4400+ with 4GB RAM and 7800GT?
:p

Rahul is a salesperson... not an engineer. And Intel's C2D can issue up to 5 IPC by using macro-op fusion (fusing two simple instructions into a single instruction).

AMD will need more than 3 complex decoders to overtake Intel on the IPC front. But there are other C2D bottlenecks that AMD can take advantage of.
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
Rahul is a salesperson... not an engineer. And Intel's C2D can issue up to 5 IPC by using macro-op fusion (fusing two simple instructions into a single instruction).

AMD will need more than 3 complex decoders to overtake Intel on the IPC front. But there are other C2D bottlenecks that AMD can take advantage of.
And there always will be, on both fronts; the race is now open over who designs the most complex core. I'm very excited waiting for the results. However, it won't impress like Core 2 did; I don't see it more than 15% ahead in the best case.
1st, because its development started before C2D and it was not designed to counter it.
2nd, their manufacturing process is still holding them back from unleashing the full potential of their architectures.
 

GiDDY_SOUL

Distinguished
Apr 11, 2006
80
0
18,630
AMD never said anything about the performance of K8L. I have mostly heard about the power efficiency of K8L!

Makes me think K8L's performance won't be too high. It may be only 10%-20% better than Conroe.
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
AMD never said anything about the performance of K8L. I have mostly heard about the power efficiency of K8L!

Makes me think K8L's performance won't be too high. It may be only 10%-20% better than Conroe.
8O Core 2 Duo is a formidable chip, and beating it by 20% would maybe be beyond many expectations; it means that K8L would outperform its mother arch, K8, by more than 55%, and both of these figures are more than respectable.
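As a rough check of that arithmetic, here is a minimal Python sketch. It assumes (this number is not stated in the thread) that Core 2 is about 30% faster than K8 on average; relative gains multiply rather than add, so a 20% lead over Core 2 compounds into a lead of more than 55% over K8. The function name and the 30% figure are illustrative only.

```python
def compound_gain(*gains):
    """Combine relative gains (0.20 means +20%) multiplicatively."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


# Assumption, not from the thread: Core 2 is roughly 30% faster than K8.
CORE2_OVER_K8 = 0.30
# The figure discussed in the post: K8L 20% ahead of Core 2.
K8L_OVER_CORE2 = 0.20

k8l_over_k8 = compound_gain(CORE2_OVER_K8, K8L_OVER_CORE2)
print(f"K8L over K8: {k8l_over_k8:.0%}")
```

With a different assumed Core 2 lead, the combined figure moves accordingly; the point is only that the two percentages compound.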
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
55% is quite a thing to achieve. Is it even possible? That's like jumping from the 486 to the P3. In one year.
That is what I am saying; even 15% would be a miracle IMO. It's not one year, however; K8 started around 2004. Going by the K7-to-K8 transition, that gave no such improvement, but we will see. K7 to K8 was more about x64 and HTT than direct IPC improvement, so for now we can only guess about K8L until it's out.
 

Pippero

Distinguished
May 26, 2006
594
0
18,980
On this point I have to disagree. Pipeline efficiency is as important as, or more important than, the # of issues. Look at the comparison between PD and X2: both are 3-issue, but PD needs 500+ MHz more to be on par.
This says that if AMD can increase the efficiency of the decoders with preschedulers (or the like), they will overtake Intel EASILY. Even Core 2 is limited to around 2.5-3.5 IPC on average.
Agreed on this; in fact, I'm not really considering this to be a main issue.
Most people are overestimating the impact of the issue rate on Conroe's performance.
I have no data concerning the current IPC of Core 2 / K8, but if Core 2 could really do 3.5 IPC, then of course K8L would never be able to match it with just a 3-issue design; however, I believe the actual number should be far from it.
Unless we're talking here about IPC under ideal conditions.
Just because a CPU can issue 4 or 5 instructions per clock does not mean that it is also able to process and retire that many under most common conditions. (In fact, it can only dispatch and retire 4 instructions per clock; macro-op fusion is a trick to treat a compare and a jump as a single instruction, which logically makes sense, since conditional branches are very common and the jump instruction itself does not really need to be executed in an ALU.)
Especially when a CPU has to access main memory, up to a hundred clocks are wasted sitting idle, or a dozen in case of a branch misprediction; in a way, having a high issue rate helps in dealing with memory latency and branch mispredictions, because you quickly refill the buffers/schedulers.
But to achieve a really higher IPC, having a wide front end (issue, decode, dispatch, schedule) is not enough; you also need a wider back end (execution units, ALUs).
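To put rough numbers on why peak issue width does not translate directly into sustained IPC, here is a minimal Python sketch of a simple additive stall model. The penalties (about 100 clocks for a main memory access, about a dozen for a mispredicted branch) come from the paragraph above; the event rates and the function name are made-up assumptions, and the model ignores the overlap that out-of-order execution provides, so treat it as an illustration, not a measurement.

```python
def effective_ipc(peak_ipc, mem_miss_rate, mem_penalty=100,
                  mispredict_rate=0.0, mispredict_penalty=12):
    """Estimate sustained IPC from a simple additive stall model.

    Rates are events per instruction; penalties are in clock cycles.
    """
    base_cpi = 1.0 / peak_ipc  # cycles per instruction with no stalls
    stall_cpi = (mem_miss_rate * mem_penalty
                 + mispredict_rate * mispredict_penalty)
    return 1.0 / (base_cpi + stall_cpi)


# Hypothetical workload: 2 main-memory accesses and 5 mispredicted
# branches per 1000 instructions (made-up rates, for illustration only).
MEM_MISS_RATE = 0.002
MISPREDICT_RATE = 0.005

three_wide = effective_ipc(3.0, MEM_MISS_RATE, mispredict_rate=MISPREDICT_RATE)
four_wide = effective_ipc(4.0, MEM_MISS_RATE, mispredict_rate=MISPREDICT_RATE)

print(f"3-wide: {three_wide:.2f} IPC, 4-wide: {four_wide:.2f} IPC")
print(f"gain from the wider design: {four_wide / three_wide - 1:.0%}")
```

Under these assumed rates, the 4-wide design gains noticeably less than the 33% its extra width suggests, because the stall cycles are the same for both designs.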
If we look at the K8 and Core architectures (AnandTech), we can notice the following:
* Core has 2 integer ALUs + 1 branch/integer ALU, K8 has 3 general purpose integer ALUs
* Core has 2 address generation units, K8 has 3; this can give
* Both have the same number of floating point units, 2, however, Core has a huge advantage due to the enhanced SSE engine, which can process 2x 128bit ALU instructions per clock (yes it can also do SSE loads/stores in parallel to that)
Now K8L will have SSE processing power similar to Core's, and it seems to have a slight advantage in terms of buffers, schedulers and integer units.
It is also still unclear whether K8L's OOO loads will match the reordering flexibility of Core's memory disambiguation; however, K8L should enjoy lower latency thanks to the integrated memory controller and 3 levels of cache.
Of course there are still too few details available about K8L's architecture, but from what we can see now, the 2 architectures should perform very similarly, at least on single-threaded applications.
I'd be really surprised to see K8L, given this data, outperform Core 2 by a margin higher than 10% clock for clock; I see it as more likely that both CPUs will be within a 5% delta, and it's even possible that Core will still be the better performer.
The area where Core still seems to have the advantage, though, is clock speed, where it's not unreasonable to predict a 20% advantage for Intel.
AMD will have a better platform though, and the 4 cores will have much better interconnects, so on multithreaded applications, I guess K8L will be very competitive.
This is all wild speculation though. :)
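For anyone who wants to play with these guesses, here is a small Python sketch. It assumes single-threaded performance scales as IPC times clock speed (a simplification that ignores memory-bound workloads) and uses the figures from this post: a per-clock delta between -5% and +10% for K8L, and a 20% clock speed advantage for Intel. The function name is illustrative.

```python
def relative_performance(ipc_ratio, clock_ratio):
    """Performance of chip A relative to chip B, assuming perf = IPC * clock."""
    return ipc_ratio * clock_ratio


# From the post: a 20% clock speed advantage for Intel, so K8L runs at
# 1 / 1.20 of Core 2's clock.
INTEL_CLOCK_ADVANTAGE = 0.20
k8l_clock_ratio = 1.0 / (1.0 + INTEL_CLOCK_ADVANTAGE)

# Per-clock deltas discussed in the post (K8L relative to Core 2).
for ipc_delta in (-0.05, 0.0, 0.05, 0.10):
    perf = relative_performance(1.0 + ipc_delta, k8l_clock_ratio)
    print(f"K8L IPC {ipc_delta:+.0%} vs Core 2 -> K8L at {perf:.1%} of Core 2")
```

Under this simple model, K8L would need roughly a 20% per-clock lead just to break even against a 20% clock deficit.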
 

BaronMatrix

Splendid
Dec 14, 2005
6,655
0
25,790
55% is quite a thing to achieve. Is it even possible? That's like jumping from the 486 to the P3. In one year.

Do you remember the argument we had about 60-80% according to AMD? They were speaking of perf/watt, but just cutting wattage won't do it alone.

I'm confident that the additions will total 60-80% over K8. I will still be interested to see the first A1 numbers. I would guess they will get to 20-30% higher.
 

BaronMatrix

Splendid
Dec 14, 2005
6,655
0
25,790
I'd be really surprised to see K8L, given this data, outperform Core 2 by a margin higher than 10% clock for clock; i see it more likely to have both CPUs within a 5% delta, and it's even possible that Core will still be the better performer.
The area where Core seems to still have the advantage, though is clock speed, where it's not unreasonable to predict a 20% advantage for Intel.
AMD will have a better platform though, and the 4 cores will have much better interconnects, so on multithreaded applications, I guess K8L will be very competitive.

I agree with most of your statements except the perf delta. The hardest thing to determine is what effect just adding two cores will have vs. all of the enhancements.

Looking at Kentsfield gives, I believe, a good starting point for the increase over two cores. Supposedly the laptop chip will debut first, so some DTRs may give a hint as to what the per-core increase will be with the changes.

It will be really difficult to compare going from dual core Opteron to Barcelona.
 

samxxxii

Distinguished
May 16, 2006
24
0
18,510
I know that I do not often post, but every night I read everybody's posts. This time I want to ask you, Jack, to correct the following statement if I am wrong: the terms A0, A1, B0, B1, etc. are the terms that Intel uses for its chip revisions, and AMD uses another kind of code...!?
If what I'm saying is correct, then why does the Inquirer say that there is an A0 and that the A1 is coming soon...? So I'm questioning the real value of their statement....

Thank You

Cheers

Sam

P.S.: I am aware that my English is not so good, so I apologize! :D
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
Your English is fine.

You are correct, Intel uses major/minor revision labels to describe their steppings. The stepping count increases, but the most informative label is the A0, A1, etc.

AMD uses a slightly more convoluted method for revision labels, but major stepping revisions will increment the stepping counter.

cpuidhe2.jpg


The stepping increments when there is a physical change in the mask die field. The revision for Intel is typically a 2-character block: the first character is a letter and the second a number. An increase in the letter is a major revision. How Intel categorizes a revision as major I am not sure, but as an example, take a P4 CPU in one revision; adding the logic to turn on Hyper-Threading in the next is likely a major revision.

Some revisions address interactions with process conditions, which allows slightly higher clocking or lower power states (for example, the P4D -- Presler revision to D0 helped power consumption); others fix errata or are a very slight shrink to reduce die size or transistor count (cost reasons).

AMD's revision labeling typically follows a two-block designation, and I have not researched what each character increment would represent.

However, in the INQ article they are referring to the second revision of Barcelona as A1 (A0 is the initial revision). This is not uncommon; if the processor is CPUID'ed, it would not necessarily show this type of naming.
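As an illustration of the letter-plus-number convention described above, here is a minimal Python sketch that splits a label such as A0 or A1 into its major (letter) and minor (number) parts. The function names are hypothetical, and the sketch only handles the simple two-character form; it does not read anything from CPUID.

```python
import re


def parse_stepping(label):
    """Split a label like 'A1' into (major letter, minor number)."""
    match = re.fullmatch(r"([A-Z])(\d)", label.strip().upper())
    if match is None:
        raise ValueError(f"not a letter+digit stepping label: {label!r}")
    return match.group(1), int(match.group(2))


def is_major_change(old, new):
    """True when the letter differs, i.e. a major revision by this convention."""
    return parse_stepping(old)[0] != parse_stepping(new)[0]


# A0 -> A1 is a minor revision; A1 -> B0 would be a major one.
print(is_major_change("A0", "A1"))
print(is_major_change("A1", "B0"))

# Labels sort chronologically when ordered by (letter, number).
print(sorted(["B0", "A1", "A0"], key=parse_stepping))
```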

Jack

For AMD, a change of the letter in the revision name already means a change of code name (for models with the same cache size, of course).

For example:
Athlon64 Single Core (512KB Cache):
Rev. C: Newcastle
Rev. D: Winchester
Rev. E: Venice
 

endyen

Splendid
I can provide some data which I will dig up that puts the Netburst IPC between 1.7 and 2.0
I would love to see that data. I have believed for some time that the P4s had an IPC of about 2/3 of the AMD XP chips. That would mean that the XPs were close to 3 IPC.
It was a long time ago, but I also seem to remember that the P3s had an IPC close to 1. When the P4 came out, most people pegged its IPC at between .6 and .78.
AFAIK wait is still the #1 state.
 

Pippero

Distinguished
May 26, 2006
594
0
18,980
Hmm, IPC varies wildly with application type and even with the user data fed to the application, so providing a kind of global "typical" IPC for a CPU is like trying to provide a single global performance index: you can't.
However, 0.9 to 1.2 for Netburst makes more sense to me than 1.7-2.0; otherwise that would really mean that K8 was already close to 3.0. I do believe such IPC values are possible, but only on small, highly optimized computational loops, not general applications working on real datasets.
Also, 1.7-2.0 for Netburst means, since Netburst can execute 2 threads in parallel, that the P4 was close to an effective IPC of 3.4-4.0, which it would not accomplish even in its wildest wet dreams :lol:
The only things that Netburst could bake at 4x clock were integer additions and bitwise logical operations... anything else (even a simple shift, until Prescott), and it would start to crap.
 

Julian33

Distinguished
Jun 23, 2006
214
0
18,680
I think you're probably right. Even if AMD beats Intel in terms of IPC, we know that Intel can just ramp the clock up to compensate. We know from the P4 that they could get back up to 3.8 or even 4GHz now, whereas it is doubtful AMD could get much past 3GHz based on current overclocking results of the K8, and this doesn't seem likely to change based on the rumoured 2.7-2.9GHz clock speeds of Altair.

To be honest, I see it as being unlikely that AMD will be able to retake the overall performance crown because of this. What is more likely is that they can do what they have always done and make the most of their strengths: a higher IPC at a lower clock speed may well allow them to retake the lead in terms of performance per watt. This being the case, I can see AMD managing to hold on to its server market. I think that until K10, which isn't going to be out until Nehalem at the earliest, AMD is not going to retake the high-end enthusiast market, unless 4x4 has some major surprise we are not yet aware of. Interestingly, AMD is even now still managing to hold on to the lead at the low end, thanks to the Core architecture not having filtered down yet. By the time these parts become available, AMD could have K8L-based budget parts out. Having said this, I think it could just be possible that K8 remains competitive with the new Celerons. We have no idea what the halving of cache will do to the performance of Core 2, and this could end up being its Achilles heel. All we can reasonably say, based on the diminishing returns of more cache, is that the difference will be at least as big as between the 2 MB and 4 MB Conroes, given that the new Celerons will have 512 KB of dedicated cache.

Just my thoughts, I appreciate however they are mostly speculation at this point :)