
INQ says rev. A1 K8L ES Demo close

October 27, 2006 5:15:21 PM


October 27, 2006 5:50:23 PM

very interesting, hopefully we see some stronger evidence soon.

Quote:
but give it 2-3 weeks before we know for sure
I doubt it very much
October 27, 2006 5:55:36 PM

Well, i really hope this is true.
Especially if they really wanna get it to the market for H2 '07...
October 27, 2006 6:35:36 PM

AMD said they will demo before December.

I wonder how it will perform.
October 27, 2006 6:53:16 PM

I'm hoping it beats Intel back to the stone age, because this will mean even more competition (if that's possible) and cheap CPUs for us all :D
October 27, 2006 7:02:50 PM

Hmm, according to the specs, i find it hard to believe.
At least in single threaded performance, i'd expect them to be quite evenly matched.
Maybe as a quad core with shared L3 cache it will perform significantly better on heavily multithreaded applications against Kentsfield.
But as of now, it seems that Intel will enjoy a healthy clock speed advantage, and this will sadly widen when Intel transitions to 45nm.
That is, unless AMD has somehow revised the K8L pipeline for more clock speed headroom, but nothing has been announced concerning that :/ 
October 27, 2006 7:59:04 PM

Quote:
Hmm, according to the specs, i find it hard to believe.
At least in single threaded performance, i'd expect them to be quite evenly matched.
Maybe as a quad core with shared L3 cache it will perform significantly better on heavily multithreaded applications against Kentsfield.
But as of now, it seems that Intel will enjoy a healthy clock speed advantage, and this will sadly widen when Intel transitions to 45nm.
That is, unless AMD has somehow revised the K8L pipeline for more clock speed headroom, but nothing has been announced concerning that :/ 


On this point I have to disagree. Pipeline efficiency is as important as, or more important than, the number of issue slots. Look at the comparison between PD and X2. They are both 3-issue, but PD needs 500+ MHz more to be on par.

This says that if AMD can increase the efficiency of the decoders with preschedulers (or the like), they will overtake Intel EASILY. Even Core 2 is limited to around 2.5-3.5 IPC on average. If AMD can get higher peak numbers by adding bandwidth and buffer size to the L1, integer will be a piece of cake without a 4th issue. By using superscalar techniques it is possible to execute 2 loads and retires per cycle of FP or SSE3.

That has the potential to once again "poke Intel in the eye and beat them with a stick" (quote from Rahul Sood of VooDooPC).

I just hope that it's faster than K8, then I have an upgrade path for my soon to be purchased 4x4 system.

P.S.

Anybody want a decked out 4400+ with 4GB RAM and 7800GT?
:p 
October 27, 2006 8:43:32 PM

I liked the workings of the L3 cache in K8L. It's neither inclusive nor exclusive!

I would like to know the results of K8L's small cache (L2+L3=4MB) vs Intel's quad-core huge cache (8MB).

How does intel's server L3 work? inclusive or exclusive?
October 27, 2006 9:04:25 PM

AMD has never been too dependent upon cache, so I can't see the extra cache on its own having a huge impact (maybe a 5% improvement in some situations, based on the difference between the 512K and 1M X2s).

We will just have to wait and see I guess, it shouldn't be too long now before we get some first benchies. I just hope for the sake of the consumer that AMD can become competitive again at the high end.
October 27, 2006 9:20:42 PM

Quote:
Hmm, according to the specs, i find it hard to believe.
At least in single threaded performance, i'd expect them to be quite evenly matched.
Maybe as a quad core with shared L3 cache it will perform significantly better on heavily multithreaded applications against Kentsfield.
But as of now, it seems that Intel will enjoy a healthy clock speed advantage, and this will sadly widen when Intel transitions to 45nm.
That is, unless AMD has somehow revised the K8L pipeline for more clock speed headroom, but nothing has been announced concerning that :/ 


On this point I have to disagree. Pipeline efficiency is as important as, or more important than, the number of issue slots. Look at the comparison between PD and X2. They are both 3-issue, but PD needs 500+ MHz more to be on par.

This says that if AMD can increase the efficiency of the decoders with preschedulers (or the like), they will overtake Intel EASILY. Even Core 2 is limited to around 2.5-3.5 IPC on average. If AMD can get higher peak numbers by adding bandwidth and buffer size to the L1, integer will be a piece of cake without a 4th issue. By using superscalar techniques it is possible to execute 2 loads and retires per cycle of FP or SSE3.

That has the potential to once again "poke Intel in the eye and beat them with a stick" (quote from Rahul Sood of VooDooPC).

I just hope that it's faster than K8, then I have an upgrade path for my soon to be purchased 4x4 system.

P.S.

Anybody want a decked out 4400+ with 4GB RAM and 7800GT?
:p 

Rahul is a sales person... not an engineer. And Intel's C2D can issue up to 5 IPC by using Micro Ops Fusion (fusing two simple instructions into a single instruction).

AMD will need more than 3 complex decoders to overtake Intel on the IPC front. But there are other C2D bottlenecks that AMD can take advantage of.
October 27, 2006 9:34:54 PM

Quote:
Rahul is a sales person... not an engineer. And Intel's C2D can issue up to 5 IPC by using Micro Ops Fusion (fusing two simple instructions into a single instruction).

AMD will need more than 3 complex decoders to overtake Intel on the IPC front. But there are other C2D bottlenecks that AMD can take advantage of.

And there always will be, on both fronts; the race is now open over who designs the most complex core. I'm very excited waiting for the results. However, it won't impress like Core 2 did; I don't see it more than 15% ahead in the best case.
1st, because its development started before C2D and it was not designed to counter it.
2nd, their manufacturing process is still holding them back from unleashing the full potential of their architectures.
October 27, 2006 9:39:13 PM

AMD never said anything about the performance of K8L. I've mostly heard about the power efficiency of K8L!

Makes me think K8L's performance won't be too high. It may be only 10%-20% better than Conroe.
October 27, 2006 9:46:55 PM

Quote:
AMD never said anything about the performance of K8L. I've mostly heard about the power efficiency of K8L!

Makes me think K8L's performance won't be too high. It may be only 10%-20% better than Conroe.

8O Core 2 Duo is a formidable chip, and beating it by 20% would be beyond many expectations; it means that K8L would outperform its mother arch, K8, by more than 55%, and both of these figures are more than respectable.
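
For anyone who wants to check how those two figures hang together, here is a minimal sketch. It assumes the percentage leads simply multiply (a simplification, since real gains vary by workload), and the function name is made up for illustration.

```python
def implied_lead(total_gain: float, gain_over_rival: float) -> float:
    """Return the rival's implied lead over the baseline chip.

    If the new chip is `total_gain` faster than the baseline and
    `gain_over_rival` faster than the rival, the rival must be
    (1 + total_gain) / (1 + gain_over_rival) - 1 faster than the baseline.
    Gains are fractions, e.g. 0.20 for 20%.
    """
    return (1 + total_gain) / (1 + gain_over_rival) - 1


# Figures from the post: K8L 55% over K8 and 20% over Core 2.
core2_over_k8 = implied_lead(0.55, 0.20)
print(f"Implied Core 2 lead over K8: {core2_over_k8:.1%}")
```

Under that assumption, the two numbers in the post imply Core 2 sitting a bit under 30% ahead of K8.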
October 27, 2006 9:49:14 PM

55% is quite a thing to achieve. Is it even possible? thats like jumping from the 486 to the P3. In one year.
October 27, 2006 9:50:56 PM

Quote:

Makes me think K8L's performance won't be too high. It may be only 10%-20% better than Conroe.


I would be glad to see that much performance gain over conroe, sounds like a huge leap from K8
October 27, 2006 9:55:17 PM

Quote:
55% is quite a thing to achieve. Is it even possible? thats like jumping from the 486 to the P3. In one year.

That is what I am saying; even 15% would be a miracle IMO. It's not one year, however; K8 started around 2004. Taking the K7-K8 transition as an analogy, it gave no such improvement, but we will see. K7 to K8 was more about x64 and HTT than direct IPC improvement, so for now we can only guess about K8L until it's out.
October 27, 2006 9:57:23 PM

How much was the improvement from K6 to K8? That might give us some sort of idea of the performance increase we might see.
October 27, 2006 11:17:28 PM

Quote:

On this point I have to disagree. Pipeline efficiency is as important as, or more important than, the number of issue slots. Look at the comparison between PD and X2. They are both 3-issue, but PD needs 500+ MHz more to be on par.
This says that if AMD can increase the efficiency of the decoders with preschedulers (or the like), they will overtake Intel EASILY. Even Core 2 is limited to around 2.5-3.5 IPC on average.

Agreed, on this, in fact i'm not really considering this to be a main issue.
Most people are overestimating the impact of the issue rate on Conroe's performance.
I have no data concerning the current IPC of Core 2 / K8, but if Core 2 could really do 3.5 IPC, then of course K8L would never be able to match it with just a 3-issue design; however, I believe the actual number should be far from it.
Unless we're talking here about IPC under ideal conditions.
Just because a CPU can issue 4/5 instructions per clock (but in fact, it can only dispatch and retire 4 instructions per clock, the macro op fusion is a trick to treat a compare and a jump as a single instruction, which logically makes sense, since it's very common to use conditional branches, and the jump instruction itself does not need to be really executed into an ALU), it does not mean that it is also able to process and retire that many under most common conditions.
Especially when a CPU has to access main memory, up to a hundred clocks are wasted sitting idle, or a dozen in case of branch misprediction; in a way, having a high issue rate helps dealing with memory latency issues and branch mispredictions, because you quickly refill the buffers/schedulers.
But to achieve a really higher IPC, having a wide front end (issue, decode, dispatch, schedule) is not enough, you need to have also a wider back end (execution units, ALUs).
If we look at K8 and Core architectures (anandtech)
we can notice the following:
* Core has 2 integer ALUs + 1 branch/integer ALU, K8 has 3 general purpose integer ALUs
* Core has 2 address generation units, K8 has 3; this can give
* Both have the same number of floating point units, 2, however, Core has a huge advantage due to the enhanced SSE engine, which can process 2x 128bit ALU instructions per clock (yes it can also do SSE loads/stores in parallel to that)
Now K8L will have a similar SSE processing power as Core, and it seems to have a slight advantage in terms of buffers and schedulers and integer units.
It is also still unclear whether K8L's OOO loads will match the reordering flexibility of Core's memory disambiguation; however K8L should enjoy the lower latency thanks to the integrated memory controller, and 3 levels of cache.
Of course there are still too few details available about K8L's architecture, but from what we can see now, the 2 architectures should perform very similarly, at least on single threaded applications.
I'd be really surprised to see K8L, given this data, outperform Core 2 by a margin higher than 10% clock for clock; i see it more likely to have both CPUs within a 5% delta, and it's even possible that Core will still be the better performer.
The area where Core seems to still have the advantage, though is clock speed, where it's not unreasonable to predict a 20% advantage for Intel.
AMD will have a better platform though, and the 4 cores much better interconnects, so on multithreaded applications, i guess K8L will be very competitive.
This is all wild speculation though. :) 
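
To put rough numbers on the stall argument above, here is a toy model, not a simulator. It assumes stalls simply add to cycles per instruction; the miss and mispredict rates are hypothetical values picked for illustration, and only the penalties (about a hundred clocks for main memory, about a dozen for a mispredicted branch) come from the post.

```python
def effective_ipc(peak_ipc: float,
                  miss_rate: float, miss_penalty: float,
                  mispredict_rate: float, mispredict_penalty: float) -> float:
    """Estimate average IPC from peak IPC plus stall cycles.

    Rates are events per instruction, penalties are clocks per event.
    Cycles per instruction = ideal CPI + stall CPI, and IPC = 1 / CPI.
    """
    ideal_cpi = 1.0 / peak_ipc
    stall_cpi = miss_rate * miss_penalty + mispredict_rate * mispredict_penalty
    return 1.0 / (ideal_cpi + stall_cpi)


# Hypothetical workload: 0.5% of instructions miss to main memory,
# 1% are mispredicted branches.
for width in (3, 4):
    ipc = effective_ipc(width, 0.005, 100, 0.01, 12)
    print(f"{width}-issue: effective IPC ~ {ipc:.2f}")
```

With these made-up rates, going from 3-issue to 4-issue raises the peak by a third but the modelled average by less than 10%, which is the point about the front end not being the whole story.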
October 27, 2006 11:35:16 PM

Quote:
55% is quite a thing to achieve. Is it even possible? thats like jumping from the 486 to the P3. In one year.


Do you remember the argument we had about 60-80% according to AMD? They were speaking of perf/watt, but just cutting wattage won't do it alone.

I'm confident that the additions will total 60-80% over K8. I will still be interested to see the first A1 numbers. I would guess they will get to 20-30% higher.
October 27, 2006 11:52:09 PM

Quote:
I'd be really surprised to see K8L, given this data, outperform Core 2 by a margin higher than 10% clock for clock; i see it more likely to have both CPUs within a 5% delta, and it's even possible that Core will still be the better performer.
The area where Core seems to still have the advantage, though is clock speed, where it's not unreasonable to predict a 20% advantage for Intel.
AMD will have a better platform though, and the 4 cores much better interconnects, so on multithreaded applications, i guess K8L will be very competitive.


I agree with most of your statements except the perf delta. The hardest thing to determine is what effect just adding two cores vs all of the enhancements will have.

Looking at Kentsfield gives, I believe, a good starting point for increases over two cores. Supposedly the laptop chip will debut first, so some DTRs may give a hint as to what the increase will be per core with the changes.

It will be really difficult to compare going from dual core Opteron to Barcelona.
October 28, 2006 5:19:14 AM

I know that I do not often post, but every night I read everybody's posts. This time I want to ask you, Jack, to correct me if I am wrong: are the terms A0, A1, B0, B1, etc. the ones Intel uses for its chip revisions, while AMD uses another kind of code?
If what I'm saying is correct, then why does the Inquirer say that there is an A0 and that the A1 is coming soon? It makes me question the real value of their statement.

Thank You

Cheers

Sam

P.S.: I am aware that my English is not so good, so I apologize! :D
October 28, 2006 6:03:19 AM

Quote:
Your English is fine.

You are correct, Intel uses major/minor revision labels to describe their steppings. The stepping count increases, but the most informative label is the A0, A1, etc.

AMD uses a slightly more convoluted method for revision labels, but major stepping revisions will increment the stepping counter.



The stepping increments when there is a physical change in the mask die field. The revision for Intel is typically a 2-character block: the first character is a letter and the second a number. An increase in the letter is a major revision. How Intel categorizes a revision as major I am not sure, but a good example: take a P4 CPU in one revision, then adding the logic to turn on hyperthreading in the next is likely a major revision.

Some revisions address interactions with process conditions, which allows slightly higher clocking or lower power states (for example the P4D -- Presler revision to D0 helped power consumption); others fix errata or will be a very slight shrink to reduce die size or transistor count (cost reasons).

AMD's revision labeling typically follows a two-block designation, and I have not researched how/what each character increment would represent.

However, in the INQ article they are referring to the second revision of Barcelona as A1 (A0 is the initial revision). This is not uncommon; if the processor is CPUID'ed, it would not necessarily show this type of naming.

Jack


For AMD, a change of the letter in the revision name is already a change in code-name (of course for same cache models).

For example:
Athlon64 Single Core (512KB Cache):
Rev. C: Newcastle
Rev. D: Winchester
Rev. E: Venice
October 28, 2006 7:45:20 AM

Quote:
I can provide some data which I will dig up that puts the Netburst IPC between 1.7 and 2.0

I would love to see that data. I have believed for some time that the P4s had an IPC of about 2/3rds of the AMD XP chips. That would mean that the XPs were close to 3 IPC.
It was a long time ago, but I also seem to remember that the P3s had an IPC close to 1. When the P4 came out, most people pegged its IPC at between .6 and .78.
AFAIK wait is still the #1 state.
October 28, 2006 8:51:11 AM

Hmm, IPC varies wildly with application type and even with the user data fed to the application, so providing a kind of global "typical" IPC for a CPU is like trying to provide a global single performance index - you can't.
However, 0.9 to 1.2 for Netburst makes more sense to me than 1.7-2.0, otherwise that would really mean that K8 was already close to 3.0; I do believe such IPC values are possible, but only on small highly optimized computational loops, not general applications working on real datasets.
Also, a 1.7-2.0 for Netburst means, since Netburst can execute 2 threads in parallel, that the P4 was close to an effective IPC of 3.4-4.0, which it would not accomplish even in its wildest wet dreams :lol: 
The only thing that Netburst could bake at 4x clock were integer additions and bitwise logical operations.. anything else (even a simple shift, until Prescott), and it would start to crap.
October 28, 2006 9:42:45 AM

I think you're probably right. Even if AMD beats Intel in terms of IPC, we know that Intel can just ramp the clock up to compensate. We know from the P4 that they could get back up to 3.8 or even 4GHz now, whereas it is doubtful AMD could get much past 3GHz based on current overclocking results of the K8, and this doesn't seem likely to change based on the rumoured 2.7-2.9GHz clock speeds of Altair.

To be honest, I see it as being unlikely that AMD will be able to retake the overall performance crown because of this. What is more likely is that they can do what they have always done and make the most of their strengths - a higher IPC at a lower clock speed may well allow them to retake the lead in terms of performance per watt. This being the case, I can see AMD managing to hold on to its server market. I think that until K10, which isn't going to be out until Nehalem at the earliest, AMD is not going to retake the high end enthusiast market, unless 4x4 has some major surprise we are not yet aware of. Interestingly, AMD is even now still managing to hold on to the lead at the low end thanks to the Core architecture not filtering down yet. By the time these parts become available, AMD could have K8L based budget parts out. Having said this, I think it could just be possible that K8 could remain competitive with the new Celerons - we have no idea what the halving of cache will do to the performance of Core 2, and this could end up being its Achilles heel. All we can say reasonably, based on the diminishing returns of more cache, is that the difference will be at least as big as between the 2 and 4 meg Conroes, based on the fact that the new Celerons will have 512k of dedicated cache.

Just my thoughts, I appreciate however they are mostly speculation at this point :) 
October 28, 2006 9:45:47 AM

Quote:
55% is quite a thing to achieve. Is it even possible? thats like jumping from the 486 to the P3. In one year.


Do you remember the argument we had about 60-80% according to AMD? They were speaking of perf/watt, but just cutting wattage won't do it alone.

I'm confident that the additions will total 60-80% over K8. I will still be interested to see the first A1 numbers. I would guess they will get to 20-30% higher.

Can you actually quantify 60-80% improvement in total?

Does this include the packaging being 5% better? The name being a small improvement?

What a load of fud.
October 28, 2006 10:12:06 AM

Quote:
55% is quite a thing to achieve. Is it even possible? thats like jumping from the 486 to the P3. In one year.


Or like jumping from Pentium D to Core 2?

I'd hazard a guess that K8 was regarded as ~20% faster than Netburst, and we know Core 2 is regarded as at least that much faster than K8, proving that differences in performance in that order of magnitude are possible.

Of course, K8L is a revision, and not a new mArch. We will have to wait for K10 for that....
October 28, 2006 1:06:53 PM

Quote:
Agreed, on this, in fact i'm not really considering this to be a main issue.
Most people are overestimating the impact of the issue rate on Conroe's performance.
I have no data concerning the current IPC of Core 2 / K8, but if Core 2 could really do 3.5 IPC, then of course K8L would never be able to match it with just a 3-issue design; however, I believe the actual number should be far from it.
Unless we're talking here about IPC under ideal conditions.
Just because a CPU can issue 4/5 instructions per clock (but in fact, it can only dispatch and retire 4 instructions per clock, the macro op fusion is a trick to treat a compare and a jump as a single instruction, which logically makes sense, since it's very common to use conditional branches, and the jump instruction itself does not need to be really executed into an ALU), it does not mean that it is also able to process and retire that many under most common conditions.
Especially when a CPU has to access main memory, up to a hundred clocks are wasted sitting idle, or a dozen in case of branch misprediction; in a way, having a high issue rate helps dealing with memory latency issues and branch mispredictions, because you quickly refill the buffers/schedulers.
But to achieve a really higher IPC, having a wide front end (issue, decode, dispatch, schedule) is not enough, you need to have also a wider back end (execution units, ALUs).
If we look at K8 and Core architectures (anandtech)
we can notice the following:
* Core has 2 integer ALUs + 1 branch/integer ALU, K8 has 3 general purpose integer ALUs
* Core has 2 address generation units, K8 has 3; this can give
* Both have the same number of floating point units, 2, however, Core has a huge advantage due to the enhanced SSE engine, which can process 2x 128bit ALU instructions per clock (yes it can also do SSE loads/stores in parallel to that)
Now K8L will have a similar SSE processing power as Core, and it seems to have a slight advantage in terms of buffers and schedulers and integer units.
It is also still unclear whether K8L's OOO loads will match the reordering flexibility of Core's memory disambiguation; however K8L should enjoy the lower latency thanks to the integrated memory controller, and 3 levels of cache.
Of course there are still too few details available about K8L's architecture, but from what we can see now, the 2 architectures should perform very similarly, at least on single threaded applications.
I'd be really surprised to see K8L, given this data, outperform Core 2 by a margin higher than 10% clock for clock; i see it more likely to have both CPUs within a 5% delta, and it's even possible that Core will still be the better performer.
The area where Core seems to still have the advantage, though is clock speed, where it's not unreasonable to predict a 20% advantage for Intel.
AMD will have a better platform though, and the 4 cores much better interconnects, so on multithreaded applications, i guess K8L will be very competitive.
This is all wild speculation though. :) 

Nice explanation. I agree with everything you said; you just forgot that the 45nm shrink of Core 2 will have a better SSE engine, the caches will be larger, and other architectural improvements are also possible (reduced latencies, prefetchers, branching, etc.). If I take this into account, and also that K8L will arrive at the same time as the 45nm Core 2 shrink, I doubt that K8L (according to the data available today) will match C2 clock for clock.
October 28, 2006 1:31:08 PM

Quote:
55% is quite a thing to achieve. Is it even possible? thats like jumping from the 486 to the P3. In one year.

It just takes the old Tom's charts; there are also some Athlon XPs or Semprons (socket A) there, and it won't be hard to compare them with an equally cached and tuned A64. However, as I said, the advantages of K8 over K7 were more due to HTT, AMD64, added SSE2 and SSE3, cache latencies and the introduction of an IMC. The processing units themselves did not get a thorough review, even though the pipeline stretched from 10 to 12 stages.
All this said, we could guess wrong if we estimate the outcome of K8L, because it has radical bus and register widenings.
October 28, 2006 1:44:01 PM

Quote:
55% is quite a thing to achieve. Is it even possible? thats like jumping from the 486 to the P3. In one year.


Do you remember the argument we had about 60-80% according to AMD? They were speaking of perf/watt, but just cutting wattage won't do it alone.

I'm confident that the additions will total 60-80% over K8. I will still be interested to see the first A1 numbers. I would guess they will get to 20-30% higher.

Can you actually quantify 60-80% improvement in total?

Does this include the packaging being 5% better? The name being a small improvement?

What a load of fud.


Sure I can. Go to Ars Technica. Or you can go to ExtremeTech. AMD reported that they expect 60% at least.

superscalar improvements for retires (IPC-enhanced core)
2x128b loads
128b L1 data
128b L1 instr
2x128b FP
2x128b SSE3
shared L3
exclusive L1
exclusive L2

This DigitLife article shows that K8 (not Barcelona) is already a lot more efficient when L2 is filled.
Linkage!


Jack posted a link to a university which shows that it's possible to increase IPC without increasing decoders by adding FIFO preschedulers.


All in all, the demo scheduled for Dec will tell what can be expected from final silicon. I'm confident that Dirk Meyer's team will do the job and improve over K8 by their expected 60%.

Why would we think that adding TWO cores BY ITSELF WOULDN'T GIVE THEM a 40-60% improvement over Opteron?

We haven't even gotten to the improvements that Extreme Tech mentions.
October 28, 2006 2:45:14 PM

Quote:
Why would we think that adding TWO cores BY ITSELF WOULDN'T GIVE THEM a 40-60% improvement over Opteron?

But it's obvious that you would expect to see at least a 60% improvement from 2 to 4 cores in suitably multithreaded applications. This means nothing though, because that's just like saying Kentsfield has a similar "improvement" over Conroe.
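
As a rough illustration of what "suitably multithreaded" has to mean for a 60% gain, here is a sketch using Amdahl's law. That model is brought in here as an assumption; nobody in the thread cites it, and it ignores memory bandwidth and interconnect effects. The function names are made up for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def gain_from_doubling(parallel_fraction: float, cores_before: int = 2) -> float:
    """Fractional gain from doubling the core count (2 -> 4 by default)."""
    before = amdahl_speedup(parallel_fraction, cores_before)
    after = amdahl_speedup(parallel_fraction, 2 * cores_before)
    return after / before - 1.0


# How parallel must a workload be for dual -> quad to give +60%?
for p in (0.50, 0.75, 6 / 7, 0.95):
    print(f"parallel fraction {p:.2f}: +{gain_from_doubling(p):.0%} from 2 to 4 cores")
```

Under this model the workload needs to be about 86% parallel (6/7) before doubling from two to four cores yields the 60% figure, and the same arithmetic applies to Kentsfield over Conroe.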
October 28, 2006 2:49:01 PM

Morning all.
Let's not have a simple question spark any kind of flames here. Keep it chill, everyone.

With that, I'm just going to wait and see, and if I have the want, test out K8L myself.
October 28, 2006 3:51:49 PM

Good morning Ninja, it is evening here :) 
I need some more BS before I go into flaming mode. Do you have your popcorn ready? :wink:
October 28, 2006 3:57:13 PM

I'm not sure about the Mediterranean, but over here we prefer sugary cereal when watching something entertaining in the morning. Such as cartoons, or flaming....

Therefore, I've broken out this:


Have at it.
October 28, 2006 4:18:32 PM

They do look tasty, in an artificially flavoured and coloured sort of way... :D 
October 28, 2006 4:46:59 PM

Quote:
They do look tasty, in an artificially flavoured and coloured sort of way... :D 
flaming is better.
October 28, 2006 4:58:58 PM

Quote:
Why would we think that adding TWO cores BY ITSELF WOULDN'T GIVE THEM a 40-60% improvement over Opteron?

But it's obvious that you would expect to see at least a 60% improvement from 2 to 4 cores in suitably multithreaded applications. This means nothing though, because that's just like saying Kentsfield has a similar "improvement" over Conroe.

That's exactly the point. Barcelona is designed for SQL, Oracle, Exchange, etc which will take as many cores as you can give them.
October 28, 2006 5:15:26 PM

Quote:
Barcelona is designed for SQL, Oracle, Exchange, etc which will take as many cores as you can give them.

OMG....... :roll:
Nostradamus.........For the good of this forum, please don't start! :x

P.S. flames are being prepared :wink:
October 28, 2006 5:20:28 PM

Hey Ninja, I checked out those old charts (Tom's does not have them anymore, but I had saved some PNGs). A few figures:
Multitasking1: Athlon XP 3200+ (2.2G) 4:19 vs A64 3500+ (2.2G) 3:12 > +34%
3DMax rendering: AXP 3200+ 2:01 vs A64 3500+ 1:43 > +17.4%
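
For reference, those percentages can be reproduced from the run times with a few lines. This sketch assumes the figures are the ratio of old time to new time minus one (a speedup) rather than the fraction of time saved; the helper names are just for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' benchmark time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_percent(old_time: str, new_time: str) -> float:
    """Percent speedup of the newer chip: old time / new time - 1."""
    return (to_seconds(old_time) / to_seconds(new_time) - 1) * 100


# Times from the saved charts quoted in the post.
print(f"Multitasking1:   +{speedup_percent('4:19', '3:12'):.1f}%")
print(f"3DMax rendering: +{speedup_percent('2:01', '1:43'):.1f}%")
```

Computed this way the results come out at about +34.9% and +17.5%, which matches the quoted +34% and +17.4% if those were truncated rather than rounded.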
October 28, 2006 5:33:44 PM

Interesting. I'll look to see if I have the link somewhere..
October 28, 2006 5:53:23 PM

Well, it's already difficult to predict K8L Vs Core 2, now it's too early to speculate about the next revision of Core 2 as well ;) 
Now, in terms of foreseeing the future, it would also be interesting to know if and when AMD plans to introduce what is unofficially known as "K10".
This (anandtech) is the last technology roadmap that I've seen from AMD, and we can clearly see (in the last slide) that a "Next Generation Core" is due in 2007, which we now know is K8L (whether it is truly a next generation is highly debatable), and a "Core Update" is scheduled for 2008.
Now, since in common language a "Next Generation Core" is a bigger step than a "Core Update", we can expect only stuff like SSE4 and a die shrink or some other minor tweaking for 2008.
So the question is, when are we going to see K10? (if at all?)
Right now, it seems at earliest in 2009.
My impression is that AMD is focusing too much on platform, scalability and throughput, with faster interconnects and coprocessors (for example), which probably makes a lot of sense for servers and enterprise applications, but the risk is to bury themselves into a niche market.
I honestly do not see the mainstream/desktop market needing more than 4 cores in the foreseeable future.
October 28, 2006 6:43:12 PM

Quote:

Jack posted a link to a university which shows that it's possible to increase IPC without increasing decoders by adding FIFO preschedulers.

Wait, here i believe we have to clarify a couple of concepts concerning IPC.
One thing is maximum IPC or peak IPC, which is the maximum number of instructions which can be executed in parallel, from decode to retirement.
Such a number cannot be increased without increasing the number of decoders, or without doing predecode smart tricks like instruction fusion; if you do not decode and issue more than, say, 3 instructions, then your IPC will never be higher than 3.
In this sense, the K8 core can decode and retire at most 3 instructions per clock; Core 2 can effectively decode and retire up to 4 instructions, but in a special case, when the predecode logic detects a compare followed by a jump instruction, it merges them into a "compareAndJump" instruction and then sends it to a decoder.
However, maximum IPC is not such an important metric, and an example of this is Prescott, which had a maximum IPC of 4 (link).
What is really meaningful is the real average IPC which can be measured during the execution of a program, which will always be lower than the theoretical maximum, and it depends on pipeline stalls due to control hazards (branch misprediction), data hazards (dependencies among instructions, which make it impossible to execute them in parallel, cause one generates results which have to be used by the others) and resource hazards (if all ALUs are already busy processing other instructions, if the cache cannot provide more data per clock, if there is a load miss, etc).
As such, the "real world" average IPC varies widely depending on the instruction mix of a program, and also on the actual data set that the program is processing, beside the hw complexity of the CPU which executes it.
Any kind of schedulers (and i guess also your pre-schedulers) can only increase the real world IPC by preventing some of those conflicts to occur, but scheduling alone cannot increase the maximum theoretical IPC.
That said, I don't believe the number of decoders will be a real limiting factor for K8L compared to Core 2 (as I already mentioned, the same situation already existed against Prescott); since the back end (or execution engine) of both CPUs is rather similar, I'd expect them to have very similar IPC under most conditions.
The wider decoding bandwidth comes in handy after a load miss, for example, because then your internal buffers are empty after the pipelines have waited for data from memory for hundreds of clocks, or after a mispredicted branch, when you have to cancel 12 decoded instructions that were already in flight.
However, K8L should have lower memory latency thanks to the IMC, and it seems that the branch predictor is also going to improve.
In the end, I still expect the two CPUs to perform very closely in terms of IPC; the biggest differences seem to be clock speed, where Intel has a clear advantage, and platform architecture and bandwidth, where AMD has the advantage.
But a 20% clock speed advantage is hard to overcome, at least for desktop/mainstream applications; I'm quite confident AMD could do very well on server/workstation multithreaded workloads instead.
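
To put rough numbers on the argument above, here is a minimal Python sketch. It assumes a simple model where real IPC is the inverse of ideal cycles per instruction plus stall cycles per instruction, and where performance is IPC times clock. The stall figure (0.5 cycles per instruction), the issue widths and the 2.5/3.0 GHz clocks are made-up illustration values, not measurements of any real CPU.

```python
def effective_ipc(peak_ipc, stall_cycles_per_instr):
    """Average IPC after adding stall cycles to the ideal cycles per instruction."""
    ideal_cpi = 1.0 / peak_ipc          # best case: the machine issues at full width
    return 1.0 / (ideal_cpi + stall_cycles_per_instr)


def performance(ipc, clock_ghz):
    """Throughput in billions of instructions per second (IPC times GHz)."""
    return ipc * clock_ghz


# Hypothetical inputs: same stall behaviour, different peak issue widths.
narrow = effective_ipc(peak_ipc=3, stall_cycles_per_instr=0.5)
wide = effective_ipc(peak_ipc=4, stall_cycles_per_instr=0.5)
ipc_gain = wide / narrow - 1.0          # much smaller than the 33% peak gain

# Equal real IPC, but one chip clocked 20% higher (2.5 vs 3.0 GHz, assumed).
slow = performance(narrow, clock_ghz=2.5)
fast = performance(narrow, clock_ghz=3.0)
clock_gain = fast / slow - 1.0

print(f"real IPC gain from wider issue: {ipc_gain:.1%}")
print(f"gain from 20% higher clock:     {clock_gain:.1%}")
```

Under these assumed numbers, going from 3 to 4 peak IPC buys much less real IPC than the peak figures suggest, because stalls dominate, while a clock advantage carries through in full.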
October 28, 2006 10:21:05 PM

Quote:
Barcelona is designed for SQL, Oracle, Exchange, etc which will take as many cores as you can give them.

OMG....... :roll:
Nostradamus.........For the good of this forum, please don't start! :x

P.S. flames are being prepared :wink:
:)  Don't be too rough.... I read somewhere that AMD stated they were optimizing the design for two general application markets: multimedia and scientific/compute-intensive. Not sure where I read that; I will try to find the link. I don't recall AMD making a statement about SQL, Oracle, or Exchange specifically.


Firstly, everyone here should know that one of AMD's strong points is TPC. This is SQL/Oracle. MS uses Opteron to build nearly all server apps, including Windows. Oracle recently presented with AMD and Dell.

Exchange is again an MS server app. They are all being pushed to x64 after Vista. They have already been optimized for multicore.









I mean after all, what runs on servers? The things I mentioned. AMD is not creating Barcelona for Oblivion.
October 29, 2006 12:10:49 AM

Quote:
Baron, what you posted was marketing speak for the commercial space; if they had a bunch of gamers in the room they would be touting Oblivion, Quake 4 and such as being their strength.

All I said was I did not see AMD specify that they were optimizing the architecture for specific companies and databases during any technical presentation for K8L. It goes without saying, AMD is pushing more strength into server space with K8L --- DT will not benefit as much.

Hence, all the 'bandwidth' improvements as server apps are the most bandwidth hungry.

Jack



That's the problem with you guys, you read emotionally. A person who read that with no emotion would understand that Opteron Barcelona is a server chip and SQL, Oracle and Exchange are server apps.


Of course they do additional work to make sure these things run as efficiently as possible to their "max capability," but the design is, as K8 Opteron was, specifically targeting servers.

By handling the high-usage high end first, it becomes easier to scale the specs down to create the desktop and mobile parts.

The desktop and mobile variants are going to be dual core in this case.

Can we at least agree that physics dictates close to a 40% increase from adding the two cores and beefing up the XBar?

If the other changes add up to 20% altogether, that brings a minimum of 60%, depending on the effect of adding the two cores and the efficiency of the improvements.
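
As a quick check of that arithmetic, here is a small Python sketch that combines the two figures from this post (40% and 20%, which are estimates from the post, not measurements) in two ways: simply added, and compounded. Compounding assumes the gains are independent and that each applies to the whole workload, which is not guaranteed.

```python
def additive(gains):
    """Combine fractional gains by simple addition."""
    return sum(gains)


def compounded(gains):
    """Combine fractional gains multiplicatively (each scales the previous result)."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


# Figures taken from the post: 40% from cores and XBar, 20% from everything else.
gains = [0.40, 0.20]

print(f"added:      {additive(gains):.0%}")
print(f"compounded: {compounded(gains):.0%}")
```

If the two improvements really were independent, the compounded figure (1.4 times 1.2) would be a little higher than the simple sum.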

I can point to the Prescott article linked by someone higher up that discusses the superscalar nature of Prescott, though it was more than likely hindered by the extended pipeline. Barcelona should remain at 14 stages, so misses won't cost nearly as much, and K8's predictor is hitting 95%.
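
The pipeline-depth point can be sketched in Python with a very rough model: stall cycles per instruction from mispredicts are the branch fraction, times the mispredict rate, times the flush penalty. The 14-stage and 95% figures come from this post and 31 stages is Prescott's depth; the 20% branch fraction is a made-up value, and treating the penalty as equal to the full pipeline depth is a simplifying assumption (real flush penalties differ).

```python
def mispredict_cpi(branch_fraction, accuracy, penalty_cycles):
    """Extra cycles per instruction lost to branch mispredictions."""
    return branch_fraction * (1.0 - accuracy) * penalty_cycles


BRANCH_FRACTION = 0.20   # hypothetical share of instructions that are branches
ACCURACY = 0.95          # predictor hit rate quoted in the post

# Assumption: the flush penalty is roughly the pipeline depth in stages.
short_pipe = mispredict_cpi(BRANCH_FRACTION, ACCURACY, penalty_cycles=14)
long_pipe = mispredict_cpi(BRANCH_FRACTION, ACCURACY, penalty_cycles=31)

print(f"14-stage pipeline: {short_pipe:.2f} stall cycles per instruction")
print(f"31-stage pipeline: {long_pipe:.2f} stall cycles per instruction")
```

With the same predictor accuracy, the deeper pipeline loses a bit more than twice as many cycles per instruction to mispredicts in this model.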
October 29, 2006 12:54:35 AM

Quote:
Baron, what you posted was marketing speak for the commercial space; if they had a bunch of gamers in the room they would be touting Oblivion, Quake 4 and such as being their strength.

All I said was I did not see AMD specify that they were optimizing the architecture for specific companies and databases during any technical presentation for K8L. It goes without saying, AMD is pushing more strength into server space with K8L --- DT will not benefit as much.

Hence, all the 'bandwidth' improvements as server apps are the most bandwidth hungry.

Jack

In this light, merging with ATi makes even more sense because they can make their own server chipsets (more dedicated and easier than desktop chipsets). But they can't afford to let their DT market collapse, so what the heck are they going to do?! How do you think K8L will perform on the desktop? If it does not show some superiority, I bet Intel is going to sit on Core 2 for much longer than its charts predict.
October 29, 2006 1:12:06 AM

For those of you too impatient to wait for benchmarks of Barcelona to be released, there ARE some leaked benchmarks for FP performance floating around!!! (Pun intended) Good luck finding them!! :lol:  :lol: 
October 29, 2006 1:46:19 AM

Quote:
For those of you too impatient to wait for benchmarks of Barcelona to be released, there ARE some leaked benchmarks for FP performance floating around!!! (Pun intended) Good luck finding them!! :lol:  :lol: 


Impatient is enough.

As for Barcelona, are you talking about Gaudi's Sacred Family or is it the UAB, where a lot of AMD's work is being done?

Good luck to you too... floating around.


Cheers!
October 29, 2006 1:59:35 AM

Quote:

If all goes well, Nehalem will be released in 2008; that will be a new architecture, it is just that the changes will not be quite as many or as large as Netburst to Core.


Netburst to Core is irrelevant. Interesting question is P6 vs Core2 ;) 

Mirek
October 29, 2006 3:11:31 AM

There is a lot more of Netburst in Conroe than most people see.
Its pipes are halfway between the two.
Netburst really put pressure on the process crew. That is why they can supply the quality of silicon that Core relies on.
Where would Core be without Netburst's prefetch? Is the cache from P6?
What other route would have given Intel higher IPC on longer pipelines?
Conroe really is the best of both.
October 29, 2006 8:16:03 AM

Quote:


I don't think you will see a revolutionary change in Intel architecture in 2008 with nehalem, but who knows.

I agree.
The point is that there's no definition for a "new architecture" nowadays.
Intel has always significantly updated their cores, with the introduction of new instructions, larger caches, and even more important redesigns, like Hyperthreading (which IMO was a great idea and something that should be brought back for a 6 scalar CPU, for example).
And if we look at Prescott vs. the original Netburst, well, if Prescott is not a new uArchitecture, then I really don't know what a new uArchitecture is:
- huge pipeline redesign (20 to 31 stages)
- improved branch prediction
- doubled L1 cache
- L2 cache from 512 KB to 1 MB and ultimately 2 MB
- better integer execution units: a barrel shifter and a dedicated integer multiplier (finally!)
- 4 uOps issued instead of 3
- SSE3
- (later) EM64T
- (later) VT virtualization
So IMO, this claim of a "new core" every 2 years must be taken with a grain of salt.