
Realistic look at K8L vs Clovertown

January 25, 2007 8:01:39 PM

Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.

Best I can tell from the info out there, both Conroe and Barcelona will, on average, be able to issue 4 instructions per clock cycle. They differ in how they buffer and issue these instructions, and I don't know enough to even guess which mechanism will work better in the real world on real code.

Barcelona will be able to perform two 128-bit SSE loads per clock, versus only one for Conroe, which should give Team Green a bit of an edge on the floating-point side, while Intel's faster/better prefetch mechanisms will keep it ahead on integer code.
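To put the load-port point in rough numbers, here is a minimal Python sketch. It assumes a loop bound purely by 128-bit load throughput (no cache misses, no dependencies), which real code rarely is; `load_cycles` is a hypothetical helper, and the two-versus-one loads-per-clock figures are simply the ones claimed in the post.

```python
import math


def load_cycles(num_loads: int, loads_per_clock: int) -> int:
    """Idealized minimum cycles to issue num_loads 128-bit loads.

    Assumption: load ports are the only bottleneck (no misses, no
    dependencies, no other pipeline limits).
    """
    if loads_per_clock < 1:
        raise ValueError("loads_per_clock must be at least 1")
    return math.ceil(num_loads / loads_per_clock)


# Hypothetical kernel: 1,000 independent 128-bit SSE loads.
n = 1000
print(load_cycles(n, 2))  # two loads per clock, as claimed for Barcelona
print(load_cycles(n, 1))  # one load per clock, as claimed for Conroe
```

Under these assumptions the second port halves the load-bound time (500 vs 1000 cycles); in practice the gain only shows up on code that is actually load-port limited.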

Predictions:

1 - The fastest K8L chip will be very close to the fastest Conroe chip (within 5%)

2 - When the 45nm Core 2 Duo chips hit, Intel retakes the lead by roughly the advantage of one process generation.
January 25, 2007 8:38:03 PM

Yep. Barcelona has a 32-byte fetch but remains 3-issue elsewhere.
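A wider fetch only helps if the stages behind it can keep up. The toy Python model below is an assumption-laden sketch, not a description of either real front end: each cycle it appends a fixed number of bytes to a buffer, and the decoder takes up to `decode_width` whole instructions. The 32-byte/3-wide and 16-byte/4-wide configurations and the uniform 4-byte instruction length are illustrative picks based on figures mentioned in this thread.

```python
from typing import List


def simulate_front_end(lengths: List[int], fetch_bytes: int, decode_width: int) -> int:
    """Return cycles needed to fetch and decode every instruction in `lengths`.

    Toy assumptions: unaligned fetch into an unbounded buffer, no branches,
    no predecode limits; the decoder consumes up to `decode_width` whole
    instructions per cycle that are already fully in the buffer.
    """
    if fetch_bytes < 1 or decode_width < 1:
        raise ValueError("fetch_bytes and decode_width must be positive")
    total = sum(lengths)
    fetched = 0   # bytes fetched so far
    buffered = 0  # bytes fetched but not yet decoded
    i = 0         # index of the next instruction to decode
    cycles = 0
    while i < len(lengths):
        cycles += 1
        take = min(fetch_bytes, total - fetched)
        fetched += take
        buffered += take
        decoded = 0
        while i < len(lengths) and decoded < decode_width and lengths[i] <= buffered:
            buffered -= lengths[i]
            i += 1
            decoded += 1
    return cycles


program = [4] * 30  # hypothetical: 30 instructions of 4 bytes each
print(simulate_front_end(program, 32, 3))  # wide fetch, 3-wide decode
print(simulate_front_end(program, 16, 4))  # narrower fetch, 4-wide decode
```

With these made-up inputs the 32-byte, 3-wide configuration is decode-bound at 10 cycles while the 16-byte, 4-wide one finishes in 8; real code with variable-length instructions and branches will behave differently.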
January 25, 2007 8:40:39 PM

Quote:
Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.



Do you have some kind of split personality?
January 25, 2007 8:51:01 PM

Quote:
Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.



Do you have some kind of split personality?


I like it when people try to pigeonhole me... then I come at it from a whole different direction and weird them out. It's fun. I even got called an Intelliot today.
January 25, 2007 8:53:17 PM

Baron..... two words for you.


Fandango mango.



Oh, and apparently you treasure Intel over ATMs' ability to dispense money.


Also: have there been any plans by Intel to increase clock speed when it switches to 45nm production?
January 25, 2007 8:53:24 PM

Quote:
Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.



Do you have some kind of split personality?

Roger that. One minute he's predicting AMD's doom and the next he's hailing Barcelona. [/shrugs]
January 25, 2007 8:54:43 PM

:lol: 
January 25, 2007 9:23:09 PM

Quote:
Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.



Do you have some kind of split personality?
Makes you wonder don't it.
January 25, 2007 9:25:36 PM

Quote:

Also: have there been any plans by Intel to increase clock speed when it switches to 45nm production?


That's the whole purpose of the die shrink.
January 25, 2007 9:36:48 PM

PopeGold X maybe?
January 25, 2007 10:03:56 PM

No, it's not. If it were, AMD would be faster with their new chips.


The other purposes are
--> Cheaper
--> Less heat
-->... and faster if possible
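For the "cheaper" point, the usual back-of-the-envelope argument is that an ideal shrink scales die area by the square of the feature-size ratio, so more dies fit on each wafer. A rough Python sketch, assuming a perfect shrink (real ones fall short of this) and ignoring edge loss, scribe lines, and yield; the 200 mm² die and 300 mm wafer are hypothetical numbers.

```python
import math


def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Die area after an ideal shrink: linear size scales by new/old, area by its square."""
    return area_mm2 * (new_nm / old_nm) ** 2


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Crude upper bound: wafer area divided by die area (no edge loss or yield)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area // die_area_mm2)


old_area = 200.0  # hypothetical 90nm die, in mm^2
new_area = ideal_shrink_area(old_area, 90, 65)
print(round(new_area, 1))
print(gross_dies_per_wafer(old_area), gross_dies_per_wafer(new_area))
```

Under these assumptions the 90nm to 65nm step roughly halves the die (about 104 mm²) and nearly doubles gross dies per wafer (353 to 677), which is where most of the cost saving comes from.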
January 25, 2007 11:05:11 PM

I think you're way off. There are a lot of things about the architecture that haven't been disclosed, so you really cannot predict this. It's an interesting prediction, but I can't support it without real backing. And if that is your only basis for the prediction, I don't know what to tell you except that your method of prediction is flawed. So, let's wait and see.
January 25, 2007 11:09:23 PM

BI-POLAR
January 25, 2007 11:12:19 PM

Quote:
I think you're way off. There are a lot of things about the architecture that haven't been disclosed, so you really cannot predict this. It's an interesting prediction, but I can't support it without real backing. And if that is your only basis for the prediction, I don't know what to tell you except that your method of prediction is flawed. So, let's wait and see.



The biggest key to Barcelona is that it was DESIGNED for AMD's 65nm process. Brisbane is the second shrink of a 130nm architecture (perhaps why there won't be a Brisbane Opteron). Maybe Barcelona's numbers are living up to AMD's strained-SOI "dual stress liner" tech, which is supposed to switch much faster.
January 25, 2007 11:35:31 PM

Barcelona has a few advantages over Clovertown which will probably give it a modest IPC advantage.

The SSE execution units have been widened to 128 bits, so 128-bit SSE vector instructions no longer need to be split into two 64-bit halves, as K8 does.

K8 is "one and a half pumped" (three half-width SSE executions per two clock cycles), so assuming AMD can keep this 3/2 ratio at 128 bits and decode 2 instructions per clock, then theoretically Barcelona will execute 128-bit SSE macro-ops at a rate of 3 instructions per cycle. That's a lot of "ifs", but based on AMD-originated hype, they may have pulled it off.
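The benefit of widening the SSE units from 64 to 128 bits can be sketched with a simple throughput model: a 128-bit op needs two passes through a 64-bit unit but only one through a 128-bit unit. This Python sketch assumes fully independent ops and a hypothetical count of 2 units; it does not model K8's or Barcelona's actual scheduling, latencies, or the "one and a half pumped" ratio described above.

```python
import math


def sse_cycles(num_ops: int, unit_width_bits: int, num_units: int,
               op_width_bits: int = 128) -> int:
    """Idealized cycles to execute num_ops independent SSE ops.

    Each op is split into ceil(op_width / unit_width) passes (two 64-bit
    halves on a 64-bit unit); every unit completes one pass per cycle.
    Latency, dependencies, and decode limits are ignored.
    """
    if unit_width_bits < 1 or num_units < 1:
        raise ValueError("unit width and unit count must be positive")
    passes_per_op = math.ceil(op_width_bits / unit_width_bits)
    return math.ceil(num_ops * passes_per_op / num_units)


ops = 1000  # hypothetical stream of independent 128-bit SSE ops
print(sse_cycles(ops, 64, 2))   # K8-style: each op is two 64-bit halves
print(sse_cycles(ops, 128, 2))  # 128-bit units: each op is one pass
```

In this idealized model the 128-bit units halve the cycle count (1000 to 500); real gains depend on how much of the code is packed 128-bit SSE in the first place.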
January 25, 2007 11:46:46 PM

Quote:



The biggest key to Barcelona is that it was DESIGNED for AMD's 65nm process. Brisbane is the second shrink of a 130nm architecture (perhaps why there won't be a Brisbane Opteron). Maybe Barcelona's numbers are living up to AMD's strained-SOI "dual stress liner" tech, which is supposed to switch much faster.

I think I would rather just wait for the first independent tests on ES CPUs.

I would really like to see how well Barcelona runs, and not just in Task Manager.

It will be an interesting Spring/Summer, for sure.
January 26, 2007 12:51:30 AM

Quote:
I think I would rather just wait for the first independent tests on ES CPUs.

I would really like to see how well Barcelona runs, and not just in Task Manager.

It will be an interesting Spring/Summer, for sure.

Stop it with the Task Manager, please. It showed 16 cores running at 100%. If you also look, you'll see that RAM usage is low, so very few services are running. Obviously AMD has some load software that uses little RAM.
January 26, 2007 1:46:30 AM

Quote:
Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.



Do you have some kind of split personality?


I agree.
January 26, 2007 2:57:49 AM

Quote:
Stop it with the Task Manager, please. It showed 16 cores running at 100%. If you also look, you'll see that RAM usage is low, so very few services are running. Obviously AMD has some load software that uses little RAM.

Why? Not trying to start anything, but that's all AMD showcased, and right now that's all most people really have to judge Barcelona by.

If they had run something else, or something alongside Task Manager, I wouldn't have to keep bringing it up. But I won't try to get under your skin about it.

As for seeing other services, I didn't see anything else but 16 CPU windows running. At full load? Not even sure about that, to be honest.

I would still like to see the Barcelona CPU running more intensive apps/software and see how it really stacks up, though. I'm not going to go gaga over an executive's bold statements.
January 26, 2007 3:34:10 AM

Quote:
@ Everyone else: hey, lay off, this is a good thread.

Don't tell me to lay off; your idea of a good thread is just an opinion.
January 26, 2007 3:40:21 AM

Then, sticking to the discussion: I don't agree that this is a realistic look, because there is much more to a CPU than numbers and what you think FPU performance will be. 90 percent of Barcelona is built from the ground up.
January 26, 2007 3:44:57 AM

Don't steal other people's threads, if you want it to be known post it in a new topic.
January 26, 2007 3:46:24 AM

Quote:
Don't steal other people's threads, if you want it to be known post it in a new topic.
It has to do with Clovertown... so bite me!
January 26, 2007 3:56:00 AM

Quote:
Then, sticking to the discussion: I don't agree that this is a realistic look, because there is much more to a CPU than numbers and what you think FPU performance will be. 90 percent of Barcelona is built from the ground up.


This I don't disagree with entirely; however, many of the fundamental architectural features remain. It is still a superscalar out-of-order (OoOE) core that is, in fact, only 3-issue wide, though it will fetch 32 bytes at a time (C2D will fetch 24). From the list presented at the Spring Microprocessor Forum, the list that made all the headlines, roughly half of the work is being done to improve bandwidth. This is a full-blown, ground-up, server-targeted processor revision.

The biggest enhancement is definitely the FPU, with the SSE units improved for single-cycle 128-bit SSE execution.

On the FPU side this will certainly be a huge improvement. Although Core 2 Duo currently shows strong FPU performance over K8, its FPU lead over K8 is not nearly as pronounced as its lead in other areas. Barcelona, if it makes it out of the fab, will be an FPU beast.

SSE performance will likely match or slightly exceed Intel's clock for clock overall, but integer performance will likely remain in Intel's camp.

So you are correct: there is more to a CPU than FPU, but AMD has focused on FPU, and is attempting to widen the reorder buffer and increase the fetch depth in order to keep the 3 issue slots full. It remains to be seen how effectively they can do this; right now AMD can, at most, theoretically fetch, decode, and retire 3 instructions per clock, while C2D can do 4+1 at peak. However, there is much more to a CPU than just its width, and width != IPC. How much IPC enhancement they have achieved will only be known from benchmarks; expecting 40% or estimating a 1.8x improvement is not the same as measuring it.

Jack
Right, and you really cannot say, "Well, Core 2 has a feature that's 32 bits and Barcelona has the same feature at 32 bits, so they are equal," because it's not as if they both built the architecture from the same general map and just made some things different here and there. They are completely different!
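On the "width != IPC" point: issue width is only a ceiling, and dependencies decide how much of it gets used. This toy Python list-scheduler is a sketch under heavy assumptions (1-cycle latency, unlimited window, perfect fetch and branch prediction), not a model of either core; `toy_ipc` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def toy_ipc(deps: Sequence[Sequence[int]], width: int) -> float:
    """Instructions per cycle for a toy out-of-order core.

    deps[i] lists earlier instruction indices that instruction i reads from.
    Each cycle, up to `width` ready instructions issue, oldest first.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    for i, ds in enumerate(deps):
        if any(d < 0 or d >= i for d in ds):
            raise ValueError(f"instruction {i} must depend only on earlier ones")
    issued_at: List[Optional[int]] = [None] * len(deps)
    cycle = 0
    remaining = len(deps)
    while remaining:
        cycle += 1
        slots = width
        for i, ds in enumerate(deps):
            if slots == 0:
                break
            if issued_at[i] is not None:
                continue
            # Ready only if every producer issued in an earlier cycle.
            if all(issued_at[d] is not None and issued_at[d] < cycle for d in ds):
                issued_at[i] = cycle
                slots -= 1
                remaining -= 1
    return len(deps) / cycle if deps else 0.0


independent = [[] for _ in range(12)]            # no dependencies at all
chain = [[]] + [[i - 1] for i in range(1, 12)]   # each needs the previous one
print(toy_ipc(independent, 3), toy_ipc(independent, 4))
print(toy_ipc(chain, 3), toy_ipc(chain, 4))
```

On independent work the 4-wide machine wins (IPC 4.0 vs 3.0), but on a serial chain both sit at IPC 1.0, which is why width alone says little about real-code IPC.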
January 26, 2007 3:58:41 AM

Quote:
No, it's not. If it were, AMD would be faster with their new chips.


The other purposes are
--> Cheaper
--> Less heat
-->... and faster if possible


Yeah, that too, but for Intel clock speed was the point, especially in the P4 era (error): each step from 180 to 130 to 90nm brought more GHz, while the die stayed the same or a similar size because of added cache, etc.
January 26, 2007 4:00:08 AM

You call your post "a realistic look" and then I see the word "prediction" in it.

K8L is not reality yet, so spare us your bull**it Nostradamus.
January 26, 2007 4:05:29 AM

Quote:
Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.



The first post in this thread is remarkably similar to a comment over at geek.com.

Can we all say "cut and paste"... ????

Can I claim royalties or something? :p  :p  :p 

http://www.geek.com/news/geeknews/2007Jan/bch2007012500...

Hard to know... (3:45pm EST Thu Jan 25 2007)

Barcelona will follow the Intel C2D lead and move to a single-clock-cycle 128-bit SSE execution engine.

Best I can tell from the info out there, both Conroe and Barcelona will, on average, be able to issue 4 instructions per clock cycle. They differ in how they buffer and issue these instructions, and I don't know enough to even guess which mechanism will work better in the real world on real code.

Barcelona will be able to perform two 128-bit SSE loads per clock, versus only one for Conroe, which should give Team Green a bit of an edge on the floating-point side, while Intel's faster/better prefetch mechanisms will keep it ahead on integer code.

Predictions:

1 - The fastest K8L chip will be very close to the fastest Conroe chip (within 5%)

2 - When the 45nm Core 2 Duo chips hit, Intel retakes the lead by roughly the advantage of one process generation.

3 - 45nm Conroe gets pulled in to Q3 2007 (barely) - by vorlon
January 26, 2007 4:11:10 AM

Quote:
Don't steal other people's threads, if you want it to be known post it in a new topic.
It has to do with Clovertown... so bite me!

Even still... You did attempt to steal the thread. :wink:
January 26, 2007 4:13:41 AM

Quote:
Predictions:

1 - The fastest K8L chip will be very close to the fastest Conroe chip (within 5%)

2 - When the 45nm Core 2 Duo chips hit, Intel retakes the lead by roughly the advantage of one process generation.


I like this prediction, though people don't like talking about the truth. If there's any variation to the prediction, it should be K8L vs Conroe in single- and dual-threaded apps. Kentsfield is bandwidth-limited and will perform less per clock. Most apps won't show this, but multitasking that fully saturates the CPU will.

http://www.xbitlabs.com/articles/cpu/display/amd-quad-f...

See the last benchmark on the page. I agree that running 3 WinRAR processes in the background with Quake 4 is quite an unrealistic situation for most people, but that kind of load will be reached in the workstation/server environments where Clovertown is used.
January 26, 2007 4:38:01 AM

Quote:
Don't steal other people's threads, if you want it to be known post it in a new topic.
It has to do with Clovertown... so bite me!

Even still... You did attempt to steal the thread. :wink:

That wasn't my intention, and I'm sorry about that. I didn't think it was worthy of a thread of its own, but it's still interesting, so I put it in a thread dealing with Clovertown. :wink:
January 26, 2007 8:19:09 AM

Just to add: don't forget that AMD was talking about Barcelona as a modular design as well, long before FUSION. I think Barcelona/Shanghai is a stepping stone to FUSION. Starting from scratch, it's got to be pretty tough to design something that complex and simple in 2-3 years.
January 26, 2007 10:25:29 AM

The parts of the "The Utility Belt" article that I noted most were:

Quote:
AMD will begin shopping the Barcelona chip around to customers in the April-June time frame (so, in about three months).


and

Quote:
Allen said Q4 2007 will be when the first real impact of Barcelona comes through in AMD's financial statements.


By customers, I suppose they mean big vendors such as HP and Dell. That the schedule is so flexible (April-June) and that it is only to vendors must mean that engineering samples will only be ready by then. So any performance figures so far must be projections from the design or from very early prototypes (so early that they won't even show them to HP and Dell). Or the flexible timeline and lack of ES's means that they are still revising the design to meet their performance goal.
That it won't be felt financially until Q4 '07 must mean that Barcelona will first ship by the end of Q3, which is later than first thought, and about the same time as Intel's Penryn die shrink comes to market.
The whole announcement looks like a cover-up for a delay.
January 26, 2007 10:48:57 AM

His prediction on Intel 45nm is real.
Intel is not going to transit to 45nm fast this year, according to HKEPC.
January 26, 2007 11:35:46 AM

Quote:
@Lordpope --- better, admirable, nicely done. I will come back with a post to discuss this...

@ Everyone else: hey, lay off, this is a good thread.

Finally, for AMD fans, this is a more credible blurb than Charlie D. and confirms the idea.

http://blogs.business2.com/utilitybelt/2007/01/amds_bri...
AMD will begin shopping the Barcelona chip around to customers in the April-June time frame (so, in about three months).


That's funny someone posted saying 32-bit is obsolete :lol: 
January 26, 2007 12:58:12 PM

Quote:
C2D will fetch 24

You mean 45nm Core2 will fetch 24 bytes?
Because 65nm Core2 fetches only 16 bytes.
January 26, 2007 1:05:44 PM

Quote:
C2D will fetch 24

You mean 45nm Core2 will fetch 24 bytes?
Because 65nm Core2 fetches only 16 bytes.

In fact I don't think the 65nm => 45nm transition is a pure die-shrink as the 45nm quad-core will be of a native design.
January 26, 2007 1:23:48 PM

The 45nm Core 2 Quad will be a two-die MCM quad-core, just like Clovertown/Kentsfield.
It will have more cache (2x6MB of shared L2, 6MB per die), FSB1333 and SSE4.
At the architecture level, I don't know what changes will be involved. That's why I ask.
January 26, 2007 2:23:02 PM

Quote:
The 45nm Core 2 Quad will be a two-die MCM quad-core, just like Clovertown/Kentsfield.
It will have more cache (2x6MB of shared L2, 6MB per die), FSB1333 and SSE4.
At the architecture level, I don't know what changes will be involved. That's why I ask.


Any source? I have some articles about single-die Yorkfield.
January 26, 2007 3:12:14 PM

Great find, Jack, very interesting read, especially the part about the design/lead time for the CPU. I'd guessed that they would have had to start on the Barcelona a couple of years back, and starting right after as the K8 dual-cores were beginning to be produced makes a lot of sense. Apart from transitioning to 65nm and DDR2, AMD hadn't done very much with their CPUs since the X2s came out, and that would suggest that their resources were going elsewhere.

About the performance...I'll say the same as I did before the Conroe shipped and we were all taking stabs at its performance figures- "the new chip will certainly be competitive with the other manufacturer's parts and probably will be faster in most things. How much faster- we don't know yet until it gets benched."

Only time and benches will tell.
January 26, 2007 3:16:41 PM

A simple infinite loop would suffice. Just start 16 copies and you have a way to load up 16 cores while using virtually no RAM.
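A minimal Python sketch of that kind of load generator, assuming nothing about what AMD actually ran: one busy-looping process per logical CPU, each allocating almost nothing beyond the interpreter itself. The loop is time-bounded rather than infinite so it exits on its own; `spin` and `load_all_cores` are hypothetical names.

```python
import multiprocessing as mp
import time
from typing import List


def spin(seconds: float) -> int:
    """Busy-loop for roughly `seconds` of wall time; return the loop count.

    Keeps one core busy while allocating almost no memory.
    """
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def load_all_cores(workers: int, seconds: float) -> List[int]:
    """Start one spinning worker process per slot and wait for all of them."""
    with mp.Pool(processes=workers) as pool:
        return pool.map(spin, [seconds] * workers)


if __name__ == "__main__":
    n = mp.cpu_count()  # e.g. 16 logical CPUs on a 4-socket quad-core box
    counts = load_all_cores(n, 1.0)  # raise the duration for a longer demo
    print(f"{n} workers finished, {len(counts)} results")
```

Processes are used instead of threads because CPython threads share one interpreter lock and would not keep every core at 100%.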
January 26, 2007 3:23:48 PM

About the only application I know of that will saturate Kentsfield's FSB at a 2.67 GHz core clock and a 1066 MHz FSB is SPECrate_fp. That's a synthetic bench, and its results versus other applications show that most applications aren't really all that RAM-bandwidth dependent.
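For scale, the peak FSB figure works out as below. This Python sketch assumes the 64-bit-wide Core 2 front-side bus and treats "1066 MHz" as 1066 million transfers per second (the bus is quad-pumped); sustained bandwidth is lower in practice, and the even four-way split per core is a simplification.

```python
def fsb_peak_gbs(mega_transfers_per_s: float, bus_width_bits: int = 64) -> float:
    """Peak front-side-bus bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * (bus_width_bits / 8) / 1e9


peak = fsb_peak_gbs(1066)
print(round(peak, 2))      # whole package
print(round(peak / 4, 2))  # per core, if four cores split it evenly
```

That is about 8.53 GB/s for the whole package, or roughly 2.13 GB/s per core when all four cores are streaming from memory at once.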

Hey, I have an idea. Jack, you have a QX6700 and were going to test the FSB speed versus performance, right? Why not get SPECrate_fp and see what FSB versus core speed it takes to get a more or less linear scaling?
January 26, 2007 4:17:03 PM

Quote:


Stop it with the Task Manager, please. It showed 16 cores running at 100%. If you also look, you'll see that RAM usage is low, so very few services are running. Obviously AMD has some load software that uses little RAM.


Could you link where it said 100% load?

Look at Task Manager in the video. http://virtualexperience.amd.com


Bottom right link
January 26, 2007 10:29:06 PM

Quote:
You mean 45nm Core2 will fetch 24 bytes?
Because 65nm Core2 fetches only 16 bytes.


http://www.realworldtech.com/page.cfm?ArticleID=RWT0309...

"Intel did not disclose precisely the fetch bandwidth, but the average x86 instruction is ~32 bits, and the Core microarchitecture fetches at least five x86 instructions each cycle."

That indicates at least 20 bytes.
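The arithmetic behind that estimate, as a tiny Python sketch. The ~32-bit average is the figure from the quote above; real x86 instructions vary from 1 to 15 bytes, so these are averages only.

```python
def fetch_bytes_needed(instructions: int, avg_bits: float) -> float:
    """Bytes per cycle needed to fetch `instructions` of average length avg_bits."""
    return instructions * avg_bits / 8


def instructions_per_fetch(fetch_bytes: int, avg_bits: float) -> float:
    """Average instructions that fit in a fetch block of `fetch_bytes` bytes."""
    return fetch_bytes * 8 / avg_bits


print(fetch_bytes_needed(5, 32))      # 5 instructions x 32 bits
print(instructions_per_fetch(16, 32)) # a 16-byte fetch block
```

Five 32-bit instructions need 20 bytes per cycle, while a 16-byte block holds only about 4 such instructions on average.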
January 26, 2007 11:09:29 PM

SUP, I was just wondering: will K8L allow AMD to match Intel in the GHz war? Thanks, l8er :)
January 26, 2007 11:22:44 PM

Quote:
About the only application I know of that will saturate Kentsfield's FSB at a 2.67 GHz core clock and a 1066 MHz FSB is SPECrate_fp. That's a synthetic bench, and its results versus other applications show that most applications aren't really all that RAM-bandwidth dependent.

Hey, I have an idea. Jack, you have a QX6700 and were going to test the FSB speed versus performance, right? Why not get SPECrate_fp and see what FSB versus core speed it takes to get a more or less linear scaling?


That's the problem. AMD is comparing K8L to Clovertown. It's essentially the same chip as Kentsfield but aimed at the workstation/server market, where Clovertown is at a much bigger disadvantage.

The most extreme case of FSB saturation I can think of is TPC-C. Quad cores only improve performance by 30-40%, and even adding faster/more disks improves performance.
January 27, 2007 12:28:55 AM

Quote:
The 45nm Core 2 Quad will be a two-die MCM quad-core, just like Clovertown/Kentsfield.
It will have more cache (2x6MB of shared L2, 6MB per die), FSB1333 and SSE4.
At the architecture level, I don't know what changes will be involved. That's why I ask.


Any source? I have some articles about single-die Yorkfield.
I'm confused too. Is Yorkfield the successor to Kentsfield? I thought Intel was shrinking the current 65 nm Core CPUs (single-die dual cores and two-die quad cores) and then releasing a single-die quad core. According to Wikipedia Yorkfield will be two-die, like gOJDO points out. Wikipedia's source is Mike's Hardware. Could anybody clarify this? If so, when is Intel releasing its monolithic quad-core? And wouldn't they then be lagging behind K8L in that regard?
January 27, 2007 12:34:01 AM

Quote:
The 45nm Core 2 Quad will be a two-die MCM quad-core, just like Clovertown/Kentsfield.
It will have more cache (2x6MB of shared L2, 6MB per die), FSB1333 and SSE4.
At the architecture level, I don't know what changes will be involved. That's why I ask.


Any source? I have some articles about single-die Yorkfield.

He obviously means Penryn.
January 27, 2007 12:42:20 AM

I asked my friend who works at Intel, and the Yorkfield desktop processor will be two Wolfdale processors in a single package, just like Kentsfield.