
Architectures and generations

May 28, 2006 9:23:16 PM

Given the latest discussions and speculation concerning Intel's "next generation core", I wonder what can be considered "revolutionary" or just "a new architecture" these days...
I mean, if we look at the past, Intel in the last 10 years has released 3 truly revolutionary architectures: P6 (Pentium Pro), Netburst, and IA64 (Itanium).
P6 was the first to establish a CISC front-end / RISC back-end architecture, which provided a quantum leap in performance over what would have otherwise been possible with the crappy x86 ISA.
Netburst, even if it gets laughed at so much by fanboys, is simply beautiful: an extreme concept, probably, but who would have bet that it was possible to achieve extremely high performance with a 30+ stage pipeline?
And IA64, which some say is an evolution of the VLIW CPUs that were deemed to be dead, but with several interesting concepts like predication and speculative loads.
Now, was the Pentium!!! a real next generation? IMHO, it was just a supercharged P6, with a fast cache and an SSE engine.
And the K7? Probably the best architecture designed by AMD; I'd classify it as "half a generation" ahead of P6... kind of a big hairy beast, with interesting tradeoffs like the simple branch predictor, which made it possible to compete with P6 in die size while having a much higher IPC.
I know fanboys will flame me, but I don't see K8 as a real new architecture; basically it's a K7 with its hard edges smoothed out: a more sophisticated branch predictor, a slightly longer pipeline, and most of all the integrated memory controller, which removed the main bottleneck, the EV6 front side bus.
Now some people are claiming that Conroe is just a P6, but IMHO it is probably "half a generation" ahead of K7/K8: micro/macro-op fusion, out-of-order loads and memory disambiguation are impressive features, besides it being just wider/fatter in issue and execution.
Still, from an architectural point of view, I'd say it is much less revolutionary than Netburst was.
What do you think?
May 28, 2006 9:47:36 PM

Conroe is a revolution if you compare it to the P4. It is a nice evolution of Dothan/Yonah. It's a nice kick in the ass for single-threaded performance, that's for sure. The next revolution, so to speak, will be the perfect combination of efficiency and scalability in 4-8 core chips, with the compilers to make it all work.
May 28, 2006 9:54:51 PM

Hmm, yes speaking in terms of performance.
But from an architectural point of view, they exist on parallel lines.
You can say, for example, that Prescott was an important (and unfortunate) evolution from Northwood, due to the significantly longer pipeline and the architectural enhancements added to counter the performance penalty this incurred.
Conroe is instead an evolution of the P6 -> Banias -> Dothan line, quite a big evolution, or perhaps a little revolution.
But conceptually, Netburst was more innovative.
May 28, 2006 10:00:26 PM

Netburst would have been great with 4 threads/core and like 8 MB of L2 running at 6 GHz, but it never happened. AMD put their corporate balls on Intel's corporate chin and showed them that nobody wanted 200 watt procs, which is why we have Conroe today.
May 28, 2006 10:28:47 PM

Quote:
Conroe is a revolution if you compare it to the P4. It is a nice evolution of Dothan/Yonah. It's a nice kick in the ass for single-threaded performance, that's for sure. The next revolution, so to speak, will be the perfect combination of efficiency and scalability in 4-8 core chips, with the compilers to make it all work.


P4 -> Devolution
May 29, 2006 1:18:01 PM

Quote:
P4 -> Devolution

Partially correct. Only initially, and in the Prescott generation. Netburst's glory days started from Northwood, and ended with Northwood.
May 29, 2006 4:02:49 PM

I'd say that was pretty much right. The other parts of the equation are that Intel realized that they had pretty much run out of headroom with Netburst (although they could have gotten a little more out of the 65nm Cedar Mill/Presler if they *really* wanted to, but not enough to beat AMD) and that the Pentium M chips made by their Israeli division were just as fast or faster than the P4s and in the same ballpark as AMD's chips. Intel was just smart enough to see that the P-M had good potential to give AMD a run for their money and took it.

However, I am disappointed that they didn't move forward on bringing the Pentium M arch back to the desktop earlier after they saw that Prescott didn't pan out as expected. The fact that Intel made the Preslers and Cedar Mills is a little puzzling since I bet that they could have made a good run with a tweaked Core Duo on the desktop. Maybe the yields weren't there to make THAT many chips or it was just too easy to buy time to make the Core 2 by die-shrinking the Prescott and Smithfield. But a large company like Intel is very slow to change and the marketers had to find a way to retrace their GHz war marketing, so it's not that surprising that it took a while to do. But even though I don't plan to buy one of these chips and am not that impressed by some of Intel's marketing/sales tactics, this is a step forward for the CPU industry and for customers like us.
May 29, 2006 11:06:25 PM

Quote:
Conroe is a revolution if you compare it to the P4. It is a nice evolution of Dothan/Yonah. It's a nice kick in the ass for single-threaded performance, that's for sure. The next revolution, so to speak, will be the perfect combination of efficiency and scalability in 4-8 core chips, with the compilers to make it all work.


Agree.

However, even "small" steps within a chip's uArch/process may be considered "revolutionary" without, necessarily, making that chip revolutionary (take, for instance, the IMC and strained Si at their early stages); these were "small" revolutions, in my opinion, evolving to new generations...

As a side note, I find these "revolution" / "evolution" terms more ambiguous in the microprocessor arena than in biology, for instance.


Cheers!
May 30, 2006 8:43:59 AM

Quote:
You nailed this one... when you look at the DT performance of Yonah clocked at DT speeds it pretty much plasters the P4-netburst.


It probably has a LOT to do with Intel being very, very conservative with regard to timings. They probably even tightened them quite a bit after the Pentium III 1.13GHz recall.

Whatever reasons you guys have for disagreeing that Pentium M can't reach higher clock speeds than it currently ships at, on the grounds that it overclocks so high, that doesn't matter to Intel. Whether they left that headroom for future clock speeds or to keep the chip running stable the way they want to see it, they wouldn't have shipped it faster anyway. It's all just dreams and hope.

What would have been possible instead is Intel making Core Duo able to run on current Pentium D mobos. They could probably have done that, and probably didn't because of some marketing conflict they saw with Netburst CPUs.
May 30, 2006 9:36:56 AM

Wow, an intelligent thread for once... I must add something to it!

Conroe isn't a revolution, it's just the logical next step, an evolution of the Dothan/Yonah core processors. P4 had its days, back when it was called Northwood. Even then I didn't think it was that great of a chip. It was good, but not great. The one thing the P4 had over its competition was higher frequencies and larger L2 caches, but that doesn't always make it better. So now I move on to the K7/K8 architecture. From what I've read, I reckon these chips are awesome for their generation. The K7 practically came from nowhere to hurt Intel right in the soft spot, and K8 didn't stop either. Of course, those AMD processors came round at just the right time, taking advantage of the flaws of P4.

Netburst isn't a perfect architecture, and I wouldn't say it's beautiful, since it is fairly flawed as architectures go. Hell, it's getting replaced completely by the Core 2 architecture, which I think is much 'prettier' in terms of potential. And then soon to come from the depths of Dresden is K8L, and I can only wonder how that will square up against Conroe. Like I've said before, this should make for some interesting few months.
May 30, 2006 1:05:22 PM

Quote:
Conroe isn't a revolution, it's just the logical next step, an evolution of the Dothan/Yonah core processors.


I disagree. Think of it this way. Why is P6 a revolution over P5 architecture??

Differences between P6 and P5:
-P6 has better branch prediction
-P6 has 3-wide superscalar compared to 2-wide superscalar in P5
-P6 adds
-P6 adds Dual Independent Bus, which, unlike today, meant the cache and FSB are on separate buses
-P6 added Out of Order execution, which P5 doesn't have
-P6 has more powerful FPU

Ah, sounds just like Core microarchitecture vs. Yonah, doesn't it?? We can keep tracking back until, maybe, Intel's 4004 CPUs, and say ALL CPUs DERIVE FROM 4004.

Looking at it that way, Pentium 4 is not revolutionary, it's also evolutionary
-it has better branch prediction over P6
-it has longer pipeline stages
-it adds double pumped ALU
-it adds Trace Cache


Generally, wider CPUs are considered new generations, or CPUs that change the concept of width like the Pentium 4. Are they revolutions or evolutions?? I don't know. It's a pointless argument.
May 30, 2006 2:13:56 PM

I'll say why Core is actually revolutionary. It represents revolutionary change for Intel's mindset on their CPUs!!! At least it will be since it's not here yet. I might as well go over the basics of the differences.

You see, Core is gonna fix the fact that ALL P6-derived CPUs, up to Core Duo, were lacking architecturally compared to K7, and that's over 10 years.

P6 was lacking architecturally compared to K7 since the first day K7 came into this world. Core Duo is the first step out of P6.

-The two FPUs in P6 aren't both fully pipelined. It has one fully pipelined FPU, FPadd, and one partially pipelined FPU, FPmul. Because of that, it seriously lacked FP power compared to K7, which had 3 fully pipelined FPUs: FPmul, FPadd, and FPstore. So in the case of double precision FP, K7 had TWICE the FP performance.

(Now, Core Duo still has the same lack of FP power, but it improves things a bit by decreasing latency in some important instructions, like IDIV, which is actually the integer divide)

-There are only 2 ALUs in P6, while K7 has 3
-There is 1 complex decoder and 2 simple decoders in P6, while K7 has 3 identical decoders

(Yonah enhances the decoder section by letting all of its decoders handle almost all SSE instructions, unlike the previous generations of P6, where that was only true for some instructions, the simple ones)

-P6 had 16KB Instruction and 16KB Data L1 cache, which K7 quadruples to 64KB each

(Pentium M has double the L1 cache of P6, 32KB each)


Core microarchitecture enhances the decoder from Core Duo, since there is one more simple decoder, and more of the instructions that used to go to the complex decoder can now go to the simple decoders.

K7 had a big advantage over P6 because it can decode almost all instructions in all three decoders, while P6 can only do it sometimes, though to be fair to P6, the situations are not common.

K8 can decode almost all SSE instructions in the three decoders, while Pentium M can only do it in the one decoder for complex instructions. K8 doesn't have that advantage over Core Duo, since all three of Core Duo's decoders can do SSE instructions.

K8 (and K7) still has the advantage over Yonah that it can decode both simple and complex non-SSE instructions in all three decoders, but that case is rare.

Core makes the advantage K8's decoders have in non-SSE instructions even rarer, because Core microarchitecture allows more of the instructions that used to go to the complex decoder to go to the simple decoders.

Core has 2 fully pipelined FPUs which are 128-bit each. That's a huge advancement over the P6, which had 1 fully pipelined and 1 partially pipelined FPU that are 64-bit.

Some definitions:
Decoder-Modern CPUs used in PCs convert each instruction into other instructions that are easier (and higher performing) for the CPU to understand and execute. The hard-to-execute instructions are called x86 instructions. The decoder in modern CPUs converts those x86 instructions into easier-to-execute micro-ops (or uops).

x86-Type of CPUs that has been widely used in PCs and workstations. They include almost all CPUs from AMD and Intel(Exceptions are Intel's IA64, and their StrongArm and XScale. AMD's only non-x86 is their older Geode line, the newer ones are x86). Includes companies like Via with their Cyrix line. Transmeta's CPUs can run x86 but they are *not* x86 CPUs.

(For people who want to know why it's called x86, remember CPUs were referred to as 286, 386, 486?? Well, x in mathematics is a number, get it??)

Micro-Ops(uops)-Intel's term for instructions which decode from x86 instructions

Macro-Ops-where the confusing part starts, as for AMD a Macro-Op means the instructions that x86 instructions decode into, while for Intel it is another word for x86 instructions. It's estimated that 1 AMD macro-op equals 2 Intel micro-ops.

"Simple" decoder-It's a type of decoder which can only translate "simple" instructions, meaning the ones which decode into 1 uop. There's no such thing as a "simple" decoder in AMD's CPUs, it's only present in Intel's CPUs

"Complex" decoder-Decoder that can handle instructions which decode into more than 1 uop

Microcode sequencer-Basically the most complex decoder of them all, which can decode more complex instructions than the ones that the normal decoders can handle.

Double Precision FP: 64-bit per value (two fit in a 128-bit SSE register)
Single Precision FP: 32-bit per value (four fit in a 128-bit SSE register)
Vector: Something like SSE
Scalar: Something like non-SSE
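
To make the "simple" vs "complex" decoder definitions above a bit more concrete, here is a rough toy model in Python. It is only a sketch under assumptions that are not from the posts: decoding is strictly in order, the complex decoder takes instructions of up to 4 uops (the `complex_max_uops` default is an assumed figure), anything larger would go to the microcode sequencer and is not modeled, and the function names are made up for illustration.

```python
def decode_cycles(uop_counts, simple_slots=2, complex_max_uops=4):
    """Count cycles needed to decode an in-order instruction stream.

    Toy model: each cycle the single complex decoder takes the next
    instruction, then each simple decoder takes the following instruction
    only if it decodes into exactly 1 uop. The first instruction that does
    not fit a simple slot ends the cycle and waits for the complex decoder.
    """
    if any(n < 1 or n > complex_max_uops for n in uop_counts):
        raise ValueError("needs the microcode sequencer, which is not modeled")
    cycles = 0
    i = 0
    while i < len(uop_counts):
        cycles += 1
        i += 1  # complex decoder consumes one instruction
        for _ in range(simple_slots):
            if i < len(uop_counts) and uop_counts[i] == 1:
                i += 1  # simple decoder consumes a 1-uop instruction
            else:
                break
    return cycles


def decode_cycles_symmetric(uop_counts, width=3):
    """Assumes any of `width` identical decoders can take any instruction."""
    return -(-len(uop_counts) // width)  # ceiling division


# Hypothetical stream: uops per instruction, three simple then three complex.
stream = [1, 1, 1, 2, 2, 2]
print(decode_cycles(stream, simple_slots=2))  # 1 complex + 2 simple decoders
print(decode_cycles(stream, simple_slots=3))  # 1 complex + 3 simple decoders
print(decode_cycles_symmetric(stream))        # three identical decoders
```

In this model an extra simple decoder only helps when the stream is mostly 1-uop instructions, which matches the point that the decoder advantage shows up only in uncommon instruction mixes.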



Summary:
-Core has two fully pipelined FP units which are 128-bit, which means a 128-bit SSE operation (2 DP values) can be done in a single cycle, while the other CPUs require two. P6 is slower since its two FP units are not both fully pipelined. Core can do 4x DP FP ops per clock, K7/K8=2x, P6=1x

Core fixes the P6 architecture's lack of FP power over K7/K8.

-Core has a total of 4 decoders, which makes it the widest CPU; it can decode most SSE ops, and more of the instructions that used to go to the complex decoder in Core Duo can now go to the simple decoders. K7/K8 can do most instructions in all three decoders. K8 can do most SSE instructions in all three decoders. P6 has 3 decoders, 2 of them simple decoders which are simpler than Core's, and that applies to SSE instructions too. Yonah fixes P6's lack of decoder power on SSE.

Core fixes P6 architecture's lack of decoder power over K7/K8.

-Core has total of 3 ALUs, K7 has 3, and P6 has 2

Core fixes P6 architecture's lack of ALU power over K7/K8.

-Core, although lacking in L1 cache compared to K7/K8, keeps the Pentium M architecture's advantage of doubled L1 cache over P6.

Core, while lacking in L1 cache capacity compared to K7/K8, has greater associativity (4 or 8, it's not confirmed; it's 4 in Pentium M to Yonah), and is much better than P6 while maintaining the same latency.
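
The FP figures in the summary above (Core = 4x, K7/K8 = 2x) can be reproduced with some back-of-the-envelope arithmetic. This is just a sketch: it assumes every unit listed is fully pipelined and starts one operation per clock, that a double-precision value is 64 bits, and that K7/K8's two arithmetic units are 64 bits wide. The function name is made up for illustration.

```python
DP_BITS = 64  # size of one double-precision value


def peak_dp_ops_per_clock(unit_widths_bits):
    """Peak double-precision ops per clock for fully pipelined FP units.

    Each unit is assumed to start one operation per clock on as many
    64-bit values as fit in its datapath width.
    """
    return sum(width // DP_BITS for width in unit_widths_bits)


core = [128, 128]  # two 128-bit units, per the summary above
k8 = [64, 64]      # FPadd + FPmul at an assumed 64 bits each; FPstore does no math

print(peak_dp_ops_per_clock(core))
print(peak_dp_ops_per_clock(k8))
```

P6 is left out on purpose: its multiplier is only partially pipelined, so it would need an issue rate below one op per clock, and the post doesn't give that number.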

I'll go over Pentium 4 too.
May 30, 2006 7:56:40 PM

With such a thorough post, I can't help but agree with your argument. It is a significant change to Intel's mindset. Bah, I can't even think of something to back up my own statement, so I'll just have to give in and accept it! :) 
May 31, 2006 12:50:48 AM

Quote:
I'll say why Core is actually revolutionary. It represents revolutionary change for Intel's mindset on their CPUs!!! At least it will be since it's not here yet. I might as well go over the basics of the differences.

You see, Core is gonna fix the fact that ALL P6-derived CPUs, up to Core Duo, were lacking architecturally compared to K7, and that's over 10 years.

P6 was lacking architecturally compared to K7 since the first day K7 came into this world. Core Duo is the first step out of P6.


(...)

Nice post overall, DavidC1.

I do have some observations to make, though: There are good references on the web about the new Intel uArch (I confess: I hate this word!) like, for instance, http://arstechnica.com/articles/paedia/cpu/core.ars & http://www.realworldtech.com/page.cfm?ArticleID=RWT030906143144; although these go deeper into the subject (there are a few important missing points on your post...), you've done a fine job, comparing a bunch of x86 ISAs (being picky, x86-32/x86-64).

(As a side note, I won't be cynically skeptical about Conroe, mostly due to the fabulous data compilation iterations did, on his sticky :wink: ).

I'm not sure if I agree with you, when you state that «[Core] represents a revolutionary change for Intel's mindset on their CPUs», since they've been there, already; it seems more like a turnaround from an experimental (desperate?) uArchghhh, NetBurst.
In my opinion (I must stress this point!), Intel Core Microarchitecture has a few, though decisive, dramatic changes from previous "post-RISC" approaches, like Macro-op fusion, Memory Sharing & Disambiguation, L1D core-to-core interconnect (by the way, it's 8-way set associative) and a few other outstanding improvements. However & simultaneously, its 65nm, 2nd generation strained-Silicon node process has as much or even more to do with the amazing performance gain & TDP envelope, compared to the... uArch (see, for instance, http://www.realworldtech.com/page.cfm?ArticleID=RWT123005001504&p=14, courtesy of JumpingJack who, by the way, is building a magnificent collection of installments on transistor physics & manufacturing processes: http://forumz.tomshardware.com/hardware/modules.php?name=Forums&file=viewtopic&p=1086230#1086230).
If we're to look at Core's performance from a purely technical point of view (as far as I can go!), there are some revolutionary spots, both at the uArch & process level but, in my opinion, these only contribute to an evolutionary chip step. Most of the Core's improvements are incremental, not radical. Performance wise, it really stands out compared to an inefficient uArch (NetBurst) & a decade old one, K8.
So, and risking being flamed by [almost] everyone :D  , I'd call Cell, for instance, a revolutionary chip, uArch wise.
The bottom line is purely conceptual: revolutionary implies a short-term complete structural modification at the most basic system levels (in the present case, a microchip's uArch/Process); Intel did not pick up NetBurst & turn it into Core: the changes were more incremental, more evolutionary, in parallel with NetBurst.
This doesn't diminish the "Amazing" adjectivation of the ICM, aka, Core.

Well, I may be a little rhetorical, sometimes! :wink:


Cheers!
June 1, 2006 6:33:41 AM

Quote:
I'm not sure if I agree with you, when you state that «[Core] represents a revolutionary change for Intel's mindset on their CPUs», since they've been there, already; it seems more like a turnaround from an experimental (desperate?) uArchghhh, NetBurst.
In my opinion (I must stress this point!), Intel Core Microarchitecture has a few, though decisive, dramatic changes from previous "post-RISC" approaches, like Macro-ops fusion, Memory Sharing & Disambiguation, L1D core-to-core interconnect (by the way, it's 8-way set associative) and a few other outstanding improvements.


CPU-ID shows Yonah as having an 8-way set associative L1 cache, but Digitlife shows that it's actually 4-way in measurements. Since nobody has measured Core's, we don't know the reality.

Netburst: it continued Intel's frantic search for performance, which came at a high cost and didn't yield much either. One problem P6 had with its decoders was made worse with Netburst. The double-pumped ALUs can theoretically do 4 operations per clock, but the trace cache only delivers 3. The double-pumped ALUs also only worked with simple instructions. If the trace cache missed, instructions had to go through the single decoder, effectively reducing the width of the CPU to 1. The FPU also got worse with Netburst. The kind of changes that Intel put into Core Duo, and eventually into Core microarchitecture, weren't made for 10 years. Most improvements focused on "easy" things like increasing cache and clock speed. It was revolutionary, but turned out to be "devolutionary" in practice.
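
As a rough illustration of the trace cache point above, the effective front-end width can be estimated as a weighted average of the two cases. This is only a sketch: the widths of 3 (trace cache hit) and 1 (miss, single decoder) come from the post, while the hit rates below are made-up inputs, and real miss penalties and trace build time are ignored. The function name is hypothetical.

```python
def effective_width(trace_cache_hit_rate, hit_width=3, miss_width=1):
    """Average uops delivered per clock for a given trace cache hit rate.

    Weighted average of the width on a hit and the width on a miss;
    miss latency and trace build time are not modeled.
    """
    if not 0.0 <= trace_cache_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    miss_rate = 1.0 - trace_cache_hit_rate
    return trace_cache_hit_rate * hit_width + miss_rate * miss_width


for rate in (1.0, 0.9, 0.5):  # hypothetical hit rates
    print(rate, effective_width(rate))
```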
June 1, 2006 6:56:09 AM

Quote:
although these go deeper into the subject (there are a few important missing points on your post...), you've done a fine job, comparing a bunch of x86 ISAs (being picky, x86-32/x86-64).


About the "deeper" part, there are reasons why I didn't do that.

-Some are sort of beyond me
-Various readings about Core microarchitecture, and about CPUs in general, gave me the impression that some are not getting the information correctly. I still have doubts about some of the articles and see a possibility that they may have had their own translation of Intel's words/articles, which could be wrong. So I put in the ones that are really firm info.
-I didn't want to get it too wrong, while still proving my point

Quote:
Performance wise, it really stands out compared to an inefficient uArch (NetBurst) & a decade old one, K8.


K8 isn't a decade old, P6 is. K8 is largely K7, which is 6-7 years old, not to mention the fact that the improvements of K8 over K7 are not insignificant.

The fact is, Intel may have never made a good response like Core, and continued to make CPUs which are not so far from being terrible.

I think the differences between the "Pentium M" road and "Netburst" could well be the differences between the design teams; some are just better. Also, it may not be coincidental that the country that designed Pentium M had exceptional brainpower, like the development of the micro camera that measures in mm(!!).


Ok, ok, so I DID go a bit deeper than I should have to prove my point, but being revolutionary or evolutionary doesn't really matter as long as the result is good. I did want to put up a brief overview though (it didn't come out as brief as I wanted it to be lol).
June 2, 2006 12:27:57 AM

Quote:
although these go deeper into the subject (there are a few important missing points on your post...), you've done a fine job, comparing a bunch of x86 ISAs (being picky, x86-32/x86-64).


About the "deeper" part, there are reasons why I didn't do that.

-Some are sort of beyond me
-Various readings about Core microarchitecture, and about CPUs in general, gave me the impression that some are not getting the information correctly. I still have doubts about some of the articles and see a possibility that they may have had their own translation of Intel's words/articles, which could be wrong. So I put in the ones that are really firm info.
-I didn't want to get it too wrong, while still proving my point

Quote:
Performance wise, it really stands out compared to an inefficient uArch (NetBurst) & a decade old one, K8.


K8 isn't a decade old, P6 is. K8 is largely K7, which is 6-7 years old, not to mention the fact that the improvements of K8 over K7 are not insignificant.

The fact is, Intel may have never made a good response like Core, and continued to make CPUs which are not so far from being terrible.

I think the differences between the "Pentium M" road and "Netburst" could well be the differences between the design teams; some are just better. Also, it may not be coincidental that the country that designed Pentium M had exceptional brainpower, like the development of the micro camera that measures in mm(!!).


Ok, ok, so I DID go a bit deeper than I should have to prove my point, but being revolutionary or evolutionary doesn't really matter as long as the result is good. I did want to put up a brief overview though (it didn't come out as brief as I wanted it to be lol).

I must say you've made your point quite clearly! :wink:

When I said, in my previous post, that «I'm not sure if I agree with you, when you state that «[Core] represents a revolutionary change for Intel's mindset on their CPUs», since they've been there, already; it seems more like a turnaround from an experimental (desperate?) uArchghhh, NetBurst.», I meant that, from the outside (us, observers), it might appear as a revolution, mostly due to the amazing performance/power Core delivers (like I said, «I won't be cynically skeptical about Conroe...»); at some point in time, however, Intel was already working on the other microarchitecture, in parallel with NetBurst, most probably with different teams (ultimately, it was Intel's decision to hand the other uArch R&D job to its Israeli team; they are very good indeed; I just don't know if the NetBurst team was "worse"; probably, they did what they were told to... I don't really know).
As for your doubts about others' articles/reviews, I find them completely legitimate and I'm with you on that one (even if I find both Arstechnica's Jon Stokes & RWT's David Kanter very competent & reliable analysts; then again, it's my opinion).

Well, it seems that I'm going deeper than you in trying to prove my point! :D 
You did a fine job, both at your microarchitectural overview & at your rebuttal to my remark.

But, I'm stubborn: picking on your words, even if the results are good, I want to know if they just added more of the same & re-painted or if they built it from "scratch". So far, neither & both. :wink:


Cheers!
June 2, 2006 5:17:16 AM

Quote:
You nailed this one... when you look at the DT performance of Yonah clocked at DT speeds it pretty much plasters the P4-netburst.


It probably has a LOT to do with Intel being very, very conservative with regard to timings. They probably even tightened them quite a bit after the Pentium III 1.13GHz recall.

Whatever reasons you guys have for disagreeing that Pentium M can't reach higher clock speeds than it currently ships at, on the grounds that it overclocks so high, that doesn't matter to Intel. Whether they left that headroom for future clock speeds or to keep the chip running stable the way they want to see it, they wouldn't have shipped it faster anyway. It's all just dreams and hope.

What would have been possible instead is Intel making Core Duo able to run on current Pentium D mobos. They could probably have done that, and probably didn't because of some marketing conflict they saw with Netburst CPUs.

I believe you may often misunderstand me. Pentium M will reach higher clocks; in fact there have been benches showing one clocking as high as 2.8 GHz (FX-62 speeds) but consuming less power.

Intel's guard-band is typically 20% or more below top speed -- for various quality assurances. Intel generates these lower guard bands because their process allows it.

Unfortunately, simply making the Core Duo drop-in compatible with, say, socket 775 has little to do with the actual architecture and mostly to do with the last few metallization layers and packaging. Merom/Conroe/Woodcrest could account for this at design time, but getting a 775 pin-out from die to package to socket would have been a pretty decent overhaul of the Yonah product, as it was originally taped in / out only as a mobile part.

What is apparent from the data around the web is that Core Duo is, clock for clock, running with or slightly ahead of K8 -- it's a pretty good core itself. It's a pity that Intel did not invest the resources and time to bring it forward beyond mobile/SFF to the desktop, as it likely would have provided a better competitive position much earlier than what we have now.

Jack

Well put, Jack. What is really sad is that we could have had Conroe years ago if Intel had admitted their......misdirection..... years ago. There is no way in hell Intel didn't see 100+ watts coming years ago, so they had more than enough time to mend their ways, but unfortunately they didn't, and Netburst will go down in history and possibly be the architecture of choice come the next ice age.
June 2, 2006 5:35:16 AM

It is interesting that Intel is still going to be selling those 16MB L3 chips. I wonder how they will perform clocked up compared to conroe?
June 2, 2006 5:37:31 AM

Quote:
Yeah, I also believe (my opinion), Intel put way too much faith in the capability of the process to bring clock speeds up and paid little attention to power -- even as a long pipeline demands more power. How they thought they could get away with that in the long run -- no good.

As you watch me write, Intel has a leading process technology (so does AMD, but Intel's produces better transistors -- feel free to challenge/rebut) -- but it is not so great that it can get by the laws of physics. :) 

Now, Netburst has its good points -- just not enough of them over the K8 to make it competitive. As I usually say, hats off to AMD: 3 years ago they really set things in motion and did a better job of foreseeing the future.

Jack
Amen to that.