
Is the Core 2 Duo the first true, native dual core processor?

April 28, 2007 7:03:05 PM

I read this somewhere else, but I can't remember the link now.

However, the poster's argument was that the Athlon X2, although the first monolithic dual core processor, acts more like a Pentium D due to its individual L2 caches. With separate, non-connected L2 caches, the X2 cannot share information between cores via the cache; the cores can only communicate with each other via the crossbar.

On the other hand, the Core 2 Duo has a shared L2 cache with a 2MB capacity. With such a big L2 cache, not only can it minimize the bottleneck caused by the aging FSB, but the processor is also more efficient, because the L2 automatically adjusts itself to accommodate the information being processed by the cores.

I'm wondering if this theory is correct. Thanks in advance for clearing this up. Also, please don't flame each other; if you want to rebut this theory, please don't just post something like "AMD rocks, Intel sucks."
April 28, 2007 7:14:54 PM

AMD rocks! Actually it sounds correct except for the part that a shared cache a dual core maketh.

2 cores are 2 cores are 2 cores.
April 28, 2007 7:16:17 PM

I'd say the X2 was, since it was the first processor available with two cores, regardless of their cache structure or die type.
April 28, 2007 7:19:29 PM

Quote:
I'd say the X2 was, since it was the first processor available with two cores, regardless of their cache structure or die type.


Wasn't it technically the Opterons? Servers hold priority over everything else.
April 28, 2007 7:23:35 PM

Core 2 Duo is just refined, updated technology, but AMD was first.
April 28, 2007 7:27:04 PM

Quote:
I'd say the X2 was, since it was the first processor available with two cores, regardless of their cache structure or die type.


Wasn't it technically the Opterons? Servers hold priority over everything else.

Yes, Opteron was the first according to this article

http://en.wikipedia.org/wiki/Amd
April 28, 2007 7:27:14 PM

Quote:
Core 2 Duo is just refined, updated technology, but AMD was first.

In this case, AMD was the first to come out with the technology. Most of the time they take something crappy Intel did, make it 1000X better and employ it as their own, but this time you have to admit that it was AMD, because all Intel did was take two existing Pentium 4 chips and tape them together.
April 28, 2007 7:29:17 PM

What about the Sun V890 server? It supported up to eight dual core processors, and I'm sure it's been around for quite a few years now.
April 28, 2007 7:31:25 PM

Did I see that technology make it from servers to the mainstream? When was the last time you saw Sun processors at a local LAN party?
April 28, 2007 7:34:53 PM

Quote:
Did I see that technology make it from servers to the mainstream? When was the last time you saw Sun processors at a local LAN party?


Apologies, I thought the original poster was asking who got the technology onto the marketplace first, and someone mentioned AMD Opteron processors, hence why I mentioned Sun, as I thought Opterons were predominantly in the server market.
April 28, 2007 7:45:52 PM

True, Sun and the Opteron were among the first into the market with more than one core, but to my understanding he only wanted to hear about AMD and Intel. We usually don't refer to much more than those two; VIA, Sun, IBM and a few other forgotten names also make processors, but we just never hear of them or talk about them because they are so far behind these two. Sun may have done something first, but I'm not completely sure it was "true native", it may have been "synthetic native" :p
April 28, 2007 7:51:45 PM

Quote:
True, Sun and the Opteron were among the first into the market with more than one core, but to my understanding he only wanted to hear about AMD and Intel. We usually don't refer to much more than those two; VIA, Sun, IBM and a few other forgotten names also make processors, but we just never hear of them or talk about them because they are so far behind these two. Sun may have done something first, but I'm not completely sure it was "true native", it may have been "synthetic native" :p

Hmm, I'm not aware of Sun's processors.

It would be really nice if you guys could expand on that :p
April 28, 2007 7:58:51 PM

Generally, a native dual core CPU is regarded as one that has both cores on the same die. AMD will tell you that they feel discrete L2 cache is preferred with good reason. Intel has their reasons for shared L2 cache. AMD does not have any shared L2 designs on their roadmap that I'm aware of, but no one is disputing that they have native dual core chips now and that K10 will be a native quad.

With that in mind, the first native dual core was Opteron followed by X2. Intel's first native dual core was actually the Core Duo, not C2D.

Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan
April 28, 2007 8:06:07 PM

Quote:
Hmm, I'm not aware of Sun's processors.

It would be really nice if you guys could expand on that :p


The V890 server was released in 2004 and could have up to eight dual core UltraSPARC IV+ processors (and a minimum of two). You could also have up to 8GB of memory per every two processors as well, I think?

The original processor options were 1.2-1.8GHz UltraSPARC IV+.

I don't know if that was the first server that had the UltraSPARC IV+ dual core processors, though.

They have 2MB of L2 cache and 32MB of L3.

The primary applications for these systems would be heavy number crunching and things like large databases.

It's not your everyday rig for surfing the web and playing games, but stick a graphics card in the thing and you can install Firefox :?
April 28, 2007 8:12:16 PM

With Firefox, you'll be able to multithread your porn; what takes an average man 5-10 minutes will take you 2.5-5 minutes :lol: 
April 28, 2007 8:14:56 PM

IBM also produced some dual-core G5's, correct?
April 28, 2007 8:22:07 PM

Quote:
With Firefox, you'll be able to multithread your porn; what takes an average man 5-10 minutes will take you 2.5-5 minutes :lol: 


2.5 minutes? Perfect for those with no stamina then :lol: 
April 28, 2007 8:43:03 PM

Quote:
IBM also produced some dual-core G5's, correct?


You are correct. They released the PowerPC 970MP (the 970 is the G5) in Q3 2005. Probably its most notable use was in the "quad core" Power Mac; Apple marketed it as a quad core system even though it was just a dual socket system with two dual core processors.

Ryan
April 28, 2007 8:49:17 PM

Quote:
Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan

I was wondering about this very same thing recently myself.

Jumping Jack, what's your take on this? As for gaming, how would this logic hold up?

Anyone?
April 28, 2007 9:33:56 PM

Quote:
Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan


I was wondering about this very same thing recently myself.

Jumping Jack, what's your take on this? As for gaming, how would this logic hold up?

Anyone?

From a pure performance point of view (since you asked about gaming), native is preferred because power management can be more sophisticated thus allowing lower thermal envelopes. Therefore, clock speeds can be higher. So hypothetically speaking (since neither Intel nor AMD make the same dual core model in both native and MCM configurations), if an E6600 were to be available in either configuration, the MCM version would almost certainly have a higher power consumption and dissipation. So they would have more headroom with the native version to ratchet up clock speed. An overclocker would also have more headroom.

I believe the L2 cache depends on implementation and this is where Jack comes in. I don't know enough to describe the pros and cons of discrete vs shared.

Ryan
April 28, 2007 10:09:14 PM

Quote:
With Firefox, you'll be able to multithread your porn; what takes an average man 5-10 minutes will take you 2.5-5 minutes :lol: 

If that's the case, I would rather use a Pentium MMX for the job....

the longer the merrier :p  :p 
April 28, 2007 10:14:01 PM

Quote:
Generally, a native dual core CPU is regarded as one that has both cores on the same die. AMD will tell you that they feel discrete L2 cache is preferred with good reason. Intel has their reasons for shared L2 cache. AMD does not have any shared L2 designs on their roadmap that I'm aware of, but no one is disputing that they have native dual core chips now and that K10 will be a native quad.

With that in mind, the first native dual core was Opteron followed by X2. Intel's first native dual core was actually the Core Duo, not C2D.

Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan

It is undisputed that the monolithic approach for dual core processors has the advantages of lower power consumption and better performance.
I'm just wondering if a shared cache is a milestone towards the "real" dual core processor, since Intel was the first to implement it in their Core 2 lineup, and now AMD is doing the same thing too.
April 28, 2007 11:11:02 PM

You asked for Jack, but anyway I'll share my small experience with this as a programmer:
When you are accelerating an app with multithreading and effectively using more than one core, you can divide a single task and let it run on more cores by using different threads. Imagine you are processing an image and you make two threads, each processing one half of the image. The image, which I will call the "dataset", is common to both cores. If one thread needs to know something about the other thread's half, having a shared cache minimizes the effort because both parts reside in it. This is true ONLY if the dataset fits inside the cache, let's say the image is not larger than the cache size, so to speak.

On the other hand, if you are accelerating one task by running different threads that have no common dataset, each thread will use the cache to its own benefit but not to the other thread's: you write your image to the cache and you simply erase what the other thread might need in the future, and this is where you love the separate caches. MCM is OK if your program can effectively keep the datasets nicely local in the same cache. Cache coherency is the issue here.

AFAIK, Barcelona is a combination of both approaches, because you have 512K for each core and 2M of shared cache, and given that they are exclusive (no data in L3 is kept in L2 or L1, and vice versa), you have more flexibility keeping a thread's local data intact in your L2 and using L3 as the place for "common" data/code.

Returning to the coherency thing: when you have the same dataset on two different dies, it sits in two different caches that need to communicate to remain synchronized. Imagine you update something in the cache of die 1 (cores 1, 2) and a few cycles later die 2 (cores 3, 4) tries to read the same data from its cache; die 1 has to inform die 2 that the data has changed, and this generates traffic that could kill the FSB as you move to more cores. In a Barcelona system with 8 cores, namely 2 quads connected via HT, you'll have the same penalty.

After all this semi-theory, let me tell you that depending on the app running and the size of the datasets, you might want a shared cache or none at all, which is where Barcelona might have an advantage. But if your dataset is larger than your cache, a shared cache simply sucks. I've seen it, because using assembler instructions that bypass the cache boosts the speed of dual cores with shared cache incredibly; let's say you spit outside your own garden :).

Oh yes, you mentioned gaming... again, it depends on the style of the programmer...

I'm sure there are much more skilled people here who can explain this much more clearly and compactly (and correct my mistakes, of course), so over to the experts...

Sorry about the typos, not a native English speaker ;)
April 28, 2007 11:29:30 PM

Quote:
You asked for Jack, but anyway I'll share my small experience with this as a programmer:
When you are accelerating an app with multithreading and effectively using more than one core, you can divide a single task and let it run on more cores by using different threads. Imagine you are processing an image and you make two threads, each processing one half of the image. The image, which I will call the "dataset", is common to both cores. If one thread needs to know something about the other thread's half, having a shared cache minimizes the effort because both parts reside in it. This is true ONLY if the dataset fits inside the cache, let's say the image is not larger than the cache size, so to speak.

On the other hand, if you are accelerating one task by running different threads that have no common dataset, each thread will use the cache to its own benefit but not to the other thread's: you write your image to the cache and you simply erase what the other thread might need in the future, and this is where you love the separate caches. MCM is OK if your program can effectively keep the datasets nicely local in the same cache. Cache coherency is the issue here.

AFAIK, Barcelona is a combination of both approaches, because you have 512K for each core and 2M of shared cache, and given that they are exclusive (no data in L3 is kept in L2 or L1, and vice versa), you have more flexibility keeping a thread's local data intact in your L2 and using L3 as the place for "common" data/code.

Returning to the coherency thing: when you have the same dataset on two different dies, it sits in two different caches that need to communicate to remain synchronized. Imagine you update something in the cache of die 1 (cores 1, 2) and a few cycles later die 2 (cores 3, 4) tries to read the same data from its cache; die 1 has to inform die 2 that the data has changed, and this generates traffic that could kill the FSB as you move to more cores. In a Barcelona system with 8 cores, namely 2 quads connected via HT, you'll have the same penalty.

After all this semi-theory, let me tell you that depending on the app running and the size of the datasets, you might want a shared cache or none at all, which is where Barcelona might have an advantage. But if your dataset is larger than your cache, a shared cache simply sucks. I've seen it, because using assembler instructions that bypass the cache boosts the speed of dual cores with shared cache incredibly; let's say you spit outside your own garden :).

Oh yes, you mentioned gaming... again, it depends on the style of the programmer...

I'm sure there are much more skilled people here who can explain this much more clearly and compactly (and correct my mistakes, of course), so over to the experts...

Sorry about the typos, not a native English speaker ;)


I believe what you are referring to is cache thrashing, when one core flushes the data of the other core. Someone feel free to correct me if I'm wrong. I think Intel has some features implemented in their shared L2 to help prevent this in many situations; I believe that is part of the technology they refer to as "Smart Cache" on the C2D. Again, feel free to correct me if I'm wrong.

Ryan
April 28, 2007 11:31:45 PM

Quote:
You asked for Jack, but anyway I'll share my small experience with this as a programmer:
When you are accelerating an app with multithreading and effectively using more than one core, you can divide a single task and let it run on more cores by using different threads. Imagine you are processing an image and you make two threads, each processing one half of the image. The image, which I will call the "dataset", is common to both cores. If one thread needs to know something about the other thread's half, having a shared cache minimizes the effort because both parts reside in it. This is true ONLY if the dataset fits inside the cache, let's say the image is not larger than the cache size, so to speak.

On the other hand, if you are accelerating one task by running different threads that have no common dataset, each thread will use the cache to its own benefit but not to the other thread's: you write your image to the cache and you simply erase what the other thread might need in the future, and this is where you love the separate caches. MCM is OK if your program can effectively keep the datasets nicely local in the same cache. Cache coherency is the issue here.

AFAIK, Barcelona is a combination of both approaches, because you have 512K for each core and 2M of shared cache, and given that they are exclusive (no data in L3 is kept in L2 or L1, and vice versa), you have more flexibility keeping a thread's local data intact in your L2 and using L3 as the place for "common" data/code.

Returning to the coherency thing: when you have the same dataset on two different dies, it sits in two different caches that need to communicate to remain synchronized. Imagine you update something in the cache of die 1 (cores 1, 2) and a few cycles later die 2 (cores 3, 4) tries to read the same data from its cache; die 1 has to inform die 2 that the data has changed, and this generates traffic that could kill the FSB as you move to more cores. In a Barcelona system with 8 cores, namely 2 quads connected via HT, you'll have the same penalty.

After all this semi-theory, let me tell you that depending on the app running and the size of the datasets, you might want a shared cache or none at all, which is where Barcelona might have an advantage. But if your dataset is larger than your cache, a shared cache simply sucks. I've seen it, because using assembler instructions that bypass the cache boosts the speed of dual cores with shared cache incredibly; let's say you spit outside your own garden :).

Oh yes, you mentioned gaming... again, it depends on the style of the programmer...

I'm sure there are much more skilled people here who can explain this much more clearly and compactly (and correct my mistakes, of course), so over to the experts...

Sorry about the typos, not a native English speaker ;)

Interesting read. I'll digest it a little before I respond to you.
April 29, 2007 12:20:05 AM

From Intel's doc

I couldn't be sure that the data one core writes out to the cache won't land where there is data the other core is using; I'm afraid I'm not well documented there, but the "thrashing" term is somewhat useful here. Imagine the dataset of core 1 is 3MB and the dataset of core 2 is 2MB, and the whole code is cached in the L1 of each core. Although core 1 is not hurting itself, it could be hurting the other core's dataset, because together they don't fit in the 4MB cache (a C2D E6600, for example). What I understood of Smart Cache doesn't have much to do with preventing thrashing. I can imagine that Smart Cache could see that the data belongs to one specific core and, in an attempt to avoid flushing the other core's data, flush one line of its own space instead, but such a decision is based more on prediction. Again, the worst case of such a prediction is that you cause thrashing for the other core. Thrashing, however, can be avoided when you use the right programming techniques, and the same multithreaded app could run flawlessly on a Q6600 or completely clog the FSB if it's badly written.
April 29, 2007 10:55:53 PM

That brings me to an interesting question that I haven't been able to find an answer to. In the case of a Q6600 which has 2 discrete shared L2 caches, does the OS or applications allocate a core off the first die first, then a core off the second die second, then the second core off the first die third, etc? This would avoid cache thrashing if 2 threads are running on a quad core cpu. It would also be effective in 2+ socket systems with dual core chips like a double dual core Xeon workstation. (In that case, each thread would benefit from both larger L2 cache and the full FSB bandwidth.) I would assume this would also benefit dual core processors with Hyperthreading. I would like to know whether it is best to allocate in logical core order (0,1,2,3,etc) or to allocate based on die or chip order (0,2,1,3).

Ryan
April 30, 2007 5:12:10 AM

Quote:
That brings me to an interesting question that I haven't been able to find an answer to. In the case of a Q6600 which has 2 discrete shared L2 caches, does the OS or applications allocate a core off the first die first, then a core off the second die second, then the second core off the first die third, etc? This would avoid cache thrashing if 2 threads are running on a quad core cpu. It would also be effective in 2+ socket systems with dual core chips like a double dual core Xeon workstation. (In that case, each thread would benefit from both larger L2 cache and the full FSB bandwidth.) I would assume this would also benefit dual core processors with Hyperthreading. I would like to know whether it is best to allocate in logical core order (0,1,2,3,etc) or to allocate based on die or chip order (0,2,1,3).

Ryan


There is no such thing as "cache thrashing". It was a futile attempt by some AMD fanboys to cast the C2D in a negative light prior to launch.
April 30, 2007 3:20:53 PM

Haha, well, yes and no. Let's say you have a Q6600 and you are running 2 threads. If these threads have a common dataset, you want them running on the same die (cores 1, 2) so you can benefit from the shared L2, and not on cores 1 and 3, because the cache coherency traffic would have to run over the FSB. On the other hand, if you have 2 threads without a common dataset, you want them running on different dies, so each can use its local L2 to maximum potential and reduce FSB coherency traffic.

What I do most of the time is try to get the number of sockets, dies, and cores with cpuid, to know exactly what architecture my code is running on, and then use affinity selection to keep some threads together and some others out of the rest's way.

Quote:
There is no such thing as "cache thrashing". It was a futile attempt by some AMD fanboys to cast the C2D in a negative light prior to launch.


Assembler instructions like movntdq or movntq were made specifically to deal with the thrashing problem. Thrashing is not something related to AMD or Intel, nor related to multicore; even single cores suffer from it when you don't treat it with the right assembler optimizations (or intrinsics). Both Intel's and AMD's approaches have their pros and cons. Most optimized apps check what type of hardware they are running on and run code accordingly.
April 30, 2007 6:43:05 PM

Quote:
I'd say the X2 was, since it was the first processor available with two cores, regardless of their cache structure or die type.


Wasn't it technically the Opterons? Servers hold priority over everything else.

Yes, Opteron was the first according to this article

http://en.wikipedia.org/wiki/Amd

Actually, Intel undercut AMD by a few days and was first to market with an x86 dual core CPU, the Pentium EE 840. AMD's Opteron was the first server dual core by quite a margin, though; Intel didn't release a dual core Xeon until much later.

http://news.com.com/AMD+releases+dual-core+server+chips...
April 30, 2007 6:45:42 PM

Quote:
Generally, a native dual core CPU is regarded as one that has both cores on the same die.


If you take that as the definition of native dual core, then the Pentium D Smithfield was a native dual core CPU as well.
April 30, 2007 7:51:15 PM

LAWL! :D 
April 30, 2007 8:08:21 PM

Quote:
Generally, a native dual core CPU is regarded as one that has both cores on the same die.


If you take that as the definition of native dual core, then the Pentium D Smithfield was a native dual core CPU as well.

I forgot Smithfield was on the same die. When I think of the PD, I usually think of Presler, which was MCM. Thank you for pointing that out.