
"AMD Guru Says Intel Xeon Shared Cache Inferior "

June 28, 2006 4:57:00 PM

Quote:
According to a senior technical manager at chipmaker AMD, the new dual core Intel Xeon 5100 server processors are not all they're cracked up to be because of a major design drawback. The problem, according to the AMD technical guru, is that the so-called dual core processors share a single memory cache unlike the dual core AMD Opteron processor, which has separate dedicated cache for each core.

Michael Apthorpe, senior technical manager for AMD in Australia and New Zealand, an electronics engineer with more than 25 years experience in chip design starting from his days at Mitsubishi Electric, says that any dual core processor design that involves both cores sharing a single memory cache has both performance and power consumption disadvantages.

"There are two types of cache with processors and they are known as exclusive and inclusive cache," says Apthorpe. "With inclusive cache, you have one allocation of RAM. What happens is that if you're running a program and all of sudden you change programs, you have to stop and flush the entire cache and reallocate and reload it. That takes clock cycles to do and that's our competitor's product."

The difference with AMD's Opteron dual core processor range, according to Apthorpe, is that each core on the processor has its own dedicated cache. "That means with our processors, they never have to stop and ask the cache to flush and wait for it to reload," he says. "Therefore, the processor always has full access to the cache which really speeds things up.

"When you go to multiple processors, the problem gets even more pronounced. If you have just one cache and one processor is dominating the cache utilisation and then you get a request on the other processor, the same thing happens. That is it has to stop, flush, reallocate and reload. This all takes clock pulses so again it hamstrings the performance of the system. That's why our competitor has to put so much more cache in their systems because they have to make up for the latency. They port information out of their main RAM into the cache of the processor and they try to make up the latency which they create by stopping, flushing and reloading the processor cache."


Is this the reason why Woodcrest/Conroe are prone to cache thrashing?

Is this the main reason why AMD is going for a large and shared L3 cache?

http://www.itwire.com.au/content/view/4785/53/
June 28, 2006 5:04:51 PM

O RLY?

Why is AMD going to use shared caches themselves then?
June 28, 2006 5:14:02 PM

The FSB is the real problem with Intel's chips. From what I have read, the shared cache is overall more of a benefit than a drawback. But we will see in the long run which is better.
June 28, 2006 5:34:05 PM

Quote:
According to a senior technical manager at chipmaker AMD, the new dual core Intel Xeon 5100 server processors are not all they're cracked up to be because of a major design drawback. The problem, according to the AMD technical guru, is that the so-called dual core processors share a single memory cache unlike the dual core AMD Opteron processor, which has separate dedicated cache for each core.

Michael Apthorpe, senior technical manager for AMD in Australia and New Zealand, an electronics engineer with more than 25 years experience in chip design starting from his days at Mitsubishi Electric, says that any dual core processor design that involves both cores sharing a single memory cache has both performance and power consumption disadvantages.

"There are two types of cache with processors and they are known as exclusive and inclusive cache," says Apthorpe. "With inclusive cache, you have one allocation of RAM. What happens is that if you're running a program and all of sudden you change programs, you have to stop and flush the entire cache and reallocate and reload it. That takes clock cycles to do and that's our competitor's product."

The difference with AMD's Opteron dual core processor range, according to Apthorpe, is that each core on the processor has its own dedicated cache. "That means with our processors, they never have to stop and ask the cache to flush and wait for it to reload," he says. "Therefore, the processor always has full access to the cache which really speeds things up.

"When you go to multiple processors, the problem gets even more pronounced. If you have just one cache and one processor is dominating the cache utilisation and then you get a request on the other processor, the same thing happens. That is it has to stop, flush, reallocate and reload. This all takes clock pulses so again it hamstrings the performance of the system. That's why our competitor has to put so much more cache in their systems because they have to make up for the latency. They port information out of their main RAM into the cache of the processor and they try to make up the latency which they create by stopping, flushing and reloading the processor cache."


Is this the reason why Woodcrest/Conroe are prone to cache thrashing?

Is this the main reason why AMD is going for a large and shared L3 cache?

http://www.itwire.com.au/content/view/4785/53/

The main reason AMD is going for a shared L3 cache is that they can't increase the size of the L2 with the IMC. It's just not possible.

And FYI, Conroe basically annihilates the FX-62 in almost every single situation. Yes, that includes multi-threading and multi-tasking.
Now you tell me they have a cache thrashing problem?

You need to lay off sharikou.blogspot.com. It is extremely toxic to your logic.

P.S. I emailed one of Intel's tech gurus during Computex. Since I can't disclose his information, I'll only post what he wrote in the email.


"Example, in the case of a media conversion from MPEG2 to MPEG4, the MPEG2 decoder will get its stream from the front side bus and decode the frame. The MPEG4 encoder will pick the data from the Smart L2, encode it to MPEG4, and send the stream to disk. In standard definition, this is fully cached ... 98%! In the case of split caches, you have a cache miss and a write-back request from each L2 to the other ... a real nightmare; cache efficiency is around 10% for the decoder and 80% for the encoder, that is, around 45% of the memory requests fulfilled by the L2 cache subsystem, far from 98% ..."
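The combined figure in that email is just a weighted average. Assuming the decoder and encoder issue roughly equal numbers of L2 requests (an assumption; the email doesn't give the request mix), a few lines of Python reproduce the arithmetic:

```python
def blended_hit_rate(rates, weights):
    """Weighted average hit rate across threads sharing one L2."""
    return sum(r * w for r, w in zip(rates, weights))

# Split-cache numbers from the email: 10% for the decoder, 80% for the
# encoder, assuming equal request volume from each thread.
split = blended_hit_rate([0.10, 0.80], [0.5, 0.5])
shared = 0.98  # the "Smart L2" figure quoted for the shared cache

print(f"split caches: {split:.0%}")   # 45%
print(f"shared cache: {shared:.0%}")  # 98%
```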
June 28, 2006 5:54:11 PM

So that explains why Woodcrest/Conroe is getting thrashed in virtually every benchmark known to man! :roll:

And why AMD is going to shared cache in their next generation architecture. :lol: 
June 28, 2006 5:57:17 PM

First of all, what authority does AMD have when talking about Intel processors? Even if you are a "guru," if you're from AMD, we all know that you are going to trash Intel, so why bother posting the article in the first place, 9-inch?

Secondly, the shared cache is a major BONUS to Intel's designs; it actually INCREASES performance because the cores can interact across the cache. Also, in multithreaded apps, the separate caches slow down performance, as is evident from viperabyss's post.

Strangely enough, 9-inch hasn't come back to spread around more crap about AMD being better than Intel. Maybe he finally realized that it was pointless.
June 28, 2006 6:06:31 PM

Well, even with all these problems, the Core 2s thrash the K8s.

So is the AMD guy trying to say that their products can't beat even problematic products from their competitor?
June 28, 2006 6:16:15 PM

That article was pure jokes:
AMD saying smaller cache is better, sharing is not good, etc., etc., etc.
All the while they would do it themselves if their manufacturing process and fabs were up to the task. :D
June 28, 2006 6:22:29 PM

Cache thrashing can be a problem for ALL of these chips. Conroe is a little better in some circumstances, but both manufacturers need to find a way to better manage cache. There was just a good article about that, I believe on Anandtech. Cache itself is a band-aid, however; I wish Intel would just suck it up and adopt a better bus, or get theirs finished. I could see the Conroe architecture working for a long time into the future if they did.
June 28, 2006 6:26:09 PM

I thought Conroe was going to go back to NetBurst, like the PIIIs used?
June 28, 2006 6:39:11 PM

? Are you being sarcastic?
June 28, 2006 7:42:57 PM

Quote:
Is this the reason why the Woodcrest/Coroe are prone to cache trashing?


It doesn't suffer from cache thrashing, moron.

Quote:
That's why our competitor has to put so much more cache in their systems because they have to make up for the latency.


The cache latency is the same, and it takes up the usual ~40-50% of the die, just like AMD's.
June 28, 2006 8:04:35 PM

Congratulations. You've posted an article based on false information. Intel's shared L2 cache has been described as a non-inclusive, non-exclusive cache for the specific reason of preventing the problems that AMD is trying to play up. Now, how the shared L2 cache operates specifically I'm still not clear on, but it obviously isn't as clear-cut as AMD wants you to believe.

Besides, I'm not sure what Apthorpe is talking about anyway. First of all, he makes it sound like you're flushing the entire cache every time you switch programs. As far as I know, the mechanism does not require flushing the entire cache, but rather swapping the least-used cache line in the L2 for incoming data from RAM. This would be true for exclusive or inclusive caches. While the entire cache may end up replaced in the end, that only occurs if that is how much new data the processor needs, and that is true regardless of exclusive, inclusive, shared, or non-shared caches.
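The line-granular replacement described above can be sketched with a toy LRU set (a deliberately simplified model of one set of a set-associative cache, not how any real cache is implemented):

```python
from collections import OrderedDict

class CacheSet:
    """Toy model of one set of a set-associative cache with LRU replacement.
    A miss evicts only the least-recently-used line; the rest of the set
    (and the rest of the cache) is untouched -- no wholesale flush."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, ordered by recency

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # refresh recency on a hit
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the LRU line only
        self.lines[tag] = None
        return False

s = CacheSet(ways=4)
for t in ["A", "B", "C", "D"]:
    s.access(t)          # fill the set
s.access("A")            # hit: A becomes most recently used
s.access("E")            # miss: evicts B (the LRU), not the whole set
print(sorted(s.lines))   # ['A', 'C', 'D', 'E']
```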

What an inclusive cache really means is that the contents of the L1 cache are duplicated in the L2. So when a cache line (again, things move in cache lines, not the entire cache) needs to be copied from the L2 to the L1, all the L1 needs to do is delete its copy to make room for the new information from the L2. In an exclusive cache, there is no duplication (hence exclusive), which means that when a cache line is copied from the L2 to the L1, the existing L1 cache line first needs to be copied back to the L2 before the L1 can accept the new line. What this means is that an exclusive cache is actually slower than an inclusive design, not the other way around. There are ways of speeding up an exclusive system, such as using a victim buffer.
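The extra L2 write that makes an exclusive promotion slower can be sketched like this (a hypothetical toy model that tracks tags only and ignores recency and coherence):

```python
# Toy contrast of inclusive vs exclusive L1/L2 behaviour for a single
# line promotion from L2 into a full L1. Caches are modelled as plain
# sets of tags; the return value counts writes into the L2.

def promote_inclusive(l1, l2, tag, victim):
    """Inclusive: the L2 already holds the L1 victim, so L1 just drops it."""
    l1.discard(victim)           # duplicate still lives in L2; no writeback
    l1.add(tag)
    return 0                     # lines written into L2

def promote_exclusive(l1, l2, tag, victim):
    """Exclusive: the L1 victim must first be moved into the L2."""
    l1.discard(victim)
    l2.add(victim)               # extra L2 write before L1 can accept the line
    l2.discard(tag)              # the promoted line leaves the L2
    l1.add(tag)
    return 1

l1, l2 = {"A", "B"}, {"A", "B", "C"}        # inclusive: L2 contains L1
print(promote_inclusive(l1, l2, "C", "A"))  # 0 -> no extra L2 traffic

l1, l2 = {"A", "B"}, {"C", "D"}             # exclusive: contents disjoint
print(promote_exclusive(l1, l2, "C", "A"))  # 1 -> one writeback into L2
```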

I bring this up every time I talk about cache architecture, so once again I'll mention that an exclusive design like what AMD uses relies on the L1 cache more than the L2. That is why we see AMD using larger L1 caches than Intel. By the same token, an inclusive cache design benefits more from the L2, which is why Intel uses larger L2 caches. It's not that AMD doesn't benefit from larger L2 caches; it's just that, relatively, they'd benefit less. AMD plays it off as if their architecture doesn't require large L2 caches, which is true: even if they increase the L2, they hit the point of diminishing returns a lot quicker, so it isn't worth it for them to have large L2 caches. Intel gains a two-fold benefit from large L2 caches: inclusive caches gain more from large L2s, and large L2s alleviate any FSB bottleneck.

Moving on to cache thrashing, that issue has been played up far too much. Cache thrashing was an issue in NetBurst processors with HT because the cache had no control over allocation to each core, which meant one core could be evicting a cache line that the other core needs, and one core could theoretically monopolize the entire cache. That isn't the case with the "smart" shared L2 cache in Conroe. The prefetchers and other logic in the cache dynamically allocate cache to each core based on their usage patterns and projected needs. This means that it isn't possible for one core to completely take over the cache if the second core needs cache space too. Now, it's possible that the dynamic sharing mechanism isn't foolproof, but you shouldn't assume that it doesn't exist.
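One way such dynamic allocation could work is way-partitioning driven by observed miss rates. The sketch below is purely hypothetical (Intel hasn't published the actual mechanism); the point is only that a floor per core prevents either core from being starved out entirely:

```python
# Hypothetical sketch of dynamic way-partitioning in a shared L2:
# each core's share of the ways tracks its miss count, with a floor
# guaranteeing neither core can be completely evicted by the other.

def repartition(total_ways, misses, floor=2):
    """Split ways proportionally to per-core miss counts, with a floor."""
    m0, m1 = misses
    if m0 + m1 == 0:
        share0 = total_ways // 2                    # idle: split evenly
    else:
        share0 = round(total_ways * m0 / (m0 + m1))
    share0 = max(floor, min(total_ways - floor, share0))
    return share0, total_ways - share0

print(repartition(16, (900, 100)))   # busy core gets more, not all: (14, 2)
print(repartition(16, (500, 500)))   # balanced load: (8, 8)
```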

Finally,
Quote:
Is this the reason why Woodcrest/Conroe are prone to cache thrashing?

Really, where? I'd love to see a link that conclusively proves cache thrashing. You'd probably want to wait for the shipping products anyway, since the last CPU revisions could tweak things, and newer BIOSes would improve memory handling. (For a note, even though GamePC used stepping 5 for Woodcrest, which is probably the shipping version, the CPUs themselves were still labelled Engineering Samples. The Woodcrests that THG has right now are even older stepping 4 Engineering Samples.)

Quote:
Is this the main reason why AMD is going for a large and shared L3 cache?

I wouldn't call a 2MB shared L3 cache large, especially between 4 cores. If you want a large shared L3 cache, you'd look to Tulsa with its 16MB shared L3 cache for 2 cores. Even with sharing done in the L3 instead of the L2, with each dedicated L2 only being 512k, you'd still be relying on properly working dynamic sharing mechanisms in the L3 cache, because the 512k L2 won't keep the core fed for long, especially if K8L has the vast performance potential that you believe.
June 28, 2006 8:16:20 PM

I was agreeing with you almost the whole page!!!! Then you brought up a worry that the cache wouldn't be able to keep up with the improvements of K8L? What was that? Obviously you have a good understanding of cache and its uses, so, given that cache is necessary because of bandwidth issues, and K8L will not have any, THAT part doesn't make sense. The rest was a very good explanation, though. Maybe you can explain your K8L reasoning better?
June 28, 2006 8:22:43 PM

All I'm saying is that K8L is supposed to be an improvement over K8, which means it'll be able to process things faster and more efficiently, which means it'll need more bandwidth to keep it fed. Now, 9-inch assumes that the shared L3 cache is a saving grace, but if you look at the overall design, the L2 cache has been cut in half. Current Opterons have 1MB of dedicated L2 cache. A quad core K8L will only have 512k of dedicated L2 per core, and 2MB of L3 cache shared between 4 cores. The net result is going from 1MB of fast L2 per core to 512k of fast L2 plus 512k of slower L3 per core. Now, a shared L3 means that the 2MB is bigger than it appears, but if 9-inch is so worried about cache thrashing, he'd better hope that it isn't occurring in the shared L3 cache, or else the performance potential of the more bandwidth-hungry K8L will be impacted.
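The per-core arithmetic behind that comparison, spelled out (sizes taken from the post above; the even four-way L3 split is an assumption, since a shared cache need not divide evenly):

```python
# Per-core cache budget: current dual-core Opteron vs the described K8L.
KB = 1024

opteron_l2_per_core = 1024 * KB            # today: 1MB of dedicated L2

k8l_l2_per_core = 512 * KB                 # K8L: 512k dedicated L2 per core
k8l_l3_total = 2 * 1024 * KB               # 2MB of L3 shared by 4 cores
k8l_l3_fair_share = k8l_l3_total // 4      # 512k per core if split evenly

print(opteron_l2_per_core // KB)                    # 1024 KB of fast L2
print(k8l_l2_per_core // KB,                        # 512 KB of fast L2 ...
      k8l_l3_fair_share // KB)                      # ... plus 512 KB of slower L3
```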
June 28, 2006 8:28:48 PM

They are cutting the L1 down to 32/32 as well, which further proves my point. Increasing the Hypertransport and memory controller frequencies is going to handle bandwidth well. Heck, the Asus AM2 590 deluxe and ATI 3200 boards are already hitting 1.5ghz stable. So I guess it is funny how Intel has more processing power, but issues getting the data there, and AMD has less processing power but a better way to get the data there.
June 28, 2006 9:23:45 PM

why, oh why is amd turning into a fud factory?
June 28, 2006 9:24:41 PM

Quote:
Congratulations. You've posted an article based on false information.


That is 9-inch's trademark. All posts I have seen by 9-inch in the past month or so have used theinquirer... and now he is posting from another shady site.

So it leads me to believe that he is forced to use bad sources to insult Intel because there is no credible source that will back him up.

----------

I will listen once there is a credible source that you are getting info from. But until then... I withhold my judgement. And ignore what you post. =P
June 28, 2006 9:26:49 PM

I was originally going to say that this entire post doesn't really seem to be trying to state facts (aka reeks of fanboy)... it seems like one huge partisan debate ...
June 28, 2006 11:17:22 PM

Quote:
They are cutting the L1 down to 32/32 as well


No they're not!
June 28, 2006 11:55:57 PM

lol what bs article.

Quote:
Aside from performance and power considerations, Apthorpe claims the cache issue for Intel also gives AMD dual core technology a clear cost advantage over Intel.


Hmm, 140mm² vs 183mm² and 230mm². Real cost advantage there!

Quote:
Apthorpe says the Intel Xeon involving large amounts of cache on the processor also has power consumption disadvantages. "Cache is a huge consumer of power, so when you have large amounts of cache you always have large amounts of heat," he says.


:lol:  And this guy has 25 years experience!?
June 29, 2006 12:21:00 AM

Get your crap out of here, you moron. We all know here that having a shared cache is much better than independent caches.



June 29, 2006 12:36:44 AM

Isn't adding more levels of cache actually adding more band-aids? Aren't caches really needed because the processor is not fast enough to process the data you are sending it? Will we see a day when processors won't need a cache, or need just one?

Please educate me...
June 29, 2006 12:38:03 AM

Wow, why even post this? An AMD guru isn't going to side with Intel; it's the most biased information you can possibly post. And clearly, since everyone including AMD is going towards a shared cache, even in not-too-far-off designs such as K8L, you can pretty much say AMD is covering their ass. I guess spewing out platforms which won't take off, and theoretical bullshit, wasn't enough to cover themselves.
June 29, 2006 1:02:30 AM

Quote:
Aren't caches really needed because the processor is not fast enough to process the data you are sending it?


Nope, it's the other way around: main memory isn't fast enough for the processor.
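That gap is the whole reason caches exist. The standard average-memory-access-time formula makes it concrete (the cycle counts below are illustrative, not measurements of any real CPU):

```python
# Average memory access time (AMAT): the effective latency the core sees
# is the hit rate times the cache latency plus the miss rate times the
# main-memory latency. Cycle counts here are illustrative only.

def amat(hit_rate, cache_latency, mem_latency):
    """Effective per-access latency in cycles."""
    return hit_rate * cache_latency + (1 - hit_rate) * mem_latency

no_cache = amat(0.0, 0, 200)      # every access goes to RAM
with_cache = amat(0.95, 3, 200)   # 95% of accesses hit a 3-cycle cache

print(no_cache, round(with_cache, 2))   # 200.0 12.85
```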
June 29, 2006 1:06:20 AM

Yeah, any cost advantage will be mitigated by Intel's much more aggressive process scaling; the space required by the extra cache is covered by the 65nm and later 45nm processes. Is AMD going to use the shared L3 as a sort of crossbar? Maybe that's why they're using an L3: it doesn't need to be as fast if it is just swapping results between processors.
June 29, 2006 2:00:44 AM

Quote:
Now, 9-inch assumes that the shared L3 cache is a saving grace, but if you look at the overall design, the L2 cache has been cut in half. Currently Opterons have 1MB of dedicated L2 cache. A quad core K8L will only have 512k of dedicated L2 per core, and 2MB of L3 cache between 4 cores.


Indeed, AMD will be offering 2MB and 4MB variants of K8L. I don't know how to feel about this whole shared cache idea, but I do hope AMD has resolved the cache thrashing problem which is plaguing Conroe and Woodcrest.
June 29, 2006 2:02:52 AM

Quote:
but I do hope AMD has resolved the cache thrashing problem which is plaguing Conroe and Woodcrest.


Yet they manage to thrash (pun unintended) AMD's lineup. Can you provide some proof for your FUD, though?
June 29, 2006 2:30:17 AM

Quote:
Indeed, AMD will be offering 2MB and 4MB variants of K8L. I don't know how to feel about this whole shared cache idea, but I do hope AMD has resolved the cache thrashing problem which is plaguing Conroe and Woodcrest.




Provide one single source that actually proves there are serious cache issues going on. Seriously, just shut up with it. It's not even funny anymore. If AMD had been first to market with shared cache, you would be humping it like a dog and a leg. You would say it's a great innovation, and that it will provide superior performance to those dated non-shared caches. Just another example of Intel "falling behind."

Seriously, "cache thrashing" is one of the most retarded things I've ever heard in my life. Sure, if it were terribly designed, I'm sure it could be an issue. But it isn't like a shared cache design can just be pulled out of your ass. A lot of thought is involved, and I'm quite sure that the Intel engineers are competent enough to have failsafes in place to deal with resource conflicts. I'm sure AMD's shared cache solution will be the same way.
June 29, 2006 2:38:34 AM

We're still waiting for proof FUDie.
June 29, 2006 2:40:17 AM

Quote:
Well, even with all these problems, the Core 2s thrash the K8s.

So is the AMD guy trying to say that their products can't beat even problematic products from their competitor?


What they're saying is that Intel has 12 fabs and can run a lot more dies than AMD can. That ability enables you to study characteristics more thoroughly and develop more prototypes. AMD's position, though good, doesn't allow them - yet - to do that. I'm sure their profitable quarters, lucrative deals, and expanding demand allowed them to restructure their debt and get more money to fight.

It's a good fight. I like it.
June 29, 2006 2:41:25 AM

Quote:
Well, even with all these problems, the Core 2s thrash the K8s.

So is the AMD guy trying to say that their products can't beat even problematic products from their competitor?


What they're saying is that Intel has 12 fabs and can run a lot more dies than AMD can. That ability enables you to study characteristics more thoroughly and develop more prototypes. AMD's position, though good, doesn't allow them - yet - to do that. I'm sure their profitable quarters, lucrative deals, and expanding demand allowed them to restructure their debt and get more money to fight.

It's a good fight. I like it.

Three fabs produce Conroe-based processors, actually.
June 29, 2006 2:52:44 AM

And the ownage continues.
June 29, 2006 2:58:48 AM

Sure, I'll drive him out to the woods and leave him there. :p 
June 29, 2006 3:01:26 AM

you actually believe that an AMD representative is going to be as neutral as he/she can be?

don't you understand the concept of "marketing"?
June 29, 2006 3:13:23 AM

So... basically the least one core can have, if it is divided equally, is 512k, and the max is 1MB... OK... So are you saying that AMD's shared L3 cache won't have any problems with so-called "cache thrashing"?
June 29, 2006 4:52:14 AM

This cache thrashing may have merit. The Intel E6400 ESes sent to review sites have 4MB of L2 cache. The 2MB-cache E6400 (retail version) may run into the cache thrashing problem as described by the AMD guru.
I am waiting to see some benchmarks of load program, stop, reload, stop, reload and finish, and see how long it takes on both platforms (Intel vs AMD).
I think I know the answer.
June 29, 2006 4:55:38 AM

If it has merit then prove it.
June 29, 2006 5:13:01 AM

How should I know? But that's anything but proof. Yonah gets along fine with just 2MB, and at higher speeds.
June 29, 2006 6:31:19 AM

Quote:
If it has merit then prove it.

OK, fine. Check the link below and look at the E6400 w/ 4MB L2 cache.
Why would Intel do such a thing? Shouldn't Intel have sent the retail version with 2MB of L2 cache? You wonder.
http://www.digit-life.com/articles2/cpu/intel-conroe-2-...
Maybe because the 2MB versions weren't ready until more recently, like here:
http://www.hardware.fr/articles/623-9/intel-core-2-duo-...

The 2MB E6400 still matches the FX-62.
Here are a few points to consider:
1) It's in French and I can't understand their analysis.
2) You should compare the E6400 to the AM2 4600 X2 for price/performance. The 4600 X2 will be the same price, if not lower, than the E6400 at the time of Conroe's release.
3) The WinRAR result is cooked. I get 724 for my A64 Venice 3000, yet the FX-62 gets below 200. Wonder why the Core Duo at the same speed and same cache scores lower than the Core 2 Duo = cooked benches (that they were controlled by Intel is beside the point).
4) You can buy any AM2 SLI 570 mobo for $125, whereas any Intel mobo will not have SLI and is over $200. Price/performance becomes a factor for gaming enthusiasts. These days most enthusiasts will use SLI; unfortunately Intel cannot fulfill this.
5) 65nm vs 90nm is a cop-out. The AM2 platform is upgradable to K8L/65nm. We will see four new AMD 65nm processors by Nov of this year, at the time of Conroe's release. Yes, I said Nov. You will not be able to buy any Conroe product from a retailer by then. Maybe by the end of Sep if you are lucky, but you will have a hard time setting it up because of the motherboard fiasco.
So I would say that by the time Conroe is mature and available and unbiased reviews are conducted, you will have no choice but to compare it against AMD's 65nm, which after all is a fair comparison.
Unfortunately, none of you will enjoy the drama Intel has created for you. Back to the old days again, when silence was always the norm over at the Intel camp.
So keep on jumping out of your chair for the time being and dream about "Intel rules".
June 29, 2006 7:12:38 AM

Quote:

Here are a few points to consider:
1) It's in French and I can't understand their analysis.

Learn to use a translator; the graphs are easy enough to understand.

Quote:

2) You should compare the E6400 to the AM2 4600 X2 for price/performance. The 4600 X2 will be the same price, if not lower, than the E6400 at the time of Conroe's release.

That's the point: the E6400 matches AMD's fastest processor at a mainstream price.

Quote:

3) The WinRAR result is cooked. I get 724 for my A64 Venice 3000, yet the FX-62 gets below 200.

Wonder why the Core Duo at the same speed and same cache scores lower than the Core 2 Duo = cooked benches (that they were controlled by Intel is beside the point).

It's in seconds, so lower is better.

Quote:

4) You can buy any AM2 SLI 570 mobo for $125, whereas any Intel mobo will not have SLI and is over $200. Price/performance becomes a factor for gaming enthusiasts. These days most enthusiasts will use SLI; unfortunately Intel cannot fulfill this.

The overwhelming majority of gamers don't use SLI; prices will inevitably come down as more motherboards come out, and any price difference will be reduced by the fact that you can buy a $200 CPU that outguns the fastest AMD processor in games, and you don't need DDR2-800 memory.

Quote:

5) 65nm vs 90nm is a cop-out. The AM2 platform is upgradable to K8L/65nm. We will see four new AMD 65nm processors by Nov of this year, at the time of Conroe's release. Yes, I said Nov.

Too bad they'll initially be clocked slower than the 90nm models.

Quote:

You will not be able to buy any Conroe product from a retailer by then. Maybe by the end of Sep if you are lucky, but you will have a hard time setting it up because of the motherboard fiasco...

Nothing but the ravings of a lunatic.
June 29, 2006 7:14:03 AM

Quote:
5) 65nm vs 90nm is a cop-out. The AM2 platform is upgradable to K8L/65nm.


Next focking year! How is it a cop-out? You're happy to compare 65nm P4s to 90nm X2s, but once it's 65nm Conroe against 90nm X2s, all of a sudden it's not fair. WTF are they meant to do, wait till next year until they release something 65nm-based? Your reasoning is flawed and you're an idiot.
June 29, 2006 7:57:57 AM

Question:
Can two (or more) cores run at different speeds while using a shared L2?
June 29, 2006 9:08:14 AM

Why is it that every time 9-inch posts something, the pro-Intel people tear his article apart? Why don't you just say "No" and point out his error in judgment?
June 29, 2006 9:11:42 AM

Yes they can.
June 29, 2006 3:24:44 PM

Because it's more fun this way, plus... he does deserve it, don't you think?
June 29, 2006 5:35:18 PM

Yeah, I wouldn't believe an Intel engineer's critique of an AMD processor.
Why would I believe what an AMD guy says about Intel?
Interesting read but not very useful, IMHO.
June 29, 2006 5:38:38 PM

To the great unwashed out there, paraphrasing the old aphorism: "performance talks, BS walks." DOE did 4 months of continuous testing of Woodcrest against AM2. That is more testing than all the skippy reviews that either AMD or Intel or "independent" reviewers have released on Conroe/Woodcrest vs AM2. The contract award, June 18th, for the Oak Ridge expansion and upgrade goes to Cray http://www5.sys-con.com/read/236783.htm and Opteron. Total: 125,000 CPUs. http://www.fcw.com/article95010-06-26-06-Web&RSS=yes Total power will be 1.25 petaflops, just short of 5X Blue Gene/L at Lawrence Livermore (within 25 teraflops, or 2%). There are lesser contracts for 16,000 Opterons going to Lawrence Livermore Labs and 7,500 to Sandia. It would take the remainder of the top 45 to equal the Baker complex at Oak Ridge. Intel has to console itself with Pixar. If Intel had a "real world" performance advantage, they would have gotten the Oak Ridge contract award. The only thing close is Rensselaer's new Blue Gene at 500 teraflops.