
"AMD Guru Says Intel Xeon Shared Cache Inferior "

June 28, 2006 4:57:00 PM

Quote:
According to a senior technical manager at chipmaker AMD, the new dual core Intel Xeon 5100 server processors are not all they're cracked up to be because of a major design drawback. The problem, according to the AMD technical guru, is that the so-called dual core processors share a single memory cache unlike the dual core AMD Opteron processor, which has separate dedicated cache for each core.

Michael Apthorpe, senior technical manager for AMD in Australia and New Zealand, an electronics engineer with more than 25 years experience in chip design starting from his days at Mitsubishi Electric, says that any dual core processor design that involves both cores sharing a single memory cache has both performance and power consumption disadvantages.

"There are two types of cache with processors and they are known as exclusive and inclusive cache," says Apthorpe. "With inclusive cache, you have one allocation of RAM. What happens is that if you're running a program and all of sudden you change programs, you have to stop and flush the entire cache and reallocate and reload it. That takes clock cycles to do and that's our competitor's product."

The difference with AMD's Opteron dual core processor range, according to Apthorpe, is that each core on the processor has its own dedicated cache. "That means with our processors, they never have to stop and ask the cache to flush and wait for it to reload," he says. "Therefore, the processor always has full access to the cache which really speeds things up.

"When you go to multiple processors, the problem gets even more pronounced. If you have just one cache and one processor is dominating the cache utilisation and then you get a request on the other processor, the same thing happens. That is it has to stop, flush, reallocate and reload. This all takes clock pulses so again it hamstrings the performance of the system. That's why our competitor has to put so much more cache in their systems because they have to make up for the latency. They port information out of their main RAM into the cache of the processor and they try to make up the latency which they create by stopping, flushing and reloading the processor cache."


Is this the reason why Woodcrest/Conroe are prone to cache thrashing?

Is this the main reason why AMD is going for a large and shared L3 cache?

http://www.itwire.com.au/content/view/4785/53/
June 28, 2006 5:04:51 PM

O RLY?

Why is AMD going to use shared caches themselves then?
June 28, 2006 5:14:02 PM

The FSB is the real problem with Intel's chips. From what I have read, the shared cache is overall more of a benefit than a drawback. But we will see in the long run which is better.
June 28, 2006 5:34:05 PM

Quote:
According to a senior technical manager at chipmaker AMD, the new dual core Intel Xeon 5100 server processors are not all they're cracked up to be because of a major design drawback. The problem, according to the AMD technical guru, is that the so-called dual core processors share a single memory cache unlike the dual core AMD Opteron processor, which has separate dedicated cache for each core.

Michael Apthorpe, senior technical manager for AMD in Australia and New Zealand, an electronics engineer with more than 25 years experience in chip design starting from his days at Mitsubishi Electric, says that any dual core processor design that involves both cores sharing a single memory cache has both performance and power consumption disadvantages.

"There are two types of cache with processors and they are known as exclusive and inclusive cache," says Apthorpe. "With inclusive cache, you have one allocation of RAM. What happens is that if you're running a program and all of sudden you change programs, you have to stop and flush the entire cache and reallocate and reload it. That takes clock cycles to do and that's our competitor's product."

The difference with AMD's Opteron dual core processor range, according to Apthorpe, is that each core on the processor has its own dedicated cache. "That means with our processors, they never have to stop and ask the cache to flush and wait for it to reload," he says. "Therefore, the processor always has full access to the cache which really speeds things up.

"When you go to multiple processors, the problem gets even more pronounced. If you have just one cache and one processor is dominating the cache utilisation and then you get a request on the other processor, the same thing happens. That is it has to stop, flush, reallocate and reload. This all takes clock pulses so again it hamstrings the performance of the system. That's why our competitor has to put so much more cache in their systems because they have to make up for the latency. They port information out of their main RAM into the cache of the processor and they try to make up the latency which they create by stopping, flushing and reloading the processor cache."


Is this the reason why Woodcrest/Conroe are prone to cache thrashing?

Is this the main reason why AMD is going for a large and shared L3 cache?

http://www.itwire.com.au/content/view/4785/53/

The main reason AMD is going for a shared L3 cache is that they can't increase the size of the L2 with the IMC. It's just not possible.

And FYI, Conroe basically annihilates the FX-62 in almost every single situation. Yes, that includes multi-threading and multi-tasking.
Now you tell me they have a cache thrashing problem?

You need to lay off sharikou.blogspot.com. It is extremely toxic to your logic.

P.S. I emailed one of Intel's tech gurus during Computex. Since I can't disclose his information, I'll only post what he wrote in the email.


"Example, in the case of a media conversion from MPEG2 to MPEG4, the MPEG2 decoder will get its stream from the front side bus and decode the frame. The MPEG4 encoder will pick the data from the Smart L2, encode it to MPEG4, and send the stream to disk. In standard definition, this is fully cached ... 98%! In the case of split caches, you have a cache miss and a write-back request from each L2 to the other ... a real nightmare; cache efficiency is around 10% for the decoder and 80% for the encoder, that is, around 45% of the memory requests fulfilled by the L2 cache subsystem, far from 98% ..."
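The combined figure in that email is just a weighted average. Assuming the decoder and encoder issue roughly equal numbers of L2 requests (an assumption; the email doesn't give the request mix), a few lines of Python reproduce the arithmetic:

```python
def blended_hit_rate(rates, weights):
    """Weighted average hit rate across threads sharing one L2."""
    return sum(r * w for r, w in zip(rates, weights))

# Split-cache numbers from the email: 10% for the decoder, 80% for the
# encoder, assuming equal request volume from each thread.
split = blended_hit_rate([0.10, 0.80], [0.5, 0.5])
shared = 0.98  # the "Smart L2" figure quoted for the shared cache

print(f"split caches: {split:.0%}")   # 45%
print(f"shared cache: {shared:.0%}")  # 98%
```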
June 28, 2006 5:54:11 PM

So that explains why Woodcrest/Conroe is getting thrashed in virtually every benchmark known to man! :roll:

And why AMD is going to shared cache in their next generation architecture. :lol: 
June 28, 2006 5:57:17 PM

First of all, what authority does AMD have when talking about Intel processors? Even if you are a "guru," if you're from AMD, we all know that you are going to trash Intel, so why bother posting the article in the first place, 9-inch?

Secondly, the shared cache is a major BONUS to Intel's designs; it actually INCREASES performance because the cores can interact across the cache. Also, in multithreaded apps, the separate caches slow down performance, as is evident from viperabyss's post.

Strangely enough, 9-inch hasn't come back to spread around more crap about AMD being better than Intel. Maybe he finally realized that it was pointless.
June 28, 2006 6:06:31 PM

Well, even with all these problems, the Core 2s thrash the K8s.

So is the AMD guy trying to say that their products can't beat even problematic products from their competitor?
June 28, 2006 6:16:15 PM

That article was pure jokes:
AMD saying smaller cache is better, sharing is not good, etc., etc., etc.
All the while they would do it themselves if their manufacturing process and fabs were up to the task. :D
June 28, 2006 6:22:29 PM

Cache thrashing can be a problem for ALL of these chips. Conroe is a little better in some circumstances, but both manufacturers need to find a way to better manage cache. There was just a good article about that, I believe on Anandtech. Cache itself is a band-aid, however; I wish Intel would just suck it up and adopt a better bus, or get theirs finished. I could see the Conroe architecture working for a long time into the future if they did.
June 28, 2006 6:26:09 PM

I thought Conroe was going to go back to NetBurst, like the PIIIs used?
June 28, 2006 6:39:11 PM

? Are you being sarcastic?
June 28, 2006 7:42:57 PM

Quote:
Is this the reason why the Woodcrest/Coroe are prone to cache trashing?


It doesn't suffer from cache thrashing, moron.

Quote:
That's why our competitor has to put so much more cache in their systems because they have to make up for the latency.


The cache latency is the same, and it takes up the usual ~40-50% of the die, just like AMD's.
June 28, 2006 8:04:35 PM

Congratulations. You've posted an article based on false information. Intel's shared L2 cache has been described as a non-inclusive, non-exclusive cache for the specific reason of preventing the problems that AMD is trying to play up. Now, how the shared L2 cache operates specifically I'm still not clear on, but it obviously isn't as clear-cut as AMD wants you to believe.

Besides, I'm not sure what Apthorpe is talking about anyway. First of all, he makes it sound like you're flushing the entire cache every time you switch programs. As far as I know, the mechanism does not require flushing the entire cache, but rather swapping the least-used cache line in the L2 for incoming data from RAM. This would be true for exclusive or inclusive caches. While the entire cache may end up replaced in the end, that only occurs if that is how much new data the processor needs, and that is true regardless of exclusive, inclusive, shared, or non-shared caches.
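The line-granular replacement described above can be sketched with a toy LRU set (a deliberately simplified model of one set of a set-associative cache, not how any real cache is implemented):

```python
from collections import OrderedDict

class CacheSet:
    """Toy model of one set of a set-associative cache with LRU replacement.
    A miss evicts only the least-recently-used line; the rest of the set
    (and the rest of the cache) is untouched -- no wholesale flush."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, ordered by recency

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # refresh recency on a hit
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the LRU line only
        self.lines[tag] = None
        return False

s = CacheSet(ways=4)
for t in ["A", "B", "C", "D"]:
    s.access(t)          # fill the set
s.access("A")            # hit: A becomes most recently used
s.access("E")            # miss: evicts B (the LRU), not the whole set
print(sorted(s.lines))   # ['A', 'C', 'D', 'E']
```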

What an inclusive cache really means is that the contents of the L1 cache are duplicated in the L2. So when a cache line (again, things move in cache lines, not the entire cache) needs to be copied from the L2 to the L1, all the L1 needs to do is delete its copy to make room for the new information from the L2. In an exclusive cache, there is no duplication (hence exclusive), which means that when a cache line is copied from the L2 to the L1, the existing L1 cache line first needs to be copied back to the L2 before the L1 can accept the new line. What this means is that an exclusive cache is actually slower than an inclusive design, not the other way around. There are ways of speeding up an exclusive system, such as using a victim buffer.
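The extra L2 write that makes an exclusive promotion slower can be sketched like this (a hypothetical toy model that tracks tags only and ignores recency and coherence):

```python
# Toy contrast of inclusive vs exclusive L1/L2 behaviour for a single
# line promotion from L2 into a full L1. Caches are modelled as plain
# sets of tags; the return value counts writes into the L2.

def promote_inclusive(l1, l2, tag, victim):
    """Inclusive: the L2 already holds the L1 victim, so L1 just drops it."""
    l1.discard(victim)           # duplicate still lives in L2; no writeback
    l1.add(tag)
    return 0                     # lines written into L2

def promote_exclusive(l1, l2, tag, victim):
    """Exclusive: the L1 victim must first be moved into the L2."""
    l1.discard(victim)
    l2.add(victim)               # extra L2 write before L1 can accept the line
    l2.discard(tag)              # the promoted line leaves the L2
    l1.add(tag)
    return 1

l1, l2 = {"A", "B"}, {"A", "B", "C"}        # inclusive: L2 contains L1
print(promote_inclusive(l1, l2, "C", "A"))  # 0 -> no extra L2 traffic

l1, l2 = {"A", "B"}, {"C", "D"}             # exclusive: contents disjoint
print(promote_exclusive(l1, l2, "C", "A"))  # 1 -> one writeback into L2
```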

I bring this up every time I talk about cache architecture, so once again I'll mention that an exclusive design like what AMD uses relies on the L1 cache more than the L2. That is why we see AMD using larger L1 caches than Intel. By the same token, an inclusive cache design benefits more from the L2, which is why Intel uses larger L2 caches. It's not that AMD doesn't benefit from larger L2 caches; it's just that, relatively, they'd benefit less. AMD plays it off as if their architecture doesn't require large L2 caches, which is true: even if they increase the L2, they hit the point of diminishing returns a lot quicker, so it isn't worth it for them to have large L2 caches. Intel gains a two-fold benefit from large L2 caches: inclusive caches gain more from large L2s, and large L2s alleviate any FSB bottleneck.

Moving on to cache thrashing, that issue has been played up far too much. Cache thrashing was an issue in NetBurst processors with HT because the cache had no control over allocation to each core, which meant one core could be evicting a cache line that the other core needs, and one core could theoretically monopolize the entire cache. That isn't the case with the "smart" shared L2 cache in Conroe. The prefetchers and other logic in the cache dynamically allocate cache to each core based on their usage patterns and projected needs. This means that it isn't possible for one core to completely take over the cache if the second core needs cache space too. Now, it's possible that the dynamic sharing mechanism isn't foolproof, but you shouldn't assume that it doesn't exist.
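One way such dynamic allocation could work is way-partitioning driven by observed miss rates. The sketch below is purely hypothetical (Intel hasn't published the actual mechanism); the point is only that a floor per core prevents either core from being starved out entirely:

```python
# Hypothetical sketch of dynamic way-partitioning in a shared L2:
# each core's share of the ways tracks its miss count, with a floor
# guaranteeing neither core can be completely evicted by the other.

def repartition(total_ways, misses, floor=2):
    """Split ways proportionally to per-core miss counts, with a floor."""
    m0, m1 = misses
    if m0 + m1 == 0:
        share0 = total_ways // 2                    # idle: split evenly
    else:
        share0 = round(total_ways * m0 / (m0 + m1))
    share0 = max(floor, min(total_ways - floor, share0))
    return share0, total_ways - share0

print(repartition(16, (900, 100)))   # busy core gets more, not all: (14, 2)
print(repartition(16, (500, 500)))   # balanced load: (8, 8)
```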

Finally,
Quote:
Is this the reason why Woodcrest/Conroe are prone to cache thrashing?

Really, where? I'd love to see a link that conclusively proves cache thrashing. You'd probably want to wait for the shipping products anyway, since the last CPU revisions could tweak things, and newer BIOSes would improve memory handling. (For a note, even though GamePC used stepping 5 for Woodcrest, which is probably the shipping version, the CPUs themselves were still labelled Engineering Samples. The Woodcrests that THG has right now are even older stepping 4 Engineering Samples.)

Quote:
Is this the main reason why AMD is going for a large and shared L3 cache?

I wouldn't call a 2MB shared L3 cache large, especially between 4 cores. If you want a large shared L3 cache, you'd look to Tulsa with its 16MB shared L3 cache for 2 cores. Even with sharing done in the L3 instead of the L2, with each dedicated L2 only being 512k, you'd still be relying on properly working dynamic sharing mechanisms in the L3 cache, because the 512k L2 won't keep the core fed for long, especially if K8L has the vast performance potential that you believe.
June 28, 2006 8:16:20 PM

I was agreeing with you almost the whole page!!!! Then you brought up a worry that the cache wouldn't be able to keep up with the improvements of K8L? What was that? Obviously you have a good understanding of cache and its uses, so, given that cache is necessary because of bandwidth issues, and K8L will not have any, THAT part doesn't make sense. The rest was a very good explanation, though. Maybe you can explain your K8L reasoning better?
June 28, 2006 8:22:43 PM

All I'm saying is that K8L is supposed to be an improvement over K8, which means it'll be able to process things faster and more efficiently, which means it'll need more bandwidth to keep it fed. Now, 9-inch assumes that the shared L3 cache is a saving grace, but if you look at the overall design, the L2 cache has been cut in half. Current Opterons have 1MB of dedicated L2 cache. A quad core K8L will only have 512k of dedicated L2 per core, and 2MB of L3 cache shared between 4 cores. The net result is going from 1MB of fast L2 per core to 512k of fast L2 plus 512k of slower L3 per core. Now, a shared L3 means that the 2MB is bigger than it appears, but if 9-inch is so worried about cache thrashing, he'd better hope that it isn't occurring in the shared L3 cache, or else the performance potential of the more bandwidth-hungry K8L will be impacted.
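The per-core arithmetic behind that comparison, spelled out (sizes taken from the post above; the even four-way L3 split is an assumption, since a shared cache need not divide evenly):

```python
# Per-core cache budget: current dual-core Opteron vs the described K8L.
KB = 1024

opteron_l2_per_core = 1024 * KB            # today: 1MB of dedicated L2

k8l_l2_per_core = 512 * KB                 # K8L: 512k dedicated L2 per core
k8l_l3_total = 2 * 1024 * KB               # 2MB of L3 shared by 4 cores
k8l_l3_fair_share = k8l_l3_total // 4      # 512k per core if split evenly

print(opteron_l2_per_core // KB)                    # 1024 KB of fast L2
print(k8l_l2_per_core // KB,                        # 512 KB of fast L2 ...
      k8l_l3_fair_share // KB)                      # ... plus 512 KB of slower L3
```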
June 28, 2006 8:28:48 PM

They are cutting the L1 down to 32/32 as well, which further proves my point. Increasing the Hypertransport and memory controller frequencies is going to handle bandwidth well. Heck, the Asus AM2 590 deluxe and ATI 3200 boards are already hitting 1.5ghz stable. So I guess it is funny how Intel has more processing power, but issues getting the data there, and AMD has less processing power but a better way to get the data there.
June 28, 2006 9:23:45 PM

why, oh why is amd turning into a fud factory?
June 28, 2006 9:24:41 PM

Quote:
Congratulations. You've posted an article based on false information.


That is 9-inch's trademark. All posts I have seen by 9-inch in the past month or so have used theinquirer... and now he is posting from another shady site.

So it leads me to believe that he is forced to use bad sources to insult Intel because there is no credible source that will back him up.

----------

I will listen once there is a credible source that you are getting info from. But until then... I withhold my judgement. And ignore what you post. =P
June 28, 2006 9:26:49 PM

I was originally going to say that this entire post doesn't really seem to be trying to state facts (aka reeks of fanboy)... it seems like one huge partisan debate ...
June 28, 2006 11:17:22 PM

Quote:
They are cutting the L1 down to 32/32 as well


No they're not!
June 28, 2006 11:55:57 PM

lol what bs article.

Quote:
Aside from performance and power considerations, Apthorpe claims the cache issue for Intel also gives AMD dual core technology a clear cost advantage over Intel.


Hmm, 140mm² vs 183mm² and 230mm². Real cost advantage there!

Quote:
Apthorpe says the Intel Xeon involving large amounts of cache on the processor also has power consumption disadvantages. "Cache is a huge consumer of power, so when you have large amounts of cache you always have large amounts of heat," he says.


:lol:  And this guy has 25 years experience!?
June 29, 2006 12:21:00 AM

Get your crap out of here, you moron. We all know here that having a shared cache is much better than independent caches.



June 29, 2006 12:36:44 AM

Isn't adding more levels of cache actually adding more band-aids? Aren't caches really needed because the processor is not fast enough to process the data you are sending it? Will we see a day when processors won't need a cache, or need just one?

Please educate me...
June 29, 2006 12:38:03 AM

Wow, why even post this? An AMD guru isn't going to side with Intel; it's the most biased information you can possibly post. And clearly, since everyone including AMD is going towards a shared cache, even in not-too-far-off designs such as K8L, you can pretty much say AMD is covering their ass. I guess spewing out platforms which won't take off, and theoretical bullshit, wasn't enough to cover themselves.
June 29, 2006 1:02:30 AM

Quote:
Aren't caches really needed because the processor is not fast enough to process the data you are sending it?


Nope, it's the other way around: main memory isn't fast enough for the processor.
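That gap is the whole reason caches exist. The standard average-memory-access-time formula makes it concrete (the cycle counts below are illustrative, not measurements of any real CPU):

```python
# Average memory access time (AMAT): the effective latency the core sees
# is the hit rate times the cache latency plus the miss rate times the
# main-memory latency. Cycle counts here are illustrative only.

def amat(hit_rate, cache_latency, mem_latency):
    """Effective per-access latency in cycles."""
    return hit_rate * cache_latency + (1 - hit_rate) * mem_latency

no_cache = amat(0.0, 0, 200)      # every access goes to RAM
with_cache = amat(0.95, 3, 200)   # 95% of accesses hit a 3-cycle cache

print(no_cache, round(with_cache, 2))   # 200.0 12.85
```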
June 29, 2006 1:06:20 AM

Yeah, any cost advantage will be mitigated by Intel's much more aggressive process scaling; the space required by the extra cache is covered by the 65nm and later 45nm processes. Is AMD going to use the shared L3 as a sort of crossbar? Maybe that's why they're using an L3: it doesn't need to be as fast if it is just swapping results between processors.
June 29, 2006 2:00:44 AM

Quote:
Now, 9-inch assumes that the shared L3 cache is a saving grace, but if you look at the overall design, the L2 cache has been cut in half. Currently Opterons have 1MB of dedicated L2 cache. A quad core K8L will only have 512k of dedicated L2 per core, and 2MB of L3 cache between 4 cores.


Indeed, AMD will be offering 2MB and 4MB variants of K8L. I don't know how to feel about this whole shared cache idea, but I do hope AMD has resolved the cache thrashing problem which is plaguing Conroe and Woodcrest.
June 29, 2006 2:02:52 AM

Quote:
but I do hope AMD has resolved the cache thrashing problem which is plaguing Conroe and Woodcrest.


Yet they manage to thrash (pun unintended) AMD's lineup. Can you provide some proof for your FUD, though?
June 29, 2006 2:30:17 AM

Quote:
Indeed, AMD will be offering 2MB and 4MB variants of K8L. I don't know how to feel about this whole shared cache idea, but I do hope AMD has resolved the cache thrashing problem which is plaguing Conroe and Woodcrest.




Provide one single source that actually proves there are serious cache issues going on. Seriously, just shut up with it. It's not even funny anymore. If AMD had been first to market with shared cache, you would be humping it like a dog and a leg. You would say it's a great innovation, and that it will provide superior performance to those dated non-shared caches. Just another example of Intel "falling behind."

Seriously, "cache thrashing" is one of the most retarded things I've ever heard in my life. Sure, if it were terribly designed, I'm sure it could be an issue. But it isn't like a shared cache design can just be pulled out of your ass. A lot of thought is involved, and I'm quite sure that the Intel engineers are competent enough to have failsafes in place to deal with resource conflicts. I'm sure AMD's shared cache solution will be the same way.
June 29, 2006 2:38:34 AM

We're still waiting for proof FUDie.
June 29, 2006 2:40:17 AM

Quote:
Well, even with all these problems, the Core 2s thrash the K8s.

So is the AMD guy trying to say that their products can't beat even problematic products from their competitor?


What they're saying is that Intel has 12 fabs and can run a lot more dies than AMD can. That ability enables you to study characteristics more thoroughly and develop more prototypes. AMD's position, though good, doesn't allow them - yet - to do that. I'm sure their profitable quarters, lucrative deals, and expanding demand allowed them to restructure their debt and get more money to fight.

It's a good fight. I like it.
June 29, 2006 2:41:25 AM

Quote:
Well, even with all these problems, the Core 2s thrash the K8s.

So is the AMD guy trying to say that their products can't beat even problematic products from their competitor?


What they're saying is that Intel has 12 fabs and can run a lot more dies than AMD can. That ability enables you to study characteristics more thoroughly and develop more prototypes. AMD's position, though good, doesn't allow them - yet - to do that. I'm sure their profitable quarters, lucrative deals, and expanding demand allowed them to restructure their debt and get more money to fight.

It's a good fight. I like it.

Three fabs produce Conroe-based processors, actually.
June 29, 2006 2:52:44 AM

And the ownage continues.
June 29, 2006 2:58:48 AM

Sure, I'll drive him out to the woods and leave him there. :p 
June 29, 2006 3:01:26 AM

you actually believe that an AMD representative is going to be as neutral as he/she can be?

don't you understand the concept of "marketing"?
June 29, 2006 3:13:23 AM

So... basically the least one core can have, if it is divided equally, is 512k, and the max is 1MB... OK... So are you saying that AMD's shared L3 cache won't have any problems with so-called "cache thrashing"?
June 29, 2006 4:52:14 AM

This cache thrashing may have merit. The Intel E6400 ESes sent to review sites have 4MB of L2 cache. The 2MB-cache E6400 (retail version) may run into the cache thrashing problem as described by the AMD guru.
I am waiting to see some benchmarks of load program, stop, reload, stop, reload and finish, and see how long it takes on both platforms (Intel vs AMD).
I think I know the answer.
June 29, 2006 4:55:38 AM

If it has merit then prove it.
June 29, 2006 5:13:01 AM

How should I know? But that's anything but proof. Yonah gets along fine with just 2MB, and at higher speeds.
June 29, 2006 6:31:19 AM

Quote:
If it has merit then prove it.

OK, fine. Check the link below and look at the E6400 w/ 4MB L2 cache.
Why would Intel do such a thing? Shouldn't Intel have sent the retail version with 2MB of L2 cache? You wonder.
http://www.digit-life.com/articles2/cpu/intel-conroe-2-...
Maybe because the 2MB versions weren't ready until more recently, like here:
http://www.hardware.fr/articles/623-9/intel-core-2-duo-...

The 2MB E6400 still matches the FX-62.
Here are a few points to consider:
1) It's in French and I can't understand their analysis.
2) You should compare the E6400 to the AM2 4600 X2 for price/performance. The 4600 X2 will be the same price, if not lower, than the E6400 at the time of Conroe's release.
3) The WinRAR result is cooked. I get 724 for my A64 Venice 3000, yet the FX-62 gets below 200. Wonder why the Core Duo at the same speed and same cache scores lower than the Core 2 Duo = cooked benches (that they were controlled by Intel is beside the point).
4) You can buy any AM2 SLI 570 mobo for $125, whereas any Intel mobo will not have SLI and is over $200. Price/performance becomes a factor for gaming enthusiasts. These days most enthusiasts will use SLI; unfortunately Intel cannot fulfill this.
5) 65nm vs 90nm is a cop-out. The AM2 platform is upgradable to K8L/65nm. We will see four new AMD 65nm processors by Nov of this year, at the time of Conroe's release. Yes, I said Nov. You will not be able to buy any Conroe product from a retailer by then. Maybe by the end of Sep if you are lucky, but you will have a hard time setting it up because of the motherboard fiasco.
So I would say that by the time Conroe is mature and available and unbiased reviews are conducted, you will have no choice but to compare it against AMD's 65nm, which after all is a fair comparison.
Unfortunately, none of you will enjoy the drama Intel has created for you. Back to the old days again, when silence was always the norm over at the Intel camp.
So keep on jumping out of your chair for the time being and dream about "Intel rules".
June 29, 2006 7:12:38 AM

Quote:

Here are a few points to consider:
1) It's in French and I can't understand their analysis.

Learn to use a translator; the graphs are easy enough to understand.

Quote:

2) You should compare the E6400 to the AM2 4600 X2 for price/performance. The 4600 X2 will be the same price, if not lower, than the E6400 at the time of Conroe's release.

That's the point: the E6400 matches AMD's fastest processor at a mainstream price.

Quote:

3) The WinRAR result is cooked. I get 724 for my A64 Venice 3000, yet the FX-62 gets below 200.

Wonder why the Core Duo at the same speed and same cache scores lower than the Core 2 Duo = cooked benches (that they were controlled by Intel is beside the point).

It's in seconds, so lower is better.

Quote:

4) You can buy any AM2 SLI 570 mobo for $125, whereas any Intel mobo will not have SLI and is over $200. Price/performance becomes a factor for gaming enthusiasts. These days most enthusiasts will use SLI; unfortunately Intel cannot fulfill this.

The overwhelming majority of gamers don't use SLI; prices will inevitably come down as more motherboards come out, and any price difference will be reduced by the fact that you can buy a $200 CPU that outguns the fastest AMD processor in games, and you don't need DDR2-800 memory.

Quote:

5) 65nm vs 90nm is a cop-out. The AM2 platform is upgradable to K8L/65nm. We will see four new AMD 65nm processors by Nov of this year, at the time of Conroe's release. Yes, I said Nov.

Too bad they'll initially be clocked slower than the 90nm models.

Quote:

You will not be able to buy any Conroe product from a retailer by then. Maybe by the end of Sep if you are lucky, but you will have a hard time setting it up because of the motherboard fiasco...

Nothing but the ravings of a lunatic.
June 29, 2006 7:14:03 AM

Quote:
5) 65nm vs 90nm is a cop-out. The AM2 platform is upgradable to K8L/65nm.


Next focking year! How is it a cop-out? You're happy to compare 65nm P4s to 90nm X2s, but once it's 65nm Conroe against 90nm X2s, all of a sudden it's not fair. WTF are they meant to do, wait till next year until they release something 65nm-based? Your reasoning is flawed and you're an idiot.
June 29, 2006 7:57:57 AM

Question:
Can two (or more) cores run at different speeds while using a shared L2?
June 29, 2006 9:08:14 AM

Why is it that every time 9-inch posts something, the pro-Intel people tear his article apart? Why don't you just say "No" and point out his error in judgment?
June 29, 2006 9:11:42 AM

Yes they can.
June 29, 2006 3:24:44 PM

Because it's more fun this way, plus... he does deserve it, don't you think?
June 29, 2006 5:35:18 PM

Yeah, I wouldn't believe an Intel engineer's critique of an AMD processor.
Why would I believe what an AMD guy says about Intel?
Interesting read but not very useful, IMHO.
June 29, 2006 5:38:38 PM

To the great unwashed out there, paraphrasing the old aphorism: "performance talks, BS walks." DOE did 4 months of continuous testing of Woodcrest against AM2. That is more testing than all the skippy reviews that either AMD or Intel or "independent" reviewers have released on Conroe/Woodcrest vs AM2. The contract award, June 18th, for the Oak Ridge expansion and upgrade goes to Cray http://www5.sys-con.com/read/236783.htm and Opteron. Total: 125,000 CPUs. http://www.fcw.com/article95010-06-26-06-Web&RSS=yes Total power will be 1.25 petaflops, just short of 5X Blue Gene/L at Lawrence Livermore (within 25 teraflops, or 2%). There are lesser contracts for 16,000 Opterons going to Lawrence Livermore Labs and 7,500 to Sandia. It would take the remainder of the top 45 to equal the Baker complex at Oak Ridge. Intel has to console itself with Pixar. If Intel had a "real world" performance advantage, they would have gotten the Oak Ridge contract award. The only thing close is Rensselaer's new Blue Gene at 500 teraflops.