
Phenom vs. Athlon Core Scaling Compared

January 14, 2008 11:10:15 AM

We already compared a single Athlon 64 X2 core to a single Phenom core and found solid performance benefits in the new Stars architecture. Now we're looking at how well Phenom will scale with increasing clock frequency, by comparing the Athlon 64 X2 and Phenom 9000 using only a single core.

http://www.tomshardware.com/2008/01/14/phenom_vs_athlon_core_scaling_compared/index.html
January 14, 2008 12:23:29 PM

Thanks for the thread, and thanks to Tom's for doing both of the articles on Athlon X2 vs. Phenom scaling. Since Phenom isn't a brand-new architecture, the CPU is of greater importance to those of us upgrading AM2 boards than it is to someone trying to decide between a Phenom and a Q6600.

We all know it's a budget quad core. AMD won't be on top for quite some time, but who cares about that? All that matters is that the processor delivers as an upgrade path. Now, if only the motherboard manufacturers get new BIOS updates out in time for B3.
January 14, 2008 12:38:25 PM

So the article shows that Phenom doesn't scale as well as an Athlon 64. Well, that just fills you with hope. Despite the author's optimism about Phenom still being faster than the A64, even he couldn't resist the comparison with Prescott.

This is further proof that something is fundamentally wrong with the Stars architecture. AMD needs to forget about schedules and make as many improvements as it can with the B3 stepping. Higher clock speed obviously isn't the answer; increasing IPC and fixing errata and the like are. Obviously I know it's not easy to just make things better, but given the time they must surely be able to make some performance improvements. It's not like existing Phenom owners will care, because there aren't any.

It doesn't matter that there isn't an enthusiast chip out at the moment. What matters is that it is barely competitive with its predecessor, let alone the competition.

AMD's answer might even be to skip the B3 stepping: instead of working on the fixes for the 65nm Phenoms, move the entire workforce onto implementing the fixes on 45nm and getting that out the door as quickly as possible.
January 14, 2008 12:39:00 PM

Still think the mem/cache frequency is the key to improved performance ... as a quick fix.

It's locked at 1.8 GHz ... was 2.0 GHz on the ES ... or am I wrong? I've been known to be wrong once or twice ... lol.

Can someone play with an ES there ... presuming you can move the memory controller / L3 cache speed up and down on it?

Results might be interesting.



January 14, 2008 12:39:16 PM

Well, I have been reading THG guides for quite a long time, and I am really sorry to say that the quality of the reviews is dropping.

I have good knowledge of CPU architectures (I have worked with and developed for HPC for eight years), and I cannot agree with the conclusions of your latest reviews of the Phenom processor. I thought you guys were "impartial", but taken together, all the Phenom reviews show a tendency in favor of the current Intel offering.

First of all, none of your articles emphasises that each implementation of x86 made by Intel or AMD has its own strengths and weaknesses, and that, despite these being multi-purpose CPUs, each is best at doing certain tasks. All you say is that Intel's implementation is "unbeatable" today.

That said, I ask you: did you guys know that AMD's implementation is considerably better than Intel's for a cluster in which scalability is desired? That an integrated memory controller can make better use of the available memory bandwidth?

So, if you guys think you know hardware and CPU architecture, didn't you notice anything strange in the synthetic Sandra memory benchmark for the Phenom? How can a Phenom have slower memory performance than every Intel quad in this sort of benchmark? Didn't you feel uncomfortable writing that, or was it, in fact, a lack of knowledge?

Please refer to your colleagues at http://legitreviews.com/article/597/4/ and http://legitreviews.com/article/617/1/ for a "pretty-print" version of what I am saying. You guys messed up here.

I think you should change your conclusions to something like "we test a variety of simple tasks for a home user who might be interested in these CPUs, and the most suitable or best-performing one is...", without awarding a dubious title of "unbeatable", "champion" or "king". Any CPU you crown can be beaten by some other brand at some task you might not even imagine. And let's not forget that a certain CPU cannot be the best for someone who cannot afford it.

Note that by doing so, you would also give forum members less reason to waste their time uselessly "fighting" one another over whether Intel or AMD "is the best".

I expect more professional and impartial articles from THG, or, if that is not the goal, please make it clear that you are not trying to be impartial. I really like your previous material: the interactive charts are great, as are the site layout, news, etc. So the quality of the texts must be just as good, not inferior.
January 14, 2008 12:50:04 PM

Good points, Cav ... you're a server man.

I acknowledge the AMD arch scales very well ... the chip was obviously respun with the server market clearly in mind.

No arguments from me.

We need people like you to post here ... yes, the Intel fanboys are a bit trollish ... but a loveable bunch of rogues nevertheless.

Core2 is an excellent choice for the enthusiast ... because of its headroom ... AMD's is currently poor.

Most who post are interested in single-socket systems, I'd guess.

Not many of us have 19" racks and SuperMicros!!

Cheers and all of the best to you!!



January 14, 2008 12:51:10 PM

^^ Agreed with Cav. The last few CPU articles have seemed very biased; they always lean toward Intel. If you're writing about AMD exclusively, keep Intel out of it. Bringing it up just flashes readers back to Intel's results and unbalances the comparison.

Oh, and I noticed an error on the Gaming Benchmark page: all the pictures show the Nvidia 8800 GTX 768 MB instead of the ATI HD 3870. Nice copy/paste work there.
January 14, 2008 1:20:06 PM

Reynod said:
Still think the mem/cache frequency is the key to improved performance ... as a quick fix.

It's locked at 1.8 GHz ... was 2.0 GHz on the ES ... or am I wrong? I've been known to be wrong once or twice ... lol.

Can someone play with an ES there ... presuming you can move the memory controller / L3 cache speed up and down on it?

Results might be interesting.


I think it would make a big difference if they could get it to run at the same frequency as the core, like they do with A64s. It would also scale much better, which is what the article was about but the author didn't mention.

I agree with Cav that the articles have been getting worse, missing out important facts. I am an AMD fan, but I think if anything the articles have been AMD-biased. At least the author recommended waiting for the B3 stepping. I understand where Cav is coming from with a server perspective, which is perfectly valid, but not for this article. The article is specifically looking at the desktop aspect of K10 rather than the server side, where K10, with the exception of the errata and slightly slower clock speeds, is very competitive.
January 14, 2008 1:50:03 PM

Cav misses a critical point, however.

This is a Phenom review, not an Opteron review. As a result, things such as virtualization hosting with VMware ESX don't apply.

This is a desktop chip review.

Unfortunately, if anything, the review has a Phenom bias:
They are running without the BIOS patch.
They are running the NB at 2.0 GHz per the ES settings vs. the 1.8 GHz shipping settings.

I've read in a number of reviews that this can be a noticeable difference.

While I could understand not implementing the TLB fix, since it may be something folks turn off, not addressing the NB speeds is simply bad testing.
January 14, 2008 3:14:27 PM

I actually wrote this bearing in mind that it is a desktop CPU review. What I pointed out is:

1) You can't go writing that the Phenom is "that much inferior" if you can't even benchmark it properly.

2) A "generic" benchmark simply doesn't exist. If you run such a test, there is always a reason behind every result, good or bad. THG only shows results, without the rationale; it actually seems they publish results without even thinking about what they really mean.

3) I only pointed to the memory-throughput case (whose impact is more often observed in server environments) to show that this feature of the CPU is sometimes important. And honestly, it would be a poor excuse to say "oh well, we ran the wrong test and showed wrong numbers, but it's OK because a home (desktop) user would never know anyway".

4) The most important: THG should say "best for this app" or "slower for that game" instead of giving its readers a generic good/bad classification, and should not forget cost. The best system for a task is the one that performs best at the price you want to (or can) pay. (When the boss here tells me what he wants, my first question is how much he is willing to pay; not asking this can cause serious issues some days later... :p )

I might not have given the best example, since it comes from a different perspective, but THG's reviews are really losing the quality I was used to seeing here. It is not the first time I have seen THG go wrong, but the benchmark mistake was detected by SiSoft a month ago, and THG never even posted a one-paragraph note.

This can lead people who blindly believe THG's benchmark results to buy a system or a brand that might not be the best deal.

Thanks guys, and the best to you too.
January 14, 2008 3:20:13 PM

I was just reviewing Tom's CPU charts. I set it to compare the fastest Phenom you can buy with the fastest Kentsfield CPU you can buy, and I checked EVERY SINGLE REAL-WORLD BENCHMARK: the QX6850 wins in every single one. There may have been a few I missed because I did it quickly, but from my rough calculations, Intel wins in 100% of the real-world benchmarks. Now that sounds like the Intel offering is "unbeatable" to me... This is the reason I decided to buy an Intel CPU instead of a new AMD. I see no bias there.
January 14, 2008 3:40:27 PM

Oh, I agree about the "for this app or for that app" point.

Truth be told, take an E2160, OC it to about 3.2 GHz and toss in an 8800 GTS 512 MB card, and you will likely not notice much of a difference from even an OC'd Q6600 in most games. Oh, it may BENCHMARK differently, but will you feel it?

On the other hand, if you were tarring files, creating multimedia content, etc., you would DEFINITELY see the difference.

Part of the problem with benchmarks is that the results are often over-hyped. I know I can easily OC my system another 10-15%, but I could not feel the difference in use, though I could hear the difference in fan noise. The result is I clocked back down.



January 14, 2008 3:44:39 PM

zenmaster said:
They are running the NB at 2.0 GHz per the ES settings vs. the 1.8 GHz shipping settings.

I've read in a number of reviews that this can be a noticeable difference.


This is the point we are trying to make. If going from 1.8 GHz to 2.0 GHz on the IMC makes that big a difference, then the fact that the Athlon X2 runs its IMC at up to 3.2 GHz versus the Phenom's 2.0 GHz on the ES is A HUGE ISSUE. IMHO, having zero L3 cache and running the IMC at core frequency would net the same or better scaling on Phenom than on the A64 X2.


January 14, 2008 5:04:48 PM

This article's encoding benchmarks show pretty well the diminishing returns of multiple processing cores. Going by some of the good multi-threaded encoding benchmarks, doubling the cores gives about a 50% performance increase, half of theoretical. Dual core is 150% the performance of single core, and quad core is 150% the performance of dual core but only 225% the performance of single core. That would mean an octo-core would be only 337% the performance of a single core, less than half of theoretical. In other words, the collective computing power of four computers with single-core processors could outperform one computer with an octo-core. Granted, you can't really harness the power of four separate computers for a single task such as encoding (I think even a Beowulf cluster would have the same inefficiencies as a multi-core, but I don't know for sure), and of course power consumption would be much higher, but this meandering is more to point out the diminishing returns of core count than a motion for folks to adopt multiple-PC configurations.

Here's how it works out, provided the 50% increase we've been witnessing holds for every doubling of the core count. If at some point the number of cores becomes such that, due to I/O bottlenecks (or software), there is no longer a 50% performance return for every doubling, the returns will diminish even faster:

1 core = 100% (performance-per-core ratio: 100% of a single-core processor)
2 cores = 150% (per-core ratio: 75%)
4 cores = 225% (per-core ratio: 56.25%)
8 cores = 337% (per-core ratio: 42%)
16 cores = 506% (per-core ratio: 31%)
32 cores = 759% (per-core ratio: 23.7%)
64 cores = 1139% (per-core ratio: 17.79%)
128 cores = 1708% (per-core ratio: 13.34%)

As you can see, we get some massive inefficiencies towards the top end of the spectrum. How many cores will Intel/AMD add to their CPUs before calling it quits?
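The table above follows a simple power law: total performance multiplies by 1.5 with every doubling of the core count. A quick back-of-the-envelope sketch (my own code, not from the article) that reproduces those numbers:

```python
import math

def scaled_perf(cores, gain_per_doubling=1.5):
    """Total performance (% of one core) if every doubling of the
    core count multiplies throughput by `gain_per_doubling`."""
    return 100 * gain_per_doubling ** math.log2(cores)

for n in [1, 2, 4, 8, 16, 32, 64, 128]:
    total = scaled_perf(n)
    print(f"{n:3d} cores: {total:7.1f}% total, {total / n:6.2f}% per core")
```

The per-core figures match the 75%, 56.25%, 42% ratios listed above (the 8-core total is really 337.5%, truncated to 337% in the table).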
January 14, 2008 5:21:20 PM

zenmaster said:
Oh, I agree about the "for this app or for that app" point.

Truth be told, take an E2160, OC it to about 3.2 GHz and toss in an 8800 GTS 512 MB card, and you will likely not notice much of a difference from even an OC'd Q6600 in most games. Oh, it may BENCHMARK differently, but will you feel it?

On the other hand, if you were tarring files, creating multimedia content, etc., you would DEFINITELY see the difference.

Part of the problem with benchmarks is that the results are often over-hyped. I know I can easily OC my system another 10-15%, but I could not feel the difference in use, though I could hear the difference in fan noise. The result is I clocked back down.


I'm totally with you on the OC point. I don't see the sense in taking a processor, overclocking it as far as you can and leaving it there for its entire lifespan, if you're just making the system hotter and more power-hungry without feeling or seeing benefits in games, while also shortening the lifespan of the CPU into the bargain. I recently took my Core 2 E6300 to 2.6 GHz after having it since Dec 2006, after having it at 2.4 for a few months and 2.15 for probably six months before that. I only did that because I got really into Supreme Commander, which loves CPU power. If you can't see the benefits in the games you regularly play, then don't have a screaming OC on all the time. #end of rant lol#
January 14, 2008 5:26:27 PM

gpippas said:
So the article shows that Phenom doesn't scale as well as an Athlon 64. Well, that just fills you with hope. Despite the author's optimism about Phenom still being faster than the A64, even he couldn't resist the comparison with Prescott.

This is further proof that something is fundamentally wrong with the Stars architecture. AMD needs to forget about schedules and make as many improvements as it can with the B3 stepping. Higher clock speed obviously isn't the answer; increasing IPC and fixing errata and the like are. Obviously I know it's not easy to just make things better, but given the time they must surely be able to make some performance improvements. It's not like existing Phenom owners will care, because there aren't any.

It doesn't matter that there isn't an enthusiast chip out at the moment. What matters is that it is barely competitive with its predecessor, let alone the competition.

AMD's answer might even be to skip the B3 stepping: instead of working on the fixes for the 65nm Phenoms, move the entire workforce onto implementing the fixes on 45nm and getting that out the door as quickly as possible.


The only things wrong with the architecture in terms of performance (and not OC ability or power consumption) are:

1.) The L3 cache is way too small
2.) The HT speed doesn't change with increasing core clock

Alter that and you alter AMD's fortunes. If the Phenom had 4MB of L3 cache, I'm as sure as eggs is eggs that it would really come alive.

On the points in brackets, well, that's down to the quality of the manufacturing process, and the fact that very large chips have heat and power-consumption issues. #End of 2nd rant lol#
January 14, 2008 6:11:44 PM

spoonboy

I agree with you; that's why I wrote it afterwards. More L3 cache would bring a nice performance improvement, but I don't think that at 65nm AMD can squeeze any more L3 cache in.

Even with the extra cache, the largest performance increase would come from a scalable IMC. If it ran at the CPU clock speed, the performance increase would be huge. Who cares if you end up with odd multipliers leading to weird RAM speeds? All the benchmarks show quite a significant difference between the ES's 2 GHz IMC and the retail 1.8 GHz models, which, like others have said, was neglected by THG.

Like I already said, how the author neglected to mention the differences in IMC is beyond me, because when it comes to clock scalability it's of paramount importance. He also ignored the fact that his ES sample runs faster. Surely THG has enough money to buy a 9600 BE if AMD refuses to provide them with one.

Oh, and spoonboy, where in Yorkshire are you? I'm in Sheffield.
January 14, 2008 6:20:57 PM

Sorry mate for not spotting that you said it afterwards, I just glossed over most of the posts to be honest ;) 

Nah, here I'm gonna disappoint you: I'm actually from Exeter, Devon, but one of my colleagues showed me the Facebook group "if it's not from Yorkshire it's ****". Made me laugh so I joined it. You're also not gonna like that I'm a Leeds fan. Come on the Leeds! Playing Crewe this evening if you're interested, but probably not lol.

cheers
January 14, 2008 8:56:45 PM

Hmmm, has anyone actually tried upping the IMC clock on a Phenom? It's actually supposed to be adjustable to within 100 MHz of the core clock, for purposes of getting max speed out of your RAM if the core clock is based off an odd multi. I'm still waiting on UPS to get here tomorrow before I can throw the Phenom 9600 BE into my mainboard and play around with it, so I can't confirm.

But according to the original Phenom review, the split-plane thing was supposed to allow you to do that on an AM2+ based board.
January 14, 2008 9:13:00 PM

spoonboy said:
Sorry mate for not spotting that you said it afterwards, I just glossed over most of the posts to be honest ;) 

Nah, here I'm gonna disappoint you: I'm actually from Exeter, Devon, but one of my colleagues showed me the Facebook group "if it's not from Yorkshire it's ****". Made me laugh so I joined it. You're also not gonna like that I'm a Leeds fan. Come on the Leeds! Playing Crewe this evening if you're interested, but probably not lol.

cheers


It is the sort of thing a Yorkshireman would say. I'm not actually from Sheffield; I'm from Cambridge, so I'm actually a southerner. I only moved up here a couple of years ago. As for Leeds, you're right, I don't like them. My girlfriend is a Leeds fan; she's got a small Leeds emblem tattooed on her stomach. Her best mate is also a Leeds fan, and they were down the pub earlier to catch a rare glimpse of Leeds on TV. On top of that, Leeds won. As an Arsenal fan I do enjoy taking the piss out of her.

Back on topic: I'm pretty sure all the reviews said that increasing the IMC by just a hundred MHz caused system instability.

Mathos
As the only person I have heard of who is going to own a Phenom BE and isn't a reviewer, it would be quite interesting if you let us know how you get on with overclocking it, including the IMC.
January 14, 2008 9:13:11 PM

Mathos said:
Hmmm, has anyone actually tried upping the IMC clock on a Phenom? It's actually supposed to be adjustable to within 100 MHz of the core clock, for purposes of getting max speed out of your RAM if the core clock is based off an odd multi. I'm still waiting on UPS to get here tomorrow before I can throw the Phenom 9600 BE into my mainboard and play around with it, so I can't confirm.

But according to the original Phenom review, the split-plane thing was supposed to allow you to do that on an AM2+ based board.


Yeah, well, a lot of the things Phenom was supposed to be able to do got left out. The AM2+ boards do not allow you to adjust the IMC clock in the BIOS, and the AMD software does not have an option for it at this point. Hopefully it will come soon.

What I want to know is: what clock do they run on older AM2 boards?
January 14, 2008 9:17:18 PM

shabodah

I want to know how well an Athlon 64 performs in an AM2+ mobo with 1066 DDR2, and, like you said, vice versa, but nobody seems to be bothering to review anything.
January 14, 2008 9:52:17 PM

According to reviews, an Athlon 64 in an AM2+ mobo won't be able to take advantage of faster RAM speeds without an overclock. I just bought an AM2+ mobo and an X2 5000+ BE, and the RAM defaults to 800 MHz. I haven't yet tried overclocking, but I want to see if I can achieve DDR2-1066 speeds somehow... May be time for a new thread, but does anyone know how I would go about altering memory speeds with an unlocked multiplier?
January 14, 2008 9:55:15 PM

caveira2099 said:
4) The most important: THG should say "best for this app" or "slower for that game" instead of giving its readers a generic good/bad classification, and should not forget cost. The best system for a task is the one that performs best at the price you want to (or can) pay. (When the boss here tells me what he wants, my first question is how much he is willing to pay; not asking this can cause serious issues some days later... :p )


Thanks for your posts; they relate to a few things I have known but have not taken the time to convey.

When comparing the AMD and Intel quads, I have seen benchmarks of AVG, WinRAR, LAME, iTunes, etc. Most of the results are about the same when you compare CPUs of about equal pricing (at stock speeds).

(There are a few benchmarks with extremely different results. I have to question the validity of some of them. I suspect in the future we will see fixes for many of these benchmarks. Not just because of AMD: when Intel comes out with its Nehalem chip, it will need those benchmarks fixed for the same reason AMD needs them fixed. But of course AMD needs these fixes NOW.)

I agree with your conclusion that we need more conclusive results on these benchmarks. When I see benchmark times of 2:40 vs 2:34, 0:52 vs 0:47, 1:32 vs 1:20, 2:44 vs 2:57, etc., or game FPS of 104.6 vs 113.4, 103.4 vs 110, 47.47 vs 49.85, etc., I have to consider that these numbers do not actually show a "winner" and a "loser" for the average desktop user. (If you put a user in front of both machines, they won't see a difference in speed.)

HOWEVER, ANOTHER MATTER: what about running several benchmarks concurrently? It is one thing to say that a chip runs two or three benchmarks faster; it is another to be able to say that the benchmarks still run faster when they are run at the same time. I suspect a benchmark like that might show some surprising results (ESPECIALLY since many of the quoted benchmarks ARE single-threaded anyway).

Over time we have started to see more multi-threading in benchmarks, and I suspect we will start seeing more concurrent tasks being run, as well as more complicated multi-threading. Perhaps even virtualization. What happens to that WinRAR speed in Windows when I'm also running Linux? What if I do some video encoding in Windows while compiling the kernel in Linux?
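The concurrent-benchmark idea above is easy to prototype: run the same CPU-bound jobs back-to-back, then in parallel, and compare wall-clock times. A rough sketch (the `fake_encode` workload is a made-up stand-in, not any real benchmark):

```python
import time
from concurrent.futures import ProcessPoolExecutor

def fake_encode(n):
    """Stand-in for a CPU-bound task like encoding; purely illustrative."""
    total = 0
    for i in range(n):
        total ^= i * i
    return total

def run_sequential(jobs):
    """Run each job one after another, returning (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_encode(n) for n in jobs]
    return results, time.perf_counter() - start

def run_concurrent(jobs):
    """Run all jobs in parallel processes (sidestepping the GIL)."""
    start = time.perf_counter()
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(fake_encode, jobs))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    jobs = [500_000] * 4  # one job per core of a quad
    seq_res, seq_t = run_sequential(jobs)
    con_res, con_t = run_concurrent(jobs)
    assert seq_res == con_res
    print(f"sequential: {seq_t:.2f}s  concurrent: {con_t:.2f}s")
```

On a quad core the concurrent run should approach a 4x speedup for this embarrassingly parallel case; the interesting results the poster anticipates come from jobs that contend for memory bandwidth or cache.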
January 14, 2008 9:58:30 PM

blackened144 said:
I was just reviewing Tom's CPU charts. I set it to compare the fastest Phenom you can buy with the fastest Kentsfield CPU you can buy, and I checked EVERY SINGLE REAL-WORLD BENCHMARK: the QX6850 wins in every single one. There may have been a few I missed because I did it quickly, but from my rough calculations, Intel wins in 100% of the real-world benchmarks. Now that sounds like the Intel offering is "unbeatable" to me... This is the reason I decided to buy an Intel CPU instead of a new AMD. I see no bias there.


You see no bias in comparing a $240 CPU to a $980 CPU and concluding that the more expensive CPU is better? (Or were you being completely unrealistic and using sarcasm, and I didn't catch on?)
January 14, 2008 10:02:04 PM

spoonboy said:
The only things wrong with the architecture in terms of performance (and not OC ability or power consumption) are:

1.) The L3 cache is way too small


MORE cache is not always the answer. More cache will benefit many benchmarks, but it is also possible that more cache could be detrimental to multi-tasking.
January 14, 2008 10:32:44 PM

gpippas said:

Back on topic I'm pretty sure all the reviews said that increasing the IMC by just a hundred mhz caused system instability.

Mathos
As the only person I have heard of that is going to own a Phenom BE that isn't a reviewer it would be quite interesting to let us know how you do with overclocking it including the IMC.


Well, it's like a guy doin a fat girl, or a hot girl doin a fat guy. Every once in a while you gotta take one for the team. But then again, considering I'm upgrading from an X2 4200+ I'm gonna get a performance boost whether I oc or not.
January 14, 2008 11:09:31 PM

Mathos said:
Well, it's like a guy doin a fat girl, or a hot girl doin a fat guy. Every once in a while you gotta take one for the team. But then again, considering I'm upgrading from an X2 4200+ I'm gonna get a performance boost whether I oc or not.


That's one way to put it. On another thread ages ago I said that anyone with a 4600+ X2 or better shouldn't bother with Phenom yet, which I still stand by. That means you fall into the category below that, where you should see a decent performance gain. I'm on Socket 939 with a 4800+ X2, so I would need a new mobo, RAM and CPU; if I was going to take the plunge and upgrade, Phenom just wouldn't make sense for me.
January 15, 2008 2:59:30 AM

keithlm said:
MORE cache is not always the answer. More cache will benefit many benchmarks, but it is also possible that more cache could be detrimental to multi-tasking.


More cache is NEVER detrimental to anything. It's always a good thing.
January 15, 2008 5:54:13 AM

Yeah, I'd be surprised to see excess cache causing a slowdown, especially in multitasking...
January 15, 2008 5:56:20 AM

jkflipflop98 said:
More cache is NEVER detrimental to anything. It's always a good thing.


It depends on the microcode. If the cache is doing read-aheads... and a LOT of task switching is going on... IF the read-ahead has to complete before the task switch can happen... then a larger cache COULD BE detrimental to a heavily multi-tasked system.

(And it would be in a multi-tasking situation where this would show up.)

Now, if the cache is coded to allow the task switch without having to wait for any read-ahead to complete... then it would be a moot point.

Of course, I'm not a hardware guy... I'm basing this on software caching in databases. I wouldn't be surprised if it was different. However, I also wouldn't be surprised if it was the same and there is a detrimental effect of a larger cache on task switching.

Unless someone can definitively convince me otherwise... I have to go with what I know to be true: larger caches are NOT always better. Usually... but NOT always.

BTW: I looked it up. Intel calls the read-ahead hardware "prefetchers", and it appears that their "memory disambiguation" would determine how much time might be wasted during heavy task switching.
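The trade-off described above can be put into a toy cost model: give a bigger cache a steady-state benefit (a better hit rate) but charge a per-task-switch cost for draining outstanding prefetches. Every constant here is invented purely to illustrate the shape of the argument; this is not a model of any real CPU:

```python
def run_time(cache_lines, task_switches,
             hit_gain_per_line=0.001, drain_cost_per_line=0.0005):
    """Toy cost model: a bigger cache speeds up steady-state work,
    but adds a larger prefetch-drain penalty on every task switch.
    All constants are made up for illustration only."""
    steady_state = 100.0 / (1.0 + hit_gain_per_line * cache_lines)
    switch_penalty = task_switches * drain_cost_per_line * cache_lines
    return steady_state + switch_penalty

# Light multitasking: the larger cache wins.
print(run_time(512, 10), run_time(4096, 10))
# Heavy multitasking: the drain penalty dominates; the smaller cache wins.
print(run_time(512, 500), run_time(4096, 500))
```

Under these invented constants, the crossover appears exactly as the poster speculates: more cache helps until task switching becomes frequent enough that the per-switch cost outweighs the hit-rate gain.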
January 15, 2008 6:44:44 AM

Perhaps there were additional line items added to those "severe" NDAs that AMD had people sign?

Maybe that's why we're not seeing the types of reviews necessary to determine what is castrating the Stars.

I'm with most members in that it seems to be a starved architecture that needs higher bins to realize its potential. Let's up the HT and add some more L3, dammit.
January 15, 2008 6:45:57 AM

keithlm said:
It depends on the microcode. If the cache is doing read-aheads... and a LOT of task switching is going on... IF the read-ahead has to complete before the task switch can happen... then a larger cache COULD BE detrimental to a heavily multi-tasked system.

(And it would be in a multi-tasking situation where this would show up.)

Now, if the cache is coded to allow the task switch without having to wait for any read-ahead to complete... then it would be a moot point.

Of course, I'm not a hardware guy... I'm basing this on software caching in databases. I wouldn't be surprised if it was different. However, I also wouldn't be surprised if it was the same and there is a detrimental effect of a larger cache on task switching.

Unless someone can definitively convince me otherwise... I have to go with what I know to be true: larger caches are NOT always better. Usually... but NOT always.

BTW: I looked it up. Intel calls the read-ahead hardware "prefetchers", and it appears that their "memory disambiguation" would determine how much time might be wasted during heavy task switching.




Maybe Intel's prefetching is just that much superior to AMD's solution?
January 15, 2008 6:47:51 AM

How does that relate to the post you quoted? ...eh?
January 15, 2008 6:56:44 AM

I suppose it's an incomplete thought for people who can't read my mind ;) 

Keith was referring to how cache size can help or hurt depending on how well a processor's "prefetch" works.

It seems to be common knowledge that Intel has a much superior cache system to AMD's. I could very well be wrong, but most of my education comes from fellow forum members and the articles I frequent.

So my presumption is that the majority of AMD's problems with Barcelona come from it being designed to scale with multiple cores, as another poster explained.

If that is true, then the majority of its architecture would seem comparatively weak on the desktop scene due to its poor cache design. I would suppose this is where Intel derives most of its performance lead over AMD: their cache is designed better for random execution than AMD's offering.

Therefore, Intel's prefetching design doesn't hurt them as much when moving to larger cache sizes, while it could actually hurt the Stars. In fact, their prefetching design might even help them when moving to larger cache sizes.
January 15, 2008 6:58:17 AM

If I sound like an idiot, I could certainly use some constructive criticism.

I'm trying to learn here, so maybe you can pinpoint where I'm having issues understanding these architectures.

I'm quite certain I've used many terms out of context.
January 15, 2008 7:07:06 AM

Performance per core does not scale as well as with an Athlon 64 X2 core between 2.2 and 2.8 GHz. This means that the performance gains of Phenom at future clock speeds will not be as significant as they have been with Athlon 64 X2 in the past. Let me give you some numbers to give you a better feeling: Athlon 64 X2 wins in 18 of our benchmarks, while Phenom 9000 only scales better in four categories. I would also like to emphasize that we used Asus's BIOS version 0603, which does not include a fix to the Phenom's TLB bug. Hence Phenom runs without any performance limitations.

Still, it's important to remember that my statements on the inferior scaling of Phenom relates to only a single processing core; looking at the entire processor with four cores (Phenom 9000) or three cores (Phenom 7000, expected later in Q1), Phenom does and will continue to outperform the Athlon 64 X2. Also, no one will actually run Phenom with only a single processing core, and benchmarking with only a single core also doesn't measure potential performance benefits introduced by the shared L3 cache when multiple cores access and modify the same data. said:
Performance per core does not scale as well as with an Athlon 64 X2 core between 2.2 and 2.8 GHz. This means that the performance gains of Phenom at future clock speeds will not be as significant as they have been with Athlon 64 X2 in the past. Let me give you some numbers to give you a better feeling: Athlon 64 X2 wins in 18 of our benchmarks, while Phenom 9000 only scales better in four categories. I would also like to emphasize that we used Asus's BIOS version 0603, which does not include a fix to the Phenom's TLB bug. Hence Phenom runs without any performance limitations.

Still, it's important to remember that my statements on the inferior scaling of Phenom relates to only a single processing core; looking at the entire processor with four cores (Phenom 9000) or three cores (Phenom 7000, expected later in Q1), Phenom does and will continue to outperform the Athlon 64 X2. Also, no one will actually run Phenom with only a single processing core, and benchmarking with only a single core also doesn't measure potential performance benefits introduced by the shared L3 cache when multiple cores access and modify the same data.
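One way to make the article's "scaling" claim concrete is to normalize the measured speedup against the clock-speed increase: going from 2.2 to 2.8 GHz is a 27.3% clock bump, so a benchmark that only gains 20% is scaling at roughly 73% efficiency. A rough Python sketch (the benchmark scores below are made-up placeholders for illustration, not THG's actual numbers):

```python
# Scaling efficiency: what fraction of a clock-speed increase translates
# into measured performance. 1.0 would mean perfect linear scaling.
# The scores used here are hypothetical placeholders, not THG data.

def scaling_efficiency(clk_low, clk_high, score_low, score_high):
    clock_ratio = clk_high / clk_low
    perf_ratio = score_high / score_low
    return (perf_ratio - 1.0) / (clock_ratio - 1.0)

# Hypothetical example over the article's 2.2 -> 2.8 GHz range:
athlon = scaling_efficiency(2.2, 2.8, 100, 125)   # +25% performance
phenom = scaling_efficiency(2.2, 2.8, 110, 132)   # +20% performance

print(f"Athlon 64 X2 efficiency: {athlon:.2f}")   # ~0.92
print(f"Phenom efficiency:       {phenom:.2f}")   # ~0.73
```

Framing it this way makes it easy to see that "scales worse" and "is slower" are separate claims: in this made-up example Phenom starts ahead but converts less of each extra MHz into results.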




Is it me, or does this just seem like common sense? The K8 was designed for desktop computing, while Barcelona was an "evolution" of sorts designed to scale well with multiple cores. The author alludes to this indirectly, but he never states it. In the end he just makes AMD's new Barcy sound like horse shiat compared to the K8, but then reminds us that Phenom has more cores, so it probably fares better overall.

That just sounds kind of bland and simple. I expect a bit more critical thinking from these guys. I'm not a bright kid and I'm just now trying to understand core architectures.

Please tell me if I'm wrong, but his deduction just seems to be lacking quite a bit. I have an even harder time understanding why he compared two different design concepts.

I thought the Stars core was developed with future programming in mind, not yesterday's single-threaded workloads. Why doesn't the author point out these differences? The K8 and Stars obviously have different goals in mind. State the differences and purposes, then give us the translation into today's software environment. Right?
January 15, 2008 7:14:41 AM

Sorry, I just thought you went sideways a bit without showing how you got there. As a note, I would just like to say that more cache might not help in all cases, but it matters in a desktop environment (Windows and G A M E S... lol, sorry, that's all I'm really interested in; waiting another second for Word to load on an AMD machine vs. an Intel machine doesn't bother me, just fps). Benchmarks might be helped by more cache more than real-world performance would if Phenom were decked out with, say, 4 MB of L3 cache, but I believe these same benchmarks have quite a lot of bearing on general Windows and gaming performance.

As a server chip? Well, I can't comment, although Phenom does seem very scalable and does very well in multi-multi-multi-chip environments. Fix the TLB and all looks rosy for Phenom in the server field.

To sign off, I will say (and I would like someone else's thoughts on this) that Phenom seems far more adept at complicated tasks than at straightforward number crunching. That is, in say media encoding and file compression it's left behind by the lowest Intel quad core, but give it more complex, dynamic tasks with lots of threads and it claws back a lot of that lost ground. Hence the good 3DMark06 CPU test and Supreme Commander scores. For those that don't know, I seem to remember reading that the 3DMark06 CPU test doesn't use the CPU to render anything; rather, it runs the AI and physics of the little floating-robot battle going on in that red valley. Which makes something like seven threads in the first test and five in the second, hence the second runs faster (the second test is simplified somewhat, I mean).

cheers
January 15, 2008 7:25:16 AM

It seems to me that the Barcy core excels at multi-tasking with simple and routine calculations. That seems to relate to the server environment.

The only place available for an increase in performance would be through better prefetching. But even then, it might only match Intel, and then it would be a core-vs.-core fight that it would lose.

I guess what I'm trying to conclude, where the author didn't, is that he should specifically state that the Stars architecture was designed for a different purpose than its K8 counterpart. State the engineering design concepts, then tell us how that relates to our needs as consumers right now, in today's world. He gave us the latter, but didn't really give us any forethought as to design intent.

Instead he just said that in today's environment the Stars core sucks, but that, just like the 90 nm P4, it could be the precursor to something great.

I'm sorry...I'm just tired of poor reviews from THG and this one probably isn't as bad as I'm making it out to be.
January 15, 2008 7:38:56 AM

I really think the issue is simply not the size of the cache ... it's the overall level of sophistication of the cache / prefetch system ... where core 2 is far superior at present ... or the argument boils down to the IMC / L3 speed issue ... or a combination.

Look at the shocking latencies on Phenom once L3 cache is accessed ... mainly due to the 1.8Ghz (drop from core) frequency of the IMC / L3 system.

Even the 2.0 Ghz for the ES benchies floating around earlier showed more promise.

If AMD can increase the IMC / L3 Cache speed then single socket performance will improve markedly.

I think it is set low because they cannot bin fast enough parts.

So lowering this means more parts rescued from the last bin ... the bin that doesn't get sold.

I see the C stepping parts should address this deficiency ... it's similar to the increases in FSB speed we have seen in the past with other CPUs.

Perhaps AMD's original specs had the L3 cache and IMC running at (or close to) the core speed, and the early samples did blow Clovertown away??

Then the cruel reality of an immature 65nm process resulted in a rushed part of much lower spec.

I guess we can't test much of this unless someone can access some samples and increase the IMC speed without it locking up.

The IMC is in the centre of the die too ...
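Reynod's point about the IMC/L3 clock can be put in concrete terms: an L3 hit costs a fixed number of northbridge cycles, so the same access takes more wall-clock time at a lower NB frequency. A quick Python sketch (the 40-cycle figure is an illustrative assumption for the math, not a measured Phenom value):

```python
# Convert an L3 access cost in northbridge (IMC/L3) cycles into
# wall-clock nanoseconds at a given NB frequency.
# NOTE: the 40-cycle default is a hypothetical figure for illustration.

def l3_latency_ns(nb_clock_ghz, l3_cycles=40):
    # cycles / (cycles per nanosecond) = nanoseconds
    return l3_cycles / nb_clock_ghz

for nb in (1.8, 2.0, 2.6):
    print(f"NB at {nb} GHz -> {l3_latency_ns(nb):.1f} ns per L3 hit")
```

Whatever the true cycle count is, the ratio holds: raising the NB clock from 1.8 GHz toward core speed cuts every L3 and IMC round trip proportionally, which is why the stepping that lifts this clock matters so much.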
January 15, 2008 7:52:49 AM

Now that actually seems to apply a bit more critical thinking than the author of the article handed over to the reader.

I can understand the premise of your argument and can even go along with it, as it makes much more sense. Maybe we're all stating the same thing differently, but the bottom line is that the Stars arch doesn't seem to be clocked in sync correctly.

Thanks for clearing that up Reynod.
January 15, 2008 8:28:03 AM

Reynod said:
I really think the issue is simply not the size of the cache ... it's the overall level of sophistication of the cache / prefetch system ... where core 2 is far superior at present ... or the argument boils down to the IMC / L3 speed issue ... or a combination.

Look at the shocking latencies on Phenom once L3 cache is accessed ... mainly due to the 1.8Ghz (drop from core) frequency of the IMC / L3 system.

Even the 2.0 Ghz for the ES benchies floating around earlier showed more promise.

If AMD can increase the IMC / L3 Cache speed then single socket performance will improve markedly.

I think it is set low because they cannot bin fast enough parts.

So lowering this means more parts rescued from the last bin ... the bin that doesn't get sold.

I see the C stepping parts should address this deficiency ... it's similar to the increases in FSB speed we have seen in the past with other CPUs.

Perhaps AMD's original specs had the L3 cache and IMC running at (or close to) the core speed, and the early samples did blow Clovertown away??

Then the cruel reality of an immature 65nm process resulted in a rushed part of much lower spec.

I guess we can't test much of this unless someone can access some samples and increase the IMC speed without it locking up.

The IMC is in the centre of the die too ...


I think you're pretty close with that. Like I said, who cares about losing 50 MHz on your RAM speed because of an odd multiplier? I doubt AMD ever originally planned to cripple their own chips.

I also have another theory, which might actually be more realistic. The IMC was always meant to have its own clock generator; it would be odd to have added it afterwards. It probably seemed like a great idea to separate the IMC: as well as changing the core clocks, they could have adjusted the IMC across a huge range of products, the low to high end being roughly 1.8-3.4 GHz. Come manufacturing, they just couldn't get the speeds they wanted, so it got slower and slower. Which means there is a problem with the design of the IMC clock generator. They have probably now discovered that it would have been easier to run the IMC at core speed. We already know the IMC can function at high speeds in the A64s. I think they were trying to be too clever for their own good.
January 15, 2008 8:47:25 AM

"I think they were trying to be too clever for there own good."

Agreed, a little too much too soon?

I thought the R600 smacked of that as well. I liked it in the end, though; I have a 2900 Pro myself.
January 15, 2008 9:45:05 AM

Engineering-wise, R600 was radically different and innovative. They were "trying to be too clever for their own good" and shot themselves in the foot with it, just like with Phenom. I always liked R600. I find it funny how so many people say the 2900 series is crap, yet they rave about the 3800 series being so much better.

Spoonboy, don't know if you saw it, but I did reply to you earlier in the thread.
January 15, 2008 10:01:40 AM

jakemo136 said:
According to reviews, an Athlon 64 in an AM2+ mobo won't be able to take advantage of faster RAM speeds without an overclock. I just bought an AM2+ mobo and an X2 5000+ BE, and the RAM defaults to 800 MHz. Haven't tried overclocking yet, but I want to see if I can achieve DDR2-1066 speeds somehow... May be time for a new thread, but does anyone know how I would go about altering memory speeds with an unlocked multiplier?


Assuming the memory dividers work the same way for AM2+ as they do for AM2, and assuming you're stuck with the DDR2-800 dividers (having 1066 dividers would make it oh so easy), you'll need the following settings to get 1066 memory running at full speed without your CPU exploding.

CPU Base Frequency = 266/267 (266 if your board is always slightly on the high side of selected speed, 267 if on the low side)
CPU Multiplier = 12
HT multiplier = 4x should work ... if you're very lucky, 5x

This gives a CPU speed of 3192-3204 MHz, with the RAM at 1064-1068.

You'll need a decent CPU cooler (half decent would probably do) and you'll need to be lucky with the motherboard.
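A quick sanity check of the arithmetic behind those settings, as a Python sketch. The ceil(multiplier / 2) rule for the DDR2-800 divider is an assumption here, based on how K8-era dividers are commonly described, not something taken from AMD documentation:

```python
import math

# Verify the suggested overclock: base 266 MHz, CPU multiplier 12,
# DDR2-800 memory divider assumed to be ceil(multiplier / 2).

base_mhz = 266
cpu_multi = 12

cpu_mhz = base_mhz * cpu_multi             # 266 * 12 = 3192 MHz
mem_divider = math.ceil(cpu_multi / 2)     # assumed divider rule -> 6
mem_clock = cpu_mhz / mem_divider          # 3192 / 6 = 532 MHz
ddr2_rate = mem_clock * 2                  # double data rate -> 1064 effective

print(cpu_mhz, mem_divider, ddr2_rate)     # 3192 6 1064.0
```

Which matches the 3192 MHz CPU / DDR2-1064 figures quoted above: the DDR2-800 divider "catches up" to 1066-class speeds once the base frequency is pushed to 266 MHz.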
January 15, 2008 10:03:17 AM

I posted something clever then ??

Won't happen again ... I promise.

We need someone like MU to look at the IMC / L3 cache issue.

January 15, 2008 10:12:04 AM

Oh, and for clarification, here is how AMD's memory dividers can be worked out (if anyone actually wants to know):

M = Memory Divider
C = CPU Divider

DDR2 533: M=(C/2)+4 (Rounded UP to nearest whole number)

DDR2 667: M=(C/2)+2 (Rounded UP to nearest whole number)

DDR2 800: M=(C/2) (Rounded UP to nearest whole number)

DDR2 1066: M=(C/2)-2 (Rounded UP to nearest whole number)

I worked these out a couple of weeks ago when I was without an interweb connection... It's quite easy to make a table in Excel that shows the final CPU and memory speeds for different multipliers and base frequencies.

When entering the memory-divider formula in Excel, make sure you use e.g. =ROUNDUP(((C/2)+2),0); the 0 after the comma denotes the number of decimal places to round to.
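For anyone without Excel handy, the same table can be sketched in Python. This just implements the formulas exactly as posted above (the divider rules are the poster's, which I haven't verified against AMD documentation):

```python
import math

# Memory divider M derived from the CPU multiplier C, per the formulas
# above: M = (C/2) + offset, rounded UP (the Excel ROUNDUP step).
# These divider rules are taken from the post, not from AMD docs.

OFFSETS = {"DDR2-533": 4, "DDR2-667": 2, "DDR2-800": 0, "DDR2-1066": -2}

def mem_divider(cpu_multi, speed_grade):
    return math.ceil(cpu_multi / 2 + OFFSETS[speed_grade])

def effective_ddr2_rate(base_mhz, cpu_multi, speed_grade):
    cpu_mhz = base_mhz * cpu_multi
    return 2 * cpu_mhz / mem_divider(cpu_multi, speed_grade)  # DDR = 2x clock

# Stock 200 MHz base, 12x multiplier, DDR2-800 divider:
print(effective_ddr2_rate(200, 12, "DDR2-800"))   # 800.0
# An odd multiplier shows the round-up penalty (memory runs below spec):
print(effective_ddr2_rate(200, 13, "DDR2-800"))
```

Looping those two functions over a range of base frequencies and multipliers reproduces the Excel table, and makes the odd-multiplier "lost 50 MHz" effect mentioned earlier easy to see.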
January 15, 2008 1:34:49 PM

shabodah said:
This is the point we are trying to make. If going from 1.8ghz to 2.0ghz on the IMC makes that big of a difference, the fact that Athlon x2 is running up to 3.2ghz on the IMC versus the Phenom's 2.0ghz on the ES is A HUGE ISSUE. IMHO, having zero L3 cache and running the IMC at core frequency would net the same or better scaling of Phenom than A64x2.

This difference in speed is key here. Let's face it - the author claims to have identified the scaling limits of Phenom. He hasn't because he has failed to take this factor into account. All the current benchmarks prove is that some scalability exists even when you hold the IMC speed constant. Given that limitation I think the scalability shown in the benchmarks is pretty d*mn good.

Question for the original author: would it be possible to re-run the benchmarks in the following speed ranges: 1.2, 1.4, 1.6, 1.8 and 2.0 GHz and adjust the speed of the IMC to match the processor speed at each step? If so, this would give us a true sense of the *system's* scalability.
January 15, 2008 5:11:56 PM

*good game* rhat.
January 15, 2008 7:45:14 PM

You guys are absolutely right about the IMC frequency, but remember that it also has limitations: the faster the IMC runs (stably!!!), the greater the chance that you saturate the memory bandwidth. If you reach that situation, the bottleneck becomes the hardware design...

For me, the author obviously messed up entirely.

For Phenom, I think that proper software design (aware of the characteristics of NUMA) and corrected cache latency (B3) will help more than people are saying. Larger caches from a die shrink (45 nm?) or a re-design will certainly have less influence: no matter the cache size, it won't help if the core logic gives high-latency cache access or has a faulty prefetch algorithm (which induces elevated cache misses).

I think that with proper software design, Phenom B3 will give some 20% increase in performance (at least for the code I am working on).