Last message on previous page: You guys are absolutely right about the IMC frequency, but remember that it also has limitations - the faster the IMC runs (stable!!!), the greater the chance that you saturate the memory bandwidth. If you get to that point, the bottleneck becomes the hardware design...
For me, the author obviously messed up entirely.
For Phenom, I think that proper software design (aware of the characteristics of NUMA) and corrected cache latency (B3) will help more than people are saying. Larger caches from a die shrink (45nm?) or a re-design will certainly have less influence: cache size doesn't matter if the core logic gives high-latency cache access or has a faulty prefetch algorithm (which causes more cache misses).
I think that with proper software design, Phenom B3 will give some 20% increase in performance (at least for the code I am working on).
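To put rough numbers on the latency-versus-size argument, here is a minimal sketch using the textbook average memory access time formula (hit time plus miss rate times miss penalty). Every latency and miss rate in it is a made-up illustration value, not a Phenom measurement, and the function name `amat` is just a label chosen for this example.

```python
def amat(hit_time_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in cycles: hit time + miss rate * miss penalty."""
    return hit_time_cycles + miss_rate * miss_penalty_cycles

# Hypothetical numbers, for illustration only (not measured on any real CPU).
MEMORY_PENALTY = 200  # assumed cycles to reach DRAM after a cache miss

# Same cache, as-is.
baseline = amat(hit_time_cycles=50, miss_rate=0.10, miss_penalty_cycles=MEMORY_PENALTY)
# Larger cache: assumed to miss a bit less often, same access latency.
bigger_cache = amat(hit_time_cycles=50, miss_rate=0.08, miss_penalty_cycles=MEMORY_PENALTY)
# Same size, but faster access: assumed lower hit time, same miss rate.
lower_latency = amat(hit_time_cycles=40, miss_rate=0.10, miss_penalty_cycles=MEMORY_PENALTY)

print(f"baseline:      {baseline:.1f} cycles")
print(f"bigger cache:  {bigger_cache:.1f} cycles")
print(f"lower latency: {lower_latency:.1f} cycles")
```

With these assumed values the latency cut helps more than the larger cache. Different numbers can flip the ranking; the only point is that hit time is paid on every access, while a lower miss rate only helps on the misses.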
Nice find, that's exactly what we have been talking about. The biggest improvements to Phenom would come if the IMC clock is increased and the L3 cache removed and replaced with larger L2.
I'm convinced now that Phenom's performance hindrances are actually due to memory controller speed.
From what I can tell after installing the Phenom, HTT speed is tied to the IMC's speed. In this case, for retail Phenoms it's tied at 2x the memory controller clock: a 1.8GHz memory controller for 3.6GHz HyperTransport speed. This is according to the BIOS on my K9A2 Platinum, and it also displays this way in the AMD OverDrive utility.
Since installing the 9600 BE, new options have appeared in the Cell menu in the BIOS: about 4 for the CPU core, and 4 or 5 for the Northbridge/IMC. Once I have the time I'll post what they are, because I'm going to need help interpreting them, so I'll likely not try to OC until I get those figured out. I tried to mess around with AOD but it didn't work out properly; it has some kind of issue setting the voltages, though that may be because I originally installed with the X2 4200+. I also can't for the life of me find the option that lets you disable the TLB patch, though I'm not sure whether it comes by default on the 1.1 BIOS on my board.
I'm also coming to believe that the TLB erratum is probably what is causing them to limit the IMC/NB speed to 1.8GHz instead of full core speed. I believe that if the IMC were running at core speed, the Phenom might be able to give a Core 2 Quad a run for its money.
---------------
AMD Phenom X4 9850 Black Edition, ZeroTherm Nirvana 120 Premium CPU Cooler, MSI K9a2 Platinum bios 1.1b3 or P.0J, 4GB (2x2) Mushkin DDR2 1066 (pc8500) 5-5-5-15 2.05v RAM, Sapphire Toxic HD3870, Raidmax RX-700SS PSU, Seagate Barracuda 7200.10 320gb SATA2 X
I read the article and actually used my Windows calculator to work out the theoretical scaling between the 2.2 and 2.8 GHz cores, which should be 1:0.785.
Looking at the figures then makes me wonder whether it's just the maturity of the Athlon core and its chipsets, BIOSes, drivers and compilers that gives it steadier scaling. If every small loss compared to theoretical scaling is evidence of some horsepower not reaching the application, the cause could be badly tuned drivers, BIOS and chipset. Then again, many tests also score better than the theoretical scaling, which makes me wonder whether Windows is simply too inefficient an OS for such tight tests, since extra services and daemons running in the background take a considerable amount of time during their routines.
The memory controller would only bump up these figures at higher clock speeds for the Phenom because it has higher bandwidth, and 1:1 performance scaling only holds as long as the bottleneck is pure processing power. In most of these synthetic tests the Phenom reaches a tad more or less than 1:0.785, which I consider to be as expected.
The small difference between the Athlon and Phenom cores I put down to maturity. Of course the Phenom will produce better results once it is clocked higher and put in circumstances where, let's say, HT2 vs HT3 becomes relevant. And of course in multicore environments the Phenom's better architecture will whip the Athlon's butt. But since this article is a 1-core adventure, my guess is that the small performance deficiencies indicate maturity.
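For anyone who wants to redo the calculator step, here is a small sketch of the same arithmetic. The clock speeds are the ones from the post; the benchmark scores and the function names (`clock_ratio`, `scaling_efficiency`) are hypothetical and only there to show how a measured result compares with ideal 1:1 clock scaling.

```python
def clock_ratio(low_ghz, high_ghz):
    """Ideal performance ratio if performance scaled 1:1 with clock speed."""
    return low_ghz / high_ghz

def scaling_efficiency(score_low, score_high, low_ghz, high_ghz):
    """Measured speedup divided by the clock speedup (1.0 means perfect scaling)."""
    measured_speedup = score_high / score_low
    ideal_speedup = high_ghz / low_ghz
    return measured_speedup / ideal_speedup

ratio = clock_ratio(2.2, 2.8)
print(f"ideal ratio, 2.2 GHz vs 2.8 GHz: {ratio:.4f}")

# Hypothetical benchmark scores (higher is better), not taken from the article.
eff = scaling_efficiency(score_low=1000, score_high=1250, low_ghz=2.2, high_ghz=2.8)
print(f"scaling efficiency: {eff:.3f}")
```

An efficiency below 1.0 means the faster part gained less than its clock advantage, and above 1.0 means it gained more, which is the "better than theoretical" case described above.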
Luckily I have all the time in the world to see Phenom peak out, and sadly AMD has not. So let's hope B3 at 3+ GHz and better support from motherboard manufacturers will bump AMD next to Intel, so we get an equal "price" scaling across the board. I hate Intel for selling their qx6400 for $200 and their Extreme for $2000 when it's only 500MHz faster. I get more MHz in a couple of minutes of clocking and tweaking.
I read some of the discussion here about Intel or AMD fanboys. I use both and I am indifferent to platform. They all perform too marginally for my needs. I don't have batch tasks to give to my workstations all day. I like my foreground application(s) to run smoothly and my background tasks to finish by the end of the day without hindering my foreground apps. Both Intel and AMD deliver processors which can fulfill some of my demands, and there is no bias in this. If it comes down to the sheer pleasure of a game, then the CPU is less important and the video card is everything. That's my two cents. I have never seen a quad core beat a highly clocked dual core in any game when good video cards are used. So if this article is about processors, then it should be about people who are going to use the processor power, and I agree with caveira that Tom's has to be impartial. If you compress DivX all day long you'll notice that Intel is faster, and if you do number crunching you'll get more out of AMD without a doubt. If people at Tom's think all you buy a processor for is to play Crysis, then I admit, get a Core 2 and push it to 3.4-3.7 GHz. If you want to be able to compress video while downloading and decompressing very large files, whilst playing a game and not noticing any degree of stuttering or performance loss, I'd go for AMD, because it has huge memory throughput where Intel won't let you hammer its FSB like this without stuttering. But this article is not about just 1 app but about a platform.
Message edited by chris1979 on 01-16-2008 at 07:19:25 AM
I notice the Windows Performance Index is used.
Is the Windows Performance Index a REAL benchmark?
Not just looking up model numbers?
So a high Graphics Index will always mean a high 3DMark06 or game FPS (CPU & RAM independent) ?
If the WPI is a real benchmark, then I can take it more seriously.
Even so, it should be updated so that a 5.9 means good performance in Crysis, for example. Will there be a 6 and 7 index soon, so current high-end games can better express their minimum, recommended and maximum requirements?
DivX and 3D Studio could use this too.
Message edited by enewmen on 01-16-2008 at 07:21:53 PM
Quote :
Nice find, that's exactly what we have been talking about. The biggest improvements to Phenom would come if the IMC clock is increased and the L3 cache removed and replaced with larger L2.
Actually, there is no need to remove the L3 cache, nor to make L2 larger. The need is to reduce L3 latency (which is exactly the temporary "correction" given by the TLB updates of B2 stepping).
enewmen, if you liked it, ok, it is your opinion. But I truly missed the point that could make me conclude the same as you...
Well, I have been reading THG guides for quite a long time, and I really am sorry to say that the quality of the reviews is dropping.
I have good knowledge of CPU architectures (I have worked with and developed for HPC for 8 years), and I cannot agree with the conclusions of your last reviews of the Phenom processor. I thought you guys were "impartial", but taken together, your Phenom reviews show a clear tendency in favor of the current Intel offering.
First of all, none of your articles emphasises that each implementation of x86 made by Intel or AMD has its own strengths and weaknesses, and that, despite these being multi-purpose CPUs, each is best at certain tasks. All you say is that Intel's implementation is "unbeatable" today.
This said, I ask you: did you guys know that AMD's implementation is quite a bit better than Intel's for a cluster in which scalability is desired? That an integrated memory controller can make better use of the available memory bandwidth?
So, if you guys think you know hardware and CPU architecture, didn't you notice anything strange in the Sandra synthetic benchmark for the Phenom?? How can a Phenom have slower memory performance than every Intel quad in this sort of benchmark? Didn't you feel uncomfortable when writing that, or was it, in fact, lack of knowledge?
I think you should change your conclusions to something like: "we test a variety of simple tasks of a home user who might be interested in these CPUs, and the most suitable or best performing for these is..." without handing out a dubious title of "unbeatable", "champion" or "king". Any CPU you name as such can be beaten by some other brand in some task you guys might not even imagine. And let's not forget that a CPU cannot be the best for someone who cannot buy it.
Note that by doing so, you can't complain that so many forum members waste their time uselessly "fighting" one another over whether Intel or AMD "is the best".
I expect more professional and impartial articles from THG, or if that is not the case, please make it clear that you are not trying to be impartial. I really like your previous material, the interactive charts are great, site layout, news, etc. So, the quality of the texts must be as great, not inferior.
Interesting points. By the way, how many people have these Phenom clusters in their homes?
I acknowledge the AMD arch scales very well ... the chip was obviously respun with the server market clearly in mind.
No arguments from me.
We need people like you to post here ... yes the Intel fanboys are a bit trollish ... but a loveable bunch of rogues nevertheless.
Core2 is an excellent choice for the enthusiast ... because of its headroom ... AMD's is currently poor.
Most who post are interested in single socket systems I'd guess.
Not many of us have 19" racks and SuperMicros !!
Cheers and all of the best to you !!
I think the author did a great job!! He set out to compare PHENOM to ATHLON and did just that. He tested to see what a single core of each could do - and how well it scales from single to dual and then to quad. Job well done.
What does Opteron (Barcelona) or Xeon have to do with this article? NOTHING!
I actually wrote this bearing in mind that it is a desktop CPU review. What I pointed out is:
1) You can't go writing that a Phenom is "that much inferior" if you can't even benchmark it well.
2) A "generic" benchmark simply doesn't exist. If you do such a test, there is always a reason behind every result, be it good or not. THG is only showing results, without the "rationale" - it actually seems they publish results without even thinking about what they really mean.
3) I only pointed to the memory throughput case (where performance matters more often in server environments) to show that this feature of the CPU is sometimes important. And honestly, it would be a poor excuse to say "oh well, we made the wrong test and showed wrong numbers, but it's ok because a home (desktop) user would never know anyway".
4) The most important: THG must point out "best for this app" or "slower for that game", instead of giving its readers a generic good/bad classification - not forgetting the cost. The best system for a task is the one that performs best for the price you want to (or can) pay. (When the boss here tells me what he wants, my first question is how much he is willing to pay - not asking this can cause serious issues some days later... )
I might not have given the best example, for it's from a different perspective, but THG's reviews are really losing the quality I was used to seeing here. It was not the first time I saw THG go wrong, but the benchmark mistake was detected by Sisoft a month ago, and THG never even published a 1-paragraph note.
This can lead some people who blindly believe in THG benchmark results to buy a system or a brand that might not be exactly the best deal.
Thanks guys, and the best to you too.
An Intel Q6600 will murder Phenom 9000 chips in 90-95% of desktop applications. AND it's 65nm - imagine what those 45nm will do to Phenom!
Quote :
Nice find, that's exactly what we have been talking about. The biggest improvements to Phenom would come if the IMC clock is increased and the L3 cache removed and replaced with larger L2.
If AMD removes the L3 they will need to share the L2 between cores.
Whether or not the size needs to be increased would depend on how well it runs.
After a cache reaches a certain size if you still see speed improvements by going even larger... then you need to redesign your cache code... because something is not working properly. I would much prefer a small cache with efficient code... to a really big cache that is not as efficient.
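A quick way to play with the "how much does a bigger cache buy you" question is to replay an access trace through caches of different sizes. The sketch below is only that: it assumes a plain LRU replacement policy and a synthetic, skewed access pattern (popularity falling off as 1/rank), neither of which is claimed to match any real CPU cache or workload, and the function name `lru_hit_rate` is made up for the example.

```python
import random
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay an access trace through an LRU cache (capacity >= 1) and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[key] = True
    return hits / len(trace)

# Synthetic workload (an assumption): 10,000 distinct blocks whose popularity
# falls off as 1/rank, so a small hot set receives most of the accesses.
rng = random.Random(42)
blocks = list(range(10_000))
weights = [1 / (rank + 1) for rank in blocks]
trace = rng.choices(blocks, weights=weights, k=50_000)

sizes = [64, 128, 256, 512, 1024, 2048, 4096]
rates = [lru_hit_rate(trace, size) for size in sizes]
for size, rate in zip(sizes, rates):
    print(f"cache of {size:>4} blocks: hit rate {rate:.3f}")
```

With LRU the hit rate can never go down when capacity goes up on the same trace, so the interesting part is how little each doubling adds for the number of extra blocks it costs. How quickly that flattens out depends entirely on the workload, which is why a synthetic trace like this one proves nothing about a specific chip.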
Message edited by keithlm on 01-16-2008 at 08:20:40 PM
Not "upper limit of efficient cache". Larger caches are good, but useless with low memory bandwidth, which can become the main bottleneck.
As an example, CELL BE (imo the best processor around, and a "lobotomic" version is used in PS3s) uses only 256KB of (some sort of) cache-like memory for each of its cores and is something like 5.5 or 6 times faster than a Xeon 54xx (45nm) with 12MB of cache.
Again, it is all a matter of chip design - Intel's current x86 implementation has a very poor memory bandwidth compared to the core processing units.
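One common way to picture the "bandwidth becomes the bottleneck" point is a roofline-style bound: attainable throughput is the lower of peak compute and memory bandwidth times the work done per byte moved. This is a standard back-of-the-envelope model brought in for illustration, not something from the posts above, and the two "chips" below are hypothetical with invented numbers.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Hypothetical chips, for illustration only (not real specifications).
chips = {
    "high compute, low bandwidth": {"peak_gflops": 80.0, "bandwidth_gb_s": 8.0},
    "lower compute, high bandwidth": {"peak_gflops": 60.0, "bandwidth_gb_s": 16.0},
}

results = {}
for intensity in (0.5, 2.0, 16.0):  # floating-point operations per byte moved
    for name, spec in chips.items():
        bound = attainable_gflops(spec["peak_gflops"], spec["bandwidth_gb_s"], intensity)
        results[(intensity, name)] = bound
        print(f"{intensity:>4} flop/byte, {name}: {bound:.1f} GFLOP/s")
```

In this toy setup the chip with more bandwidth wins whenever the code does little work per byte, and the chip with more raw compute only pulls ahead once the working set is served mostly from cache (high flop/byte). Which side a real application falls on has to be measured.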
Just to add --> Intel uses 6MB of cache per pair of cores (it is wrong to count them together), and it is a "brute force" solution, in the sense that a larger program can be held entirely in fast-access memory. If this is a serious performance issue, one could re-design the chip (what AMD did when it introduced Barcelona) instead of using more cache (what Intel is doing with C2).
Message edited by caveira2099 on 01-16-2008 at 08:38:04 PM
soo would you say that the 6mb on intel 45nm core 2 duos is approaching the upper limit of an efficient cache? just interested.
I'd say that 512k-1Mb per core should have been approaching the limit of an efficient cache. If it didn't... then there is something wrong with how the cache was coded and it needs to be fixed. Otherwise they are using the cache as something other than the purposes of a cache.
Bad example... but something to consider: why do many performance hard disks have 16MB of cache and not 512MB or 1GB? Perhaps because the added memory didn't speed it up because they have efficient code and don't need more memory. (I'm sure they tried more memory and ran into diminishing returns.)
Quote :
Again, it is all a matter of chip design - Intel's current x86 implementation has a very poor memory bandwidth compared to the core processing units.
Gee... do you mean when Intel changes to a monolithic core with onboard memory controller... adding more cache won't really help them as much as it seems to now?
Quote :
I'd say that 512k-1Mb per core should have been approaching the limit of an efficient cache. If it didn't... then there is something wrong with how the cache was coded and it needs to be fixed. Otherwise they are using the cache as something other than the purposes of a cache.
Bad example... but something to consider: why do many performance hard disks have 16MB of cache and not 512MB or 1GB? Perhaps because the added memory didn't speed it up because they have efficient code and don't need more memory. (I'm sure they tried more memory and ran into diminishing returns.)
Not that bad an example: caches are used to hide latencies and to improve error-free instructions, and HDs now carry cache for this reason.
Quote :
Gee... do you mean when Intel changes to a monolithic core with onboard memory controller... adding more cache won't really help them as much as it seems to now?
I have already given an answer to this:
Quote :
--> Intel uses 6MB of cache per pair of cores (it is wrong to count them together), and it is a "brute force" solution, in the sense that a larger program can be held entirely in fast-access memory. If this is a serious performance issue, one could re-design the chip (what AMD did when it introduced Barcelona) instead of using more cache (what Intel is doing with C2).
Intel with an on-CHIP IMC is a chip re-design strategy, not simply taking a known implementation and adding more memory.
Message edited by caveira2099 on 01-16-2008 at 08:46:30 PM