AMD and Intel benchmarketing exposed

Part 1 of 3: The changing nature of the game

By Jack Russell: Tuesday 27 May 2003, 12:43

THERE'S BEEN A LOT of brouhaha lately over benchmarking, what with the uproar between NVIDIA and Futuremark and the inevitable mud-slinging that's been going around. But almost lost within that confusion was a grumble of discontent that threaded its way through various reviews when AMD launched the AthlonXP 3200+ earlier this month.

AMD and Intel have gone round and round over the fairness of various benchmarks since Sunnyvale first began offering a product that could compete with Santa Clara, but things are only getting worse on the P4 vs. AthlonXP front. When the Pentium 4 was introduced, the Athlon pounded it all over the ring, winning almost every benchmark available and making a mockery of Intel's "best available performance" whimpers.

This situation didn't change much for the entire first year of the Pentium 4's availability. The P4 on its Willamette core didn't ramp very quickly and market demand for i850 was quite low, with Intel's SDRAM-based version of i845 doing nothing to impress the performance/enthusiast-oriented market segments. The P4 did briefly grab the performance crown when it launched at 2 GHz in late August (albeit at a much higher price), but by October AMD's 1800+ had yanked the title away again.

The next eight months saw the AthlonXP fall farther and farther behind as the P4's speed rocketed upwards, and it's only recently that AMD has been able first to approach and then to compete strongly with Intel's highest-end P4s. Today's AthlonXP 3200+ is competitive with the P4 3 GHz (though not to the same degree that the 1900+ was against the P4 2 GHz), but the nature of the competition is very different. Back then there were few benchmarks (save for the ubiquitous Quake 3) that the P4 did win; now both CPUs seem to be entrenched in their own benchmark camps, each possessing its hoard of guaranteed wins, with an overlap in the middle consisting mainly of games.

It's not that either CPU lacks benchmark wins, or that one CPU wins only in office performance while the other wins in gaming, but that each CPU has tests in every category that it can take. The AthlonXP performs much more competitively against the P4 when encoding DVDs with Xmpeg but tends to lose dramatically in Flask, while the P4 renders POV-Ray's "Glasschess" much faster than the AthlonXP but loses by a wide margin when rendering "Chess2".

Although it'd be easy to conclude this is due to Intel's consistent improvement of the Pentium 4, an examination of the evidence shows this is not the case. While Intel has, indeed, improved the P4 dramatically since its launch, AMD has also dramatically improved the Athlon. In fact, the two CPUs are currently running at nearly the same ratios they held when AMD first launched the AthlonXP. AMD's bus has run at roughly 50% of the P4's bus speed for most of the AthlonXP's life, so the current 400 MHz / 800 MHz gap, despite looking more significant than the 266 / 533 MHz gap, isn't. Both CPUs now sport 512K of cache and both are comparable thermally, so where's the Intel gain coming from?
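The bus comparison is easy to check with a few lines of arithmetic. The sketch below is only an illustration: the bus figures are the ones quoted above, while the labels "earlier" and "current" and the helper name `bus_ratio` are made up for this example.

```python
# Effective front-side bus speeds in MHz, as quoted in the text.
# The labels "earlier" and "current" are illustrative names only.
bus_pairs = {
    "earlier": {"athlon_xp": 266, "pentium_4": 533},
    "current": {"athlon_xp": 400, "pentium_4": 800},
}


def bus_ratio(pair):
    """Return the AthlonXP bus speed as a fraction of the P4 bus speed."""
    return pair["athlon_xp"] / pair["pentium_4"]


ratios = {label: bus_ratio(pair) for label, pair in bus_pairs.items()}

for label, ratio in ratios.items():
    # The absolute gap grows, but the ratio stays at about one half.
    gap = bus_pairs[label]["pentium_4"] - bus_pairs[label]["athlon_xp"]
    print(f"{label}: ratio {ratio:.3f}, absolute gap {gap} MHz")
```

The absolute gap grows from 267 MHz to 400 MHz, but the ratio stays at about one half in both cases, which is the point being made about the bus.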

It's not explained entirely by platform advancement either. While Intel's 875 Canterwood chipset is leagues above the original 845, nForce2 is itself quite a leap from KT266A (which was only beginning to become widely available when the P4 launched). Although P4 enhancements and platform improvements do account for some of the widening performance differential between AthlonXP and P4, the change hasn't been nearly as dramatic as the overall shift in benchmark performance.

The real reason for the widening performance delta between AMD and Intel lies not in the hardware changes introduced by each company but in the changes to the benchmark software used to measure relative performance between the two products. Fire up Sysmark 2000 and the AthlonXP will destroy the P4 in both Office Performance and Content Creation. Switch to Sysmark 2001 and watch the AthlonXP maintain a healthy lead in OP while yielding CC to the P4. Switch to 2002 and the P4's lead in Content Creation will widen dramatically, and it'll nearly catch the AthlonXP in Office Performance. (It should be noted at this point that AMD officially does not use Sysmark 2002 due to what it feels are dramatic inconsistencies in that benchmark's scoring methodology. Rather than simply complaining about the situation, AMD joined BAPCo over it, so it'll be interesting to see what effect, if any, this has on Sysmark 2003.)

This trend isn't limited to BAPCo products. In Content Creation 2001 the AthlonXP walks all over the P4; in Content Creation 2002 it scores nearly identically to it; and in CC2003 it is creamed by Santa Clara. Similarly, Business Winstone 2001 will hand an easy win to the AthlonXP, while Business Winstone 2002 will tie the AthlonXP 3000+ against the P4 3 GHz.

(Please note that all of the above assumes comparable CPUs and platforms.)

Games, it should be noted, are exempt from the above trend and are one area where the P4 and the AthlonXP maintain near parity, with only a few exceptions. Gaming, interestingly enough, was once an area where Intel technology most clearly demonstrated its superiority over third-party rivals; now it's one area where the two companies are neck-and-neck.

So, why do modern benchmarks increasingly favor the Pentium 4? The cynical answer would be that Intel pays or influences companies to build benchmarks that favor the Pentium 4 over the AthlonXP, and to some degree this is undoubtedly true, Intel having shown no tendency in the past to stay up late worrying about the moral implications of playing dirty. Biased benchmarking has a part to play in this puzzle, but unless you believe every competent reviewer is "on the take" and every benchmark author is an Intel-loving slimeball, it's not going to put all the pieces together.

There's a much simpler explanation for this performance shift that doesn't require a vast Intel conspiracy to validate itself. It's not that Intel has cheated; it's that companies have chosen to optimize their products for the CPU they'll mostly be running on. Software companies want to sell their product to the largest number of buyers and do so by advertising as many features and benefits as they possibly can. If slapping an "optimized for the Pentium 4" sticker on a box can move a few thousand more copies, you can bet they'll do so. Although there are many reports (anecdotal and otherwise) of Intel leaning on companies to ensure the Pentium 4 is the preferred CPU of that company's product, for every case where Santa Clara has flexed its muscle, there have likely been a hundred where it never had to do so.

Over the last eighteen months, SSE/2 has evolved from an SIMD instruction set whose benefits were strictly theoretical into a widely used one that's boosting Pentium 4 performance above that of its AMD counterpart. Back in the days of the P3, when the two CPUs were far more similar, this was much less of an issue, but the fact that the P4's architecture is so different from the AthlonXP's is precisely what's fueling Intel's rising success.

It's not that AMD's CPU is optimized for "old" software nearly so much as it is that Intel, as the 800 lb gorilla on the block, knew that the software industry would inevitably shift in whatever direction it chose to go. By building a processor that emphasized different features than its competitor and used an SIMD set the other didn't have, Intel ensured that, long-term, its flagship CPU would outpace the other as the very nature of software changed around it. This is good news for Intel, but it doesn't put AMD in a very good position at all. The playing field, however, isn't entirely tilted towards Chipzilla. In Part II of this article we'll be examining how a few rules do give Sunnyvale a fighting chance in the CPU market. µ

This ends Part the First, in which we've discussed the reasons why relative performance between the AthlonXP and Pentium 4 has shifted so much. Part II will examine how performance between the two processors is likely to shift once Athlon 64 and Nocona are introduced.
  1. It is an interesting read.
  2. Very nicely written piece of text!

    But most of these discussions arise from the moral, i.e. "fairness", viewpoint, although one could argue that there is no such thing irl.

    When talking about tests, I have a few times argued against the idea of a "fair" test done with some sort of "unoptimized" software, since this wouldn't reflect the irl conditions. But is this really such a bad idea? It would be interesting to have a piece of software that is able to run in both Intel- and AMD-optimized modes. But then again, this would only be of academic interest: "see what our CPU is capable of, now if only someone would make optimized software for it".

    So I guess that the best idea, at least for non-company people, would be to see which CPU/platform is best for their needs and, if possible, to test the actual software they are going to use with the different options on the market. Once again, this is very difficult to do for a number of reasons, hence the test sites like THG. Maybe some sort of interactive test site where you can ASK them to test "this with that"?

    Anyway, nice article, and I'm looking forward to the next part!


