
Editor's Corner: Getting Benchmarks Right

By Chris Angelini

In Monday's first look at AMD's Socket AM3 interface, we observed some interesting gaming results on Intel's Core i7 920 versus the new Phenom IIs (and subsequently got called out on them). This, of course, came after overclocking both microarchitectures in a previous story and comparing their respective performance.

Of course, in that most recent piece, we used AMD Radeon HD 4870 X2 graphics cards, and in this one, we employed a pair of GeForce GTX 280s. It turns out that graphics makes all the difference in gaming--who would have guessed?

Moreover, there were other sites that published their own evaluations of the AM3 platform, yielding a second round of comparisons.

Curious as to why, exactly, we were seeing different results from some of the other publications out there, and in response to requests for more data from our readers, I dedicated the past two days to hypothesizing possible causes and re-running our gaming tests, using Far Cry 2 as my indicator of choice. A sincere thank-you to the folks who offered helpful information and constructive suggestions for comparing data. I tried to replicate as many of the other test scenarios from Monday's round of reviews as possible here.

Without further ado, let's get into some troubleshooting, benchmarking, and hypothesizing.

UPDATE: At the suggestion of several readers, I've re-run some of these tests (still in Far Cry 2) with Hyper-Threading disabled and then again with the Threading Optimizations in Nvidia's driver disabled completely. The results are as follows:

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
Core i7 920 @ 2.66 GHz (Hyper-Threading Enabled, 8 Threads) | 53.23 | 41.83
Core i7 920 @ 2.66 GHz (Hyper-Threading Disabled, 4 Threads) | 56.24 | 44.09
Core i7 920 @ 2.66 GHz (Hyper-Threading Disabled, Thread Optimization Disabled) | 56.12 | 44.11


Though there is some difference here, just as with the fresh installation of Windows, our performance questions remain mostly unanswered. Hopefully, these results cross one more possibility off the list of explanations for why the GeForce GTX 280 under-performs when paired with a Core i7 processor. Now, the editorial as it originally appeared:

Possibility #1: Benchmarking with power-saving features enabled was causing performance problems with Core i7.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
All Power-Saving Features Enabled (Numbers From The Launch) | 53.23 | 41.83
All Power-Saving Features Disabled (New Results) | 56.51 | 44.30


After disabling EIST (and hence turning off Turbo mode), C1E, and the thermal monitoring function that could throttle the processor if it broke past its pre-programmed 100 A/130 W limits, it was clear that this combination of features affected performance, but only slightly. Certainly, these features weren't to blame.
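To put "only slightly" into numbers, here is a quick back-of-the-envelope sketch in Python, using the figures from the table above. The pct_delta helper is ours, purely for illustration:

```
def pct_delta(before: float, after: float) -> float:
    """Percentage change between two average frame rates."""
    return (after - before) / before * 100

# Figures from the power-saving table above (Far Cry 2, average FPS).
print(f"1920x1200: {pct_delta(53.23, 56.51):+.1f}%")  # about +6.2%
print(f"2560x1600: {pct_delta(41.83, 44.30):+.1f}%")  # about +5.9%
```

A six percent swing is worth knowing about, but it is nowhere near the gap we're chasing.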

Possibility #2: At DDR3-1066 (the max ratio of Intel’s engineering samples), we were starving our platform for memory bandwidth.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
i7 920 @ 2.66 GHz/DDR3-1066 and GeForce GTX 280 (Numbers From The Launch) | 53.23 | 41.83
i7 920 @ 3.8 GHz/DDR3-1523 and GeForce GTX 280 (New Results) | 60.33 | 46.43


Rather than simply clocking the memory bus up to 1,600 MHz by upping the Bclk, we cranked the reference setting up to 190 MHz, yielding a 3.8 GHz clock speed and a 1,523 MHz memory bus. The boost helped a tad at 1920x1200 and a little less at 2560x1600, but in no way made up the difference versus a Phenom II at its stock settings. No go there, either, which is incredibly strange. Graphics bottleneck, anyone?
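For those checking the math on those clocks, here is roughly how the figures fall out of the base clock. The 20x CPU multiplier is the i7 920's stock ratio; the 8x memory multiplier is our assumption for the DDR3 ratio in use:

```
# Back-of-the-envelope overclock arithmetic. The 20x CPU multiplier is the
# i7 920's stock ratio (133 MHz x 20 = 2.66 GHz); the 8x memory multiplier
# is our assumption for the DDR3 ratio in use (133 MHz x 8 = DDR3-1066).
bclk_mhz = 190
cpu_multiplier = 20
mem_multiplier = 8

print(f"CPU:    {bclk_mhz * cpu_multiplier / 1000:.2f} GHz")  # 3.80 GHz
print(f"Memory: {bclk_mhz * mem_multiplier} MT/s")  # 1520; the board reports
# 1,523, implying a base clock a hair above 190 MHz
```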

Possibility #3: The driver install was bad, and the GeForce GTX 280 was actually at fault.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
Original Installation (Numbers From The Launch) | 53.23 | 41.83
Fresh Operating System/Driver Installation (New Results) | 56.12 | 44.02


Starting clean with a fresh copy of Vista x64, we reinstalled Nvidia's GeForce 181.22 build—the latest available, and the one we used in our story earlier in the week. The i7 setup scored a bit higher, but still failed to pass any of the Phenom II X4s, which is the result we would have needed to put Core i7 in the lead here. Still, no go.

Possibility #4: Our results are only representative of gaming on an Nvidia card, and those using AMD Radeon boards will see something different.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
i7 920 @ 2.66 GHz and GeForce GTX 280 (Numbers From The Launch) | 53.23 | 41.83
i7 920 @ 3.8 GHz and Radeon HD 4870 X2 (New Results) | 105.08 | 79.46


No kidding, right? Of course gaming on a Radeon is going to give you a different result. But we didn't expect variance this extreme. Suddenly, we're on to something. The i7 920's results shoot up to levels that truly trounce the Phenom II X4 machine armed with the GeForce GTX 280 (as they should, given the faster processor and graphics card). We'll dive into a lot more depth on this point on the next page.

Possibility #5: OK, turn down the CPU clock, silly. You're still running at 3.8 GHz.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
i7 920 @ 2.66 GHz and GeForce GTX 280 (Numbers From The Launch) | 53.23 | 41.83
i7 920 @ 2.66 GHz and Radeon HD 4870 X2 (New Results) | 85.87 | 74.85


The results scale back, most notably at 1920x1200, where we'd expect the CPU to have a more profound impact on gaming performance (versus 2560x1600, at least). But the 4870 X2 still leads the X4s and X3s by a commanding margin, even with the 920 running at its stock 2.66 GHz. And look how much more scaling the overclocked configuration above delivers.
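One way to read the last three tables together: at a fixed 2.66 GHz, the GPU swap alone buys far more than the CPU overclock does, and the overclock scales better on the Radeon than on the GeForce. A quick sketch with the 1920x1200 numbers from above (the speedup helper is ours, for illustration):

```
def speedup(baseline: float, result: float) -> float:
    return result / baseline

# Far Cry 2, 1920x1200, no AA (average FPS from the tables above).
gtx280_stock = 53.23   # i7 920 @ 2.66 GHz + GeForce GTX 280
gtx280_oc    = 60.33   # i7 920 @ 3.8 GHz  + GeForce GTX 280
radeon_stock = 85.87   # i7 920 @ 2.66 GHz + Radeon HD 4870 X2
radeon_oc    = 105.08  # i7 920 @ 3.8 GHz  + Radeon HD 4870 X2

print(f"GPU swap at stock CPU clocks: {speedup(gtx280_stock, radeon_stock):.2f}x")  # ~1.61x
print(f"CPU overclock on GTX 280:     {speedup(gtx280_stock, gtx280_oc):.2f}x")     # ~1.13x
print(f"CPU overclock on 4870 X2:     {speedup(radeon_stock, radeon_oc):.2f}x")     # ~1.22x
```

That last pair is the telling one: the same 43 percent CPU overclock yields nearly twice the scaling on the Radeon, consistent with the GeForce capping how much of the i7's headroom actually reaches the screen.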

Possibility #6: Web site XYZ used a different driver version. Perhaps something happened between that version and the most recent update you used.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
GeForce 181.22, Jan. 22, 2009 (Numbers From The Launch) | 53.23 | 41.83
GeForce 180.43 Beta, Oct. 24, 2008 (New Results) | 50.81 | 39.68


Stepping all the way back to October of last year, we installed Nvidia's GeForce 180.43 package to test the difference between then and now. If anything, the GeForce GTX 280 only picks up performance with the more recent driver update. That's not the problem, either.

Possibility #7: Your card is hosed.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
GeForce GTX 280 1 GB (Numbers From The Launch) | 53.23 | 41.83
GeForce GTX 280 1 GB Replacement (New Results) | 50.73 | 40.72


Fair enough; we have more than enough cards here to at least try swapping that out. We didn't suspect the processor, memory, or motherboard of being defective—after all, the Core i7 920 at 2.66 GHz served up compelling results in all of our audio/video encoding apps. It was only the gaming scores that looked funny.

But that isn’t the issue either. A new card ranged from the same to slightly worse in Far Cry 2—certainly within a margin of error, to be sure.

Possibility #8: Maybe you guys use different settings that make more of an impact on the i7 920’s performance.

Far Cry 2 (average FPS) | 1920x1200, no AA | 2560x1600, no AA
i7 920 @ 2.66 GHz, Ultra High Settings (Numbers From The Launch) | 53.23 | 41.83
i7 920 @ 2.66 GHz, High Settings (New Results) | 64.74 | 48.77


Now it feels like we're grasping at straws. Nevertheless, we were willing to run the same Far Cry 2 batch using High settings instead of the Ultra High DirectX 10 configuration used initially. And again, there's a significant increase, but it isn't so substantial that the i7 is able to overtake the fastest X4 under the load of Ultra High settings.

Comments
  • 26 Hide
    dattimr , February 11, 2009 5:05 AM
    Nice one. Tom's is getting its act together again. Keep it up, guys.
  • 14 Hide
    Hamsterabed , February 11, 2009 5:51 AM
    How very odd, when i saw the benches i immediately thought there was a problem. Glad you guys made an article to explain and backup you numbers and i hope we get some answers. don't have another driver fail Nvidia...
  • 6 Hide
    Tindytim , February 11, 2009 6:53 AM
    Wow...

    Just wow.

    Right when I considering leaving this site forever for it's over Mac loving, Tom flashes me a glimmer of hope.
  • 4 Hide
    rdawise , February 11, 2009 7:04 AM
    Thank you Chris for this follow-up article..now where is kknd to argue....

    I am sorry but we all know that at lower resolutions the Core i7 will beat the P2, but as the article states, but real world the PII is hitting the high notes. Could this be a driver screw up from Nvidia...probably since you're elimnating everything else. Are there any other x-factors out there...oh yes plenty more. However I think people will get the wrong impression if they read this and think the PII is "more powerful" than the Core i7. Some one who reads this should come away thinking that the PII will give you almost as great gaming as some of the Core i7s can for less money. (Time for a price cut intel).

    I do a question what if you tried using memory with different timings. I believe 8-8-8-24 was used last test, but how about 7-7-7-20? Just trying to help think of reasons. Either way it gives us something to look forward to in the CPU world. Good follow-up.
  • 0 Hide
    sohei , February 11, 2009 7:40 AM
    "I believe 8-8-8-24 was used last test, but how about 7-7-7-20? Just trying to help think of reasons"

    wow 7-7-7-20? this is the performance...indeed
    P2 works with ddr2 great and you wary about timings
  • 2 Hide
    Anonymous , February 11, 2009 7:42 AM
    great article!!
    just a thought: what about previous generation of nvidia cards? could be this is a GTX 260/285/280/... problem. maybe you could try with one of 9xxx series.
  • 8 Hide
    StupidRabbit , February 11, 2009 8:10 AM
    awesome article.. only two pages long but it changes the way i look at the previous benchmarks. good to see you focus not only on the hardware itself but also on the benchmarks with a real sense of objectivity.. its what makes this site great.
  • 1 Hide
    cobra420 , February 11, 2009 9:11 AM
    so it looks like a gpu issue . why not try a gtx 295 ? or is that why you set the video so low ? now you found the issue theirs no need to try a different card ? ati sure did a good job on there 4870 series . nice job toms
  • 0 Hide
    Anonymous , February 11, 2009 11:27 AM
    Maybe Farcry optimized it more on ATI, maybe Intel is throwing sticks at the wheels of nVidia at the hardware level, maybe, maybe ... :S
    Why is Intel supporting multi-ATI config, but not multi-nVidia? Why doesn't Intel let nVidia use its Atom freely? Why, oh why?
    There are so many factors. I think if you replace Farcry with a synthetic test, there will be less unknowns. Just maybe :) 
  • 8 Hide
    jcknouse , February 11, 2009 11:31 AM
    Fantastic work, Chris. Simply awesome. I really, really, really enjoyed that analysis.

    Something I need to ask tho:

    Is the GTX280 a dual processor VDA, like the Radeon 4870X2? If not, wouldn't you expect significant gains of 2 GPUs over 1?

    Also...

    Wouldn't you expect to see even a small decline in performance (possibly miniscule/negligible...but still present) simply because running 2 individual VDAs in 2 PCI-E x16 2.0 slots requires work to be sorted between two physical devices (handling going over the SLi interface), whereas using the Radeon 4870x2 work is sent to and split on-card?

    I'd really be interested in seeing a performance difference between 2 4870s and 1 4870x2 with the same catalyst version, and latest firmware.

    Also, I am quite shocked at the nVidia suffering. I've been an nVidia customer for years. I am going to have to look at going ATI Crossfire if I build a new AM3 gaming platform later this year.

    This is one of the reasons I value Tom's so much...articles like this.

    Thank you again, Chris. This was invaluable to me both technically as well as a consumer looking for the best bang for my buck, even tho my budget isn't limited.
  • 1 Hide
    arkadi , February 11, 2009 11:40 AM
    to the point, simple, logical, grate job!
  • -7 Hide
    jeffunit , February 11, 2009 12:07 PM
    Why are you running games which stress graphics for a cpu review?
    Wny not run *cpu benchmarks*, which measure cpu performance for a cpu
    review?
  • 3 Hide
    Anonymous , February 11, 2009 12:07 PM
    I would love to see a full testing article of AMD cpu's with ATI and NVidia cards VS Intel CPU's with ATI and NVidia cards. Maybe the architecture of the ATI cards works better with Intel chips and Nvidia works better with AMD chips. Wouldn't that be ironic since ATI is AMD.
  • 2 Hide
    squatchman , February 11, 2009 12:15 PM
    Much better, it was probably a ton of work, but you see that people complain less with comprehensive results.

    To jcknouse: Yea, the 4870x2 is two GPUs on one board and it should be beating the GTX280. To jeffunit: Check the original article for the non-synthetic CPU benchmarks, or any other article for that matter.
  • 3 Hide
    bf2gameplaya , February 11, 2009 12:37 PM
    Finally some quantitative analysis from Tom's, with reasonable methodology and an interesting subject. Well done, you have educated me!

    Yet, I see that Far Cry 2 was the only title tested and I have no way of knowing if any of these conclusions carry over to any other title.

    But now I have the right questions to ask, Thanks Tom's!
  • 2 Hide
    jameskangster , February 11, 2009 1:06 PM
    Chris, thank you for the editorial. I really appreciate the fact that you guys try your best to respond and answer our nit-picking comments and questions. Now based on what I have observed from your articles, I have some suggestions that might help (or might not at all). What if the bottleneck is related to the motherboard and its memory timing setup?

    I did a quick comparison of your past setups for benchmarks (From this article, 2009-02-09 AMD AM3 article, 2009-01-07 Phenom II review article, 2008-11-03 i7 review article).

    Basically, you used the same hardware for i7 920 in articles 2009-01-07 and 2008-11-03
    Motherboard: Intel DX58SO Revision 403
    Memory: A-DATA DDR3-1600 2X2GB set to DDR3-1333 CL 7-7-7-20
    Video: MSI N280GTX-T2D1G-OC

    Also just as an external reference I used Anandtech's setup from this article
    http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3512&p=3. For those of you who hate Anandtech I apologize, but this article had the most comparable hardware setup:

    Motherboard: Intel DX58SO
    Memory: Qimonda DDR3-1066 4x1GB (7-7-7-20)
    Video card: eVGA GeForce GTX 280

    In these articles Intel's i7 920 2.66 GHz performance seemed to be dare I say better than Phenom II (although in one article Phenom II did not exist, but just looking at raw numbers) specifically related to gaming benchmarks. The interesting point here is that they all used Intel DX58SO motherboard, using 7-7-7-20 timing for the memory. The number of modules varied; however, it didn't seem to make a huge difference from what I have read so far.

    In the 2009-02-09 article and its subsequent editorial article Tom's Hardware used the following setup for i7.
    Motherboard: Asus Rampage II Extreme (X58/ICH10) LGA 1366
    Memory: Corsair Dominator DDR3-1600 8-8-8-24 @1.65V 3x2GB (caveat here Tom's hardware did try overcloking in the editorial article so that does vary)
    Video card: Nvidia GeForce GTX 280 1 GB

    Overall the video card chip model remained the same, the driver revisions were different but not siginifically; however Tom's used a different board and its memory setup was drastically different from its previous setups.

    It would be interesting to see the differences in performance comparing these setups (some of these setups might not be possible due to hardware/BIOS limitations, I didn't have time to look into that part):
    1. Rampage mobo with 8-8-8-24 memory timing vs Intel DX58SO with 8-8-8-24 memory timing
    2. Rampage mobo with 7-7-7-20 memory timing vs Intel DX58S0 with 7-7-7-20 memory timing
    3. Intel DX58SO with 8-8-8-24 memory timing vs Intel DX58SO with 7-7-7-20 memory timing
    4. Rampage mobo with 8-8-8-24 memory timing vs Rampage mobo with 7-7-7-20 memory timing

    Obviously I would use i7 920, the same video card and driver for above setups.

    I highly doubt that the memory timing would cause such a performance difference. My bet is on the motherboard.

    Also it would answer another question, or maybe obsfucate even more. Was it truly Nvidia's video card's fault that i7's potential was not translated into raw performance? or was it due to the motherboard, or due to memory setup? or both? or all three?

    I'm not expecting another editorial article for this, but it would be good to see get this straightened out. I'm really hesitant to place the blame on Nvidia for this yet.
  • 1 Hide
    Ho0d1um , February 11, 2009 1:06 PM
Quote (Madis Kalme): "There are so many factors. I think if you replace Farcry with a synthetic test, there will be less unknowns. Just maybe"

    This article is testing real world results and not just number crunching
  • 0 Hide
    kknd1967 , February 11, 2009 1:21 PM
    This is simply a great analysis. thumb up for Chris' good work. I too originally suspected NV card + i7 behavior or X58 chipset in gaming. 2 other websites have i7+NV260 leading in CPU bounded low res test, but falling in GPU bounded high rest test significantly, which just does not make sense for a typical system behavior (usually one would expect leader still leads but by smaller margin in high res).

    What is interesting is now it seems best to use Intel i7 with AMD video card for gaming. So who is the loser? Nvidia...
  • 1 Hide
    jonyb222 , February 11, 2009 1:30 PM
    Quote:
    The data suggests that, using an AMD Radeon-based graphics card, you'll likely see the scaling that many other sites have presented, with Intel's Core i7 besting the Phenom II right up to 2560x1600 (refer to the first chart on this page for proof there).


    Quote:
    Nvidia's card cannot translate the Core i7's microarchitecture into the same performance advantage, giving AMD's Phenom II-series chips the advantage seen in the AM3 story and in the two pages you've just read.


    So, if I understood correctly, currently Nvidia cards/drivers are slowing down intel processors while AMD one can go full speed (which is why Phenom beats I7 in that case). And ATI cards let's both processors go fulll speed (which is why I7 beats Phenom in that case)

    Bummer for Intel/Nvidia, Horray for AMD/ATI?