XP3200+ - THG vs. X-Bit labs

goofee

Distinguished
Jul 17, 2002
14
0
18,510
I've been seeing reviews on THG for a while where the P4 is always a little better than on other sites. Actually, I wonder why. Also, the choice of benchmarks seems a little unfortunate. How about using ScienceMark instead of SysMark?

Back to the comparison. While both sites use about the same system for the P4 setup, the Athlon systems use different mainboards. THG uses the Asus nForce Ultra and X-Bit labs uses the Abit version. Both companies make good motherboards, so one shouldn't expect too many differences.

Now, what startled me, for one thing, was the WinRAR benchmark. While the Athlon processor is about 15% slower on Tom's Hardware Guide, it's only about 2.5% slower on X-Bit labs. THG uses WinRAR 3.11 while X-Bit labs uses v3.0. Can it be that WinRAR received a major Pentium optimization in a minor version update? Or does one site have wrong benchmarks?

Then there are the 3DMark results. While the Athlon is either better than or equal to the P4 3GHz on X-Bit labs, it loses across the board on THG.
Unreal Tournament is no man's land for the P4 on X-Bit labs, while quite the opposite is going on at Tom's Hardware Guide.

Then again, PCMark scores (has PCMark even been optimized for P4?) are quite equal on both sites. So I wonder what went wrong when benchmarking? Did someone misconfigure their systems? Is Abit that much better than Asus? Is the XP3200+ worth its money after all? It will be about $200 cheaper than the 3.2GHz P4 when it comes out, not to mention the mobo.

So I wonder, what went wrong on either side?
 

skyvader

Distinguished
May 13, 2003
5
0
18,510
The same can be found at www.firingsquad.com. In the CPU mark for 3DMark2003, the AMD was ahead, and on Tom's it was behind. UT2K3 is an AMD winner on other sites, but Tom has the P4 as the winner. I like Tom's reviews, but they are not consistent with many other reviews. In other reviews I have seen results that differ by 2-3% but are otherwise consistent: if one site has the 3.0GHz P4 with a score of x in a given benchmark and other sites using the same benchmark get similar results, then it is consistent.

I almost get the feeling that some of the scores on Tom's site are mistakenly reversed, not to purposely make the AMD look bad or Intel look good. On firingsquad and other sites, the UT bench favored AMD, with a score that looked reversed on Tom's bench, almost like the AMD score was in the Intel line and the Intel score was in the AMD line.

I just think that more sites need to converse with each other on their testing methods to come up with a good general reference for both platforms. Anand should talk to Tom and Thresh, get an idea of the test setup, and work together to come to the same conclusion. Tom has great reports on stuff, but his benchmarking is just not consistent with the vibe on the web from other sites.
 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
If you want to know why the Athlon performed better in the Xbit Labs article than in the THG article, you have merely to look at the system setup done by Xbit Labs.

They used Corsair TWINX512-3200LL which is rated at DDR400 with timings of 2-2-2-<b>6</b>.

Xbit Labs tweaked the timings for the AMD platform to 2-2-2-<b>5</b>.

But what did Xbit Labs run the P4 platform at? Why 2-2-2-<b>6</b> of course.

How nice of XBit Labs to overclock the timings on the memory for the Athlon instead of keeping them at stock like they did for Intel.

On top of that there is another possible source of bias, probably with a much bigger effect than just the memory timings. THG makes special mention of what AMD marketing told <i>all</i> reviewers to do with the XP3200+:
"Prior to the actual launch of the processor, we at THG received a benchmark guide for our test sample that explains to the press the best environment for testing the CPU. Among other things, it contains recommendations for many benchmarks and BIOS settings. <b>It also advises testers to <i>exchange a DLL</i> before starting Sysmark 2002 in order to attain better results in the Media Encoder.</b> We, however, did not make any changes to the benchmarks and stuck to accepted standards.

And one other thing: before installing the operating system, <b>AMD recommends deactivating the APIC mode (Advanced Programmable Interrupt Controller) in BIOS in order to boost performance</b>."
* My own emphasis was added to the words.

So AMD is telling reviewers to use specially-optimized software <i>and</i> to tweak the firmware/hardware setup significantly before benchmarking the AXP3200+. <SARCASM><i>Gee, that couldn't possibly bias the results.</i></SARCASM>

Funny how Xbit Labs <i>never</i> mentioned this. I would be willing to bet money that they never mentioned it because they just followed AMD's orders to unfairly bias the results. Whereas THG was outraged at the notion (and justly so) and refused to do anything to bias their results like AMD told them to.

<b>There are 10 types of people in this world: those who can understand binary and those who can't.</b>
 

goofee

Distinguished
Jul 17, 2002
14
0
18,510
Dude, actually the DLL you're talking about is a system DLL that was previously overwritten by SysMark. Also, I haven't been talking about Sysmark. Actually Sysmark was not tested on X-Bit labs.
In fact, 2-2-2-5 timings are not supported by the i875. Instead it uses an 800MHz FSB, which should easily make up for the difference, considering how well the P4 scales with higher FSB frequencies.
 

jardows

Distinguished
May 14, 2003
24
0
18,510
How many P4 optimized software tests were run on THG?
How many AMD optimized software tests were run on THG?

If the P4 gets optimized software to benchmark with, the AMD should get the same as well.
 

wschuerm

Distinguished
Nov 11, 2002
336
0
18,780
People, a benchmark is a benchmark. Who cares about optimisation? If one is faster than the other, accept it.

Maybe AMD achieved its goal (although not living up to its PR): they are getting a lot of publicity. And who really cares about a 3200+ or a 3GHz? Just the geeks on this forum. An ordinary PC buyer will just hear AMD one more time than Intel, and although it's negative, it's not the 3200+ they're buying. It'll be the much cheaper and not so misrated 2400+ or something like that.

SL6EF OC's GOOD
 

imgod2u

Distinguished
Jul 1, 2002
890
0
18,980
"How many P4 optimized software tests were run on THG?
How many AMD optimized software tests were run on THG?

If the P4 gets optimized software to benchmark with, the AMD should get the same as well."

That's AMD's job. They're the ones who are supposed to go out and work with software developers to get more software optimized for their processors. Tomshardware is *not* responsible for doing this. They merely gather what software people use more often (supposedly) and show the results. It's not the review site's job to do optimizations. If you think the Athlon should get optimized software, you need to talk to AMD and get them to work harder at convincing developers to make more optimized software for the Athlons.
Funny, if Intel had suggested that review sites tweak their system to perform better with their latest processor, people would be up in arms complaining about the vast injustice. Even when the software itself was written *out-of-the-box* to run better on the P4, people were still bitching even though the performance that the review sites showed was *exactly what the consumer would get*. How many consumers would know to replace the DLL in Sysmark? How many would even care about Sysmark? How many would turn off APIC?

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
 

mrdoubleb

Distinguished
May 14, 2003
10
0
18,510
I have been a THG reader/fan ever since I started surfing the net, six years now, but lately I have had to start questioning the objectivity and professionalism of Tom's tests, and I have turned away from the site. The latest test (3200+ vs. Pentium 4) only strengthened my doubts. If only one site on the net contradicted Tom's results, I'd say I believe THG and no one else. However, the picture is almost the opposite. Again we find that, while no one says the Athlon XP beats the P4, everyone seems to agree that the two products (P4 @ 3.06/533 vs. XP3000+, P4 @ 3.00/800 vs. XP3200+) are more or less even. Or, to be more accurate, each has areas where it is the best (P4: multimedia, media encoding, etc.; XP: 3D gaming, office use, etc.).

Some of the difference between THG's results and the rest of the world's could be explained by the fact that Tom uses only heavily P4-optimized benchmarks, but to see UT 2003 charts like that is quite simply ridiculous. Everyone knows that UT's engine is Athlon heaven, and the results of everyone else's tests confirm this, no surprise (though the P4 with the 800 MHz FSB wins at some sites), while at Tom's the P4 wipes the Athlon off the board like it was a Celeron.

The fact is that Intel has deep pockets and can "ask" benchmark developers to optimize their software for the P4. Just check out these charts at tech-report.com (http://www.tech-report.com/reviews/2003q1/athlonxp-3000/index.x?pg=6). They compared the results of Content Creation Winstone 2001, 2002 & 2003 on the same platform. No comment needed; the charts speak for themselves. I've also read somewhere recently that some benchmarks simply ignore the Athlon XP's SSE capability and test it like that against the P4. To make my point: there are some serious problems with benchmarking these days.

I found two excellent articles on the issue at the Inquirer.net (http://www.theinquirer.net/?article=9445) & (http://www.theinquirer.net/?article=9465) which, as far as I can see, have already created a major debate all over the net. I highly recommend reading them. I hope that the biggest online and offline PC magazines can soon find a way to sit down and discuss the issue of setting standards for benchmarking. In the meantime, has anyone noticed that AMD's own tests are always audited by PricewaterhouseCoopers, while all the other benchmarks on the net (Tom's included) are audited by... just whom, exactly?

Mr. Double B.
 

TheRod

Distinguished
Aug 2, 2002
2,031
0
19,780
Benchmark         | X-Bit Labs AMD | X-Bit Labs P4 | X-Bit Labs Diff. | THG AMD | THG P4 | THG Diff.
3DMark2001 SE     | 16334          | 16323         | 0,07%            | 16098   | 16772  | -4,02%
3DMark03          | 4802           | 4860          | -1,19%           | 4798    | 4885   | -1,78%
3DMark03 CPU      | 704            | 714           | -1,40%           | 696     | 758    | -8,18%
PCMark2002 CPU    | 6872           | 7447          | -7,72%           | 6858    | 7446   | -7,90%
PCMark2002 Memory | 6475           | 8948          | -27,64%          | 6503    | 10062  | -35,37%

This little table might help... Excuse me, but the "forum engine" seems to not consider TABs...
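
For anyone who wants to recheck the Diff. column, here is a minimal Python sketch. It assumes Diff. means the Athlon score relative to the P4 score, (AMD - P4) / P4, which is what the percentages in the table appear to follow. The names `SCORES`, `relative_diff` and `diff_table` are made up for this illustration, and the output uses decimal points rather than commas.

```python
# Scores copied from the table above: site -> (AMD score, P4 score).
SCORES = {
    "3DMark2001 SE":     {"X-Bit Labs": (16334, 16323), "THG": (16098, 16772)},
    "3DMark03":          {"X-Bit Labs": (4802, 4860),   "THG": (4798, 4885)},
    "3DMark03 CPU":      {"X-Bit Labs": (704, 714),     "THG": (696, 758)},
    "PCMark2002 CPU":    {"X-Bit Labs": (6872, 7447),   "THG": (6858, 7446)},
    "PCMark2002 Memory": {"X-Bit Labs": (6475, 8948),   "THG": (6503, 10062)},
}


def relative_diff(amd: float, p4: float) -> float:
    """Athlon score relative to the P4 score, in percent (negative = Athlon behind)."""
    return (amd - p4) / p4 * 100.0


def diff_table(scores: dict) -> dict:
    """Compute the rounded Diff. value per benchmark and per site."""
    return {
        bench: {site: round(relative_diff(amd, p4), 2) for site, (amd, p4) in sites.items()}
        for bench, sites in scores.items()
    }


if __name__ == "__main__":
    for bench, diffs in diff_table(SCORES).items():
        cells = "  ".join(f"{site}: {diff:+.2f}%" for site, diff in diffs.items())
        print(f"{bench:<18} {cells}")
```

Under that assumption the sketch reproduces the table's percentages, so the gap between the two sites is in the raw scores rather than in the arithmetic.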

--
Would you buy a GPS enabled soap bar?
 

shadus

Distinguished
Apr 16, 2003
2,067
0
19,790
Another thing to be careful to note in P4 benchies is the use of a 3.0 (800MHz FSB) as opposed to a 3.06 (533MHz FSB)... it makes a big difference in the benchies in cases where memory bandwidth is important.

Shadus
 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
I still don't understand how you people always come up with arguments like this one...

We're not evaluating one chip in the physical sense! The software support and optimisations are <i>very</i> important, and Intel has the upper hand there, no doubt about it. They have better software support for their CPUs! It's AMD's fault if they don't! It's like the other poster here said - real-world benchmarks are real-world benchmarks, and optimisations are part of the real world, so if one processor fares better, then that's it!

That still doesn't mean that sites should be allowed to use only software that has been optimised for platform X or Y... That's the importance of a balanced set of benchmarks, not just one program, and equal configurations whenever possible in hardware (like memory timings!).
 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
imgod2u, you're just rocking. :) For the second time today: I couldn't have said it better myself. (And for the first time today: I'm glad that I don't have to say these things myself.)

 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
"We're not evaluating one chip in the physical sense! The software support and optimisations are very important, and Intel has the upper hand there, no doubt about it. They have better software support for their CPUs! It's AMD's fault if they don't! It's like the other poster here said - real-world benchmarks are real-world benchmarks, and optimisations are part of the real world, so if one processor fares better, then that's it!"
Damn straight!

 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
"How many P4 optimized software tests were run on THG?
How many AMD optimized software tests were run on THG?

If the P4 gets optimized software to benchmark with, the AMD should get the same as well."
How many software packages that consumers buy and use regularly are optimized for a P4?

How many software packages that consumers buy and use regularly are optimized for an AXP?

So what should consumers be looking at to know how the software that they use will run? Need I say more?

No, but I will anyway. The simple fact is that a <i>lot</i> of software is optimized for the Athlon AND the Pentium. The majority of the rest of the software is just not optimized at all. Hardly ever does anyone optimize for a P4 and <i>not</i> for an AXP.

 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
No offense... but "the inquirer"? You couldn't find a barrel full of more half-wit unskilled monkeys writing what they pretend to call 'news' if you tried. (With the possible exception of Google's automated 'news' gathering, which accepts marketing announcements as 'news' sources.)

 

JimStapleton

Distinguished
Jun 21, 2001
145
0
18,680
Have you noticed where 90% of the news articles on THG's right side panel come from?

What does it mean when THG reuses articles from the Inquirer, if you think the Inquirer is so bad?

Athlon XP 1600+, MSI K7T PRO2 RU (POS), 2x256 MB CRUCIAL PC2100 CL2.5 memory, Asus V6800 DDR Delux (GF 256) video card, 6.4GB+27GB WD HD, 40GB IBM HD (all 7200RPM). My computer is an acronym
 

imgod2u

Distinguished
Jul 1, 2002
890
0
18,980
It means tomshardware wants page hits and would post anything that would get them more page hits. It's the same reason newspaper stands also sell tabloids. That's what the Inquirer pretty much is, the tabloids.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
 

skyvader

Distinguished
May 13, 2003
5
0
18,510
It isn't about optimizations, it is about reversed numbers. If X number of sites all had the same setup and came to the same conclusion (with a 1-2% difference), you would have consistent proof of the answer. Here we have Tom, who is almost always different in his numbers. I am just asking if he could run the tests again with the help of some of the other big sites. Then you would at least have a consensus. If you remember, when the bug in the PIII was discovered, Tom worked with HardOCP, Anand and a few others to agree there was an error that would make the CPU look good to one reviewer and bad to another. Communication should be involved, like: (Tom to Thresh) Hey Thresh, I was looking through your review and noticed your numbers don't match mine. Based on the test setup and software we should be pretty close to each other. My numbers came out with Intel on top and yours (as well as other sites') have AMD on top. Let's work together and see what caused the differences.
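
The kind of cross-site check described above could be sketched roughly like this in Python. It is only an illustration: the scores are the ones TheRod posted earlier in this thread, the 2% tolerance is taken from the "1-2% difference" mentioned above, and the names `cross_site_spread` and `inconsistent` are hypothetical. It compares each CPU's score on one site against the same CPU's score on the other site and says nothing about why they differ (boards, drivers and settings were not identical).

```python
# Scores from TheRod's table earlier in the thread: site -> (AMD score, P4 score).
SCORES = {
    "3DMark2001 SE":     {"X-Bit Labs": (16334, 16323), "THG": (16098, 16772)},
    "3DMark03":          {"X-Bit Labs": (4802, 4860),   "THG": (4798, 4885)},
    "3DMark03 CPU":      {"X-Bit Labs": (704, 714),     "THG": (696, 758)},
    "PCMark2002 CPU":    {"X-Bit Labs": (6872, 7447),   "THG": (6858, 7446)},
    "PCMark2002 Memory": {"X-Bit Labs": (6475, 8948),   "THG": (6503, 10062)},
}


def cross_site_spread(a: float, b: float) -> float:
    """Percent difference between two sites' scores for one CPU, relative to their mean."""
    return abs(a - b) / ((a + b) / 2.0) * 100.0


def inconsistent(scores: dict, tolerance_pct: float = 2.0) -> list:
    """List (benchmark, cpu, spread) entries where the two sites disagree by more than the tolerance."""
    flagged = []
    for bench, sites in scores.items():
        (xbit_amd, xbit_p4), (thg_amd, thg_p4) = sites["X-Bit Labs"], sites["THG"]
        for cpu, a, b in (("AMD", xbit_amd, thg_amd), ("P4", xbit_p4, thg_p4)):
            spread = cross_site_spread(a, b)
            if spread > tolerance_pct:
                flagged.append((bench, cpu, round(spread, 2)))
    return flagged


if __name__ == "__main__":
    for bench, cpu, spread in inconsistent(SCORES):
        print(f"{bench}: {cpu} scores differ by {spread}% between the two sites")
```

With these five benchmarks and a 2% tolerance, the sketch flags only P4 rows (3DMark2001 SE, 3DMark03 CPU and PCMark2002 Memory), while the Athlon scores from both sites stay within the tolerance. That is a narrow sample and it does not cover the UT2003 or WinRAR numbers discussed in this thread.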
 

goofee

Distinguished
Jul 17, 2002
14
0
18,510
While software optimization is definitely important, the use of benchmarks bought or influenced by Intel is not. BAPCo even admitted they optimized for the vast majority of CPUs (meaning not for the sake of benchmarking and providing a good comparison).
AMD has quite a few programs optimized for it, like FlaskMPEG and games. Obviously, UT2K3 is one of them, but mysteriously on THG it doesn't show.
So while the choice of benchmarks may be questionable, I believe the results shouldn't be.
 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
"Let's work together and see what caused the differences."
Except that THG already knows what caused the differences. Try actually reading through this thread. I already posted above what caused those differences. So far THG is the <i>only</i> review site that I've seen to even admit to not skewing the benchmarks like AMD wanted them to.

(And if anyone else has seen another review that refused to play by AMD's shady games, I'd love to see it please.)

 

skyvader

Distinguished
May 13, 2003
5
0
18,510
Show me a review that "does" what AMD is requesting. So far all sites with reviews simply state the test setup and apps but make no mention of tweaking the apps to favor or hinder one or the other.
 

goofee

Distinguished
Jul 17, 2002
14
0
18,510
Well, obviously you didn't read the replies given: SysMark deliberately overwrites a Windows system DLL and thereby disables SSE support that was previously provided by Microsoft. So restoring the DLL should only be fair, since Intel gets all its optimizations and more, too.

This is what Sudhian writes:
"Both of the above tests use Windows Media Encoder. When WME is installed to a WindowsXP system it overwrites a newer, XP-created DLL with an older version of the same file. The problem is as follows: The newer version of the DLL that is installed by WindowsXP properly detects and implements the AthlonXP's support for SSE. The older version of the DLL that the benchmark installs does not."

If you care to read more about this then take a look here:
http://www.sudhian.com/showdocs.cfm?aid=379&pid=1351

The difference is not a lot so I don't see why THG didn't include it or at least care to compare as their P4 values seem a little too over-optimized, anyway.
 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
"Show me a review that 'does' what AMD is requesting. So far all sites with reviews simply state the test setup and apps but make no mention of tweaking the apps to better or hinder one or the other."
Show me a review site other than THG that <i>doesn't</i> do what AMD is requesting. Any unbiased professional who isn't afraid of losing their support from AMD would have been outraged by the instructions that AMD sent to <i>all</i> of their reviewers. Since no one is even so much as mentioning it, I find it very difficult to believe that anyone is <i>not</i> just following it blindly.

 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
"Well, obviously you didn't read the replies given"
Actually, no. Obviously you didn't provide enough information the first time to even evaluate the validity of your post not to mention formulate any sort of a reasonable reply. Now that you <i>have</i> finally included actually useable information I can reply.

"SysMark deliberately overwrites a Windows system DLL"
I'm not convinced of any 'deliberate' overwriting. Accidental packaging perhaps, but proof of deliberate overwriting does not exist. And even then an accidental overwriting seems awfully unlikely, though marginally possible.

However, if AMD found such a flaw they should <b>not</b> be instructing people to overwrite the file with a version that they provide. They should be contacting the benchmark writers to get the installation fixed and in the meantime be suggesting that people restore the correct file from the WinXP distribution, <b>not</b> with a conveniently provided dll from AMD.

"and thereby disabling SSE support that was previously provided for by Microsoft."
I have a <i>very</i> hard time believing this. SSE support has been in the Win2K kernel since the beginning. WinXP was built on the same technology as Win2K. Therefore <i>any</i> version of the dll from WinXP should have support for SSE.

"So restoring the DLL should only be fair since Intel gets all their optimizations and more, too."
Again, see above. Frankly the whole reasoning just makes absolutely no sense. Even <i>if</i> it was indeed a WinXP dll that was overwritten by the install, it would still have SSE support in it and AMD should be making it publicly known for the benchmark to be fixed and telling people how to restore the correct file provided by Microsoft, not by AMD.

"The difference is not a lot so I don't see why THG didn't include it or at least care to compare as their P4 values seem a little too over-optimized, anyway."
Because the whole thing sounds incredibly shady for starters.

If it were me, I would have verified the truth by checking the files involved, and if the installer was indeed overwriting a file from Windows, then I would have made sure to get Microsoft's version of the file, not AMD's version. And then I'd have sent a scathing letter to both the benchmark company and to AMD for both being idiots. (The benchmark company for not testing and not catching this, and AMD for not handling it properly and making it all sound so shady.)

"The difference is not a lot so I don't see why THG didn't include it or at least care to compare as their P4 values seem a little too over-optimized, anyway."
There were a number of other tweaks that AMD was also telling reviewers to do that THG refused to do. So even <i>if</i> AMD is being legit on this whole dll issue, that still leaves a noticeable amount of biasing that AMD has to explain themselves for. If you think that THG's results are P4-biased, think again. The difference between THG's results and everyone else's is that THG wasn't Athlon-biased. (Which may actually be a first for THG in my opinion.)

 
