[H] defending real gameplay vs benchmarks

February 11, 2008 9:43:47 AM

Many of you may know that Kyle and Brent over at [H] have a different way of testing video cards compared to most review sites. Also, they are quite vocal about defending their methods. Well it seems Crysis and the 3870x2 launch have caused them to explore this further to back up their views and reviews. Have a look:

http://enthusiast.hardocp.com/article.html?art=MTQ2MSwx...
February 11, 2008 9:57:11 AM

I understand their methods, and at the same time I don't understand them. How can I, when I can't reproduce them myself? Throughout any given game, certain demands will be applied to varying degrees, which will favor one card over another, so this in itself undermines their idea. My edit is to clarify: different points in any game demand different things from the GPU. At one point a card that is strong at, say, shaders will do well, as opposed to a different point in the same game where shaders aren't in such demand. Hope this clears it up somewhat.
February 11, 2008 10:26:04 AM

One final note. They ([H]) have gone to great lengths to defend their testing, flawed though it is. If they insist on defending these methods, that's fine, but maybe next time use something other than a beta driver, or take out the lows (like the 0.7 fps) in the article that make the card look bad along with the methods used. Oh, and use the LATEST drivers, even if they're beta as well.
February 11, 2008 11:02:11 AM

I am glad they do things differently. I mean, it's just another tool in the consumers' tool box, and I am sure that difference is what keeps the site hits coming.
February 11, 2008 11:02:19 AM

they have a good point.
February 11, 2008 11:14:01 AM

One thing: if we start going in this direction, then it comes down to how each person plays as well. Does he run through? Scope out areas? Avoid certain conflicts? "To each his own" was never a better fit. And wouldn't each one reach a different conclusion? I like the same demands and the same play; then I'll know what is what. If one uses metric, another miles instead of kilometers, and yet another something different, how can we measure anything? And how can I reproduce these things? Should I just trust them? To play as I play? To go where I go in the game? There are too many holes in this, with no tools to measure with and no way of reproducing it, which leaves me with no comparisons. There has to be a better way than what they're doing. I also wonder why they didn't use the newest drivers from ATI for the 3870X2. Was it because it was way too demanding to start over? They had them, and only used them in their Crysis benches. Shoddy, incomplete, and not reproducible. I'm glad we don't use that as a standard in any scientific research.
February 11, 2008 11:41:22 AM

^good point.

They said they play the game, but they don't follow a single standard. If they pick a scene that needs lots of rendering, that choice is very subjective and varies between games.

Following a "canned" benchmark demo helps set a standard, which in turn reflects what the compared GPUs can handle.

I think their problem is in selecting the time frame over which they study the frame rates, since that is very subjective.
February 11, 2008 12:33:26 PM

Their whole point about optimizations for canned benchmarks is kind of moot... considering you can turn the optimizations off very easily.
February 11, 2008 3:13:07 PM

I like the way HardOCP benchmarks video cards. And, I would like to know how a card really performs while playing a game. If it is going to get into the unplayable range, I would like to know this. Keep it up IMO.
February 11, 2008 3:25:02 PM

It's HardOCP's opinion that determines what they think is playable. No one else's. It is nothing but their opinion.

This kind of test does show something, but not as much as running apples-to-apples tests.
February 11, 2008 4:22:29 PM

I think I will be looking back in there now and again for reviews. It is a different twist; granted it is not 'scientific', and their results will most likely change every time they test a card/game, but it is more information than the same ol'.

If I were buying an expensive card, having the extra info might be something to consider, especially if I were buying specifically for a game like Crysis… Great, the card owns in 3DMark, but how does it do in the only app I care about?

At the end of the day, the [H] results would have to be taken as subjective.
February 11, 2008 4:30:08 PM

I honestly despise HardOCP's gaming benchmarks. It's a perfect example of how not to benchmark. Why compare cards running at different settings with all this "max playable" and "lowest playable"? It takes what, 5-15 minutes to figure out how well your own system can play a game and fine-tune it yourself. But a benchmark is supposed to compare apples to apples, and instead they run it as apples to oranges. It's pointless; the point is to compare different hardware so we get a better understanding, not to force us to compensate for the different settings. We all know every architecture has strengths and flaws, some being better at AA than others (ahem, R6xx). This is honestly a flawed review system that should be dropped. But we know that ain't gonna happen ^_^.
February 11, 2008 4:56:10 PM

I really don't understand why people oppose HardOCP's gpu benchmarking methodology. C'mon even the EPA recognized that using the estimated miles per gallon formula was misleading and updated their mileage estimates to include real-world driving tests.

If the friggin government can recognize the value of performing real-world tests, what's stopping a bunch of hardware geeks from doing the same? Given that it was proven nVidia and ATI tweaked drivers to perform better than real-world when running benchmarks, why wouldn't hardware geeks want a real-life comparison?

At the very least, HardOCP's methodology offers an expert opinion on the quality of gameplay an everyday gamer can expect. Kudos to HardOCP!
February 11, 2008 5:04:17 PM

What makes you think HardOCP's opinions are the same as yours? They aren't. What they deem playable for some might not be playable for others. It might even be overkill for some.

Hardocp should put up regular canned benches as well as what they think.
February 11, 2008 5:10:40 PM

It would be nice if they did the following:

Crysis
1) The top 10 video cards on the same Intel machine running exact same settings -->maxed everything
2) The top 10 video cards on the same AMD machine running exact same settings -->maxed everything

Rinse, repeat with the top 100 games :) 
February 11, 2008 5:15:54 PM

chunkymonster: You really can't compare what you just stated, it's apples and oranges all over again! =P.

When it comes to benchmarks, when identical settings/setups are used you can actually tell the difference between the cards. This isn't the skewed MPG system used with automobiles; we have actual proof on hand. HardOCP's methodology for benchmarking GPUs does exactly what you claim they are trying not to do: it skews the benchmarks and makes it harder for the consumer/public to compare products.

VERY bad benchmarking.
February 11, 2008 6:23:45 PM

First of all, I like this idea of evaluating performance for particular games.
I think most everyone is aware that drivers/hardware are often optimized to take advantage of benchmarks. Although I wish they would just do level run-throughs at high/med/low resolutions/settings and do away with the "playable" thing. The other issue is that the reader is forced to assume there is no company bias, i.e. that on one card they don't run through the level looking at the ground as much as possible while on the other card they stare directly at explosions/effects throughout the level. Now, if they want to get extensive: test two cards, run through each level 3-4 times, and provide min/max/avg for each run-through for each card, then an overall for each level for each card, then an overall for the entire game. Of course by unbiased testers. I think that would paint a pretty accurate picture. For those of you defending synthetics: they can be easily manipulated, the equivalent of me getting a copy of an upcoming exam beforehand and scoring well on the test. My results on that test would not reflect my real-world knowledge of the topic. Had the exam changed prior to my taking it, a much more realistic portrayal of my knowledge would have been recorded.

I mean, it's up to you: if card A scored better on a synthetic benchmark than card B, and card B performed better in rigorous real-world application than card A, I guess it is up to you which you would like better, good on paper or good in application.

I would prefer repeated real-world testing (i.e. running the same level multiple times on each card and recording the values for each run).
I mean, come on, how can you argue against real-world testing? If I run a game's benchmark on a card and get a low of 30 fps, a high of 80, and an average of 50, then when I actually play the game I get a low of 15, a high of 50, and an average of 30, that's a little misleading, no? Especially if a card that benchmarked lower actually does better in the real world.

Not even the benchmarks are 100 percent accurate every time; run a benchmark 3 times in a row and tell me if you get the same score each time.

Barring somebody purposely influencing the real-world test (i.e. exploiting high-framerate situations by looking at the ground/sky etc. for an extended period of time), I think you are going to get pretty accurate information.

Also, in my opinion, why isn't this scientific? Say you are at a pool table and hit the cue ball into the 8 ball 3 times, each time from the same distance, with the same force, etc., and record the direction, speed, and distance the 8 ball travels before it stops. Then you use a computer physics simulation to do the same thing. How is that not scientific? It's the same situation as comparing real-world game performance to a synthetic benchmark. Of course the real-world application won't be perfect every time, but the REAL WORLD ISN'T PERFECT, and what works in a simulation may prove not as good in actual application. The key is repetition, and using averages. If the reviewers played through a level 10 times and handed you an average low, an average high, and a general average over all ten runs, would you still not trust it over the synthetic?!

Sorry for the long rant, I just cannot understand why anyone would prefer a synthetic benchmark to rigorous real world testing.

Again, I am not a big fan of the way they do it, but I would prefer real world benchmarks to synthetics.
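
For what it's worth, the "run the same level several times and report min/avg/max per run plus an overall average" idea is easy to automate once you have frame-time logs. Here is a minimal Python sketch; the file names and the one-frame-duration-in-milliseconds-per-line format are assumptions for illustration (roughly what a FRAPS frametimes log contains), not anyone's actual tooling:

# Minimal sketch: summarize several manual run-throughs of the same level.
# Assumes one frame-times file per run, with one frame duration in ms per line.
# File names below are invented placeholders.
from statistics import mean

RUN_LOGS = ["run1_frametimes.csv", "run2_frametimes.csv", "run3_frametimes.csv"]

def load_fps(path):
    """Convert raw frame times (ms) into instantaneous FPS values."""
    with open(path) as f:
        frame_ms = [float(line) for line in f if line.strip()]
    return [1000.0 / ms for ms in frame_ms if ms > 0]

per_run = []
for path in RUN_LOGS:
    fps = load_fps(path)
    stats = (min(fps), mean(fps), max(fps))
    per_run.append(stats)
    print(f"{path}: min {stats[0]:.1f}  avg {stats[1]:.1f}  max {stats[2]:.1f}")

# The overall picture across all runs of the same level
print(f"overall min {min(r[0] for r in per_run):.1f}")
print(f"overall avg {mean(r[1] for r in per_run):.1f}")
print(f"overall max {max(r[2] for r in per_run):.1f}")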
February 11, 2008 6:59:09 PM

tsd16 said:
Also, in my opinion, why isn't this scientific? Say you are at a pool table and hit the cue ball into the 8 ball 3 times, each time from the same distance, with the same force, etc., and record the direction, speed, and distance the 8 ball travels before it stops. Then you use a computer physics simulation to do the same thing. How is that not scientific?


It is not scientific because they can't reproduce the same result three times in a row; granted, they will get close, but they will not have scientific answers.
I for one would be happy with the average of three, as long as we can see all three results and they are close.

**edit**
There are too many variables in this type of testing to be scientific

February 11, 2008 7:43:16 PM

They should add a confidence level, like stating whether the results are good within 3 fps. They should also do the test more than 3 times, maybe 5 times.
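
A plain confidence interval over repeated runs would make that "good within 3 fps" statement concrete. A minimal sketch, assuming five invented average-FPS results for the same card and scene:

# Minimal sketch: turn 5 repeated run averages into "X +/- Y fps".
# The five numbers are invented; substitute real per-run averages.
import math
from statistics import mean, stdev

runs_fps = [41.2, 39.8, 42.5, 40.6, 41.9]     # avg FPS from 5 play-throughs

n = len(runs_fps)
m = mean(runs_fps)
s = stdev(runs_fps)                           # sample standard deviation
t_crit = 2.776                                # two-sided 95% t value for n - 1 = 4
margin = t_crit * s / math.sqrt(n)

print(f"{m:.1f} fps +/- {margin:.1f} fps at 95% confidence")
# If the margin comes out under ~3 fps, the run-to-run noise is small enough to
# quote the average; if not, more runs are needed.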
February 11, 2008 8:19:10 PM

For me it's all about looking at the bottom of the game box for the requirements. That's right, I'm old school.
February 11, 2008 8:22:32 PM

@Evilonigiri:
Based on their arrogance, that confidence level would always be high... Add that to their regular bias towards NV (I personally am brand-agnostic and buy whichever is on top at the time of purchase) and the fact that the increased margin of error is never acknowledged... and with all of that you get a moderately useful opinion piece...

...granted, most review sites anymore are full of bias and cloudy recommendations so smart tech-shoppers read as many as possible and take the mean/avg and go with their budget on that.

Just my 2 bits
February 11, 2008 8:31:06 PM

sojrner said:

...granted, most review sites anymore are full of bias and cloudy recommendations so smart tech-shoppers read as many as possible and take the mean/avg and go with their budget on that.

Just my 2 bits


Indeed. I always make sure to read around 4-5 reviews of the same product, or more, before I decide which route to take. The last thing I want is someone's biased opinion affecting my decision... I never go for biased fanboyism. Whoever has the top performer gets my dollar. It also depends on budget, and whether OC'ing is a factor. Personally, it's pointless to go AMD right now if you OC: an E2160 OC'd to 3.2 GHz and you're set. If OC'ing isn't an issue, though, the X2s are good candidates.

I can't even fathom how some people can be closed-minded enough to swear by only one product, even when the facts right in front of them prove they can get better products for the money. :pfff: 
February 11, 2008 9:08:30 PM

Kamrooz said:
Indeed. I always make sure to read around 4-5 reviews of the same product, or more, before I decide which route to take. The last thing I want is someone's biased opinion affecting my decision... I never go for biased fanboyism. Whoever has the top performer gets my dollar. It also depends on budget, and whether OC'ing is a factor. Personally, it's pointless to go AMD right now if you OC: an E2160 OC'd to 3.2 GHz and you're set. If OC'ing isn't an issue, though, the X2s are good candidates.

I can't even fathom how some people can be closed-minded enough to swear by only one product, even when the facts right in front of them prove they can get better products for the money. :pfff: 

I'll agree with you that many ignorant people have a strong brand preference, always believing that what they're getting is the best.

However, there are also many cases where a preference for one company has resulted from a bad experience with another. Say you had a terrible experience with Intel, but a great experience with AMD. Odds are you're never going to buy from Intel, at least not for a long while.
February 11, 2008 9:15:22 PM

That's true. I've had the same situation with motherboards, but comparing it to CPUs is a bit of a stretch, don't you think? I've had issues with many mobos in the past... but Asus and Gigabyte have been good to me, never any problems, so I stick with 'em. That fits your example perfectly... but when it comes to the performance of a processor? When one is completely dominant over the other? OCs EXTREMELY well even on the lowest-budget processor, pretty much capable of 100% overclocks? Along with a clock-for-clock performance advantage on a product that has been out for a couple of years compared to the competitor's new release?

Don't get me wrong, if someone likes and prefers AMD, more power to them, but what I can't fathom are fanboys giving advice that is clearly biased, recommending an inferior solution that doesn't give the best performance for the money. Giving false information to someone asking for help? I've seen it many times over... and it just boils my blood. That's one reason why I always type lengthy, helpful posts, explaining the strengths and weaknesses, market analysis for future products, and the best route. If you want to be biased towards a product, that's cool. You like Nvidia? Stay with it. You like AMD? More power to you... but when they try to force their opinions on others over something that's more than just motherboard/graphics card partner preference, it crosses the line. I bet you'll agree with me on that one =P.
February 11, 2008 9:28:05 PM

Yes, I agree with you 100%. Those AMD fanboys are the people I classified as "ignorant". Using CPUs as an example was a little extreme; however, there are people like that out there. Also, there are really only two choices when it comes to CPUs, unlike motherboards.

Oh well, they say ignorance is bliss, so they might as well stay ignorant.
February 11, 2008 9:50:37 PM

The main thing for me is that despite errors and issues within [H]'s own reviews, instead of looking to improve on them, they look first to attack the criticism or the other methods (what did Anand do to draw their ire personally?), and only later, if it becomes blatantly obvious to everyone else, do they even start to consider an option that is not of their own making. When it was pointed out there might be IQ problems with some drivers that they may have missed, the problem became that people focused on 'such a small issue', not that they missed it, and they continued to use beta drivers with 'they are just beta drivers' as the excuse.

I like to use both [H]'s benchies and other sites, because they never match up. Which one is correct is all a point of view; [H] is subjective (no question, it's the style they chose) and strictly valid only for the setup they have and the runs they ran. I'm not sure how that gives more information to someone not using a setup similar to theirs, nor to someone who isn't playing the small handful of games they chose at the settings they prefer (Oblivion with little/no grass, but shadows on everything including grass?).

I find that instead of moving the art of benchmarking forward, this article is more of a defense of their methods and an attempt to condemn all those who don't follow this narrow path.

If someone were looking for a be-all and end-all single source of information, it would need to have more information and be more consistent than their current reviews, which pretend to be more than they are.

If anyone can explain to me why their histograms don't match their min/avg/max numbers in the 'settings', that would also help a lot. [:wr2:3]

Personally, I prefer looking at the histograms of the apples-to-apples tests because they tell me more than subjective op-onions; however, if the histograms too are questionable, then how can you validate what is essentially a 'look & feel' review rather than something reproducible?
February 11, 2008 9:54:22 PM

sojrner said:

...granted, most review sites anymore are full of bias and cloudy recommendations so smart tech-shoppers read as many as possible and take the mean/avg and go with their budget on that.


Couldn't agree with you more. [:thegreatgrapeape:6]

I don't care if they use voodoo and the number of dead chickens a card can kill as their benchmark; as long as I have a large selection of people testing, it should give me an overall view of whatever I'm looking for.

To think that any one review/method tells the whole story, is to be ignorant of the complexities of this hardware. :pfff: 
February 11, 2008 10:44:57 PM

Wow; been gone all day...this thread really took off. Thx for all the input.

@TGGA - I think [H] lashed out at Anand because readers kept throwing Anand's 3870X2 links at Kyle in the forum over their HD3870X2 review. ;)  It really does seem like this article is just their attempt to defend their testing methods, no matter what. I agree with you: unless Anand made some comments I missed, it seems low to pick on them like that. Anyway, this [H] piece seemed worth reading though.

I do like having real-world gameplay results available to add to other benchmarks when forming an overall opinion. But yeah, I wish they did real-world apples-to-apples tests at various resolutions, instead of deciding one card needs medium this, no grass, etc. Take FiringSquad's settings (typically max and 3 resolutions, sometimes with and without FSAA) and take the time to "real world" bench all of them. That would be a lot more useful.

I've noticed in my own FRAPS benching that the min very often doesn't match the per-second FPS values that get put in the histogram. It seems the min/max/avg usually reports a lower minimum than gets logged each second. I like the histogram myself: if both cards average the same and hit 15 fps as a min, but card A drops into the teens a lot (bigger fluctuation high/low) while card B only does it once, obviously B offers better gameplay.
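
Those per-second dips can be pulled straight out of a per-second FPS log and compared between cards. A minimal sketch, where the file names, the log format (one FPS value per line), and the 20 fps threshold are all assumptions for illustration:

# Minimal sketch: compare how often two cards dip below a playability threshold,
# using per-second FPS logs with one FPS value per line (FRAPS "FPS" log style).
from collections import Counter

THRESHOLD = 20   # assumed playability floor, in FPS

def summarize(path):
    with open(path) as f:
        fps = [float(x) for x in f if x.strip()]
    dips = sum(1 for v in fps if v < THRESHOLD)
    bins = Counter((int(v) // 10) * 10 for v in fps)   # crude 10-fps histogram
    return dips, dict(sorted(bins.items()))

for card, log in [("card A", "cardA_fps.csv"), ("card B", "cardB_fps.csv")]:
    dips, histogram = summarize(log)
    print(f"{card}: {dips} seconds under {THRESHOLD} fps, histogram {histogram}")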
February 11, 2008 10:51:45 PM

I just think Kyle and his crew smoke way too much weed. :lol: 
February 11, 2008 11:22:08 PM

pauldh said:

I do like having real-world gameplay results available to add to other benchmarks when forming an overall opinion. But yeah, I wish they did real-world apples-to-apples tests at various resolutions, instead of deciding one card needs medium this, no grass, etc. Take FiringSquad's settings (typically max and 3 resolutions, sometimes with and without FSAA) and take the time to "real world" bench all of them. That would be a lot more useful.


Yeah, in my comment in the thread I mention that one of my biggest issues (other than questionable internal validity) is the idea that there is more benefit in seeking out or creating differences between the two, because if there is equality then there is only 1 dataset, while a review like Xbit Labs or FiringSquad may have 6-9+ for that one game. And how do you decide that 1920x1200 without AA is better than 1280x1024 with 4-8X AA? That's information you lose in a 'these are our favourite settings' benchmark/review. I preferred Beyond3D's testing that was restricted to a single IHV, no Red vs Green; it was just about the hardware, and the main comparison was last generation vs current generation, or high end vs mid-range, so much more questionable stuff was removed, including questionable beta drivers and floptimizations.

Quote:
I've noticed in my own FRAPS benching that the min very often doesn't match the per-second FPS values that get put in the histogram. It seems the min/max/avg usually reports a lower minimum than gets logged each second. I like the histogram myself: if both cards average the same and hit 15 fps as a min, but card A drops into the teens a lot (bigger fluctuation high/low) while card B only does it once, obviously B offers better gameplay.


Yep, and also if you look at some of the apples-to-apples tests, I think you'd have a tough time convincing me that either card is significantly better at full settings, yet one 'required' being brought from high to medium shader quality? That implies that one is totally unusable at that setting while the other is the only one playable, which is not really borne out by the dataset provided.

Overall, I relied more on [H] in the past; now I treat them like any other 2nd-tier information source, because I don't think they provide enough information to get a good picture of limitations and such. I rarely get much from one review, but if one review has a lot of data I may get an idea of some areas that may benefit or restrict a design, and then I look to more reviews to see whether they give credence to this view.

FiringSquad, TechReport, and Xbit are usually the ones that give me this large, granular snapshot from which I can pull some insight; Digit-Life used to do that more so than now. I can also get a lot of background and tech information both here at Tom's and at Beyond3D or 3DCenter, and after that I simply start adding to the information by looking for anomalies at places like [H], Digit-Life, Bit-Tech, Hexus, EliteBastards, Anand (rarely though), and others.

Anywhoo, I think this is more about [H] defending themselves, but I also think the way they're doing it, by attacking everyone else and not seeing value in their tests, just hurts them a little more in the process.
February 12, 2008 3:00:36 AM

They show themselves to be Nvidia fanboys on the last page. Neither the 8800GTS nor the 3870X2 can run Crysis at those settings. The 8800GTX comes out "on top" when settings are at medium, but others pointed out driver issues.

Besides, Crysis is an FPS, and I bought my 3870X2 based on The Witcher. That was not a canned demo used by Anandtech. They showed that a 3870X2 got 47 fps versus 30 for a 3870. The 3870X2 also beat the 8800GTX in that game:

http://www.anandtech.com/video/showdoc.aspx?i=3209&p=10

Only the 8800GT in SLI, at the highest resolution tested, beat the 3870X2. Since I'll be playing at around 1600x1200 when I get a 20" LCD, that's fine by me.

Crysis favors Nvidia, especially if Nvidia's optimizations that hurt Crysis image quality are taken into account. Other games favor the 3870X2. I chose to buy it based on a real-world FRAPS playthrough. I doubt that the people at [H] can test the 3870X2 without bias.

Supporting an FX 5800 on Screen Savers! Very funny bit.

I miss that show. There used to be Tech TV, now that channel's Dreck TV.

This Friday, I order an Antec Neo 650 with a 6+2 PCIe cable. My Neo 550 does not have it and the factory overclocked 3870x2 I have needs it. So, I'll get to FRAPS benchmark it at 1280 x 1024 on a 17" Viewsonic A71f. Can't get that 20" LCD until next month.

Will I be CPU limited at my CRT's resolution with an X2 4600+ Windsor? That is the question.
February 12, 2008 7:43:07 AM

My problem is, you get 2 people using this method and you'll have 2 different sets of results. THAT is a no-brainer. You simply CANNOT have apples to apples using this, much like a 250-pound man hitting an 8 ball vs a 5-year-old: coordination, height, strength, etc. are all different. We need to ask a scientist about these methods to see if they're credible. I think not, IMHO.
February 12, 2008 8:41:26 AM

As far as benchmarks go, I like to see the 3DMark ones (just out of curiosity) and definitely like to see real-world ones to back up the synthetics. I don't always believe all the benchmarks from one place, but I do end up looking at several places to verify or get an opinion about a product.
I'm usually more interested in the mainstream products, since I can't afford the top-end stuff. Here is what I'd like to see in reviews of GPUs:

Mobo, Memory, Case, PSU, CPU HSF, and HD be all the same for every test.
Test #1
I'd like to see the CPU changed out for every test, but don't change the GPU. Use the high end GPU (3870 x2 or 8800gts (g92)) and complete all tests.

Test #2
I'd like to see the GPU changed out (2600xt/3650/3850/3850 512mb/3870/8800gt/8800gt 256mb/8800gts (g92)/8800gtx/8800gtx Ultra) for every test, but don't change the CPU out (using an e2140 or AMD x2 3800).

Then perform the test above (#2), but with a mainstream CPU (E6750/X2 5000 BE). Follow this up with the same test, but with a high-end CPU, etc.... You see where I'm going with this.

This of course would be nice to have, but is probably not financially feasible; it would definitely let the consumer know what kind of performance gains/losses you would have in a certain situation, though. Most of these review sites use $1k CPUs and test out a 2600XT or some other low-end GPU, which no one with the money to buy a $1k CPU would do. This just doesn't make sense. I know they do this so you can see the maximum FPS that you would probably get with the given GPU, but c'mon!
February 13, 2008 9:21:07 AM

lunyone said:
As far as benchmarks go, I like to see the 3DMark ones (just out of curiosity) and definitely like to see real-world ones to back up the synthetics. I don't always believe all the benchmarks from one place, but I do end up looking at several places to verify or get an opinion about a product.


3DMark will get more interesting when a game is developed by Futuremark. Otherwise, it's eye candy that tests GPUs, but not under real-world conditions. My midrange GPUs all did better in games than in 3DMark. I'll get to see if I'm CPU limited next week when I set up the 3870X2.

lunyone said:

I'm usually more interested in the mainstream products, since I can't afford the top end stuff. Here is what I'd like to see in reviews of GPU's:

Mobo, Memory, Case, PSU, CPU HSF, and HD be all the same for every test.
Test #1
I'd like to see the CPU changed out for every test, but don't change the GPU. Use the high end GPU (3870 x2 or 8800gts (g92)) and complete all tests.



What I find amusing are all the graphics card tests with very high end processors. I'd love to know which card ends up being CPU limited by which processor. We know that a Pentium C2D should limit an 8800gts, but where does that limitation stop on the Intel side? Where does it stop on the AMD side?

So, I like the idea of testing each GPU for review with a value processor, a mainstream processor and an enthusiast processor from each company. That would give a fairer real world comparison. Of course, most enthusiasts would say that no one buys a high end GPU with a low end system, but it happens. Plus, there's that great middle not represented in the reviews at all.
February 13, 2008 10:28:07 AM

yipsl said:
3DMark will get more interesting when a game is developed by Futuremark. Otherwise, it's eye candy that tests GPUs, but not under real-world conditions. My midrange GPUs all did better in games than in 3DMark. I'll get to see if I'm CPU limited next week when I set up the 3870X2.

What I find amusing are all the graphics card tests with very high end processors. I'd love to know which card ends up being CPU limited by which processor. We know that a Pentium C2D should limit an 8800gts, but where does that limitation stop on the Intel side? Where does it stop on the AMD side?

So, I like the idea of testing each GPU for review with a value processor, a mainstream processor and an enthusiast processor from each company. That would give a fairer real world comparison. Of course, most enthusiasts would say that no one buys a high end GPU with a low end system, but it happens. Plus, there's that great middle not represented in the reviews at all.

Exactly what I would like to see: low/mid/high-end CPUs with a high-end GPU. Then one could figure out whether an E2140 or an E6550 would be all you need to play your particular game. But we know that none of these review sites would do this, since they are paid to push the latest technology. It would be nice to see, but I'm just dreaming here.
February 13, 2008 11:16:55 AM

It's been a while, but the last time I've seen this done was with the 939s, using an FX-60. All they did was lower or raise the multiplier and use different cards. At the time, 2.2 GHz was considered the "bottleneck" point for the 1900XTX. I'd like to see this again, at least once for each CPU/GPU generation. The 1900XTX was considered the top card at the time, though they also tested lower and the prior generation of cards as well.
February 13, 2008 11:37:15 AM

Unless the results are reproducible by a 3rd party, I would consider them false.

Real-world gameplay is WAY too subjective. As an example... what if one time 5 frags go off with explosions and another time 10 go off? This is why we have canned GPU benchmarks.

Of COURSE it's not going to mimic real world framerates. Does the Crysis Bench have 20 people shooting at you while 15 explosions are going off?

However, if you take two video cards and one gets 30 FPS in the benchmark and the other gets 20 FPS, I would venture to say the former is stronger.

This *IS*, however, why you don't just use a single benchmark. You use multiple ones from several games, include 3DMark, and take an average.

Unless [H]'s results are 100% repeatable (I'd say within 5-10%, and that's lenient) on the same system by a 3rd party, they can be regarded as false. Unless the timedemos are exactly the same and are run with both cards on the exact same system, they are false.

Apples to apples... it means reproducible results. In general, most sites did find the 3870X2 (the card that started all this controversy) to be better in most situations. Somehow [H] didn't... yet they are the only ones telling the "truth."

Edit: I'd also like to add that they should run a minimum of 12 benches and throw out the highest and lowest. It is difficult though, as several options exist.
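
Throwing out the highest and lowest runs before averaging is just a trimmed mean; a minimal sketch with 12 invented results:

# Minimal sketch: average 12 benchmark runs after discarding the best and worst.
# The FPS values are invented; substitute real run results.
runs = [44.1, 43.8, 45.0, 42.9, 51.2, 43.5, 44.4, 38.7, 44.0, 43.9, 44.6, 43.2]

trimmed = sorted(runs)[1:-1]    # drop the single lowest and highest run
print("raw average    :", round(sum(runs) / len(runs), 2))
print("trimmed average:", round(sum(trimmed) / len(trimmed), 2))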
February 13, 2008 1:06:53 PM

Any site using Crysis as the defining benchmark is doing its readers a disservice. Crysis is still in beta as far as I'm concerned.
February 13, 2008 3:51:10 PM

I see no problem in how they test the cards, as long as the tests are consistent. [H] is a consumer's tool with which to judge the cards. They removed some of the games that they used to use because the test time was too long, which I think is wrong. More games in the test, since there are a lot using the same engine, would give us a more detailed view of what [H] is seeing. I guess my point is they need to use the top ten games to better define these cards; yes, it would take longer for them to test, but I think it's more fitting to their methods.
February 14, 2008 9:40:18 PM

chunkymonster said:
I really don't understand why people oppose HardOCP's gpu benchmarking methodology. C'mon even the EPA recognized that using the estimated miles per gallon formula was misleading and updated their mileage estimates to include real-world driving tests.


It's not that they test that way; it's that they have both the ATI and Nvidia cards on medium settings in Crysis on the next-to-last page, and then on the final page, where they reveal their bias against ATI, they show the frames per second the 3870X2 gets on high settings. It's not as if the 8800GTS can handle Crysis on high settings; maybe R770 and G100 will manage to, but not today's cards.

The difference between the two pages' settings is bad amateur journalism, pure and simple. Both the 3870X2 and the Nvidia 8800GTS do similarly on medium settings, with a slight nod to Nvidia. Also, the games they test are mostly, if not all, FPS. I'm not bothering to check all their titles because I don't have them bookmarked on this new install, but they leave out other genres where the difference is not in Nvidia's favor.

They just wish they were Anandtech. Perhaps they aren't getting the samples to bench that Anandtech gets. When a smaller site attempts to start a row with a major site, then motives must be questioned. I certainly do, and I'm not even posting at Anandtech all that much. I just read it, along with Tom's Hardware and Xbit Labs as my 3 favored benchmarking sites.
February 14, 2008 10:05:49 PM

JAYDEEJOHN said:
We need to ask a scientist about these methods to see if they're credible.


Since validation against hard results is out of the question, they need to produce error estimations for their results.


That would require a great number of runs (3 is not enough), and plotting of the deviation in FPS between the runs. If the variation is large, more runs are needed until a bell-curve can be extracted and the average calculated.

A method of ensuring consistency between runs is also needed, like youtube videos of the runs nearest the average.



[H]'s current method would be laughed out of a peer-review assessment if that helps.

February 14, 2008 10:46:54 PM

Actually, n = 3 can be enough if the difference between means is great enough and the standard error is sufficiently small. Power analysis can help determine the sample sizes required to achieve statistical significance (p < 0.05, i.e. the 95% confidence level). Obviously more runs or samples are better, because that reduces your error and thus increases your statistical power.

The problem with [H]ardOCP's testing methods is that there is way too much uncertainty in the real-time gaming environment, so Amiga500 is right and more samples (or runs) would be needed to determine any sort of statistical significance.
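
As a rough illustration of that point, here is a minimal sketch using SciPy; the per-run FPS numbers are invented. With only three noisy runs per card, even a ~4 fps gap may not reach significance, and a crude power calculation shows how many runs that gap would actually need:

# Minimal sketch: can n = 3 runs per card separate two cards statistically?
# The FPS values are invented; SciPy is assumed to be installed.
from statistics import mean, stdev
from scipy import stats

card_a = [28.0, 33.5, 30.1]   # average FPS per run, card A
card_b = [34.2, 31.8, 37.9]   # average FPS per run, card B

t_stat, p_value = stats.ttest_ind(card_a, card_b, equal_var=False)  # Welch's t-test
print(f"p-value = {p_value:.3f} (significant at the 95% level if p < 0.05)")

# Rough rule of thumb for a two-sample comparison:
# runs per card ~ 2 * ((z_(1-alpha/2) + z_(1-beta)) / d)^2, with d = gap / pooled SD.
pooled_sd = (stdev(card_a) + stdev(card_b)) / 2
d = abs(mean(card_a) - mean(card_b)) / pooled_sd
z = stats.norm.ppf
runs_needed = 2 * ((z(0.975) + z(0.80)) / d) ** 2
print(f"effect size d = {d:.2f}; about {runs_needed:.0f} runs per card for 80% power")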
February 14, 2008 11:06:01 PM

Quote:
They just wish they were Anandtech. Perhaps they aren't getting the samples to bench that Anandtech gets. When a smaller site attempts to start a row with a major site, then motives must be questioned.

[H]ardOCP is no small site by any means. Their forums and Folding team are huge. Matter of fact, in Folding they are #1 by far, so they obviously have quite the loyal crowd of readers. http://folding.extremeoverclocking.com/team_list.php?s=
February 15, 2008 12:24:54 AM

The reason the EPA can't do real-world testing is that it would have to be done outside. Let's take two cars, A and B.

Was Temperature exactly the same ?
Was Humidity exactly the same ?
Was Elevation exactly the same ?
Was Traffic exactly the same ?
Was Tire Pressure exactly the same ?
Was Road Surface Condition exactly the same ?
Was Driver's Mood exactly the same ?
Was the Tension in Driver's shoelaces exactly the same ?
Was Wind Direction exactly the same ?
Was Road Grade exactly the same ?

I could keep going :) 

So the tests are done on a dynamometer where every condition is exactly the same, and the answers will be exactly the same all the time. The EPA didn't change its methodology; it just changed the fudge factor that accounts for air resistance and other factors, so as to provide an "approximation" that better reflects usage in real-world conditions.

If ya want subjective reviews, go to PC Ragazine. My favorite was the comparison they once did between Microsoft Access and Lotus Approach... Approach got 11 "Excellent" ratings and 1 "Good" across 12 categories... Access got 7 Excellents / 5 Goods. They "tied" for Editors' Choice.

There's a more recent one where they reviewed PC protection suites. The ZoneAlarm product and the Norton product scored equally in most categories, but in 2 or 3 of them the ZoneAlarm product won by something like a 4-to-2 margin. The Norton product edged out one category by something like 4.5 to 4.0, but when you read the text of the article for that category, it said the ZoneAlarm product "performed the best". That's why subjective reviews can't be accepted: too often the winner is chosen before the test begins, and then the test is skewed to produce the desired result.



February 15, 2008 1:15:44 AM

I can't accept [H]'s review of the 3870 X2 because I don't get where they are coming from at all whatsoever.

First they say that a single 8800GT isn't playable at 1680x1050 with everything on high in Crysis. WTF? I have built 2 systems quite recently (one e6850 and one e8400) with EVGA Superclocked 8800GTs and both customers play Crysis on high @ 1680 and love it.

I've watched my buddy play through all of Crysis @ 1920x1200 on mixed high/very high with a single 8800 Ultra and a 6850 @ 3.3 GHz with 8 gigs of RAM on Vista 64. It ran great and was more than playable throughout the whole game. Now [H] would say even triple Ultras would be poor at this... WTF??? Can't I believe my own eyes?
February 15, 2008 1:41:18 AM

Scientific? Who CARES!

When I get a new game I do exactly what they do - find the best balance between FPS and quality.

Oh and FYI, this is scientific. It's called qualitative testing.
In the case of video cards it is VERY IMPORTANT.

I can understand wanting to compare performance with a standard, but would argue this method only tells part of the story. Admittedly, a mix of both graphs would be nice.


-Cheese


EDIT/PS: I won't defend their actual conclusion (necessarily), but I will defend their methods.