A Much Needed Benchmark, Just In Time
We're dialed in with benchmarks on the PC. Our CPU suite is a mix of single- and multi-threaded real-world apps, some of which exploit OpenCL acceleration, to represent as wide a range of workloads as possible. Our graphics testing involves the latest games, evaluated using uncompressed video captures of performance. The rigs we have set up are really quite cool. And the storage data we generate can get very low-level.
But the benchmarks we run on mobile devices are nowhere near as mature, and that includes GFXBench. As software developers push the boundaries of what they can do in these more locked-down operating environments, we're slowly seeing better tools to help us characterize the performance of smartphones and tablets across Android, iOS, Windows RT, and Windows 8.1. What we're finding along the way is that, as in the early days of PC benchmarking, a lot of companies are intentionally gaming tests for better results.
GFXBench 3.0 goes the furthest of any mobile-oriented metric we've used thus far to help isolate graphics performance, graphics quality, and battery life in graphics workloads. The ability to compare all three vectors in one story makes the decision to increase clock rates or alter output fidelity tougher for manufacturers to make. Of course, GFXBench 3.0 isn't foolproof. There's nothing stopping those same companies from pushing higher numbers in other titles and then playing it more conservatively in this test, where they know they'll be evaluated more broadly. Even so, we appreciate the robust comparisons we can make in GFXBench 3.0, along with the advanced API support and low-level components that help drill down into a GPU's strengths and weaknesses.
Take our journey with Samsung's Galaxy Note 10.1 2014 Edition as an example. It wasn't until we ran the Render Quality tests that we were able to more thoroughly explain why the tablet might have taken a beating. Previously, our only conclusion would have been that ARM's Mali-T628MP6 just wasn't as fast. Now we can add that its graphics horsepower is spent on closer-to-reference 3D rendering, whereas competing products take quality shortcuts in the name of better frame rates.
Armed with this information, we find the concept of cheating gets a little more interesting. There really is a difference between legitimately increasing performance in a taxing workload at the expense of power (or maybe even accepting that a device is going to get hotter than it should), which some companies do, and targeting benchmarks for big numbers while leaving real-world apps running slower to prolong battery life or prevent debilitating throttling. The practice of increasing voltages and frequencies upon app detection is already well-known, so it stands to reason that SoC vendors will try to make themselves look better in other ways. Messing with render quality is an approach we've seen from AMD and Nvidia on the desktop, and now we have a tool that can test for it on the mobile side, too.
Really, what GFXBench 3.0 lets us do is figure out where devices might be “robbing Peter to pay Paul”. A device can turn the dial up and deliver bigger frame rates, but hurt battery life. It can do the same by making the workload easier, hurting image quality instead. It can shoot for all-day longevity at the expense of speed. Or it can output a great-looking image, burning cycles that could have gone to performance. We can now tell you how that balance is struck on a per-device basis.
And obviously, GFXBench 3.0 gives us an OpenGL ES 3.0-class workload to test modern SoCs with. The latest Adreno, Mali, PowerVR, Vivante, and Nvidia GPUs all support the API's enhanced features, paving the way for better-looking mobile games. But this is the first benchmark that lets us test performance under OpenGL ES 3.0. For that reason alone, it's an important component of our mobile benchmark suite.
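For a sense of how that API gate works in practice, here's a minimal sketch of our own (not GFXBench's code) showing how an Android app can ask the OS whether a device even supports OpenGL ES 3.0 before trying to run ES 3.0-class content; the class name is purely illustrative.

    import android.app.ActivityManager;
    import android.content.Context;
    import android.content.pm.ConfigurationInfo;

    // Illustrative helper: query the highest OpenGL ES version the device reports.
    public final class GlesCheck {
        public static boolean supportsEs30(Context context) {
            ActivityManager am =
                    (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
            ConfigurationInfo info = am.getDeviceConfigurationInfo();
            // reqGlEsVersion packs major.minor as 0xMMMMmmmm; 0x00030000 means ES 3.0.
            return info.reqGlEsVersion >= 0x00030000;
        }
    }

A benchmark or game would run this kind of check up front and simply hide its ES 3.0 scenes on hardware that only reports ES 2.0 support, which is why older GPUs can't take part in these newer workloads at all.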