How We Test Smartphones And Tablets

GPU And Gaming Performance

Fueled by dramatic increases in mobile GPU performance and growing familiarity with touch-based controls, developers both big and small are creating a rich gaming ecosystem for our phones and tablets. But just like on the desktop, better-looking graphics and higher-resolution screens require faster GPUs and more memory bandwidth. The synthetic and real-world game engine tests in this section probe different aspects of GPU performance to identify weak points that might ruin the fun.

3DMark: Ice Storm Unlimited

This test by Futuremark includes two different graphics tests and a CPU-based physics simulation. It’s a cross-platform benchmark that targets DirectX 11 feature level 9 on Windows and Windows RT and OpenGL ES 2.0 on Android and iOS. The graphics tests use low-quality textures and a GPU memory budget of 128MB. All tests render offscreen at 1280x720 to avoid vsync limitations and display resolution scaling. These features allow hardware performance comparisons between devices and even across platforms.

The two graphics tests stress vertex and pixel processing separately. Graphics test 1 focuses on vertex processing while excluding pixel-related tasks like post-processing and particle effects. Graphics test 2 uses fewer polygons and no shadows to minimize vertex operations, while boosting overall pixel count by including particle effects. This second test measures a system’s ability to read textures, write to render targets, and add post-processing effects such as bloom, streaks, and motion blur. The table below summarizes the differences in geometry and pixel count between the two graphics tests.

                   Vertices    Triangles    Pixels
Graphics test 1    530,000     180,000      1.9 million
Graphics test 2    79,000      26,000       7.8 million

The Physics test uses the Bullet Open Source Physics Library to perform game-based physics simulations on the CPU. It uses one thread per available CPU core to run four simulated worlds, each containing two soft and two rigid bodies colliding. The soft-body vertex data is sent to the GPU every frame. Because the soft-body objects use a data structure that requires random memory access patterns, SoCs whose memory controller is optimized for sequential rather than random memory access perform poorly.

The performance of each graphics test is measured in frames per second, and the graphics score is the harmonic mean of these results times a constant. The Physics test score is just the raw frames per second performance times a constant. The overall score is a weighted harmonic mean of the graphics and physics scores.
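The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the scaling constants and weights are placeholders, since the text does not give the values Futuremark actually uses.

```python
def harmonic_mean(values, weights=None):
    """Weighted harmonic mean; equal weights are used if none are given."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


def ice_storm_scores(gt1_fps, gt2_fps, physics_fps,
                     graphics_const=1.0, physics_const=1.0,
                     graphics_weight=0.75, physics_weight=0.25):
    """Return (graphics, physics, overall) scores.

    The constants and weights are placeholder assumptions,
    not Futuremark's published values.
    """
    # Graphics score: harmonic mean of the two graphics tests times a constant.
    graphics = graphics_const * harmonic_mean([gt1_fps, gt2_fps])
    # Physics score: raw frames per second times a constant.
    physics = physics_const * physics_fps
    # Overall score: weighted harmonic mean of the two sub-scores.
    overall = harmonic_mean([graphics, physics],
                            [graphics_weight, physics_weight])
    return graphics, physics, overall
```

A harmonic mean is pulled toward the lower of its inputs: with placeholder constants of 1.0, graphics results of 60 and 30 FPS combine to 40 rather than the arithmetic average of 45, so a device cannot hide one weak result behind one strong one.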

BaseMark X

This benchmark by Basemark Ltd. is built on top of a real-world game engine, Unity 4.2.2, and runs on Android, iOS, and Windows Phone. It includes two tests, Dunes and Hangar, which stress the GPU with lighting effects, particles, dynamic shadows, shadow mapping, and the post-processing effects found in modern games. With as many as 900,000 triangles per frame, these tests also strain the GPU’s vertex processing capabilities. The tests target DirectX 11 feature level 9_3 on Windows and OpenGL ES 2.0 on Android and iOS.

Both tests are run offscreen at 1920x1080 (for direct comparison of hardware across devices) and onscreen at the device’s native resolution using default settings (antialiasing disabled). The same set of tests is run at both medium- and high-quality settings. Each test reports the average frames per second after rendering the entire scene. The final score is an equal combination of the offscreen Dunes and Hangar tests, normalized to the performance of a Samsung Galaxy S4.
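One way to read "an equal combination, normalized to a reference device" is shown in the sketch below. It assumes the combination is a plain arithmetic mean of the two per-test ratios, which the text does not confirm; the function name and the reference frame rates passed in are hypothetical, not Basemark's figures.

```python
def basemark_x_score(dunes_fps, hangar_fps, ref_dunes_fps, ref_hangar_fps):
    """Combine offscreen Dunes and Hangar results relative to a reference device.

    Assumption: each test contributes equally through an arithmetic mean
    of its ratio to the reference device's frame rate.
    """
    dunes_rel = dunes_fps / ref_dunes_fps      # 1.0 means parity with the reference
    hangar_rel = hangar_fps / ref_hangar_fps
    return (dunes_rel + hangar_rel) / 2
```

Under this reading, a device that matches the reference in both tests scores 1.0, and one that doubles both frame rates scores 2.0.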

GFXBench 3.0

GFXBench by Kishonti is a full graphics benchmarking suite, including two high-level, 3D gaming tests (Manhattan, T-Rex) for measuring real-world gaming performance and five low-level tests (Alpha Blending, ALU, Fill, Driver Overhead, Render Quality) for measuring hardware-level performance. It’s also cross-platform, supporting Windows 8 and Mac OS X on the desktop with OpenGL; Android and iOS with OpenGL ES; and Windows 8, Windows RT, and Windows Phone with DirectX 11 feature level 9/10.

All of the tests are run offscreen at 1920x1080, to facilitate direct comparisons between devices/hardware, and onscreen at the device’s native resolution, to see how the device handles the actual number of pixels supported by its screen. The tests are run using default settings and are broken into three run groups (Manhattan, T-Rex, and low-level tests) with a cooling period in between to mitigate thermal throttling, which can occur if all the tests are run back-to-back.

Manhattan: This OpenGL ES 3.0 based game simulation includes several modern effects, including diffuse and specular lighting with more than 60 lights, cube map reflection and emission, triplanar mapping, and instanced mesh rendering, along with post-processing effects like bloom and depth of field. The geometry pass employs multiple render targets and uses a combination of traditional forward and more modern deferred rendering processes in separate passes. Its graphics pipeline rewards architectures proficient in pixel shading.

T-Rex: This demanding OpenGL ES 2.0 based game simulation is as much a stress test as it is a performance test, pushing the GPU hard and generating a lot of heat. While not as dependent on pixel shading as Manhattan, this test still uses a number of visual effects such as motion blur, parallax mapping, planar reflections, specular highlights, and soft shadows. Its more balanced rendering pipeline also uses complex geometry, high-res textures, and particles.

Alpha Blending: This synthetic test measures a GPU’s alpha-blended overdraw capability by initially layering 50 semi-transparent rectangles and measuring the frame rate. Rectangles are then added or removed until the rendered scene runs steadily between 20 and 25 FPS. Performance is reported in MB/s, reflecting how much data from the differently sized layers is blended together per second, an important metric for hardware-accelerated UIs and games that include translucent surfaces.

This test is highly dependent on GPU memory bandwidth, since it uses high-resolution, uncompressed textures and requires reading/writing to the frame buffer (memory) during alpha blending. It also stresses the back-end of the rendering pipeline (ROPs) rasterizing the frame. Because all of the onscreen objects are transparent, GPUs see no benefit from early z-buffer optimizations.
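The adaptive part of the Alpha Blending test, adding or removing rectangles until the frame rate settles between 20 and 25 FPS, can be sketched as a simple feedback loop. The step size of one rectangle, the function names, and the toy frame-rate model below are all assumptions for illustration; the text does not describe how GFXBench actually adjusts the layer count.

```python
def find_layer_count(measure_fps, start=50, low=20.0, high=25.0, max_steps=1000):
    """Adjust the number of blended layers until FPS falls within [low, high].

    measure_fps is a callable returning the frame rate for a layer count.
    """
    layers = start
    fps = measure_fps(layers)
    for _ in range(max_steps):
        fps = measure_fps(layers)
        if fps > high:
            layers += 1          # GPU has headroom: blend one more layer
        elif fps < low and layers > 1:
            layers -= 1          # GPU is overloaded: remove a layer
        else:
            break                # inside the target band (or down to one layer)
    return layers, fps


def simulated_fps(layers, blend_budget=2000.0):
    """Toy model (assumption): frame rate is inversely proportional to layers."""
    return blend_budget / layers
```

With the toy model, `find_layer_count(simulated_fps)` starts at 50 layers (40 FPS) and climbs until the frame rate drops into the target band. A faster GPU, modeled here as a larger `blend_budget`, sustains more layers, which is what the reported throughput figure captures.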

ALU: This test measures pixel shader compute performance, an important metric for visually rich modern games, by rendering a scene with rippling water and lighting effects like reflection and refraction. Performance is measured in frames per second. The onscreen results are vsync-limited (60 FPS) for most GPUs, but the offscreen test is still useful for comparing the ALU performance of different devices.

Fill: The fill test measures texturing performance (in millions of texels per second) by rendering four layers of compressed textures. This test combines aspects of both the alpha blending and ALU tests, since it depends on both pixel processing performance and frame buffer bandwidth.

From left to right: Alpha Blending, ALU, Fill

Driver Overhead: This test measures the graphics driver’s CPU overhead by rendering a large number of simple objects one-by-one, continuously changing the state of each object. Issuing separate draw calls for every object stresses both hardware (CPU) and software (graphics API and driver efficiency). While the GPU does render each scene, its impact on the overall score (given in frames per second) is minimal.

Render Quality: This test compares a single rendered frame from T-Rex to a reference image, computing the peak signal-to-noise ratio (PSNR) based on mean square error (MSE). The test value, measured in millibels (mB), reflects the visual fidelity of the rendered image. The primary purpose of this test is to make sure that the GPU driver is not “optimizing” (i.e., cheating) for performance by sacrificing quality.
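The PSNR calculation itself is standard and can be sketched as follows. The example assumes 8-bit channel values flattened into plain lists; how GFXBench handles color channels or samples the frame is not stated in the text, and the function name is hypothetical.

```python
import math


def psnr_millibels(rendered, reference, max_value=255):
    """PSNR between two equal-length sequences of pixel values, in millibels.

    Assumes 8-bit values by default (max_value=255).
    """
    if not rendered or len(rendered) != len(reference):
        raise ValueError("images must be non-empty and the same size")
    # Mean square error between the rendered frame and the reference.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return math.inf          # identical images: no measurable error
    psnr_db = 10 * math.log10(max_value ** 2 / mse)
    return 100 * psnr_db         # 1 decibel = 100 millibels
```

Higher values mean the rendered frame is closer to the reference. A driver that drops precision or skips work to gain speed raises the MSE, which lowers the PSNR.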