
How We Test Smartphones And Tablets

Benchmark Suite

The mixture of synthetic and real-world tests we run is meant to give a comprehensive overview of a device’s performance. Synthetic tests, which are generally composed of several small, specialized blocks of code for performing operations such as cryptography, file compression, matrix operations, and alpha blending, are good at isolating the performance of different parts of a component’s architecture, including integer and floating-point math, specialized instructions, pixel shaders, and rasterization. With this information, we can make comparisons between individual hardware components like SoCs (CPU A is faster than CPU B) or NAND flash. Because of their highly focused nature, however, it can be difficult to relate these results to the overall user experience, limiting us to generic statements about a device being faster because it has a faster CPU in certain benchmarks. Furthermore, synthetic tests are generally designed to push hardware to its limits, which is useful for determining maximum performance and for spotting weak points in a design, but they do not represent real-world workloads.
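
To make the idea of a small, specialized block of code concrete, here is a minimal Python sketch of a synthetic test: it times a naive matrix multiply and reports the best of several runs. The function names (`matmul`, `time_kernel`), the matrix size, and the repeat count are illustrative assumptions and are not taken from any benchmark listed here; real benchmarks typically use optimized native code.

```python
import time


def matmul(a, b):
    """Naive matrix multiply on lists of lists (the small, focused kernel)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def time_kernel(kernel, *args, repeats=5):
    """Run the kernel several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(*args)
        best = min(best, time.perf_counter() - start)
    return best


size = 40  # hypothetical problem size, kept small so the sketch runs quickly
a = [[float(i + j) for j in range(size)] for i in range(size)]
b = [[float(i - j) for j in range(size)] for i in range(size)]

elapsed = time_kernel(matmul, a, b)
flops = 2 * size ** 3  # one multiply and one add per inner-loop step
print(f"{flops / elapsed / 1e6:.1f} MFLOP/s (best of 5 runs)")
```

A number like this says something about floating-point throughput in one narrow loop, which is why it works well for comparing components and poorly for describing the overall user experience.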

Because synthetic tests do not represent real-world workloads, we also try to include benchmarks that test macro-level activities you do every day, such as web browsing, composing text, editing photos, or watching a video. While these benchmarks are a better indicator of overall user experience, they are much more difficult to develop, leaving testers few options for mobile platforms.

To truly understand the performance of a device, we need to test it at both the component level and the system level. We need to know its maximum performance and its performance in real-world scenarios, and we also need to spot deficiencies (such as thermal throttling) and anomalies (such as unsupported features). No single benchmark can do all of these things. There is not even a single benchmark that can adequately test any one of them, because creating a good benchmark is extremely difficult and there are always compromises. This is why we run a whole suite of benchmarks, many of which have overlapping functionality.
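
As a rough illustration of how a deficiency such as thermal throttling might show up in results, the Python sketch below compares the scores from the last few back-to-back runs of a benchmark against the best run. The function names, the 0.9 threshold, and the score list are hypothetical values made up for this example; they are not measurements and not the method any particular benchmark uses.

```python
def sustained_ratio(scores, tail=3):
    """Return the mean of the last `tail` scores divided by the peak score."""
    if len(scores) < tail:
        raise ValueError("need at least `tail` scores")
    peak = max(scores)
    sustained = sum(scores[-tail:]) / tail
    return sustained / peak


def is_throttling(scores, threshold=0.9, tail=3):
    """Flag a run sequence whose sustained score falls below threshold * peak."""
    return sustained_ratio(scores, tail) < threshold


# Made-up scores from seven consecutive runs of the same test (higher is better).
scores = [1000, 990, 950, 860, 800, 790, 810]

ratio = sustained_ratio(scores)
print(f"sustained/peak = {ratio:.2f}, throttling suspected: {is_throttling(scores)}")
```

A single run would report only the peak, which is one reason looping tests are useful alongside one-shot scores.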

By now it should be apparent that the benchmarks we use are not randomly selected. In addition to fulfilling the requirements above, our benchmark suite comes from experienced developers who are willing to openly discuss how their benchmarks work. We work closely with most of these developers, both to gain a better understanding of the tests themselves and to provide feedback for improving them. The tables below list the benchmarks we currently use to test mobile devices.

Google Android

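| Category | Benchmark | Version | Developer |
| --- | --- | --- | --- |
| CPU And System Performance | AndEBench Pro 2015 | 2.1.2472 | EEMBC |
| CPU And System Performance | Basemark OS II Full | 2.0 | Basemark Ltd |
| CPU And System Performance | Geekbench 3 | 3.3.1 | Primate Labs |
| CPU And System Performance | MobileXPRT 2013 | 1.0.92.1 | Principled Technologies |
| CPU And System Performance | PCMark | 1.1 | Futuremark |
| CPU And System Performance | TabletMark 2014 | 3.0.0.63 | BAPCo |
| CPU And System Performance | Browsermark | 2.1 | Basemark Ltd |
| CPU And System Performance | JSBench | 2013.1 | Purdue University |
| CPU And System Performance | Google Octane | 2.0 | Google |
| CPU And System Performance | Peacekeeper | - | Futuremark |
| GPU And Gaming Performance | 3DMark: Ice Storm Unlimited | 1.2 | Futuremark |
| GPU And Gaming Performance | Basemark X | 1.1 | Basemark Ltd |
| GPU And Gaming Performance | GFXBench 3 Corporate | 3.0.28 | Kishonti |
| GPU And Gaming Performance | GFXBench 3.1 Corporate | 3.1.0 | Kishonti |
| GPU And Gaming Performance | Basemark ES 3.1 | 1.0.2 | Basemark Ltd |
| Battery Life And Thermal Throttling | Basemark OS II Full | 2.0 | Basemark Ltd |
| Battery Life And Thermal Throttling | GFXBench 3 Corporate | 3.0.28 | Kishonti |
| Battery Life And Thermal Throttling | PCMark | 1.1 | Futuremark |
| Battery Life And Thermal Throttling | TabletMark 2014 | 3.0.0.63 | BAPCo |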

Apple iOS

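| Category | Benchmark | Version | Developer |
| --- | --- | --- | --- |
| CPU And System Performance | Basemark OS II Full | 2.0 | Basemark Ltd |
| CPU And System Performance | Geekbench 3 | 3.3.4 | Primate Labs |
| CPU And System Performance | TabletMark 2014 | 3.0.0.63 | BAPCo |
| CPU And System Performance | Browsermark | 2.1 | Basemark Ltd |
| CPU And System Performance | JSBench | 2013.1 | Purdue University |
| CPU And System Performance | Google Octane | 2.0 | Google |
| CPU And System Performance | Peacekeeper | - | Futuremark |
| GPU And Gaming Performance | 3DMark: Ice Storm Unlimited | 1.2 | Futuremark |
| GPU And Gaming Performance | Basemark X | 1.1 | Basemark Ltd |
| GPU And Gaming Performance | GFXBench 3 Corporate | 3.0.32 | Kishonti |
| GPU And Gaming Performance | GFXBench 3.1 Corporate | 3.1.0 | Kishonti |
| GPU And Gaming Performance | Basemark ES 3.1 | 1.0.2 | Basemark Ltd |
| Battery Life And Thermal Throttling | Basemark OS II Full | 2.0 | Basemark Ltd |
| Battery Life And Thermal Throttling | GFXBench 3 Corporate | 3.0.32 | Kishonti |
| Battery Life And Thermal Throttling | TabletMark 2014 | 3.0.0.63 | BAPCo |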

Microsoft Windows Phone

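| Category | Benchmark | Version | Developer |
| --- | --- | --- | --- |
| CPU And System Performance | Basemark OS II Full | 2.0 | Basemark Ltd |
| CPU And System Performance | Browsermark | 2.1 | Basemark Ltd |
| CPU And System Performance | JSBench | 2013.1 | Purdue University |
| CPU And System Performance | Google Octane | 2.0 | Google |
| CPU And System Performance | Peacekeeper | - | Futuremark |
| GPU And Gaming Performance | Basemark X | 1.1 | Basemark Ltd |
| GPU And Gaming Performance | GFXBench 3 DirectX | 3.0.4 | Kishonti |
| Battery Life And Thermal Throttling | Basemark OS II Full | 2.0 | Basemark Ltd |
| Battery Life And Thermal Throttling | GFXBench 3 DirectX | 3.0.4 | Kishonti |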
  • blackmagnum
    Thank you for clearing this up, Matt. I am sure us readers will show approval with our clicks and regular site visits.
  • falchard
    My testing methods amount to looking for the Windows Phone and putting the trophy next to it.
  • WyomingKnott
    It's called a phone. Did I miss something? Phones should be tested for call clarity, for volume and distortion, for call drops. This is a set of tests for a tablet.
  • MobileEditor
    WyomingKnott wrote: "It's called a phone. Did I miss something? Phones should be tested for call clarity, for volume and distortion, for call drops. This is a set of tests for a tablet."

    It's ironic that the base function of a smartphone is the one thing that we cannot test. There are simply too many variables in play: carrier, location, time of day, etc. I know other sites post recordings of call quality and bandwidth numbers in an attempt to make their reviews appear more substantial and "scientific." All they're really doing, however, is feeding their readers garbage data. Testing the same phone at the same location but at a different time of day will yield different numbers. And unless you work in the same building where they're performing these tests, how is this data remotely relevant to you?

    In reality, only the companies designing the RF components and making the smartphones can afford the equipment and special facilities necessary to properly test wireless performance. This is the reason why none of the more reputable sites test these functions; we know it cannot be done right, and no data is better than misleading data.

    Call clarity and distortion, for example, have a lot to do with the codec used to encode the voice traffic. Most carriers still use the old AMR codec, which is strictly a voice codec rather than an audio codec, and is relatively low quality. Some carriers are rolling out AMR wide-band (HD-Voice), which improves call quality, but this is not a universal feature. Even carriers that support it do not support it in all areas.

    What about dropped calls? In the many years of using a cell phone, I can count the number of dropped calls I've had on one hand (that were not the result of driving into a tunnel or stepping into an elevator). How do we test something that occurs randomly and infrequently? If we do get a dropped call, is it the phone's fault or the network's? With only signal strength at the handset, it's impossible to tell.

    If there's one thing we like doing, it's testing stuff, but we're not going to do it if we cannot do it right.

    - Matt Humrick, Mobile Editor, Tom's Hardware
  • WyomingKnott
    The reply is much appreciated.

    Not just Tom's (I like the site), but everyone has stopped rating phones on calls. It's been driving me nuts.
  • KenOlson
    Matt,

    1st I think your reviews are very well done!

    Question: is there any way of testing cell phone low signal performance?

    To date I have not found any English speaking reviews doing this.

    Thanks

    Ken
  • MobileEditor
    KenOlson wrote: "1st I think your reviews are very well done! Question: is there any way of testing cell phone low signal performance?"

    Thanks for the compliment :)

    In order to test the low signal performance of a phone, we would need control of both ends of the connection. For example, you could be sitting right next to the cell tower and have an excellent signal, but still have a very slow connection. The problem is that you're sharing access to the tower with everyone else who's in range. So you can have a strong signal, but poor performance because the tower is overloaded. Without control of the tower, we would have no idea if the phone or the network is at fault.

    You can test this yourself by finding a cell tower near a freeway off-ramp. Perform a speed test around 10am while sitting at the stoplight. You'll have five bars and get excellent throughput. Now do the same thing at 5pm. You'll still have five bars, but you'll probably be getting closer to dial-up speeds. The reason is that the people in those hundreds of cars stopped on the freeway are all passing the time by talking, texting, browsing, and probably even watching videos.

    - Matt Humrick, Mobile Editor, Tom's Hardware