Testing Methodology
Collecting accurate, repeatable, and fair results from such a noisy environment requires a strict testing methodology. Here at Tom’s Hardware, we’ve used our knowledge and experience to develop a procedure that minimizes background tasks and strives to create a level playing field for all devices—as well as we can anyway, since there are some variables beyond our control.
Before discussing the details, however, we should answer a more basic question: Where do our review units come from? In some cases, we purchase the products ourselves, but most of the units we review are retail products provided by OEMs.
While Tom’s Hardware attracts readers from all over the world, the main site is based in the United States (there are other sites in the Tom’s family focusing on different regions). Therefore, the devices we test are models intended for sale in the North American market. The increasing importance of markets outside of the US, however, means many OEMs are launching products in these other regions first. Because of the media’s unhealthy obsession with being the first to post, many tech news sites are now reviewing international or even pre-production units running different software and often exhibiting different performance characteristics than the North American retail models. Many of these sites do not even disclose this information. We feel that this potentially misleading practice is not in our readers’ best interest. If we do test an international or pre-production unit outside of a full review, we will always disclose this within the article.
Configuration
So, after acquiring our North American retail units, what’s next? The first thing we do is install operating system and software updates. Next, we perform a factory reset to make sure we’re starting from a clean slate. The final step in preparing a device for testing involves diving into the settings menu. We’ll spare you our full list of configuration settings (we go through every possible menu), which are meant to minimize background tasks and keep comparisons as fair as possible, and just show you the most important ones in the table below.
| Setting | State | Notes |
|---|---|---|
| Wireless & Data | | |
| Airplane Mode | on | Cellular signal strength (and thus power draw) is an uncontrollable variable that changes with location, carrier, time of day, etc., so the cellular radio is powered down to keep device testing fair. |
| Wi-Fi | on | |
| Bluetooth | off | |
| NFC | on | |
| Location Services | off | Reduces background activity. |
| Data Collection | off | Options for improving customer experience and sending diagnostic data, usage statistics, etc. to Google, OEMs, or cellular providers are disabled to reduce background activity. |
| Display | | |
| Auto or Adaptive Brightness | off | |
| Display Brightness | 200 nits | The screens are calibrated to 200 nits, keeping results comparable between devices. |
| Special display settings | off | |
| Screen Mode | varies | Set to basic, native, standard, sRGB, or device default. When testing the display, each mode is tested separately. |
| Wallpaper | default | |
| Battery | | |
| Battery saving modes | off | Devices implement different techniques. Turning these off shows battery life without having to sacrifice performance. |
| Turn on automatically | never | |
| User Accounts | | |
| Google, iCloud, Facebook, Twitter, etc. | inactive | All cloud-based accounts (or any account that accesses the internet in the background) are deleted after initial setup. This reduces background activity that interferes with measurements. The only exception is the Microsoft account for Windows Phone, which cannot be removed. |
| Auto-sync Data | off | |
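For readers who want to mirror this setup on their own devices, the table boils down to a short checklist. The snippet below is just an illustrative sketch that restates the table; the structure and names are our own invention for this article, not part of an actual test script.

```python
# Illustrative sketch of the device configuration profile described above.
# The dictionary structure and key names are hypothetical; they simply restate the table.
TEST_PROFILE = {
    "wireless_and_data": {
        "airplane_mode": "on",       # cellular radio off to remove signal-strength variability
        "wifi": "on",
        "bluetooth": "off",
        "nfc": "on",
        "location_services": "off",  # reduces background activity
        "data_collection": "off",    # no diagnostics/usage reporting to Google, OEMs, or carriers
    },
    "display": {
        "adaptive_brightness": "off",
        "brightness_nits": 200,      # screens calibrated to 200 nits for comparable results
        "special_display_settings": "off",
        "screen_mode": "default",    # basic/native/standard/sRGB/device default; each tested separately for display reviews
        "wallpaper": "default",
    },
    "battery": {
        "battery_saver": "off",      # show battery life without sacrificing performance
        "auto_enable_battery_saver": "never",
    },
    "accounts": {
        "cloud_accounts": "removed after initial setup",  # exception: the Microsoft account on Windows Phone
        "auto_sync": "off",
    },
}
```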
In order to keep testing fair and results comparable, we strive to make each device’s configuration as similar as possible. Due to differences in operating systems and features, however, there will always be some small differences. In these situations, we leave the settings at their default values.
Devices may also contain pre-installed software from cellular providers and OEMs, introducing more variability. When present, we do not remove or disable this software—it’s usually not possible and the average user will likely leave it running anyway. To help mitigate this issue, we try to get either unlocked devices or devices from carriers with the least amount of “bloatware.”
Testing Procedure
A consistent testing procedure is just as important to the data collection process as our device configuration profile. Since power level and temperature both affect a device’s performance, tests are performed in a controlled manner that accounts for these factors. Below are the main points of our testing procedure:
- The ambient room temperature is kept between 70 °F (21 °C) and 80 °F (26.5 °C). We do not actively cool the devices during testing. While this would further reduce the possibility of thermal throttling affecting the results, it’s not a realistic condition. After all, none of us carry around bags of ice, fans, or thermoelectric coolers in our pockets to cool our devices. (A rough sketch of how these pre-test conditions might be checked appears after this list.)
- Smartphones lie flat on a wood table (screen facing up) during testing, with tests conducted in portrait mode unless forced to run landscape by the app. Tablets are propped up in a holder in landscape mode, so that the entire backside of the device is exposed to air. This is to better simulate real-world usage.
- Devices are allowed to sit for a specified length of time after they are turned on to allow initial network syncing and background tasks to complete before starting a test.
- Devices are not touched or moved while tests are running.
- Devices are allowed to cool between test runs so that subsequent tests are not affected by heat buildup.
- All tests are performed while running on battery power. The battery charge level is not allowed to drop below a specific value while performing any performance measurement other than battery life.
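To give a concrete sense of how the conditions above could be enforced, here is a minimal sketch of a pre-run gate. The helper functions are stand-ins for whatever thermometer and battery query a harness would actually use, and the 20% battery floor is an arbitrary example rather than our actual threshold.

```python
# Hypothetical pre-run gate reflecting the conditions listed above.
AMBIENT_MIN_F, AMBIENT_MAX_F = 70.0, 80.0
BATTERY_FLOOR_PERCENT = 20  # arbitrary example floor, not our actual cutoff


def read_ambient_temp_f() -> float:
    """Placeholder: return the room temperature in °F from whatever sensor is available."""
    return 74.0


def read_battery_percent() -> int:
    """Placeholder: return the device's current battery charge level."""
    return 85


def ok_to_start_run() -> bool:
    """Return True only when the room and the device meet the test conditions."""
    temp_f = read_ambient_temp_f()
    if not (AMBIENT_MIN_F <= temp_f <= AMBIENT_MAX_F):
        print(f"Ambient temperature {temp_f:.1f} °F is outside the allowed range; postponing run.")
        return False
    if read_battery_percent() < BATTERY_FLOOR_PERCENT:
        print("Battery too low; recharge before running performance tests.")
        return False
    return True


if __name__ == "__main__":
    print("OK to start:", ok_to_start_run())
```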
Benchmarks are run at least two times and the results are averaged to get a final score. The minimum and maximum values from each benchmark run must not vary from the computed average value by more than 5%. If the variance threshold is exceeded, all of the benchmark scores for that run are discarded and the benchmark is run again. This ensures that the occasional outlier caused by random background tasks, cosmic rays, spooky quantum effects, interference from technologically advanced aliens, disturbances in the space-time continuum, or other unexplainable phenomena does not skew the final results.
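Expressed as code, the variance rule works out to something like the sketch below. The function is an illustration of the 5% rule only, not our actual benchmark harness.

```python
# Illustrative implementation of the 5% variance rule described above; scores
# are the raw results from repeated runs of a single benchmark.
def accept_scores(scores, max_variance=0.05):
    """Return the averaged score, or None if any run strays more than
    max_variance from the average (meaning the whole set is discarded
    and the benchmark is run again)."""
    average = sum(scores) / len(scores)
    if max(scores) > average * (1 + max_variance) or min(scores) < average * (1 - max_variance):
        return None
    return average


# Two runs within 5% of their average are accepted and averaged...
print(accept_scores([1020, 1060]))  # -> 1040.0
# ...while a 10% outlier forces the whole benchmark to be rerun.
print(accept_scores([900, 1100]))   # -> None
```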
Our testing procedure also includes several methods for detecting benchmark cheats.