- Product Specifications
- Test Equipment
- Why Packaging Is Important
- Four-Corner Testing
- Mixed Workload
- Steady State
- Real-World Software
- Advanced Workloads
- Notebook Battery Life
- Thermal Considerations
- Final Thoughts
Today we're giving you a glimpse into the software and procedures used to test client solid-state and mechanical hard drives. Not every reviewer uses the same methods, so knowing what we present in each review is important as you make your purchasing decision. For the most part, you can take one SSD or hard drive tested by me and compare it to another product that I reviewed. We use a strict benchmarking regimen that ensures true apples-to-apples comparisons.
In our reviews we publish results from a handful of other products. We try to include performance data from NAND flash manufacturers like Intel, Crucial, SanDisk, Samsung and Toshiba. Not all of the devices under test (DUTs) will be available in each review, but our evaluation approach at least allows you to compare previously tested DUTs to new products.
We look at performance a number of ways, and also pass judgment on packaging, bundled accessories and even the product specifications. Let's start with what those manufacturer-supplied numbers actually mean.
Companies publish specifications based on out-of-box performance data. The type of information given to the public varies from one manufacturer to another. Even the way specs are generated is not standard. The general rule is four-corner performance: sequential reads, sequential writes, random reads and random writes.
Two common methods for determining sequential performance come from easy-to-use graphical utilities. ATTO at a queue depth of four or 10 is the old-school method. The utility won't let you test with a single outstanding command, though, so it reports throughput numbers not often seen in the real world. SanDisk and a few other companies no longer use ATTO, preferring CrystalDiskMark to show sequential read and write performance.
Random IOPS can be measured a number of ways. Most companies use Iometer with 4KB blocks at a queue depth of 32. Again, the results appear impressive, but share very little with the drive's real world behavior. We'll go into more depth on this shortly.
There are a few things to keep in mind when it comes to the specifications found on manufacturer websites and product packages. We've already touched on real-world performance and how you shouldn't expect to get relevant data from the company selling you its drive. After all, a typical use case doesn't exist. Even on the same machine, workloads vary from day to day. You especially can't compare one vendor's specs with another's; they use different configurations to generate their data, introducing variance.
We do, however, generate truly comparable benchmark numbers. This requires a strict test regimen. With solid-state drives, any workload you run before a test affects the outcome of whatever you measure next. Naturally, all SSDs start out clean, a condition we call fresh out of box, or FOB.
The FOB state means the controller is able to write to the flash without going through a read, modify and write operation. Once it's filled with data, the drive's controller needs to read a page, modify the data and then write the page again. This occurs even if the change involves just a single cell. The read, modify and write process can double or even triple latency, depending on the type of information being manipulated.
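The read-modify-write penalty is easy to sketch with a toy latency model. The timings below are illustrative assumptions, not measured values; real NAND latencies vary by part and process node.

```python
# Toy model of the fresh-out-of-box (FOB) vs. dirty-drive write path.
# Timings are assumptions for illustration only.
PAGE_READ_US = 50      # assumed time to read one NAND page
PAGE_PROGRAM_US = 250  # assumed time to program (write) one NAND page

def write_latency_us(fob: bool) -> int:
    """Latency to update data residing in a single page."""
    if fob:
        # FOB: clean pages are available, so the controller programs directly.
        return PAGE_PROGRAM_US
    # Dirty drive: read the old page, merge the change, program a new page.
    return PAGE_READ_US + PAGE_PROGRAM_US

print(write_latency_us(fob=True))
print(write_latency_us(fob=False))
```

Erasing adds further cost once no clean blocks remain, which is why idle-time garbage collection matters so much to sustained performance.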
I’m a high-volume reviewer, working on an average of eight products per month. I also test devices still in development, often through several firmware revisions before the SSD launches. To maintain this pace while still providing quality commentary, I need several systems to stay ahead of the queue.
SATA Test System
| Component | Model |
|---|---|
| Motherboard | Asus Z87 ROG Maximus VI Extreme |
| Processor | Intel Core i7-4770K @ 4.5GHz |
| DRAM | Corsair Vengeance DDR3-1866 |
| Graphics | Intel HD Graphics 4600 |
| Power Supply | Corsair AX860i |
| Hot Swap Drive Enclosure | Thermaltake MAX-1562 |
| Network | Mellanox ConnectX-3 VPI |
| Operating System | Microsoft Windows 8.1 Pro |
I’ve standardized on the above configuration for consumer SSD and hard drive reviews, utilizing four identical systems. These machines are dedicated to benchmarking SATA-based products. They also test enterprise network equipment from time to time. In order to keep them unchanged, I isolate them from the Internet, preventing automatic updates that might affect my results.
PCIe Test System
| Component | Model |
|---|---|
| Motherboard | ASRock Z97 Extreme6 |
| Processor | Intel Core i7-4790K @ 4.5GHz |
| DRAM | Corsair Vengeance Pro DDR3-1866 |
| Graphics | Intel HD Graphics 4600 |
| Power Supply | Corsair AX1200i |
| Hot Swap Drive Enclosure | Thermaltake MAX-1562 |
| Network | Mellanox ConnectX-3 VPI |
| Operating System | Microsoft Windows 8.1 Pro |
PCIe-based storage is evaluated across a pair of purpose-built systems. The ASRock Z97 Extreme6 motherboard provides a direct four-lane PCIe 3.0 link from the CPU to the M.2 interface. This is the ideal way to attach M.2-based storage to a high-performance, consumer-focused PC. These systems are also isolated from the Internet, and the operating system configuration and test software are kept consistent between our PCIe- and SATA-based test beds.
I keep a few other systems available for specialty tasks, such as cloning the notebook battery life system image to drives and performing secure erase operations. In all, there are 29 modern systems at my disposal, ranging from Sandy Bridge-based notebooks for testing storage products at trade shows to 10 identical dual-Xeon systems for testing network-attached storage (NAS) appliances with 120 clients running in Hyper-V.
We use two different notebooks to measure notebook battery life. Standard 2.5” SATA drives run through a Lenovo T440, one of the few laptops with DEVSLP support. I use a Lenovo X1 Carbon Gen 3 for testing PCIe- and SATA-based M.2 SSDs. The X1 Carbon Gen 3 ships with M.2 storage from Lenovo. There aren't many models with this feature, but that number should increase over the coming months.
Why Packaging Is Important
Most of us tend to shop online. Sometimes, though, the need to have a product in hand outweighs our desire to save a few dollars. Either way, retail packaging is an important consideration, regardless of where you purchase.
Online orders require shipping, and nothing's worse than waiting on a package to arrive only to find the product damaged. In our consumer reviews, we look at how companies package their retail SSDs and hard disks. Solid-state drives are immune to vibration for the most part. Advances in hard drive technology have increased the amount of vibration allowed with the HDD powered down. But we still like to see some form of vibration-absorbing material used in the package.
With SSDs, performance varies by capacity point. Smaller drives tend to be slower than larger ones, even in the same family. Some vendors publish specifications for each model, but others only list maximum performance from the series, presenting a best-case scenario. The 128 and even 256GB implementations are usually slower than the 512GB and 1TB versions.
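The capacity gap comes down to parallelism: a smaller drive carries fewer NAND dies, so the controller has fewer targets to program concurrently. A rough sketch with assumed per-die figures shows the pattern (the numbers are ours for illustration, not from any datasheet):

```python
DIE_MBPS = 40          # assumed per-die program throughput
DIE_CAPACITY_GB = 16   # assumed die density
INTERFACE_MBPS = 550   # rough ceiling of a SATA 6Gb/s link

def seq_write_mbps(capacity_gb: int) -> int:
    """Sequential write speed limited first by die count, then the interface."""
    dies = capacity_gb // DIE_CAPACITY_GB
    return min(dies * DIE_MBPS, INTERFACE_MBPS)

for cap in (128, 256, 512, 1024):
    print(cap, seq_write_mbps(cap))  # only the smallest model falls short
```

Under these assumptions the 128GB model can't feed the interface, while the larger capacities all saturate it, which matches the series-maximum specs vendors like to print.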
When we shop in a retail store, most of us want to see product information. Again, some drives include specifications on the box, while others list the bare minimum. When we talk about what is or isn't included, we're hoping to compel manufacturers to be more forthcoming with their customers.
The basic four corners of storage testing include sequential reads, sequential writes, random reads and random writes. Not every reviewer or company tackles these the same way.
Sequential data is usually measured with 128KB blocks, though some editors like to use 64KB and others go as high as 8MB blocks. For the most part, we use 128KB, but we also publish a single-drive chart that shows a range of block sizes from 512B to 8MB in both sequential and random workloads. This chart also shows queue depths between one and 32.
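Assuming power-of-two steps, the grid behind that single-drive chart can be enumerated like this:

```python
# Block sizes doubling from 512B up to 8MB, queue depths from 1 to 32.
block_sizes = [512 * 2**i for i in range(15)]  # 512 ... 8,388,608 bytes
queue_depths = [2**i for i in range(6)]        # 1, 2, 4, 8, 16, 32

# Every (block size, queue depth) pair gets measured, once with a
# sequential access pattern and once with a random one.
grid = [(bs, qd) for bs in block_sizes for qd in queue_depths]
print(len(grid))  # 90 combinations per access pattern
```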
Random data performance is almost universally measured with 4KB blocks at a queue depth of 32. This metric shows what manufacturers want end users to see, though it doesn't accurately reflect real-world performance. We show random performance with 4KB blocks at several queue depths ranging from one to 32 for most devices. PCIe-based products scale well past this queue depth, so we go as high as 128 in some tests.
In each review, we show a comparison between sequential reads and writes at a queue depth of two. We also break random read performance into groups on a bar graph at each queue depth. These random 4KB charts are divided into high and low queue depths.
The general consensus on mixed workloads includes 80% reads in client environments and 70% reads for workstations.
SATA-based devices are half-duplex; they can read or write at any given moment, but not both at once. Products based on the SCSI command set (including SAS) are full-duplex; they can read and write simultaneously. Full-duplex devices fare much better in mixed-workload environments.
Boot drives are subjected to mixed workloads since the system is constantly reading and writing small pieces of data. When you start an application, the software opens as a series of reads, but also logs (writes) data to the host. And this happens hundreds of times per minute.
Secondary drives used for bulk storage change the read/write ratio. They don't handle logging operations, but they read and write when transferring files to and from the system. Most secondary drives hold data that is transferred sequentially; movies, music, picture collections and other media files make up the bulk of secondary storage.
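A mixed-workload schedule in the spirit of those ratios can be sketched as below; the function, seed and operation counts are ours for illustration, not taken from any particular benchmark.

```python
import random

def mixed_ops(n: int, read_pct: int, seed: int = 42) -> list:
    """Generate a read/write operation stream with a target read percentage."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    return ["read" if rng.random() < read_pct / 100 else "write"
            for _ in range(n)]

client = mixed_ops(10_000, 80)       # typical client mix: 80% reads
workstation = mixed_ops(10_000, 70)  # workstation mix: 70% reads
print(round(client.count("read") / len(client), 2))
```

A real test would also pin block sizes and queue depths per operation; the point here is only that the read percentage, not just raw throughput, defines the workload.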
In the next section, we'll look at different transfer ratios and how sequential data reacts while multitasking in a secondary environment.
Steady state performance is often associated with enterprise workloads. For the most part, that is where I think it can stay. Client SSDs spend most of their time idling. The TRIM command, garbage collection and wear-leveling schemes have a chance to clean the NAND cells, which are kept ready for fresh writes.
The two images above are what we've come to associate with steady state performance. In a client environment, you never write 4KB blocks to your SSD for hours at a time. The first chart shows the second pass, not even the initial pass with clean cells available to absorb the write load. The second chart is what we are most interested in looking at. It illustrates your span of random performance in a worst-case scenario. Ideally, you'll see high IOPS throughput and a consistent flow of data, without much deviation.
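One simple way to put a number on that "consistent flow" is the coefficient of variation of per-second IOPS samples. The two series below are invented to show how drives with the same average can differ wildly in consistency:

```python
def consistency(samples):
    """Return (mean IOPS, coefficient of variation); lower CV = steadier."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, (var ** 0.5) / mean

steady = [10_000, 10_100, 9_900, 10_050, 9_950]  # tight band around 10K IOPS
spiky = [18_000, 2_000, 17_500, 2_500, 10_000]   # same mean, wild swings
print(consistency(steady))
print(consistency(spiky))
```

Both series average 10,000 IOPS, but only the first would plot as the tight, flat band we want to see in a worst-case chart.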
There are some instances when steady state performance data is more relevant, such as prosumer workloads. Sequential mixed workload steady state testing shows us how a drive behaves after heavy multimedia editing on a secondary drive. Since we don't know what everyone's typical mix is, we show everything from 100% read to 0% read (which is 100% writes).
Once we get past the synthetic tests that measure the extreme corners of performance, we move into testing storage traces from real-world software. Our storage traces come from Futuremark and are part of the PCMark 8 suite.
PCMark 8's standard storage test leverages a number of real-world applications. The software runs while its I/O activity is recorded as a trace. PCMark 8 then plays the traces back on your computer, just as if you were running the workload in real time. The benchmark also plays back the idle gaps between transfers, just as they'd appear if you were running the workload yourself. This is the most advanced test available for reproducing such a wide range of real-world software.
Futuremark PCMark 8 Storage Test
| Test | Sequential Reads | Random Reads | Sequential Writes | Random Writes | Data Read | Data Written |
|---|---|---|---|---|---|---|
| World of Warcraft | 1415 | 14927 | 10 | 659 | 390MB | 5MB |
A standard run gives us a result for each individual test in the form of service time. More often than not, these numbers only demonstrate small differences between premium and value-oriented products. This happens in the real world, too.
PCMark also gives us a breakdown, conveying the average throughput of all tests. This result shows us a wider range with all of the software workloads combined. The single results are misleading since they capture a moment in time. But the final throughput number is an average of around one hour worth of work.
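Deriving that overall figure is simple arithmetic: total data moved divided by total storage busy time across every trace. The per-test numbers below are invented for illustration:

```python
# (data moved in MB, storage busy time in seconds) per trace -- made-up values
tests = [
    (395, 2.0),   # a light gaming trace
    (1200, 4.0),  # a heavier content-creation trace
    (800, 4.0),
]
total_mb = sum(mb for mb, _ in tests)
total_s = sum(s for _, s in tests)
print(round(total_mb / total_s, 1))  # overall MB/s across the whole run
```

Because the division happens over the combined run rather than per test, a drive that stumbles on even one heavy trace drags its overall number down, which is exactly why this metric separates drives better than the individual service times do.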
It wasn't enough for Futuremark to release the best traces we've benchmarked. The company went on to produce the best client storage metric ever created. The Storage Consistency Test works a drive through several stages of performance. For years, we knew that SSDs should be evaluated in three dimensions. The 2D look yields a basic picture of performance, but lacks the depth of steady state and performance recovery.
- Write to the drive sequentially up to its reported capacity with random data, write size of 256*512=131072 bytes.
- Write through a second time (to take care of over-provisioning).
- Run writes of random size between 8*512 and 2048*512 bytes on random offsets for 10 minutes.
- Run performance test (one pass only). The result is stored in secondary results with name prefix degrade_result_X where X is a counter.
- Repeat steps one and two eight times, and on each pass increase the duration of random writes by five minutes.
- Run writes of random size between 8*512 and 2048*512 bytes on random offsets for final duration achieved in degradation phase.
- Run performance test (one pass only). The result is stored in secondary results with name prefix steady_result_X where X is a counter.
- Repeat steps one and two five times.
- Idle for five minutes.
- Run performance test (one pass only). The result is stored in secondary result with name recovery_result_X where X is a counter.
- Repeat steps one and two five times.
- Write to the drive sequentially up to its reported capacity with zero data, write size of 256*512=131072 bytes.
The performance consistency test involves 18 individual runs using the same workloads as the standard test. The result is one long text file with several useful bits of data. We use the overall throughput from each combined test run and the overall latency.
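Pulling the per-phase numbers back out of that text file is a small parsing job. The degrade_/steady_/recovery_ prefixes come from the step list above; the exact line format in this sketch is an assumption.

```python
import re

# Assumed example of the result file's contents -- real output will differ.
sample = """degrade_result_1 bandwidth=210.5
degrade_result_2 bandwidth=180.2
steady_result_1 bandwidth=95.0
recovery_result_1 bandwidth=160.7"""

# Bucket each result line by its phase prefix.
phases = {"degrade": [], "steady": [], "recovery": []}
for line in sample.splitlines():
    m = re.match(r"(degrade|steady|recovery)_result_\d+ bandwidth=([\d.]+)", line)
    if m:
        phases[m.group(1)].append(float(m.group(2)))

print(phases["steady"])  # [95.0]
```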
Notebook Battery Life
At this time, we use BAPCo's MobileMark 2012 v1.5 for our notebook battery life test. BAPCo recently released MobileMark 2014, and we will eventually move to the new software after sorting out a few issues. MobileMark 2012 v1.5 ships with three test scenarios: Office Productivity, Media Creation/Consumption and Blu-ray. We use the Office Productivity benchmark exclusively.
We use two separate systems to run MobileMark 2012 v1.5. The first is a Lenovo T440, which tests 2.5" SATA HDDs and SSDs. It also allows us to benchmark mSATA SSDs.
The second system is a third-gen Lenovo X1 Carbon, which ships with M.2 storage for testing both SATA- and PCIe-based devices. Notebook battery life and performance results are not comparable between the two systems. At this time, we haven't found a single notebook that allows us to test all formats in the same machine.
MobileMark 2012 v1.5 installs and/or uses the following 13 applications:
- ABBYY FineReader Pro 11
- Adobe Acrobat Pro X
- Adobe Flash Player 11
- Adobe Photoshop CS5 Extended 12.04
- Adobe Photoshop Elements 10
- Adobe Premiere Pro CS 5.5
- CyberLink PowerDVD Ultra 11
- Microsoft Excel 2010 SP1
- Microsoft Internet Explorer 9 (or newer if already installed)
- Microsoft Outlook 2010 SP1
- Microsoft PowerPoint 2010 SP1
- Mozilla Firefox 14.0.1
- WinZip Pro 16
In order to keep testing consistent, each notebook needs fresh batteries after just ten tests. This comes out to a new battery once every two months on average. To maintain consistent results, we use genuine Lenovo six-cell batteries for the T440 and Lenovo's internal battery for the X1 Carbon.
When finished, we end up with two numbers. The first is a measurement in minutes, which tells us how long the notebook was powered on. The second one is a performance rating. In a low-power state, the notebook reduces bandwidth and clock rates on several components. The SATA bus, as well as the CPU, GPU, DMI link and DRAM drop to lower speeds to increase battery life. Our performance rating indicates efficiency when available power is the limiting factor.
From time to time, we'll publish an image of a bare PCB taken with a thermal camera. We don't do this in every review, mainly just when a new SSD controller comes to market. To show the range of heat generated, we publish two images: one with the drive at idle for 10 minutes and another after writing 4KB blocks for 10 minutes.
In some environments, you might not want a solid-state drive that hits 114 °C under heavy load.
NAND flash operates best within a certain temperature range. It's still capable of accepting writes at the upper end of that spectrum, but endurance suffers. Even operating flash at high temperatures can cause issues with long-term reliability. NAND consumes power, so it does generate a small amount of heat. But most of an SSD's thermal energy comes from the controller. We look at the design to see if the manufacturer places the flash far enough away from its processor.
Several exciting changes will affect the client storage market this year. SSDs are on track to receive a higher-performance interface and simplified set of commands. At the same time, flash is growing up (literally). These advancements will divide the market. While lower-cost products rival mechanical storage on the value front, high-performance products will allow new applications to thrive.
The two buzzwords for 2015 are Non-Volatile Memory Express (NVMe) and 256Gb 3D NAND. NVMe is a set of commands that unbinds NAND from the limitations of the Advanced Host Controller Interface. AHCI was introduced as the register-level interface for SATA. When SATA was introduced, flash in the densities we have today wasn't on the horizon. Back then, hard drives were going to rule for decades. Of course, their mechanical nature capped performance, limiting the utility of deep queue depths. SATA capped native command queuing at a single queue of 32 commands (far more than a hard drive needed). NVMe increases this limit to 64,000 queues, and each queue can sustain up to 64,000 commands.
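The queueing arithmetic alone shows why NVMe matters for highly parallel flash:

```python
# AHCI/SATA exposes one queue holding 32 commands; NVMe allows up to
# 64,000 queues of 64,000 commands each (figures as cited above).
ahci_outstanding = 1 * 32
nvme_outstanding = 64_000 * 64_000
print(ahci_outstanding)  # 32 outstanding commands, total
print(nvme_outstanding)  # over four billion potential outstanding commands
```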
NAND flash is advancing, too. Improvements in manufacturing technology already enabled the first 3D V-NAND from Samsung. IMFT will follow with 3D flash by mid-2015, and it's rumored that we'll see 256Gb densities. Overnight, 1TB SSDs will turn into 2TB SSDs. The manufacturing costs should be equal once the dust settles, so your wallet won't suffer when it comes time to step up capacity.
Also on the flash front, expect more three-bit-per-cell flash, also referred to as TLC. Many of the charts in this editorial show an unbranded SSD under the SMI SM2256 name. This is an R&D board from Silicon Motion with a new controller that should hit the market in a few months. It's designed to support cheap TLC flash with P/E cycles as low as 500. Advanced LDPC algorithms are expected to extend low-cost flash life to the levels we enjoy today. So, by the end of 2015, 256GB SSDs may sell for as little as $50.
The new high-performance products will definitely require some tweaking to our test methods. But the low-cost stuff probably will as well. Faster storage is expected to hit ceilings limited by its host interface, PCI Express. Currently, that means a four-lane PCIe 3.0 link, or 32Gb/s. PCIe 4.0 isn't too far off either. LDPC will adapt to changing flash as the medium wears. If an error occurs, the controller goes back to reread the flash cells. This will increase latency.
We are already seeing the effects of low-cost TLC flash causing issues with performance loss in Samsung's 840 EVO. That drive's 1xnm NAND shows signs of read retries after just a few months of data at rest. If Samsung shuffles the information too often, the product won't satisfy its minimum warranty standards. It's a tough position to be in for the best-selling consumer SSD of all time. In the coming months, we'll roll out a new test for the condition in question. And because this is a living document, we'll fill you in first.