TRIM Testing: Our Suite Evolves Yet Again
Finally, I want to introduce a new test I've been working on: using JEDEC's 218A consumer workload trace to create a TRIM test. It's not a neatly-packaged little utility you can run at home. Rather, this is a test scripted in ULINK's DriveMaster 2012 software and hardware suite.
DriveMaster is used by most SSD manufacturers to build and run targeted tests. It's currently the only commercial product that can create the scenarios needed to validate TCG Opal 2 security, but its potential applications are almost unlimited. There are various hardware components associated with the platform, such as a SATA/SAS power hub that allows the benchmarked drive to be power-cycled independently of the platform. Much of the benefit tied to a solution like DriveMaster is its ability to diagnose bugs, ensure compatibility, and issue low-level commands. In short, it's very handy for the companies actually building SSDs. And if the off-the-shelf scripts don't do it for you, you can write your own. There's a steep learning curve, but the C-like environment and command documentation give you a fighting chance.
This product also gives us some new ways to explore performance. Testing the TRIM command is just the first example of how we'll be using ULINK's contribution to the Tom's Hardware benchmark suite.
The suite ships with some built-in scripts, but also includes its own scripting language for extensibility and customization. This particular test uses JEDEC's published master trace of consumer I/O activity (similar to our Tom's Hardware Storage Bench trace). The read commands are removed from the trace, leaving write, flush, and TRIM commands. After a secure erase and a pass of preparatory writes, the test commences. The trace is played against the drive four times: twice using DMA (with and without TRIM) and twice using NCQ (with and without TRIM). IOPS are measured and averaged every 100,000 commands.
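To make the two mechanical steps above concrete (dropping reads from the trace, then averaging IOPS over each block of 100,000 commands), here is a minimal Python sketch. It is not the DriveMaster script, which we can't reproduce here. The `Command` record, its field names, and the op labels are assumptions invented for illustration, and the sketch assumes each logged command carries a completion timestamp.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    # Hypothetical record layout, not the JEDEC trace format.
    op: str                # "read", "write", "flush", or "trim"
    completed_at_s: float  # completion time in seconds since playback began


def strip_reads(trace: List[Command]) -> List[Command]:
    """Keep write, flush, and TRIM commands; drop every read."""
    return [cmd for cmd in trace if cmd.op != "read"]


def windowed_iops(commands: List[Command],
                  window: int = 100_000,
                  start_time_s: float = 0.0) -> List[float]:
    """Average IOPS for each full block of `window` completed commands.

    A trailing partial block is ignored.
    """
    iops = []
    window_start = start_time_s
    for end in range(window, len(commands) + 1, window):
        window_end = commands[end - 1].completed_at_s
        iops.append(window / (window_end - window_start))
        window_start = window_end
    return iops
```

Using completion timestamps rather than summing per-command latencies matters for the NCQ runs, where several commands are in flight at once and their individual service times overlap.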
On a 256 GB drive, each iteration writes close to 800 GB of data, so one full pass through the JEDEC TRIM test suite generates almost 3.2 TB of mostly random writes (75% random and 25% sequential). Each run issues over 37 million write commands. If that sounds like a lot of storage traffic, it is.
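For a sense of scale, the figures above can be run through some back-of-the-envelope arithmetic. The sketch below treats "close to 800 GB" and "over 37 million" as exact round numbers and assumes decimal units (1 GB = 10^9 bytes), so the results are rough estimates rather than measurements. The function name and dictionary keys are made up for this example.

```python
def write_volume_summary(per_run_gb: float = 800.0,
                         runs: int = 4,
                         drive_gb: float = 256.0,
                         write_cmds_per_run: int = 37_000_000) -> dict:
    """Rough totals derived from the article's rounded figures."""
    gb = 10**9  # decimal gigabyte, an assumption
    per_run_bytes = per_run_gb * gb
    return {
        # Total written across all four runs, in TB
        "total_tb": runs * per_run_bytes / 10**12,
        # How many times each run overwrites the drive's full capacity
        "drive_fills_per_run": per_run_gb / drive_gb,
        # Mean payload per write command, in KB (1 KB = 1,000 bytes)
        "avg_write_kb": per_run_bytes / write_cmds_per_run / 1000,
    }


summary = write_volume_summary()
```

With these inputs, each run rewrites the drive's capacity a little more than three times, and the average write works out to roughly 22 KB.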
The first two tests employ DMA to access the storage, while the last two use Native Command Queuing. Since most folks don't use DMA with SSDs (aside from some legacy or industrial applications), we don't concern ourselves with the DMA results. It can take up to 96 hours to put one drive through all four runs, though faster drives can cut that time roughly in half. Because so much information is being written to an already-full SSD (the drive is filled before each test, and then close to 800 GB are written per iteration), SSDs that perform better under heavy load fare best. Without TRIM, on-the-fly garbage collection becomes a big contributor to high IOPS. With TRIM, 13% of space gets TRIM'ed, leaving more room for the controller to use for maintenance operations.
Here's the chart derived from our DriveMaster JEDEC TRIM test data. We have the 256 GB SanDisk X210, Samsung's venerable 840 Pro at 256 GB, and Crucial's more mainstream M500 (240 GB). Each device's NCQ-based test is plotted. The solid line represents average IOPS every 100,000 commands without TRIM. The dashed line represents average IOPS every 100,000 commands with TRIM. In each case, the workload is mixed in with tons of small, random writes.
It seems logical that adding TRIM would help (depending on when and how a drive prefers to incorporate TRIM functionality). But that's not quite what we see. Crucial's 240 GB M500 doesn't show much gain from the addition of TRIM; both runs hover under 2,000 IOPS. Samsung's 840 Pro enjoys substantial gains as the test drags on. By the end, the 840 Pro is 50% faster with TRIM.
SanDisk's 256 GB X210 is almost as quick as the 840 Pro and M500 combined, though. In this trace-based benchmark, it appears that nCache is in its element, and even without additional over-provisioning, the X210 is smoking-fast. But using TRIM seems detrimental to performance. Either the X210 is tuned to excel in environments that don't support TRIM, or it's fast enough that the overhead associated with TRIM hurts more than it helps. Either way, SanDisk crushes this test. Interestingly, the X210 and M500 share Marvell's storage processor. It's the differences in NAND and firmware that yield the gap we're measuring.
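The with-versus-without comparisons above come down to a relative change between two IOPS readings taken at the same point in the trace. A minimal sketch of that calculation follows. The function name is hypothetical, and the numbers in the usage note are placeholders rather than values read off our chart.

```python
def trim_gain_percent(iops_with_trim: float, iops_without_trim: float) -> float:
    """Percent change in IOPS when TRIM is enabled.

    Positive means TRIM helped; negative means it hurt.
    """
    return (iops_with_trim - iops_without_trim) / iops_without_trim * 100.0
```

For example, `trim_gain_percent(3000.0, 2000.0)` gives 50.0, the kind of gain the 840 Pro shows by the end of the test, while a drive that slows down with TRIM enabled, as the X210 appears to, would produce a negative result.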