Results: 4 KB Random Performance Scaling In RAID 0
Thread Count vs. Queue Depth
If we were just testing one SSD, we could simply step through increasing queue depths. One system process would stack commands, creating various QDs. That works well for SATA-enabled SSDs, particularly since there isn't much point to testing them at queue depths greater than 32. That all goes out the window when it comes to testing PCIe-based SSDs and RAID arrays. Eventually, workload generators begin running out of steam at higher QDs with only a single thread. Instead, we have to run multiple threads at multiple QDs to extract maximum performance.
It helps to have a visual representation this dynamic in action, and that's exactly what this chart represents. We start with a queue depth of one using one thread, and eventually arrive at 32 simultaneous threads each bombarding the array at a QD of 32. With just one thread, a 4K random write peters out just north of 100,000 IOPS. If we throw 32 threads at the array, we can beat 1,000,000 IOPS. That's why we're locking the thread count down at 32 and varying queue depth from here on out.
4 KB Random Read Performance Scaling in RAID 0
One thing you definitely want to see in a RAID array is scaling. As the number of drives increases, we want to observe a commensurate increase in performance. Then, as we apply a more demanding workload, we want performance to rise in tandem with intensity. That shouldn't be too hard to accomplish in RAID 0. After all, there isn't much overhead, and no pesky fault tolerance to slow us down.
Right out of the gate, we can glean a few tidbits from this chart. First, getting to 1,010,755 IOPS with a 24-drive RAID 0 array is extremely satisfying on a personal level. That's almost exactly 4 GB/s of throughput. Second, the gulf between 8 x SSD DC S3700s and 16 drives is enormous. It's almost 100%, or 400,000 IOPS. That's also the same percentage increase from four to eight drives. The bump from 16x to 24x should be close to 50%, but we're clearly hitting a bottleneck, achieving a comparatively modest 25% increase over the 16-drive array.
That's still excellent scaling in the grand scheme of things. It's completely unreasonable to expect exactly 4x/8x/16x/24x the performance of one drive. When we divide our peak 24-drive random read results by the number of member SSDs, each SSD DC S3700 contributes an astounding 42,114 IOs every second. A single 800 GB S3700 is rated for 76,000 4 KB read IOPS. So, at first glance it looks like we're losing a ton of performance. But considering the realities of scaling, we're still happy if we only see half of that peak. The fact that we get 42,000 IOPS/drive with 24 SSDs and 50,000 with 16 or less attached is pseudo-miraculous.
If your application requires tons of flash and you can supply a beefy workload, this is a viable option. Of course, large RAID 0 arrays expose you to a significant risk of failure over a period of years. But being responsible is rarely this fun.
4 KB Random Write Performance Scaling in RAID 0
When we switch to writes, we get more of the same awesomeness.
The fantastic scaling is still apparent, and the parity between read and write performance yields results just a few percentage points away from our read numbers (that is, except for the four-drive array, which doesn't lose anything compared to the read results).
This is a good place to point out that these are basically out-of-the-box scores. We haven't put enough writes on the SSDs to get them into a state where garbage collection is necessary. Limitations on the amount of time Intel could let us keep these things made drawn-out write sessions impractical. However, these 800 GB flagships are steady-state rated at 36,000 IOPS. So, we wouldn't expect a massive performance drop. The 24-drive array already peaks north of 980,000 IOPS, giving us around 41,000 write IOPS per SSD. We could conceivably rip off 860,000 4 KB write IOPS for years with a 24-drive RAID 0 array. How cool is that?
If these were conventional desktop drives, we'd want to over-provision them in an array. Without TRIM, performance could be expected to degrade substantially, and the only preventative measure would be sacrificing usable capacity to maintain speed over time. Intel's SSD DC S3700 line-up, and most other enterprise-oriented SATA drives, are over-provisioned for a reason. In the 800 GB model's case, only 745 GB of 1024 GB is usable. In exchange, though, you get consistent steady-state performance. Consequently, this also translates into lower write amplification, hence the 10 full random drive writes per day over five years that Intel claims the S3700s can endure.