Six SSD DC S3500 Drives And Intel's RST: Performance In RAID, Tested
1. Six SSD DC S3500 Drives, Three Configurations, All At 6 Gb/s

Intel hasn't launched any desktop-oriented drives based on its own SATA 6Gb/s controller yet. Those are reserved for some of the company's more enterprise-class SSDs. Both the SSD DC S3700 and S3500, which we reviewed in Intel SSD DC S3700 Review: Benchmarking Consistency and The SSD DC S3500 Review: Intel's 6 Gb/s Controller And 20 nm NAND, benefit from the processor's exceptional consistency, even if that latter model is better suited to read-heavy applications.

We're actually seeing an increasing number of drives similar to the SSD DC S3500, which are built to handle very specific workloads. Comparatively lower prices and solid performance in the applications they're designed for make up for the fact that all-around utility isn't a selling point. Given what you save on the initial investment, it's cheaper to simply tear the old drive out when it fails and pop in a new one. In certain environments that may never need the endurance of pricier enterprise-class storage, that's actually preferable to spending big on a high-endurance SSD from the outset.

If you're a well-informed IT manager and you know how your applications behave, then you're in a great position to buy exactly the right solid-state product without overspending or underperforming. And that's where the SSD DC S3x00-series drives come in. The aforementioned S3700 can have its entire capacity written to it 10 times a day and still last through its five-year warranty period. Plus, you pay a fraction of the price of SLC memory thanks to the company's carefully-binned HET-MLC flash. Intel's follow-up, the S3500, is architecturally similar to the S3700, but much cheaper.

To make the S3500 more affordable, Intel replaces the S3700's special 25 nm NAND with more desktop-oriented 20 nm MLC. HET-MLC is graded relentlessly, so it enjoys superior endurance and wear characteristics compared to the compute-grade flash found on drives like the SSD 530. But all of that work adds cost. In comparison, the more common 20 nm stuff in Intel's SSD DC S3500 isn't designed to shoulder write-heavy workloads. However, you pay a lot less per gigabyte of capacity. A 180 GB SSD 530 sells for close to the same price as a 180 GB SSD DC S3500.

Not that you'd want to use them interchangeably. The S3500 isn't optimized for low power consumption, nor is it available in three different form factors. In desktop and mobile segments, Intel continues leaning on LSI's SandForce controllers. Intel's SSD DC S3500 sports power loss protection, end-to-end data protection, and encryption, which are features that matter more in a business. Of course, that enterprise space is where you'd expect to find multiple SSDs being used together, either to multiply performance, to back each other up, or to increase capacity. And now that Intel's Haswell architecture is available in the Core and Xeon families, we have a new crop of platform controller hubs available with six native SATA 6Gb/s ports.

It just so happens that the company lent us six 480 GB SSD DC S3500s for testing. We already have a good idea what one can do on its own, based on our review. What happens when you sling six of them together, hooked right up to one of Intel's PCHs, though?

2. Our Haswell-Based Storage Platform: ASRock C226 WS and Xeon E3-1285 v3

Intel's Lynx Point PCH showed up on the desktop as the 8-series and in server/workstation spaces as the C220-series. It's a lot like the core logic that preceded it, except for its more modern storage controller and additional USB 3.0 connectivity.

The CPU interfaces with the PCH through four second-gen PCIe lanes referred to as the DMI, enabling roughly 2 GB/s of bandwidth in each direction. That's where Lynx Point's sextet of 6 Gb/s ports live and breathe. At least on paper, six capable SSDs will probably be constrained by that vital link.
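Here's the back-of-the-envelope math in a few lines of Python. The per-lane figure is the theoretical PCIe 2.0 rate and the per-drive figure is a round number for a fast 6 Gb/s SSD; real-world throughput always lands lower:

# Back-of-the-envelope DMI math (theoretical figures; measured numbers run lower)
PCIE2_LANE_MBPS = 500          # ~500 MB/s per second-gen PCIe lane, per direction
DMI_LANES = 4                  # Lynx Point's DMI is four lanes wide
SSD_SEQ_READ_MBPS = 500        # nominal sequential read for one 6 Gb/s SSD

dmi_ceiling = PCIE2_LANE_MBPS * DMI_LANES            # ~2000 MB/s per direction
for drives in range(1, 7):
    aggregate = drives * SSD_SEQ_READ_MBPS
    limited = min(aggregate, dmi_ceiling)
    print(f"{drives} drive(s): {aggregate} MB/s requested, ~{limited} MB/s through the DMI")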

And so as we explore performance, we'll keep the platform's artificial ceiling in mind, too.

Not just any LGA 1150-equipped motherboard will work for what we want to do. Not every one features all six 6 Gb/s links, to begin with. It'd also be helpful to have a third-party storage controller on-board for boot and optical drives, saving Intel's native connectivity for testing.

ASRock sent along its C226 WS, which does exactly what we need it to. In fact, we get a massive 10 SATA 6Gb/s-capable ports, six from Intel's PCH and four from a pair of Marvell 9172 controllers.

The board also employs a text-based UEFI, full support for Intel's virtualization features, and a good mix of PCI Express slots. It's also a natural fit for Haswell-based Xeon E3-1200 v3-series CPUs. From there, we need the right model. Dual-core processors don't have the muscle for wailing on SSD arrays, so you're left with the quad-core chips.

We're using the Xeon E3-1285 v3, with four cores and Hyper-Threading support. A base clock rate of 3.6 GHz jumps up to 4 GHz via Turbo Boost when the thermal headroom allows. Going the Xeon route means giving up overclocking. However, ECC-capable memory support and HD Graphics P4700 are more workstation-oriented features anyway. The top-end Xeon boasts 8 MB of shared L3 cache and an 84 W TDP.

What About The Operating System?

Windows isn't always the best environment for benchmarking storage with the potential for high I/O performance. Linux is really preferable, and not just because it offers so much flexibility. You also have more efficient I/O schedulers (and more options for configuring them). That doesn't mean you'll always see different results from Linux; you just might not need as much processing horsepower to get there. Small-block random workloads can fully load a modern quad-core CPU, so efficiency is always a boon.
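For instance, on a typical Linux box you can see (and change) the scheduler per block device through sysfs. A quick illustrative sketch, assuming conventional /sys/block paths and sd* device names:

# Hypothetical example: list the active I/O scheduler for each SATA block device via sysfs.
# Paths assume a typical Linux sysfs layout; adjust device names for your own system.
import glob

for path in glob.glob("/sys/block/sd*/queue/scheduler"):
    with open(path) as f:
        # The active scheduler appears in brackets, e.g. "noop deadline [cfq]"
        print(path.split("/")[3], f.read().strip())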

With that said, our many days of testing RAID using Intel's Rapid Storage Technology software all happened in Windows (not just Windows 7, but also the 8.1 Preview and a pair of Server releases). At the end of our experimentation, we settled on Windows 7 though. As of right now, I/O performance doesn't look as good in the latest builds of Windows.

There is one situation where we're publishing results from CentOS 6.4 instead. This is an enterprise-grade Linux distro that simplifies some of the variables that are harder to manage in Windows. Largely to make life easier, the last page of testing is all Linux-based.

Test Hardware
Processor: Intel Xeon E3-1285 v3 (Haswell), 22 nm, 3.6 GHz, LGA 1150, 8 MB shared L3, Turbo Boost enabled
Motherboard: ASRock C226 WS, ATX workstation, BIOS rev. 1.00
Memory: Crucial Ballistix Sport 16 GB (2 x 8 GB) DDR3-1600, 1.5 V
System Drive: Crucial M500 120 GB, SATA 6Gb/s, Firmware MU02
Benchmarked Drives: 6 x Intel SSD DC S3500 480 GB, SATA 6Gb/s, Firmware 0306
Graphics: Intel HD Graphics P4700
Power Supply: Seasonic X-650, 650 W, 80 PLUS Gold
Chassis: Lian Li Pitstop
RAID Controller: LSI 9266-8i, PCIe Gen2 x8, FastPath and CacheCade AFK, Firmware 3.270.65-2578
HBA: LSI 9207-8i, PCIe Gen3 x8

System Software and Drivers
Operating System: Windows 7 x64 Pro SP1
DirectX: DirectX 11
Drivers: Graphics: Intel 9.18.10.365; RST 12.6.1033; IMEI 9.0.0.1287; Generic AHCI: MSAHCI.SYS; Marvell 6Gb/s: 1.2.0.1032

Benchmarks
Tom's Hardware Storage Bench v1.0: trace-based
Iometer 1.1.0: two workers; 4 KB random (LBA = 100%) at varying queue depths; 128 KB sequential; 4 KB random with exponential queue depth scaling
FIO 2.0.14
3. Intel Rapid Storage Technology Gets More Useful With RAID

Because we're benchmarking under Windows, we get some help from Intel's RST software. Rapid Storage Technology is both an AHCI and RAID driver, now facilitating the creation of RAID volumes directly. Yes, that's a fairly standard feature available from almost every hardware-based solution. But it's notable progress from the ever-evolving RST package.

If you're booting from a RAID array, you typically have to build the volume from the controller's firmware. Otherwise, you can do it in Windows for ease of use.

In the past, we would have built our array through Windows' Administrative Tools module in the Control Panel. This is even simpler through RST, though. First, you choose a volume type. RAID 0 yields maximum performance and usable capacity. RAID 1 enables mirroring, or what Intel refers to as real-time data protection. The RAID 5 option facilitates data protection as well, but makes more efficient use of capacity by reserving one drive's worth of capacity for parity information. Finally, RAID 10 balances performance and protection by striping and mirroring.
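To put those trade-offs in perspective, here's a small Python sketch of the idealized capacity math for each volume type (it ignores metadata overhead and assumes identical drives):

# Idealized usable-capacity math for the four RST volume types.
# Ignores metadata overhead; assumes identical drives.
def usable_capacity_gb(level, drives, per_drive_gb):
    if level == "RAID 0":                 # striping: all capacity, no redundancy
        return drives * per_drive_gb
    if level == "RAID 1":                 # mirroring: two drives, one copy
        return per_drive_gb
    if level == "RAID 5":                 # one drive's worth of capacity goes to parity
        return (drives - 1) * per_drive_gb
    if level == "RAID 10":                # striped mirrors: half the raw capacity
        return drives // 2 * per_drive_gb
    raise ValueError(level)

for level, n in [("RAID 0", 6), ("RAID 1", 2), ("RAID 5", 6), ("RAID 10", 6)]:
    print(level, n, "x 480 GB ->", usable_capacity_gb(level, n, 480), "GB usable")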

From there, you pick the drives to include, and a button finalizes the process. An Advanced tab lets you pick strip size and specify the array's capacity.

Because the hardware isn't particularly feature-rich, there aren't a ton of configuration options, and we're alright with that. We would prefer keying in the array's capacity manually though, rather than through a slider that makes precise selections difficult. With the creation process behind us, the Status tab shows six 447 GB SSDs hanging off the PCH in a RAID 5 configuration, and that it's in the process of initializing.

Of course, most folks will skip RAID 5 when they're using SSDs. RAID 0 offers something far more tangible with six 6 Gb/s ports: TRIM support. Intel's SSD Toolbox Optimizer works in Windows 7 too, with Microsoft's built-in Optimizer for Windows 8/8.1 taking its place in newer versions of the operating system. Should you wish to break an array or secure erase its member drives, remember that Windows 8 and Intel's Toolbox aren't compatible.

Overall, Rapid Storage Technology makes storage management easier than before. Cache and Dynamic Storage Accelerator settings are adjustable with a single click. The former is particularly important, since you probably want to avoid caching with an SSD-based RAID array. The latter is also interesting. Dynamic Storage Accelerator is a Windows 7-oriented feature that helps mitigate the deleterious performance tendencies of a platform hellbent on power efficiency. When I/O activity is detected, system power settings are adjusted on the fly to coax more performance from the storage subsystem, giving Windows 7 users on Haswell-based platforms the option to keep power-saving settings enabled without giving up any speed.

4. Results: JBOD Performance

The first thing we want to establish is how fast these SSDs are, all together, in a best-case scenario. To that end, we'll test the SSDs in a JBOD (or "just a bunch of disks") configuration, exposing them to the operating system as individual units. In this case, we're using the C226 WS's six PCH-based SATA 6Gb/s ports. Then we test each drive in Iometer independently by using one worker per SSD. In this way, we catch a glimpse of maximum performance in RAID, without the losses attributable to processing overhead.

That's all well and good, but what do we actually learn? We basically establish a baseline. Do we hit a ceiling imposed by the platform's DMI? Does this limit sequential throughput or random I/Os? This is the optimal performance scenario, and it lets us frame our discussion of RAID across the next several pages.

Sequential Performance

Right off the bat we see that the C226's DMI link restricts the amount of throughput we can cram through the chipset. In theory, each second-gen PCIe lane is good for about 500 MB/s. Practically, that number is always lower.

With that in mind, have a look at our bottleneck. With one, two, and three drives loaded simultaneously, we see the scaling we expect, which is a little less than 500 MB/s on average. Then we get to four, five, and six drives, where we hit a roof. That's right around 1600 MB/s for reads. Really, we weren't expecting much more, given a peak 2 GB/s of bandwidth on paper.

The write results are similar, though the ceiling drops even lower. With four, five, and six drives churning at the same time, we get just over 1200 MB/s. Fortunately, most usage scenarios don't call for super-high sequential performance (even our FCAT testing only requires about 500 MB/s for capturing a lossless stream of video at 2560x1440).

Random Performance

A shift to random 4 KB performance is informative, involving more transactions per second and less bandwidth. One hundred thousand 4 KB IOPS translates into 409.6 MB/s. So, when total bandwidth is limited (as it is today), we won't necessarily take it in the shorts when we start testing smaller, random accesses. Put differently, 1.6 GB/s worth of read bandwidth is a lot of 4 KB IOPS.

Sure enough, benchmarking each S3500 individually demonstrates really decent performance. With a single drive, we get up to 77,000 read IOPS with this particular setup. It's still apparent that the scaling isn't perfect, though. If one SSD gives us 77,000 IOPS, six should yield 460,000. Even so, performance still falls in the realm of awesome as six drives enable 370,000 IOPS.

But wait. Remember when I said that we shouldn't be throughput-limited during our random I/O testing? I lied. When you do the math, 370,000 IOPS is more than 1.5 GB/s. So, it's probable that more available bandwidth would yield even better numbers.

Other factors are naturally at work, too. It takes a lot of processing power to load up six SSDs the way we're testing them. Each drive has one thread dedicated to generating its workload, and with six drives we're utilizing 70% of our Xeon E3-1285 v3 to lay down the I/O. The CPU only has four physical cores though, so there could be scheduling issues in play as well. Regardless, the most plausible explanation is that the chipset's DMI is too narrow for our collection of drives running all-out. 

Moving on to 4 KB random writes, we get more of the same. One 480 GB SSD DC S3500 gives us a bit more than 65,500 IOPS. All six running alongside each other push more than 311,000 IOPS.

We now have some important figures that'll affect the conclusions we draw through the rest of our testing, and there are definitely applications where this setup makes sense. If you're building a NAS using ZFS, where each drive is presented individually to the operating system, this is an important way to look at aggregate performance. Of course, in that environment, it'd be smarter to use mechanical storage. Our purpose is to tease out the upper bounds of what's possible. Let's move on to the RAID arrays.

5. Results: RAID 0 Performance

RAID 0 offers scintillating speed if you're willing to compromise on reliability. Striping also yields the best capacity. In our case, that's almost 3 TB of solid-state storage presented to the operating system as a single logical volume. Just because we get 100 percent of each SSD's capacity doesn't mean we can harness 100 percent of its performance, though.

That's especially true of Intel's implementation, since all of the RAID calculations are performed on host resources. On a dedicated RAID card, you have a discrete controller offloading the storage load. I'm simply hoping that a modern Haswell-based quad-core CPU won't have any trouble feeding six 6 Gb/s SSDs.

The test methodology differs from what we did on the previous page. Since RAID 0 requires at least two drives, we create a two-drive array, resulting in one logical volume. Then we alter the workload. For JBOD testing, we used one thread to load each drive with I/O. This time, we use two workers on the logical drive, scaling from a queue depth of one to 64. Two threads applying one command at a time gives us a total I/O count of two. Or, if we push a queue depth of eight through each thread, we get an aggregate queue depth of 16. One thread tends to bog down under more intense workloads. So, multiple threads are used to extract peak performance. But because we want the comparison to be as fair as possible, we're using two threads for each config.
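Put concretely, the aggregate outstanding I/O count is just workers multiplied by per-worker queue depth. A trivial sketch of the ladder we step through:

# Aggregate outstanding I/O = workers x per-worker queue depth (our RAID methodology)
WORKERS = 2
for qd_per_worker in [1, 2, 4, 8, 16, 32, 64]:
    print(f"{WORKERS} workers x QD {qd_per_worker:2d} = {WORKERS * qd_per_worker:3d} outstanding I/Os")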

Sequential Performance

Without question, sequential speed is the most obvious victim of a controller that doesn't have enough bandwidth to do its job. Compare the single-drive performance to many drives in our read and write tests for a bit of perspective. Switching over to this chart format makes it easier to illustrate the throughput ceiling, so bear with the variety in graphics for a moment.

Starting with reads, one SSD tops out at 489 MB/s. Two striped drives yield 925 MB/s, and three deliver up to 1300 MB/s. After that, striping four or more SSD DC S3500s on the PCH's native ports is mostly pointless. Instead, you'd want a system with processor-based PCI Express and a more powerful RAID controller to extract better performance.

The same goes for writes, which top out at a maximum of 1282 MB/s. With this test setup, one SSD DC S3500 writes sequentially at an impressive 92% of its read speed. That drops to 82% with four or more drives in RAID 0.

Random Performance

So, if total throughput is limited, diminishing the fantastic sequential performance of these drives, switching to small random accesses should help us get our speed fix, right? Yes and no. As I already mentioned, 100,000 4 KB IOPS is just under 410 MB/s. Theoretically, we should be able to push huge numbers under that vicious bandwidth ceiling. Except, small accesses really tax the host processor. Even achieving such an aggressive load can be difficult, since each I/O creates overhead.

On the previous page, we used one workload generator on each drive. For everything else, we need to even the playing field as much as possible, which takes us to places where choosing two threads to test each RAID configuration yields sub-optimal results. We could use trial and error to figure out which setup maximizes performance with each combination of drives. However, that'd cause our test with four SSDs in RAID 5 to differ from the same number of drives in RAID 0, which isn't what we want to do. So, we need to compromise by finding a setup that works well across the board and stick to it.

Sure enough, we fall short of the potential we saw during our JBOD-based tests, which is what we expected. Two SSDs in RAID 0 deliver more than 150,000 IOPS. From there, scaling narrows quite a bit, and the four-, five-, and six-drive arrays don't improve upon each other at all. But that's not their fault. When we do the math, 300,000 4 KB IOPS converts to more than 1.2 GB/s. Factor in our workload setup and platform limitations; we're getting about as much performance from these drives as we can hope for.

On the left side of the chart, all of the tested configurations are pretty similar. Increasing the number of drives in the array doesn't do a ton. Though, even with two drives, 50,000 I/Os with two outstanding commands isn't a bad result. Perhaps the best part is that, with two drives, you can get almost 100,000 IOPS with four commands outstanding. Compare that to one high-performance desktop-oriented drive that only sees those numbers with a queue depth of 32. The RAID solution just keeps on scaling as the workload gets more intense.

Just like in the graphics world, multiplying hardware doesn't guarantee perfect scaling. JBOD testing helps show that one drive's performance doesn't affect another's. In RAID, however, the slowest member drive is the chain's weak link. If we were to remove an SSD DC S3500 and add a previous-gen product, the end result would be six drives that'd behave like that older drive.

What does that mean when our drives are working together in RAID? Well, not every drive is equally fast all of the time. If the workload is striped across them, then one drive may be slightly slower just then. Latency measurements tell the tale, with the briefest 1% of the I/O during a small slice of time perhaps taking 20 millionths of a second to complete, while at the other end of the spectrum taking 20,000 µs for a round-trip operation. Most of the time, average latency is fairly low, so most RAID arrays employing identical SSDs are well-balanced.
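If you log per-I/O completion times, pulling out those extremes is simple percentile math. A minimal sketch, using made-up synthetic latencies rather than our measured data:

# Minimal percentile summary of a latency trace (microseconds; sample data is synthetic)
import random
random.seed(0)

latencies_us = [random.lognormvariate(4.5, 0.8) for _ in range(100_000)]  # stand-in for a real trace

def percentile(data, pct):
    data = sorted(data)
    idx = min(len(data) - 1, int(round(pct / 100 * (len(data) - 1))))
    return data[idx]

for pct in (1, 50, 99, 99.9):
    print(f"p{pct}: {percentile(latencies_us, pct):,.0f} us")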

This is one reason why consistency matters. If a service depends on its storage subsystem to deliver low-latency performance, big spikes in the time it takes to service I/O can really affect certain applications. Some SSDs are better than others in that regard, which is one reason the SSD DC S3700 reviewed so well. In contrast, the S3500 was created for environments where high endurance and tight latencies take a backseat to value.

The story is the same for writes. Given six SSDs, you can get up to 276,000 4 KB IOPS. Otherwise, scaling slows way down as you stack on additional drives. As with reads, the performance at lower queue depths is great, and all configurations build momentum until a queue depth of 32, where they all seem to level out. Remember that a SATA drive with NCQ support only has 32 positions in which to store commands.

6. Results: RAID 5 Performance

RAID 5 can sustain the failure of one drive. RAID 6 (which Intel's integrated controller does not support) will keep an array up even after two failures. Of course, if you build a volume using SSDs, RAID 5 will cost you one drive's worth of capacity. Using a trio of 480 GB SSD DC S3500s, losing a third of the configuration hurts. Giving up one drive out of six is less painful. But as you add storage to the array, the percentage of capacity lost to parity goes down.
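That single-failure tolerance boils down to XOR parity. Here's a toy Python sketch (not Intel's implementation, just the principle) showing how one lost chunk in a stripe is rebuilt from the survivors:

# Toy RAID 5 parity: P = D0 xor D1 xor ...; any one missing chunk is rebuilt from the rest
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]          # three data chunks in one stripe
parity = xor_blocks(data)                   # what gets written to the parity location

lost = data[1]                              # pretend the second drive died
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == lost
print("rebuilt chunk:", rebuilt)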

Typically, writing the extra parity data also means that performance drops below single-drive levels, particularly without DRAM-based caching. Intel's Rapid Storage Technology relies on host processing power and not a discrete RAID controller, but it can help speed up writes substantially (particularly sequentials), depending on how caching is configured. With that said, enabling caching is far more helpful on arrays of mechanical disks. Why? Random writes are literally hit or miss. If data is in the array's cache, it's serviced at DRAM speeds. If not, latency shoots up as the I/O is located elsewhere. That penalty doesn't affect most hard drive arrays, but it slows down SSDs more dramatically.

It's somewhat of an issue, then, that read and write caching cannot be fully disengaged with Intel's RST in RAID 5. For more potential speed, disk drives would read data near a requested sector (on the way to the needed data) and toss that information in a buffer with the hope that, should it be called upon, it'd be ready more quickly. That's plausible in a sequential operation, but a lot less so when it comes to random accesses. As it happens, this same principle applies to RAID arrays. Write caching can be disabled for data security reasons, but RST always has some form of read-ahead enabled, passing data along to a RAM buffer on the host for use later. A hard drive's buffer typically holds this information, but a RAID setup passes the data along to the controller.

What does that end up looking like? Over the last two pages, I've tried to drive home that you have up to about 1.6 GB/s of usable throughput with Intel's PCH. Now, consider this chart:

With the previously discussed read-ahead behavior passing un-requested, adjacent data along to the host system's memory, sequential reads enjoy a significant boost, but only while the DRAM cache lasts. As you can see, read speeds from a three-drive RAID 5 array reach a stratospheric 3 GB/s. The data isn't coming from the SSDs, but rather our DDR3. The caveat is that getting such a notable boost requires significant drive utilization. It's a lot easier to stack commands on a hard drive. But it's a lot more challenging to do this in the real world on an SSD, since they service requests so much faster.

7. Results: A Second Look At RAID 5

Sequential Performance

Now that we've illustrated the read-ahead tendencies of Intel's RST, it's easier to put RAID 5's sequential scaling in context. The journey begins at 1.5 GB/s, which is right where we'd expect given the chipset's DMI link. But as the outstanding command count increases, read caching kicks into gear, pushing each array up to 3 GB/s. Depending on how long each queue depth is measured, it's probable that a longer benchmark would average down the caching effect somewhat.

It's also interesting that the three- and four-drive arrays start caching and peak earlier, while the five- and six-drive setups hit their ceilings later.

Write performance is traded off for RAID 5's data protection. Nevertheless, Intel's built-in SATA controller can achieve decent results, even if they're not always consistent.

The four- and six-drive setups deliver the strange results in this benchmark. Notice how much faster three SSDs are than four. Stranger still, the six-drive array sneaks up on five SSDs, though it falls short at low and high command counts.

Random Performance

This is what the random performance in a RAID 5 configuration looks like scaling from an OIO count of 2 through 128.

Nothing like the first two charts, right? With three, four, five, and six drives in RAID 5, every run looks the same. Adding more drives doesn't affect read IOPS in this setup. We could probably experiment with synthetic workloads to amplify any small difference between the arrays, but that's a significant challenge. Since we aren't seeing any scaling, it's possible that we're simply dealing with a bottleneck not necessarily related to throughput or host performance.

The same situation surfaces with 4 KB writes. From a total outstanding I/O count of 2 to 128, performance starts far lower than a single drive and ends up there, too. Ten-thousand IOPS is approximately one-third of an SSD DC S3500's specification at a queue depth of one. Random writes will always present a challenge in RAID 5, though. It's just hard to write small chunks of data to random addresses while managing parity data.

8. Mixing Block Sizes And Read/Write Ratios

So far we've only tested simple workloads, including 4 KB random access and sequentially-organized data. A storage device can perform read or write operations, so a 70/30% split really just means that 70 out of every 100 I/O operations were reads, while the other 30 were writes. Most of the time, workloads aren't read- or write-only, though. Neither do they consist of just one or two block sizes. Instead, a majority of workloads are a mix of reads and writes, with block sizes between 512 bytes and 1 MB.

And just in case you missed the disclaimer on the second page, the testing we're doing right here happens under Linux (CentOS 6.4 and FIO, to be exact). If you look carefully, you'll notice that performance, overall, appears higher than the Windows-based benchmarking.

Mixing Reads, Writes, And Block Sizes

It's good to measure IOPS at different block sizes, with a mix of reads and writes. We know that serving up a ton of 4 KB blocks is more difficult than 128 KB chunks of data, typically because achieving the same throughput with smaller accesses involves a lot more overhead. The relationship between IOPS and bandwidth is pretty simple though:

Block Size: Bandwidth at 1000 IOPS
4 KB (4096 B): 4000 KB/s
8 KB (8192 B): 8000 KB/s
16 KB (16,384 B): 16,000 KB/s
32 KB (32,768 B): 32,000 KB/s
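In code form, here's the same relationship. This trivial sketch reports binary kilobytes per second to match the table above:

# The IOPS-to-bandwidth relationship used throughout this piece
def iops_to_kbps(iops, block_bytes):
    return iops * block_bytes / 1024         # KB/s (binary kilobytes)

print(iops_to_kbps(1_000, 4096))             # 4 KB blocks  -> 4000 KB/s
print(iops_to_kbps(10_000, 8192))            # 8 KB blocks  -> 80,000 KB/s (~78 MB/s after another /1024)
print(iops_to_kbps(100_000, 4096))           # 4 KB blocks  -> 400,000 KB/s (409.6 MB/s in decimal megabytes)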

If an imaginary SSD maxes out with 10,000 8 KB IOPS, then a 16 KB workload will probably register around 5,000. The amount of bandwidth stays the same at 80,000 KB/s, or around 80 MB/s (we'd actually have to divide the KB/s figure by 1024 to get real MB/s). With that in mind, check out the next chart:

We're exposing all six SSD DC S3500s in RAID 0 to one minute of various block sizes and read/write patterns, and then showing the results in IOPS. Check out the 4 KB line in purple. It starts at 100% read and ends at 100% write, hitting five blends in between. We do that for eight block sizes at seven read/write ratios for a total of 56 data points. Each one is the average IOPS generated over one minute, so it takes less than an hour to benchmark the array this way. The paradigm is similar to our earlier testing with two threads. This time, though, each thread generates a queue depth of 32, yielding a total outstanding I/O queue of 64.
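For reference, here's a quick sketch of that test grid. Only the endpoints (512 bytes to 1 MB, 100% read to 100% write) and a few named sizes come from the text; the intermediate block sizes and mix steps shown here are our assumption:

# The 8 x 7 FIO grid described above: one minute per cell, 56 cells in total.
# Intermediate block sizes and read percentages are assumptions, not the article's exact list.
block_sizes = ["512b", "4k", "8k", "16k", "32k", "64k", "128k", "1m"]
read_percentages = [100, 95, 65, 50, 35, 5, 0]

cells = [(bs, rd) for bs in block_sizes for rd in read_percentages]
print(len(cells), "runs x 1 minute =", len(cells), "minutes of testing")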

You'll notice that the yellow line, which represents 512-byte blocks, begins at 350,000 IOPS and drops significantly before gaining some ground back as the workload switches to primarily write operations. That's actually common for 512-byte I/O, which are less than optimal for flash-based solutions that prefer 4 KB aligned accesses. A 512-byte access is, by definition, misaligned. So, most SSDs and RAID arrays choke on them to a degree.

Next, the 4 KB line (in purple) begins at 300,000 IOPS, and falls gradually until we push a 5/95% mix, where it edges up ever so slightly again. Most SSDs don't write as quickly as they read, so it's natural to expect that line to slope downward.

Last, we'll look at where each block size lands in the hierarchy.

Instead of showing IOPS, this chart presents the data in bandwidth form. The smaller block sizes generate less bandwidth and consume more CPU resources to create.

Getting back to the yellow line, 350,000 512-byte IOPS sound impressive, but that works out to less than 180 MB/s. It's not until we hit 32 KB accesses that the array starts touching its bandwidth limit around 1600 MB/s.

This radar graphic is just another way to visualize the data. Starting at the top with 100% reads, the various access blends get more write-heavy traveling clockwise. The 512-byte blocks are in the middle, pushing through the least amount of bandwidth, giving us that tiny yellow bulls-eye. The other block sizes form concentric rings (if you squint). Note that 0% read (or 100% write) is the lowest for all access sizes, while the middle 35/65% to 65/35% mix yields the most bandwidth in the larger blocks.

9. Intel's Integrated Storage: New Hardware, New Software

Even enthusiasts are unlikely to round up six identical SSDs and drop them on a Z87- or C226-based motherboard. But it's nice to know that Intel makes this possible. While there were a lot of folks who felt stifled by the fact that previous-gen platform controller hubs only included two 6 Gb/s-capable SATA ports, a majority still stick with one SSD and one or more hard drives. That's the exact blend of storage we'd recommend, too. 

The latest platforms do support six SATA 6Gb/s ports though, so it is possible to populate them with high-performance storage and realize some very impressive numbers from them. And why not? SSDs continue sliding in price. A sextet of 256 GB drives for under a grand would be pretty mind-blowing in a very high-end machine. Intel's integrated controller supports RAID 0, 1, 5, 10, so you have options for more throughput and data protection, depending on what you need. 

Why not just drop a RAID controller into your Haswell-powered machine? For one, RoCs on add-in cards are not cheap. You could easily double the cost of your storage subsystem by going that route. Then there's the fact that any slot you use attached to the PCH is bound by the same DMI link, limiting bandwidth. Utilizing the processor's PCI Express connectivity automatically peels off at least eight lanes. In a small business server without discrete graphics, that's fine. But in a workstation where you're using a FirePro or Quadro card, it might not be. Just something to keep in mind. Besides, Intel's Rapid Storage Technology offers better performance at low queue depths, whereas high-end RAID cards are typically optimized for more outstanding commands.

ASRock's C226 WS is a great complement for a resilient file system like ZFS, which prefers when drives are presented directly to the operating system, rather than organized in a hardware-based RAID array. Software then manages striping, mirroring, and parity on its own. With six ports from Intel's PCH and four from Marvell controllers, that gives you a lot of options.

Based on our benchmark data, it looks like the best way to use Intel's integrated storage controller is with three high-performance SSDs. That's how you'll get the best bang for your buck in terms of performance scaling. With three 6 Gb/s ports open, you can then add a trio of hard drives for maximizing user data in RAID 5. Wait for 3 TB drives to go on sale, snag three trustworthy SSDs for RAID 0, and enjoy the best of performance, capacity, and value, all without the worry of a bottleneck along Intel's DMI. 

The other good news is that, with multiple SSDs bumping and grinding on a Haswell-based system, you have support for the TRIM command in RAID 0. That's a big bonus. Intel's SSD Toolbox doesn't work under Windows 8 or Server 2012, but the new Windows Optimizer does (which is somewhat like the Intel Toolbox's forced-TRIM optimization).

Of course, Intel's SSD DC S3500 is a sexy beast in its own right, destined for more important roles than kicking out big numbers on a built-in storage controller. Although it doesn't match the S3700's endurance or consistency, it was designed to fill a different role. The S3500 employs the compute-grade flash that goes into Intel's latest desktop-oriented drives, meaning it isn't meant to take writes all day long. Rather, it's best suited to applications that push a ton of reads. In that environment, it does its job well, and for significantly less money than beefier SSDs with HET-MLC, eMLC, or SLC NAND.

The SSD DC S3500s are also completely excellent in RAID. So much so, in fact, that I'm working on a follow-up to this piece. That's a story for another day, though.