Z97 Express: The Same Old Bandwidth Limitations

A 1400 MB/s SSD: ASRock's Z97 Extreme6 And Samsung's XP941
By Christopher Ryan

Not surprisingly, bandwidth between the Z97 Express platform controller hub and the host processor is limited by Intel's DMI interface, which is based on PCI Express 2.0. That connection won't be updated to third-gen transfer rates until Skylake, still two generations away. But Intel's mainstream desktop chipset doesn't just need the bandwidth advantages of PCIe 3.0; it could also really benefit from more lanes than the eight it currently offers.
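
For a sense of scale, here's the back-of-the-envelope math behind that ceiling. This sketch assumes only that DMI 2.0 is electrically equivalent to a four-lane PCIe 2.0 link with 8b/10b encoding; packet and protocol overhead account for the gap between the theoretical number and what we actually measure.

    # Back-of-the-envelope DMI 2.0 arithmetic (assumes DMI 2.0 ~= a x4 PCIe 2.0 link)
    TRANSFER_RATE = 5e9  # PCIe 2.0 signals at 5 GT/s per lane
    ENCODING = 8 / 10    # 8b/10b line coding: 8 data bits per 10 bits on the wire
    LANES = 4            # DMI 2.0 is four lanes wide

    theoretical_mb_s = TRANSFER_RATE * ENCODING * LANES / 8 / 1e6
    print(f"Theoretical ceiling: {theoretical_mb_s:.0f} MB/s per direction")  # 2000 MB/s
    # Packet and protocol overhead eat the rest; real transfers top out near 1600 MB/s.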

We know this because we've already looked at how multi-drive SSD arrays on Intel's 6 Gb/s ports are cut off at the knees. Last year, with a stack of SSD DC S3500s and ASRock's C226 WS motherboard, I put together Six SSD DC S3500 Drives And Intel's RST: Performance In RAID, Tested, and the ceiling was made quite clear. Z87 Express offered six 6 Gb/s ports of connectivity, but three decent SSDs are enough to saturate the DMI's limited bandwidth. Sixteen-hundred megabytes per second was basically the limit.
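
To see why three drives are enough, assume each SATA SSD sustains roughly 540 MB/s of sequential reads; the per-drive figure is illustrative, but the shape of the result isn't:

    # How quickly SATA SSDs pile into the DMI ceiling (illustrative per-drive speed)
    PER_DRIVE_MB_S = 540     # sequential read rate of a decent SATA 6 Gb/s SSD
    DMI_CEILING_MB_S = 1600  # observed DMI 2.0 limit

    for drives in range(1, 7):
        aggregate = min(drives * PER_DRIVE_MB_S, DMI_CEILING_MB_S)
        print(f"{drives} drive(s): {aggregate} MB/s")
    # By the third drive (3 x 540 = 1620 MB/s of demand), the DMI is the bottleneck;
    # drives four through six add capacity, not speed.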

Does any of that change with Z97 Express? How does the addition of SATA Express and a two-lane PCIe 2.0 M.2 slot, both sharing the same limited throughput, alter the equation?

Of course, as we've established, ASRock's Z97 Extreme6 is unique. It does have a two-lane M.2 PCI Express 2.0 slot competing for the PCH's limited bandwidth. But it also employs what ASRock calls Ultra M.2: a second slot tapping into a Haswell-based CPU's 16 lanes of third-gen PCIe. This slot isn't affected by the chipset. And if you drop a PCIe M.2 drive into the Ultra slot, you can still use SATA Express, which is wired into Z97. In exchange, you can't run a graphics card across all 16 of the processor's lanes; it gets bumped down to eight. Perhaps more severely, SLI and CrossFire configurations are out, too.
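
Here's a quick sketch of that lane trade-off, using the allocations described above (how the board routes any leftover lanes is board-specific, so treat the function as illustrative):

    # Sketch of how the CPU's 16 PCIe 3.0 lanes get divided on the Z97 Extreme6
    # (allocations per the description above; anything beyond that is an assumption)
    def cpu_lane_split(ultra_m2_in_use: bool) -> dict:
        if ultra_m2_in_use:
            # Graphics drops to x8 and the Ultra M.2 slot claims four Gen3 lanes.
            return {"graphics": 8, "ultra_m2": 4}
        return {"graphics": 16, "ultra_m2": 0}

    print(cpu_lane_split(ultra_m2_in_use=True))  # {'graphics': 8, 'ultra_m2': 4}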

But I'm a storage guy. Giving up complex graphics arrays is alright in my book.

So, here's a breakdown of the DMI bandwidth problem. With four SATA 6 Gb/s drives in RAID 0, we're limited to around 1600 MB/s. When you factor in the PCH-attached M.2 slot, available bandwidth doesn't change, but the distribution does. Finally, we add Samsung's XP941 in ASRock's special Ultra slot. It doesn't cannibalize Z97's throughput, but as we apply a workload to every device simultaneously, check out how much bandwidth we can push through the Samsung compared to Plextor's M6e and the four-drive array of SSD DC S3500s.

Each device gets a workload of 128 KB sequential data with Iometer 2010. We start with the four-drive RAID 0 array, which is already limited by the DMI interconnect. As expected, we see roughly 1600 MB/s. Then, we add the two-lane M.2 slot hosting Plextor's M6e, a PCIe-based drive. The read task is applied to it and the RAID 0 configuration simultaneously.
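
Iometer is configured through its GUI rather than code, but the access pattern is simple to picture. Here's a minimal Python sketch of a 128 KB sequential read pass; unlike Iometer, it doesn't bypass the OS page cache, so treat it as an illustration of the pattern rather than a benchmark (the file path is hypothetical):

    import time

    CHUNK = 128 * 1024  # 128 KB transfers, matching the Iometer profile

    def sequential_read_mb_s(path: str, seconds: float = 30.0) -> float:
        """Stream 128 KB sequential reads from path and report average MB/s."""
        done = 0
        start = time.monotonic()
        with open(path, "rb", buffering=0) as f:
            while time.monotonic() - start < seconds:
                block = f.read(CHUNK)
                if not block:   # reached EOF; wrap around and keep streaming
                    f.seek(0)
                    continue
                done += len(block)
        return done / (time.monotonic() - start) / 1e6

    # Example (hypothetical path): print(sequential_read_mb_s(r"D:\testfile.bin"))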

Not surprisingly, total bandwidth still adds up to ~1600 MB/s. But it's split unevenly between the M.2 slot and the SATA 6 Gb/s ports. No matter what combination of storage you attach to Z97 Express, there's a finite ceiling in place. I concede that most desktop users won't ever see the upper bounds of what DMI 2.0 can do. But it's worth noting that Intel arms this chipset with more I/O options than the core logic can handle gracefully.
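
As a toy model of that contention: give each PCH-attached device a native speed, cap the total at the DMI's ~1600 MB/s, and split what's available. The real arbitration isn't strictly proportional, and the speeds below are illustrative, but the ceiling behaves the same way.

    # Toy model of PCH-attached devices fighting over the shared DMI ceiling.
    # Arbitration is modeled as proportional, which the real hardware isn't.
    def share_ceiling(native_mb_s: dict, ceiling_mb_s: int = 1600) -> dict:
        demand = sum(native_mb_s.values())
        if demand <= ceiling_mb_s:
            return dict(native_mb_s)
        scale = ceiling_mb_s / demand
        return {dev: round(rate * scale) for dev, rate in native_mb_s.items()}

    print(share_ceiling({"4x SSD DC S3500 (RAID 0)": 2000, "Plextor M6e (x2 M.2)": 700}))
    # However it's divided, everything behind the PCH shares the same ~1600 MB/s.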

Then we add Samsung's XP941, which does its business free of the DMI's limitations. It alone delivers as much throughput as Intel's four SSD DC S3500s. That's notable because, when you think about it, a single SSD in the PCH-attached M.2 slot monopolizes as much as half of the DMI's available headroom. As storage gets faster and the DMI doesn't, an increasing number of bottlenecks surface.

The same workload pushing writes (rather than reads) demonstrates even lower peak throughput, topping out north of 1300 MB/s. We saw the same thing last year in our Z87 Express-based RAID 0 story.

Tapping into the CPU's PCIe controller with a four-lane M.2 slot dangles a tantalizing option in front of storage enthusiasts like me, eager to circumvent the Z97 chipset's limited capabilities. I understand that most enthusiasts, even the most affluent power users, won't have six SSDs hanging off of their motherboards. But it really doesn't take much to hit the upper bound of what a PCH can do. And DMI bandwidth is shared with USB and networking too, so these numbers assume those subsystems are sitting idle.

This is what the Disk Management console looks like with four SSDs on Intel's 6 Gb/s ports, Plextor's M6e in the PCH-attached M.2 slot, the USB 3.0 Windows To Go storage device used to boot the OS, and Samsung's XP941. Only the last device isn't pushing its throughput through Intel's DMI.

Think you might try working around these issues by dropping a four- or eight-lane HBA onto your motherboard? Wrong. Remember, unless you're tapping into the processor's third-gen PCIe lanes, all expansion goes through Z97 Express, subjecting you to the same limitations. Professionals who need more should simply look to one of Intel's higher-end LGA 2011-based platforms. 

I remain critical of PCIe-attached storage without NVMe (the time for that is coming). However, AHCI doesn't stop Samsung's XP941 from demonstrating sexy performance characteristics. And ASRock's Z97 Extreme6 is really the only board able to expose its potential right now. Let's take a closer look and suss out the extent of its advantage in the Ultra M.2 slot.

  • aminebouhafs, June 5, 2014 5:16 AM
    Once an SSD is plugged into the Ultra M.2 slot, the bandwidth between the central processing unit and the graphics processing unit is cut down by half. Therefore, while the end user gets additional SSD performance, the end user may lose some GPU performance because of insufficient bandwidth between it and the CPU.
  • JoeArchitect, June 5, 2014 7:18 AM
    Very interesting article and a great read. Thanks, Chris - I hope to see more like this soon!
  • wussupi83, June 5, 2014 8:21 AM
    great article! - although z97 still seems boring
  • Eggz, June 5, 2014 9:54 AM
    This makes me excited for X99! With 40 (or more) lanes of PCI-e, there will be no need to compromise. We have to remember that the Z97 chipset is a consumer-grade product, so there almost have to be tradeoffs in order to justify stepping up to a high-end platform.

    That said, I feel like X99, NVMe, and M.2 products will coincide nicely with their respective release dates. Another interesting piece to the puzzle will be DDR4. Will the new storage technology and next-generation CPUs utilize its speed, or, like DDR3, will it take several generations for other technologies to catch up to RAM speeds? This is quite an interesting time :)
  • Amdlova, June 5, 2014 9:56 AM
    Chris, test the ASRock Z97 ITX... and another thing... my last 3 motherboards were from ASRock, and I want to say ASRock rocks!
  • Damn_Rookie, June 5, 2014 11:27 AM
    While storage isn't the most important area of computer hardware for me, I always enjoy reading Christopher's articles. Very well written, detail orientated, and above all else, interesting. Thanks!
  • hotwire_downunder, June 5, 2014 8:58 PM
    ASRock has come a long way. I used them a long time back with disappointing results, but I have started to use them again and have not been disappointed this time around.

    Way to turn things around ASRock! Cheap as chips and rock steady!
  • alidan, June 5, 2014 10:51 PM
    @aminebouhafs If I remember right, didn't Tom's show how much performance loss there is when you tape off GPU card contacts to emulate having half or even a quarter of the bandwidth? Back then, the difference was only about 12% from 16 lanes down to either 4 or 8.
  • Eggz, June 6, 2014 7:39 AM
    Quote:
    @aminebouhafs If I remember right, didn't Tom's show how much performance loss there is when you tape off GPU card contacts to emulate having half or even a quarter of the bandwidth? Back then, the difference was only about 12% from 16 lanes down to either 4 or 8.


    PCI-e 3.0 x8 has enough bandwidth for any single card. The downside of giving PCI-e lanes to the SSD applies only to people who want to use multiple GPUs.

    Still, though, this is just the mid-range platform anyway. People looking for lots of expansion end up buying the X chipsets rather than the Z chipsets because of the greater expandability. I feel like the complaint is really misplaced for Z chipsets, since they only have 16 PCI-e lanes to begin with.
  • cryan, June 8, 2014 5:22 AM
    Quote:
    Once an SSD is plugged into the Ultra M.2 slot, the bandwidth between the central processing unit and the graphics processing unit is cut down by half. Therefore, while the end user gets additional SSD performance, the end user may lose some GPU performance because of insufficient bandwidth between it and the CPU.


    Well, it'll definitely negate some GPU configurations, same as any PCIe add-in over the CPU's lanes. With so few lanes to work with on Intel's mainstream platforms, butting heads is inevitable.

    Regards,
    Christopher Ryan


  • cryan, June 8, 2014 5:23 AM
    Quote:
    While storage isn't the most important area of computer hardware for me, I always enjoy reading Christopher's articles. Very well written, detail orientated, and above all else, interesting. Thanks!


    Awww, shucks!

    Regards,
    Christopher Ryan
  • obamaliar, June 8, 2014 12:08 PM
    Supercool review. Just as a note, though, any pair of good SATA-based SSDs will blow the doors off of that XP941. For example, I am getting 740.00 MB/s bandwidth at steady 5 and 948.37 MB/s at recovery 5 for PCM8 extended Photoshop heavy from a pair of Intel 730s.
  • Evolution2001, June 8, 2014 8:08 PM
    obamaliar, how do you reckon that your 740MBps or 948MBps is faster than 1400MBps? (referencing the sequential read of the tested drive)
    SATA3 has a theoretical max of 6Gbps (750MBps). However, the practical max is more around 600MBps.
    Assuming you are running your Intel 730's in RAID-0 and achieving the max practical throughput, you'd still only come up with ~1200MBps which is slower than what Tom's saw at 1400MBps ON A SINGLE DRIVE.
  • obamaliar, June 9, 2014 8:55 AM
    Quote:
    obamaliar, how do you reckon that your 740MBps or 948MBps is faster than 1400MBps? (referencing the sequential read of the tested drive)
    SATA3 has a theoretical max of 6Gbps (750MBps). However, the practical max is more around 600MBps.
    Assuming you are running your Intel 730's in RAID-0 and achieving the max practical throughput, you'd still only come up with ~1200MBps which is slower than what Tom's saw at 1400MBps ON A SINGLE DRIVE.

    Evolution2001, I am referring to OS-simulated performance, i.e. cryan's PCMark 8 extended testing. Read the article and you'll understand that sequential performance is really a non-factor in comparison to random performance in an OS environment. Right now, SATA RAID has vastly superior random performance to PCIe drives like the XP941; even if you were to soft-RAID a pair of XP941s together, they could not match a pair of good SATA SSDs in RAID in an OS environment. I cannot show you that exactly because soft RAID is not bootable. Look here: https://www.facebook.com/groups/1445011539065390/ - there is a pair of XP941s soft-RAIDed getting their asses kicked by SATA RAID. The reason? 4K writes do not scale on PCIe drives. When PCIe drives can be RAIDed, are bootable in RAID, and have an RST-type driver that allows for write caching, then they will become the superior OS disk.
  • cryan, June 10, 2014 11:57 AM
    Quote:
    Evolution2001, I am referring to OS-simulated performance, i.e. cryan's PCMark 8 extended testing. Read the article and you'll understand that sequential performance is really a non-factor in comparison to random performance in an OS environment. Right now, SATA RAID has vastly superior random performance to PCIe drives like the XP941; even if you were to soft-RAID a pair of XP941s together, they could not match a pair of good SATA SSDs in RAID in an OS environment. I cannot show you that exactly because soft RAID is not bootable. Look here: https://www.facebook.com/groups/1445011539065390/ - there is a pair of XP941s soft-RAIDed getting their asses kicked by SATA RAID. The reason? 4K writes do not scale on PCIe drives. When PCIe drives can be RAIDed, are bootable in RAID, and have an RST-type driver that allows for write caching, then they will become the superior OS disk.


    Actually, the 4 KB writes are really an artifact of the AHCI controller/API. If you took the same flash and controller on the Sammy but rigged it to use NVMe, I think you'd see a big bump in random 4 KB performance. I've said over and over that desktop users, for now, are better off using a couple of SATA drives in RAID. More than just adding bandwidth, which isn't always important (strictly speaking), it lowers service times significantly. Plus, it's great to just keep adding cheap drives and getting more performance and capacity (when striped). See the Plextor M6e PCIe review for my thoughts on this.

    It's all academic anyway, since you can only buy the XP941 from a few random places, and it's $750. If I had a laptop that could use it, maybe I'd go that route, but even there SATA is just more power efficient. Give me a 1 TB EVO or M550 instead... at least for the time being.

    PS: Is this Jon C??

    Regards,
    Christopher Ryan
  • obamaliar, June 10, 2014 1:24 PM
    Thanks for the reply, love your stuff C Ryan :)
  • Eggz, June 10, 2014 3:42 PM
    Quote:
    Give me [either a 750 GB] or a 1 TB EVO or M550 instead... at least for the time being.


    Totally agree! For now.

    I also added the 750 GB EVO in there because (I believe) the only difference between the 1 TB and the 750 GB is capacity, unlike the smaller drives, which actually have less performance (i.e., the 120, 250, & 500 GB models).
  • logainofhades, June 27, 2014 11:59 AM
    I would rather use a single powerful GPU anyway, so the cut to x8 due to the Ultra M.2 slot doesn't bother me at all. This is definitely an interesting board. I want an Ultra M.2 slot on a mini-ITX board. :D
  • lukebutters, July 15, 2014 7:32 PM
    If the RAID controller is set up with 4 disks in RAID 10, will the DMI limit be reached?