Performance Issues After SSD RAID Refresh

Plain Old Me

Hey all and thanks in advance for replies.

To give some context: I do lots of software development, particularly for applications dealing with large data sets. My workstation has been performing like a champ with basically no updates since ~2012. It is an ASRock X79 mobo with a hex-core i7 (LGA 2011, but not one of the Xeon processors; IIRC the Xeons have some interconnect features that this desktop chip lacks). Plain-Jane graphics.

What I have been dealing with lately is my storage array. Besides a few oddball HDDs for archival storage (on various mobo SATA controllers), my main storage is 4x SSD in RAID 10 on an LSI 2008 RAID card (firmware and drivers just updated to the most recent). Up until yesterday, the 4x SSD were all OCZ 120 GB drives from 2012, which had been running continuously since then. One finally started showing signs of dying, and I had been meaning to expand my storage anyway, so I got 4x Mushkin Triactor 250 GB drives to replace them, again as a 4x SSD array on the LSI card. [As a side note, I had considered other solutions, but this looked like the best way to model a scaled-down production database server.]

I got all the drives installed and cloned my old image with no issues at all. But my performance numbers on the RAID array are not as expected. On the old array, I was getting around 600 MB/s seq write, 200 MB/s seq read, 350 and 200 MB/s on 512K, 30 and 60 MB/s on 4K (yes, write faster than read, strange), and 300 and 150 MB/s on 4K QD32. Back in 2012, I found this pretty spiffy, and I figured the plain 4K numbers were fine given that SSDs perform better with command queues for parallelism.

Fast forward to the new 2017 drives. Seq and 512K reads improved pretty awesomely (to 1000 MB/s and 600 MB/s), and 4K reads (both QD1 and QD32) improved by a few MB/s. But my write speeds are utterly disappointing: seq writes are at 130 MB/s, 512K at 100 MB/s, 4K at 7 MB/s (yeah, not missing any 0's), and 4K QD32 at 11 MB/s. In other words, about half the write performance of the 2012 drives for big writes, and even slower for little writes.
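
For reference, the kind of rough check I could run alongside CrystalDiskMark looks roughly like the Python sketch below. The test path, sizes, and write counts are just placeholders, and since it goes through the filesystem rather than raw I/O, the figures would only be ballpark:

```python
# Crude throughput check: one big sequential write plus random 4K writes.
# TEST_FILE is a placeholder path on the RAID volume; adjust before running.
import os
import random
import time

TEST_FILE = r"D:\bench_test.bin"   # placeholder path on the array
SEQ_SIZE = 1024 * 1024 * 1024      # 1 GiB sequential write
BLOCK = 4096
SMALL_WRITES = 25000               # ~100 MiB worth of 4K writes

def seq_write_mbps():
    buf = os.urandom(1024 * 1024)  # write in 1 MiB chunks
    start = time.perf_counter()
    with open(TEST_FILE, "wb", buffering=0) as f:
        for _ in range(SEQ_SIZE // len(buf)):
            f.write(buf)
        os.fsync(f.fileno())
    return SEQ_SIZE / (time.perf_counter() - start) / 1e6

def random_4k_write_mbps(file_size=256 * 1024 * 1024):
    buf = os.urandom(BLOCK)
    with open(TEST_FILE, "wb") as f:   # preallocate the target file
        f.truncate(file_size)
    start = time.perf_counter()
    with open(TEST_FILE, "r+b", buffering=0) as f:
        for _ in range(SMALL_WRITES):
            # 4K writes at random aligned offsets within the file
            f.seek(random.randrange(file_size // BLOCK) * BLOCK)
            f.write(buf)
        os.fsync(f.fileno())
    return SMALL_WRITES * BLOCK / (time.perf_counter() - start) / 1e6

print("seq write : %.0f MB/s" % seq_write_mbps())
print("4K  write : %.0f MB/s" % random_4k_write_mbps())
os.remove(TEST_FILE)
```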

Any ideas on what is going on here? I have heard that the LSI 2008 RAID card is not the best performer, but performance here seems even more crippled than expected, particularly given the decent numbers with the older drives.
 
I think your issue is with the Mushkin Triactor SSDs. The review I read says that because it is a cheap TLC-based device, write performance is not good:
http://www.thessdreview.com/our-reviews/mushkin-triactor-ssd-review-480gb/6/

I think the top dog today will be the Samsung 850 PRO.
 

Plain Old Me

Wow, I did not realize the Mushkin had that big of a problem with performance. Even so, should its performance be so degraded that 5-year-old, bottom-of-the-barrel-for-2012 OCZ drives can beat it? When I got this set, I had just assumed that even the weakest drives today were better than the budget drives of yesteryear, and the budget drives I had were plenty for my use. Also, my write numbers are wayyyy weaker than the linked review in all cases, ranging from 1/3 to 1/15 of its performance. Maybe there is something else going on here?
 
I cannot tell you exactly why your new drives are not performing well, but it is obvious that they are the cause, since you have changed nothing else.
Likely the free-space recovery (garbage collection) algorithm is not good enough to reclaim free NAND blocks as the drive gets filled.
Many SSD makers simply buy OEM controllers and NAND chips and assemble them.
Samsung and Intel make their own controllers and NAND chips, so they can validate performance and control quality.
I think they keep the best-binned chips and sell off the rest to others.

You obviously want two things.

1. Read/write performance.

2. Data protection.

RAID 10 supposedly gives you both, but perhaps you have other, better options. For protection, RAID 10 does mirroring. That protects you from a drive failure, letting you continue running and allowing a non-destructive rebuild.
Just how important is it to you to have no down time?
Do you stock spare drives to rebuild in case of a failure?
What would happen if your PC suddenly caught fire and destroyed your data?
What if it was stolen?
How current does a backup need to be?
The thrust of these questions is to suggest that EXTERNAL backup done perhaps daily might be a protection scheme that addresses more than just drive failure.
Drives do not fail often; the MTBF is something more than 1 million hours. That is many years.
The cost of mirroring is that you need to do writes twice and ensure that they are done.

My suggestion is to use a single 500 GB SSD for performance and implement whatever backup scheme seems appropriate.
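
If you want something minimal to start with, a daily script run from Task Scheduler that copies changed files to a second drive would cover the basics. This is only a sketch with assumed paths; proper backup software (or robocopy /MIR) handles deletions, versioning, and open files better:

```python
# Minimal one-way mirror: copy files that are missing or newer on the source.
# SOURCE and DEST are placeholders; run daily via Task Scheduler.
import os
import shutil

SOURCE = r"C:\Projects"         # placeholder: data you care about
DEST = r"E:\Backup\Projects"    # placeholder: external/secondary drive

def mirror(src, dst):
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            # Copy only if the backup copy is missing or older than the source.
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)

if __name__ == "__main__":
    mirror(SOURCE, DEST)
```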

RAID 0 has been overhyped as a performance enhancer.
Sequential benchmarks do look wonderful, but the real world does not seem to deliver the indicated performance benefits for most desktop users. The reason is that sequential benchmarks are coded for maximum overlapped I/O rates.
That depends on reading a stripe of data simultaneously from each RAID 0 member, and that is rarely what we do.
The OS does mostly small random reads and writes, so RAID 0 is of little use there.
There are some apps that will benefit. They are characterized by reading large files in a sequential, overlapped manner.
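
A toy calculation makes the point. Assuming, just for illustration, a 64 KB stripe across 4 members (not necessarily what your controller actually uses), a 4 KB request lands on a single drive while a 1 MB sequential read spans all of them:

```python
STRIPE = 64 * 1024   # assumed stripe size, for illustration only
MEMBERS = 4          # assumed number of striped members

def members_touched(offset, length):
    # Count the distinct stripe units the request spans, capped at the member count.
    stripe_units = (offset + length - 1) // STRIPE - offset // STRIPE + 1
    return min(stripe_units, MEMBERS)

print(members_touched(0, 4 * 1024))      # 1 -> a 4K read hits one drive, no parallelism
print(members_touched(0, 1024 * 1024))   # 4 -> a 1 MB read can span all four drives
```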

Here is a study using SSD devices in RAID 0.
http://www.tomshardware.com/reviews/ssd-raid-benchmark,3485.html
Spoiler... no benefit at all.

A very sophisticated RAID card can overcome some of the performance limitations.
It will include a battery-backed cache to allow deferred writing, so an app can continue without waiting for the writes to actually reach the drives.
I see your LSI 2008 card as only a very basic one.

For the absolute best performance, I suggest a Samsung 960 Pro M.2:
https://www.newegg.com/Product/Product.aspx?Item=N82E16820147596
You will also need a PCIe-to-M.2 adapter, something like this:
https://www.newegg.com/Product/Product.aspx?Item=N82E16815124167&cm_re=m.2_pcie_adapter-_-15-124-167-_-Product

Benchmarks will show sequential speeds of about 3,500 MB/s read and 2,100 MB/s write.
That is for a new drive; as the drive fills up, I would expect the write speed to drop somewhat.
 

Plain Old Me

^ Excellent overview of a lot of stuff. To keep the question from getting too verbose, I did not include all of the details of why I went the direction I did, but since we are looking at alternative solutions, I will elaborate a bit. Right now, I have my OS on the SSD array. I also have project source code. Some of that project source gets uploaded to Git periodically. I also store a few repos for private projects on the array without backing them up to Git (private repos cost $, so unless a business is footing the bill I tend to avoid them). In addition, I keep various data sets, generally subsets of larger data sets (or other dev data sets of varying sizes), on the array when in use (scripts move data in/out depending on the project I am working on).

Is redundancy important here? For the data, generally not so much. For the source code, I really do not want to lose any newly written sections if at all possible. Because backups are periodic (in addition to the RAID, I do nightly backups to an HDD; the HDD also has a bootable image in case something like the RAID card gets corrupted), they do not necessarily capture this marginal source code, hence the RAID array. In the past, it has come in handy: when I first built the OCZ array, one of the original SSDs died very soon after going into use. I just stuck my spare in and got the failed drive RMA'd. Except for a ~1/3 loss of performance for a few hours, all was fine and I did not lose uptime.

Given all of this, real-time redundancy is more or less a hard requirement of mine. RAID 10 does this well and is common in real-world database applications. RAID 5 or 6 tends to work well also, but used to need more sophisticated RAID cards to do so; I am not sure if that is still the case, but I did not want to upgrade my RAID card if not necessary. RAID here is primarily for redundancy and not for performance, though I would think SSD RAID would improve performance for things like the various SQL and NoSQL DBMSes I use in development, as well as the manual parallel file system accesses some of my applications perform. A single SSD is sufficient to get the performance numbers I need to do testing. That said, RAID degrading performance to the extent happening here is not really acceptable.
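
To give an idea of the kind of parallel access I mean, it is roughly the pattern below: several large files read concurrently (the file names are placeholders standing in for data sets or DBMS files):

```python
# Overlapped reads of several large files, the sort of access where striping can help.
import concurrent.futures

FILES = [r"D:\data\part1.bin", r"D:\data\part2.bin",
         r"D:\data\part3.bin", r"D:\data\part4.bin"]  # placeholder paths

def read_file(path, chunk=8 * 1024 * 1024):
    # Stream the whole file in 8 MiB chunks and return the byte count.
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                return total
            total += len(data)

with concurrent.futures.ThreadPoolExecutor(max_workers=len(FILES)) as pool:
    sizes = list(pool.map(read_file, FILES))
print(sum(sizes) / 1e6, "MB read concurrently")
```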

I had looked at NVMe solutions as well, but my understanding is that there might be some compatibility problems with my platform. IIRC, the X79 has weird PCIe 2.0/3.0 issues, as Intel was just making the transition when it was produced. Additionally, I am working on Windows 7 (sometimes in a Linux distro in a VM), and I thought there may be some problems with Windows 7 and NVMe. As a final nail in the NVMe coffin for my particular application, I could not find a lot of info about running these types of drives in RAID. It looks like there may be some (perhaps very expensive) NVMe cards with RAID built in somehow, and of course there is software RAID (yuck).

With regard to the SSD filling issue, does this refer to TRIM? I do not know too much about this problem, but when I built the old OCZ RAID array I was worried about it. Since my numbers for that initial array were pretty good, I figured it was not a problem (or, perhaps, was not a problem for this particular RAID controller?). I was also operating on the assumption that the TRIM issues were being fixed back in 2012 and that these days there was not much of a problem left. Since it has been suggested that this aspect may be problematic: is it feasible that it could cause performance degradation on the order of 1/15? I did run tests before and after cloning the OS onto the new array and did not notice significantly different numbers, but maybe something done during RAID initialization congested the drives?
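
One thing I can at least check quickly is whether Windows itself has TRIM enabled, e.g. with fsutil. From what I have read (treat this as my assumption, not something confirmed for this card), even with TRIM on, a hardware RAID card like the LSI 2008 generally does not pass the command through to the member SSDs, which then have to rely on their own garbage collection and spare area:

```python
# Query the Windows TRIM setting: "DisableDeleteNotify = 0" means TRIM is enabled.
# This only reflects the OS-level setting; drives behind a hardware RAID card
# typically do not receive TRIM regardless. May need an elevated prompt.
import subprocess

result = subprocess.run(
    ["fsutil", "behavior", "query", "DisableDeleteNotify"],
    capture_output=True, text=True
)
print(result.stdout or result.stderr)
```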

When we get down to it, what particularly confuses me is that the old OCZ drives (I looked up the exact model; they are OCZ Agility 3 series 120 GB) perform so much better than these brand-new drives. Looking at various posts elsewhere about this RAID controller and SSDs, it appears that this controller may just be picky about which particular drives are used. Apparently it has issues with write parallelism, but again it seems strange that those issues would hit only the Mushkin drives and not the OCZ drives. Maybe the old OCZ drives have a more sophisticated controller than the new Mushkin drives?