Advice on optimizing Raid 5 SQL Server DAS

Hello, I hope I can make this inquiry clear. In our test environment, I am trying to optimize our SQL server's read/write performance. This environment is used to test SQL 2000 and 2005 databases ranging from a few MiB up to 100+ GiB.
Here are the server and storage specs:

Dell PowerEdge 830 server
P4 3.6 GHz HT processor
80 GB SATA (OS)
Dell PowerVault 22xS SCSI DAS
Dell SCSI 320-2 PERC 4/DC SCSI Controller Card (U320)
10x Maxtor Atlas 10K IV 146GB U320 68pin SCSI hard drives, model 8B146 (avg. read is 72-89 MB/s, depending on where I found info for the same or similar drives; this is an acceptable range to me)

The drives are running in RAID 5 with a 64-KB stripe size, formatted NTFS. I presume the block size on the array is 4KB, the NTFS default. Running HD Tune on the drive shows the following behaviors:

64KB Blocks on Accurate:

64KB Blocks on Fast:

8MB Blocks on Accurate:

8MB Blocks on Fast:

Running below 64KB, the performance gets worse, and block sizes between 64KB and 8MB scale fairly well between those two points, though I haven't tried all block sizes. I'm curious about the big dip that I get at 64KB, as this occurs on all settings between Fast and Accurate.

I would expect that a RAID 5 array of decently performing SCSI drives would get better average read performance, and I thought burst rates were generally as high as or higher than the average or top read rates.

I know that I can't have everything, but SQL traffic is mostly small bits of data, though some KB/MB-sized BLOBs exist within the databases. If I can optimize for that, and hopefully get higher burst/max/avg speeds, I'll feel I've done what I can on this array.

I looked in the RAID controller menus and didn't see a menu for write caching (or missed it), and don't know whether it would help or hinder. This unit does have a BBU attached to it.

Now to the question: can anyone give any suggestions or guidelines on optimizing this array for its intended usage?

Thanks in advance,

The Taker
  1. Use proper benchmarking utilities, like ATTO under Windows, for measuring sequential read/write performance. HD Tune only reads with a queue depth of 1, which is unrepresentative because filesystems use read-ahead optimizations. So it can happen that the HD Tune score is very low, but when working on an actual filesystem or testing with ATTO, the displayed performance is much higher. In that case, the HD Tune scores are simply wrong.
  2. Thanks for responding. I ran the ATTO tool on the RAID array, and was seeing rather good read speeds up in the 140MB/s range, but write speeds are absolutely pitiful at about 6MB/s on the high end. This is performing much worse than our older test server where I ran the same ATTO tool (with 3x 186GB Seagate SCSI drives), that is running software RAID-5 setup through Windows disk management.

    What could be impeding the write speeds so much?
  3. Bad RAID5 engine: Windows XP RAID5 does not have any write-back functionality, so it will write very slowly except when writing exactly one full stripe block (stripesize * (number of disks - 1)), fully aligned. This is a known weakness of RAID5; it requires an advanced (and more complex) implementation to get good RAID5 write speeds.

    Such advanced implementations are available under Linux/BSD and their flavours, but not under Windows. Under Windows the best you can get is either Intel ICHxR RAID5 drivers with the 'write caching' option enabled, or hardware RAID. If you built a NAS running Linux/BSD instead, you could save yourself the cost of hardware RAID and get a professional, reliable RAID5 solution.
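    As a sketch of the arithmetic behind that full-stripe rule (a hypothetical helper, using the poster's 64KB stripe across 10 disks as the example figures):

```python
# Data carried by one full RAID5 stripe: one stripe unit per disk,
# minus one unit's worth of parity. Writes of exactly this size,
# aligned to a stripe boundary, avoid the read-modify-write penalty.
def raid5_full_stripe_bytes(stripe_kb, num_disks):
    return stripe_kb * 1024 * (num_disks - 1)

# 64KB stripe across 10 disks -> 576KB of data per full-stripe write
print(raid5_full_stripe_bytes(64, 10) // 1024, "KB")  # 576 KB
```

    Anything smaller than (or misaligned against) that 576KB unit forces the simple RAID5 engine to read old data and parity back before it can write, which is where the slow writes come from.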
  4. Also, Windows XP misaligns partitions by default, so you will get very bad write performance, and it will wear on your disks since they will be seeking a lot more. You should probably look at an alternate solution. Have you heard of the ZFS filesystem?
  5. I read that the DISKPART.EXE utility in Windows can set the offset of a partition. This is using a Dell PERC controller based on LSI Logic.

    The OS is Windows Server 2003 Standard SP2 x32.

    I am a Linux fan, but this is for my company's test environment, and the app that we troubleshoot is Windows/MS SQL/IIS... basically a fully MS shop, so Linux is not an option, and I wouldn't know where to begin troubleshooting databases on Linux anyway. I've got the Microsoft KB articles up in my browser at work that discuss DISKPART.EXE and how to calculate and set the offset. If I understand correctly, data is written to the first HDD sector, filesystem cluster, or RAID stripe, and by default the OS does not align the FS clusters with the stripe. Am I on the right track?

    I looked at the partition offset in DISKPART, and it's at block 32, whereas if I read the information on calculating this correctly, everything should line up at the first full stripe boundary, 64KB into the disk?
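    For what it's worth, the alignment check itself is just modular arithmetic (a hypothetical sketch, assuming standard 512-byte sectors and the 64KB stripe discussed above):

```python
SECTOR_BYTES = 512  # assumption: standard 512-byte sectors

# A partition is stripe-aligned when its starting offset is an exact
# multiple of the stripe size.
def is_stripe_aligned(offset_sectors, stripe_kb):
    return (offset_sectors * SECTOR_BYTES) % (stripe_kb * 1024) == 0

print(is_stripe_aligned(63, 64))   # False: the 63-sector (31.5KB) default misaligns
print(is_stripe_aligned(128, 64))  # True: 128 sectors = 64KB, on the stripe boundary
```

    With the default misaligned start, many 64KB IOs straddle two stripe units, so each one touches two disks (and two parity updates) instead of one.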

    I'm trying to push my knowledge in this area, to help make this new array workable. Right now, 6MB/s writes just isn't going to cut it when we have to restore a 150GB database, and then have to access said database.

    Please let me know if I'm on the right track with the FS/RAID config/HDD sectors idea. I think the RAID controller does support write cache and write-back; I'll have to look at how to set those options.

    I'm very grateful for the information so far, but Windows/Dell PERC/NTFS information will be the most beneficial to my work setup.
  6. You have a decent hardware RAID controller and a decent set of disks. A bit long in the tooth, and the Dell Perc controller ain't the greatest, but it's not bad (basically an LSI 21320), and the drives are decent. (And apologies, but BSD/Linux ain't gonna fix this. If we're dealing with software RAID maybe, but that's not what we're dealing with.)

    The write performance issue you see may be attributable to alignment, to write-back being disabled, or to both. It may also be an artifact of the type of tests you're running. Anything > 32KB is pretty large for SQL (or almost any DB), and low-level benchmarks, especially with large IOs, are very coarse and may be a very misleading indication of DB performance (most DBs have a lot more going on under the hood).

    If that's really a PERC 4/DC (and not a PERC 4e/DC), it's a PCI card. Is it plugged into a 64-bit PCI slot, or a 32-bit slot (Dell 830 has both)? Those burst transfer numbers from your screenshots look odd... they're all about the same and almost exactly 1/2 what you'd expect if it's in a 32-bit PCI slot. (No idea what might have happened to the other 1/2, as I have no idea how HD Tune is coming up with that number, but it's suspicious.)

    There is a "write policy" to control write-back; it should be set to "write-back". (However, note that it is disabled if you've enabled clustering.) There is also a "read policy" that includes read-ahead settings; turn off read-ahead for DB use; it will make transfer-rate benchmarks look better but is counter-productive for DB access patterns. There is also a "cache policy"; set it to "Cached I/O". And many others...

    I'd suggest getting the Dell PERC 4/DC docs and looking through them. I'd also suggest using something like SQLIO or IOmeter for testing. But first I'd suggest reading:
    The hardware he uses is a bit different, but it provides some very good advice, is intended for people just like you (Windows/SQL), and provides some numbers you can use for an apples-to-apples comparison.
  7. I wouldn't assume that 6MB/sec is what you'll see when doing a restore of the DB. If I'm reading your screen-shots correctly, it's considerably faster at larger IO's (even in its current config). While large IO's are not typical of day-to-day DB activity, the DB restore will, in general, perform larger IOs. I suggest you first focus on day-to-day performance, then try a restore. I think you'll find the restore is significantly faster than what day-to-day (small IO) numbers would lead you to believe.
    The screenshots are read-only; he didn't post the ATTO figures that showed the bad write performance, and ATTO tests on the filesystem, so this is NTFS write speed. You are right that it's not software RAID, but nothing would prevent him from using it with, say, FreeNAS. He would be using the RAID controller as a simple SCSI controller instead, and let the RAID5 part be done in software, which is really much better at it than Windows-specific drivers and even hardware RAID controllers. With 10 disks he should be getting 600MB/s+ throughput, not 6MB/s!

    If it has to be a Windows solution, manually correcting the stripe misalignment is not going to give him the 600MB/s+ writes he should get. He has 10 fast 10K rpm disks and all he gets is 6MB/s write performance. Obviously something is wrong, but it can't be the misalignment alone.
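    As a rough sanity check on that 600MB/s figure (a sketch with assumed numbers: ~70MB/s per spindle from the drive specs in the opening post, and a single 320MB/s U320 channel as the bus cap):

```python
# Best-case sequential throughput for RAID5: (disks - 1) data spindles
# streaming in parallel, capped by whatever bus they share.
def raid5_seq_ceiling_mbs(num_disks, per_disk_mbs, bus_mbs):
    return min((num_disks - 1) * per_disk_mbs, bus_mbs)

# 9 data spindles * 70MB/s = 630MB/s raw, but one U320 channel tops out first
print(raid5_seq_ceiling_mbs(10, 70, 320))  # 320
```

    So 600MB/s+ would only be within reach if the enclosure were split across both channels of the 320-2 card; on a single channel, the bus is the ceiling long before the disks are. Either way, 6MB/s is two orders of magnitude below any of these limits.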

    Aside from that, if it's going to run as a database, you should test its IOps, not sequential performance. Maybe its sequential performance suffers because it does not employ any buffer/caching strategy but is tuned for IOps. To be honest I've not seen great performance from any PCI card, and even though it cannot be 32-bit 33MHz (since he got higher than 133MB/s), the faster PCI standards do not fix PCI's other deficiencies: high latency, high CPU utilization, low throughput, shared access.

    I would test the IOps performance, since that's what the array will be doing: database access is much like random I/O. Assuming you have a misalignment, you would see huge differences between the read IOps and write IOps.

    If RAID5 under Windows is not working, but you still want to keep using Windows, why not change the RAID level to something like RAID 0+1? That would fix the slow-write issue and probably give you more reliability. RAID5 is something special; it needs intelligent logic to operate fast and reliably. Many implementations get this wrong, even in hardware.
  9. Here's my ATTO test:

    I was curious how it would look with Direct I/O unchecked, and this is how it looked:

    Obviously I can't trust that benchmark, because it well exceeds the performance of SCSI and any PCI (32/64/PCI-X/etc.) specs. It was still kind of fun to see such high read/write speeds. I am trying to run IOmeter, but I don't quite understand it yet; I'll see what I can do. I read some of that Kendal Van Dyke blog, and I'm thinking I may test a rebuild of the array with a 64KB offset, 64KB stripe size, and 64KB block size, so that everything matches up... unless anyone has other recommendations first.

    If there are any other easy to use IO tools, let me know.
  10. Disabling direct I/O means you are testing your RAM and the controller, so you're not testing disk performance anymore. You need direct I/O to bypass the Windows filecache. The numbers are correct, though: reading/writing at 1GB/s is normal when the filecache is involved. Also note that ATTO tests with a very small file; 256MB is not enough to outsmart the filecache. You would need at least 8 times the amount of RAM (8 * 4GB = 32GB test file) for a clean benchmark this way.
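    That rule of thumb, as a one-liner (hypothetical helper; the 8x factor is just the heuristic from this post, not a hard spec):

```python
# Minimum benchmark file size needed to defeat the OS filecache,
# using the rule of thumb of at least 8x installed RAM.
def min_test_file_gb(ram_gb, factor=8):
    return ram_gb * factor

print(min_test_file_gb(4))  # 32 (GB) for a machine with 4GB of RAM
```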

    Unfortunately, there are virtually no good disk benchmarks for Windows; one of the few good ones is the Intel iPeak suite, which contains trace/replay benchmarks, but those are for advanced use, and properly benchmarking advanced RAID systems on Windows is not an easy task.

    The first screenshot you posted does look like write-back is disabled and all writes are done in two phases, which is going to be extremely slow and will wear on the disks. Either get the controller onto the proper settings, or use another setup.
  11. Couple thoughts...

    1. Are you doing a fast initialize on the array? If so, there will be background activity until it completes, and the array will be doing extra work. For an array this size, initialization can take quite a while, and if you cycle power/reset (among other things) it will start over. Let initialization finish before you start benchmarking.

    2. Not that maximum serial transfer rates are what you're after, but the max read numbers you show from ATTO are much lower than what I'd expect for your configuration (250-350+MB/sec). You might try it with a queue depth of at least 16-32 and a test size of at least 10GB; I'd also eliminate transfer sizes > 512KB from the test to save time (above that it's really academic). A small test size and low queue depth won't push your array and max out I/O across drives. (The write numbers also seem too low, even if write-back is disabled.)
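    Incidentally, queue depth alone can explain numbers this low. At queue depth 1 each IO must finish before the next is issued, so throughput is just IO size divided by service time (a sketch with assumed figures: 64KB IOs and roughly 10ms per RAID5 read-modify-write):

```python
# Throughput at queue depth 1: only one IO in flight at a time, so
# MB/s = (IO size in MB) / (service time in seconds).
def qd1_throughput_mbs(io_kb, service_ms):
    return (io_kb / 1024) / (service_ms / 1000)

# 64KB writes at ~10ms each (RAID5 read-modify-write) come out to
# about 6MB/s, right in the range reported earlier in this thread
print(qd1_throughput_mbs(64, 10))  # 6.25
```

    A deeper queue lets the controller keep multiple spindles busy at once, which is why the 16-32 suggestion above matters.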

    Please let us know what you discover. Always good to have more data points.

    p.s. If you haven't seen it already:

    p.s. Those Atlas drives have a max serial xfer rate of ~80MB/sec outer zone and 40MB/sec inner zone.
  12. I haven't quite seen the approximate 80MB/s reads at the top end, though ATTO shows similar reads to HD Tune. I changed the write setting from Write-Through to Write-Back, and saw the following new read/write speeds:

    I took one of our spare drives (the reason we're not using all 14 slots for HDDs) and initialized it as a single-drive logical disk; the PERC controller wanted to initialize it as a RAID-0 array with the 64-KB stripe size. I realize this may not be optimal for a single drive, but I was just trying to quickly set one up to test.

    Here's the single-drive performance (I also tried 64KB block sizes, even if this is not optimal for a single drive):

    I did run some HD Tune tests on the single drive, and the performance was similar in graph shape, though a bit slower than the RAID array, which makes me wonder if there are any drive, configuration, or setting issues that make it appear to "dip" in speed and then speed up again on the 64KB tests.

    We have to clear some data that made its way onto the RAID array soon, then reformat the partition with the 64KB offset and 64KB NTFS cluster size, and see if that offers any improvement. We will not change the current 64KB RAID stripe size. Even if we don't get any better performance than we're getting now, it will be much better than before any changes were made to the configuration.

    I'll call it good for now, and when we reformat, I'll retest and let you all know the results. This has been a good learning experience, and one that I know I can use down the line.
  13. I'm not sure what I did. I've tried different combinations of:

    - NTFS 4KB and 64KB cluster sizes
    - Default offset (31.5KB), 64KB and 1024KB offset
    - Dynamic disk and simple disk (converting my offset partition to a simple disk would move the offset back to the beginning of the disk).

    All of my permutations (I'm proud of myself for using such a big word so early in the day) resulted in speeds almost identical to this:

    I don't know that I'll get any better performance out of this array, and now I'd rather have back the speeds I was getting before. I formatted with the original configuration (dynamic disk, default offset, default NTFS cluster) but still get speeds much slower than when I hadn't touched the NTFS partition and had only set the Write-Back setting in the RAID controller.

    I noticed that my reads have gone up slightly, but at this point I'd rather have more balanced read and write speeds.

    This is my goal again:

    Could anyone offer advice on what I may have done wrong, or how to restore the read/write to the higher write speeds?
  14. Before my superiors get impatient, I'm calling it quits. I found that changing Direct IO to Cached IO in the RAID controller also helped with the speed. I don't remember changing this during configuration/troubleshooting, but I may have. Now I'm getting the following speeds, similar to but slightly higher than my previous good configuration; this is with the 64KB cluster/stripe/block setup:

    I'm calling it quits before I break anything else in my performance quest.

    Thanks again for everyone's help and advice!

    - The Taker
  15. Thanks for the update. Those are reasonable numbers. (Academic, but I'd bet you'd get a bit more out of it--especially read--if you increased the queue depth and the test size.)
  16. "I found that also changing DirectIO to Cached IO in the RAID controller helped with the speed."

    This setting enables the write-back mechanism, and is indeed what you were looking for. Now that it is enabled, you need to rely on the BBU for data protection, which you have. The write speeds have improved, though they are still limited. You may try increasing the queue depth to see if that improves things; this can be done on a system already in production, as ATTO just works on its own files on the target filesystem and doesn't touch others.

    Good luck with your system!