RAID Guru Help Required....

December 22, 2006 9:26:19 PM

I'm currently working on a production SQL Server machine with six 15k U320 SCSI hard drives on an IBM RAID controller in RAID 5.

We've been generally disappointed with the machine's performance, and historically we've had nothing but problems with it.

I ran HD Tach as well as another utility called DiskBench. HD Tach showed a burst rate < 100MB/s, a seek time of 6ms, and the graph showed an average read rate of 60MB/s. It was close to how my laptop rated....

I almost cried.

I hoped that HD Tach was just prejudiced against RAID 5 or the controller for some reason, which is why I dug up the second bench tool, DiskBench.

Multi-threaded reads rated < 20MB/s. Multi-threaded writes were around 40MB/s. Something seemed really screwy here.

What's the best way to bench my array? What kind of performance should I expect from the array? Is HD Tach generally accurate?

If so, I need to call IBM and cause a ruckus.

Thanks in advance.


December 22, 2006 10:13:52 PM

I used that same program to bench my RAID 0 array against my Seagate 320GB perpendicular-recording drive. It found that the Seagate 320 is faster. Kind of weird. I noticed a significant decrease in loading times for programs, startup, and games with the RAID, even though it said the 320GB was faster. The 320 did not run so fast when I had Windows on it. So I would doubt the accuracy of it.
December 22, 2006 10:54:42 PM

I'd be calling IBM for a proper performance measurement tool at the least. I assume it's under warranty / support... Tools are support.
December 22, 2006 10:57:29 PM

What RAID card are you using, what bus is it on, what server type, what stripe size, how much onboard cache, and what BIOS version is the card running?
December 23, 2006 3:22:33 PM

Quote:
I'd be calling IBM for a proper performance measurement tool at the least. I assume it's under warranty / support... Tools are support.


Yeah IBM support may as well be resident in our building. We've had so many problems with this machine.

I want to make sure that (1) I'm not crying wolf, and (2) I can benchmark it after they fix it and be able to say "Yes, it's good now," or "No. Keep working."

I didn't buy the server, so I don't know much about the hardware. I'm more interested at the moment in benchmarking it.

I do know it's U320 IBM ServeRAID, and I'm pretty sure it's PCI-X. All the drives are U320. The RAID 5 stripe size is (I believe) 64k.

From what I know about SCSI, it seems like a bad cable or terminator, because the burst rate was only around 90MB/s.

Any ideas?
December 23, 2006 8:13:07 PM

OS? Card version? Number of drives and RAID configuration? Firmware version? IBM GSA has a lot of support, but these questions need to be answered....
December 23, 2006 9:29:45 PM

Maybe I should rephrase the question....

What's a good benchmarking tool for RAID arrays, and what should I expect from a typical single-channel U320 RAID 5 array with six 15k drives?

I know enough to know that HD Tach measuring 90MB/s burst is a problem, and that copying a 1GB file takes a full minute (~17MB/s).

I can research upgrading firmware, drivers, etc. myself. What I need to do is measure my array's performance, because I believe it is slow, and I need to prove it with more than a file-copy demonstration.

We've paid for IBM support, so I can get them to diagnose the problem. What I need to do is prove the problem exists via performance benchmarks and be able to confirm it's resolved...
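
In the meantime, here's roughly what I plan to script for a repeatable number (just a sketch; the path is made up, and the test file needs to be much bigger than the 256MB controller cache, ideally bigger than RAM, for the read pass to mean anything):

----------------------------
# seq_bench.py -- crude sequential write/read throughput check (sketch only).
# The path is a placeholder; point it at the volume under test. Note that the
# OS file cache will flatter the read pass unless the file is larger than RAM.
import os, time

TEST_FILE = r"E:\bench\test.bin"   # hypothetical path on the array under test
SIZE_MB = 1024                     # 1GB -- well past the controller cache
BLOCK = 1024 * 1024                # 1MB blocks
data = os.urandom(BLOCK)           # random data, so nothing compresses it away

t0 = time.time()
f = open(TEST_FILE, "wb")
for i in range(SIZE_MB):
    f.write(data)
f.flush()
os.fsync(f.fileno())               # force it to disk before stopping the clock
f.close()
print("write: %.1f MB/s" % (SIZE_MB / (time.time() - t0)))

t0 = time.time()
f = open(TEST_FILE, "rb")
while f.read(BLOCK):
    pass
f.close()
print("read: %.1f MB/s" % (SIZE_MB / (time.time() - t0)))
----------------------------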
December 23, 2006 9:31:37 PM

Quote:
OS? Card version? Number of drives and RAID configuration? Firmware version? IBM GSA has a lot of support, but these questions need to be answered....


All the info I have at the moment is in the post immediately preceding yours.
December 23, 2006 9:59:17 PM

Two people have asked for the same information...

The 1GB file copy should be a good enough test for before / after comparisons. One would think that would be enough evidence to convince an IBM tech that there is an issue.

Copying a lot of smaller files, ~64KB each, adding up to a GB would tend to exercise your burst rate, as that's what you think your stripe size is (see the sketch at the end of this post).

Typically, what size files are transferred to / from the server on a day-to-day basis? The array might need to be tuned for typical files. Not knowing the specs of the adapter, I can't say what the tuning options would be, but your 'resident' IBM tech should be able to advise you.
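
If you want to script that, something along these lines would generate the file set (a rough sketch; the destination path is a placeholder):

----------------------------
# make_small_files.py -- create 1GB worth of 64KB dummy files (sketch only).
# DEST is a placeholder and is assumed not to exist yet.
import os

DEST = r"E:\bench\small"        # hypothetical scratch directory on the array
COUNT = 16384                   # 16384 files x 64KB = 1GB
chunk = os.urandom(64 * 1024)   # one 64KB block of random data, reused per file

os.makedirs(DEST)
for i in range(COUNT):
    f = open(os.path.join(DEST, "f%05d.bin" % i), "wb")
    f.write(chunk)
    f.close()
print("wrote %d files to %s" % (COUNT, DEST))
----------------------------

Then time copying that directory to the other array and compare it against the single large file.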
December 23, 2006 10:18:11 PM

Thanks for the help.

I'll get the tech specs as soon as I can. Unfortunately, in order to get most of them, I have to reboot the machine and go into the BIOS :(  It's a production machine, so I can't. I have to put in a call and have someone at the data center do it for me.

It's a database server, but we do mostly data warehousing (i.e. frequent reads, few writes), which is why we went with RAID 5 over something like RAID 10.

Most of our reads/writes are sequential, and because we're using SQL Server, they're in 8K blocks.

We actually have 11 drives on the card total: 2 in RAID 1, 2 more in RAID 1, 6 in RAID 5, and one hot spare. I'm assuming they're all on one channel, because I tried copying from a RAID 1 array to a RAID 5 array and got that 17MB/s transfer rate. I expect channel-to-channel on the same card to be lightning fast. 17MB/s looks like both arrays sharing one channel running at 40MB/s on the line (a copy puts the reads and the writes on the same bus).

I hope that means I'm just running 40MB/s on the channel because of a bad cable, terminator, or config somewhere. I hope it isn't something silly like a 33MHz PCI card, a small buffer, or something like that.

I'll create a bunch of 64K dummy files, and see what I can do about getting the specs.

Thanks again.
December 23, 2006 10:31:26 PM

Don't reboot a production machine just for this post... I know how painful that can be, and at this time of year, it's probably not a good idea. I'd discuss with your IBM tech possibly tuning the RAID 5 array stripe size down to 8k; that would fit your typical usage.

RAID 1 will usually run slower than a RAID 5 because of the mirroring involved in writing, but it should read faster than you seem to think is the case (there's no mirroring penalty on reads). Channel-to-channel would be faster than in-channel, again depending on adapter specs.

What I can tell you is that it is an Adaptec-based controller; in my experience, most of those are much faster than what you are seeing.
December 23, 2006 11:04:45 PM

I'd be doing the smart thing and installing RAID 1 (only) on the very fastest drives.

Because the fastest drives today are faster than anything in RAID ever was.

RAID is a big wank, nothing more. Only RAID1 has value for its redundancy, and RAID5 is too expensive.
December 26, 2006 2:43:44 PM

Quote:
Don't reboot a production machine just for this post... I know how painful that can be, and at this time of year, it's probably not a good idea. I'd discuss with your IBM tech possibly tuning the RAID 5 array stripe size down to 8k; that would fit your typical usage.

RAID 1 will usually run slower than a RAID 5 because of the mirroring involved in writing, but it should read faster than you seem to think is the case (there's no mirroring penalty on reads). Channel-to-channel would be faster than in-channel, again depending on adapter specs.

What I can tell you is that it is an Adaptec-based controller; in my experience, most of those are much faster than what you are seeing.


Thanks for the help. I have the specs now that I'm back at work.

I checked the logs (IBM ServeRAID lacks a UI, so it took some time).

Card is a ServeRAID 6M on PCI-X at 133MHz. It has 256MB installed cache. The 2 Mirrors are on channel 1, and the RAID 5 array is on the second channel. The RAID 5 array is actually on an 8K stripe.

I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?
December 26, 2006 3:32:09 PM

Quote:
I'd be doing the smart thing and installing RAID 1 (only) on the very fastest drives.


And the smart thing would be to test both configurations. You can't just immediately say RAID 1 is the solution. The OP has the right approach: test, then change. Even a blind squirrel occasionally finds a nut.

Quote:
Because the fastest drives today are faster than anything in RAID ever was.


There's no way you can convince me (short of a documented case) that a single hard drive is faster than an array of the same hard drives on enterprise-level equipment. Consider arrays consisting of Fibre Channel drives; there's no way a single drive could outperform them.

Quote:
RAID is a big wank, nothing more. Only RAID1 has value for its redundancy, and RAID5 is too expensive.


Did you use RAID 1 to fix a problem once? It seems you're a bit evangelical about it. You can't blindly discard RAID 5 as a viable solution. It has its place, and so does RAID 1.
December 26, 2006 3:42:18 PM

6x 15k Ultra 320 SCSI drives in a RAID 5 done correctly should get you 300+ MB/s sustained throughput, although transferring to the RAID 1 will be slower, as its throughput will be much lower.
December 26, 2006 3:45:45 PM

Quote:
Thanks for the help. I have the specs now that I'm back at work.

I checked the logs (IBM ServeRAID lacks a UI, so it took some time).

Card is a ServeRAID 6M on PCI-X at 133MHz. It has 256MB installed cache. The 2 Mirrors are on channel 1, and the RAID 5 array is on the second channel. The RAID 5 array is actually on an 8K stripe.

I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?


Is that 938k or 938M? Copying to RAID 1 is going to get you an effective write speed of 1 disk, so 35MB/sec is not bad. Your RAID 1 is your bottleneck here.

What kind of performance problem are you having? If you need very high read speeds in a database, I wouldn't choose RAID 5. It could be a case where a redesign of the tables is required, if the performance has dropped over time.
December 26, 2006 3:46:11 PM

Quote:
I'd be doing the smart thing and installing RAID 1 (only) on the very fastest drives.


And the smart thing would be to test both configurations. You can't just immediately say RAID 1 is the solution. The OP has the right approach: test, then change. Even a blind squirrel occasionally finds a nut.

Quote:
Because the fastest drives today are faster than anything in RAID ever was.


There's no way you can convince me (short of a documented case) that a single hard drive is faster than an array of the same hard drives on enterprise-level equipment. Consider arrays consisting of Fibre Channel drives; there's no way a single drive could outperform them.

Quote:
RAID is a big wank, nothing more. Only RAID1 has value for its redundancy, and RAID5 is too expensive.


Did you use RAID 1 to fix a problem once? It seems you're a bit evangelical about it. You can't blindly discard RAID 5 as a viable solution. It has its place, and so does RAID 1.

lol. There were so many things wrong with his post, I just chose to ignore it. Same with SupremeLaw's post ;) 

You take a risk when posting on forums that people who don't know what they're talking about will try and help :) 
December 26, 2006 3:58:30 PM

Just a comparison to give you an idea.

I'm running the bare-minimum RAID 5 setup on a Compaq ML370:

4x 15k Ultra320 on a SmartArray 641, single channel, 64k stripe.

HD Tach...
I get 150MB/s sustained throughput, and this number is low, as I didn't shut down a running SQL Server instance or a J2EE front-end client, and there are 12 users using this machine as a file share.
December 26, 2006 4:21:22 PM

Quote:
6x 15k Ultra 320 SCSI drives in a RAID 5 done correctly should get you 300+ MB/s sustained throughput, although transferring to the RAID 1 will be slower, as its throughput will be much lower.


Thanks. I expected about 300MB/sec read throughput and slightly slower writes as a result of parity calculation.

Quote:
Thanks for the help. I have the specs now that I'm back at work.

I checked the logs (IBM ServeRAID lacks a UI, so it took some time).

Card is a ServeRAID 6M on PCI-X at 133MHz. It has 256MB installed cache. The 2 Mirrors are on channel 1, and the RAID 5 array is on the second channel. The RAID 5 array is actually on an 8K stripe.

I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?


Is that 938k or 938M? Copying to RAID 1 is going to get you an effective write speed of 1 disk, so 35MB/sec is not bad. Your RAID 1 is your bottleneck here.

What kind of performance problem are you having? If you need very high read speeds in a database, I wouldn't choose RAID 5. It could be a case where a redesign of the tables is required, if the performance has dropped over time.

Whoa. Sorry. 938M (938,000k and change).

But yes, you've exposed the crux of the problem. I need to benchmark the actual throughput of the RAID 5 array. Granted, I expected a bottleneck with the RAID 1 volume, but I expected more like a 60MB/s-80MB/s sustained transfer rate, as opposed to < 40MB/s, mainly because it's a sequential write.

I didn't choose RAID 5, but I probably would have, had I been involved at the time. Six physical volumes allow for a hefty amount of parallel reads, which should let us saturate the U320 bus with large sequential reads (or so we thought).

Also, we're running at ~400GB for that volume. We would have needed ten drives as opposed to six to get that same storage in RAID 1.

(I appreciate the help, which is why I'm going into so much detail. Alternative opinions are welcome.)

Our database is primarily reads, and most writes are batch jobs run offline. We're trying to squeeze the entire database into the 7GB memory footprint we have available. We've physically partitioned the data between the RAID 5 and RAID 1 volumes. The temporary database is on the RAID 1 volume, where most random writes will occur. Once the data is processed, it's copied sequentially to the RAID 5 volume.

This provides us with the optimal architecture while still keeping costs down.

Now for benchmarking: I tried a program called DiskBench, a .NET tool that just creates a file with random data, so there's no source-disk bottleneck. This is what I got:

RAID 5 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 10 seconds. ~95MB/sec.
Create 2x1GB File (Simultaneous): 28 Seconds ea. ~75MB/sec.

READ 1GB File (32MB Buffer): 34 Seconds. ~30MB/sec.
READ 2x1GB File (32MB Buffer): 118 seconds ea. ~16MB/sec.
-----------------------------

The array actually appears to perform faster writes than reads (which seems backwards to me). I tried the same program on the RAID 1 array:

RAID 1 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 18 seconds. ~54MB/sec.
Create 2x1GB File (Simultaneous): 40 Seconds ea. ~50MB/sec.

READ 1GB File (32MB Buffer): 18 Seconds. ~53MB/sec.
READ 2x1GB File (32MB Buffer): 50 seconds ea. ~40MB/sec.
-----------------------------


The RAID 1 array is more in line with what I'd expect, though it still seems a little low. Bear in mind that I have 256MB cache with write-backs enabled, and the server is under no load currently.

It looks like we might be running in wide mode on both channels, perhaps?
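
For anyone who wants to reproduce the simultaneous-read test without DiskBench, it amounts to something like this (my approximation, not DiskBench's actual code; the paths are placeholders, and the files need to be bigger than free RAM to keep the OS cache honest):

----------------------------
# dual_read.py -- time two concurrent sequential readers (sketch only).
# Paths are placeholders; use two pre-existing large files on the array.
import threading, time

FILES = [r"E:\bench\big1.bin", r"E:\bench\big2.bin"]  # hypothetical 1GB files
BLOCK = 32 * 1024 * 1024                              # 32MB buffer, as above
rates = [0.0, 0.0]

def reader(idx, path):
    done = 0
    t0 = time.time()
    f = open(path, "rb")
    while True:
        buf = f.read(BLOCK)
        if not buf:
            break
        done += len(buf)
    f.close()
    rates[idx] = done / (1024.0 * 1024.0) / (time.time() - t0)

threads = [threading.Thread(target=reader, args=(i, p))
           for i, p in enumerate(FILES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
for path, rate in zip(FILES, rates):
    print("%s: %.1f MB/s" % (path, rate))
----------------------------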
December 26, 2006 4:35:56 PM

Quote:
RAID 5 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 10 seconds. ~95MB/sec.
Create 2x1GB File (Simultaneous): 28 Seconds ea. ~75MB/sec.

READ 1GB File (32MB Buffer): 34 Seconds. ~30MB/sec.
READ 2x1GB File (32MB Buffer): 118 seconds ea. ~16MB/sec.
-----------------------------

The array actually appears to perform faster writes than reads (which seems backwards to me). I tried the same program on the RAID 1 array:

RAID 1 ARRAY
----------------------------
Create 1GB File (10x100MB blocks): 18 seconds. ~54MB/sec.
Create 2x1GB File (Simultaneous): 40 Seconds ea. ~50MB/sec.

READ 1GB File (32MB Buffer): 18 Seconds. ~53MB/sec.
READ 2x1GB File (32MB Buffer): 50 seconds ea. ~40MB/sec.
-----------------------------


The RAID 1 array is more in line with what I'd expect, though it still seems a little low. Bear in mind that I have 256MB cache with write-backs enabled, and the server is under no load currently.

It looks like we might be running in wide mode on both channels, perhaps?


You might check how the RAID cache is allocated. It sounds like it is more write-oriented. If you flip that around (say 75% read / 25% write), you may see a huge improvement.
December 26, 2006 4:53:46 PM

The cache doesn't appear to be configurable :?

Quote:
Just a comparison to give you an idea.

I'm running the bare-minimum RAID 5 setup on a Compaq ML370:

4x 15k Ultra320 on a SmartArray 641, single channel, 64k stripe.

HD Tach...
I get 150MB/s sustained throughput, and this number is low, as I didn't shut down a running SQL Server instance or a J2EE front-end client, and there are 12 users using this machine as a file share.


Thanks. That's really exactly what I need :)

Something else to note: We're using a SCSI backplane for the RAID 5 array. How plausible is it that the backplane is bad? How could I test that?
December 26, 2006 5:03:23 PM

Yep, I'm on a backplane as well. Generally, if the backplane works, then it works; it's a really simple circuit board.

Sorry, not much help there, as I don't know of any good means of testing the backplane itself.
December 26, 2006 5:07:29 PM

Quote:
Yep, I'm on a backplane as well. Generally, if the backplane works, then it works; it's a really simple circuit board.

Sorry, not much help there, as I don't know of any good means of testing the backplane itself.


Thanks, everyone, for the help. This is enough to get the ball rolling with IBM, I think.

I wish there were a more definitive way to test the array, but I'll just deal with what I've got. A measured 60MB/s sustained throughput for a U320 array is enough to make me worry.

Thanks again.
December 26, 2006 5:30:07 PM

If I were setting your system up from scratch, I would probably do this:

4x RAID 5 for the OS/programs/DB log files
6x RAID 5 for the database
1x hot spare

Then again, I don't like crippling my system, so I usually ditch the hot spare, as the hot-swap backplane makes the online spare a pointless endeavor. If a drive fails, swap out the dead one for a cold spare and let it rebuild on low priority:

4x RAID 5 for the OS/programs/DB log files
7x RAID 5 for the database
December 26, 2006 5:56:03 PM

If you use Windows as the OS and MSSQL as the RDBMS, don't expect anything more! Windows is very slow on disk access compared to *nix, and MSSQL is one of the slowest RDBMSes.
Anyway, if you can't obtain high transfer rates with a large sequential file copy, I'd investigate in this order: the SCSI cables and terminations, the SCSI drivers, and the OS settings.
As a comparison, I can tell you that one of the servers I manage is an IBM with 4x U320 15k IBM/Hitachi HDs connected to an IBM ServeRAID controller in RAID 5, and with Gentoo x64 (2x Opteron) I get a minimum of 190MB/s on large sequential file copies.
December 26, 2006 5:56:03 PM

Quote:
If I were setting your system up from scratch, I would probably do this:

4x RAID 5 for the OS/programs/DB log files
6x RAID 5 for the database
1x hot spare

Then again, I don't like crippling my system, so I usually ditch the hot spare, as the hot-swap backplane makes the online spare a pointless endeavor. If a drive fails, swap out the dead one for a cold spare and let it rebuild on low priority:

4x RAID 5 for the OS/programs/DB log files
7x RAID 5 for the database


See, I like the hot spare. For my databases, I like to get them rebuilt as fast as possible, and since work is 45 minutes away, a cold spare makes for a longer wait before the rebuild can even start. But, to each his own...

It is possible, though, that the backplane could be the issue, but I know of no way to test it either. To be honest, the last time I dealt with IBM was about six years ago. A ServeRAID card went belly up, and IBM was terrible in helping us get it back up and running. That was the last straw, and I never went back to them.

Good luck with that RAID, Whizzard9992.
December 26, 2006 6:00:28 PM

Quote:
If you use Windows as the OS and MSSQL as the RDBMS, don't expect anything more! Windows is very slow on disk access compared to *nix, and MSSQL is one of the slowest RDBMSes.


That is the most blind statement I have heard (outside of the Intel vs. AMD or MS vs. Linux debates). Any proof (besides TPC benchmarks) to back that up?

I have Windows boxes running Oracle, and I managed a 250,000-user web application on MSSQL. Both work great. It's all about the tuning done on the queries and the hardware. You can't just "plug and go", or else you will get the results you stated.
December 26, 2006 6:03:22 PM

Quote:
I copied from RAID 5 to RAID 1 (on different channels) and I was able to copy a single sequential 938k file in ~35 seconds (~35MB/s).

That seems REALLY low to me. What is this typically?


Whizzard,

Just thought of something here. Are your logical drives badly fragmented? I'm wondering if that is causing your massive slowdown. If they are fragmented, then the read/writes won't be so sequential.

On my last assignment running SQL, I found the logical drives to be approximately 60% fragmented (it was somewhere above 50%, and I think 60% was it). I sped things up greatly just by doing a defrag.

After that, I scheduled a defrag to run periodically.
December 26, 2006 6:09:14 PM

Quote:
See, I like the hot spare. For my databases, I like to get them rebuilt as fast as possible, and since work is 45 minutes away, a cold spare makes for a longer wait before the rebuild can even start. But, to each his own...

It is possible, though, that the backplane could be the issue, but I know of no way to test it either. To be honest, the last time I dealt with IBM was about six years ago. A ServeRAID card went belly up, and IBM was terrible in helping us get it back up and running. That was the last straw, and I never went back to them.

Good luck with that RAID, Whizzard9992.


The online spare makes sense in chassis designs where you have to shut down the system to change out a disk, or in your case, where you have to drive a long way to replace the drive. If the server were in a remote location, then I would definitely run an online spare.

Since I am on location every day, I would rather utilize the hot-swap backplane for the additional performance instead of having a spare sitting there doing nothing for years at a time.

The only difference between a hot spare and a cold spare is the time to pull the drive, insert the new one, and have it spin up, which is just a few seconds. You end up gaining that back in the end, as you get the added performance of the additional drive.

But it isn't a big deal. I would agree, to each his own.
December 26, 2006 6:17:06 PM

Quote:
If you use Windows as the OS and MSSQL as the RDBMS, don't expect anything more! Windows is very slow on disk access compared to *nix, and MSSQL is one of the slowest RDBMSes.


I agree with belvdr on this one. You are spreading FUD. Go away!
December 26, 2006 6:58:37 PM

Thanks for all the help.

My configuration is actually close to what you suggested:

2xRAID 1 - OS/System/SWAP
2xRAID 1 - DB Logs/Temp DB
6xRAID 5 - Database

1x hot spare assigned to the RAID 5 array (it can be reassigned remotely).

I agree the hot spare is a matter of preference.

I checked the disks and they're HEAVILY fragmented. I'm going to defrag and run the benches again.
December 27, 2006 12:35:51 PM

Hey Whizzard, any update?
December 27, 2006 2:55:39 PM

Quote:
Hey Whizzard, any update?


Working on it now :) 
December 29, 2006 12:18:52 PM

Quote:
Working on it now :) 


Hope it's working well. :) 
December 29, 2006 1:46:57 PM

Quote:
Working on it now :) 


Hope it's working well. :) 

I put in a call to our local help desk. The tech assisting me is skeptical, so he's running IOMeter. (I always thought you could only test with IOMeter on unpartitioned space?)

I'll post up what the results were when I get them.
December 29, 2006 2:36:22 PM

How do I set up IOMeter myself to test performance?

Is there any way to identify what bus my card is running on without going into the BIOS? (i.e., PCI, PCI-X 100, PCI-X 133, etc.)


I have a feeling I'm not going to get much help from my local help desk.

Quote:
Well, you're running RAID 5, so I wouldn't expect you to get reads faster than your laptop


:( 

*sigh*
December 29, 2006 2:47:33 PM

Need some more system information.

What's the model of the system?

What slot is the RAID controller currently in?
December 29, 2006 2:47:48 PM

I have the IOMeter results. Anyone wanna help me understand them? :) 

I guess the benches that were run were 2k OLTP benches in IOMeter....
December 29, 2006 2:48:37 PM

Quote:
Need some more system information.

What's the model of the system?

What slot is the RAID controller currently in?


IBM x236 series. I'm trying to find out what slot it's in.
December 29, 2006 3:09:31 PM

So I'm running two workers on the RAID 5 array, with the "All in one" access spec on both.

I'm watching the stats right now, and I see an average of 80MB/s, 6200 IO/sec, and a max response time of 99 (ms?). The average response time is 0.31ms, so I'm assuming that's mostly cache hits? (80MB/s over 6200 IO/sec works out to roughly 13KB per IO.)
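
For what it's worth, HD Tach already showed ~6ms seeks on this array, so a 0.31ms average response has to be mostly cache. A crude way to check is to time scattered small reads across a file too big for the 256MB controller cache (a sketch; the path is a placeholder, and the Windows file cache can skew it too):

----------------------------
# rand_read.py -- rough random-read latency / IOPS probe (sketch only).
# Real 15k SCSI seeks should land around 5-6ms; sub-millisecond averages
# mean the reads are being served from a cache somewhere.
import os, random, time

FILE = r"E:\bench\huge.bin"   # hypothetical multi-GB file on the RAID 5 volume
READS = 1000
BLOCK = 8 * 1024              # 8KB reads, matching SQL Server's page size

size = os.path.getsize(FILE)
f = open(FILE, "rb")
t0 = time.time()
for i in range(READS):
    f.seek(random.randrange(0, size - BLOCK))
    f.read(BLOCK)
elapsed = time.time() - t0
f.close()
print("avg: %.2f ms/IO (%.0f IO/s)" % (elapsed * 1000.0 / READS, READS / elapsed))
----------------------------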
January 2, 2007 10:38:24 AM

Quote:
How do I set up IOMeter myself to test performance?

Is there any way to identify what bus my card is running on without going into the BIOS? (i.e., PCI, PCI-X 100, PCI-X 133, etc.)


I have a feeling I'm not going to get much help from my local help desk.

Well, you're running RAID 5, so I wouldn't expect you to get reads faster than your laptop


:( 

*sigh*

If that is the response from the local help desk, then they should switch careers ("Hey, who needs multiple spindles? Just set up a single 2.5" drive at a low RPM!"). Ugh. Anyway, I don't know a thing about IOMeter, but this tool (www.iometer.org) is from 2004. Is it still accurate running across arrays?

But from the stats you give (6200 IOs/sec), that seems like a lot.

Is that across the entire controller or just one array? I'm guessing you don't have another controller you could move a couple of the arrays to in order to free up some bandwidth, eh?
January 2, 2007 1:30:23 PM

Quote:
How do I set up IOMeter myself to test performance?

Is there any way to identify what bus my card is running on without going into the BIOS? (i.e., PCI, PCI-X 100, PCI-X 133, etc.)


I have a feeling I'm not going to get much help from my local help desk.

Well, you're running RAID 5, so I wouldn't expect you to get reads faster than your laptop


:( 

*sigh*

If that is the response from the local help desk, then they should switch careers ("Hey, who needs multiple spindles? Just set up a single 2.5" drive at a low RPM!"). Ugh. Anyway, I don't know a thing about IOMeter, but this tool (www.iometer.org) is from 2004. Is it still accurate running across arrays?

But from the stats you give (6200 IOs/sec), that seems like a lot.

Is that across the entire controller or just one array? I'm guessing you don't have another controller you could move a couple of the arrays to in order to free up some bandwidth, eh?

Actually, I do have another controller on the server. It's an Adaptec RAID, even. We needed another channel for the tape drive, because the tape drive is Ultra Wide. I considered using it instead, but the problem is that this is a production server, and we'd have to repartition the array holding production data, because the new controller is going to want to restripe the drives. Another wrinkle is that we're using a SCSI midplane, and I think that may be the real problem; if I switch controllers and the midplane is at fault, we won't see any difference in the numbers. It is a plan, though.

At this point, however, I'm almost ready to pull the server down for 8 hours or so to go ahead and diagnose this problem. I'm sure there's something going on here....

I managed to run IOMeter, and I get ~100MB/s sequential reads. I actually found an old IOMeter bench I ran in October that showed ~220MB/s with an OLTP pattern. That still sounds low to me, but it's a lot better than what I'm getting now, and it's indicative of a problem. A smoking gun, one might say :) 
January 2, 2007 7:32:42 PM

Just an update for anyone still following this issue: I've put in a call to our local help desk (policy says they have to initiate the call to IBM if they (1) can corroborate our hardware issue and (2) cannot handle it internally).

I'm satisfied with the benches at this point to confirm a problem.

I'm assuming the high IO rates are a result of the cache doing its job. If that's the case, then the MB/s bench is probably not an accurate reflection of the drive throughput.

I'll keep posting here as things happen. Thanks to everyone so far.
January 2, 2007 8:31:23 PM

I also have the x236 server with the 6M controller.

I am running RAID 5EE on the 6M controller with six 10k U320 drives.

I also have an LSI controller in the same box, with a Maxtronic JanusRAID external array (RAID 6) connected to it via U320.

I am able to transfer 2GB of data (Outlook PST files in five chunks) back and forth between the two arrays in approximately 35 seconds each direction (about 57MB/s).

My technique for timing this is nothing elaborate. I just opened the system clock and watched the seconds tick by while the transfer ran.

I am not a wizard when it comes to this stuff. I mention this to assure you that there hasn't been any high-end tweaking.

I am using the LSI controller in addition to the 6M because the 6M wouldn't support the JanusRAID enclosure. Maxtronic said they have trouble with Adaptec-based HBAs.

I'll run some benchmark tools on both arrays in that box and post the results.
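
If you want the timing without clock-watching, a trivial script can do it (a sketch; the source and destination paths are placeholders):

----------------------------
# copy_time.py -- time an array-to-array copy instead of eyeballing the clock
# (sketch only; paths are placeholders, and DST is assumed not to exist yet).
import os, shutil, time

SRC = r"E:\pst"        # hypothetical folder of PST files on one array
DST = r"F:\pst_copy"   # hypothetical destination on the other array

total = 0
t0 = time.time()
os.makedirs(DST)
for name in os.listdir(SRC):
    src_path = os.path.join(SRC, name)
    shutil.copy(src_path, os.path.join(DST, name))
    total += os.path.getsize(src_path)
elapsed = time.time() - t0
mb = total / 1048576.0
print("%.0f MB in %.1f s = %.1f MB/s" % (mb, elapsed, mb / elapsed))
----------------------------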
January 2, 2007 8:41:18 PM

Good to hear the troubleshooting is coming along. I'm wondering how the benchmarks would turn out if the load were removed from the system. Of course, I know you can't just "take it down" (I have the same issues too), but curiosity has me wondering.
January 2, 2007 8:44:33 PM

Quote:
I also have the x236 server with the 6M controller.

I am running RAID 5EE on the 6M controller with six 10k U320 drives.

I also have an LSI controller in the same box, with a Maxtronic JanusRAID external array (RAID 6) connected to it via U320.

I am able to transfer 2GB of data (Outlook PST files in five chunks) back and forth between the two arrays in approximately 35 seconds each direction (about 57MB/s).

My technique for timing this is nothing elaborate. I just opened the system clock and watched the seconds tick by while the transfer ran.

I am not a wizard when it comes to this stuff. I mention this to assure you that there hasn't been any high-end tweaking.

I am using the LSI controller in addition to the 6M because the 6M wouldn't support the JanusRAID enclosure. Maxtronic said they have trouble with Adaptec-based HBAs.

I'll run some benchmark tools on both arrays in that box and post the results.


Awesome :)  Thanks :!:

It takes me about a minute per 1GB going from my RAID 5 to my mirror, both on the 6M. That's about four times slower than what you've clocked.

I was able to take the load off (I shut down the SQL Server service) when I ran the IOMeter benches.
January 2, 2007 11:26:09 PM

Quote:
Awesome :)  Thanks :!:

It takes me about a minute per 1GB going from my RAID 5 to my mirror, both on the 6M. That's about four times slower than what you've clocked.

I was able to take the load off (I shut down the SQL Server service) when I ran the IOMeter benches.


If you have 6,200 I/Os per second when the database is shut down, something else is chewing up the disk. Is there any antivirus or other similar program on the box?

I'm not sure if there's a way in Windows to see how much I/O a process is using.
January 3, 2007 10:29:20 AM

Bad news...

I ran HD Tach and found the bottleneck on my system.

The external array on the LSI card clocked in at about 140MB/s average.

The internal array on the 6M came in at under 60MB/s average.

I also received an error from HD Tach when running it on the 6M; HD Tach completed without issue on the array connected to the LSI.

Keep me posted as to what IBM says. I'd like to improve performance on the 6M if possible. (I loaded the latest BIOS on the 6M a few weeks ago.)
January 3, 2007 11:54:00 AM

Loaded the latest and greatest BIOS/firmware package from IBM:
7.12.12

http://www-304.ibm.com/jct01004c/systems/support/suppor...

HD Tach now completes without error at a whopping 54MB/s average with a burst of 90MB/s.

The LSI adapter to the external array is pulling down 145MB/s average with a burst of 179MB/s.

I'd really like to hear from IBM about this performance.
January 3, 2007 3:04:59 PM

Check to make sure the controller is in the PCI-X 133 slot.

Looking at the manufacturer's specs, there are three PCI-X slots: two are 100MHz and the other is 133MHz. This probably isn't causing the bottleneck, but you never know.