ICH10R - RAID 5 Failure

April 11, 2009 8:05:53 AM

Greetings all,

After having two 1TB NAS failures, I have decided to build my own file server based on Windows XP (least common denominator). I have built many data centers in my past with SUN, HP, IBM, and other vendors, and I have been lured by the prospect of using SATA II drives in RAID 5 at an affordable price with software-level RAID.

I embarked on designing and building a system as a Media and File Server for my home network. My home has a 3COM switch with 2 dedicated 1Gb/s ports and 24 100Mb/s ports. This provides a healthy backbone for file server operation.

So here is the system I designed and built:

Gigabyte EP45-UD3R (P45 northbridge, ICH10R southbridge)
Intel quad-core 2.8GHz processor
4GB DDR2 1066MHz memory
5 x 500GB Seagate 7,200RPM drives

While I realize this is way too much processing power for a file server, I decided to double it up for powerful desktop functions and some gaming, since it will be chewing power on the network backplane anyway.

The configuration was done to use SATA2, NCQ, and Intel's ICH10R capability of setting up the RAID 5 across the 5 drives. The array was segmented into 100GB for the OS and the remainder, close to 1.8TB, for storage. Intel Matrix Storage Manager 8.5 picked up the drive array, initialized the array, and everything worked great.

One small hiccup along the way was the fact that write speed was abysmal! It was really horrible and gave me a bad feeling in my stomach, as I was getting at best 5MB/s write speed! After much research, I found that my cache write-back flag was turned off. Turning this on gave me close to 80MB/s, which was very nice. Read performance on RAID 5, as you know, is amazing as-is with very little optimization required.

Now to the problem: my rig worked wonderfully for about 1 month in 24x7 mode with no problems. Then the Blue Screens of Death on Windows XP started occurring. I have done the basic debugging and found the following types of failures:

Module - Error
-------------------
nt - DRIVER_FAULT
nt - DRIVER_FAULT memory_corruption
iaStor.sys - DRIVER_FAULT iaStor.sys
nt - DRIVER_FAULT memory_corruption
vax347b - COMMON_SYSTEM_FAULT Vax347b.sys
nt - COMMON_SYSTEM_FAULT memory_corruption
nt - COMMON_SYSTEM_FAULT ntkrpamp.exe
Vax347b - COMMON_SYSTEM_FAULT Vax347b.sys
ntfs - NULL_CLASS_PTR_DEREFERENCE ntfs.sys
nt - COMMON_SYSTEM_FAULT ntkrpamp.exe
nt - COMMON_SYSTEM_FAULT memory_corruption
intelppm - COMMON_SYSTEM_FAULT intelppm.sys
nt - COMMON_SYSTEM_FAULT memory_corruption
sr - DRIVER_FAULT

To try and mitigate this, I have started XP with the Last Known Good Configuration. I know without a doubt that the ICH10R and the Storage Manager are the root cause of this. I was able to upgrade the driver to Storage Manager 8.8 and get the array back in working order (rebuilt on XP). Interestingly, I experienced no data loss (thumbs up for RAID 5).

The system worked for about 1 week and then the crashes started again. The uptime ranges from minutes to a couple of hours before the crash occurs.

Now I am at a loss and not sure how to proceed. I would like to try to fix the problem because I was very happy with the performance of the box, even for intense gaming. If I can't find a solution, I am thinking of turning my attention towards a hardware RAID controller and giving software-level RAID the thumbs down.

Any assistance with my problem would be greatly appreciated.

Regards.


April 11, 2009 8:33:00 AM

It sounds like a faulty chipset. Can you still return the motherboard? Or at least get another to try. You should have no problem moving your RAID array to another board with ICH10R. Also, did you know about Intel's Matrix RAID? This simple feature alone should keep you from moving to hardware RAID, because you can create a RAID 0 array for the OS and a RAID 5 for the file server all on the same 5 drives. Also, you won't be able to go above 2TB total available space under 32-bit Windows and onboard RAID. You would need 'carving' for anything above 2TB, or 64-bit XP/Vista. Just something to think about.

OH YES, I was just about to hit the Submit button when I remembered that you are using a Gigabyte motherboard and their boards tend not to work with hardware Raid cards. I don't know the exact reason, but I have seen quite a few people try and fail. I know firsthand that ASUS boards work just fine.
April 11, 2009 9:47:17 AM

Thank you very much for the reply. It would be great if I could isolate hardware vs. software problems. As you can imagine, it would be a royal pain to exchange the motherboard due to the mounting and configuration.

Are there any diagnostic tools that can isolate chipset problems? I assume you mean the ICH10R could be having the problem. It would be wonderful to isolate this quickly in this debugging quest.

Thanks
April 11, 2009 4:32:52 PM

If your drives are 7200.11, that would be the first place to start pointing fingers at. Those drives suck and even the new firmware isn't 100% stable. My guess is one or more bad drives.
April 11, 2009 7:23:41 PM

Wow....another person with such great information. 7200.11's suck? Really? That is why all 8 of mine have not had any problems running 24/7 for the last 15 months. And yet, 1 of my Raptors just died a few weeks ago. Surely those Raptors don't suck.

Wolf2: I don't know of any specific tools to test the southbridge/ICH10R.

Are you overclocking your cpu at all? This alone can cause problems with the southbridge if the voltage is not changed in the bios.

Also, have you tried changing the placement of your 5 drives among the 6 sata ports? Maybe it is a single port.
April 21, 2009 5:57:09 PM

Thanks for the post and the suggestions. I would like to add my two cents about the 7,200RPM drives and give you folks an update on my latest endeavours with the setup.

First, on the drives: the 1TB NAS I had ran for 14 months non-stop before crashing. Inside the NAS were two WD Caviar 7200 SATA2 drives. Interestingly, I believe that the Unix software that came with the WD Book did something to mess up the drives.

After running Spinrite 6 on each drive individually, refreshing the surface, and plugging the drives back in, the NAS went back into operation. Nonetheless, I was upset by the incident as I lost my storage for 5 months until I stumbled onto Spinrite and was able to recover the data. Since then I have moved the data to the RAID5 server I am discussing on this thread.

Secondly, my RAID 5 server has finally crashed beyond repair. The machine kept rebooting after POST and the SATA drive inventory, right before going into Windows.

After many desperate attempts at recovering it, I ended up going back to the shop and having the tech run full diagnostics on the motherboard, memory, CPU, and hard drives. Luckily all the components were in working order, so we performed a BIOS upgrade on the Gigabyte board, then re-initialized the RAID array and started the trek of re-installing Win XP from scratch (wiping all my data; no worries, I backup regularly as I have learned my lesson).

The machine is back on the home network now and I plan to restore my data back onto the RAID array. My faith is a bit shaky on this ICH10R and the Intel Matrix Storage manager though. I am worried that as I get into higher amounts of data (over 1TB) on the Array, I might end up facing BSODs and other problems again.

I will give it another shot but if it fails again, I believe that I am headed for Controller Based RAID folks, one that has a dedicated XOR CPU and on-board cache memory.

I will keep you posted.

April 23, 2009 9:25:40 AM

Are you sure the system is okay? The behaviour you describe resembles how the system would act if the memory were on the edge of stability (check memory and power supply).

As for the write speed - it can vary a lot with the Intel controller, according to Gigabyte.
For the last year or so I've been fighting to get my ICH9R RAID 5 (5x500GB, like yours) to work stably, and in the end I gave up and simply bought a single 2TB drive for storage - and use the others as stand-alone drives with backup. It doesn't deliver the read performance of the RAID 5, but your network adapter limits the read speed to under 125MB/sec anyway (1Gbit/s divided by 8) - and only if the other system uses the other gigabit port - otherwise it'll be around 12MB/sec max, slower than USB, making the need for a fast RAID non-existent.

Reply from Gigabyte regarding the non-excellent ICH9R speeds (assuming the same is true with any other software RAID controller):

Answer - 679114
Answer : Sorry for our late reply.

One reason is, your OS is located on the RAID. This causes the drops in the transfer speed.
You will get much better results if the OS and the tools are installed on a separate disk.

Best regards

GIGABYTE-Team


Answer : Dear [edited],

You can´t compare performance of ICH9r raid with a real hardware RAID controller with its own processor and RAM.
The speed of RAID 5 is nearly the same as of a single drive and there is the complete Windows driver stack and the Matrix Storage Manager Software is involved. This all uses processor time and decreases the speed.

The reliability of the RAID is nearly the same as on a professional hardware RAID controller. That a rebuild sometimes starts might be caused by some data loss due to faulty RAM or software problems. Don´t OC the system if running a RAID to increase
the reliability of the RAID.
You can check your memory with memtest86 in a long time test to make sure there is no error.
www.memtest.org

S.M.A.R.T errors are not reported because the controllers in AHCI or RAID mode can´t handle these informations. It is only working if the controllers operate in standard ide mode.
Please check the drives with the diagnostic utilitys from the drive manufacturers for errors and replace the faulty drives.

Best regards

GIGABYTE-Team


Edit: Removed my name from the reply.
April 25, 2009 6:10:20 AM

I've had similar problems with both 9 & 10 many times. The problem doesn't come from running an OS off of a RAID 5 set, it just magnifies it. RAID 5 is never ever ever a good choice for an OS/program drive, even with a hardware controller. Most of the software RAID 5 (mobo based) set-ups I've done constantly rebuild or "resynch" for no apparent reason. At the time, I just put up with it as a fact of life, as surprisingly it didn't drastically reduce performance as expected. With an OS installed on the RAID 5 set however, any little glitch had the potential to "blue screen" the system. Gigabyte's advice was perfectly sound, especially from a troubleshooting point of view.

The benefits of RAID 5 have been truly muddled over time, resulting in a lot of bad advice since its rise in popularity painting it as the best thing since sliced bread. The only real benefit of RAID 5 vs. any other level is the increased capacity vs. cost ratio. Web servers are a possible exception, but even there a smart RAID 10 set-up beats it. Mobo-based RAID 5 should only be used for storage, and only if price vs. capacity necessitates it.

As to the exact cause of the problem for both 9 & 10 ?....I never did solve it, I just moved on to the more appropriate hardware RAID. Mobo RAID is great for "getting your feet wet", but should not necessarily be relied on for data security in RAID 5. I never had any issues with RAID 0, 1, or 10 (really 0+1 anyhow), only RAID 5.
April 25, 2009 8:27:46 AM

Thanks for the info and insight, guys! It's great to get the conclusions of people who have faced problems with these PC builds, as it tends to save a lot of time and heartache.

Just to give you an update: since getting the machine back, the restore of the data has been going onto the RAID array. Close to 1TB+ of data, software setup files, etc. are being restored. Things had been running fine for a few days, and then it blue-screened again last night. I am beginning to believe that the root cause is around a couple of reasons:

1) The volume of data managed by the mobo RAID - as it goes north of 1TB, it might choke. I started seeing these problems once I crossed the 1TB threshold. The machine ran fine for 1 full month with no problems while the overall data was less than 1TB. (Note that my array's total storage limit is 1.85TB, as I have 5 drives x 500GB each in the array.)

2) The OS installed on the same array, based on the last couple of posts. I also believe that the short reads/writes of the OS, and other software for that matter, degrade the performance of the RAID array, as the drives are constantly cranking to update the cache, virtual memory file, etc.

So my plan of attack for the next week is as follows:

a) Verify that Memory Modules are sane and not causing the BSODs
b) Update the BIOS to the latest possible version
c) Update the Intel Storage Manager to the latest version
d) Run this for another week in 24x7 (the data already is above 1TB so if my assumption is correct I should see the problem)
e) Report back to this forum with the results; in case of a crash I will need help migrating to controller-based RAID

On a side note Shadowflash, the allure of RAID 5 has always been the cost benefit, the ability to run with a drive down (hot-swap when you get a replacement), and the peace of mind that your system is resilient to failures. I realize that 0+1 can give you similar benefits, but you will require more money to get the number of drives for an extended amount of storage (2TB+).
April 25, 2009 6:29:58 PM

Yep...the allure of cheap redundancy is pretty strong; however, price is not as large a factor nowadays as TB drives are pretty cheap. I think the greater problem with RAID 10 is the increased physical space required, which some cases and PSUs cannot support. I run into that problem on builds more often than the financial concern of the drives themselves...

On a side note...the real reason RAID 5 should be avoided and why it's just a false sense of security to the end-user:
http://miracleas.com/BAARF/RAID5_versus_RAID10.txt
I personally have experienced this problem and was able to replicate it under controlled conditions. My opinion is that this phenomenon is the root cause of most RAID 5 mystery failures, especially using non-RAID model SATA drives, but no one wants to hear that anyhow....these inherent flaws have been known and ignored for a long time now.

Funny you should mention degraded array performance and rebuilding as an asset for RAID 5, as in reality, it is by far the worst of all redundant RAID levels at these tasks.

Even knowing all this and experiencing many parity related malfunctions, I too am continually tempted to use RAID 5....LOL
April 29, 2009 4:33:34 PM

ShadowFlash,

Can you elaborate more on "My opinion is that this phenomenon is the root cause of most RAID 5 mystery failures especially using non-RAID model SATA drives".

I think I have the problem with my new upgraded system right now. :pfff: 
April 30, 2009 1:00:15 AM

First, a good percentage of people here will disagree with me on this. My theories and opinions are based on the theoretical process of RAID and extensive testing and real-world experience. In recent years, RAID 5 and 6 have become increasingly popular due to their inherent economic advantages and improved controller design. This has led to many people choosing these levels without proper consideration of the potential for disaster.

Did you read the link I posted ?....That should explain the inherent flaws in RAID 5 or 6. RAID 3 or 4 do not have this problem, as they provide a "free" parity check on reads, which 5 and 6 do not. That point aside, we come down to the drives and controller.....

The following are quotes from WD posted in other threads here....

Quote :

Question
What is the difference between Desktop edition and RAID (Enterprise) edition hard drives?

Answer
Western Digital manufactures desktop edition hard drives and RAID Edition hard drives. Each type of hard drive is designed to work specifically in either a desktop computer environment or on RAID controller.

If you install and use a desktop edition hard drive connected to a RAID controller, the drive may not work correctly. This is caused by the normal error recovery procedure that a desktop edition hard drive uses.

When an error is found on a desktop edition hard drive, the drive will enter into a deep recovery cycle to attempt to repair the error, recover the data from the problematic area, and then reallocate a dedicated area to replace the problematic area. This process can take up to 2 minutes depending on the severity of the issue. Most RAID controllers allow a very short amount of time for a hard drive to recover from an error. If a hard drive takes too long to complete this process, the drive will be dropped from the RAID array. Most RAID controllers allow from 7 to 15 seconds for error recovery before dropping a hard drive from an array. Western Digital does not recommend installing desktop edition hard drives in an enterprise environment (on a RAID controller).

Western Digital RAID edition hard drives have a feature called TLER (Time Limited Error Recovery) which stops the hard drive from entering into a deep recovery cycle. The hard drive will only spend 7 seconds to attempt to recover. This means that the hard drive will not be dropped from a RAID array.

If you install a RAID edition hard drive in a desktop computer, the computer system may report more errors than a normal desktop hard drive (due to the TLER feature). Western Digital does not recommend installing RAID edition hard drives into a desktop computer environment.


Quote :

Q: Regular 7200 RPM desktop drives run fine in RAID environments; why do I need these drives?
A: Unlike regular desktop drives, WD RE SATA and EIDE hard drives are engineered and manufactured to enterprise-class standards and include features such as time-limited error recovery that make them an ideal solution for RAID.
Q: What is time-limited error recovery and why do I need it?
A: Desktop drives are designed to protect and recover data, at times pausing for as much as a few minutes to make sure that data is recovered. Inside a RAID system, where the RAID controller handles error recovery, the drive needn't pause for extended periods to recover data. In fact, heroic error recovery attempts can cause a RAID system to drop a drive out of the array. WD RE2 is engineered to prevent hard drive error recovery fallout by limiting the drive's error recovery time. With error recovery factory set to seven seconds, the drive has time to attempt a recovery, allow the RAID controller to log the error, and still stay online.


OK, that should start to explain the problems using desktop drives in RAID configurations. Many people DO successfully use desktop editions in RAID. The problem is compounding errors. When using a fast controller card, parity overhead is reduced, thus allowing for more time devoted to error recovery. The advantage of a hardware controller card when using parity RAID is its ability to completely off-load system overhead. The problem with on-board RAID is that any "system hang", for any reason, can affect the stability of the RAID array.

Re-building or re-synching takes time...lots of time...and usually a user will shut down or go to standby long before the process can complete. Even if your machine is "always on", what happens when there is another error before the first process is complete ? I've used both 3ware cards and the Intel Matrix controller, and although their performance (especially the 3ware card) is admirable, my arrays were almost constantly rebuilding or re-synching. This did not cause all that much performance loss (surprisingly), but left unchecked it resulted in an ever increasing number of errors. Worst case scenario, a drive is completely dropped from the array. The hoops you have to jump through to re-add the supposedly failed drive are ridiculous.

These problems, however, are not limited to on-board or software RAID, only magnified. The situations I've been able to replicate have occurred with mid to high end controller cards with on-card XOR engines and battery-backed cache. The problem in these cases was partially dying drives which were not reported by S.M.A.R.T. and also did not trigger a failed drive. This occurred on an enterprise-level SCSI disk shelf where bad data was written, not reported, and subsequently bad parity information written to disk. If this had been reported on a read access, perhaps I could have caught it in time, but due to the nature of RAID 5, it wasn't. The end result was not complete data loss (luckily), but a corrupt directory structure which caused a "gobbley-gook" rebuild that had to be manually re-sorted file by file. I replicated this error on both an LSI MegaRAID 1600 enterprise and a Compaq Smart Array controller, both using the same set of 14 questionable disks. To this day, that exact same disk shelf is still in problem-free operation using RAID 10. Both now-ancient controllers are also still in problem-free use.

As you can see, I've had AND been able to replicate serious RAID 5 related errors with ALL forms of controllers. This substantiates the theoretical flaw in RAID 5. Just look through the forums, here and elsewhere, and see how many failed RAID 5 recovery questions exist. Now compare them to the iron-clad RAID 1, and the slightly more volatile RAID 10, and you'll begin to understand why I dislike RAID 5. From a performance standpoint, RAID 10 almost always beats RAID 5 anyhow, so why use it on a desktop or workstation? Web servers are the only exception I can think of in terms of performance, and even that is questionable. My understanding in fact is that Oracle database servers are increasingly moving away from RAID 5 for both performance reasons AND this very issue.

The only thing that magnifies these problems more than on-board controllers and desktop edition drives is the use of RAID 5 as an OS/system drive. ANY small error will more than likely cause boot problems, further restricting your ability to take action. Not to mention, small random writes are the weakness of parity RAID and needed by the OS to perform well, hence the instability associated with 1st generation SSD's.

By all means, don't believe me; do your own research and you'll find this is not a "mystery problem", but a well defined flaw. I am not a professional, just a RAID junkie who's done a lot of testing in search of the holy grail of storage systems.....RAID 5 is not it.

AFAIK, there is no way to prevent ANY of these problems from occurring. Many people will tell you that I'm just too paranoid and that they have never had any of these problems. That does not mean that the potential for these scenarios does not exist. Use parity RAID at your own risk.

Sorry for the essay....but you asked me to elaborate. This is actually the "short version" LOL......
April 30, 2009 7:15:55 AM

Thanks for the extensive post, Shadowflash! I will tell you that ever since the failure of the server and its rebuild I have been uneasy about its full use within my home network. Also, the constant churn of the disks and the sometimes slow transfer rates are beginning to push me over to your camp (for desktops at least).

So let me ask you this, would the following setup, based on your experience, yield the most reliable RAID system using SATA2 drives?

2 Disks in Raid 1 - For Operating System and Application Files

6 Disks in Raid 10 - For Data files

Intel Matrix Storage Manager in lieu of a hardware controller.

Quite frankly, this would be a much cheaper solution for me than opting for a hardware controller. The cost would be three additional 500GB disks and reconfiguring the system. On the other hand, a hardware controller would cost 4 times as much. As for speed, going with what you are suggesting eliminates the parity calculation and resynching, which should significantly speed up data transfer.

I would also expect that with the data spread across three disks you will get faster read speeds, so the system should see an improvement in that area as well.

What are your thoughts?

One other question: I tried to run Memtest86 but it did not work. The system would boot the CD and stop at the "Loading......................" prompt, and nothing happened after that. Can you recommend another tool for memory testing? Can SiSoftware Sandra do this?

Thanks,
April 30, 2009 5:28:44 PM

My home server doubles as a workstation (also an 8 drive set-up), so I run 3 RAID sets: RAID 1 for the OS, and a RAID 0 for large apps and the log file. I use the extra space on the RAID 1 for set-up files, drive images, and such. The extra space on the larger RAID 0 array I use as a scratch disk for Nero or a download drive. The remaining 4 drives I run in RAID 10 for network-accessible storage.

The Intel Matrix controller should be fine; it's pretty decent on a 6 drive RAID 10. It is suggested that you put the log file on a separate spindle, but in the case of a small home server, I doubt it would be that big an issue.

You could try this little real-world experiment I like to do for testing.....use "media player classic" and set the options to open a new window for each new media file clicked. Then, start opening movies and see how many you can simultaneously stream before any get choppy. Lay them all out across your desktop and randomly jump to different parts of the movies and see how it does. If you already have a RAID 5 set-up, try it there first so you can compare later. This really isn't a very accurate way to test as opposed to benchmarking, but it can simulate some real-world home server loads and give you a general "feel" of performance.

I can't help you with memtest though...sorry...
May 2, 2009 7:20:23 AM

Excellent info Shadowflash! What has your experience been with setting up stripe size on these RAID sets with the Intel Matrix Storage Manager? There isn't great guidance out there, not even in the Intel technical manuals.

I have read somewhere, someone saying that it should be aligned to the NTFS block (allocation unit) size. But it was not 100% clear as to what works and what doesn't. The standard is 64KB for the stripe size, but it can be lower or higher.

Your thoughts would be appreciated.

Thanks
May 3, 2009 5:46:04 AM

Shadowflash, I could really use your help. I just moved from XP x86 to Vista x64 with a 3ware 9650SE-8 and 3 500GB drives in RAID 5 on the 3ware (I know what you're thinking, but I will be going back to RAID 10 very soon). Apparently, 3ware doesn't have the best Vista/Server 2008 64-bit support and I had to update the firmware to get Vista to work. I need Vista x64 because I use Adobe CS4 (Premiere Pro, PS & AE) and I must have 8GB of RAM. And I must use my 3ware card for storage; the OS+apps reside on 4 Raptors in RAID 10.

My problem is the read speed of my RAID 5 array is 35MB/s when copying and 150MB/s for writes. It wasn't like this under XP. HD Tune Pro is able to show read speeds up to 112MB/s with 256KB blocks and only 50ish with 512KB blocks. The stripe is 64KB, btw. I have 3 Raptors that were part of a RAID 10 (1 died a few weeks ago) and they had XP on them. Since I'm not using XP anymore, I created a RAID 0 array first and then a RAID 5 array to test my 3ware card. The 3 Raptors had no problems in either RAID 0 or RAID 5, and I used the same stripe for that RAID 5 array as for my 3 x 500GB array.

I bought a 1.5TB drive to back up everything, as well as backing up important data to the Raptor RAID 5 and the RAID 0 array using the 4 Raptors connected to the onboard Intel (RAID 10 holds the OS and I used Matrix RAID). So, everything is backed up twice and I was thinking of deleting the 3 x 500GB RAID 5 array and creating a new array to see if that fixes it. I'm just very hesitant to delete all that data.

Do you or anyone else have any ideas as to what the problem is?

Could it be something XP did that doesn't work well with Vista?

Also, Shadowflash, you said your 3ware RAID 5 arrays were also rebuilding all the time. Are you certain they were rebuilding and not verifying? Since I updated the firmware, I came across a new feature that allows a schedule for verifying instead of it being automatic. Also, with my 3ware card, my PC alarm/speaker will let me know if there is any problem with a drive. I don't know if there are newer features that weren't in the cards you were using or if it's something else.


P.S. It's nice to see another RAID junkie here.
May 3, 2009 8:28:48 AM

Greetings all,

All right, here is an update to my saga with the ICH10R Raid 5 Failures. First of all, to answer one of my own questions posted earlier, Memory Tester from HCI Design can be downloaded for free to test your computer's memory and check whether there are any errors in read or write. The free version, for home use only, will require you to open multiple instances of the tool at the same time to consume all available memory.

Running the memory test on my machine with no problems has squarely put RAID 5, ICH10R, and Intel Matrix Storage Manager as the root cause of the BSODs I have been experiencing. This is a good point in my journey, as I know that the memory is OK and I need to do something different with the RAID setup.

Following the suggestion from Shadowflash, I gutted my server last night wanting to add additional hard drives for the new RAID setup. The first problem I hit was running out of SATA ports! Yes, believe it my friends, when I bought the mobo I thought who could ever want 8 SATA ports; well, it turns out that I do! One of the SATA ports is taken up by the CD-ROM, leaving 7 SATA ports available between the ICH10R and the Gigabyte SATA (GSATA) controller.

Faced with this problem, I changed my configuration to have 6 SATA drives hanging off the ICH10R, and 1 SATA drive and 1 DVD drive off the GSATA controller. I figured I would run the 6 SATA drives off the ICH10R in RAID 10 and live with one drive for the OS (realizing that it will be a single point of failure). Then I hit the second problem: ICH10R only allows 4 drives max in RAID 10! And it's RAID 0+1, not RAID 1+0. From my research RAID 1+0 is better than RAID 0+1, but there is no option within the ICH10R for 1+0, even though they call it RAID 10.

Having hit this second problem, I had to change my configuration again. I ended up configuring two of the 6 SATA drives on the ICH10R in RAID1 for the OS and putting the remaining four disks in RAID 0+1. On the GSata controller I installed one disk (no fault tolerance) and the DVD drive. I made the required configuration changes in the BIOS, added the member disks to two RAID volumes, and installed Windows XP.

After the install and driver patching (with Intel Matrix Storage Manager updated to 8.8), the system is up and running again. So I ended up with the following configuration:

RAID-1 volume with 500GB
RAID-10 (0+1) volume with 1TB
500GB scratch disk (no fault tolerance)

I will start the restore process and the burn-in during the upcoming week. All in all, while it was not what I expected, the final setup is pretty good, with a lot of fault-tolerant storage (1.5TB worth) and a total storage of 2TB.

I will keep you folks posted with the results of this reconfiguration and the performance within a couple of weeks.
May 3, 2009 9:30:30 AM

Wait, wait, wait. RAID 10 is 1+0. Intel's idiot engineer wrote the BIOS wrong, which is why it shows 0+1 next to RAID 10.

Here is from Intel's site from the "Intel Matrix Storage Manager" page,
"A RAID 10 array uses four hard drives to create a combination of RAID levels 0 and 1 by forming a RAID 0 array from two RAID 1 arrays."

"Intel® Matrix Storage Manager
RAID 10 Volume Recovery

A RAID 10 volume will be reported as Degraded if one of the following conditions exists:

* One of the member hard drives fails or is disconnected.
* Two non-adjacent member hard drives fail or are disconnected.

A RAID 10 volume will be reported as Failed if one of the following conditions exists:

* Two adjacent member hard drives fail or are disconnected.
* Three or four member hard drives fail or are disconnected. "

Look at the "2 non-adjacent drives failing but the array is only degraded" - this means it is 1+0 because only 1 drive can fail in a 0+1 before all data is lost. With 0+1, there are 2 sets of Raid 0 and when 1 drive fails, all that is left is a single Raid 0 array.
With a 1+0 array, drives A, B, C & D - A+B=X & C+D=Y, one drive from X & Y can fail without losing data.
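
If you want to see the difference concretely, here is a quick Python sketch I threw together (my own made-up drive numbering and layout assumptions, not anything from Intel's docs) that enumerates every two-drive failure for a four-drive array under both layouts:

from itertools import combinations

# Assumed four-drive layouts (drives numbered 0..3):
#   RAID 1+0: mirror pairs (0,1) and (2,3), striped across the two pairs.
#   RAID 0+1: stripe sets (0,1) and (2,3), mirrored against each other.

def survives_1plus0(failed):
    # 1+0 keeps the data as long as each mirror pair still has one working drive.
    return all(not {a, b} <= failed for a, b in [(0, 1), (2, 3)])

def survives_0plus1(failed):
    # 0+1 keeps the data only if at least one whole stripe set is fully intact.
    return any(not ({a, b} & failed) for a, b in [(0, 1), (2, 3)])

for pair in combinations(range(4), 2):
    failed = set(pair)
    print(pair, "1+0 survives:", survives_1plus0(failed),
          "| 0+1 survives:", survives_0plus1(failed))

Run it and 1+0 survives 4 of the 6 possible two-drive failures, while 0+1 only survives the 2 cases where both failed drives sit in the same stripe set - which lines up with Intel's "two non-adjacent member hard drives" wording above.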
May 3, 2009 3:47:45 PM

wolf2 said:
Excellent info Shadowflash! What has your experience been with setting up stripe size on these RAID sets with the Intel Matrix Storage Manager? There isn't great guidance out there, not even in the Intel technical manuals.

I have read somewhere, someone saying that it should be aligned to the NTFS block (allocation unit) size. But it was not 100% clear as to what works and what doesn't. The standard is 64KB for the stripe size, but it can be lower or higher.

Your thoughts would be appreciated.

Thanks


There is a whole range of opinions on stripe size, and at one point I focused quite a bit on this. Unfortunately, I ended up driving myself insane with constant benchmarking of different stripe sizes and NTFS block sizes. In theory, setting the block size to the stripe size would be a good idea; however, it never works. You have no way to be assured that the blocks are precisely aligned with the stripes. Years ago, there was someone who figured out how to do it on a specific nForce chipset, and he did get some crazy fast benchmarking results, but the "trick" was only good under very specific conditions, with very specific hardware. I actually tried finding the thread that described how a while back, but never could. It was somewhere at StorageReview, that much I remember. I usually run 64k for my OS/program drives, which increases random IO, and I run 32k on my storage drives, which increases sequential performance. Deviating any more from this did yield more performance in the specific area needed, but sacrificed too much in other areas IMHO. Block size never seemed to make that much of a difference to me, but admittedly I gave up testing on that. I just use the default size and call it a day.

I had no idea there was a 4 drive limit to RAID 10, as that's all I ever used with the on-board. 8 ports with a 4 drive limit sucks. Sorry to hear about the DVD drive being SATA. I hope it still works out for you.
May 3, 2009 4:29:44 PM

@ specialk90.....

It's entirely possible that my 3ware array was verifying some of the time and rebuilding other times. The problem was, this was an "always on" machine, and even after months, this constant activity did not stop. After 2-3 months, the BSODs would start. This was back on an XP 32-bit machine. When I went to Server '03 64-bit, the array would not function correctly. I chalked it up to bad 64-bit support too. I still use that RAID 10 array in a 32-bit XP machine at work for my "off-site" back-up (I host work's backups at home in exchange). The scheduled verify is a nice addition. I loved my 3ware card for performance, but it was definitely "feature" weak.

I can't say I've seen READ speeds that low with any RAID 5; that is, after all, what they're good at. 150MB/s read and 35MB/s write sounds more normal to me for a slow RAID 5 array. Why are you using block sizes that big with a 64k stripe ? What you are essentially doing is forcing EVERY write to be striped, which could explain the high write speeds. That same effect means that on reads, multiple spindles must be synched for ANY file, but this should be a positive on large sequential access. I've never had good luck with the larger block or stripe sizes. As a rule, I usually keep block size at least a little smaller than stripe size. As I said to wolf, I've seen increasingly diminished returns when straying too far from the default 64k, and I've actually gone backwards by going too far. How does the array benchmark with the Windows default block size ? That would probably be my first step. I usually benchmark first at default stripe and format at default block size to establish a baseline. Then, if I'm tweaking for a specific purpose, I rebuild the array and test with different stripe sizes first. When I have my desired stripe size, then I start working with block sizes and testing. If you do go this far, I'd be curious what the results are, because this drove me a little crazy when I did it. It took 2-3 weeks of constantly rebuilding and benchmarking, and I still couldn't establish any clear pattern. I think this is why there are so many different opinions on the subject.

When you say "while copying", is that referring to a benchmark or a "real-world" file transfer ? If it benchmarks OK but doesn't perform well in real life, it could be the OS. I know almost nothing about Vista (I know, I'm stubborn), but XP and Server '03 always had problems with large file copies, even 64-bit. I used a program called TeraCopy, which helped speed things up. The problem was with file copies that exceeded available resources. My typical copies would be 50-100GB in size, far surpassing my available RAM (someday it won't...LOL) and they would just choke. If you don't want to hassle with deleting and rebuilding, try that program first, and see if it helps. There are registry hacks that do the same thing, but I avoided those because they are always in effect, and you don't want that for normal file copies.

Sorry it took so long to get back to you guys....I've finally "cleaned my plate" of all my "side job" projects. Today's projects...a custom wired PWM 6-fan hot-swap control board, shoe-horning a SCSI U320 rack into a case it doesn't belong in, and resurrecting my old quad socket Opteron server. Good luck to us all........
May 3, 2009 9:36:29 PM

Shadowflash, thank you for your comments so far.

1) The allocation/block size is the Windows default. The larger block sizes I was referring to were within HD Tune Pro while benchmarking.

2) I have a 3 drive RAID 5 with Raptors and I used the exact same stripe (64k) and same block size (default), and it's on the 3ware card. This array is extremely fast, with 140+MB/s writes and reads while copying files.

I have a tendency to confuse people with what I write so let me try and start over.
These are my specs:
1) 4 74GB Raptors connected to Intel (ICH8R) & using Matrix RAID, with a RAID 10 for the OS and a RAID 0 for Adobe stuff. Both arrays are fast, with the RAID 0 array averaging 225MB/s read/write speed (according to HD Tune)

2) 3 x 500GB (7200.11) in RAID 5 on 3ware 9650, stripe=64, allocation=default

3) 3 x 150GB Raptors in RAID 5 on 3ware 9650, stripe=64, allocation=default

4) 1 x 1.5TB (7200.11) on motherboard (ICH8R)

I use the RAID 0 array for the real-world copy tests, and everything except #2 works like a charm.

On a side note: 3ware's 64bit support is seriously lacking. I'm rather glad LSI bought 3ware so maybe now their cards will have better support and more features.

In XP Pro x86, my write speeds were about 90MB/s but that was with only 100GB free. I once had this card in a Vista x86 and it performed just fine.
May 9, 2009 9:27:11 AM

Quick update folks,

After the changes made, the system has been running superbly. It's close to 1 week now without a single BSOD so far. The other interesting thing is I was amazed at how snappy the system has become with the change. It truly does feel like a quad-core machine with a 1GB Radeon graphics card now.

The response is excellent, running multiple tasks at the same time is a snap, and the hard drive churn activity feels like it's at 10% of what it used to be with RAID 5. It seems that software RAID 5 is a bit of a stretch, and the amount of power that gets wasted does not justify the potential redundancy.

I will keep you posted as the acid test is for the server to run for 1 month without a crash in a 24x7 operation.
May 14, 2009 4:22:20 AM

Ok folks, some sad news!

Close to two weeks into operation, the server started experiencing BSODs again. They range from VMX.sys to iastor.sys errors. The machine stays up between 2 and 4 hours before crashing with errors such as iastor.sys. I know that the iastor.sys error is related to the Intel Matrix Storage Manager and the driver for the ICH10R chipset.

Before any suggestions from the readers, please realize that I have updated the BIOS, updated the drivers, and tested the RAM/CPU and motherboard. Everything is updated, checks out, and seems to be working fine. The setup I have now has:

2 drives mirrored as the OS volume
4 drives in RAID 10 as the Media volume
1 drive as scratch with no fault tolerance
Intel Matrix Storage Manager (iMSM) 8.8

All these drives are 500GB. I have two options at the moment: either try hardware RAID and avoid any software/mobo RAID combination, or simply forgo RAID altogether and stick with regular drives and a good backup routine.

Needless to say, I am a bit frustrated with the whole issue, especially after dropping a couple of thousand dollars on an unreliable system.
May 15, 2009 6:01:31 AM

I don't remember if you ever told us the cooling setup you have for the drives.

For this to suddenly come back sounds like it could possibly be a heat issue, which could be the southbridge, any of the drives, or even the video card, or possibly something else.

Have you considered the higher-speed RAM, its compatibility and/or RAM voltage? One thought I have is that the board uses the RAM as write-back cache, and a possible error with the RAM could cause this problem.

What could happen, is there is an error which corrupts something to do with the Intel driver(s). It seems like something is randomly occurring which causes corruption because you were able to reinstall everything and have it last for over a week without a problem.
May 17, 2009 6:53:07 PM

You're dealing with software errors that result in crashes; this could be due to faulty hardware (so check whether the chipset could be overheating), or it could just be a driver bug. Since Intel offers more features, its driver is larger and thus more complicated.

Basically you should:
1) Check the chipset temperature by touching its heatsink. If it feels warm, that's normal. If it hurts so much that you must release your hand, it might be too hot. Adding a fan might help too.
2) Re-check your memory with memtest86. It's the best memory test there is because it tests all memory, unlike any in-Windows application, which can't test memory that iastor.sys (or anything else) already has in use. You should download the UBCD (Ultimate Boot CD), which contains memtest, or any Ubuntu Linux CD, which also contains memtest (it's in the menu you get when booting from the CD). This might remedy the boot problems you had with the memtest86 ISO.

Also, I would like to stress that running software RAID 5 under Windows has several disadvantages. First you have stripe misalignment, which amplifies the RAID 5 small-write performance penalty and causes more I/O than necessary. Also, all software RAID 5 solutions available on Windows do a very bad job, with one exception being the ICHxR drivers, which have 'write caching' that can give you at least moderate performance. Obviously they don't work well for you, and you probably should look at another solution.

Have you ever thought about building a computer dedicated to handling all your storage? A RAID NAS might not be something casual computer users have, but it would allow you to run advanced RAID 5 setups using the ZFS filesystem, with its own implementation of RAID 5: RAID-Z. Combining both filesystem and RAID engine in one package allows for variable stripe sizes, so the 2-phase writes of RAID 5 disappear, adding to performance. ZFS is also packed with features that would allow a maintenance-free, corruption-resistant and self-healing storage solution.

If you would like to know more about this path I'll indulge you, but check the two points addressed above first. If your hardware is really working properly, that would imply the Intel drivers are at fault here. If that is the case, it's worth looking at an alternate solution.
May 21, 2009 8:14:48 PM

Hello again folks. Sorry for the delay in the response to both Specialk90 and Sub mesa. I basically wanted information with teeth to it before posting back to the forum.

First of all, let me give you the update. I have completely ditched the ICH10R RAID drivers after the last crash and decided to resort to good ol' fashioned troubleshooting by eliminating one variable at a time. So I have loaded all the fail-safe params in the BIOS (running the F9 revision) and switched the SATA controllers to standard IDE mode, not AHCI. I have dropped two drives out of the setup, keeping a total of 5 x 500GB drives and 1 DVD writer (all SATA).

The drives were set up as independent drives with no fault tolerance. I completed a fresh install of Windows XP on the first drive, installed all sound/LAN drivers, and completed all the housekeeping items of a fresh install. Then I put the machine to the test in 24x7 mode with moderate read/write requests.

To my amazement, the machine started BSODing with the IRQL memory errors! Puzzled about this, I disconnected four drives and kept only the one drive which has the OS. Then I ran the machine again and was further puzzled by another BSOD within a few hours, with the typical PAGE_FAULT_IN_NON_PAGED_AREA errors. Then it dawned on me what the last two posters had highlighted: the memory could be the culprit. I yanked one of my kid's DDR2 2GB 800MHz RAM chips out of his computer and into mine, and took out the two Corsair 2048MB 1066MHz chips from the server. I also reconnected all five drives back into service. Then I booted the machine, and it has been running non-stop for the past 4 or 5 days!

I feel that the memory is 80% of the problem right now but the acid test is a full 1-2 weeks of 24x7 operation without a failure. So I will keep you posted with what happens. Needless to say, I have no RAID setup whatsoever right now, but frankly I am really happy that I have a working system. Let's hope that it passes the two weeks mark and then I have to figure out where to go from there.

SpecialK, on your question regarding cooling - I am using a hefty Gigabyte case with 2 front-side fans and 2 back-side fans. I paid a little extra cash at time of purchase to make sure I have plenty of fans in the machine. I am also using an Epsilon 600W power supply with its own fan, so the system runs fairly cool across the board.

Sub mesa, on your two questions, I believe that the answer now is related to the memory. The system seems to be fairly cool, and I will double-check the temperature of the chipsets over the next two weeks and let you know.

I am definitely interested in your suggestion about the dedicated RAID-Z and ZFS box and would appreciate any insight. I must say that having a file server that doubles up as a workstation is really nice. The file server provides media storage for the home network while it's downloading files from the Internet in the background.

Thanks for all the help guys - it has really been crucial in isolating the problem and giving me hope when it was needed the most.
May 22, 2009 1:57:43 PM

Hi Wolf2,
Glad to hear you sorted your ICH10R issues. We have 20+ machines running versions of ICHxR and running Matrix RAID, and we have found the same performance difference you have discovered between the onboard RAID 5 and 10. Hence we use a 2 disk RAID 1 for the OS and system, and a 4 disk RAID 10 for our data. And we have been using higher end Gigabyte motherboards. Hopefully that is the end of your worries. Stupid "memory" :o )

Cheers
Zapf
May 23, 2009 5:32:11 AM

sub mesa said:
...RAID-Z. Combining both filesystem and RAID-engine in one package allows for variable stripe sizes, so 2-phase writes of RAID5 disappear...
Going a bit OT here, but there seem to be some misconceptions about ZFS which need correcting...

ZFS does not eliminate "2-phase writes"--assuming you mean eliminate a potential low-level inconsistency due to, e.g., requiring atomic update of both the data and the parity drives, and which are typical potential failure modes of cheap RAID-5.

While ZFS doesn't eliminate it, it compensates for it with journaling, and thus makes it more robust. That applies across a ZFS vdev (i.e., it is a function of the ZFS on-disk structure, not a function of a specific file system). You could get pretty much the same from a decent/smart/BBU RAID-5 controller.

That said, I'd still prefer ZFS. And that said, the whole "RAID-5 write hole" brouhaha is due to cheap/dumb implementations (e.g., what you find with most mobo's), not because RAID-5 is fundamentally flawed. ZFS basically uses the exact same technique as RAID-5 (or RAID-6 with RAID-Z2). Variable stripe sizes are an optimization.

The need to perform atomic writes to multiple devices to ensure consistency doesn't "disappear" with ZFS--it's as constrained as anything else that has to write to disparate devices as an atomic action, but which can't ensure an atomic result because it is writing to disparate devices, any of which may fail before the entire action is completed.

The reason ZFS is resilient and immune to such low-level failures is because it maintains a journal, which allows it to ensure that "2-phase write" (i.e., an atomic update to disparate devices) either completes as an atomic operation, or ignore it as if it had never happened. Variable stripe size has nothing to do with it--you could do the same with fixed stripe sizes (and is essentially what decent BBU RAID-5 controllers do, only their "journal" is in BBU RAM on the controller).

Which is why, for best performance, you want a separate device to hold the ZFS journal/log (aka, a "logzilla").
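
To make that intent-log idea concrete, here's a toy Python sketch (purely illustrative - it is not ZFS code, and a real BBU controller keeps its "log" in battery-backed RAM rather than in a list):

log = []                        # stand-in for the dedicated log/journal device
disks = {0: {}, 1: {}, 2: {}}   # stand-in for the array members (2 data + 1 parity)

def write_stripe(txid, stripe_no, blocks):
    # Record the intent first, then touch the disks, then record the commit.
    log.append({"tx": txid, "stripe": stripe_no, "state": "begin"})
    for disk_no, block in blocks.items():
        disks[disk_no][stripe_no] = block      # any of these writes may be interrupted
    log.append({"tx": txid, "stripe": stripe_no, "state": "commit"})

def recover():
    # After a crash: any stripe with a "begin" but no "commit" is suspect and
    # must be repaired (re-written or rolled back) before it can be trusted.
    committed = {e["tx"] for e in log if e["state"] == "commit"}
    return [e["stripe"] for e in log if e["state"] == "begin" and e["tx"] not in committed]

write_stripe(1, 0, {0: b"data-A", 1: b"data-B", 2: b"parity"})
print(recover())   # [] - transaction 1 committed, so stripe 0 is known-consistent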
May 23, 2009 3:57:18 PM

As Jeff Bonwick, lead developer of ZFS, stated:
Quote:
RAID-Z is a data/parity scheme like RAID-5, but it uses dynamic stripe width. Every block is its own RAID-Z stripe, regardless of blocksize. This means that every RAID-Z write is a full-stripe write. This, when combined with the copy-on-write transactional semantics of ZFS, completely eliminates the RAID write hole. RAID-Z is also faster than traditional RAID because it never has to do read-modify-write.

Whoa, whoa, whoa -- that's it? Variable stripe width? Geez, that seems pretty obvious. If it's such a good idea, why doesn't everybody do it?

The use of variable stripe sizes means RAID-Z doesn't have to do 2-phase writes (called "read-modify-write" by Jeff). Instead, by adjusting the size of the stripe so that a full stripe is exactly the size of the data you want to write, you get a "perfect fit" in which RAID 5 can write very fast - so-called 1-phase writes (called "full-stripe writes" by Jeff).

As for journaling, that is something else. No atomic writes means you need some safeguard against dirty buffers and synchronisation between data and metadata. Filesystems can do this in various ways, and I wouldn't say ZFS does journaling. Since it uses a copy-on-write model, it simply writes to empty space when you overwrite something. If the system crashes in the meantime, it would appear as if the write never happened. Such systems differ from classic journaling, and have been employed for many years in FreeBSD and other operating systems, called Soft Updates. Though each has its merits, it's not really what you use, but how effective it is in preventing filesystem corruption in the case of system outages when using write-back mechanisms. NTFS fails at that, by the way. ZFS does the trick well, provided it's close to the disks and can issue flush commands to the disk members, which you can't if you use some cheap hardware RAID. I haven't tested ZFS that thoroughly myself; however, I'm using it to my satisfaction. :)

P.S. The answer to Jeff Bonwick's question in the quote above is that ZFS is basically a filesystem and RAID engine combined. Only this combination can do dynamic/variable stripe sizes, because you need information from both domains to make this work. ZFS is still pretty unique, and nothing exists that compares well with it. You can read his whole entry somewhere here.
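
If a sketch helps, here is the copy-on-write idea in a few lines of Python (my own toy model, nothing ZFS-specific): overwrites land in fresh blocks, and only the final pointer update makes them visible, so a crash mid-write simply leaves the old version in place.

class CowStore:
    def __init__(self):
        self.blocks = {}          # block_id -> bytes
        self.root = {}            # name -> block_id (stand-in for the uberblock)
        self.next_id = 0

    def write(self, name, data):
        new_id = self.next_id     # 1) write the new data to never-used space
        self.next_id += 1
        self.blocks[new_id] = data
        # a crash here loses nothing: self.root still points at the old block
        self.root[name] = new_id  # 2) one atomic pointer update publishes it

    def read(self, name):
        return self.blocks[self.root[name]]

store = CowStore()
store.write("file", b"version 1")
store.write("file", b"version 2")   # the old block stays untouched until reclaimed
print(store.read("file"))           # b'version 2'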
May 23, 2009 6:07:57 PM

NB:

...when combined with the copy-on-write transactional semantics...

Without that it doesn't eliminate the "write hole".

...faster than traditional RAID because it never has to do read-modify-write.

Variable stripe size (always full-stripe write) is a performance benefit; it does not eliminate the write hole.

At the risk of oversimplification below...
  • N = number of disks in array - 1.
  • data[n] = block of data == stripe-size
  • "full-stripe write" == write of stripe-size * N bytes of user data
  • operations in the same step can be performed in parallel

    A. RAID-5 write(new-data) < full stripe write; e.g., single block:
    1. read(old-data); read(old-parity)
    2. new-parity = xor(old-data,new-data,old-parity)
    3. write(new-data); write(new-parity)

    B. RAID-5 write(new-data) == full-stripe write:
    1. new-parity = xor(new-data[1],...new-data[N])
    2. write(new-data[1]); ...write(new-data[N]); write(new-parity)

    C. ZFS write(new-data):
    1. T = transaction-start(); new-parity = xor(new-data[1],...new-data[N])
    2. allocate(new-data[1..N]+parity)
    3. write(new-data[1]); ...write(new-data[N]); write(parity)
    4. transaction-end(T)
    ...
    5. commit or roll-back transaction T

    Note that ZFS does the same number of writes as a RAID-5 full-stripe write - one write to each disk in the array: data[1]...[N] + parity. If any of those writes fails to complete, there's an inconsistency. ZFS prevents that by using a transaction log, not by using variable stripe size.
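
And to make the xor() in those steps concrete, a tiny runnable Python sketch (toy 4-byte blocks, N = 3; obviously no real controller works on Python byte strings):

from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-sized blocks - the RAID-5 parity operation.
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]               # new-data[1..N]
parity = xor_blocks(data)                        # case B: full-stripe write

old_block, new_block = data[1], b"XXXX"          # case A: single-block update
new_parity = xor_blocks([old_block, new_block, parity])
assert new_parity == xor_blocks([data[0], new_block, data[2]])

rebuilt = xor_blocks([data[1], data[2], parity]) # recovery of a lost member
assert rebuilt == data[0]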

    I also like ZFS a lot :) , as it does a better job and does it cheaper. And among other things, it can make very effective use of SSDs as an additional tier for the transaction log (a small/fast "logzilla") and as a read cache (a larger/slower "readzilla"), which can provide significant performance improvement. IMHO the best use of SSDs today. (See Adam Leventhal's Fishworks blog.)
    June 6, 2009 7:49:16 PM

    hello,

    I have a question: if I create a RAID matrix on a motherboard with the ICH9R chipset, is it possible to move it later (in the future) to an ICH10R chipset if my mobo dies or something? I've been using WD drives in RAID 0 for years, but IDE with an external controller. I bought a new mobo with ICH9R some time ago and I would like to use the built-in RAID. What do you think?

    Mike
    June 6, 2009 11:11:56 PM

    Quote:
    hello,

    I have a question: if I create a RAID matrix on a motherboard with the ICH9R chipset, is it possible to move it later (in the future) to an ICH10R chipset if my mobo dies or something? I've been using WD drives in RAID 0 for years, but IDE with an external controller. I bought a new mobo with ICH9R some time ago and I would like to use the built-in RAID. What do you think?

    Mike

    Yes, it should work fine. Intel has done a good job of keeping the ICHxR family fairly portable - especially going up. Going down (due to limitations of RAID drive combinations) is another story.
    June 14, 2009 6:24:49 AM

    I just wanted to get something straight about the ICH10R.

    Can I do this combination on the same chip?

    1 - Raid 1 Mirrored Set (2 Drives) Operating System
    1 - Raid 5 (3 Drives) Data or Raid 10 (4 Drives) Data

    and I can do these both at the same time on the same chip?

    Is there a performance hit for doing 2 sets vs 1 set on the same chip?

    Thanks,

    Ryedog
    June 14, 2009 7:13:27 AM

    microking4u said:
    Can I do this combination on the same chip.
    1 - Raid 1 Mirrored Set (2 Drives) Operating System
    1 - Raid 5 (3 Drives) Data or Raid 10 (4 Drives) Data
    and I can do these both at the same time on the same chip?
    Ryedog
    Yes you can; that's the "matrix" part of the "Intel Matrix RAID". As to the performance hit, I've no empirical evidence, but I'd guess (and only a guess) that the performance hit for multiple arrays vs. a single array is nil.
    June 14, 2009 11:07:11 PM

    I was also wondering what the performance would be like on ICH10R with just a RAID 10, with let's say some WD RE3 1TB drives in a single set of 4 drives. I know these drives have dual processors and 32MB cache. But I am wondering about maybe doing the operating system and data on the same set, maybe just partitioning them separately. Using this configuration for a desktop or Small Business Server?

    Anyone seen benchmarks on such a config?

    Thanks!

    Ryedog
    June 15, 2009 6:05:47 AM

    Hello everyone!!

    At last, problem solved....well, kind of, I guess. The BSODs have stopped after replacing the 1066-speed memory with a couple of 800-speed chips. The system has been running for close to 3 weeks without a single failure, so I am stoked.

    The catch is that I have dismantled the RAID setup entirely and gone back to standard SATA drives running independently. Even worse, I am not even running them in AHCI mode :) 

    Quite frankly, I am enjoying the stability of the system, so I will not be reinstalling to set up RAID anytime soon. I will stick to the old-fashioned backup approach for a few months until an opportunity presents itself to set up RAID again.

    Thanks to all those who offered ideas in the process of identifying the root cause. The interesting thing is that I never really learned how reliable the ICH10R is, because of the memory problem! All this time was spent tracking down memory as the culprit, and in the process I ended up removing the RAID configuration altogether!

    Anyway, I thought you all would be interested to know.

    Wolf2
    June 15, 2009 11:21:21 AM

    I know you've already solved your problem, but I'd like to know something.
    Were you running your FSB at stock (e.g. 266 or 333 MHz) and the RAM at its rated 533 MHz (as set by 'Auto')? It's a common mistake people make with Intel chipsets.

    With Intel chipset memory controllers, certain FSB:RAM ratios, specific to your particular motherboard, are often 'flaky'; everything except 1:1, that is. The higher that ratio, the more potential for instability. The surest way to use your RAM to its full potential is to clock the FSB up as well, keeping that difference to a minimum, or just run the RAM at a lower frequency with tighter timings for roughly the same real-world (not synthetic) performance. A lowly stock FSB with a high RAM frequency gets you nowhere.
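    For what it's worth, the arithmetic looks like this (a rough sketch in Python; DDR2-1066 runs a 533 MHz memory clock, and 266/333/400 MHz are common Core 2 FSB base clocks):

    # DDR2 ratings are data rates; the actual memory clock is half of that.
    ddr2_rating = 1066            # DDR2-1066
    mem_clock = ddr2_rating / 2   # 533 MHz

    for fsb_base in (266, 333, 400):   # common FSB base clocks (quad-pumped bus)
        ratio = mem_clock / fsb_base
        print(f"FSB {fsb_base} MHz : RAM {mem_clock:.0f} MHz -> ratio 1:{ratio:.2f}")

    # 266 MHz -> 1:2.00, 333 MHz -> 1:1.60, 400 MHz -> 1:1.33. The closer the FSB
    # gets to the memory clock (1:1), the less strain on the chipset at that RAM speed.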

    Now, about RAID5. I'm sure you know how failure-prone mechanical HDDs are and what the full consequences can be. RAID5 is the least I can do, besides regular backups, to protect archived data against such failures with the best cost efficiency. Although I use hardware RAID5 now, my short 5-month venture with Matrix RAID5 on 3x1TB drives was a fairly good one. The only things I can complain about are the obviously slower writes compared to hardware RAID5 and the extremely long rebuild time (19 hours when I tested before deployment), both of which are expected from software RAID5, so not really complaints. As far as reliability goes, it's one of the most solid software RAID solutions I've seen (up there with Linux, IMO), compared to shocking ones like nVidia's, which I wouldn't touch with a 200ft pole.
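    Just to put that 19-hour rebuild in perspective, the back-of-the-envelope arithmetic works out as below (a rough sketch only; a real rebuild also competes with foreground I/O):

    capacity_bytes = 1e12          # roughly 1 TB must be reconstructed onto the new disk
    rebuild_hours = 19             # observed rebuild time
    rate_mb_s = capacity_bytes / (rebuild_hours * 3600) / 1e6
    print(f"Implied sustained rebuild rate: {rate_mb_s:.1f} MB/s")    # ~14.6 MB/s

    # For comparison, a controller rebuilding at a hypothetical 60 MB/s would need:
    print(f"At 60 MB/s: {capacity_bytes / (60e6 * 3600):.1f} hours")  # ~4.6 hours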
    October 22, 2009 7:55:58 PM

    Shadowflash,

    Thanks for your insightful posts regarding RAID5!

    I think I'll switch to RAID1 now.

    Question about partitioning.

    Is it possible to partition a RAID1 volume?

    I have two WD Caviar Green 1TB drives which I would like to configure as a RAID1 volume.

    I normally partition into OS and DATA partitions to make it easier to image the OS volume (for emergency restore) and back up the data volume.

    I'm wondering whether disk utilities like Partition Magic and True Image work with RAID volumes.

    Thanks!
    November 9, 2009 4:28:11 AM

    ShadowFlash,

    Thank you. That is a brilliant, very helpful reply.

    If the scenario you describe (drive error-recovery fallout) can also affect RAID10, then I believe that just today one of my WD Caviar Black (1TB) drives was dropped from my RAID10 for the very reasons you outlined above. Your post helped me understand why this happened, as I (unknowingly) built the RAID using the desktop edition of the WD drives.

    My system is a RAID10 implemented with the on-board Intel X58/ICH10R southbridge on an ASUS P6T mobo. You would be correct in guessing that I'm another amateur (with computers) who thought he could handle implementing a RAID10 on his home PC.

    My strategy for dealing with the RAID that has now dropped a drive will be to swap a new (same model) drive into the array and rebuild it. Then I'll connect the "bad" drive separately as a stand-alone disk that I can reformat around the bad sector. After fixing it, I'll keep that drive as a spare, effectively a "refurbished" drive I can swap back in when this problem repeats itself with another one of those desktop edition drives.
    November 9, 2009 7:14:33 AM

    Not the TLER story again :( 

    1) TLER is available on desktop drives like WD Green, you just have to enable it
    2) TLER is meant for guaranteeing uptime in servers
    3) TLER cannot magically solve surface errors
    4) RAID engines that encounter uncorrectable surface errors will kick the disk out of the array
    5) broken arrays should be resilvered with a spare disk
    6) for consumer systems that have no spare disks, the RAID engine should be tweaked so it does NOT disconnect a drive at the earliest sign of trouble, as is possible with any non-Windows-based advanced software RAID
    7) for serious storage needs, Windows should be avoided and a dedicated NAS running a more capable OS should be used

    I've used desktop-class disks all my life in RAID5 and RAID0 configs, and simply setting the timeout to 60 seconds does wonders; it allows the disks to fix surface errors without being disconnected straight away and leaving you with a broken array. Since people don't know how to 'glue' the array back together, they sometimes think all their data is gone. This is one reason RAID should NOT be used to increase reliability for consumer storage; in many cases it actually decreases reliability, and consumers would be better off with plain old backups and no RAID instead.
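    For what it's worth, on a Linux software-RAID box that timeout change is trivial; a minimal sketch (assumes libata/SCSI disks exposed under /sys/block and root privileges; the setting does not survive a reboot):

    import glob

    # Raise the kernel's per-command disk timeout from the default 30s to 60s so a
    # drive that is deep in its own error recovery isn't dropped from the array.
    for path in glob.glob("/sys/block/sd?/device/timeout"):
        with open(path, "w") as f:
            f.write("60")
        print(f"set {path} -> 60 seconds")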

    It also has to be said that RAID and Windows never were a nice combination. Windows itself doesn't offer any advanced RAID; it leaves that to third-party drivers like the ICHxR family to get at least some decent performance, when this could have been native Windows technology. It also has to be said that Windows has no truly robust filesystem: NTFS does metadata-only (light) journaling, and users have no option to enable stronger journaling.
    May 15, 2010 2:04:40 AM

    sub mesa said:
    Not the TLER story again :( 

    1) TLER is available on desktop drives like WD Green, you just have to enable it


    I just wanted to add, for those who find this thread via Google as I did, that since the date of the post I've referenced above, WD has removed the ability to enable TLER on desktop drives. Yes, there is a TLER utility available on the internet, and yes, it will allow you to enable TLER on older desktop drives. However, reports posted in forums across the internet confirm that newer WD desktop drives (including the 1TB WD Black) no longer allow TLER to be enabled.
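    If you're unsure whether a particular drive still exposes the setting, smartmontools can query SCT Error Recovery Control, the ATA feature the TLER utility toggles; a sketch, assuming smartctl is installed and the drive sits at /dev/sda (newer desktop firmware may simply report the feature as unsupported):

    import subprocess

    # Read the current read/write recovery limits (reported in tenths of a second).
    subprocess.run(["smartctl", "-l", "scterc", "/dev/sda"], check=False)

    # On drives that still allow it, cap error recovery at 7.0 seconds (RAID-friendly):
    # subprocess.run(["smartctl", "-l", "scterc,70,70", "/dev/sda"], check=False)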
    September 6, 2011 5:40:54 AM

    ICH10 RAID failures..! I've had plenty of them: the RAID drops, then I get it back up and working for a while, then an error, it won't boot, and so on. I eventually decided to install everything as plain IDE and noticed a little hum, with a hard drive clicking for no reason. You could hear the head swinging back and forth like when you first power a drive up. So I decided to use the JMicron controller on my ASUS board, and what do you know, everything worked fine.

    The whiny noise started whenever the hard drive wrote to the platters. The noise would come and go, usually getting louder, and the drives clicked more, the longer the computer had been on. I called ASUS, and they have a problem with the ICH10 SATA controller overheating or running hot. So I reinstalled everything, turned the air conditioning on in the house, turned all the case fans on high, and waited for the house to get to 73 degrees. Then I noticed no whine, no clicking, no write errors, and no drive was ever dropped from the RAID 0 pair.

    I first had Seagate 7200.11 drives, and there were firmware problems with those; I got Seagate to replace them, and the new drives did the same thing. All this time I thought the drives were bad or weird, but it was the ICH10 southbridge getting hot, so I ended up replacing the Seagates with Western Digitals. There apparently is no real fix for this except keeping the house cool or replacing the board. My ASUS Rampage II X58 is going to be replaced with another socket 1366 board like the Sniper or the Gene.