How to choose the best hard drives for Intel's PCH RAID?

Hi,

I've received no response on the Intel community forum regarding this topic:
"TLER/ERC/CCTL capable drives needed for PCH RAID?"
http://communities.intel.com/message/91107
Nor have I received a concrete answer to Intel support ticket 8000033427.
I wonder if my question is not valid.

Question in brief:
When choosing hard drives for a RAID array on P55/H57 boards, are TLER/ERC/CCTL-capable drives needed for a more stable RAID setup?

TLER/ERC/CCTL is a hard drive feature that limits how long the drive spends on error recovery, so that it keeps responding to the RAID controller. I have collected some info and called hard drive manufacturers in this regard.
WD TLER defaults to 7 seconds.
Seagate ERC defaults to 10 seconds.
Samsung CCTL defaults to 7 seconds.
These numbers may differ between versions/models. Some models may even allow users to change the timeout settings.
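
On drives that expose this setting through SCT Error Recovery Control, the smartmontools command `smartctl -l scterc` reads or sets the limit, with values given in tenths of a second. Below is a minimal Python sketch of the unit conversion only. The helper name `scterc_argument` is hypothetical, and whether a given drive accepts the command depends on its model and firmware.

```python
def scterc_argument(read_seconds: float, write_seconds: float) -> str:
    """Build the value for smartctl's -l option from timeouts in seconds.

    smartctl expects the read and write limits in units of 100 ms,
    so 7 seconds becomes 70.
    """
    if read_seconds < 0 or write_seconds < 0:
        raise ValueError("timeouts must be non-negative")
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return f"scterc,{read_ds},{write_ds}"


# The vendor defaults quoted above, expressed as smartctl arguments.
for vendor, seconds in [("WD TLER", 7.0), ("Seagate ERC", 10.0), ("Samsung CCTL", 7.0)]:
    print(vendor, "->", "-l", scterc_argument(seconds, seconds))
```

The sketch only formats the argument; it does not run smartctl or touch any drive.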

In ticket 8000033427, an Intel engineer said that the P55/H57 PCH, just like the ICH7R, allows a RAID member hard drive 10 seconds to reply to R/W commands before declaring it unresponsive and dropping it from the RAID array. He added, however, that Intel's PCH software RAID does not truly support hardware-enabled TLER/ERC/CCTL.

Thus, I don't know how to pick the best hard drives for the Intel PCH RAID array. If I choose hard drives built without TLER/ERC/CCTL features, these drives, while in their error recovery process, may be detected as unresponsive and dropped from the RAID array by the PCH. On the other hand, if I choose TLER/ERC/CCTL drives, I've been warned about incompatibility issues.
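
To make the trade-off concrete, here is a rough Python sketch that compares each drive's error recovery limit with the controller's 10-second limit. The figures are the ones quoted in this post; the function names and the "desktop drive" entry with no limit are assumptions for illustration, not vendor data.

```python
CONTROLLER_LIMIT_S = 10.0  # limit quoted by the Intel engineer in the ticket

# Error recovery limits in seconds; None means no limit is configured.
# The first three values are the vendor defaults quoted in this post.
DRIVES = {
    "WD TLER": 7.0,
    "Seagate ERC": 10.0,
    "Samsung CCTL": 7.0,
    "Desktop drive without TLER/ERC/CCTL": None,  # hypothetical entry
}


def margin_seconds(recovery_limit_s, controller_limit_s=CONTROLLER_LIMIT_S):
    """Return the spare time before the controller gives up, or None if unbounded."""
    if recovery_limit_s is None:
        return None
    return controller_limit_s - recovery_limit_s


def verdict(recovery_limit_s, controller_limit_s=CONTROLLER_LIMIT_S):
    """Classify a drive by how much margin it leaves the controller."""
    margin = margin_seconds(recovery_limit_s, controller_limit_s)
    if margin is None:
        return "at risk: recovery time is unbounded"
    if margin <= 0:
        return "at risk: no margin before the controller's limit"
    return f"ok: replies {margin:.0f} s before the limit"


for name, limit in DRIVES.items():
    print(f"{name}: {verdict(limit)}")
```

Under these numbers a 10-second drive default leaves no margin against a 10-second controller limit, which is one reason the exact values matter.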

Any suggestions before I can get a clear answer from Intel chipset support on this? Or, shouldn't I be concerned about this issue?
  1. Quote:
    On the other hand, if I choose TLER/ERC/CCTL drives, I've been warned about incompatibility issues.
    Who warned you about not using hard disks like Western Digital RE or Seagate ES series on an ICH10R controller?
  2. I received the following response from an Intel tech support engineer in the ticket #8000033427.

    He said on 5/28,
    "P55 only uses Software RAID. Therefore, the P55 documents do not list items such as TLER. .... TLER is not supported."

    He didn't specifically mention ICH10R, but he said that the 10-second limit is unchanged from ICH7R to PCH, and that an unresponsive drive will be marked as failed 10 seconds after the PCH issues an R/W command.

    AFAIK, Western Digital RE uses TLER to address the out-of-sync problem between the RAID controller and the member drive during the drive's error recovery process.

    WD RE may work at first glance, because it will definitely respond to the PCH within 10 seconds, but how well the two will work with each other in various situations is the open question. I am evaluating the RAID stability risk of using either the Intel PCH or a dedicated RAID card.
  3. All I know is that HP SATA hard disks have TLER: http://h18000.www1.hp.com/products/servers/proliantstorage/serial/sata/entry/benefits.html

    They work well when connected to the Integrated SATA Raid (which seems to be a re-branded ICHxR controller).

    A dedicated RAID controller with BBU definitely is faster.
  4. WD RE (RAID Edition) disks definitely have TLER.
    http://www.tomshardware.com/reviews/sbm-high-end-system,1689-7.html
    Even some desktop edition WD disks used to be "TLER available" via the TLER utility; that was before WD decided to widen the gap between desktop disks and server RAID ones at the beginning of this year.

    I'm looking for a more reliable RAID solution, but of course faster is always better. For example, maybe I should just pick a RAID card that is marked CCTL compliant and pair it with CCTL-ready drives to achieve RAID stability. Any suggestions on which to choose: CCTL, TLER or ERC? Or maybe SAS?
  5. Other than SSDs, nothing beats a bunch of 15K SAS drives connected to a high performance RAID controller and a BBU to enable write caching. I wouldn't use it for a desktop PC, but it's an excellent solution for servers. On a desktop I'd use RE3 drives on the ICH10R combined with an SSD if booting/applications loading performance is important. No matter what RAID solution is used, you need to perform regular backups.
  6. > Any suggestions before I can get a clear answer from Intel chipset support on this?

    We've seen a few requests here for assistance,
    because users assembled a RAID using
    WD's Caviar Black series, which do not support
    time-limited error recovery ("TLER").

    Yes, they will drop out of the RAID array
    after they start to fill up, because the
    firmware's error recovery logic may take too long,
    and Intel's I/O controller hub will conclude
    that one or more HDDs are not responding.

    BEST WAY is to stay with WD's RE (RAID Edition)
    HDDs, which are designed with TLER --
    time-limited error recovery.

    The specs for each WD HDD will state if TLER
    is supported by any given HDD:

    http://www.wdc.com/en/products/productcatalog.asp?language=en


    p.s. WD has sold so many millions of HDDs in recent years,
    and Intel's I/O controller hubs are also so ubiquitous,
    it's extremely unlikely that WD's RAID Edition HDDs
    are incompatible with any recent ICHx.


    MRFS
  7. Hi GhislainG,

    With SATA II and 6G closing the speed gap between SATA and SAS, I wonder if there are any public data and benchmark results showing that SAS is more reliable than the new generation of SATA. Anyway, it seems you favor a RAID card with a battery backup unit for stability and speed.

    ===============================
    Hi MRFS,

    Yes, with WD RE hard drives and Intel I/O controller hubs' reputation and prevalence, I shouldn't worry too much about their incompatibility.

    Just like you thought, TLER will most likely help lessen the chance for hard drives to be dropped from a RAID array. This has also been mentioned by an Intel engineer.
    http://communities.intel.com/message/12098#12098

    However, by looking at these reported problems:
    "Random drive fails with new Matrix Storage Manager 8.9"
    http://communities.intel.com/message/51299#51299
    "Random drive fails with new Rapid Storage Technology 9.5 ?"
    http://communities.intel.com/thread/8139?start=0&tstart=0

    I am afraid WD RE TLER-enabled disks will only alleviate the drop-out symptom somewhat instead of providing a rock-solid array: they will respond to ICHxR/PCH sooner, but they may still not be a perfect match. The Intel support engineer told me that ICHxR/PCH does not truly support TLER/ERC/CCTL, and he couldn't provide a list of compatible hard drives for ICHxR/PCH.

    I searched Intel's site for tested and supported parts for the PCH motherboards but failed to find any info on compatible/tested hard drives. Tested memory is listed, nonetheless.
    http://www.intel.com/support/motherboards/desktop/sb/CS-029945.htm

    I think two things can help assure the stability of matching ICHxR/PCH with TLER/ERC/CCTL drives:
    -----------------------------------------------------------------------------------------------------
    1. Even if Intel's integrated RAID solution does not claim to be a perfect match for any specific vendor's (or vendor group's) standard, it should still claim categorical compatibility, to the extent that ICHxR/PCH is guaranteed not to cause drop-outs due to a disk's prolonged error recovery process (e.g., 5~10 seconds), as long as the disk implements TLER/ERC/CCTL.
    2. A compatibility list of hard drives for the ICHxR/PCH boards.
  8. I read the threads and it appears that version 9.6 fixes the issues reported with 8.9 and 9.5. It also looks like using RE drives wasn't as bad as using Caviar Black drives. What will you do? Buy a RAID controller, use version 9.6 or no RAID at all?

    Edit: SAS drives are not more reliable than SATA drives, but they are faster. When connected to a good RAID controller with a BBU, writing is very fast. Just add more drives to improve performance.
  9. From a scientific point of view, controlled tests need to compare
    all permutations involving all IDE, AHCI and RAID modes
    with and without TLER (or similar) support in the HDDs attached.

    And, with or without Intel's ICHxR chipsets, there is also the option
    to create "software RAID" arrays with Windows XP
    e.g. starting with dynamic disks.

    Thus, IDE and AHCI can still be configured in such a software RAID.


    We have 2 x 6G WD HDDs configured as a software RAID 0 (for speed);
    each is 1TB for a total of 2TB; and, this RAID 0 array
    is far from being full :)

    Here's that 6G HDD:

    http://www.newegg.com/Product/Product.aspx?Item=N82E16822136533&Tpk=N82E16822136533


    So far, so good; and, it's pretty fast too!
    The following test was done with a 96MB file,
    to force the test to read from the 2 HDD caches only,
    in order to get a feel for the 6G difference (if any):





    MRFS
  10. GhislainG>>....RE drives wasn't as bad as using Caviar Black drives<<
    I pretty much agree. But I still can't conclude that WD RE will be trouble-free, at least not from reading the threads on the Intel Communities forum.

    GhislainG>>What will you do? Buy a RAID controller, use version 9.6 or no RAID at all?<<
    If I just need a home desktop with some basic and convenient RAID capabilities, Intel ICHxR/PCH may be powerful enough. But if I need a reliable RAID without possible mismatches waiting somewhere in the lifespan of the hard drive, I may wait for some answers and look at all the options before jumping to a conclusion. No RAID is not that bad, if some cluster setup works. Redundancy can be achieved in many ways.

    MRFS,
    A two-drive 6Gb/s 1TB RAID 0 setup is definitely a winner on cost/performance (C/P). RAID 0's average throughput is almost double that of a non-RAID drive. Based on your data, the native non-RAID reading should average around 115MB/s, which is right on par with Legit Reviews' test results for SATA II and SATA 6Gb/s.
    "Seagate XT 2TB SATA 6Gb/s Hard Drive Testing"
    http://www.legitreviews.com/article/1127/3/

    How stable is your RAID 0 setup? Are there I/O intensive applications running on it 24/7/365?
  11. Quote:
    But if I need a reliable RAID without possible mismatches waiting somewhere in the lifespan of the hard drive, I may wait for some answers and look at all the options before jumping to a conclusion.
    To start with, no RAID is 100% reliable. Unfortunately you never mentioned that it's for a critical application in your original post. You can mitigate the risks with RAID0+1, RAID6, RAID60 and a few other combinations with the use of at least one hot spare drive just in case one fails at the most inopportune time. That's how an enterprise SAN should be set up, but I don't necessarily do it for servers that run 24/7/365. You also need a dual processor server, dual NICs, dual PSU, dual UPS, etc. Achieving 100% uptime is expensive.
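
    As a rough illustration of what the common RAID levels buy you, the Python sketch below returns how many drives' worth of capacity remain usable and how many drive failures an array is guaranteed to survive. The function name `raid_summary` is made up for this example; RAID 10 stands in for the striped-mirror family, and RAID 60 and hot spares are left out to keep it short.

```python
from typing import Tuple


def raid_summary(level: str, drives: int) -> Tuple[int, int]:
    """Return (usable drives' worth of capacity, failures always survived)."""
    if level == "0":
        minimum, data, survives = 2, drives, 0
    elif level == "1":
        # All drives mirror the same data.
        minimum, data, survives = 2, 1, drives - 1
    elif level == "5":
        minimum, data, survives = 3, drives - 1, 1
    elif level == "6":
        minimum, data, survives = 4, drives - 2, 2
    elif level == "10":
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        # Worst case: both failures hit the same mirror pair, so only
        # one failure is guaranteed to be survivable.
        minimum, data, survives = 4, drives // 2, 1
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return data, survives


for level, count in [("0", 2), ("1", 2), ("5", 4), ("6", 6), ("10", 4)]:
    usable, survives = raid_summary(level, count)
    print(f"RAID {level} with {count} drives: {usable} usable, survives {survives} failure(s)")
```

    None of this replaces backups: the numbers describe drive failures only, not controller faults or accidental deletion.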
  12. I don't expect RAID to do full system redundancy. It will only do disk redundancy. It takes much more to accomplish full system redundancy just like you said. But, I still expect to have a "relatively reliable" RAID without losing an arm and a leg.

    If we all agree that this "error recovery gets out of sync" problem exists between ICHxR/PCH and TLER/ERC/CCTL drives, at least until Intel openly claims that ICHxR/PCH is categorically compatible with such drives, shouldn't this potential problem be noted by users and dealt with by vendors? For example, can we users ask Intel to release new firmware/software for ICHxR/PCH to address this issue? BTW, can AMD's desktop chipset coordinate with TLER/ERC/CCTL drives and handle error recovery well without dropping them accidentally?
  13. > BTW, can AMD's desktop chipset coordinate with TLER/ERC/CCTL drives and handle error recovery well without dropping them accidentally?

    Very good question: WHY DON'T YOU ASK AMD DIRECTLY?

    And, let us know what they say, please.

    Another reliable expert to ask this same question
    is Allyn Malventano at www.pcper.com .


    p.s. It occurs to me that a user option should be
    added to the SATA protocol, with a reasonable DEFAULT
    value, based on the best engineering expertise.
    This option should be accessible via Intel's
    RST and Matrix Storage Technology.


    MRFS
  14. Quote:
    For example, can we users ask Intel to release new firmware/software for ICHxR/PCH to address this issue?
    Based on the threads that you linked, version 9.6 seems to address the dropped drives issue. No additional issues have been posted by people who upgraded to that version. It also is interesting that several complaints are from people using Intel motherboards and/or hard disks without TLER.
    Quote:
    BTW, can AMD's desktop chipset coordinate with TLER/ERC/CCTL drives and handle error recovery well without dropping them accidentally?
    That's harder to determine because Intel have been supporting RAID since the ICH5R, so there is more info available about Intel than AMD. However, you can google RAIDXpert and you'll find that the issue also exists with AMD. If I were you I'd probably stick to the Intel ICH10R and use the latest Intel Rapid Storage drivers (version 9.6.1014) and RAID-enabled hard disks (WD TLER / Samsung CCTL / Seagate ERC). Or buy a controller like the 3ware 9650SE-4LPML and the BBU. It will cost less than $500 and provide good RAID5 performance.

    Edit: MRFS' suggestion to use software RAID shouldn't be ignored if you're leery of Intel's solution.
  15. p.s. Re: a "relatively reliable" RAID without losing an arm and a leg

    There are a lot of factors to consider, such as these observations
    which we offer, after several years of using RAID 0 primarily for speed:

    (1) 5-year warranties are superior to 3-year warranties,
    especially when retail cost per warranty year is considered;
    time passes swiftly when you're having fun, and 3 years
    can happen before you know it;

    (2) input power quality is crucial, which mandates a
    good UPS and PSU on every discrete system, with a
    feedback cable to initiate SHUTDOWN whenever
    the power grid fails;

    (3) active cooling on all HDDs is another necessity,
    ideally with a removable dust filter on all intake fans;

    (4) short-stroking partitions that host the most frequently
    used files increases performance and reduces wear
    on the armature assembly and bearing;

    (5) we also suspect, without conclusive proof, that
    regular disk checking does maintain the strength of
    raw magnetic recordings on the platter media;

    (6) installing HDDs with vibration-reducing mounts
    is another good idea, particularly when several HDDs
    share the same drive cage;

    (7) being kind to your HDD manufacturer is also a
    good policy, particularly if/when any given HDD fails:
    there is a measurable amount of "infant mortality"
    in this industry, so don't blow up if you experience
    your share of same!


    MRFS
  16. > MRFS' suggestion to use software RAID shouldn't be ignored if you're leery of Intel's solution.


    Software RAID doesn't come in all flavors, however,
    particularly with XP.

    I am told that server editions of Windows do support
    more flavors of software RAID.


    RTFM (Read The Fine Manual -- not always "F"ine however :)


    MRFS
  17. XP isn't an issue as people running mission critical applications use Windows Server 2003 or 2008 and dedicated servers with hardware RAID controllers, BBU, etc.

    Edit: I agree with the 7 points in your previous post. I have a home server that runs Windows Server 2003 and it's been running 24x7 for several years. It has a good PSU, a server motherboard and a Smart-UPS 1500 in case of power fluctuations or failures.
  18. > XP isn't an issue ...

    It certainly is for XP users who try to configure a RAID array
    using Intel's ICHxR chipset with HDDs that don't support
    TLER/ERC/CCTL.

    Some of those customers have complained bitterly
    e.g. with RMAs to Western Digital's support staff,
    that their RAID arrays are failing after lots of WRITEs
    and long before their factory warranties have expired.

    BUT, those customers failed to take note that
    their Caviar Black HDDs do NOT support TLER.

    And, even though I don't yet have any experience with Win7,
    my money is on a bet that similar things can be expected
    to happen with that OS too.

    So, part of the problem here is sheer customer ignorance.

    As I understand the crux of this problem, it results from
    an interaction among a HDD's firmware,
    the RAID controller at the other end of the data cable,
    and the logic of the device driver running that controller:

    if the HDD's firmware initiates an error recovery
    sequence, that sequence can take more time as
    the HDD has more data to check: if that sequence
    makes it "appear" to the controller that the HDD has died,
    the controller's internal logic very probably will "drop"
    that HDD from the RAID array.

    Thus, the problem can occur with any OS and with
    any HDDs that do not support one of these features:
    TLER/ERC/CCTL


    We've also observed something like the reverse of this
    situation: our Highpoint RocketRAID 2322 was dropping
    WD's RE HDDs only if we enabled periodic "polling"
    from that controller's User Interface. When polling
    was DISabled, our WD RE HDDs no longer dropped out.


    Yes, Windows Server 2003 or 2008 do support more
    software RAID options than does XP, but that is not
    the main point of this discussion.


    MRFS
  19. I simply meant that XP is not an issue for mission critical applications as it isn't the right platform. I agree that it's an issue for end users who select the wrong drives for RAID, but WD are partly at fault for not clearly warning users. On the other hand, IT people should be able to determine what's best suited for a given environment.
  20. Quote:
    if the HDD's firmware initiates an error recovery
    sequence, that sequence can take more time as
    the HDD has more data to check: if that sequence
    makes it "appear" to the controller that the HDD has died,
    the controller's internal logic very probably will "drop"
    that HDD from the RAID array.

    Thus, the problem can occur with any OS and with
    any HDDs that do not support one of these features:
    TLER/ERC/CCTL

    Even more so, I am afraid that some TLER/ERC/CCTL hard drives may still get dropped when they start talking different languages to ICHxR/PCH. I hope they all speak a common language in their basic dialog, even if not for some advanced error recovery features. That is the relative reliability and categorical compatibility I referred to.

    Quote:
    Based on the threads that you linked, version 9.6 seems to address the dropped drives issue. No additional issues have been posted by people who upgraded to that version.
    ...
    That's harder to determine because Intel have been supporting RAID since the ICH5R, so there is more info available about Intel than AMD. However, you can google RAIDXpert and you'll find that the issue also exists with AMD. If I were you I'd probably stick to the Intel ICH10R and use the latest Intel Rapid Storage drivers (version 9.6.1014) and RAID-enabled hard disks (WD TLER / Samsung CCTL / Seagate ERC).

    I haven't found cases on the Intel Communities forum where the relatively young IRST 9.6 has proved itself a cure for the random disk drop-out problem, confirmed by a meaningful number of users.
    Besides, Intel tech support confirmed that the ICHxR/PCH family can't speak TLER/ERC/CCTL with these drives, even though the two may be less likely to step on each other's toes.
    It seems I owe the AMD forum a visit on this matter, as MRFS recommended.
  21. Please let us know your findings about RAIDXpert, your conclusions and your final decision.
  22. Once I finish researching AMD's take and embark on my journey to the AMD forum, I will definitely leave a note here. Please update me with your new findings as well.
  23. > I am afraid that some TLER/ERC/CCTL hard drives may still get dropped when they start talking different languages to ICHxR/PCH


    I don't think it's a syntax issue:

    the firmware in HDDs is already coded
    to respond to "polling" requests issued
    by SATA controllers: as I understand it,
    this is a feature of the ATA command set.

    SATA is merely the Serial version of that ATA command set.

    It's when a HDD does NOT respond
    to such a polling request, that the controller
    then decides to "drop" it from a RAID array.

    The failure to respond, in this context,
    is due to the fact that the HDD's firmware
    is simply BUSY doing error checking,
    and it also does NOT permit "interruptions"
    at the moment polling requests are received.

    Put differently, it is not a "real-time" process,
    but one which queues polling requests
    until such time as the firmware is ready
    to handle such a request.

    The same thing can and does happen
    whenever the HDD's cache is full:
    it will send a command back to the
    controller to wait until that cache
    has been emptied enough for more
    controller output to be received
    by that cache.

    I'm sure many of you have already
    had the experience of trying to "kill"
    a running process that has gone rogue,
    but all attempts to "kill" it fail.

    This was much more common with
    older versions of Windows, like 98SE,
    which did not always respond if/when
    a User tried to kill a rogue process.

    I wouldn't bet on this, until we receive
    absolute confirmation from the HDD manufacturers:

    But, from observing WD's HDD behavior over many years,
    I would have to make an educated guess that
    WD's TLER capability is simply a feature of firmware
    in their RAID Edition ("RE") drives which only does
    error checking for a very limited amount of time
    -AND-
    then, at the instant when that amount of time has passed,
    it checks for any polling requests before going back
    to error checking ...

    ... something like that.
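
    The guess above can be put into a small Python sketch. It is a toy model, not WD's actual firmware logic: it assumes the commonly described behavior in which a time-limited drive stops retrying at its limit and reports the error, leaving the RAID layer to recover the data from redundancy. The retry timings and both function names are hypothetical.

```python
def recover_sector(retry_costs_s, time_limit_s=None):
    """Simulate one error recovery attempt inside the drive.

    retry_costs_s: seconds spent on each successive retry; the last
    retry is assumed to succeed.
    time_limit_s: recovery time limit in seconds, or None for no limit.
    Returns (outcome, seconds elapsed before the drive answers).
    """
    elapsed = 0.0
    for cost in retry_costs_s:
        if time_limit_s is not None and elapsed + cost > time_limit_s:
            # Time-limited drive: stop retrying and report the error
            # once the limit is reached.
            return "error reported", time_limit_s
        elapsed += cost
    return "recovered", elapsed


def controller_view(answer_time_s, controller_limit_s=10.0):
    """The controller only sees whether an answer arrived in time."""
    return "drive kept" if answer_time_s < controller_limit_s else "drive dropped"


retries = [2.0] * 8  # hypothetical: eight retries of 2 s each, 16 s in total
for label, limit in [("no limit", None), ("7 s limit", 7.0)]:
    outcome, answered_at = recover_sector(retries, limit)
    print(f"{label}: {outcome} after {answered_at} s, {controller_view(answered_at)}")
```

    In this model the drive without a limit does recover the sector, but only after the controller has already given up on it, while the time-limited drive answers with an error in time and stays in the array.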


    MRFS
  24. Quote:
    I don't think it's a syntax issue:

    the firmware in HDDs is already coded
    to respond to "polling" requests issued
    by SATA controllers: as I understand it,
    this is a feature of the ATA command set.

    SATA is merely the Serial version of that ATA command set.

    It's when a HDD does NOT respond
    to such a polling request, that the controller
    then decides to "drop" it from a RAID array.

    The failure to respond, in this context,
    is due to the fact that the HDD's firmware
    is simply BUSY doing error checking,
    and it also does NOT permit "interruptions"
    at the moment polling requests are received.


    Are you implying that, even though TLER, ERC, and CCTL each have their own error recovery procedure/protocol, they all adhere to the same basic set of instructions (i.e., the ATA command set) when communicating with Intel ICHxR/PCH, and that this basic command set already takes the basic error recovery process into account?

    As I understand it, there is no common baseline protocol/instruction for handling a RAID member's error recovery process, other than trying not to step on each other's toes too quickly, i.e., within 10 seconds. But will ICHxR/PCH wait forever as long as a TLER-enabled drive keeps telling the controller it is busy? A basic command set should take this into account to ensure relative reliability.
  25. FYI: found something relevant to AMD's SB850 chipset at Samsung's website:

    http://www.samsung.com/global/business/hdd/faqView.do?b2b_bbs_msg_id=308


    QI 36.

    HDD(F3+F3EG) F/W update for SB850 chipset

    A 36.


    This patch code is released in order to solve the compatibility problem between SB850 chipset and F3 + F3EG model.

    This patch program works only to
    F3 (HD323HJ / HD502HJ / HD503HI / HD103SJ / HD105SI)
    F3EG (HD153WI / HD203WI)

    [F/W Patch Procedure]

    1. Please unzip attached file to Bootable media (USB,CD, Floppy...etc).

    2. Please connect Target HDD to Primary / Mast.

    3. Booting the system to DOS mode by bootable media.(which with this F/W flash program).

    4. Please run " Patch ".

    5. When FW Flash successfully message display , please Power Off system, to make F/W flash complete.


    Note:
    Please do not warm boot (Ctrl+Alt+Delete, or push Reset button .. etc) your PC during the F/W patch. Warm boot can make F/W Patch fail and make unexpected problem on HDD.

    [end quote]


    MRFS
  26. MRFS,

    Thank you for the AMD info; it will take time to digest. Before I set sail for AMD, do you think we had better consult the reliable expert you mentioned, Allyn Malventano at www.pcper.com, for a third opinion? Do you have a direct contact channel to him? I hope we don't judge Intel ICHxR/PCH's RAID reliability too quickly because of missing details or links. More experiences and reviews will definitely help.
  27. > Do you have a direct contact channel to him?

    Yes: Allyn's email address is a link at any of the excellent reviews
    he has authored at www.pcper.com e.g.:

    http://www.pcper.com/article.php?aid=669


    MRFS
  28. I will seek his advice.