Number of disks in a software RAID-5 array

Archived from groups: comp.sys.ibm.pc.hardware.storage

What are the trade-offs regarding the number of disks in a software
RAID-5 array ? My understanding is that, the more disks there are,

1. the more storage for the euro,

2. the worse the performance (assuming the bus is the
bottleneck, which is not unlikely in the case of software
RAID),

Does the reliability of the array increase with the number of
disks ? I'm aware that more disks means failures occur more
often but is it not offset by the fact that each disk contains a
smaller portion of the data ? I'm not sure about that because it
seems to contradict #1.

--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
Todos, todos me miran mal
Salvo los ciegos, es natural
  1.

    Previously Andre Majorel <amajorel@teezer.fr> wrote:
    > What are the trade-offs regarding the number of disks in a software
    > RAID-5 array ? My understanding is that, the more disks there are,

    > 1. the more storage for the euro,

    Loss: 1/n with n partitions or drives in the RAID-5.
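    That 1/n parity overhead is easy to work out; a minimal sketch (disk sizes and counts are illustrative):

```python
# Usable capacity of a RAID-5 array: (n - 1) disks' worth of space,
# i.e. a parity overhead of 1/n. All disks are assumed equal-sized.
def raid5_usable(n_disks, disk_gb):
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (n_disks - 1) * disk_gb

for n in (3, 4, 8):
    print(n, "disks of 200 GB ->", raid5_usable(n, 200), "GB usable")
```

    So going from 3 to 8 disks drops the overhead from 1/3 to 1/8 of the raw capacity.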

    > 2. the worse the performance (assuming the bus is the
    > bottleneck, which is not unlikely in the case of software
    > RAID),

    For reads, only if you have some magic other method to circumvent
    that bottleneck. Writes get slower, since they also involve reads
    on RAID-5.

    Personal experience with Linux 2.6.x: reads get faster up to the
    hardware limit, much like an (n-1)-disk RAID-0 set, so no
    performance loss here. Writes are about the same speed on a
    3-disk RAID-5 as on an 8-disk RAID-5. Since my application is
    dominated by reads, I never tried much tuning. However, I have
    noticed that linear writes can get faster with larger block
    sizes, e.g. 32k or 128k, depending on the hardware.

    One thing that kills both read and write performance is putting
    two disks on one IDE channel of a Promise Ultra133 TX2 controller.
    The effect also seems to be present with HighPoint HPT374-based
    controllers.

    > Does the reliability of the array increase with the number of
    > disks ?

    The overall "loss-risk vs. time" of course increases, since the
    more disks there are, the higher the risk of a double loss.
    However, normally you have some replacement procedure in place
    that keeps the risk relatively low; e.g. if you replace a failed
    disk within 24 hours, it is just the risk of losing 2 disks
    within 24 hours. For this reason it is advisable to have a cold
    spare handy, or maybe even a hot one. In practice people do not
    put more than 8 disks or so into one RAID-5 array. Determining
    when such an array is no longer more reliable than a single disk
    is difficult, since it depends e.g. on the speed of replacement
    and other concrete operational factors. For large numbers of
    disks, the reliability of a RAID-5 array will be significantly
    lower than that of an individual disk.
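    That "second failure within the replacement window" risk can be sketched with a toy model, assuming independent failures and a made-up per-disk failure probability for the window (real disks often fail in correlated batches, so treat this as a lower bound):

```python
# Probability that at least one of the n-1 surviving disks fails
# during the replacement window after the first failure.
# p_window is a hypothetical per-disk failure probability for that
# window (e.g. derived from an assumed MTBF); independence is assumed.
def second_failure_risk(n_disks, p_window):
    survivors = n_disks - 1
    return 1 - (1 - p_window) ** survivors

# Illustrative: 0.01% chance per disk of failing in a 24 h window.
for n in (3, 4, 8):
    print(n, "disks:", f"{second_failure_risk(n, 1e-4):.6f}")
```

    The risk grows roughly linearly with the disk count for small p, which is why prompt replacement (or a hot spare) matters more as the array grows.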

    Reliability per byte stored also decreases with the number of
    disks, but it will never get worse than for one individual disk.
    Here you can think of the parity info being used for more and
    more data, and so having less and less benefit.

    > I'm aware that more disks means failures occur more
    > often but is it not offset by the fact that each disk contains a
    > smaller portion of the data ? I'm not sure about that because it
    > seems to contradict #1.

    Your reasoning is flawed: a one-disk loss means no data loss.
    A two-disk loss means a catastrophic loss of _all_ data, no
    matter how large the individual pieces were.

    For more redundancy, you can use RAID-6, which can tolerate the
    loss of up to two disks/partitions. However, it gets slow when
    two disks/partitions are missing, and in Linux 2.6.x it is still
    experimental.

    Arno
    --
    For email address: lastname AT tik DOT ee DOT ethz DOT ch
    GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
    "The more corrupt the state, the more numerous the laws" - Tacitus
  2.

    On 2004-12-31, Arno Wagner <me@privacy.net> wrote:
    > Previously Andre Majorel <amajorel@teezer.fr> wrote:
    >> What are the trade-offs regarding the number of disks in a software
    >> RAID-5 array ? My understanding is that, the more disks there are,
    >
    >> 1. the more storage for the euro,
    >
    > Loss: 1/n with n partitions or drives in the RAID-5.
    >
    >> 2. the worse the performance (assuming the bus is the
    >> bottleneck, which is not unlikely in the case of software
    >> RAID),
    >
    > For reads, only if you have some magic other method to circumvent
    > that bottleneck. Writes get slower, since they also involve reads
    > on RAID-5.
    >
    > Personal experience with Linux 2.6.x: reads get faster up to the
    > hardware limit, much like an (n-1)-disk RAID-0 set, so no
    > performance loss here.
    >
    > Writes are about the same speed on a 3-disk RAID-5 as on an
    > 8-disk RAID-5.

    OK.

    > Since my application is dominated by reads, I never
    > tried much tuning. However I have noticed that linear writes can
    > get faster with larger block sizes, e.g. 32k or 128k, depending on
    > the hardware.

    By block size, do you mean fs-level block (mke2fs -b), or
    fs-level stride (mke2fs -R stride=), or stripe size, or
    something else ?

    > One thing that kills both read and write performance is putting
    > two disks on one IDE channel of a Promise Ultra133 TX2 controller.
    > The effect also seems to be present with HighPoint HPT374-based
    > controllers.

    Yes, I've been warned about that. I understand it's true
    regardless of the controller (i.e. it's a fundamental
    shortcoming of IDE).

    >> I'm aware that more disks means failures occur more
    >> often but is it not offset by the fact that each disk contains a
    >> smaller portion of the data ? I'm not sure about that because it
    >> seems to contradict #1.
    >
    > Your reasoning is flawed: a one-disk loss means no data loss.
    > A two-disk loss means a catastrophic loss of _all_ data, no
    > matter how large the individual pieces were.

    OK. As I don't understand how RAID-5 distributes the data across
    the disks, I'm in the dark.

    Thank you for the explanations. There doesn't seem to be any
    technical reason for increasing the disk count. 4 disks seems
    like a good compromise, yes ?

    --
    André Majorel <URL:http://www.teaser.fr/~amajorel/>
    Todos, todos me miran mal
    Salvo los ciegos, es natural
  3.

    Previously Andre Majorel <amajorel@teezer.fr> wrote:
    > On 2004-12-31, Arno Wagner <me@privacy.net> wrote:
    >> Previously Andre Majorel <amajorel@teezer.fr> wrote:
    >>> What are the trade-offs regarding the number of disks in a software
    >>> RAID-5 array ? My understanding is that, the more disks there are,
    >>
    >>> 1. the more storage for the euro,
    >>
    >> Loss: 1/n with n partitions or drives in the RAID-5.
    >>
    >>> 2. the worse the performance (assuming the bus is the
    >>> bottleneck, which is not unlikely in the case of software
    >>> RAID),
    >>
    >> For reads, only if you have some magic other method to circumvent
    >> that bottleneck. Writes get slower, since they also involve reads
    >> on RAID-5.
    >>
    >> Personal experience with Linux 2.6.x: reads get faster up to the
    >> hardware limit, much like an (n-1)-disk RAID-0 set, so no
    >> performance loss here.
    >>
    >> Writes are about the same speed on a 3-disk RAID-5 as on an
    >> 8-disk RAID-5.

    > OK.

    >> Since my application is dominated by reads, I never
    >> tried much tuning. However I have noticed that linear writes can
    >> get faster with larger block sizes, e.g. 32k or 128k, depending on
    >> the hardware.

    > By block size, do you mean fs-level block (mke2fs -b), or
    > fs-level stride (mke2fs -R stride=), or stripe size, or
    > something else ?

    Sorry, that would be the "chunk size", i.e. the stripe size. There
    might be additional gains from matching the fs-level stride to it,
    but I have not tried that so far.
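    Matching the ext2 stride to the chunk size is just the chunk size divided by the filesystem block size; a minimal sketch (all values are illustrative, not recommendations):

```python
# mke2fs's stride option should be the RAID chunk size expressed in
# filesystem blocks: stride = chunk_size / fs_block_size.
def stride(chunk_kib, fs_block_kib=4):
    if chunk_kib % fs_block_kib:
        raise ValueError("chunk size is not a multiple of the block size")
    return chunk_kib // fs_block_kib

print(stride(64))   # 64 KiB chunks, 4 KiB fs blocks -> stride of 16
print(stride(128))  # 128 KiB chunks -> stride of 32
```

    The stride would then be passed to mke2fs via its -R/-E extended options.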

    >> One thing that kills both read and write performance is putting
    >> two disks on one IDE channel of a Promise Ultra133 TX2 controller.
    >> The effect also seems to be present with HighPoint HPT374-based
    >> controllers.

    > Yes, I've been warned about that. I understand it's true
    > regardless of the controller (i.e. it's a fundamental
    > shortcoming of IDE).

    From my experience the effect is small with the VIA onboard
    ATA controllers on my mainboard. It is massive with Promise
    PCI controllers and strong with the HPT on-board controllers
    I have.

    >>> I'm aware that more disks means failures occur more
    >>> often but is it not offset by the fact that each disk contains a
    >>> smaller portion of the data ? I'm not sure about that because it
    >>> seems to contradict #1.
    >>
    >> Your reasoning is flawed: a one-disk loss means no data loss.
    >> A two-disk loss means a catastrophic loss of _all_ data, no
    >> matter how large the individual pieces were.

    > OK. As I don't understand how RAID-5 distributes the data across
    > the disks, I'm in the dark.

    Simple: split the data into n-1 sets of pieces, as if it were a
    RAID-0. Then one set of pieces (each the size of a chunk) goes
    onto each disk in turn. In place of the missing piece, store a
    bitwise XOR of all the other pieces. That way, one missing piece
    can be reconstructed from the others plus the XOR piece. The XOR
    piece is rotated around the disks, so the loss of any one disk
    has the same performance impact.

    If you lose 2 pieces, regardless of which ones, you miss one
    chunk-sized piece in every n-1 chunks, e.g. 4 kB out of each
    28 kB stripe in an 8-disk array.
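    The XOR reconstruction described above can be sketched in a few lines of Python (chunk size and contents are made up for illustration):

```python
# RAID-5 style parity: the parity chunk is the bitwise XOR of the
# data chunks, so any single missing chunk can be rebuilt by XORing
# the surviving chunks with the parity chunk.
def xor_chunks(chunks):
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # 3 data chunks (a 4-disk array)
parity = xor_chunks(data)

# Simulate losing chunk 1 and rebuilding it from the rest + parity.
rebuilt = xor_chunks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt:", rebuilt)
```

    Losing a second chunk from the same stripe leaves the XOR with two unknowns, which is why a double disk failure is unrecoverable.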

    > Thank you for the explanations. There doesn't seem to be any
    > technical reason for increasing the disk count. 4 disks seems
    > like a good compromise, yes ?

    The one reason to use more disks is that you get a larger
    array. My experience is that 4-8 disks make sense: 3 is
    wasteful, and more than 8 gets difficult to manage.

    Arno
    --
    For email address: lastname AT tik DOT ee DOT ethz DOT ch
    GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
    "The more corrupt the state, the more numerous the laws" - Tacitus