During this period I did some heavy performance testing with software RAID, including many tests with the stripesize. Generally, you want the stripesize to be as high as possible, to allow one I/O request to be handled by one drive, and one drive only!
Often the operating system reads chunks of 64KiB - 128KiB. With a stripe misalignment, even a 128KiB stripesize would cause some I/O to be spread over two disks, so two disks have to work on one I/O request.
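To make the alignment point concrete, here is a minimal sketch in Python (not modeled on any real RAID driver; the function name disks_touched and its parameters are made up for illustration) that computes which RAID0 member disks a single request touches:

```python
KiB = 1024

def disks_touched(offset, length, stripesize, ndisks):
    """Return the set of RAID0 disk indices hit by a request
    covering bytes [offset, offset + length)."""
    first_stripe = offset // stripesize
    last_stripe = (offset + length - 1) // stripesize
    return {stripe % ndisks for stripe in range(first_stripe, last_stripe + 1)}

# Aligned 128KiB read with a 128KiB stripesize: exactly one disk.
print(disks_touched(offset=0, length=128 * KiB, stripesize=128 * KiB, ndisks=4))
# The same read misaligned by 4KiB crosses a stripe boundary,
# so two disks have to cooperate on one request.
print(disks_touched(offset=4 * KiB, length=128 * KiB, stripesize=128 * KiB, ndisks=4))
```

The first call prints {0}, the second {0, 1}: the misaligned read occupies two disks that could otherwise have served two independent requests.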
It's much faster if you can make sure that in all or most cases only ONE disk is involved in a single I/O request, so the other disks can work on other I/O requests at the same time, provided there are enough queued I/Os. Theoretically, this is why RAID0 should scale 100% in terms of random IOps. But the scaling is not 100%, because:
1) The I/O is not perfectly evenly divided between all RAID disk members; some disks are used more often than others. This issue is not present with RAID1, since all disk members hold exactly the same information, so the RAID engine has the freedom to use any of the mirrored disks, unlike RAID0 where a piece of data is stored on one disk only and the RAID engine has no freedom.
2) Because of stripe misalignment or a too-low stripesize, one I/O request will involve multiple disks, causing the parallel processing potential to sink to the bottom of the ocean (see the sketch after this list).
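Here is the sketch referred to in point 2: a rough back-of-the-envelope calculation (pure Python, no benchmark; the 64KiB request size comes from the text above, the 4KiB alignment granularity is an assumption) of how often a random request crosses a stripe boundary at various stripesizes:

```python
KiB = 1024
reqsize = 64 * KiB   # typical OS request size mentioned above
align = 4 * KiB      # assumed offset alignment (e.g. filesystem block size)

for stripesize in (16 * KiB, 32 * KiB, 64 * KiB, 128 * KiB, 1024 * KiB):
    positions = stripesize // align                        # possible offsets within a stripe
    fitting = max(0, (stripesize - reqsize) // align + 1)  # offsets where the request fits in one stripe
    split = 1 - fitting / positions
    print(f"stripesize {stripesize // KiB:>5} KiB: ~{split:.0%} of requests touch more than one disk")
```

With a 64KiB stripesize almost every 64KiB request hits two disks; at 128KiB it is roughly half, and at 1MiB only a few percent, which is exactly why larger stripesizes preserve the parallelism.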
There is no real reason to set the stripesize lower than 128KiB; you could even go higher, to 1 or 4 MiB. The exception is bad RAID drivers that 'optimize' for sequential transfers in a very dirty way by ALWAYS reading the full stripe block, even if only a fraction of it was requested. This is very bad for random IOps performance.
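As a quick illustration of that caveat (simple arithmetic, not a measurement of any particular driver), here is what such a read-the-whole-stripe driver does to a small random read:

```python
KiB, MiB = 1024, 1024 * 1024
request = 4 * KiB  # small random read

for stripesize in (64 * KiB, 128 * KiB, 1 * MiB, 4 * MiB):
    amplification = stripesize // request  # driver reads the whole stripe block
    print(f"stripesize {stripesize // KiB:>5} KiB -> reads {amplification:>5}x the requested data")
```

With a well-behaved driver a large stripesize costs nothing for small reads; with this kind of 'optimization' a 4 MiB stripesize turns every 4KiB read into a 4 MiB read, which is why such drivers wreck random IOps.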
Here's a graph of the stripesize plotted against random I/O performance: