I am building a new PC, and would like your opinion on which would be the faster option. Whatever disk layout I use, I will be using an HP Smart Array P600 controller http://h18004.www1.hp.com/products/quickspecs/12247_div... in this PC, connected to an available PCI Express x4 slot. The options I have are:
20 x 72GB disks (10k RPM) in RAID 0, in an external enclosure. Communication to the disks is multiplexed down to a quad-SATA (4 x 300MB/s) external connector; with this option I fear that running 20 disks through only 4 SATA channels may be a bottleneck.
8 x 150GB disks (10k RPM) in RAID 0, internal to the PC. Communication to each disk is via a dedicated 300MB/s SATA connector, so there is no chance of a bottleneck. The raw SATA I/O available in the internal option is 8 x 300MB/s, twice that of the external setup.
Which option do you think will give me better performance, or is there not much in it?
I'd say the external setup would give slightly better performance: 1200MiB/s is its maximum throughput, which may hold back the 20 drives slightly, but it's more than the 8 drives could reach anyway. I assume that, as these are 10k RPM SATA drives, they are Raptors, which only have a 150MB/s signalling rate anyway, meaning the maximum 'theoretical' throughput is the same for the 8 drives as for the 20 (8 x 150MB/s = 1200MB/s). The 20 will of course have more actual throughput.
But with 20 drives, RAID 0 is not the best idea; RAID 5 or RAID 6 would be much better.
EDIT: PCIe x4 will limit the bandwidth to about 1000MiB/s in either direction anyway, so you won't hit the SATA limit no matter how many drives you run.
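For anyone wondering where the rough 1000MiB/s figure for an x4 slot comes from, here's the arithmetic as a quick Python sketch. First-generation PCIe runs at 2.5GT/s per lane with 8b/10b encoding, so only 8 of every 10 bits on the wire are payload:

```python
# Back-of-envelope PCIe 1.x bandwidth, per direction, before protocol overhead.
GT_PER_LANE = 2.5e9                      # 2.5 gigatransfers/s per lane
lane_bytes = GT_PER_LANE * 8 / 10 / 8    # 8b/10b encoding, then bits -> bytes
x4_bytes = 4 * lane_bytes                # four lanes in an x4 link

print(x4_bytes / 1e6)  # 1000.0 MB/s per direction
```

Real-world throughput is a bit lower still once packet headers and flow control are accounted for, which is why ~1000MiB/s is a ceiling, not a promise.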
Depending on how it is implemented on the controller, a 20-disk RAID 1+0 array can have the same READ throughput as a 20-disk RAID 0 array, although half the capacity (as it can read different data from all 20 disks).
However, write performance is the same as a 10-disk RAID 0 array, as it has to write every byte twice.
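The read/write asymmetry above can be sketched in a few lines. The 50MiB/s per-disk figure here is an assumed placeholder for illustration, not a benchmark:

```python
# RAID 1+0 sketch: reads can be spread over every member disk,
# but each write must go to both halves of a mirror pair.
def raid10_throughput(disks, per_disk_mib):
    reads = disks * per_disk_mib           # all spindles can serve reads
    writes = (disks // 2) * per_disk_mib   # every byte written twice
    return reads, writes

reads, writes = raid10_throughput(20, 50)
print(reads, writes)  # 1000 500 (MiB/s, before any bus limits)
```

So on paper the 20-disk mirror set reads like a 20-disk stripe but writes like a 10-disk one.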
They both have the same SATA bandwidth, as the drives in the 8-disk array only support a 1.5Gb/s signalling rate, and PCIe bandwidth will be the limiting factor on reads anyway. As such I'd say the external setup would be faster; however, it will also have more drive failures needing replacement over its lifetime.
IMHO (not based on actual data, as I have no 20-disk RAID 0 array to test with), the 20-disk array is going to be operating at the limit of either the controller's processing power or the PCIe bandwidth during reads, whichever it hits first. This is based on the fact that, even assuming a sub-optimal 50MiB/s read speed per drive, you can hit 1000MiB/s with 20 drives.
Assuming the limit is the PCIe bandwidth, which I find likely as that's a high-end controller, sustained transfer with 20 drives should be close to 1000MiB/s.
The 8 drives, on the other hand, assuming a slightly optimistic 75MiB/s sustained transfer per drive, are going to hit about 600MiB/s in total.
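The two estimates above, as plain arithmetic. The per-drive figures (50MiB/s and 75MiB/s) and the ~1000MiB/s PCIe ceiling are the assumptions stated in this thread, not measurements:

```python
# Sustained-read estimates for the two layouts, capped by the x4 slot.
PCIE_X4_MIB = 1000  # rough usable bandwidth of a PCIe 1.x x4 link

est_20 = min(20 * 50, PCIE_X4_MIB)  # 20 drives at an assumed 50 MiB/s each
est_8 = min(8 * 75, PCIE_X4_MIB)    # 8 drives at an assumed 75 MiB/s each

print(est_20, est_8)  # 1000 600
```

Under these assumptions the 20-disk array is bus-limited while the 8-disk array is drive-limited, which is the whole argument in a nutshell.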
Do bear in mind, however, that even with a 32KiB stripe size, the total stripe across all drives is going to be 640KiB in the 20-disk option. As such, I wouldn't expect stellar performance on smaller files; but I assume that if you are setting up an array like this it is for video editing or similar, and likely to use single massive files, in which case your performance should be amazing.
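To make the stripe-width point concrete: with a 32KiB per-disk stripe, a file only spans as many disks as it has 32KiB chunks, so small files never touch all the spindles. A quick sketch (stripe size and disk count as assumed above):

```python
# How many member disks a file of a given size can span, at most.
def drives_touched(file_kib, stripe_kib=32, disks=20):
    return min(disks, -(-file_kib // stripe_kib))  # ceiling division

print(20 * 32)               # 640: full-stripe width in KiB on 20 disks
print(drives_touched(128))   # 4: a 128KiB file spans only 4 of 20 disks
print(drives_touched(4096))  # 20: a large file finally spans all spindles
```

So reads smaller than the full 640KiB stripe get the bandwidth of only a few drives, which is why large sequential files are the sweet spot for this layout.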
Since nobody has asked some key questions, here goes:
1) What do you mean by "faster"? Shorter access time? Higher throughput? What is the size range and distribution of the files you want to access? Is read or write speed the more important one?
2) What is the total size of storage your data requires?
3) Do you have any cost constraints?
4) Do you have any equipment constraints (e.g. specific controller card as you mentioned)?
5) Do you have any maintenance constraints? ("No" means you have dedicated IT staff to monitor/service the storage system.)
6) Is fault-tolerance important?
7) What about backup?
A typical hard drive has an MTBF (mean time between failures) of roughly a few years (let's say 3 years). That means that, on average, this kind of drive is likely to experience some sort of problem/failure within roughly 3 years. You might go 12 years and never see any issues, or you might run it for 5 months and have it die on you in dramatic and violent fashion.
A RAID 0 array only functions if all the disks in the array are operational. If one disk fails, you lose ALL your data. So having 8 disks means that you are (without going into any complicated probability calculations) compounding the chance of losing your data at any particular moment, compared to a single-drive setup. This might not be noticeable with 2 or 3 drives (typical for RAID 0), as people will likely upgrade their system before their drives get too old.
But with 20 drives, you're begging for trouble. In fact, you can probably get sponsored by whatever company you decide to go with (your HD brand), since it's basically a reliability study you're running there.
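The reliability argument above is easy to put numbers on: if each drive independently survives a year with probability p, the whole stripe survives with probability p to the power of the drive count. The 0.97 per-drive figure here is an assumption for illustration, not vendor data:

```python
# RAID 0 survival: every member disk must survive for the data to survive.
def stripe_survival(p_disk, n_disks):
    return p_disk ** n_disks

for n in (1, 2, 8, 20):
    print(n, round(stripe_survival(0.97, n), 3))
```

Under these assumptions the 20-disk stripe has roughly coin-flip odds of getting through a year without data loss, while a single drive sits at 97%.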
Also, remember that your throughput is limited not only by hard drive performance, but also by the performance of your controller. The controllers built into most south bridge chips (the cheap kind), albeit adequate for most users, might bottleneck when you have 20 drives in "parallel". I don't even think they support hardware RAID 0 across more than 4 drives, although I could easily be wrong on that last point.
And lastly, the controller you mention is designed for SCSI drives. Why are we talking about SATA hard drives?
P.S. If you tell me that you want a 20-drive RAID 0 SCSI setup to have faster load times, I'm going to have to recommend you see a psychiatrist.
1) The performance I care about is throughput of non-sequential reads of a single 500GB file
2) See 1
3) Yes - and no... The internal option costs less, which is good, but I have the budget to get the job done
4) Yes - I have the controller already
5) I can swap out the disks when required, either external or internal they will be in hot swap cages, and I tend to keep spares.
7) I have an HP Ultrium 800GB tape drive
No offence, but did you two read the previous posts?
He is talking about 10k RPM SATA drives, which can only be Raptors. These have a 5-year warranty, and as such are expected to last 5 years.
He is talking about RAID 1+0, and as such has fault tolerance accounted for. He can lose up to 10 of his 20 drives and keep the array running, assuming those 10 are all in different RAID 1 pairs. If he is building a system like this, I'm sure he will be able to afford a number of drives as spares.
He only cares about Read speed, not write.
Yes, he has a SAS (serial attached SCSI) controller. These support SATA drives natively.
With a single large file, he is IMHO looking at the ideal storage solution, assuming it is a file to be accessed sequentially and not one that needs random access, like an SQL database.
If it is a file that needs random access, 15k RPM SAS drives would be a better bet.
EDIT: Just read that you want non-sequential access speed. I'd say 15k SAS drives are a better bet, as it's seek times you need to worry about more than sequential read speed. Unfortunately, something along the lines of a Fujitsu MAX 73GB 15k RPM drive is going to cost almost double the price of a 74GB Raptor; however, the average latency is 2ms, down from 2.99ms on the Raptor, average seek is 3.3ms, down from 4.6ms on the Raptor, and max seek is 8ms, down from 10.2ms on the Raptor.
Raptors are, however, enterprise-class drives, and so the MTBF is the same (1,200,000 hours).
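To see what those latency figures mean for random reads: a rough per-drive IOPS estimate is one over (average seek + average rotational latency). This is a simplification that ignores transfer time and queueing, using only the figures quoted above:

```python
# Back-of-envelope random-read rate per drive from seek + rotational latency.
def est_iops(avg_seek_ms, avg_latency_ms):
    return 1000.0 / (avg_seek_ms + avg_latency_ms)

raptor_10k = est_iops(4.6, 2.99)  # 74GB Raptor figures from the post
sas_15k = est_iops(3.3, 2.0)      # 15k SAS figures from the post

print(round(raptor_10k), round(sas_15k))  # 132 189
```

Roughly a 40% improvement in random reads per spindle, which is why the 15k drives win for non-sequential access despite the price.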