Kind of a hypothetical request. We already run a production server, but I'm just wondering how you guys would have done the same:
Can anyone suggest a good complete disk system for running a server that needs to access 1000s of small files? I don't care about transferring 5GB DVDs - we're talking mostly about finding 5-50KB emails scattered around a disk (yes, we use Maildir).
Can you suggest a disk? I expect we want one with very very fast seek times with max bandwidth being secondary.
We'd like a redundant system so a RAID 5 (6?) setup is a must - a good card for this?
Lastly - what would you suggest for a filesystem? XFS is what we're using right now. ReiserFS was considered too.
How many users? If it's 1-50, SATA will do you fine. If we are talking 100s, then SAS is the way to go.
SAS Seagate drives are tough to beat in the SCSI/SAS world.
At least 4 disks (6 is optimal) if you are going to do RAID 5. This should overcome any write slowdowns inherent in RAID 5, not that you need write speeds on an email server. What sort of capacity do you need?
The controller makes all the difference on high-end RAID solutions; this is the one area where saving money is not an option.
We're aiming for say 10,000 users
Average 50MB usage each (total 500GB)
Peak usage of 2GB mailboxes with 20,000 emails in one Maildir.
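As a rough sanity check on those numbers, here is a minimal Python sketch of the capacity arithmetic. The decimal units (1 GB = 1000 MB), the 300 GB drive size, and the function names are assumptions for illustration, not figures from this thread. The only RAID 5 property it relies on is that one disk's worth of space goes to parity.

```python
import math

def total_mail_gb(users, avg_mb_per_user):
    """Total mailbox storage in GB, using decimal units (1 GB = 1000 MB)."""
    return users * avg_mb_per_user / 1000

def raid5_usable_gb(disks, disk_gb):
    """RAID 5 spends one disk's worth of space on parity: usable = (n - 1) disks."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb

def raid5_disks_needed(required_gb, disk_gb):
    """Smallest RAID 5 disk count whose usable space covers required_gb."""
    return max(3, math.ceil(required_gb / disk_gb) + 1)

needed = total_mail_gb(10_000, 50)       # 10,000 users at 50 MB each
print(needed)                            # 500.0
print(raid5_usable_gb(6, 300))           # 1500 (hypothetical 300 GB drives)
print(raid5_disks_needed(needed, 300))   # 3
```

This only covers raw capacity. It says nothing about whether that many spindles can keep up with the seek load, which is the real question for Maildir.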
I take it then that SAS is halfway between SATA and SCSI?
However, at the moment we have around 3000 active users on SATA disks and a 3ware 9550sx RAID 5. It copes very well, so you can push SATA further than you think. I actually think we will be able to at least double our user load on this setup. Not sure about 10,000, but it might still work. We'll add another disk in due course, which should help performance further.
I know of installations that average 25,000 IMAP users per server, so I know this is possible. Just wondering what kind of difference in disk setup one needs to consider for Maildir, which is all small files.
1. 10K or 15K SAS drives will give you the seek times and IOPS that will help immensely.
2. Get a RAID card with as large a cache as you can get. Every read served from the cache is one the drives never have to seek for, so a 25% cache hit rate means roughly 25% fewer seeks. Obviously, with the large cache, get the battery backup if you can so that you can use write-back cache and maintain data integrity. Better yet, a UPS for the whole server.
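To put rough numbers on the cache point, here is a minimal Python sketch. It assumes the simplest possible model: every read is either a controller-cache hit (no disk access at all) or a miss that costs one full random disk access. The 8 ms miss cost and the function names are made up for illustration, not measured from any real drive or card.

```python
def disk_accesses(requests, hit_ratio):
    """Number of requests that still reach the disks after cache hits."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return requests * (1.0 - hit_ratio)

def avg_access_ms(hit_ratio, miss_ms, hit_ms=0.0):
    """Average time per request, weighting hit and miss costs by the hit ratio."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms

print(disk_accesses(1000, 0.25))   # 750.0 -> 25% fewer trips to the disks
print(avg_access_ms(0.25, 8.0))    # 6.0 ms average, assuming an 8 ms miss
```

Real hit rates depend heavily on how much of the working set (Maildir directory entries and recently touched messages) fits in the cache, so treat the 25% as an example figure only.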
SAS is Serial Attached SCSI, which is SCSI's version of SATA. It's the full SCSI protocol over SAS cables, which gives you the cable management, noise immunity, and simpler configuration, as well as full 300MB/sec bandwidth to each drive instead of a shared bus.
For a SAS RAID card with a large cache, look at LSI's cards.
I was being a bit conservative on the SATA estimates. You have to be careful when using SATA in a production environment. There are only a few SATA drives that are Enterprise rated. Also, SATA does better with larger files than it does with small emails. This is primarily due to the way the cache is set up on each drive, along with the lower spindle speeds, which are typically 7200rpm.
SAS will outperform both SATA and SCSI. I wouldn't bother with the 10K; go straight for the 15K with all those small files. It will be more than worth it when you have to go to the disk due to a cache miss.
As SomeJoe says, get as much cache on the controller as possible. LSI controllers are the brand of choice for large production environments.
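For anyone weighing 7200rpm against 10K and 15K, here is a minimal Python sketch of why spindle speed matters for small random reads. The rotational latency part is plain arithmetic (on average you wait half a revolution). The IOPS estimate is a crude rule of thumb that ignores transfer time, queuing, and caching, and the average seek time is something you would have to take from the drive's datasheet. The function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the target sector to come around: half a revolution."""
    return 0.5 * 60_000.0 / rpm

def rough_random_iops(rpm, avg_seek_ms):
    """Rough random IOPS: one I/O per (average seek + average rotational latency)."""
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))

for rpm in (7200, 10_000, 15_000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 7200 4.17
# 10000 3.0
# 15000 2.0
```

To compare actual drives, call `rough_random_iops(rpm, avg_seek_ms)` with the average seek figure from each datasheet.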