RAID0 Help / Question
I have an Intel Desktop Board D5400XS and have been using the onboard RAID with 4 VelociRaptors. I use VMware, and with many VMs open my computer locks up after heavy HD activity, so I want to get a RAID controller. The HDDs are 3.0Gbps, but I noticed all the PCI-E RAID cards are 1.5Gbps. I went to Fry's and they only had a PCI-X RAID card that runs at full speed. Not knowing any better, I bought it and then noticed my board doesn't have a PCI-X slot, lol. Can someone please help me by pointing me in the right direction to get a RAID card for this board?
I do some crazy stuff with VMware myself, and I don't think the RAID controller is necessarily your problem. Of course, a true hardware RAID controller always beats on-board, so it's not a bad upgrade anyhow. How many VMs do you have open at once, and more importantly, what are their typical uses? The rest of your system specs would be helpful too. The first thing that jumps out at me is the 4x RAID 0 set. I tried this when I first started playing around with VMs, with similar results. The big problem is that you are set up for throughput, not I/O. Splitting up the giant RAID 0 set typically yields far better results. Personally, I dedicate 1 hard drive to each VM, with 2 VMs per HDD being my limit, and only as long as one of those 2 VMs is not heavily taxed. In the case of VMs, more spindles available for unrelated simultaneous tasks is far more efficient than 1 ultra-fast RAID set. This is one of those scenarios where your typical desktop benchmarks are almost completely meaningless.
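For anyone who wants to play with that placement rule (one VM per drive where possible, two at most, and never two busy ones on the same drive), here is a rough Python sketch. The function name `place_vms` and the heavy/light flag are made up for illustration; nothing here comes from VMware, and "heavy" is whatever you judge to be a disk-intensive guest.

```python
def place_vms(vms, drive_count, max_per_drive=2):
    """Spread VMs over drives following the rule of thumb above.

    vms: list of (name, is_heavy) tuples (hypothetical format).
    Returns one list of (name, is_heavy) per drive.
    """
    drives = [[] for _ in range(drive_count)]
    # Place heavy VMs first so each one lands on the emptiest drive.
    for name, heavy in sorted(vms, key=lambda vm: not vm[1]):
        candidates = [
            d for d in drives
            if len(d) < max_per_drive
            and not (heavy and any(is_heavy for _, is_heavy in d))
        ]
        if not candidates:
            raise ValueError(f"no drive can take {name!r} under these rules")
        # Prefer the drive with the fewest VMs, i.e. a dedicated drive if one is left.
        min(candidates, key=len).append((name, heavy))
    return drives


if __name__ == "__main__":
    guests = [("cad", True), ("media", True), ("rdp", False), ("test", False)]
    for i, drive in enumerate(place_vms(guests, drive_count=3)):
        print(i, [name for name, _ in drive])
```

If the function raises, the rule cannot be met with the drives on hand, which is the point where you either add spindles or accept some sharing.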
Performance Killers in VMware...
1: Available cores ---> if you have both those sockets populated you're golden. Most of the time, especially when hosting multiple VMs, 1 core per VM is actually faster than 2.
2: RAM, RAM, and more RAM ---> although over-allocating RAM is supported in VMware, swapping to disk under heavy load can take it all down. This could also be your problem, especially with only one logical drive (RAID or not).
3: HDDs ---> VMware actually recommends a dedicated drive or RAID set for each VM. If the VMs are only used for testing new software/OSes, then you can get away with a simple partition. Any VM actually used while the host or other guest VMs perform disk-related tasks will suffer horribly from residing on the same logical drive. On the enterprise level, this can be worked around by using RAID 10 (or 5) to mount more than 1 VM per logical drive, as these RAID levels yield better I/O in multi-user environments (which is what virtualizing is). The VMs should never reside on the same logical drive as the host OS or VMware software (preferably OS on 1 drive, host apps on another, and VMs on other multiple drives).
I know this may sound like overkill, but the performance difference is night and day when dedicating a HDD to a VM. Knowing your usage patterns would help to see if you can "cheat" somewhat.
If you still want to buy a RAID card, a budget is definitely needed here, as they get very pricey.
rrob -- I'd go with the 3ware 9650SE. I use LSI these days, but in the past I've had better experience with 3ware than Adaptec. With 4 drives I'd suggest RAID 10 unless you're really hurting for space, in which case there are probably better solutions.
I have a somewhat similar config on a Dell PowerEdge 2950 (2x quad-core 54xx Xeons) w/16GB running VMware Server, with 4x DAS SATA RAID-10 for the OS and VM images (separate partitions for OS and VM images, but the same array), with the VMs' data disks sitting on a SAN. It's an SMB environment and runs 12-16 production VMs without a problem, including Exchange and SQL Server, a file server, a few Linux web and NFS servers, etc. It could do more if I put more memory in it.
ShadowFlash -- Where have you seen the recommendation for a dedicated drive/array per VM from VMware? Segregating LUNs (maybe) if you're using a SAN, but there are quite a few installs happily running many, many VMs from a single DAS array (or for that matter a single SAN LUN via ESX). If the VMs are data disk IO intensive, then yes, their data disks should be on an array capable of supporting them. OSes themselves aren't generally that disk IO intensive unless you're paging. Many people are running quite a few VMs with no DAS, directly off low-end SANs (e.g., EMC Ax150i) over a couple of GbE links.
I don't have the specific link for the separate-disk VMs handy. I think it might actually be in the Workstation PDF from VMware, but I'll check if it's an issue. I thought it was common knowledge for Workstation at least...I've seen it recommended many times elsewhere and at VMware. I know for me, it was a huge performance gain moving the VMs to individual drives (RAID 1's in my case). As I said, with RAID 10 a lot more is possible. RAID 0 just doesn't allow for the type of access patterns that RAID 10 allows. I can run a good number of VMs on one of my 14-drive DAS's running as a single RAID 10, although splitting it into 7 RAID 1 sets is way more effective. Without knowing how much RAM he has, I also suspected paging as the problem, which I stated. I'm guessing he's over-allocating the RAM and swapping to disk under heavy loads, which would kill any single logical drive layout. Every scenario I've tried using RAID 0 for hosting VMs, especially with the host OS installed on it, yielded very poor results.
I perhaps jumped the gun a bit, but without knowing what the OP meant by "many VM's open" and his actual usage, not to mention the rest of the system specs, I didn't have a lot to go on. If in fact he's in an enterprise environment, I doubt Tom's would be the first place to go for advice. In my checklist of performance killers, RAM ranked higher than HDDs, and a lot of my problems disappeared when I upgraded to 16GB of RAM too. My main point was that I suspect something other than the RAID controller as the performance problem. Make no mistake, I'm a RAID junkie who'd ordinarily recommend a new controller card any day, but I just don't think that's the problem, although a dedicated cache with battery back-up sure would help on the random writes.
Anything involving SANs containing at least 4 drives is more network limited than HDD limited anyhow, even with dual GbE, in large file access. A streaming media server is the perfect example. In your case the separate data drive helps out a lot. Without it, any heavy sequential data file access would cause all the VMs to suffer horrible performance loss. Stripe size comes into play here as well for the random I/O. Running both the host OS and apps on that same drive hurts even more, and can crash (or at least hang) the whole enchilada...been there, done that. VMware Server is an entirely different animal, as it provides much better direct hardware support. I (perhaps wrongly) assumed that he was running VMware in a host Windows environment using Workstation. I honestly don't have any experience with ESX Server (when it supports DirectX I'm there in a heartbeat).
I run 6 concurrent VMs on a quad-socket dual-core Opty Tyan board with 16GB of RAM. Four VMs are primary workstations capable of mild gaming and limited CAD use. One is perpetually set up for remote access, and the last is for my streaming media center. I'm in the process of upgrading to a quad-socket quad-core, keeping the same config but allocating 2 cores for each VM, just 'cause I got a deal on the hardware. My usage patterns are obviously far different than your average server-type use. It's really VMware's fault that weirdos like me exist, as VMware Workstation's DirectX support stems specifically from their work with Boeing to support CAD across VMs. Gaming was just a natural by-product. I do both, and I have enough kids to need the extra seats at home, hence my strange project.
ShadowFlash -- Getting a bit OT here, but...
I think what you may be referring to is the use of raw devices, or booting an image from a raw device(?). Sharing those among VMs is a definite no-no unless the OS and file system are truly cluster-aware (and none, outside a few lab rats, are unless the disk is used simply as a cluster quorum device). Thus, that recommendation is because you'll likely crash the OSes sharing the device and/or trash the file system, not for performance reasons.
Those GbE links may not seem like much, but it's pretty amazing to see just how many VMs can work over a single GbE link (which is really all you get even with teaming unless you use split LUNs, MCS, MPIO, etc.). OSes alone tend to do a lot less IO than most people think.
For typical server workloads (e.g., web server, Exchange, DB), disk IOPS are the limiting factor, not network bandwidth; doubling disk spindles provides a far greater performance improvement than doubling network bandwidth. A survey of SANs will show that... compare performance based on increasing network bandwidth with increasing the number of disk spindles... all other things being equal, more spindles wins every time. A look at typical throughput in MB/sec for web and DB servers with DAS will also confirm that... lots of small random IOs with MB/sec throughput in single digits (and maybe double digits if you have fast disks). Disks, SAN-connected or otherwise, are generally the limiting factor, not the network (at least not GbE networks).
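To put rough numbers on the small-random-IO point, a back-of-the-envelope Python sketch. The per-spindle IOPS and the 8 KiB IO size are assumptions picked for illustration, not measurements from any of the systems in this thread, and the GbE figure is the raw line rate before protocol overhead.

```python
# Raw Gigabit Ethernet line rate in MB/s (10**6 bytes), before protocol overhead.
GBE_LINE_RATE_MB_PER_S = 1e9 / 8 / 1e6


def random_io_mb_per_s(iops, io_size_kib):
    """Throughput in MB/s produced by a given random I/O rate and I/O size."""
    return iops * io_size_kib * 1024 / 1e6


if __name__ == "__main__":
    # Assumed figures: 4 spindles at about 150 random IOPS each, 8 KiB per I/O.
    spindles, iops_each, io_kib = 4, 150, 8
    disk_mb = random_io_mb_per_s(spindles * iops_each, io_kib)
    share = disk_mb / GBE_LINE_RATE_MB_PER_S
    print(f"{disk_mb:.1f} MB/s of random IO, {share:.0%} of one GbE link")
```

With those assumed inputs the disks deliver roughly 5 MB/s, a small fraction of a single GbE link, which is why the spindles run out long before the network does for this kind of workload.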
p.s. I don't think you'll ever see VMware ESX do DirectX. It's intended for a very different audience than VMware Workstation. For me that's fine... my own VM server box sits in a closet and I access it and its VMs remotely (via web, RDP, X, SSH and VNC). If I want DirectX performance, it goes on bare metal (of which there is one, and only one, such box sitting under my desk).
Yeah, I know we're way OT, but the OP seems to have gone AWOL. I have tried the raw physical disk approach, but it's never really given me any benefits, except as a data disk, not the VM itself (and that was just for easy file management via the host OS, not for performance). I think where our differences lie is in the vastly different uses we have. Running CAD, gaming, and streaming media can all be very disk intensive, hence the problems and solutions I've experienced. I did look around the VMware community forums last night, and I did find a bunch of posts with similar problems. The majority of advice given there was to separate the VM images from the host OS, RAID 10 the VM image disks, and/or separate the VMs onto different physical spindles, having no more than 2 VMs per dedicated spindle. The problem with what I found is that it's all very dated, circa 2006, which is approximately when I was having these problems myself. I just didn't feel right quoting old links. Perhaps VMware's file management improvements have somewhat rectified these issues since then, I don't know. Most of these things are not anywhere near as much of an issue for ESX Server, as you do not "double up" on file systems with the host Windows OS.
When I said that network speed is more limiting than HDD speed, I was referring to max throughput. 4 HDDs in RAID 0 will easily surpass actual file transfer speeds of GbE. Streaming 1 HD movie across GbE will noticeably slow down performance of all the VMs. Again, vastly different uses, you and me. I do run dual GbE myself, and for the small I/O transactions it is great, and I wouldn't go back to 100Mbps speeds by any means, but actual streaming transfer speeds do not set my world on fire. I'm sure you are absolutely right, and I know I am too; it's just the difference between an I/O environment and disk-intensive applications, not to mention my handicap of running the whole ball of wax inside a host Windows OS. Using ESX Server is far more efficient for large numbers of VMs.
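The max-throughput side of the argument can be sketched the same way in Python. The 100 MB/s per-drive sequential rate is an assumed round number, RAID 0 is treated as scaling perfectly with drive count (optimistic), and the function name `streaming_bottleneck` is just for illustration.

```python
# Raw Gigabit Ethernet line rate in MB/s (10**6 bytes), before protocol overhead.
GBE_MB_PER_S = 1e9 / 8 / 1e6


def streaming_bottleneck(drives, seq_mb_per_drive, link_mb_per_s=GBE_MB_PER_S):
    """Return (limiting component, achievable MB/s) for one large sequential transfer.

    Assumes ideal RAID 0 striping, so the array's rate is drives * per-drive rate.
    """
    disk_mb_per_s = drives * seq_mb_per_drive
    if link_mb_per_s < disk_mb_per_s:
        return ("network", link_mb_per_s)
    return ("disk", disk_mb_per_s)


if __name__ == "__main__":
    # Assumed: 4 drives in RAID 0 at about 100 MB/s sequential each.
    print(streaming_bottleneck(drives=4, seq_mb_per_drive=100))
```

Under those assumptions the 4-drive stripe could feed several times what one GbE link carries, so for big sequential transfers the network is the limit, while for small random IO it is the disks.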
It's also important to note that I do not rely on network transfer rates, for this very reason. Most of my VMs are directly connected to my host using multiple monitors, keyboards, and mice. Strange, I know. I am experimenting with the improved RDP in 2008 R2 to see if I can actually pull it all off the more traditional "thin client" way now. In the past, both terminal services and network bandwidth bottlenecked me.
From the limited information given by the OP, do you believe his problem is actually the lack of a good hardware controller card? I've run heavily disk-intensive VMs before off of on-board RAID with no major issues to speak of.
I know that I'll probably never see DX support in ESX, but it would be nice. A number of my CAD workstations at work could be easily replaced by VMs, as they do not need extreme modeling performance, but DX support would greatly help basic operation. OpenGL would be nicer yet, but that really is wishful thinking.
ShadowFlash -- Sounds like you have a pretty cool setup. Hard to tell where the OP's problem is, but my first guesses would be:
1. Not enough memory: (a) VMs are paging, either internally or due to VMware trimming; or (b) excessive host OS/app paging from not reserving/limiting memory; (c) possibly resulting in timeouts and retries; (d) leading to thrashing.
2. Disk fragmentation (host or within vdisk): (a) excessive use of snapshots; (b) not preallocating vdisks; (c) internal vdisk fragmentation; (d) possibly resulting in timeouts and retries; (e) leading to thrashing.
I always preallocate guest OS vdisks; use a separate host partition/fs reserved only for guest OS vdisks; and use the largest reasonable cluster size possible for the host partition/fs that contains those vdisks.
Then ensure paging in the host and guests doesn't go non-linear; if it does, reallocate memory, unload some guests, or add more memory. With that, I've found a single array/spindle can support quite a few guests, whether DAS or SAN.
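A quick way to sanity-check the memory side before loading more guests is to add up what has been handed out. This is only a Python sketch: the helper name `memory_headroom_gb` and the 2 GB host reservation are assumptions for illustration, and it ignores per-VM overhead that the virtualization layer adds on top of each allocation.

```python
def memory_headroom_gb(host_ram_gb, host_reserved_gb, guest_ram_gb):
    """RAM left after the host's own share and all guest allocations.

    A negative result means the guests are over-allocated and something
    (host or guests) will end up paging to disk under load.
    """
    return host_ram_gb - host_reserved_gb - sum(guest_ram_gb)


if __name__ == "__main__":
    # Assumed example: 16 GB host, 2 GB kept for the host OS, four 4 GB guests.
    headroom = memory_headroom_gb(16, 2, [4, 4, 4, 4])
    if headroom < 0:
        print(f"over-allocated by {-headroom} GB: shrink, unload, or add RAM")
    else:
        print(f"{headroom} GB of headroom")
```

It will not tell you whether paging has actually gone non-linear (that takes watching the host and guest counters), but it flags the configurations where it is bound to.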
As to data disks (host vdisks, SAN LUNs, etc.), it's almost impossible to make generalizations, other than that most of them should probably be on a different LUN/array/spindle tailored to their needs and (most importantly) separate from guest VMs.
Again, most OSes don't inherently do that much disk IO, at least if they have adequate memory, which is why running many guest OSes directly off even a low-end iSCSI SAN (or DAS) generally works OK. (I cringe when I see a guest OS with a bunch of disk IO-intensive apps and their data in a single vdisk, or the app/data vdisks on the same spindle/array.)
Good call on excessive snapshots....missed that one. I'd never even consider dynamically sized disks either, just asking for trouble there. I guess we're pretty much doing the same thing. I usually use a single array for each VM image AND its data disk. It's not the best scenario, but it does isolate any possible contention with the other VMs. My goal was to isolate each VM from the others as far as possible on the hardware level. That includes HDDs, multiple CPUs and NUMA memory banks. My end goal is to achieve near perfect workstation performance on each VM simultaneously. This only works on a somewhat smaller scale, as you can run out of dedicated resources real quick and are forced to share. I run 2x 14-drive Compaq SCSI disk shelves, one of which is specifically dedicated to my VMs; the other I only run part-time as a back-up solution. They're a bit old, but like you said, more spindles are always better.
On a side note, PCoIP will change everything in the next VMware View, and with Teradici's next-gen hardware host cards and portals, multiple CAD/gaming clients running at 30-60fps with full DX and OpenGL will finally be possible. And to anyone else watching....yes, it can play Crysis.
Thanks for the compliment, your set-up is no slouch either, LOL. Unfortunately, as soon as I get mine how I like it, something new comes out, and I start all over. I'm typing this on a P4 right now, just because I'm redoing the main rig yet again....sigh....
Edit: You did get me thinking though....Do you think it would be better in my case to devote a 6-drive RAID 10 array to all the VM images themselves via partitions, and dedicate an 8-drive array to the data disks for them all? This could cause some minor contention under heavy disk loads, but a large RAID 10 should be able to handle the simultaneous streaming reads at least. The benefit, of course, would be an increase in performance when some VMs are idle, due to the full performance advantages of the larger RAID sets. As I'm in the process of redoing everything anyhow, now would be the time to make a change.
ShadowFlash -- If I understand your config correctly, you have a 6x RAID-10 array, plus another 8x drives to work with, for a total of 14 drives. I've found a 4x RAID 10 array (DAS SATA) more than sufficient for as many guest OS's/vdisks as available memory/CPU will handle on a 16GB 2x Xeon 5430.
For me that's 12-16 VMs on that config, with the primary constraint being memory... the RAID-10 array on which the guest VMs reside is well below its max most of the time (the only time it's really stressed is startup, or when bouncing VMs; otherwise the aggregate disk IO rate is quite low and the disk queue length rarely gets above 1-2).
Assuming a similar config works for you, that would leave you 10 drives for data to divide up as needed. Not sure what your IO workload looks like, but, e.g., in one install I've got an Exchange server running on a (SAN) 2x RAID-1, a file server running on a (SAN) 6x RAID-5, and a Linux/NFS server running on a (SAN) 2x RAID-1. While they're rarely balls-to-the-wall (it's an SMB environment with typically 40-60 users), the VMs are generally happy, and I've never had a problem with them complaining (except when someone unplugs network cables).
Hard to tell, but I expect that, as you suggest, you could divvy up some of those drives and dedicate some to RAID-1 or RAID-10 arrays for app/IO and get very good performance, without killing your ability to run lots of guests. Sorry I can't be more definitive (too many variables), but I'd be very interested in what you determine.