
Write Back Cache (Failure Scenarios)

Hi All,

Short Version:

I've just completed a brand new build of a pretty strong workstation. My question(s) to you folks center on what happens to this machine with write-back cache enabled (on a pretty sturdy UPS, monitored over USB from the machine) under two scenarios: 1. a BSOD, 2. a general system hang.

It's configured with a 2-drive RAID 0 (System) volume, a 5-drive RAID 0 (Data Processing) volume and a 3-drive RAID 5 (Database Replication/Storage) volume.

I'm well aware of the RAID 0 risks, and a thorough, multi-phase backup plan is in place. That said, for the reasons listed in the "Long Version/Details" section below, I can't afford cycles lost on parity calcs most of the time. In short, I need as much speed as I can safely wring out of the machine, which is why I want to implement write-back caching.

As mentioned, it's well UPS'd and configured for a 3-minute shutdown in the event of an outage. The only things I can't control as currently built are a potential I/O BSOD, or something like a hang on a Windows Update shutdown/restart (which has happened to me in the past on other workstations).

So in those scenarios, with write-back cache enabled, what happens to the data that was still sitting in the cache (not yet written to disk) at the time of the failure?

If you need more details to help you make a call on this, read below.

Thanks in advance!

Long Version/Details

I'm going to front-load as many pieces of information as I can in this post to hopefully catch any questions you might have in advance.

As I said, the machine is a workstation designed to let me design, build and test-bed software running ETL and analytics against some fairly large volumes of data. I wanted a workstation front end with a little server butt that I could do some full-scale validation and testing on. Some processes tend to be equally R/W intensive, while others tend to be much more read driven. The main thing worrying me is that I've never operated a complex RAID configuration on a pair of on-board controllers.

The machine is configured as follows.

Board: EVGA Classified SR-X
Procs: 2 Xeon 2630's
Memory: 96GB (DDR3 1866 Quad Channel)
HDD: 7X Seagate Constellation ES (1TB, 7200 RPM, 64MB cache)
     3X Seagate Constellation ES (2TB, 7200 RPM, 64MB cache)
PSU: 850 Watt
UPS: Cyberpower 1350VA 810 Watt
O/S: Windows 7 Ultimate

The board provides two on-board RAID controllers:
1 Intel RAID controller offering 2 on-board SATA ports @ 6 Gb/s and 4 ports @ 3 Gb/s
1 Marvell 2-port Mini-SAS (8 SATA port breakout)

I'm running:

2 1TB HDDs on the Intel SATA 6 Gb/s ports, RAID 0 (System Volume) C:
5 1TB HDDs on the Marvell SAS ports, RAID 0 (Processing Volume) P:
3 2TB HDDs on the Intel SATA 3 Gb/s ports, RAID 5 (DB Replication/Storage) S:
There's also a 6 TB (RAID 5) attached storage device that the data ultimately pushes to before going out to production.
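
For anyone sanity-checking the layout, here is a rough Python sketch of the usable space on each volume. It uses only the standard rules (RAID 0 keeps every drive's capacity, RAID 5 gives up one drive's worth to parity). The `usable_tb` helper and the `volumes` table are just illustrative names, and the numbers assume equal-size members and ignore controller metadata and TB vs. TiB differences.

```python
# Usable capacity per volume. Assumptions: equal-size member drives,
# no controller metadata overhead, sizes in decimal TB.

def usable_tb(level, drives, size_tb):
    """Return usable capacity in TB for a RAID 0 or RAID 5 array."""
    if level == 0:
        return drives * size_tb          # striping only, no redundancy
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_tb    # one drive's worth goes to parity
    raise ValueError(f"unsupported RAID level: {level}")

# Drive letter -> (RAID level, number of drives, size of each drive in TB)
volumes = {
    "C:": (0, 2, 1),   # System
    "P:": (0, 5, 1),   # Processing
    "S:": (5, 3, 2),   # DB Replication/Storage
}

for letter, (level, drives, size_tb) in volumes.items():
    print(f"{letter} RAID {level}: {usable_tb(level, drives, size_tb)} TB usable")
```

Only S: can lose a drive and keep running. C: and P: have no redundancy at all, which is part of why the cache question matters to me.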


So, with all of that said, what do we think happens if the machine BSODs or hangs for any reason?

Additional Info: The Process

This work is done with a combination of Perl, SQL, and C# in a SQL Server 2008 R2 Environment.

Most of the work of cleansing and normalizing/denormalizing the data (as the case may be), along with other processing, happens on the P: volume (RAID 0). As the records are sorted and bucketed into their proper tables, they replicate to the S: volume, where there's a little redundancy. Once there, some verification and validation scripts (heavy read processes, few writes) are run on the data staged on the S: (RAID 5) volume. The data is then pushed to attached storage, which pushes it out to the world, and the other drives are cleaned up.

Thanks again for your patience in reading all of this, and in advance for your help!
CHSIndependent~