I'm an IT Analyst working in Disaster Recovery (DR) for a Fortune 200 manufacturing company.
I'm doing a little research on disaster recovery and what qualifies as a disaster. Can those of you who work in the DR space provide me with what you and/or your company define as a disaster?
When is it appropriate to invoke a company’s IT disaster recovery plan?
- What qualifies as an IT disaster?
- What kind of resources are needed to start performing recovery?
- How much time must pass (service outage) before a disaster can be declared?
- Are disaster definitions intentionally vague or are there specific milestones that need to be met before an IT disaster can be declared?
- What role does failover play in the definition of disaster?
- What is the difference between an IT disaster and a major/serious incident?
DR is based on a system being completely lost, or having to rebuild a server.
If one of your core servers went down, how do you rebuild it? What are the functions of that server?
You want to document how to rebuild or configure the system, what functions that system held, and all that fun stuff.
How and when you turn to DR is your own decision.
When to declare one is dependent on the knowledge of your IT department. Plain and simple.
Failover is far more necessary than having a disaster recovery plan ready to go. If the failure is hardware based, most companies are SOL because they do not keep extra hardware on hand for DR. You will likely fail over to existing hardware, or repurpose an existing non-critical system.
Prior to any occurrence, you and your immediate boss should have written guidelines (a 'manual') to guide any response. The manual should clearly define the various incidents and, for each one, the immediate response and the steps necessary.
In a company as large as yours, the owners/board of directors should sign off on the plan, too.
As riser noted, it comes down to documentation and authorization. IT 'incidents' and 'production line' incidents (i.e., problems with something like a CNC or robotic machine) need to be segregated, with a 'hot' employee empowered in each situation. This can range from a line employee to a supervisor or even a plant manager; from the night-time IT guy (or girl) to their supervisor or the IT director. This essentially follows the range of your imagination (leave out Alien Invasions and Body Snatchers).
As you define each crisis level you may 'authorize' various levels of immediate funding for the 'hot' employee at their discretion -- even to 'unlimited' cash.
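If it helps to picture the 'manual' as data, here is a rough sketch in Python. Everything in it is made up for illustration: the incident names, the roles, and the dollar limits are assumptions, not anything from a real plan. The only ideas taken from the posts above are that IT and production-line incidents are kept separate, that each has an empowered 'hot' employee, and that the spending authority can go all the way to unlimited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncidentLevel:
    name: str
    domain: str                 # "IT" or "production line", kept separate
    responder: str              # the 'hot' employee empowered to act
    spend_limit: Optional[int]  # dollars; None means unlimited


# Hypothetical entries; a real manual would define its own.
MANUAL = [
    IncidentLevel("single server down", "IT", "night-time IT staff", 5_000),
    IncidentLevel("data center lost", "IT", "IT director", None),
    IncidentLevel("CNC machine fault", "production line", "line supervisor", 2_000),
    IncidentLevel("plant-wide stoppage", "production line", "plant manager", None),
]


def lookup(name: str) -> IncidentLevel:
    """Return the manual entry for an incident, or fail loudly if undefined."""
    for level in MANUAL:
        if level.name == name:
            return level
    raise KeyError(f"incident not defined in the manual: {name}")


def may_spend(level: IncidentLevel, amount: int) -> bool:
    """True if the responder may spend this amount without further sign-off."""
    return level.spend_limit is None or amount <= level.spend_limit
```

The point of `lookup` raising an error is that an incident nobody wrote down is itself a finding: it means the manual has a gap that needs sign-off before the next outage, not during it.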
DR is all about business continuity. Start with the most basic company functions, and build from there. I.e., if there were just ONE function that had to continue in the face of a disaster, what would it be, and how would you keep it running? Then move down the list of essential business functions and set up a plan for each one.
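The "one function first, then move down the list" approach can be sketched as a ranked list with a plan attached to each entry. This is only an illustration in Python; the function names, priorities, and plans below are hypothetical, and a real list would come out of talking to the business, not out of IT alone.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    priority: int  # 1 = the ONE function that must keep running
    plan: str      # how it is kept running; empty means no plan yet


# Hypothetical examples for a manufacturer.
functions = [
    BusinessFunction("payroll", 3, ""),
    BusinessFunction("order processing", 1, "fail over to secondary site"),
    BusinessFunction("shipping", 2, "manual paper pick lists"),
]


def recovery_order(funcs):
    """Most essential function first."""
    return sorted(funcs, key=lambda f: f.priority)


def missing_plans(funcs):
    """Names of functions that still have no written plan, in recovery order."""
    return [f.name for f in recovery_order(funcs) if not f.plan.strip()]


for f in recovery_order(functions):
    print(f.priority, f.name, "->", f.plan or "NO PLAN YET")
```

Running `missing_plans(functions)` on a list like this gives you a to-do list: every name it returns is an essential function with nothing written down for it.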
DR also depends greatly on geography. Businesses based next to an ocean need to have different plans than businesses based in the mountains, and businesses downtown need a different plan than businesses in other sections of town.