IT Disaster Recovery

STRAWBERRY, CALIFORNIA, UNITED STATES – 2021/08/26: View of burning trees as the Caldor Fire grows to the Tahoe basin. The Caldor fire has grown to over 130,000 acres and threatens to grow to the Tahoe basin. These images were taken at a backfire set by crews in an effort to gain control of the Caldor fire. Cause still unknown at this time. (Photo by Ty O’Neil/SOPA Images/LightRocket via Getty Images)

Disaster Recovery is important, and disasters come in all shapes and sizes: fire, flood, explosion, virus attack, human error and so on.

Having a realistic plan is vital. The plan needs to be fully tested and detailed, with all the information someone might need to deal with whatever disaster they face, all presented in easy-to-understand language.

In 2011 I wrote a DR document for a large college I was working for. The document was 20 pages long, and here I’m going to summarise the essentials.

Hopefully this will help provide a framework document for anyone looking to implement something similar, or at least help you think about what needs to be covered.

INTRODUCTION AND DEFINITIONS

The document should start by stating that whilst a major incident or disaster is extremely unlikely to occur, and difficult to prevent, an organisation needs to put in place procedures to cover any eventuality which might arise. By having a carefully prepared and flexible Disaster Recovery Plan, the damage and disruption can be minimised and IT services returned to normal as quickly as possible.

The document should describe the manner in which the organisation would respond to a variety of disasters, and provide simple-to-read, accurate and up-to-date information.

It should be clearly stated that an ‘Incident Manager’ will be called upon to form a ‘Recovery Team’, which will be responsible for carrying out the technical work required to establish operational systems.

Definition of a Disaster 

For the purposes of the document a disaster is seen as the loss of IT service provision for a large part of the organisation which adversely affects operational running. 

Additional Information 

The document should be reviewed and distributed to all staff who may be expected to play a part in recovery. It should be stored at various safe locations, both electronically and in printed form. Anything incorrect should be corrected immediately. If a procedure is identified that will save time, or an issue is identified that is likely to recur, it must be included in the documentation.

All amendments to the document should be logged in an appendix.

The document provides an overview of the processes involved, but more detailed information for each system, server etc. should be provided in separate DR Manuals. These documents would contain enough technical information for anyone with a high enough level of technical skill (though not necessarily any experience of the system) to work on restoration.

The estimated time to restore critical services will depend upon requirements and scope, but you should try to provide estimates based on identified scenarios.

STAFFING

An ‘Incident Manager’ will be appointed and be expected to: 

  1. Ascertain the extent of damage to computer hardware, software, wiring etc.
  2. Identify requirements and priorities for essential services
  3. Establish a ‘recovery team’
  4. Prioritise the order in which services are to be restored
  5. Act as a central communications point for the recovery team
  6. Liaise with third parties as required

Recovery Team 

A large pool of staff should be identified as having sufficient skills to be called upon to form part of the ‘recovery team’. This team of technical experts, managers etc. will be instrumental in ensuring services are restored and effective communications are in place.

Here you should list the name, job title, mobile phone number and email address of each member of staff.

DISASTER RECOVERY KIT

Disaster Recovery Kits, to be used by staff in the event of a disaster, should include USB storage devices, applications, operating systems and other essential tools which may be required.

The kits should be stored in safe locations, along with the required technical DR Manuals. 

SERVICE DEPENDENCY

In order to recover a service, it may be necessary to recover many components. To fully restore a pre-defined service, it is vital to understand its dependencies, such as Exchange upon Active Directory, and Active Directory (AD) upon DNS.

Here you should list each service along with its dependencies. For example, you might have an on-premises email server which is dependent upon a working DNS server.
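
Once the dependencies are written down, a restore order can be derived from them rather than worked out by hand in the middle of an incident. The following is only a minimal sketch in Python, assuming the dependencies are recorded as a simple mapping. The service names reuse the Exchange, Active Directory and DNS example, "File Server" is a made-up extra entry, and the restore_order function is a hypothetical helper rather than part of any DR tool.

```python
from graphlib import TopologicalSorter

# Hypothetical service map: each service lists the services it depends on.
DEPENDENCIES = {
    "DNS": [],
    "Active Directory": ["DNS"],
    "Exchange": ["Active Directory"],
    "File Server": ["Active Directory"],  # illustrative entry, not from a real plan
}


def restore_order(dependencies):
    """Return the services in an order where every dependency comes first."""
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the services depend on each other in a loop.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, service in enumerate(restore_order(DEPENDENCIES), start=1):
        print(f"{step}. {service}")
```

A circular dependency raises an error here instead of producing a misleading order, which is worth discovering while writing the document rather than during a recovery.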

BACKUP PROCESS OVERVIEW

Here you should provide information on how your systems are backed up, what the schedules are, where the data is stored etc.
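
If the backup details are kept in a structured form, it becomes easy to check that the schedule described in the document is actually being met. The sketch below is one possible way to do that in Python. The BackupJob fields, the overdue_jobs helper and the example systems and locations are all assumptions made for illustration, not a description of any particular backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupJob:
    system: str             # what is backed up
    interval: timedelta     # how often a backup is expected to succeed
    location: str           # where the backup data is stored
    last_success: datetime  # when the last good backup finished


def overdue_jobs(jobs, now):
    """Return the jobs whose last successful backup is older than the interval."""
    return [job for job in jobs if now - job.last_success > job.interval]


if __name__ == "__main__":
    # Hypothetical inventory; replace with the systems listed in your own plan.
    jobs = [
        BackupJob("Email server", timedelta(hours=24), "Off-site store",
                  datetime(2024, 1, 2, 1, 0)),
        BackupJob("Student records", timedelta(hours=24), "Second campus",
                  datetime(2023, 12, 31, 1, 0)),
    ]
    for job in overdue_jobs(jobs, now=datetime(2024, 1, 2, 12, 0)):
        print(f"OVERDUE: {job.system} (stored at {job.location})")
```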

NETWORK

Everything depends upon a working network, so the network and its dependencies should be listed along with appropriate network diagrams. Here you should also list any telecoms requirements and diagrams.

CHANGE LOG

Ensure all changes to the document are captured and recorded here.