Disaster Recovery Plan Standard

This Standard provides guidance for creating a Disaster Recovery Plan (DRP).

Purpose

This Standard provides guidance for creating a Disaster Recovery Plan: a set of clear contingency procedures for recovering Critical Information Assets from disruptions (which can range from cyber-attacks and power outages to severe weather and international political situations) and resuming normal operations within a reasonable time objective, with the least possible negative impact and minimal effort. It also covers any necessary prior preparations, e.g., personnel assignment and training, and alternative means of operation. In turn, it makes it possible to create any missing required processes, purchase necessary support assets, train personnel and otherwise prepare for a contingency.

Scope

This Standard is aimed at owners of Critical Information Assets, or those assigned responsibility for Disaster Recovery planning and/or execution.

General

Each asset designated as a Critical Information Asset must have a Disaster Recovery Plan (DRP). Responsibility for the DRP lies primarily with the Information Asset Owner, who may delegate it to specific personnel.

This Standard follows a methodical four-phase approach to Critical Information Asset system recovery:

  • Activation and Notification Phase: occurs after a disruption that is expected to last beyond a set number of hours (which can include damage to the facility housing the system, damaged equipment, or other types of possible long-term loss). Once the plan is activated, relevant parties are notified of a potentially long-term outage and an exhaustive assessment is performed. The results are relayed to system owners and recovery coordinators, and the recovery procedures are then modified and adapted to the specific causes of the disruption.

  • Recovery Phase: details the activities and procedures for recovering the affected system. These activities and instructions should be written with sufficient detail and context that skilled technical personnel could recover the system without in-depth knowledge of the system itself. This phase also covers the necessary escalation details and communication instructions for system owners and users.

  • Reconstitution Phase: defines all actions taken to test and validate the system as operational before the whole operation returns to its usual state. This can involve functionality or regression testing, concurrent processing or any other applicable and appropriate method. Upon successful completion of validation testing, system owners declare the system recovered and operational, appropriate parties are notified and the DRP is deactivated. This phase also extends to variations such as preparing a new permanent location to support system processing requirements and making integrity determinations.

  • Post-disaster Phase: focuses on cleaning up any temporary services, platforms, sites and processes and returning to a pre-contingency state. It also addresses recovery effort documentation, activity log finalisation and the incorporation of lessons learned into plan updates.

To best prepare for composing a Disaster Recovery Plan, it is highly recommended to begin with the following:

  • Analyse a range of outage risk scenarios.

  • Familiarise yourself with any applicable regulations, e.g., PCI DSS.

  • Review the history of outages and disruptions, and how they were handled.

  • Determine and set a reasonable Recovery Time Objective (RTO) and identify an acceptable Recovery Point Objective (RPO), both of which must also be recorded in the Asset Inventory List.

  • Consider existing insurance policies.

  • Perform a vendor risk evaluation according to the Vendor Security Management process, and establish a Service Level Agreement (SLA).

  • Make sure the system adheres to the Physical Security Standard.

  • Ensure compliance with the Access Control Policy.

  • Consider other relevant standards and policies that help manage Physical, Legal and Cyber risks.
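To make the RPO target concrete, the sketch below estimates whether a periodic backup schedule can meet a chosen RPO. It is a simplified model, assuming backups run at a fixed interval and that each backup reaches offsite storage after a fixed delay shorter than that interval; the function names and the offsite-delay parameter are hypothetical illustrations, not requirements of this Standard.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         offsite_delay: timedelta = timedelta(0)) -> timedelta:
    """Approximate worst-case data loss for a periodic backup schedule.

    Simplified model (assumption): a disruption that destroys the primary site
    just before the newest backup reaches offsite storage loses everything
    since the previous backup, i.e. one full interval plus the offsite delay.
    Assumes offsite_delay < backup_interval.
    """
    return backup_interval + offsite_delay


def schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta,
                       offsite_delay: timedelta = timedelta(0)) -> bool:
    """True if the schedule's worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, offsite_delay) <= rpo


# Daily backups shipped offsite 4 hours later cannot meet a 24-hour RPO.
print(schedule_meets_rpo(timedelta(hours=24), rpo=timedelta(hours=24),
                         offsite_delay=timedelta(hours=4)))  # False
# Hourly backups comfortably meet a 4-hour RPO.
print(schedule_meets_rpo(timedelta(hours=1), rpo=timedelta(hours=4)))  # True
```

A real assessment would also consider factors this model ignores, such as backup duration and failed or unverified backups.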

Structure of Disaster Recovery Plan

As applicable, each phase of the plan must include the following main points (an example template can be found in Appendix A):

  • Activation criteria and notification

    • A list of predefined roles authorised to activate the DRP (e.g., the system or business owner, or the operations point of contact (POC) for system support).

    • Criteria that should be met before the plan is activated, for example:

      • The type of outage indicates the Critical Information Asset will be down for longer than the set RTO.

      • The facility or equipment housing the Critical Information Asset is damaged and may not be available within the set RTO.

      • Other criteria, as appropriate to the particularities of the system.

    • After the plan is activated, the following procedures are to be enacted:

      • Notification procedures and the assigned personnel.

      • The sequence in which personnel are notified (e.g., system owner, technical POC, DRP Coordinator, business unit or user unit POC, and recovery team POC).

      • The method of notification (e.g., email blast, call tree, automated notification system, etc.).

    • Initial assessment procedures:

      • Detailed procedures for determining the cause of the outage.

      • Identification of potential for additional disruption or damage.

      • Assessment of affected physical area(s).

      • Determination of the physical infrastructure status, equipment functionality, and inventory.

      • Estimated time to restore service to normal operations.

      • Other relevant information as necessary.

    • Recovery

      • A system recovery strategy, adapted as appropriate to best fit the Critical Information Asset, aimed at restoring system capabilities, repairing damage and resuming operations at the original location or, in an emergency, at an alternate location:

        • Identify recovery location (if not at original location).

        • Identify required resources to perform recovery procedures.

        • Retrieve backup and system installation media.

        • Recover hardware and operating system (if required).

        • Recover system from backup and system installation media.

        • Identify specific roles or teams responsible for each procedure.

      • A list of detailed steps for recovery, for example:

        • System installation instructions.

        • Required configuration settings or changes.

        • Keystroke-level recovery steps.

        • Recovery of data from backups and audit logs.

        • Any other system recovery procedures, as appropriate.

      • If the system relies entirely on another group or system for its recovery and reconstitution, the plan should provide all information necessary for the detailed recovery and reconstitution procedures of that supporting system.

      • Detail appropriate procedures for escalation notices during recovery efforts (problem escalation to leadership and status awareness to system owners and users), and assign responsible teams or personnel.

      • A Backup and Recovery Plan created according to the Information Backup and Recovery Standard may be used.

    • Reconstitution

      • Include detailed procedures for testing or validation of recovered data to ensure that data is correct and up to date.

      • An example validation testing checklist can be found in Appendix B.

      • Functionality testing can be added, or combined with data validation where the procedures test both functionality and data validity.

      • Include predetermined procedures for formally declaring recovery and notifying technical Points of Contact and, as appropriate, users once the Critical Information Asset has returned to normal operations.

      • Teams or persons responsible for each procedure must be identified and assigned.

    • Post-disaster

      • Detail any specific cleanup procedures for the system, including preferred locations for manuals and documents and returning backup or installation media to its original location.

      • Provide procedures for returning retrieved backup or installation media to its offsite data storage location.

      • Provide appropriate procedures for ensuring that a full system backup is conducted within a reasonable time frame and, following proper procedure, taken offsite.

      • Provide details about the types of information each DRP team member is required to provide or collect for updating the DRP with lessons learned. Types of documentation that should be generated and collected after a contingency activation include:

        • Activity logs (including recovery steps performed and by whom, the time the steps were initiated and completed, and any problems or concerns encountered while executing activities).

        • Functionality and data testing results.

        • After Action Report.

        • Lessons learned documentation.

        • Total recovery time and actual data loss, and how they compare with the set RTO and RPO.

        • Other relevant details.
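The comparison of total recovery time and data loss against the RTO and RPO can be derived from activity log timestamps. The sketch below assumes recovery time is measured from the start of the disruption to the formal declaration of recovery, and data loss from the last good backup to the start of the disruption; an organisation may instead measure recovery time from plan activation. The `RecoveryRecord` type and `compare_to_objectives` function are hypothetical, and the timestamps in the usage example are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryRecord:
    """Key timestamps taken from the DRP activity log (hypothetical structure)."""
    disruption_start: datetime
    last_good_backup: datetime
    recovery_declared: datetime


def compare_to_objectives(record: RecoveryRecord, rto: timedelta,
                          rpo: timedelta) -> dict:
    """Compare actual recovery time and data loss with the set RTO and RPO."""
    recovery_time = record.recovery_declared - record.disruption_start
    data_loss = record.disruption_start - record.last_good_backup
    return {
        "recovery_time": recovery_time,
        "rto_met": recovery_time <= rto,
        "data_loss": data_loss,
        "rpo_met": data_loss <= rpo,
    }


record = RecoveryRecord(
    disruption_start=datetime(2024, 3, 4, 9, 0),
    last_good_backup=datetime(2024, 3, 4, 2, 0),
    recovery_declared=datetime(2024, 3, 4, 17, 30),
)
result = compare_to_objectives(record, rto=timedelta(hours=6),
                               rpo=timedelta(hours=24))
print(result["recovery_time"], result["rto_met"])  # 8:30:00 False
print(result["data_loss"], result["rpo_met"])      # 7:00:00 True
```

In this illustrative case the RPO was met but the RTO was missed, which is exactly the kind of finding that should feed into the After Action Report and lessons learned.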

Additionally, the following information and/or associated procedures, if applicable to the particularities of an asset, should be developed, documented and included in the Disaster Recovery Plan or its Appendices:

  • A preliminary recovery timeline is recommended.

  • A list of all involved authorised personnel, their designated roles and responsibilities in the recovery process, and several alternative contact methods (an example can be found in Appendix C), which should include, but not be limited to:

    • CISO, PO and CTO.

    • Recovery Team

    • 3rd party staff, like system vendor support technicians.

    • Legal team

    • Communication personnel

  • Succession of responsibility when usual staff are unavailable to perform their duties.

  • Procedures, methods and channels for disaster team communication.

  • A chain of command and ways to ensure effective coordination between internal and external teams (recommended).

  • Details of proper escalation paths.

  • Pre-made generic templates and procedures for external communication (highly recommended).

  • Any necessary legal information and legal steps to be taken.

  • All relevant network infrastructure documents, e.g., network diagrams, equipment configurations, databases.

  • A list of any necessary hardware and/or software (ideally kept as spares in inventory), their details (e.g., OS, processor, memory, storage requirements), and vendors for procurement.

  • Relevant technical documentation for the Critical Information Asset, its data and configurations.

  • A description of alternate onsite and/or offsite storage of full and incremental backups, as well as alternate recovery processing sites (an example can be found in Appendix D).

  • Where necessary, authentication and authorisation procedure details.

  • Any alternate manual or automated processing procedures that would enable continued processing of information normally handled by the affected system (recommended).

Exceptions

The DRP does not apply to the following situations:

  • Overall recovery and continuity of mission/business operations, which are addressed by the Business Continuity Plan (BCP).

  • Emergency evacuation of personnel.

Auditing and planned testing

The Asset Inventory List must include a checkpoint indicating whether the DRP is ready, along with a link to it.

At a minimum, the plan should be reviewed and updated annually, or after significant changes to the system.

Any changes to the plan must be documented, approved and appropriately signed (an example can be found in Appendix E).

It is highly recommended that a planned exercise to test the implementation of the Disaster Recovery Plan be held at least annually, in order to:

  • Give the involved teams the necessary experience.

  • Make sure the plan meets its set RTO, RPO and any other goals.

  • Identify changes in the asset and gaps or problems with the plan, and make timely corrections or additions as necessary.

Exercises and their results should be appropriately documented.

A fully functional DRP test should include all points of contact and be facilitated by a 3rd party or an outside observer.

Testing and maintenance activities should follow a documented schedule; an example can be found in Appendix F.

Standard Review and Update

This Standard must be maintained in accordance with the Information Security Policy.

Appendix A - Example of DRP phases (Access Control system in HQ)

Activation criteria and notification:

  • The DRP may be activated by the Risk Manager or the assigned IT admin if one or more of the following criteria are met:

    • Access control system is down for more than 6 hours.

    • The facility or equipment housing Access Control system is damaged and may not be available within 6 hours.

  • After the plan is activated, the following notification procedures are to be enacted:

    • The Risk Manager or Head IT admin notifies system vendors of the issue by phone and requests engineer support as soon as possible.

    • The Risk Manager or assigned IT admin informs HQ users of the situation and possible downtime via the HQ office Slack channel.

  • Initial assessment procedures are performed by the assigned IT admin:

    • Check the Access Control system equipment for possible dysfunction, e.g., power supply failure or cut wiring.

    • Check the supporting systems, e.g., router/switch.

    • Assess the physical area housing the system.

    • Determine the physical infrastructure status, e.g., possible issues with the electrical grid.

    • Estimate possible recovery time.

    • Other relevant information as necessary.
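The activation criteria in this example reduce to a simple threshold check. The sketch below encodes the two criteria with the 6-hour threshold from this appendix. The function name and parameters are hypothetical, treating an unknown facility availability as "not available within the threshold" is an assumption, and the result only supports the decision, which remains with the Risk Manager or assigned IT admin.

```python
from datetime import timedelta
from typing import List, Optional

# 6-hour threshold from the Appendix A example; in practice, use the asset's RTO.
ACTIVATION_THRESHOLD = timedelta(hours=6)


def activation_reasons(expected_downtime: timedelta,
                       facility_damaged: bool,
                       facility_available_in: Optional[timedelta] = None,
                       threshold: timedelta = ACTIVATION_THRESHOLD) -> List[str]:
    """Return the activation criteria that are met (an empty list means none).

    expected_downtime: elapsed or estimated total outage duration.
    """
    reasons = []
    if expected_downtime > threshold:
        reasons.append("Access control system down for more than the threshold")
    # Assumption: if nobody can yet estimate when the facility will be usable,
    # treat it as not available within the threshold.
    if facility_damaged and (facility_available_in is None
                             or facility_available_in > threshold):
        reasons.append("Facility or equipment damaged and not available "
                       "within the threshold")
    return reasons


print(activation_reasons(timedelta(hours=2), facility_damaged=False))  # []
print(len(activation_reasons(timedelta(hours=8), facility_damaged=False)))  # 1
```

Returning the list of matched criteria, rather than a single yes/no, makes it easy to record in the activity log why the plan was activated.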

Recovery:

  • System recovery strategy:

    • Identify recovery location (if not at original location).

    • Identify the resources vendor engineers require to perform recovery procedures.

    • Retrieve backup data.

    • Recover hardware and operating system (if required).

    • Identify and assign specific roles and responsibilities for each procedure.

  • A list of steps for recovery:

    • If required, install a fresh copy of the vendor specified operating system on a device that meets vendor requirements.

    • Organise safe access for vendor engineers to the High Security Area where the Access Control System is housed, according to the Physical Security Standard.

    • Any other system recovery procedures, as appropriate to the situation.

  • If required, provide appropriate spare supporting hardware.

Reconstitution

  • After the vendor engineers restore the system to operability, the assigned IT admin or Risk Manager must:

    • Check that access cards grant appropriate access.

    • Check that access card use is correctly recorded in the system log and attributed to the legitimate owner of the access card.

  • After reconstitution is complete, the Risk Manager or the assigned IT admin will formally declare recovery of the Critical Information Asset to normal operations in the HQ office Slack channel.

Post-disaster

  • Return any unused hardware and software to inventory.

  • Return any test access cards to the designated safe repository.

  • Schedule an extra data backup to offsite storage as soon as reasonable.

  • When reasonable, document:

    • Activity logs (including recovery steps performed and by whom, the time the steps were initiated and completed, and any problems or concerns encountered while executing activities).

    • Functionality and data testing results.

    • Lessons learned analysis, as a freeform appendix.

Appendix B - Validation testing checklist

| No. | Procedure | Expected Results | Actual Results | Successful? | Performed by |
|-----|-----------|------------------|----------------|-------------|--------------|
| 1. | At the Command Prompt, type in sysname | System Log-in Screen appears | | | |
| 2. | Login as user testuser, using password testpass | Initial Screen with Main Menu shows | | | |
| 3. | From Menu - select 5- Generate Report | Report Generation Screen shows | | | |
| 4. | Select Current Date Report; Select Weekly; Select To Screen | Report is generated on screen with last successful transaction included | | | |
| 5. | Select Close | Report Generation Screen Shows | | | |
| 6. | Select Return to Main Menu | Initial Screen with Main Menu shows | | | |
| 7. | Select Log-Off | Log-in Screen appears | | | |

Appendix C - Disaster Recovery Key Personnel - Roles, responsibilities and contact information

| Key Personnel | Mobile | Slack | Alternative | Email |
|---------------|--------|-------|-------------|-------|
| Risk Manager (Insert Name) | Insert number | Insert Slack handle | Insert number | Insert email address |
| Risk Manager – Alternate | Insert number | Insert Slack handle | Insert number | Insert email address |
| Disaster Recovery Coordinator | Insert number | Insert Slack handle | Insert number | Insert email address |
| Disaster Recovery Coordinator – Alternate | Insert number | Insert Slack handle | Insert number | Insert email address |
| Recovery Team Lead | Insert number | Insert Slack handle | Insert number | Insert email address |
| Recovery Team Members | Insert number | Insert Slack handle | Insert number | Insert email address |
| 3rd party vendor staff (Name and Title) | Insert number | Insert Slack handle | Insert number | Insert email address |

Appendix D - Alternate storage and information processing site

Alternate storage facility:

| No. | Alternate Storage Information | Required information |
|-----|-------------------------------|----------------------|
| 1. | Location of alternate storage facility, time availability of access | Fill in relevant information |
| 2. | Whether the alternate storage facility is owned by the organization or is a third-party storage provider | |
| 3. | Name and points of contact for the alternate storage facility | |
| 4. | Delivery schedule and procedures for packaging media to go to alternate storage facility | |
| 5. | Procedures for retrieving media from the alternate storage facility | |
| 6. | Names and contact information for persons authorized to retrieve media | |
| 7. | Alternate storage configuration features that facilitate recovery operations | |
| 8. | Any potential accessibility problems to the alternate storage site in the event of a widespread disruption or disaster | |
| 9. | Mitigation steps to access alternate storage site in the event of a widespread disruption or disaster | |
| 10. | Types of data located at alternate storage site, including databases, application software, operating systems, and other critical information system software | |
| 11. | Insert other information as appropriate | |

Alternate processing site:

| No. | Alternate Processing Site Information | Required information |
|-----|---------------------------------------|----------------------|
| 1. | Location of alternate processing site, time availability of access | Fill in as appropriate |
| 2. | Whether the alternate processing site is owned by the organization or is a third-party site provider | |
| 3. | Name and points of contact for the alternate processing site | |
| 4. | Procedures for accessing and using the alternate processing site, and access security features of alternate processing site | |
| 5. | Names and contact information for those persons authorized to go to alternate processing site | |
| 6. | Type of alternate processing site, and equipment available at site | |
| 7. | Alternate processing site configuration information (such as available power, floor space, office space, telecommunications availability, etc.) | |
| 8. | Any potential accessibility problems to the alternate processing site in the event of a widespread disruption or disaster | |
| 9. | Mitigation steps to access alternate processing site in the event of a widespread disruption or disaster | |
| 10. | SLAs or other agreements of use of alternate processing site, available office/support space, setup times, and such | |
| 11. | Insert other information as appropriate | |

Appendix E - DRP change log

Record of Changes

| Page No. | Change Comment | Date of Change | Signature |
|----------|----------------|----------------|-----------|
| | | | |

Appendix F - Test and maintenance schedule

| Activity | Target date | Responsible |
|----------|-------------|-------------|
| Identify failover test facilitator. | March 1 | Disaster Recovery Coordinator |
| Determine scope of failover test (include other systems?). | March 15 | Disaster Recovery Coordinator, Test Facilitator |
| Develop a failover test plan. | April 1 | Test Facilitator |
| Invite participants. | July 10 | Test Facilitator |
| Conduct functional tests. | July 31 | Test Facilitator, Disaster Recovery Coordinator, POCs |
| Finalize after action report and lessons learned. | August 15 | Disaster Recovery Coordinator |
| Update DRP based on lessons learned. | September 15 | Disaster Recovery Coordinator |
| Approve and distribute updated version of DRP. | September 30 | Risk Manager, Disaster Recovery Coordinator |

Revision History

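| Version | Author | Approved By | Revision date | Approval date |
|---------|--------|-------------|---------------|---------------|
| 0.1 | GK | | 2023-05-20 | 2023-05-23 |
| 0.2 | DM | | 2023-11-02 | 2023-11-02 |
| 0.3 | GK | DM | 2024-09-10 | 2024-09-10 |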
