Module 2.2: Disaster Recovery
Navigation: Course Index | 2.1 Business Continuity | 2.3 Incident Response
Learning Objectives
By the end of this module, you will be able to:
- Explain the purpose of disaster recovery and how it supports business continuity
- Compare hot, warm, cold, mobile, and cloud recovery sites
- Distinguish full, incremental, and differential backups, including archive bit behavior
- Apply the 3-2-1 backup rule to real scenarios
- Compare RAID 0, 1, 5, 6, and 10
- Select recovery strategies appropriate to different business needs
- Describe how DR testing validates recovery readiness
---
What Disaster Recovery Is
Disaster recovery (DR) is the process of restoring IT systems, applications, and data after a disruptive event. If business continuity (BC) is the whole strategy for keeping the organization alive, DR is the technical part of that strategy.
Core idea
BC asks, “How does the business keep functioning?”
DR asks, “How do we bring the technology back?”
Analogy
If a restaurant loses power, business continuity might include paper order forms, flashlight menus, and cash-only operations. Disaster recovery would be the process of restoring the power, POS system, kitchen network, and inventory database.
Why DR matters
Without DR, a company may have a continuity plan on paper but no practical way to restore its systems. Backups, replicas, alternate sites, and tested procedures convert the BC vision into an actual recovery path.
| DR Benefit | What It Delivers |
|---|---|
| Faster restoration | Reduces downtime |
| Lower data loss | Protects records and transactions |
| Regulatory compliance | Supports legal and industry requirements |
| Customer trust | Shows resilience and professionalism |
| Operational continuity | Helps business resume essential functions |
> Exam Tip: DR is focused on technology. It is not the same thing as continuity, but it enables continuity.
---
DR Site Types
Recovery sites differ mainly by cost, readiness, and how much infrastructure is already in place.
| Site Type | Cost | Recovery Speed | Infrastructure Status | Best For |
|---|---:|---:|---|---|
| Hot site | Highest | Minutes to hours | Fully configured and ready | Mission-critical systems |
| Warm site | Medium | Hours to days | Partially configured | Important but not immediate systems |
| Cold site | Lowest | Days to weeks | Space and utilities only | Low-priority workloads |
| Mobile site | Variable | Moderate | Portable or containerized facility | Flexible and temporary operations |
| Cloud site | Variable | Fast to moderate | Provider-managed, elastic resources | Scalable modern recovery |
Hot site
A hot site is an almost live duplicate of production. It usually has current data replication, servers already configured, and network connectivity ready to go.
Best when downtime is very expensive. The downside is cost.
Warm site
A warm site has systems and connectivity, but some configuration, data loading, or recent updates still need to occur before it can operate.
Cold site
A cold site is just a location with power, cooling, and space. Hardware must be installed and systems built before it can run.
Mobile site
A mobile site is a portable recovery environment, such as a trailer or container-based data center. It is useful for some emergency scenarios or temporary operations.
Cloud recovery
Cloud DR uses virtual infrastructure and storage in a provider environment. It can be fast to activate and easier to scale than traditional sites, but it still requires planning, testing, and cost control.
Cost vs time comparison
| Site | Cost | Time to Recover |
|---|---|---|
| Hot | High | Very short |
| Warm | Medium | Short to moderate |
| Cold | Low | Long |
| Mobile | Medium to high | Moderate |
| Cloud | Variable | Often short, depends on architecture |
> Exam Tip: Hot is expensive but fast. Cold is cheap but slow. That trade-off appears in many exam questions.
---
Backup Strategies
Backups are the most common DR control. They provide a recoverable copy of data after deletion, corruption, ransomware, hardware failure, or other loss.
Full backup
A full backup copies all selected data every time.
Pros:
- Simplest restore process
- Fastest recovery from that backup set
Cons:
- Slowest to create
- Uses the most storage
Incremental backup
An incremental backup copies only data changed since the last backup of any kind.
Pros:
- Fast to create
- Uses the least storage
Cons:
- Restore is slower because the last full backup and every incremental since it are needed, applied in order
Differential backup
A differential backup copies data changed since the last full backup.
Pros:
- Faster restore than incremental
- Easier recovery chain than incremental
Cons:
- Grows larger each day until the next full backup
Comparison table
| Backup Type | Copies What? | Backup Speed | Restore Speed | Storage Use |
|---|---|---|---|---|
| Full | Everything | Slow | Fast | High |
| Incremental | Changes since last backup | Fast | Slow | Low |
| Differential | Changes since last full | Medium | Medium | Medium |
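The restore-speed column follows from which backup sets a restore needs. Here is a minimal Python sketch that works out the restore chain. It assumes that after each full backup the schedule uses only incrementals or only differentials, never a mix, and the day labels are hypothetical:

```python
from typing import List, Tuple

# (label, kind) where kind is "full", "incremental", or "differential"
Backup = Tuple[str, str]


def restore_chain(history: List[Backup]) -> List[Backup]:
    """Return the backup sets needed to restore to the newest backup in history.

    Assumes chronological order and that, after each full backup, the schedule
    uses only incrementals or only differentials.
    """
    fulls = [i for i, (_, kind) in enumerate(history) if kind == "full"]
    if not fulls:
        raise ValueError("a restore needs at least one full backup")
    last_full = fulls[-1]
    after = history[last_full + 1:]
    kinds = {kind for _, kind in after}
    if kinds - {"incremental", "differential"} or len(kinds) > 1:
        raise ValueError("only pure incremental or pure differential chains are modeled")

    chain = [history[last_full]]
    if kinds == {"incremental"}:
        # Every incremental since the full is needed, applied in order.
        chain.extend(after)
    elif kinds == {"differential"}:
        # Only the newest differential: it already holds all changes since the full.
        chain.append(after[-1])
    return chain


weekly_inc = [("Mon", "full"), ("Tue", "incremental"), ("Wed", "incremental"), ("Thu", "incremental")]
weekly_diff = [("Mon", "full"), ("Tue", "differential"), ("Wed", "differential"), ("Thu", "differential")]
print([day for day, _ in restore_chain(weekly_inc)])   # ['Mon', 'Tue', 'Wed', 'Thu']
print([day for day, _ in restore_chain(weekly_diff)])  # ['Mon', 'Thu']
```

Restoring Thursday's state takes four sets with incrementals but only two with differentials. This is why differentials restore faster even though each one is larger.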
Archive bit behavior
The archive bit is a flag historically used by some backup systems to track whether a file has changed since the last backup.
| Backup Type | Archive Bit After Backup |
|---|---|
| Full | Usually cleared |
| Incremental | Cleared after the changed files are backed up |
| Differential | Usually not cleared until the next full backup |
This is why differential backups grow over time. The changed files remain marked as needing backup until a full backup resets the cycle.
Example backup sequence
Suppose Monday is a full backup.
- Tuesday incremental copies changes since Monday and clears the archive bit for those files.
- Wednesday incremental copies changes since Tuesday.
- Tuesday differential copies changes since Monday. Wednesday differential also copies all changes since Monday, so it includes Tuesday's changes again.
The incremental chain is smaller, but restore is more complex. The differential chain is larger, but restore is easier.
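The Monday-to-Wednesday sequence can be replayed with a small simulation of the classic archive-bit behavior from the table above. This is a sketch only, not how any particular product works, since many modern tools track changes with journals or snapshots instead. File names are hypothetical:

```python
from typing import Dict, Iterable, List


def run_backup(kind: str, archive_bit: Dict[str, bool]) -> List[str]:
    """Simulate one backup job; archive_bit maps file name -> 'changed since backup' flag."""
    if kind == "full":
        copied = sorted(archive_bit)  # full copies everything regardless of the bit
    elif kind in ("incremental", "differential"):
        copied = sorted(name for name, changed in archive_bit.items() if changed)
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind != "differential":
        for name in copied:
            archive_bit[name] = False  # full and incremental clear the bit
    return copied


def modify(names: Iterable[str], archive_bit: Dict[str, bool]) -> None:
    """Writing to a file sets its archive bit again."""
    for name in names:
        archive_bit[name] = True


for kind in ("incremental", "differential"):
    bits = {"a.txt": True, "b.txt": True, "c.txt": True}  # hypothetical files
    print("Mon full:", run_backup("full", bits))
    modify(["a.txt"], bits)  # a.txt changes on Tuesday
    print(f"Tue {kind}:", run_backup(kind, bits))
    modify(["b.txt"], bits)  # b.txt changes on Wednesday
    print(f"Wed {kind}:", run_backup(kind, bits))
```

The results differ on Wednesday:

- **Incremental run:** copies only `b.txt`.
- **Differential run:** copies both `a.txt` and `b.txt`, because the differential never cleared `a.txt`'s bit.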
> Exam Tip: Incremental is usually fastest to back up. Full is usually fastest to restore.
---
The 3-2-1 Rule
The 3-2-1 rule is a classic backup best practice.
- Keep 3 copies of data
- Store them on 2 different media types
- Keep 1 copy offsite
Why it works
If one copy is destroyed, you still have others. If one medium fails, you still have a different medium. If the building is lost, the offsite copy survives.
Example
An organization might keep:
- Primary production database on SSD storage
- Backup copy on disk-to-disk backup storage
- Offsite copy in cloud object storage
That satisfies the rule:

- There are three copies.
- They sit on at least two media types (local disk storage and cloud object storage).
- One copy is offsite.
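The rule can be checked mechanically. This is a minimal sketch that assumes each copy is described by a hypothetical media label and an offsite flag. The labels are illustrative, not a standard taxonomy:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataCopy:
    description: str
    media: str      # hypothetical label, e.g. "disk", "tape", "cloud-object"
    offsite: bool


def meets_3_2_1(copies: List[DataCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


example = [
    DataCopy("primary production database (SSD)", "disk", offsite=False),
    DataCopy("disk-to-disk backup", "disk", offsite=False),
    DataCopy("cloud object storage copy", "cloud-object", offsite=True),
]
print(meets_3_2_1(example))  # True

all_local_disk = [DataCopy(f"copy {i}", "disk", offsite=False) for i in range(3)]
print(meets_3_2_1(all_local_disk))  # False: one media type, nothing offsite
```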
Modern extension
Many organizations now add immutability or air-gapping to resist ransomware. That is not the original 3-2-1 rule, but it is a common modern improvement.
> Exam Tip: If you see a question about surviving site loss, ransomware, and hardware failure, the 3-2-1 rule is often the right concept.
---
RAID Levels
RAID is not a backup. It provides availability and fault tolerance at the storage layer, but it does not protect against deletion, corruption, or ransomware.
RAID comparison table
| RAID Level | Fault Tolerance | Performance | Storage Efficiency | Typical Use |
|---|---|---|---|---|
| RAID 0 | None | Very high | 100% | Speed only, no protection |
| RAID 1 | One drive in a two-disk mirror | Good read performance | 50% | Simple redundancy |
| RAID 5 | One drive | Good read, slower write | (n-1)/n, e.g. 75% with 4 disks | General-purpose storage |
| RAID 6 | Two drives | Slower write than RAID 5 | (n-2)/n, e.g. 50% with 4 disks | Safer than RAID 5 for larger arrays |
| RAID 10 | At least one drive; more if no mirror pair loses both drives | Very high | 50% | High performance and availability |
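The efficiency and fault-tolerance columns follow from simple arithmetic on equal-size disks. The sketch below makes these assumptions:

- All disks are identical.
- Minimum disk counts follow the usual conventions.
- RAID 1 is modeled as an n-way mirror.

It reports how many failures each level is *guaranteed* to survive. RAID 10 can survive more if the failures land in different mirror pairs.

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def raid_summary(level: int, disks: int, disk_tb: float) -> dict:
    """Usable capacity, efficiency, and guaranteed failures survived (equal-size disks)."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 10 and disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    # Disks' worth of capacity left for data after mirroring or parity.
    data_disks = {0: disks, 1: 1, 5: disks - 1, 6: disks - 2, 10: disks // 2}[level]
    # Worst-case guarantee; RAID 1 is modeled as an n-way mirror.
    survives = {0: 0, 1: disks - 1, 5: 1, 6: 2, 10: 1}[level]
    return {
        "usable_tb": data_disks * disk_tb,
        "efficiency": data_disks / disks,
        "guaranteed_failures_survived": survives,
    }


for level, disks in [(0, 4), (1, 2), (5, 4), (6, 4), (10, 4)]:
    print(f"RAID {level} with {disks} x 4 TB:", raid_summary(level, disks, 4.0))
```

With four 4 TB disks:

- **RAID 5:** 12 TB usable (75%).
- **RAID 6:** 8 TB usable (50%), but it survives a second failure.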
RAID 0
RAID 0 stripes data across disks with no redundancy. It improves performance but provides no fault tolerance. If one disk fails, the array fails.
RAID 1
RAID 1 mirrors data on two disks. If one fails, the other still has a complete copy.
RAID 5
RAID 5 stripes data and parity across at least three disks. It tolerates one disk failure.
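RAID 5 parity is the byte-wise XOR of the data blocks in a stripe, so any single missing block equals the XOR of the surviving blocks. This one-stripe sketch uses made-up block contents. Real arrays also rotate the parity block across disks, which is not shown here:

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk RAID 5 set: three data blocks + one parity block.
data = [b"PAY1", b"PAY2", b"LOG3"]
parity = xor_blocks(data)

# The disk holding data[1] fails: rebuild it from the surviving data and parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

The same XOR works whichever single disk is lost. With two blocks missing there is not enough information to rebuild either one, which is why RAID 6 adds a second, independent parity.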
RAID 6
RAID 6 uses dual parity across at least four disks and tolerates two disk failures. It is slower to write than RAID 5 but more resilient.
RAID 10
RAID 10 combines mirroring and striping. It is fast and resilient but needs at least four disks, because half the raw capacity is used for mirroring.
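Whether a RAID 10 array survives depends on where failures land, not just how many occur. This sketch assumes the array is a stripe across two-disk mirror pairs, and the disk names are hypothetical:

```python
from typing import List, Set, Tuple


def raid10_survives(mirror_pairs: List[Tuple[str, str]], failed: Set[str]) -> bool:
    """The stripe stays intact while every mirror pair keeps at least one working disk."""
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


# Hypothetical 4-disk RAID 10: two mirror pairs, striped together.
pairs = [("disk0", "disk1"), ("disk2", "disk3")]
print(raid10_survives(pairs, {"disk0", "disk2"}))  # True: one disk lost from each pair
print(raid10_survives(pairs, {"disk0", "disk1"}))  # False: a whole mirror pair is gone
```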
Backup vs RAID
| Control | Protects Against | Does Not Protect Against |
|---|---|---|
| RAID | Disk failure | Deletion, corruption, ransomware, fire |
| Backup | Deletion, corruption, ransomware, and (if stored offsite) site loss | Downtime: restoring takes time, so a backup does not keep production running by itself |
> Exam Tip: A common trap is treating RAID as a backup. It is not. RAID keeps the storage running; backups let you recover the data.
---
Recovery Strategies
Recovery strategy is the method used to restore systems and services after a disruption.
Common strategies
| Strategy | Description | Best When |
|---|---|---|
| Restore from backup | Rebuild data and systems from a known-good backup | Data loss or corruption occurred |
| Failover | Switch to a redundant system, often automatically, when the primary fails | High availability is required |
| Switchover | Planned move to an alternate system | Maintenance or controlled migration |
| Replication | Keep a copy of data synchronized elsewhere | A very low recovery point objective (RPO) is needed |
| Reconstitution | Rebuild systems from scratch using images and configs | Systems are severely damaged |
Example
A company affected by ransomware may choose to wipe infected systems and restore from clean backups. A payment processor may rely on failover and replication to continue service with minimal interruption.
Choosing the strategy
The best strategy depends on the BIA results, budget, data criticality, and acceptable downtime.
| Business Need | Likely Strategy |
|---|---|
| Low cost, low urgency | Backup restore from cold site |
| Moderate urgency | Warm site with differential backups |
| High urgency | Hot site with replication |
| Very low data loss tolerance | Synchronous replication or frequent backups |
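The table can be read as two separate choices:

- **Urgency** decides how ready the alternate site must be.
- **Data-loss tolerance** decides how current the data must be.

The sketch below encodes that reading. Splitting the table into two dimensions, and letting very low loss tolerance override the data method, are this example's assumptions rather than a formal rule:

```python
from enum import Enum
from typing import Tuple


class Urgency(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


# Site readiness follows urgency (rows 1-3 of the table).
SITE = {Urgency.LOW: "cold site", Urgency.MODERATE: "warm site", Urgency.HIGH: "hot site"}
# Default data protection paired with each urgency level in the table.
DATA = {
    Urgency.LOW: "restore from backup",
    Urgency.MODERATE: "differential backups",
    Urgency.HIGH: "replication",
}


def likely_strategy(urgency: Urgency, very_low_loss_tolerance: bool = False) -> Tuple[str, str]:
    """Return (site type, data protection) for a business need."""
    # Row 4 of the table: very low data-loss tolerance calls for tighter data protection.
    if very_low_loss_tolerance:
        data = "synchronous replication or frequent backups"
    else:
        data = DATA[urgency]
    return SITE[urgency], data


print(likely_strategy(Urgency.MODERATE))  # ('warm site', 'differential backups')
```

In practice the inputs come from the BIA, and budget may rule out an option the table would otherwise suggest.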
---
DR Testing
Testing confirms that the recovery plan actually works.
Major test types
| Test Type | What It Checks |
|---|---|
| Checklist review | Are the steps complete and current? |
| Tabletop | Can the team reason through the scenario? |
| Walkthrough | Do the procedures make sense step by step? |
| Simulation | Can people and systems coordinate under pressure? |
| Parallel | Can the alternate site process real workloads while the primary keeps running? |
| Full interruption | Can recovery happen when primary systems are truly unavailable? |
What to test
- Backup integrity
- Restore times
- Site readiness
- Network connectivity
- Credential and access recovery
- Contact lists
- Application dependencies
- DNS and routing changes
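Backup integrity is commonly checked by comparing cryptographic hashes recorded at backup time against the restored files. This sketch assumes a hypothetical manifest that maps relative paths to SHA-256 hex digests:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest: Dict[str, str], restored_root: Path) -> List[str]:
    """Return manifest entries that are missing or whose hash does not match."""
    problems = []
    for relative, expected in manifest.items():
        restored = restored_root / relative
        if not restored.is_file() or sha256_of(restored) != expected:
            problems.append(relative)
    return problems


# Demo in a throwaway directory standing in for a restore target.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "db.dump").write_bytes(b"example data")
    manifest = {"db.dump": sha256_of(root / "db.dump"), "app.cfg": "0" * 64}
    print(verify_restore(manifest, root))  # ['app.cfg'] (never restored)
```

A matching hash shows that the bytes survived. It does not show that the application can use them, which is why the test list also covers dependencies and connectivity.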
Example
A DR test might restore a virtual server into a cloud environment and verify that the application launches, connects to its database, and accepts user logins. If the test only proves the VM boots but the application cannot talk to its database, the test failed.
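The pass/fail rule in that example can be sketched as a small harness: every check must pass, not just the VM boot. Keep in mind the limits of this sketch:

- The hostnames and ports are hypothetical.
- An open TCP port only shows that a service is listening. A real test would also exercise a user login.

```python
import socket
from typing import Callable, Dict


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_dr_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every check; a check that raises counts as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def dr_test_passed(results: Dict[str, bool]) -> bool:
    """The test passes only if at least one check ran and every check passed."""
    return bool(results) and all(results.values())


def example_checks() -> Dict[str, Callable[[], bool]]:
    # Hypothetical DR endpoints; substitute your own hosts and ports.
    return {
        "app server listening": lambda: tcp_reachable("app.dr.example.internal", 443),
        "database reachable": lambda: tcp_reachable("db.dr.example.internal", 5432),
    }
```

Treating an empty result set as a failure is deliberate. A test that checked nothing has not proven anything.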
Good testing practice
- Test regularly
- Vary the scenarios
- Involve business owners and technical staff
- Document gaps and remediation steps
- Retest after major changes
> Exam Tip: A recovery plan that has never been tested is a theory, not a capability.
---
DR in Real Events
Ransomware scenario
If a hospital’s active systems are encrypted, DR may require:
- Isolating infected systems
- Verifying backups are clean
- Restoring critical systems in priority order
- Bringing up alternate communications
- Coordinating with legal and incident response teams
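Restoring "in priority order" still has to respect dependencies: an application cannot come up before the database and directory service it relies on. This sketch uses the standard-library `graphlib` module (Python 3.9+). The system names and priorities are hypothetical, and a lower number means restore sooner:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def restore_order(depends_on: Dict[str, Set[str]], priority: Dict[str, int]) -> List[str]:
    """Dependencies first; among systems that are ready, lowest priority number first."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready: List[str] = []
    order: List[str] = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        ready.sort(key=lambda s: (priority.get(s, float("inf")), s))
        system = ready.pop(0)
        order.append(system)
        sorter.done(system)  # may unlock systems that depended on it
    return order


# Hypothetical hospital systems: each maps to the systems it needs first.
depends_on = {
    "ehr": {"database", "directory"},
    "database": {"storage"},
    "email": {"directory"},
}
priority = {"ehr": 1, "database": 1, "storage": 1, "directory": 1, "email": 3}
print(restore_order(depends_on, priority))
# ['directory', 'storage', 'database', 'ehr', 'email']
```

Email becomes restorable early, once the directory is up. It still waits behind the higher-priority clinical chain because the sketch picks one ready system at a time by priority.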
Cloud outage scenario
If a cloud provider region fails, the recovery strategy may involve:
- Redirecting users to another region
- Restoring services from replicated images
- Reconfiguring DNS and load balancers
- Validating application dependencies
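Redirecting users only helps if the target region can actually serve them. This sketch picks the first region, in preference order, where every required service is healthy. The region names, service names, and health snapshot are hypothetical:

```python
from typing import Callable, List, Optional


def pick_failover_region(
    preference: List[str],
    services: List[str],
    is_healthy: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first region, in preference order, where every dependency is healthy."""
    for region in preference:
        # Redirecting users to a region whose database is down just moves the outage.
        if all(is_healthy(region, service) for service in services):
            return region
    return None


# Hypothetical health snapshot: the primary region's app tier is down.
status = {
    ("primary", "app"): False, ("primary", "db"): True,
    ("secondary", "app"): True, ("secondary", "db"): True,
}
target = pick_failover_region(["primary", "secondary"], ["app", "db"], lambda r, s: status[(r, s)])
print(target)  # secondary
```

The chosen region would then drive the DNS and load-balancer changes listed above. How those changes are made depends on the provider.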
Natural disaster scenario
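If a data center is flooded or burned, the site itself may be unavailable, so recovery has to happen at an alternate location. If no hot or warm site was prepared in advance, a cold site or a cloud recovery environment may be the only practical path.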
---
Common Mistakes and Exam Pointers
| Mistake | Why It Is Wrong |
|---|---|
| Treating RAID as backup | RAID does not protect against logical data loss |
| Confusing incremental and differential | Incremental tracks changes since last backup; differential since last full |
| Thinking cold site is fast | Cold site is cheapest but slowest |
| Assuming cloud automatically equals DR | Cloud still needs design, backups, and tests |
| Forgetting archive bit behavior | Important for how differential backups grow |
Memory aid
- Hot = ready now
- Warm = almost ready
- Cold = build it later
- Full = everything
- Incremental = since last backup
- Differential = since last full
- RAID = availability, not backup
---
Practice Questions
1. What is the main purpose of disaster recovery?
- A. To eliminate all security incidents
- B. To restore IT systems and data after disruption
- C. To classify data by sensitivity
- D. To manage employee schedules
- Answer: ✅ B
2. Which DR site is most expensive but fastest to activate?
- A. Cold
- B. Warm
- C. Hot
- D. Mobile
- Answer: ✅ C
3. Which backup type copies changes since the last full backup?
- A. Full
- B. Incremental
- C. Differential
- D. Mirror
- Answer: ✅ C
4. Which backup type usually restores the fastest?
- A. Incremental
- B. Differential
- C. Full
- D. Archive
- Answer: ✅ C
5. What does RAID 5 tolerate?
- A. No disk failures
- B. One disk failure
- C. Two disk failures
- D. Three disk failures
- Answer: ✅ B
6. Which statement best describes RAID?
- A. It is the same as backup
- B. It improves storage availability but does not replace backups
- C. It protects only against malware
- D. It is used only in cloud systems
- Answer: ✅ B
7. What is the 3-2-1 rule?
- A. Three users, two sites, one server
- B. Three copies, two media types, one offsite copy
- C. Three backups, two admins, one password
- D. Three networks, two firewalls, one router
- Answer: ✅ B
8. Which backup type makes restore slower because all incrementals must be applied?
- A. Full
- B. Incremental
- C. Differential
- D. Snapshot only
- Answer: ✅ B
9. What happens to the archive bit after a full backup in many backup systems?
- A. It stays set forever
- B. It is usually cleared
- C. It becomes encrypted
- D. It is deleted with the file
- Answer: ✅ B
10. Which recovery test is the most realistic and risky?
- A. Checklist review
- B. Tabletop
- C. Parallel
- D. Full interruption
- Answer: ✅ D