Almost 10 years ago, I gave a presentation at the local VMware User Group (VMUG) meeting in Prague, Czechia, on Metro Cluster High Availability and SRM Disaster Recovery. The slide deck is available here on Slideshare. I highly recommend reviewing the slide deck, as it clearly explains the fundamental concepts and terminology of Business Continuity and Disaster Recovery (BCDR), along with the VMware technologies used to plan, design, and implement effective BCDR solutions.
Let me briefly outline the key BCDR concepts, documents, and terms below.
- Business Continuity Basic Concepts
  - Resilience (High Availability)
  - Recovery (Disaster Recovery)
  - Mitigation (Disaster Avoidance)
- Business Continuity Essential Documents
  - BIA (Business Impact Analysis) is the essential document for any Business Continuity initiative
  - Risk Management Plan (Prevention and mitigation)
  - Contingency Plan (Response and recovery)
- Business Continuity Basic Terms
  - RPO (Recovery Point Objective) – Data level
  - RTO (Recovery Time Objective) – Infrastructure level
  - WRT (Work Recovery Time) – Application level
  - MTD (Maximum Tolerable Downtime) = RTO + WRT – Business level
RPO (Recovery Point Objective) and RTO (Recovery Time Objective) are the best-known terms in the Business Continuity and Disaster Recovery world, and I hope all IT professionals know at least these two. However, repetition is the mother of wisdom, so let's repeat what RPO and RTO are. A picture is worth 1,000 words, so look at the picture below.
Figure: RPO and RTO
RPO - The maximum acceptable amount of data loss measured in time. In other words, "How far back in time can we afford to go in our backups?"
RTO - The maximum acceptable time to restore systems and services (aka infrastructure) after a disaster.
WRT - The time needed after systems are restored (post-RTO) to make applications fully operational (e.g., data validation, restarting services). It is a component of MTD and follows RTO.
MTD - The total time the business can be unavailable after a disaster before it suffers irrecoverable damage or significant impact. In other words, "How long can the company be down before it must be back up and running?"
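To make the relationship between these terms concrete, here is a minimal sketch in Python. The function names and all the figures (a 4-hour RTO, a 2-hour WRT, the backup and disaster timestamps) are hypothetical and chosen only for illustration; real values come out of your BIA.

```python
from datetime import datetime, timedelta


def max_tolerable_downtime(rto: timedelta, wrt: timedelta) -> timedelta:
    """MTD = RTO + WRT, as defined in the terms above."""
    return rto + wrt


def meets_rpo(last_recovery_point: datetime,
              disaster_time: datetime,
              rpo: timedelta) -> bool:
    """True if the data lost, measured in time, stays within the RPO."""
    data_loss = disaster_time - last_recovery_point
    return data_loss <= rpo


# Hypothetical figures, for illustration only.
rto = timedelta(hours=4)   # time to restore infrastructure
wrt = timedelta(hours=2)   # time to make applications fully operational
mtd = max_tolerable_downtime(rto, wrt)
print(mtd)  # 6:00:00

last_backup = datetime(2024, 1, 1, 2, 0)   # last usable recovery point
disaster = datetime(2024, 1, 1, 9, 30)     # 7.5 hours of data at risk
print(meets_rpo(last_backup, disaster, timedelta(hours=8)))  # True
print(meets_rpo(last_backup, disaster, timedelta(hours=4)))  # False
```

In this example a nightly backup is good enough for an 8-hour RPO but not for a 4-hour one, which would call for more frequent backups or replication.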
Easy, right? Not really. Disaster Recovery projects are, in my experience, the most complex projects in IT infrastructure.
Spot on, David! This is one of the basic lessons to learn if you want your IT to serve the business and not just come up with requests for fancy features based on marketing bias.