Unified Recovery Management #3: Recovery Considerations
Welcome back! In chapter 2, I probably scared you senseless with the incredible complexity that storage and backup administrators face in trying to manage data across a wide array of infrastructure and application types, adapting tools and processes to react to a wide array of things that can go wrong, all to ensure that the impacts on users and business operations are minimized.
In this chapter, I’ll attempt to put a little structure around how to cost-effectively address this daunting challenge. It’s all about policies that balance the needs of the business against the resources you have – money, people, infrastructure (or more simply, money!).
If you try to take a ‘one-size-fits-all’ approach to data protection and recovery management, you are either going to spend way too much money (putting the solvency of your organization at risk), or you are not going to meet the needs of the most critical business applications (putting competitiveness and long-term viability at risk).
So the answer is to apply the right technologies and policies to each application need. And yes, this will add another layer of complexity to the environment, but there isn’t much choice.
This diagram lists just some of the things you should consider when creating a recovery plan for each type of data, in each location, for each of the things that can reasonably go wrong.
The first one is the Recovery Point Objective (RPO). This measures how much data you’re willing to risk, expressed as the time between backup operations. If you’re backing up a system once each night, you have an RPO of 24 hours, and everything created or changed since the last backup (up to 24 hours of work) is at risk. That’s obviously not good enough for many applications in many industries, but it is good enough for others.
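To make the RPO arithmetic concrete, here is a minimal Python sketch. The function names and the timestamps are my own illustrations, not part of any backup product, and it assumes backups run on a fixed interval and complete instantly.

```python
from datetime import datetime, timedelta

def data_at_risk(last_backup: datetime, failure_time: datetime) -> timedelta:
    """Window of work that would be lost if a failure hit at failure_time."""
    return failure_time - last_backup

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, a failure lands just before the next backup, so the
    exposure equals the full interval between backups."""
    return backup_interval <= rpo

# Hypothetical example: a nightly backup at 02:00 and a failure at 17:30.
nightly = timedelta(hours=24)
lost = data_at_risk(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 17, 30))
print(lost)                                   # 15:30:00
print(meets_rpo(nightly, timedelta(hours=24)))  # True
print(meets_rpo(nightly, timedelta(hours=1)))   # False
```

A nightly schedule satisfies a 24-hour RPO but falls well short of a one-hour RPO, which is why the more critical applications need more frequent protection.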
The second consideration is the Recovery Time Objective (RTO). This is the target for how long it takes to recover from an event. Depending on the type and location of the event, RTO can include the time to determine what happened, deploy any needed hardware and other infrastructure, copy the needed data from the backup repository, recreate any lost data if possible (see RPO above), and reconnect your users and other systems. The longer the RTO, the longer the affected systems may be down, so planning for a short RTO for the more critical applications is appropriate.
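The phases that add up to a recovery time can be sketched as a simple sum. This is a rough illustration only: the phase names follow the list above, but the durations and the 8-hour target are made-up numbers, and it assumes the phases run one after another rather than in parallel.

```python
from datetime import timedelta
from typing import Mapping

def estimated_recovery_time(phases: Mapping[str, timedelta]) -> timedelta:
    """Total recovery time, assuming the phases run strictly in sequence."""
    return sum(phases.values(), timedelta())

# Hypothetical durations for one application and one failure scenario.
phases = {
    "determine what happened": timedelta(hours=1),
    "deploy hardware and infrastructure": timedelta(hours=4),
    "copy data from the backup repository": timedelta(hours=6),
    "recreate lost data": timedelta(hours=2),
    "reconnect users and systems": timedelta(hours=1),
}

total = estimated_recovery_time(phases)
rto_target = timedelta(hours=8)  # assumed business requirement
print(total)                # 14:00:00
print(total <= rto_target)  # False
```

Breaking the estimate down this way shows where the time goes. In this made-up case, the data copy and the hardware deployment are the phases to attack first.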
Next, you’ll need to consider the total cost of the solution: acquisition costs, plus labor, bandwidth, ongoing services, etc. The key to a successful recovery plan is to balance these costs against the needs of the business, ensuring that you deliver the appropriate levels of RPO and RTO at the lowest possible cost.
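One way to picture that balancing act is as a selection problem: among the options that meet an application's RPO and RTO, pick the cheapest. The sketch below is only an illustration. The option names, service levels, and annual costs are invented, and a real comparison would weigh many more factors.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class RecoveryOption:
    name: str
    rpo_hours: float    # worst-case data loss the option delivers
    rto_hours: float    # expected time to recover with the option
    annual_cost: float  # acquisition + labor + bandwidth + services

def cheapest_option(options: Iterable[RecoveryOption],
                    required_rpo: float,
                    required_rto: float) -> Optional[RecoveryOption]:
    """Return the lowest-cost option meeting both objectives, or None."""
    qualifying = [o for o in options
                  if o.rpo_hours <= required_rpo and o.rto_hours <= required_rto]
    return min(qualifying, key=lambda o: o.annual_cost, default=None)

# Hypothetical catalogue; all figures are placeholders.
catalogue = [
    RecoveryOption("nightly tape backup", 24, 48, 10_000),
    RecoveryOption("hourly disk snapshots", 1, 4, 40_000),
    RecoveryOption("continuous replication", 0, 0.25, 150_000),
]

print(cheapest_option(catalogue, required_rpo=4, required_rto=8).name)
print(cheapest_option(catalogue, required_rpo=24, required_rto=72).name)
```

Running the same selection per application, rather than once for the whole environment, is what keeps you out of the one-size-fits-all trap: the less critical systems land on the cheap tier, and only the most critical ones pay for the expensive one.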
The last consideration is probably obvious to everyone, but you’re not going to want to deploy any recovery solution that negatively impacts business operations. For example, applying an aggressive RPO (frequent backups) to a critical application isn’t going to work if the recovery solution requires that you stop and close the application to perform the backup. The cure is not allowed to kill the patient.
So, what can you do? There are lots of choices and point solutions, from many vendors, to address each of the permutations that your plan may have, and I’ll cover many of them in my next post. Then I’ll start looking at ways to tie all those technologies together to create a truly Unified Recovery Management platform.
"The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions."