Everybody is talking about backup strategies and backup concepts, but barely anyone explains recovery strategies and concepts. Let’s be honest: the backup is important, but a working restore is essential, because no data = no business.
Best Practices for your Backup Strategy – The Recovery
That’s why today I’d like to talk about recovery requirements and what to consider before setting up your backup and restore solution. To get an idea of the tasks that have to be handled after a disaster has happened, we will put the cart before the horse and start with the call from an annoyed colleague:
“Hey, my computer doesn’t work anymore. It won’t let me save my yearly report on the file server.”
In those moments, quoting one of our favorite TV shows, ‘The IT Crowd’, with “Have you tried turning it off and on again?” probably doesn’t help. A few seconds later the next colleague is calling, and the next, and the next… Now you are starting to realize there is something going on. First advice from us: Don’t panic! Imagine yourself running around in panic, pulling your hair out and screaming “I should have listened to my mom and become a politician.” Probably funny to look at, but that doesn’t solve the problem. And seriously, a politician?!
Second: try to locate the problem. Check all possible hardware, software, and setup options:
- Are all power cords plugged in? (That happens more often than you think…)
- Is it the file server?
- One of the switches or routers?
- A cable?
- Did someone change some settings in the last couple of minutes?
Now you have to figure out how long it will take until you have the service up and running again. To judge whether that is acceptable, you also need to know how important the failed service is. Both in combination will result in your plan (not the plan for world domination, the other one!!!). For example, if the file server is just a file sharing service, let the colleagues know that they have to save their data locally till the server is up and running again. But if that server also hosts an order system or the support ticket system, you have to hurry a little more.
Locate your latest backup. Again, depending on the type of service, an older backup may or may not be sufficient. The latest backup of the order system should be just a few minutes old. A file sharing server can survive a longer setback. And what type of backup strategy are you running? Do you usually do incremental or differential backups? This influences the restore time as well.
To call a spade a spade, here are the objectives you have to have an idea of:
RTO (Recovery Time Objective) –> How long do you need, until everything is in order again?
Calculate the time by which all systems have to be online again without jeopardizing the business. If your server runs the order system as well, you want to have the server repaired in just a few minutes. If it is only a file sharing platform, you might be able to go a couple of days.
RPO (Recovery Point Objective) –> How much data can be lost without risking the company’s future?
How old is the backup allowed to be? Does the data change every minute, so that losing more than one hour would mean an irreplaceable data loss? Or is the file server just the second location for your colleagues’ data, so that they could easily copy everything to that server again?
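To make the two objectives a bit more tangible, here is a minimal Python sketch that checks a backup age against an RPO and an estimated restore time against an RTO. The `Service` class, the function names, and all the numbers (15 minutes, 30 minutes, one day, two days) are assumptions chosen for illustration; your own values come out of the prioritization for your environment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Service:
    name: str
    rpo: timedelta  # maximum tolerable age of the newest backup
    rto: timedelta  # maximum tolerable downtime


def meets_rpo(service: Service, last_backup: datetime, now: datetime) -> bool:
    """True if the newest backup is recent enough for the service's RPO."""
    return now - last_backup <= service.rpo


def meets_rto(service: Service, estimated_restore: timedelta) -> bool:
    """True if the estimated restore time fits within the service's RTO."""
    return estimated_restore <= service.rto


# Hypothetical example values, not recommendations.
now = datetime(2024, 1, 1, 12, 0)
last_backup = datetime(2024, 1, 1, 9, 0)  # the newest backup is three hours old

order_system = Service("order system", rpo=timedelta(minutes=15), rto=timedelta(minutes=30))
file_share = Service("file share", rpo=timedelta(days=1), rto=timedelta(days=2))

for service in (order_system, file_share):
    print(service.name, "RPO met:", meets_rpo(service, last_backup, now))
```

With these assumed numbers, the three-hour-old backup is fine for the file share but far too old for the order system, which is exactly the kind of difference the two objectives are meant to expose.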
For both objectives you need to decide how much of your budget you want to spend. But keep in mind, being ‘under-’ or ‘over-insured’ can have disastrous consequences.
Being ‘over-insured’ probably doesn’t harm your data, but your IT budget instead. Do you really need the high-end data protection solution for your one server? Or is the ‘SMB+’ solution more than sufficient?
Being ‘under-insured’, on the other hand, can (worst case) end in complete data loss and, with it, in losing your job (in case the future of the company you are working at is not of interest to you).
Rome wasn’t built in one day, but…
Take your time to prioritize the RTOs and RPOs for every machine and type of data in your environment. In the end you can even decide to use two different backup solutions: one very enterprise-like software for the two servers that can’t be offline for a minute, and a similarly fast solution with a smaller feature set, and therefore more cost-effective, for all the other servers.
After you have set up your RPOs and RTOs and decided on a backup software, you need a backup strategy. Keep in mind that incremental backups (every changed file since the last full or incremental backup) are faster to create than differential backups (every changed file since the last full backup, regardless of the number of differentials since), but they typically need more time for a restore of all the data. A restore from a differential backup just needs the last full and the latest differential backup, whereas a restore from an incremental backup needs the last full backup and all incremental backups since then, in order. As a result, the disk or tape device has to locate and read every single incremental backup, which takes longer.
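The difference in restore effort can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Backup` class and `restore_chain` function are hypothetical names, and the sketch assumes a schedule uses either incrementals or differentials after a full backup, not a mix of both.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    day: int
    kind: str  # "full", "incremental" or "differential"


def restore_chain(backups: list[Backup]) -> list[Backup]:
    """Return the backups needed for a full restore, in restore order."""
    ordered = sorted(backups, key=lambda b: b.day)
    full_positions = [i for i, b in enumerate(ordered) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available")

    start = full_positions[-1]  # only the most recent full backup matters
    full, later = ordered[start], ordered[start + 1:]
    incrementals = [b for b in later if b.kind == "incremental"]
    differentials = [b for b in later if b.kind == "differential"]
    if incrementals and differentials:
        raise ValueError("mixed schedules are not covered by this sketch")

    if differentials:
        # Last full backup plus only the newest differential.
        return [full, differentials[-1]]
    # Last full backup plus every incremental since then, in order.
    return [full] + incrementals


# Hypothetical week: full backup on day 0, then one backup per day.
incremental_week = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 6)]
differential_week = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 6)]

print(len(restore_chain(incremental_week)))   # number of backups to read
print(len(restore_chain(differential_week)))  # number of backups to read
```

For this assumed week, the incremental schedule needs all six backups for a restore, while the differential schedule needs only two.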
… it was worth every minute waited
Do you remember your parents and teachers saying you have to practice a presentation more than once before you give it? The same applies to restores. Practice them at regular intervals; it helps you stay calm during a disaster. The more you have practiced the worst-case scenario, the faster you will have everything up and running again when it eventually happens. Also, having a disaster recovery plan in written or printed form gives you something to look the strategy up in, in case you have a mental blackout.