Database down! DBA ninjas to the rescue
Handy 101 guide for Oracle administrators
Workshop Database administrators (DBAs) may not be given much attention ninety-nine per cent of the time. But when the database fails for some reason, they become ninjas, (hopefully) restoring the data, recovering the firm's ability to do business, and generally saving the day.
This all assumes that you've backed up your database properly, of course, and that you've validated the backups and understand the recovery process. This article explains to DBAs the steps involved in recovering from unexpected events quickly and smoothly.
First things first
Ideally, you'll want to avoid your database going down in the first place. Oracle divides causes of downtime into two broad sets: planned and unplanned. Planned downtime shouldn't be a problem, but unplanned is. Oracle further divides unplanned downtime into two areas: data failures, and computer failures. DBAs will be most interested in data failures – the four main categories are:
- Storage error
- Human error
- Site failure
DBAs have some control over storage error and human error, at least. Using Automatic Storage Management, and techniques such as good capacity planning, can help you to avoid exceeding storage limits and crashing your database. Protecting against human error is a combination of automating key tasks, and putting the proper privileges and access controls in place to stop unqualified people accidentally DROPing your customer tables.
While you can protect against some of these things, DBAs still haven't worked out how to control fire, flooding and other acts of god. We'll sure there'll be a third party plugin for those eventually, but for now, you're going to need a robust backup and recovery policy.
There are three things to back up in an Oracle database:
- The server parameters file (SPFILE), which is a binary containing information about the server running the database, in key-value pairs.
- The Control file. This is a binary file containing database structure information, including the database name, the names and locations of associated datafiles, time stamping and checkpoint info.
- The data files themselves (generally considered quite important). Since Oracle 8, Oracle has used the recovery manager (RMAN) to handle backing up these things automatically.
Kinds of backup
There are two broad kinds of backup: a cold backup, and a hot one. Cold (offline) backups are the easiest to do. This backup of your entire database is designed to produce an exact, one-off copy, which is then easily restorable.
They're great, but for one thing. They're called offline backup for a reason: you have to stop the database to run them. If you're backing up every day, and you have a large database or an ecommerce site that needs to be up 24 x 7, then you don't really want to have to shut your system down daily for this process.
That leaves the alternative: a hot backup. This is trickier to do, because it backs up the database while it is running, but it has the advantage that you don't have to disrupt service. It does impact performance, though, so try to schedule it for periods of low activity.
Hot backups are good for recovering databases on the fly, rather than complete restoration from scratch. They are complex, so best automated via scripts. However, remember to update those scripts for new tablespaces when your schema changes.
Protecting yourself with incremental backups
Incremental backups are a good way to keep you protected while minimising the performance impact on the database. They work by backing up only the changes made to the database since the last backup, and they form part of a backup strategy suggested by Oracle. It suggests daily backups for this.
There are two types of incremental backup; level 0, and level 1. Level 0 is a full backup, but it's designed to be added to incrementally. Level 1 backups are simply the deltas from the previous incremental backup.
Start by creating an incremental level 0 datafile copy backup. That gives you the entire database's contents in a separate set of files.
Then, the next day, you'll create a level 1 incremental backup, containing the blocks changed since the first backup. That gives you the ability to recover your database to the point when that backup was made. Now, you'll have two sets of files: the level 0 backup, and the level 1 backup.
From now on, each day, you apply yesterday's incremental backup to the level 0 backup. That will create a level 0 backup file that you can use to restore your database to yesterday's state. Then, you create a new incremental level 1 backup, containing the changes made to the database from yesterday.
That way, if your database goes down, you can apply the latest level 1 backup to the level 0 backup, restoring it to the start of the current day.
Salvation in the REDO logs
That's database restoration covered. That's all well and good, but a restored database isn't necessarily up to date. What about if you business can't afford to lose the data stored since the last incremental backup, and needs you to recover your data to some point after the backup that happened at the beginning of today? Luckily, REDO logs are there to help.
REDO logs store changes that are made to the database. When backing up a database, you can run in ARCHIVELOG mode to back these up, too. When you're performing a hot backup, you are forced to do this.
REDO logs cover the gap between the last incremental backup, and the current point in time. Once you've restored your database, you can use these files to roll forward all changes made to the database. Then, you'll roll back those changes that were requested, but which hadn't been committed to the database when it went down. This recovers the data lost after your restoration. It gets the database to the exact state that it was in when it went down.
Protecting these REDO logs is important. ARCHIVELOG will archive those that have been filled with changes, but multiple copies of online ones should be stored on separate disk spindles.
Make this recovery process faster by using the Fast Recovery Area (FRA). This is a single disk location where Oracle will store and automatically manage all recovery-related files (including RMAN backups).
There are user-managed options for backing up Oracle data, but RMAN is really your go-to tool here. There are third party tools that can help you to work with this utility, though.
Quest Backup Reporter for Oracle, a component of the Toad tool for Oracle DBAs, increases the visibility of your Oracle backup jobs. It's a read-only reporting tool that allows you to see all of your executed RMAN backups in one screen, along with their execution status. You can look at a backup schedule, and also check your RMAN policy definition, seeing if there are any violations of your backup policies.
Finally, don't forget to validate your backups using BACKUP VALIDATE, which will check to ensure that all of the necessarily files needed during a backup have been copied, and are intact. And perform regular test restorations to separate non-production hardware, so that you can sure the database would actually restore in the event of a disaster.
You can script that test, too - and with all of this in place, you'll be sleeping a lot more soundly at night. ®