This article is more than 1 year old

Database down! DBA ninjas to the rescue

Handy 101 guide for Oracle administrators

Workshop Database administrators (DBAs) may not be given much attention ninety-nine per cent of the time. But when the database fails for some reason, they become ninjas, (hopefully) restoring the data, recovering the firm's ability to do business, and generally saving the day.

This all assumes that you've backed up your database properly, of course, and that you've validated the backups and understand the recovery process. This article explains to DBAs the steps involved in recovering from unexpected events quickly and smoothly.

First things first

Ideally, you'll want to avoid your database going down in the first place. Oracle divides causes of downtime into two broad sets: planned and unplanned. Planned downtime shouldn't be a problem, but unplanned is. Oracle further divides unplanned downtime into two areas: data failures, and computer failures. DBAs will be most interested in data failures – the four main categories are:

  • Storage error
  • Human error
  • Corruption
  • Site failure

DBAs have some control over storage error and human error, at least. Using Automatic Storage Management, along with good capacity planning, can help you avoid exceeding storage limits and crashing your database. Protecting against human error is a combination of automating key tasks and putting the proper privileges and access controls in place to stop unqualified people accidentally DROPping your customer tables.
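As a rough illustration of the capacity-planning point, here is a minimal Python sketch that projects how long a volume has left before it fills up. It assumes you already record usage figures somewhere; the function name `days_until_full` and the sample numbers are hypothetical, and a straight line between two samples is a deliberately crude model.

```python
from datetime import date


def days_until_full(samples, capacity_gb):
    """Estimate the days left before usage reaches capacity.

    samples: list of (date, used_gb) tuples, oldest first.
    Draws a straight line between the first and last sample.
    Returns None when usage is flat or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("need samples spanning at least one day")
    growth_per_day = (last_used - first_used) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (capacity_gb - last_used) / growth_per_day


# Hypothetical figures: 30 GB of growth over 30 days on a 500 GB volume.
samples = [(date(2024, 1, 1), 400.0), (date(2024, 1, 31), 430.0)]
print(days_until_full(samples, capacity_gb=500.0))  # 70.0
```

A real deployment would feed this from monitoring data and would probably want something better than a two-point trend, but even this much gives you a date to put in the diary.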

While you can protect against some of these things, DBAs still haven't worked out how to control fire, flooding and other acts of God. We're sure there'll be a third-party plugin for those eventually, but for now, you're going to need a robust backup and recovery policy.

There are three things to back up in an Oracle database:

  • The server parameter file (SPFILE), which is a binary file containing the initialization parameters for the database instance, stored as key-value pairs.
  • The control file. This is a binary file containing database structure information, including the database name, the names and locations of associated data files, timestamps and checkpoint info.
  • The data files themselves (generally considered quite important).

Since Oracle 8, Oracle has provided Recovery Manager (RMAN) to handle backing up these things automatically.
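To make the three items concrete, the following Python sketch assembles an RMAN command file that names each of them explicitly. The helper name `rman_full_backup_script` is hypothetical, and the sketch assumes default RMAN channel and destination settings; treat it as a starting point rather than a finished backup policy.

```python
def rman_full_backup_script():
    """Return RMAN commands covering the SPFILE, control file and data files."""
    commands = [
        "BACKUP SPFILE;",               # server parameter file
        "BACKUP CURRENT CONTROLFILE;",  # control file
        "BACKUP DATABASE;",             # all data files
    ]
    body = "\n".join("  " + command for command in commands)
    return "RUN {\n" + body + "\n}\n"


script = rman_full_backup_script()
print(script)
```

The output could be saved to a file and passed to the RMAN client. Spelling out all three commands is mostly for illustration, since it keeps the script aligned with the list above.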

Kinds of backup

There are two broad kinds of backup: a cold backup, and a hot one. Cold (offline) backups are the easiest to do. This backup of your entire database is designed to produce an exact, one-off copy, which is then easily restorable.

They're great, but for one thing. They're called offline backups for a reason: you have to stop the database to run them. If you're backing up every day, and you have a large database or an ecommerce site that needs to be up 24x7, then you don't really want to have to shut your system down daily for this process.
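For a sense of what that stop-and-copy cycle looks like when RMAN does the work, here is a small Python sketch that lists the commands in order. The function name is hypothetical, and the sequence assumes a single-instance database that can be shut down cleanly and mounted for the backup.

```python
def rman_cold_backup_commands():
    """Return the RMAN commands for an offline backup, in execution order."""
    return [
        "SHUTDOWN IMMEDIATE;",   # stop the database cleanly
        "STARTUP MOUNT;",        # mount without opening it to users
        "BACKUP DATABASE;",      # take the consistent copy
        "ALTER DATABASE OPEN;",  # resume service
    ]


print("\n".join(rman_cold_backup_commands()))
```

Everything between the first and last command is downtime, which is exactly why this approach loses its appeal as the database grows.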

That leaves the alternative: a hot backup. This is trickier to do, because it backs up the database while it is running, but it has the advantage that you don't have to disrupt service. It does impact performance, though, so try to schedule it for periods of low activity.

Hot backups are good for recovering databases on the fly, rather than complete restoration from scratch. They are complex, so they are best automated via scripts. However, remember to update those scripts to cover new tablespaces whenever your database layout changes.
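One way to avoid stale scripts is to generate them from the current list of tablespaces rather than hard-coding names. The Python sketch below does that for a user-managed hot backup. It assumes the database runs in ARCHIVELOG mode, the function name `hot_backup_sql` and the sample tablespace names are hypothetical, and the actual file copy is left as a placeholder comment.

```python
def hot_backup_sql(tablespaces):
    """Build the SQL for a user-managed hot backup of the given tablespaces."""
    lines = []
    for name in tablespaces:
        lines.append(f"ALTER TABLESPACE {name} BEGIN BACKUP;")
        lines.append(f"-- copy the data files of {name} with an OS tool here")
        lines.append(f"ALTER TABLESPACE {name} END BACKUP;")
    # Archive the current redo log so the copies can be made consistent later.
    lines.append("ALTER SYSTEM ARCHIVE LOG CURRENT;")
    return "\n".join(lines)


# Hypothetical list. In practice it could come from a query such as:
#   SELECT tablespace_name FROM dba_tablespaces WHERE contents <> 'TEMPORARY';
tablespaces = ["SYSTEM", "SYSAUX", "USERS"]
print(hot_backup_sql(tablespaces))
```

Because the list is an input, a newly added tablespace is picked up on the next run without anyone having to remember to edit the script.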
