Performing an Oracle database health check? We have a little list
Guidelines for the diligent DBA
Mine the logs
Check for errors in the logs. Some of these errors are critical and should be addressed immediately, while others are less significant. It is important to be able to tell the two apart and prioritize your fixes accordingly.
Critical errors also create incidents and incident dumps in the Automatic Diagnostic Repository (ADR), which is the system-wide tracing and logging repository.
Each server and background process in an Oracle database writes to its own trace file. When a process detects an error, it writes information about that error to its trace file. The name of each trace file usually includes the name of the process writing it, such as RECO, the recoverer process.
The alert log is available in XML or plain text. It lists all messages and errors, including internal errors (ORA-600), errors relating to block corruption (ORA-1578), and deadlock errors (ORA-60).
The alert log also offers several categories of error message relating to shared server functions and dispatcher processes, and will tell you about errors that happen during the automatic refresh of a materialized view.
One particular class of error message to watch out for is the kind raised by background processes. These should be addressed right away. If a message written to the LGWR trace file and the alert log says that the log writer process (LGWR) cannot write to a member of a log group, that is a sign of a media or I/O problem.
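The log-mining step above can be sketched in a few lines of Python. This is only an illustration, not how Oracle or Toad does it: the function name `scan_alert_log`, the `CRITICAL_CODES` table, and the sample lines are all assumptions made up for the example, and it assumes you are reading the plain-text version of the alert log, where codes typically appear zero-padded (ORA-00600 rather than ORA-600).

```python
import re
from collections import Counter

# The three codes named in the article; labels are our own shorthand.
CRITICAL_CODES = {
    600: "internal error",
    1578: "block corruption",
    60: "deadlock",
}

# Matches ORA-600 as well as the zero-padded ORA-00600 form.
ORA_PATTERN = re.compile(r"ORA-(\d{1,5})\b")


def scan_alert_log(lines):
    """Return (counts of every ORA- code, list of critical hits)."""
    counts = Counter()
    critical = []
    for line_no, line in enumerate(lines, start=1):
        for match in ORA_PATTERN.finditer(line):
            code = int(match.group(1))  # int() drops the zero padding
            counts[code] += 1
            if code in CRITICAL_CODES:
                critical.append((line_no, code, CRITICAL_CODES[code]))
    return counts, critical


# Hypothetical sample lines, not taken from a real alert log.
sample = [
    "Thread 1 advanced to log sequence 42",
    "ORA-00600: internal error code, arguments: [hypothetical]",
    "ORA-00060: Deadlock detected.",
    "ORA-01555: snapshot too old",
]
counts, critical = scan_alert_log(sample)
for line_no, code, label in critical:
    print(f"line {line_no}: ORA-{code:05d} ({label})")
```

In practice you would pass an open file handle instead of `sample`, and extend `CRITICAL_CODES` with whatever your site treats as urgent.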
Checking the performance of the database is a key part of any Oracle health check. Toad lets you check specific parameters, including the distribution of datafiles across I/O channels (which can help to prevent data bottlenecks). It can also monitor usage of the shared pool, which is the area of memory that Oracle uses to hold its library cache. This acts as a buffer for all SQL statements that are processed by the Oracle database, so performance will suffer if there is heavy contention for it.
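One rough way to put a number on library cache pressure is the ratio of reloads to pins. The sketch below assumes you have already fetched the PINS and RELOADS totals from the V$LIBRARYCACHE view by some other means; the function names and the 0.99 threshold are illustrative assumptions, not an Oracle or Toad recommendation.

```python
def library_cache_hit_ratio(pins, reloads):
    """Fraction of pins that did not need a reload, or None if no pins."""
    if pins <= 0:
        return None
    return 1.0 - (reloads / pins)


def shared_pool_looks_stressed(pins, reloads, threshold=0.99):
    """True when the hit ratio falls below the (hypothetical) threshold."""
    ratio = library_cache_hit_ratio(pins, reloads)
    return ratio is not None and ratio < threshold


# Hypothetical figures for illustration only.
print(shared_pool_looks_stressed(pins=10_000, reloads=50))
print(shared_pool_looks_stressed(pins=10_000, reloads=500))
```

A falling ratio over successive checks is usually more telling than any single reading.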
Toad also lets you compare your current health check against checks made before, to see what has changed. So if critical performance parameters (or others) are changing, you can see that clearly.
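If you keep your own health check numbers outside Toad, the same before-and-after comparison can be approximated with a small diff. This is a minimal sketch under stated assumptions: the metric names, the sample values, and the 10 per cent tolerance are all hypothetical, and it says nothing about how Toad's own comparison works.

```python
def diff_snapshots(previous, current, tolerance=0.10):
    """Return {metric: (old, new)} for metrics that appeared, vanished,
    or moved by more than `tolerance` relative to the old value."""
    changes = {}
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name)
        new = current.get(name)
        if old is None or new is None:
            changes[name] = (old, new)  # metric added or removed
        elif old == 0:
            if new != 0:
                changes[name] = (old, new)  # avoid dividing by zero
        elif abs(new - old) / abs(old) > tolerance:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots from two health check runs.
previous = {"shared_pool_free_mb": 200, "sessions": 50, "log_switches_per_hour": 4}
current = {"shared_pool_free_mb": 120, "sessions": 52,
           "log_switches_per_hour": 4, "invalid_objects": 3}

changes = diff_snapshots(previous, current)
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```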
Solicit feedback. Listen to users of the database, and take notes. The best health check for the database is to find out how it's affecting people in real life. What are they complaining about? What errors are they getting and what frustrations do they have? Define steps to address these complaints, if you haven't already.
Automation is the foundation for any good DBA, especially in areas such as health checks, which should be repeated on a regular basis so that you can spot any emerging problems.
You can also automate Toad's entire health check by using the small camera icon at the bottom left of the health check window to save its settings. The check will then be created as an app of its own inside Toad Automation Designer, enabling you to run it regularly and quickly. You can have the results sent to your email, if you like, so that you can be alerted to any potential database difficulties over your morning bagel.
No one step will be enough to keep your database in tip-top shape. Instead, any health check should encompass all of these steps, providing good visibility across all facets of its operation. Following these guidelines will help the diligent DBA to spot any bottlenecks for their applications as they appear, and help to ensure stability going forward. Doing it on a regular basis will help prevent your datafile from becoming a datafail. ®