Performing an Oracle database health check? We have a little list
Guidelines for the diligent DBA
Workshop Everyone needs a checkup from time to time, and your Oracle database is no exception. A periodic medical can keep it running smoothly, and prevent more serious conditions from developing later.
Here is a guide to help ensure optimal performance, with a series of checkpoints that can form the basis for a regular database review.
You can use the Health Monitor available in release 11g to assess several aspects of your database's health. These include various data integrity checks for file structure, REDO logs and undo segments.
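As a rough illustration, Health Monitor checks can also be started from a script through the DBMS_HM.RUN_CHECK procedure. The sketch below assumes a DB-API style cursor that offers callproc (as python-oracledb cursors do); the function name, the run-name scheme and the choice of checks are hypothetical, and the two checks listed are assumed here to need no mandatory input parameters.

```python
from datetime import datetime

# Health Monitor check names; both are assumed to run without input parameters.
CHECKS = ("DB Structure Integrity Check", "Dictionary Integrity Check")


def run_health_checks(cursor, checks=CHECKS, stamp=None):
    """Start each check via DBMS_HM.RUN_CHECK and return the run names used.

    `cursor` is any object with a DB-API style callproc(name, parameters).
    Run names must be unique, so a timestamp is embedded in each one.
    """
    stamp = stamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    run_names = []
    for index, check_name in enumerate(checks, start=1):
        run_name = f"hc_{stamp}_{index}"  # hypothetical naming scheme
        cursor.callproc("DBMS_HM.RUN_CHECK", [check_name, run_name])
        run_names.append(run_name)
    return run_names
```

With a live connection this might be called as run_health_checks(connection.cursor()); the returned run names can then be looked up in the V$HM_RUN view.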
You can also use Toad for Oracle, one of the most popular third-party administration tools for the Oracle database. Toad users can launch database health checks either from the database browser, or from the diagnose menu.
One of the areas that Toad for Oracle focuses on is security. DBAs can find the vulnerability assessment under its own category. It includes a variety of vulnerability checks, alongside an 'explainer' column, providing a description of each check as a friendly reminder to the DBA.
The assessment checks in Toad are many and varied, but they include listing hidden users and nested roles, and listing GRANTs on SYS tables made directly to users.
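To make the "nested roles" idea concrete, here is a minimal sketch that is not Toad's implementation. It assumes you have already fetched (grantee, granted_role) pairs, for example from the DBA_ROLE_PRIVS view, and the list of role names, for example from DBA_ROLES; the function name and the sample data are hypothetical.

```python
def find_nested_roles(role_grants, role_names):
    """Return (role, granted_role) pairs where a role is granted to another role.

    role_grants: iterable of (grantee, granted_role) tuples.
    role_names:  iterable of all role names, used to tell roles from users.
    """
    roles = set(role_names)
    return sorted(
        (grantee, granted)
        for grantee, granted in role_grants
        if grantee in roles  # the grantee is itself a role, so this is nesting
    )


# Hypothetical sample data: one user grant and two role-to-role grants.
grants = [("SCOTT", "APP_READ"), ("APP_ADMIN", "APP_READ"), ("DBA", "APP_ADMIN")]
roles = ["APP_READ", "APP_ADMIN", "DBA"]
nested = find_nested_roles(grants, roles)
```

Grants made to ordinary users are filtered out, so only the role-to-role relationships remain for review.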
DBAs can check many aspects of their Oracle database's health, and any health check should include a thorough security audit to ensure that there is no light leakage. A review should include the following:
- Patch management procedures.
- Backup recovery (test it).
- User password management and access controls.
- User roles and responsibilities.
- Configuration parameters.
- How your audit trail is configured.
- Privileges (how are system and object privileges assigned? Who has them?)
- Access to Oracle packages and privileges, including operating system access.
- Which PL/SQL and Java is used, and how.
Monitor and optimize your use of database space. There are two ways to do this: proactive tablespace management (PTM), and shrinking segments. Under PTM, Oracle warns database administrators when available space is running low. It raises alerts at two thresholds: warning, and critical.
You can conduct this health check incrementally as space frees up in the database server, which helps you make space available to users when needed. Check each tablespace against its warning and critical thresholds to see how much of it is being used. By default, the settings are 85% for warning and 97% for critical.
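The threshold logic can be sketched in a few lines. This example assumes the usage figures have already been collected (the DBA_TABLESPACE_USAGE_METRICS view is one possible source); the class, the function names and the sample numbers are hypothetical, and only the 85% and 97% defaults come from the text above.

```python
from dataclasses import dataclass

WARNING_PCT = 85.0   # default warning threshold cited above
CRITICAL_PCT = 97.0  # default critical threshold cited above


@dataclass
class TablespaceUsage:
    name: str
    used_mb: float
    max_mb: float

    @property
    def used_pct(self) -> float:
        # Guard against a zero-sized tablespace to avoid division by zero.
        return 100.0 * self.used_mb / self.max_mb if self.max_mb else 0.0


def classify(ts, warning=WARNING_PCT, critical=CRITICAL_PCT):
    """Return 'CRITICAL', 'WARNING' or 'OK' for one tablespace."""
    if ts.used_pct >= critical:
        return "CRITICAL"
    if ts.used_pct >= warning:
        return "WARNING"
    return "OK"


def report(tablespaces):
    """Map each tablespace name to its alert level."""
    return {ts.name: classify(ts) for ts in tablespaces}


# Hypothetical sample figures.
sample = [
    TablespaceUsage("USERS", 500, 1000),
    TablespaceUsage("SYSAUX", 900, 1000),
    TablespaceUsage("DATA01", 980, 1000),
]
levels = report(sample)
```

Passing different warning and critical values to classify lets you mirror any non-default thresholds set on individual tablespaces.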
Segment shrinkage makes unused space available to other segments in the tablespace. You can shrink a segment using the Enterprise Manager.
Mine the logs
Check for errors in the logs. Some of these errors are critical and should be addressed immediately, while others are less significant. It is important to be able to tell the two apart and to prioritize fixes accordingly.
Critical errors also create incidents and incident dumps in the Automatic Diagnostic Repository (ADR), which is the system-wide tracing and logging repository.
Each Oracle server and background process can write to its own trace file. When a process detects an internal error, it dumps information about the error to that trace file. The name of a trace file usually includes the name of the process that wrote it, such as RECO, the recoverer process.
The alert log is available in both XML and text formats. It records messages and errors, including internal errors (ORA-600), errors relating to block corruption (ORA-1578), and deadlock errors (ORA-60).
The alert log also offers several categories of error message relating to shared server functions and dispatcher processes, and will tell you about errors that happen during the automatic refresh of a materialized view.
One particular class of error message to watch out for is those raised by background processes. These should be addressed right away. If a message is written to the LGWR trace file and the alert log, explaining that the log writer process (LGWR) cannot write to a member of a log group, then that's a sign of a media or I/O problem.
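Scanning the text version of the alert log for the error numbers mentioned above is easy to script. The following is a minimal sketch: the function name, the labels and the sample lines are hypothetical, and a real alert log would be read from a file under the ADR rather than from a list.

```python
import re
from collections import Counter

# Error numbers named in the text; the labels are this sketch's own wording.
WATCHED = {
    600: "internal error",
    1578: "block corruption",
    60: "deadlock",
}

# Matches codes with or without zero padding, e.g. ORA-600 or ORA-00600.
ORA_PATTERN = re.compile(r"\bORA-(\d+)")


def scan_alert_log(lines):
    """Count occurrences of each watched ORA- error in the given lines."""
    counts = Counter()
    for line in lines:
        for match in ORA_PATTERN.finditer(line):
            code = int(match.group(1))  # int() drops the zero padding
            if code in WATCHED:
                counts[code] += 1
    return counts


# Hypothetical sample lines.
sample_log = [
    "ORA-00600: internal error code, arguments: [...]",
    "ORA-01578: ORACLE data block corrupted",
    "ORA-00060: Deadlock detected.",
    "ORA-00600: internal error code, arguments: [...]",
    "ORA-01555: snapshot too old",
]
found = scan_alert_log(sample_log)
summary = {WATCHED[code]: n for code, n in found.items()}
```

Extending WATCHED is the obvious way to cover other errors you consider critical in your environment.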
Checking the performance of the database is a key part of any Oracle health check. Toad lets you check for specific parameters, including the distribution of datafiles across I/O channels (which can help to prevent data bottlenecks). It can also monitor usage of the shared pool, which is the area of memory that Oracle uses to hold its library cache. This acts as a buffer for all SQL statements that are processed by the Oracle database, so performance will suffer if it comes under heavy contention.
Toad also lets you compare your current health check against checks made before, to see what has changed. So if critical performance parameters (or others) are changing, you can see that clearly.
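The same before-and-after comparison can be approximated outside Toad by keeping each check's results as a simple mapping and diffing two of them. This is a sketch only, not how Toad does it; the function name and the sample values are hypothetical.

```python
def diff_snapshots(previous, current):
    """Return {name: (old_value, new_value)} for every setting that differs.

    A setting present in only one snapshot shows None on the other side.
    """
    changes = {}
    for key in sorted(set(previous) | set(current)):
        old = previous.get(key)
        new = current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical values captured by two successive health checks.
last_month = {"processes": 300, "shared_pool_size": "512M", "open_cursors": 300}
this_month = {"processes": 500, "shared_pool_size": "512M", "sga_target": "4G"}
changed = diff_snapshots(last_month, this_month)
```

Unchanged settings are omitted from the result, so the output stays short enough to read at a glance.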
Solicit feedback. Listen to users of the database, and take notes. The best health check for the database is to find out how it's affecting people in real life. What are they complaining about? What errors are they getting and what frustrations do they have? Define steps to mitigate these complaints, if you haven't already.
Automation is the foundation for any good DBA, especially in areas such as health checks, which should be repeated on a regular basis so that you can spot any emerging problems.
You can also automate Toad's entire health check, by using the small camera icon at the bottom left of the health check window to save the settings in that area. It will then be created as an application of its own inside Toad Automation Designer, enabling you to run it regularly and quickly. You can have the results sent to your email, if you like, so that you can be alerted to any potential database difficulties over your morning bagel.
No one step will be enough to keep your database in tip-top shape. Instead, any health check should encompass all of these steps, providing good visibility across all facets of its operation. Following these guidelines will help the diligent DBA to spot any bottlenecks for their applications as they appear, and help to ensure stability going forward. Doing it on a regular basis will help prevent your datafile from becoming a datafail. ®