Capacity planning: How to plan ahead and keep your Oracle database healthy
Capacity planning: A DBA primer
Workshop Crystal balls aren't really good for IT administration. Database administrators (DBAs) like certainty and clarity, and that extends to how much strain their databases will be under in the future. Capacity planning is a way to figure that out.
The concept behind capacity planning is simple, but the mechanics are harder. Capacity planning acknowledges that the business requirements on the system may increase, and forecasts how much resource must be added to the database system to ensure that the user experience continues uninterrupted.
Typically, the resources you'll add may be CPU power, memory, storage, or network capacity, or more likely a combination of these, depending on how demands are predicted to grow.
This makes capacity planning different to performance tuning, because the latter is a reactive process. You tweak the system's parameters to ensure that it does not dip below pre-set levels based on current performance requirements. You plan capacity to ensure that it does not fall below performance levels in the future.
There is no silver bullet for capacity planning. There are several tools and steps available, but to be done well, this requires a little forethought and hard work.
One tool that you can use to help plan your Oracle capacity is Quest’s Space Manager with LiveReorg. This lets you rejig your files to reclaim space on your physical media, while keeping the database up and running. It displays current space use in visual form, while also using database metrics to analyse future growth in databases that share storage resources. This extends to using 'what if' analysis, so that you can tell what happens if the order rate rises by 10 per cent, say.
What's your goal?
The easiest part will be to define your goal. A conversation with business managers will hopefully help, here. Asking them what they expect key metrics in their business to do over the next few months will help you to craft an agreement with them, so that you can plan things more effectively at your end.
Talk to the business about its metrics. It may be most interested in the number of concurrent ecommerce customer sessions it expects to see on the web site. Perhaps the number of orders per month is expected to grow by 15 per cent, or the number of customer records accessed by the call centre will grow by a specific amount during a new product upgrade.
What's your configuration?
You have to understand your database inside out before you can plan capacity, especially if you're trying to predict multiple resources. But not just the database; collecting operations statistics on your entire system will help you to determine how it is performing, and plan for the next step.
Do it when there aren't any performance crunches or unexpected spikes in demand, so that you get a reliable picture of what the system normally does. You should collect statistics on the following:
- Oracle Database This will naturally be the most important piece of the puzzle for a DBA. The Automatic Workload Repository (AWR) can be a good way to find some of the data on Oracle database performance, as it takes regular snapshots of key performance metrics. You can look at these metrics in a web-based viewer.
- The application Profiling the application that the database serves will help you to see if any bottlenecks exist there. The profiling tool that you use here will depend on the application framework that you're using. .NET and Java each have a variety of solutions available, for example.
- The operating system Check to see how your operating system is performing when the database is running at normal levels. Again, the tools for this will depend on the specific platform (Sysinternals can help with Windows).
- The underlying infrastructure CPU usage, network bandwidth, and storage usage, are all relevant parts of the puzzle. Collecting this data during your snapshot is an important element in the process.
Gathering this data regularly and storing it will help you to track the increased use of data resources over time. The more historical data sets that you have here, the more accurate you'll be able to get in your predictions. You can store this data either in the AWR itself for smaller systems, or for larger, enterprise systems, in a dedicated Capacity Management Information System (CMIS).
If storing your statistical data in the AWR, you can use the AWR baseline to compare multiple snapshots of data over time. Ideally, you'll want enough of these snapshots to highlight times of peak load and normal operation. You can use then use Enterprise Manager to chart that historical data.
Get your statistics textbook and spreadsheet out
You need to find the average of these data points over time. One way to do that is to find the mean average (the sum of all the values, divided by the number of values). This will take units of peak load and poor performance into account. If you'd rather ignore that extreme data, you can use a median, in which you order the data (from lowest to highest, say) and take the median of the middle two values. This will remove the data that doesn't represent the norm.
If you're going after specific goals or questions (e.g.: "how many more orders a day can I process with the current CPU resource?") then statistical analysis can be useful. Details are beyond the scope of this document, but you can use functions such as correlations to find out how much an increase in one metric relates to the increase in another.
You might find that CPU usage is likely to be affected by the number of updates in a particular table column, for example. This could then help you to calculate the rough number of updates to that particular column permitted by a certain number of CPUs, at a specific CPU usage rate.
Excel spreadsheets are a common way to calculate these results, and there is a variety of pre-baked Oracle capacity planning spreadsheets available for sale online.
Set your thresholds
Your thresholds are levels of system usage beyond which you don't want to move. Thresholds can be set on all kinds of metrics in Oracle, such as cumulative user logins, for example.
These thresholds will help you to monitor when your resource capacity is starting to fall behind. As the number of critical alerts increases, it'll be a good sign that your capacity planning model needs to be revised, and new resource potentially added.
Capacity planning isn't the easiest of tasks for Oracle DBAs, and statistical modelling is never a completely sure thing. The further out you try to plan, the less accurate your details may become, which is why regular revision of your statistical data and recalculation of capacity baselines is a good idea.
This technical know-how, complemented by regular communication with business users to see what new demands are coming down the pipe, will help to keep your database ready for the challenges that a healthy business will inevitably throw at it.
Most importantly, it will keep you in control of your systems. This is a far better place to be than the alternative. No one wants to be thrown into reactive mode as they scurry to fix performance problems that were months in the making - if only they had the foresight to spot them. ®