Measuring the datacentre as a service
Yardstick, abacus, measuring jug
The precise nature of a company's datacentre varies from place to place.
In some cases it will simply be a server room in the company's own premises; in others it may be multiple dedicated buildings many miles apart with robust, resilient interlinks. Whichever it is, the metrics for measuring the service are the same.
The first considerations are business needs, and these are the hardest to agree on because the starting point will be a demand for absolute resilience at low cost. This means you'll have to sit all the key stakeholders down in a room and agree what can be done within the available budget. It may well be a robust discussion, but stick to your guns and point out key facts such as:
- If you want to get close to 100 percent uptime you'll need at least two premises, and a properly resilient network and server infrastructure.
- All connections need to be redundant and preferably routed through diverse ducts into diverse external locations.
- A full risk assessment of natural disaster liability (primarily flooding) is required.
- Service levels must be agreed in the context of application implementation – you can't be expected to guarantee 100 percent uptime on a non-clustered database application that resides in a single location, for instance.
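To put rough numbers behind the uptime points above, here is a minimal Python sketch of the arithmetic. It rests on loud assumptions: the sites fail independently of each other (rarely quite true in practice), either site alone can carry the service, and the 99 percent single-site figure is purely illustrative rather than taken from any real measurement. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0  # 365 days; leap years ignored for simplicity


def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of `sites` redundant locations, any one of which can
    carry the service, assuming their failures are independent."""
    # The service is down only when every site is down at the same time.
    return 1.0 - (1.0 - site_availability) ** sites


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Illustrative figure only: a single site that is up 99 percent of the time.
single_site = 0.99
for sites in (1, 2):
    availability = combined_availability(single_site, sites)
    hours = downtime_hours_per_year(availability)
    print(f"{sites} site(s): {availability:.4%} available, "
          f"about {hours:.1f} hours of downtime a year")
```

Under those assumptions a second site takes the downtime budget from days to under an hour a year, which is a useful figure to have in hand when the stakeholders ask why one server room isn't enough.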
Service provision is, however, a compromise between business desire on the one hand and economic reality and risk on the other – so agree the full context of an application when considering how to support it, and don't be afraid to think a little bit laterally.
For instance, a travel company with a call centre in the North Yorkshire countryside had a sales pattern that was skewed heavily into the November-January period, in which it did over 80 percent of its annual trade. For nine months of the year it relied on standard UPS power backup since this was considered sufficient when bookings were trickling in. In the more business-critical three-month busy period, two bright orange, rented generators appeared in the loading bay, providing additional backup during the high-risk period.
The next step is the most important: write down everything you have agreed and get stakeholders to ratify it. Yes, you're all on the same team, but having something down in black and white focuses the mind rather more than a round-table discussion. You are, after all, being considered as a supplier by your internal customer, so treat the agreement as a proper customer-supplier relationship. Be realistic and arrive at an achievable solution, since it's in neither party's interest to have to refer to the penalty clauses in anger.
Part of the document you produce will be the nuts and bolts of the service level agreement (SLA) between you and the customer. It's normal, for instance, to reflect the relative importance of different applications and business areas by having different SLAs on a per-application or per-department basis. Challenge every sentence with the question: “How do I measure that?”
In many cases you'll already have the tools (for instance most UPSs are able to report fluctuation and outage stats), but it's generally inexpensive to add measurement tools for everything from basic server uptime through to correct application operation. Only sign up to measurements that can be metered quantitatively, and give the customers a transparent management view onto the data. Agree with the business that “The XYZ application is really slow” is not an acceptable challenge, whereas “The response of the XYZ application query screen has gone above the agreed 1.5-second average over the past 30 minutes” is.
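A challenge phrased like that second one can be checked mechanically. The following is a minimal Python sketch, not a real monitoring product: the class and method names are hypothetical, it assumes response-time samples arrive in time order from some probe you already run, and it uses the 1.5-second, 30-minute figures from the example wording above as defaults.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingResponseMonitor:
    """Tracks the average response time over a sliding time window."""

    def __init__(self, threshold_seconds=1.5, window=timedelta(minutes=30)):
        self.threshold_seconds = threshold_seconds
        self.window = window
        self._samples = deque()  # (timestamp, response_seconds), oldest first

    def record(self, timestamp, response_seconds):
        """Add a sample and drop any that have fallen out of the window.
        Assumes samples are recorded in chronological order."""
        self._samples.append((timestamp, response_seconds))
        cutoff = timestamp - self.window
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def average(self):
        """Mean response time over the window, or None with no samples."""
        if not self._samples:
            return None
        return sum(seconds for _, seconds in self._samples) / len(self._samples)

    def in_breach(self):
        """True when the windowed average exceeds the agreed threshold."""
        avg = self.average()
        return avg is not None and avg > self.threshold_seconds


# Made-up samples taken ten minutes apart, purely for illustration.
start = datetime(2024, 1, 1, 9, 0)
monitor = RollingResponseMonitor()
for step, seconds in enumerate([1.0, 1.2, 2.5, 2.9]):
    monitor.record(start + timedelta(minutes=10 * step), seconds)

print(f"average over window: {monitor.average():.2f}s, "
      f"breach: {monitor.in_breach()}")
```

Keeping the threshold and window as parameters means the same check can serve several per-application SLAs with different agreed figures.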
In short, then, your datacentre is ready to be measured as a service when:
- It satisfies the requirements that all parties have agreed, within the risks and business constraints of the organisation.
- You have filled the gaps between the facilities you had and the facilities that are required.
- You have implemented the capability to measure the service and make that data available to the customers.
- You are able to provide your internal customers with the tools they need to have visibility of performance and raise queries should they detect an issue.