Why did my server just die?

The importance of update testing

A recent update to the point-of-sale (POS) software my organization uses also had the potential for some very serious disruption. The software in question is fairly decent stuff as far as POS software goes. It does the job, and its features cover most of what we might want it to do.

This software, however, has traditionally had a weakness: it has never been able to make full use of the hardware provided to it. The underlying Pervasive database is actually a competent piece of gear capable of far more than the POS application ever asked of it.

No matter how we tried to configure this software - or the underlying database - we simply could not get it to consume more than 25 per cent of the hardware resources provided. Eventually, we virtualized it. We set up a system whereby at the end of each night the dataset was extracted from the primary copy of the POS software and pushed over to several reports servers.

Send in the clones

These reports servers were cloned instances of our POS server on which we could run various business reports. Some of the reports could take a full business day. Given that management has a nearly insatiable appetite for data, we had ballooned to the point where at peak times we were running a primary VM, two reports servers, and a testbed system. This was not because the hardware was inadequate to the task, but rather because the software stubbornly refused to use it.

Enter the latest update. Though a major version update - version 5.x to version 6.x - the release notes nonetheless indicated it to be a largely incremental update. Update testing went smoothly; it did all the things it was supposed to do.

The new version didn't require any changes to Windows, the Pervasive client software or really much of anything else that I could detect. I ran the system through what tests I could think of and then turned the system over to the accountants. They like to run various beancounter reports to ensure that the update didn't fundamentally change how it calculated things.

The very first report flattened the system. Every other VM on the testbed server turned into molasses and phone calls started coming in from a half dozen different people demanding to know what had just happened to their test servers. I have to admit that despite the changes to the POS server being the most recent element on the server to change, I did not for a second suspect it to be the cause.

Indeed, my first thought was that the testbed host server had dropped a disk; a degraded RAID 6 on an LSI 1078 is not exactly swift storage. I fired up the vSphere client to check the hardware status, but everything was healthy. When I twigged that the POS server's most recent update had at last enabled it to actually use the hardware provided to it, I was floored.

This then serves as a great example of how you can be bitten by a "good" update. The newly upgraded functionality has so dramatically altered resource requirements that deployment of what should be a simple update will require a complete review of our hardware allocation. When it comes to software updates, be careful what you wish for - and remember to test thoroughly when you get it. ®
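
One way to catch this sort of surprise during update testing is to record resource usage for the same report run before and after the update, then compare the peaks. The sketch below is only an illustration of that idea: the `UsageProfile` class, the `footprint_changed` function, the 10 per cent tolerance and every sample figure are hypothetical, and how you actually gather the samples (hypervisor statistics, guest performance counters or anything else) is left open.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class UsageProfile:
    """CPU samples from one test run, each a fraction (0.0-1.0) of the VM's allocation."""
    label: str
    cpu_samples: list[float]

    @property
    def peak(self) -> float:
        return max(self.cpu_samples)

    @property
    def average(self) -> float:
        return mean(self.cpu_samples)


def footprint_changed(before: UsageProfile, after: UsageProfile,
                      tolerance: float = 0.10) -> bool:
    """Return True if peak CPU use moved by more than `tolerance` between runs."""
    return abs(after.peak - before.peak) > tolerance


# Made-up numbers for illustration only; they are not measurements from the article.
v5 = UsageProfile("5.x", [0.20, 0.25, 0.24, 0.22])
v6 = UsageProfile("6.x", [0.35, 0.90, 0.98, 0.95])

if footprint_changed(v5, v6):
    print(f"Peak CPU went from {v5.peak:.0%} to {v6.peak:.0%}: "
          "review host allocation before deploying")
```

The comparison is deliberately symmetric: a large move in either direction is flagged, since an update that suddenly uses far more of the hardware is as much a planning problem as one that uses far less.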
