IBM employee sparks massive bank outage

Big Blue liveware triggers seven-hour FAIL

Last Monday, one of Singapore's largest banks suffered a seven-hour IT outage that took down everything from back-office services to ATMs. This Tuesday, the flawed component was identified: an IBM employee.

"We take full responsibility for this incident," wrote DBS Group Holdings CEO Piyush Gupta in a statement. A laudably mature response, to be sure, but his communiqué went on to explain that the blame for the outage, which lasted from 3am to 10am on Monday July 5, is to be borne by IBM.

Specifically, an IBM employee who made "a procedural error in what was to have been a routine maintenance operation [that] subsequently caused a complete system outage."

Oops.

The cascading failure started when a storage subsystem began issuing error messages that indicated intermittent failures. A fix was scheduled for 3am, "a quiet period," in Gupta's words.

Unfortunately for DBS and IBM, an "outdated procedure" was used to initiate the repair, and all IT hell broke loose. By 3:40 a "technical command function" had been mobilized, and at 5:20 a system restart was attempted. It didn't work.

Following "complications during the machine restart," Gupta wrote, the "bankwide disaster recovery command centre" was activated, but by 8:30 it was determined that the core troubles could be fixed by 10:00, so full-scale disaster recovery wasn't needed. Main services were, indeed, up by 10:00, and, Gupta wrote, "All other services were progressively restored through the morning and virtually everything was back on track by lunchtime." No data was lost during the outage, he reports.

IBM and DBS entered into a S$1.2bn ($872m, £575m) agreement in 2002 under which the bank outsourced "selected IT services and infrastructure in Singapore and Hong Kong to IBM."

IBM on Tuesday released a statement noting that it had "taken steps to enhance training of our personnel related to current procedures and brought in experts from our global team to provide further assistance."

Big Blue did not note if that one unlucky IT admin was receiving the enhanced training, or if he has now become an uptick in global unemployment statistics. ®
