IBM employee sparks massive bank outage
Big Blue liveware triggers seven-hour FAIL
Last Monday, one of Singapore's largest banks suffered a seven-hour IT outage that took down everything from back-office services to ATMs. This Tuesday, the flawed component was identified: an IBM employee.
"We take full responsibility for this incident," wrote DBS Group Holdings CEO Piyush Gupta in a statement. A laudably mature response, to be sure, but his communiqué went on to explain that the blame for the outage, which lasted from 3am to 10am on Monday July 5, is to be borne by IBM.
Specifically, an IBM employee who made "a procedural error in what was to have been a routine maintenance operation [that] subsequently caused a complete system outage."
The cascading failure began when a storage subsystem started issuing error messages that indicated intermittent failures. A fix was scheduled for 3am, "a quiet period," in Gupta's words.
Unfortunately for DBS and IBM, an "outdated procedure" was used to initiate the repair, and all IT hell broke loose. By 3:40, "a technical command function" was mobilized, and at 5:20 a system restart was attempted. Didn't work.
Following "complications during the machine restart," Gupta wrote, the "bankwide disaster recovery command centre" was activated, but by 8:30 it was determined that the core troubles could be fixed by 10:00, so full-scale disaster recovery wasn't needed. Main services were, indeed, up by 10:00, and, Gupta wrote, "All other services were progressively restored through the morning and virtually everything was back on track by lunchtime." No data was lost during the outage, he reports.
IBM and DBS entered into a S$1.2bn ($872m, £575m) agreement in 2002 in which the bank outsourced "selected IT services and infrastructure in Singapore and Hong Kong to IBM."
IBM on Tuesday released a statement noting that it had "taken steps to enhance training of our personnel related to current procedures and brought in experts from our global team to provide further assistance."
Big Blue did not say whether that one unlucky IT admin was receiving the enhanced training, or whether he has now become an uptick in global unemployment statistics. ®