Oz bank meltdown due to file corruption cock-up

IBM mainframe upgrade downer

The five-day banking system meltdown at National Australia Bank (NAB) was due to a corrupted file on an IBM mainframe system that was being upgraded.

Staff reportedly attempted a mainframe upgrade on Wednesday 25 November, which failed to complete. The upgrade was reversed, and it appears that this was when a file holding in-flight payment processing data was corrupted.

The corruption caused payments to stop or to be recorded incorrectly, with some customer accounts having multiple incorrect debits applied. Money transfers to other banks, as well as to NAB's own customers, were affected.

Private and business customers were prevented from accessing their accounts at ATMs and electronic funds transfer payments stopped. Customers had to attend branches in person to get cash and the bank hurriedly opened some branches on Sunday to cope with the rush. It ran full-page adverts in Australian papers saying how sorry it was.

A payment processing backlog built up. Some customers had interest applied to illusory debts in their accounts, and the bank's support staff had the massive job of rolling everything back to a known good point and then reapplying transactions in strict time order to get everything back up to date.
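To make the roll-back-and-replay idea concrete, here is a minimal Python sketch. It is not NAB's process, which has not been published. The `Txn` record, the `replay` function and the in-memory journal are all hypothetical names invented for illustration, and the sketch assumes every transaction carries a unique ID so that a duplicated debit can be spotted and skipped.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    """Hypothetical journal record; not a real banking format."""
    txn_id: str       # assumed to be unique per genuine transaction
    timestamp: int    # e.g. seconds since some epoch
    account: str
    amount: Decimal   # negative for a debit, positive for a credit


def replay(checkpoint, journal):
    """Rebuild balances from a known good checkpoint.

    Transactions are applied in timestamp order (ties broken by ID so
    the replay is deterministic), and any ID seen before is skipped so
    a duplicated debit is only applied once.
    """
    balances = dict(checkpoint)  # never mutate the known good copy
    seen = set()
    for txn in sorted(journal, key=lambda t: (t.timestamp, t.txn_id)):
        if txn.txn_id in seen:
            continue
        seen.add(txn.txn_id)
        balances[txn.account] = balances.get(txn.account, Decimal("0")) + txn.amount
    return balances


if __name__ == "__main__":
    checkpoint = {"alice": Decimal("100.00")}
    journal = [
        Txn("t2", 20, "alice", Decimal("-30.00")),
        Txn("t1", 10, "bob", Decimal("50.00")),
        Txn("t2", 20, "alice", Decimal("-30.00")),  # duplicated debit
    ]
    print(replay(checkpoint, journal))
```

With simple additions like these the final balances do not depend on order, but anything computed along the way, such as interest or overdraft checks (omitted here), does, which is why the replay has to be strictly ordered.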

The bank has promised any disadvantaged customers will have their accounts put right.

The NAB production environment has four IBM mainframes in a Sysplex, a group of mainframes running z/OS in a Parallel Sysplex cluster that allows them to share data, operate in parallel and provide disaster recovery facilities. In this case a mainframe upgrade provided the backdrop to a disaster the Sysplex was supposed to prevent.

The CIO of National Australia Bank is Michele Tredenick. She has Denis McGee reporting to her as general manager of IT, and he was appointed in summer 2008. He previously worked for ANZ Bank and was heavily involved in architecting its offshored IT operations in Bangalore.

NAB has built up its offshore mainframe support and maintenance work in recent years, and has been working with Infosys and Satyam Computer Services on the maintenance of its core banking systems.

At the time of McGee's appointment NAB was developing an overhaul of its legacy banking systems in an Aus$1bn, four-to-five-year project. This upgrade appears to have been cancelled.

In February 2009 NAB suspended a project to outsource more IT work to Satyam because that firm's long-term future was unclear. Satyam retained its existing NAB maintenance and application development work though.

The bank has not said how the file was corrupted, and the suspicion is that data in the file was screwed up during the failed upgrade and roll-back process.

NAB, which has been reapplying transactions to customer accounts from the known point before the failure, is predicting that all of its customer accounts should be back to normal tomorrow. The bank faces class action lawsuits, and it is very likely that a senior IT management head or two will have to be lopped off to appease customers.

If lousy offshore support is shown to be a factor, then the whole offshoring of support is liable to be re-examined. ®
