HSBC e-payments system limps back online
Mystery database glitch blamed for extended outage
Posted in Management, 9th April 2008 11:49 GMT
HSBC UK managed to get its beleaguered e-payments system back online yesterday evening after the second extended outage in less than a fortnight.
Retailers using the system were unable to process transactions from about 5pm (BST) on Monday until Tuesday evening.
Similar problems left merchants unable to take payments for two days over the weekend of 29-30 March. E-commerce firms using the service have intensified earlier calls for compensation.
In an email message to retailers (below), HSBC blamed the latest problems on a database glitch. It promised that services would be restored, following testing, at 9am on Wednesday. Merchants have told El Reg that the service came back ahead of schedule at about 7pm on Tuesday.
The service was operational from 12:00pm today but we have since encountered a number of further performance issues. As previously advised the issues that we are facing relate to the databases.
This is the first time that we have experienced this particular issue and we have engaged the help of both the database suppliers and the providers of the application.
In order that we can provide a stable platform to fully support your business, regrettably we have temporarily suspended the Secure ePayments service with effect from 15:30 today.
This suspension of service will allow us to undertake further correction and testing. The earliest the system will return will be 9:00am tomorrow.
The Bank's Executive Management are fully aware of the situation and appreciate your continued patience throughout this further period of service disruption.
HSBC's e-payments system is attractive to retailers because of the relative speed with which money can be transferred into online bank accounts.
As previously reported, merchants who rely on HSBC are angry about the latest outage.
We have yet to hear back from HSBC on what steps it intends to take to prevent the problem happening again, or on whether it intends to compensate affected customers for lost business.
Reg readers report that HSBC's e-payments site also went down for a day in January, making a total of five days of outage in the last three months. In a recorded message the bank said the cause of the latest outage had become the subject of a "full investigation". It added that it was continuing to monitor the system "closely through peak periods". ®
COMMENTS
Glad it's online but!
Will it happen again this month? Rather not take the risk again and I do understand that banks have problems but so did my customers grrrrrr that kicked off with me! switched to Nochex.com
Ok...
I wasn't suggesting that this shouldn't have happened, or that it isn't a massive screw up, rather that banking systems are far more complex than generally given credit for.
Consider: A production server fails, you fail over to the DR server with its copy of the disk in a remote site. Easy, probably even automated.
However: A database corruption occurs, this corruption would have been instantly transferred to the DR disk, so that DR server is totally useless. You (probably) have snapshots from start of day, or end of previous day (pre-batch). Did the batch corrupt the database? Do you want to recover from pre-batch, or post-batch? If the batch corrupted the database, how? Do you need to re-run the batch, can it be re-run the next night? How long does it take to run? Did another server cause the corruption? At a guess a dial-in system like merchant handsets would be using a bigass unix box, or Tandem, almost certainly talking to a back end mainframe, probably via some sort of broker... etc. etc. etc.
This is just one of many scenarios that could have happened, and a rather simplified one as well, but it illustrates how DR can be rendered pretty useless. It doesn't even consider the requirement to recover from tape.
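To make the snapshot question above concrete, here is a minimal Python sketch of the decision. It is purely illustrative and says nothing about how HSBC's systems actually work: the `Snapshot` type, the `pick_recovery_snapshot` function, the snapshot labels and every timestamp are hypothetical. The only idea it encodes is that the usable restore point is the newest snapshot taken before the suspected source of corruption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    label: str          # hypothetical name, e.g. "pre-batch"
    taken_at: datetime  # when the snapshot was captured


def pick_recovery_snapshot(
    snapshots: List[Snapshot],
    corruption_detected_at: datetime,
    batch_started_at: datetime,
    batch_suspect: bool,
) -> Optional[Snapshot]:
    """Return the newest snapshot believed to predate the corruption.

    If the overnight batch is suspected, anything taken after the batch
    started is treated as tainted; otherwise anything taken before the
    corruption was detected is considered usable.
    """
    cutoff = batch_started_at if batch_suspect else corruption_detected_at
    candidates = [s for s in snapshots if s.taken_at < cutoff]
    # default=None covers the case where no snapshot is old enough,
    # which is where recovery from tape would come in.
    return max(candidates, key=lambda s: s.taken_at, default=None)


# Hypothetical timeline, invented for illustration only.
snaps = [
    Snapshot("pre-batch", datetime(2024, 1, 1, 22, 0)),
    Snapshot("start-of-day", datetime(2024, 1, 2, 6, 0)),
]
batch_start = datetime(2024, 1, 1, 23, 0)
detected = datetime(2024, 1, 2, 17, 0)

if_batch_bad = pick_recovery_snapshot(snaps, detected, batch_start, True)
if_batch_ok = pick_recovery_snapshot(snaps, detected, batch_start, False)
```

With this made-up timeline, blaming the batch pushes the restore point back to the pre-batch snapshot, which means the batch must be re-run; clearing the batch allows the later start-of-day snapshot. In both cases the replicated DR disk does not help, because it holds the same corrupted data.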
Why?
Is there always someone around to say that a total ballsup and utter failure is not "doing too bad, really"?
OK, so in an imperfect world balls-ups and failures will happen, though they shouldn't, but no, that doesn't make them all right.
Can we please have a T5 icon? Or even a bag with a BA tag on it would do.
Pedant, moi?
@Fraser
It's actually "Cue", as in a signal, such as a word or action, used to prompt another event in a performance, such as an actor's speech or entrance, a change in lighting, or a sound effect.. not a line of waiting people or vehicles, more commonly found in any British "fast food" outlet where the concept of FAST is above them....
poor sods
I wouldn't like to be in their IT department right now - even the guy who puts toner in the printers is probably getting stiffed for this.
