IBM spends holiday season wrangling e-tail FAIL

WebSphere-powered site at Oz department store chain takes the week off

Update: Australia's largest department store's website crashed for the week of post-Christmas sales, leaving IBM and possibly Oracle scrambling to fix the mess.

The department store chain that suffered the outage is Myer, Australia's analog for the UK's John Lewis or North America's Macy's inasmuch as it is positioned beneath more upmarket alternatives.

In 2008, Oracle published a case study (PDF) trumpeting that Oracle Retail had been implemented at Myer over several years, to give it “a unified view of inventory levels, purchasing, and supplier contracts so they could make better buying decisions.” The case study names IBM as the implementation partner. In 2012, IBM let it be known that its WebSphere software would power Myer's new website.

That site was put to the test on December 26th, traditionally the opening day for a period of heavy retail discounting in Australia. The site quickly crashed under a rush of mouse-wielding bargain-seekers, many of whom would have chased the exclusive deals offered online. The site stayed down until January 2nd. December 26th and January 1st are public holidays in Australia, while the 27th, 30th and 31st of December were theoretically normal business days on which Myer and IBM would have been able to drag their tech teams off the beach and into the office. The weekend of December 28th and 29th afforded further opportunity to work on the site.

As the outage continued, Myer executives told various media, including The Australian Financial Review, that its own IT team and IBM folk around the world were doing all they could to get the site back online.

Just what went wrong has not been disclosed, but a Myer spokesperson told The Register that “Communications between the software and servers” was the problem. The surge in volume of shoppers, we were told, was not the cause of the problem, as Myer had prepared for a rush by laying on extra compute capacity.

The latter nugget of information suggests that perhaps an elastic cloud service was used to provide extra seasonal capacity. We've contacted IBM and it is yet to offer an explanation as to its role in the situation, nor has Oracle confirmed that it is still present at Myer.

Myer's website crash notice was in place for six days

But let's guess that Oracle Retail remains in place, if only because it's not the kind of application that turns over in a hurry, and that the WebSphere-powered retail web site makes use of the inventory levels it provides.

As described by Myer, the problem sounds like it could be one of two things:

  • A network problem, either on the LAN or WAN;
  • A middleware mess of some sort that means messages from an app on one server aren't playing nicely with another.

If the former scenario is the cause of this mess, presumably either a telco or a networking hardware vendor is currently looking at the “penalties” section of their contracts. IBM still sells a little own-brand networking kit but mostly resells boxen made by others, so if the network is the problem we imagine fingers will be pointed in many directions. We'd guess that it's a LAN issue, as telcos are generally pretty good at redundancy. But would a LAN issue result in a week-long outage? Unless something's literally gone up in smoke, it's hard to imagine so. Which leads us to a second scenario.

Might integration between discrete software have come undone, perhaps when the extra seasonal compute capacity was added to the mix? Perhaps links between WebSphere and Oracle Retail - or some other software present at Myer we don't know about - came apart with disastrous consequences?

One thing is almost certain: whoever designed and tended the myer.com.au disaster recovery plan is about to revise their curriculum vitae.

At the time of writing, IBM and Oracle had not provided any comment. Myer's promised to reinstate the offers as soon as is possible, plans an investigation into the incident and is now watching its site's performance carefully to prevent future outages. ®

Update, Friday January 3rd: IBM has sent The Reg the following "IBM statement attributed to an IBM spokesperson".

"An IBM team of local and global experts worked around the clock with MYER to resolve the issue with its online store. The technical issue was caused by a communication breakdown between internet servers and a software application. IBM and MYER will work together to conduct a thorough review to ensure this issue does not reoccur. IBM is committed to supporting MYER to continue to provide high quality service to its customers."

"... a communication breakdown between internet servers and a software application" sounds a bit like our hypothesis that scaling to the cloud broke something. We'll keep asking for more detail, but don't expect much now that we're in spokesperson territory.
