IBM spends holiday season wrangling e-tail FAIL

WebSphere-powered site at Oz department store chain takes the week off

Update Australia's largest department store's website crashed for the week of post-Christmas sales, leaving IBM and possibly Oracle scrambling to fix the mess.

The department store chain that suffered the outage is Myer, Australia's analog for the UK's John Lewis or North America's Macy's inasmuch as it is positioned beneath more upmarket alternatives.

In 2008, Oracle published a case study (PDF) trumpeting that Oracle Retail had been implemented at Myer over several years, to give it “a unified view of inventory levels, purchasing, and supplier contracts so they could make better buying decisions.” The case study names IBM as the implementation partner. In 2012, IBM let it be known that its WebSphere software would power Myer's new website.

That site was put to the test on December 26th, traditionally the opening day for a period of heavy retail discounting in Australia. The site quickly crashed under a rush of mouse-wielding bargain-seekers, many of whom would have chased the exclusive deals offered online. The site stayed down until January 2nd. December 26th and January 1st are public holidays in Australia, while the 27th, 30th and 31st of December were theoretically normal business days on which Myer and IBM would have been able to drag their tech teams off the beach and into the office. The weekend of December 28th and 29th afforded further opportunity to work on the site.

As the outage continued, Myer executives told various media, including The Australian Financial Review, that the company's own IT team and IBM folk around the world were doing all they could to get the site back online.

Just what went wrong has not been disclosed, but a Myer spokesperson told The Register that “Communications between the software and servers” was the problem. The surge in shopper volume, we were told, was not the cause, as Myer had prepared for a rush by laying on extra compute capacity.

The latter nugget of information suggests that perhaps an elastic cloud service was used to provide extra seasonal capacity. We've contacted IBM and it is yet to offer an explanation as to its role in the situation, nor has Oracle confirmed that it is still present at Myer.

Myer's website crash notice was in place for six days

But let's guess that Oracle Retail remains in place, if only because it's not the kind of application that turns over in a hurry, and that the WebSphere-powered retail website makes use of the inventory levels it provides.

As described by Myer, the problem sounds like it could be one of two things:

  • A network problem, either on the LAN or WAN;
  • A middleware mess of some sort that means messages from an app on one server aren't playing nicely with another.

If the former scenario is the cause of this mess, presumably either a telco or a networking hardware vendor is currently looking at the “penalties” section of their contracts. IBM still sells a little own-brand networking kit but mostly resells boxen made by others, so if the network is the problem we imagine fingers will be pointed in many directions. We'd guess that it's a LAN issue, as telcos are generally pretty good at redundancy. But would a LAN issue result in a week-long outage? Unless something's literally gone up in smoke, it's hard to imagine so. Which leads us to a second scenario.

Might integration between discrete software have come undone, perhaps when the extra seasonal compute capacity was added to the mix? Perhaps links between WebSphere and Oracle Retail - or some other software present at Myer we don't know about - came apart with disastrous consequences?

One thing is almost certain: whoever designed and tended the myer.com.au disaster recovery plan is about to revise their curriculum vitae.

At the time of writing, IBM and Oracle had not provided any comment. Myer has promised to reinstate the offers as soon as possible, plans an investigation into the incident and is now watching its site's performance carefully to prevent future outages. ®

Update, Friday January 3rd: IBM has sent The Reg the following "IBM statement attributed to an IBM spokesperson".

"An IBM team of local and global experts worked around the clock with MYER to resolve the issue with its online store. The technical issue was caused by a communication breakdown between internet servers and a software application. IBM and MYER will work together to conduct a thorough review to ensure this issue does not reoccur. IBM is committed to supporting MYER to continue to provide high quality service to its customers."

"... a communication breakdown between internet servers and a software application" sounds a bit like our hypothesis that scaling to the cloud broke something. We'll keep asking for more detail, but don't expect much now that we're in spokesperson territory.
