
How two failed capacitors stranded Sydney rail commuters

Stalled by a LAN switch

The railway signaling failure which crippled Sydney on April 12 (some commuters reported trips of more than three hours) was caused by a failing LAN switch and software that couldn’t cope, an engineering report has found.

The switch, probably a Cisco device given that the Borg is Railcorp’s dominant LAN kit supplier, was part of the network in the Sydenham signaling station. That facility governs signaling for a large chunk of the Sydney rail network.

The guilty switch suffered partial failure of two electrolytic capacitors, the report found. The switch is part of a dual redundant LAN which is supposed to be resilient to failure; however, the configuration didn’t account for an intermittent breakdown.

With the caps failing, the switch would shut down and try to re-start itself. This, the engineer’s report says, meant the Sydenham LAN was “caught in a cycle where it was continually trying to reconfigure itself to address the changing state of the network.”
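The report does not say how the Sydenham LAN decides when to reconfigure, so the following is only an illustrative sketch of one common defence against this kind of loop: flap damping, in which a link that changes state too often inside a short window is held out of service for a fixed period instead of being re-admitted on every restart. The class name, method names, and thresholds are all hypothetical and are not taken from Railcorp's or any vendor's equipment.

```python
from collections import deque


class FlapDamper:
    """Hold a link out of service once it changes state too often.

    Hypothetical illustration only; thresholds are arbitrary assumptions.
    """

    def __init__(self, max_transitions=3, window_s=60.0, hold_down_s=300.0):
        self.max_transitions = max_transitions
        self.window_s = window_s
        self.hold_down_s = hold_down_s
        self.transitions = deque()  # timestamps of recent state changes
        self.link_up = True
        self.suppressed_until = None

    def report(self, now, link_up):
        """Record the observed link state at time `now` (in seconds)."""
        if link_up != self.link_up:
            self.link_up = link_up
            self.transitions.append(now)
        # Forget state changes that have fallen out of the sliding window.
        while self.transitions and now - self.transitions[0] > self.window_s:
            self.transitions.popleft()
        # Too many recent changes: start (or restart) the hold-down timer.
        if len(self.transitions) >= self.max_transitions:
            self.suppressed_until = now + self.hold_down_s

    def usable(self, now):
        """True if the link is up and not currently held down."""
        if self.suppressed_until is not None and now < self.suppressed_until:
            return False
        return self.link_up
```

With these assumed defaults, a switch that goes down, comes back, and goes down again within a minute is treated as unusable for the next five minutes even if it briefly reports itself healthy, so the rest of the network stops chasing its changing state.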

It took technical staff only a little over ten minutes to initiate a disaster recovery plan, but the procedure took more than an hour to complete. In that time, the software that governs the rail network, known as ATRICS, was unable to cope with the flaky network. This led to a knock-on effect, taking out a system called Microloc at another station, Revesby.

With ATRICS and Revesby’s Microloc system both failing, the network failed to a “safe state”, the report says – in which trains were halted where they were. Because of the hugely interdependent state of the Sydney rail network, 847 trains were delayed and 240 were cancelled, and it took the rest of April 12th for the system to recover.

One of the key recommendations of the report is that “The resilience of the ATRICS software to automatically recover from network disturbances without the need for manual intervention should be addressed as a matter of urgency.”

Amen to that. ®
