How two failed capacitors stranded Sydney rail commuters

Stalled by a LAN switch

The railway signaling failure which crippled Sydney on April 12 (some commuters reported trips of more than three hours) was caused by a failing LAN switch and software that couldn’t cope, an engineering report has found.

The switch, probably a Cisco device given that the Borg is Railcorp’s dominant LAN kit supplier, was part of the network in the Sydenham signaling station. That facility governs signaling for a large chunk of the Sydney rail network.

The guilty switch suffered partial failure of two electrolytic capacitors, the report found. The switch is part of a dual redundant LAN which is supposed to be resilient to failure; however, the configuration didn’t account for an intermittent breakdown.

With the caps failing, the switch would shut down and try to re-start itself. This, the engineer’s report says, meant the Sydenham LAN was “caught in a cycle where it was continually trying to reconfigure itself to address the changing state of the network.”

It took technical staff only a little over ten minutes to initiate a disaster recovery plan, but the procedure took more than an hour to complete. In that time, the software that governs the rail network, known as ATRICS, was unable to cope with the flaky network. This had a knock-on effect, taking out a system called Microloc at another station, Revesby.

With ATRICS and Revesby’s Microloc system both failing, the network failed to a “safe state”, the report says – in which trains were halted where they were. Because of the hugely interdependent state of the Sydney rail network, 847 trains were delayed and 240 were cancelled, and it took the rest of April 12th for the system to recover.

One of the key recommendations of the report is that “The resilience of the ATRICS software to automatically recover from network disturbances without the need for manual intervention should be addressed as a matter of urgency.”

Amen to that. ®
