Feeds

How two failed capacitors stranded Sydney rail commuters

Stalled by a LAN switch

  • alert
  • submit to reddit

Choosing a cloud hosting partner with confidence

The railway signaling failure which crippled Sydney on April 12 (some commuters reported trips of more than three hours) was caused by a failing LAN switch and software that couldn’t cope, an engineering report has found.

The switch, probably a Cisco device given that the Borg is Railcorp’s dominant LAN kit supplier, was part of the network in the Sydenham signaling station. That facility governs signaling for a large chunk of the Sydney rail network.

The guilty switch suffered partial failure of two electrolytic capacitors, the report found. The switch is part of a dual redundant LAN which is supposed to be resilient to failure; however, the configuration didn’t account for an intermittent breakdown.

With the caps failing, the switch would shut down and try to re-start itself. This, the engineer’s report says, meant the Sydenham LAN was “caught in a cycle where it was continually trying to reconfigure itself to address the changing state of the network.”

It only took a little over ten minutes for technical staff to initiate a disaster recovery plan, but the procedure took more than an hour to complete. In that time, The software that governs the rail network, known as ATRICS, was unable to cope with the flaky network. This led to a knock-on effect, taking out a system called Microloc at another station, Revesby.

With ATRICS and Revesby’s Microloc system both failing, the network failed to a “safe state”, the report says – in which trains were halted where they were. Because of the hugely interdependent state of the Sydney rail network, 847 trains were delayed and 240 were cancelled, and it took the rest of April 12th for the system to recover.

One of the key recommendations of the report is that “The resilience of the ATRICS software to automatically recover from network disturbances without the need for manual intervention should be addressed as a matter of urgency.”

Amen to that. ®

Secure remote control for conventional and virtual desktops

More from The Register

next story
Bono apologises for iTunes album dump
Megalomania, generosity and FEAR of irrelevance drove group to Apple deal
HBO shocks US pay TV world: We're down with OTT. Netflix says, 'Gee'
This affects every broadcaster, every cable guy
Facebook, Apple: LADIES! Why not FREEZE your EGGS? It's on the company!
No biological clockwatching when you work in Silicon Valley
SCREW YOU, EU: BBC rolls out Right To Remember as Google deletes links
Not even Google can withstand the power of Auntie
Arab States make play for greater government control of the internet
Nerds told to get lost in last-minute power grab bid at UN meeting
Apple SILENCES Bose, YANKS headphones from stores
The, er, Beats go on after noise-cancelling spat
Zippy one-liners, broken promises: Doctor Who on the Orient Express
Series finally hits stride, but Clara's U-turn is baffling
Don't bother telling people if you lose their data, say Euro bods
You read that right – with the proviso that it's encrypted
America's super-secret X-37B plane returns to Earth after nearly TWO YEARS aloft
674 days in space for US Air Force's mystery orbital vehicle
prev story

Whitepapers

Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Win a year’s supply of chocolate
There is no techie angle to this competition so we're not going to pretend there is, but everyone loves chocolate so who cares.
Why cloud backup?
Combining the latest advancements in disk-based backup with secure, integrated, cloud technologies offer organizations fast and assured recovery of their critical enterprise data.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Saudi Petroleum chooses Tegile storage solution
A storage solution that addresses company growth and performance for business-critical applications of caseware archive and search along with other key operational systems.