The Register® — Biting the hand that feeds IT

How two failed capacitors stranded Sydney rail commuters

Stalled by a LAN switch

The railway signaling failure which crippled Sydney on April 12 (some commuters reported trips of more than three hours) was caused by a failing LAN switch and software that couldn’t cope, an engineering report has found.

The switch, probably a Cisco device given that the Borg is Railcorp’s dominant LAN kit supplier, was part of the network in the Sydenham signaling station. That facility governs signaling for a large chunk of the Sydney rail network.

The guilty switch suffered partial failure of two electrolytic capacitors, the report found. The switch is part of a dual redundant LAN which is supposed to be resilient to failure; however, the configuration didn’t account for an intermittent breakdown.

With the caps failing, the switch would shut down and try to re-start itself. This, the engineer’s report says, meant the Sydenham LAN was “caught in a cycle where it was continually trying to reconfigure itself to address the changing state of the network.”
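The report as quoted here does not describe how the Sydenham LAN was configured, so the following is only an illustrative sketch of one generic way to stop an intermittently failing device from triggering endless reconfiguration: count recent state changes and quarantine a link that flaps too often. The class name `FlapDamper`, its parameters, and the thresholds are all hypothetical and are not taken from Railcorp, ATRICS, or any vendor's equipment.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlapDamper:
    """Hypothetical helper: quarantine a link that changes state too often."""

    max_transitions: int = 3      # state changes tolerated inside the window
    window_s: float = 60.0        # sliding window length, in seconds
    hold_down_s: float = 300.0    # how long a flapping link stays quarantined
    _events: deque = field(default_factory=deque)
    _quarantined_until: float = float("-inf")

    def record_transition(self, now: float) -> None:
        """Note one up/down state change at time `now` (seconds)."""
        self._events.append(now)
        # Drop transitions that have fallen out of the sliding window.
        while self._events and now - self._events[0] > self.window_s:
            self._events.popleft()
        if len(self._events) > self.max_transitions:
            # Too many changes: stop trusting the link for a while.
            self._quarantined_until = now + self.hold_down_s

    def usable(self, now: float, link_up: bool) -> bool:
        """A link is usable only if it is up and not in hold-down."""
        return link_up and now >= self._quarantined_until
```

With these assumed defaults, a link that changes state four times within a minute is ignored for the next five minutes even if it briefly reports itself healthy, so the rest of the network can settle on the redundant path instead of chasing every restart.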

It took only a little over ten minutes for technical staff to initiate a disaster recovery plan, but the procedure took more than an hour to complete. In that time, the software that governs the rail network, known as ATRICS, was unable to cope with the flaky network. This led to a knock-on effect, taking out a system called Microloc at another station, Revesby.

With ATRICS and Revesby’s Microloc system both failing, the network failed to a “safe state”, the report says – in which trains were halted where they were. Because of the hugely interdependent state of the Sydney rail network, 847 trains were delayed and 240 were cancelled, and it took the rest of April 12 for the system to recover.

One of the key recommendations of the report is that “The resilience of the ATRICS software to automatically recover from network disturbances without the need for manual intervention should be addressed as a matter of urgency.”

Amen to that. ®

you obviously don't live in NSW then....

The solution will involve a parliamentary oversight committee run by Rev. Fred Nile (with all of his IT experience). This should take about 4 years of junket, visiting major railway locations such as Paris, London and Rome, to see how things are meant to work.

The committee will recommend about $2.1 billion spending required to fix problems. A tender by the government will receive bids ranging from $4.32 to $9 billion to fix; a final, FINAL price of $12 billion will be settled on, with a company headed by the Treasurer's brother-in-law.

This will then take another 4 years to implement.

After 6 years, with costs blown out to $18 billion, the government will finally admit that nothing has been done and that they can't recall who the person who signed the contract was, or where he lives. The construction company will turn out to be based in a beachside shack in Belize, with bank accounts held in criminals-are-our-friends-Switzerland.

The next day, the signals will fail again, but this time the software will have forgotten the 'safe mode' setting, having been changed by the government to 'make sure every train gets there really fast and early to keep the voters happy' setting.

Chaos ensues, but being NSW this is normal.

S*** happens

The title was my first thought. Although others may bay for blood, it's the kind of oversight that can happen. At least the system failed secure, inasmuch as no trains crashed into each other, nobody died (that I know of!), and hopefully they'll now learn from this.

Beer, because it's an Aussie tradition

So the end result...

I imagine they'll have created a small team of fairly well paid technical staff to develop a solution. The solution, in around 12 months' time, will be to leave it all alone and hope it never happens that way again. Which it probably won't.
