How UK air traffic control system was caught asleep on the job

We reveal the touchy culprit behind major NATS glitch

A big outage that struck Britain's air traffic control system on Saturday was due to a technical fault with a touch screen interface provided by Frequentis, The Register has learned.

On Saturday 7 December, during the run-up to one of the busiest times of the year for the UK's airports, controllers in the NATS (National Air Traffic Services) operations room at Swanwick noticed that their system had suddenly stopped working.

It quickly became clear that a major problem was unfolding, one that would cause delays for thousands of passengers on flights into and out of Blighty's airspace over the weekend.

By midday on a typical Saturday, NATS would normally expect to be handling around 2,000 flights. But on the Saturday just gone, it was forced to reduce that load by 20 per cent, while its engineers rushed to resolve the technical cockup.

NATS - which bills itself as a "public private partnership" between its own staff (holding 5 per cent), seven major airlines (holding 42 per cent), operator LHR Airports Ltd (holding 4 per cent) and the UK government (holding 49 per cent plus a "golden share") - initially, and rather vaguely, said the flaw was connected to an internal telephone system used by air traffic controllers.

Naturally, El Reg sought more technical details about what had gone wrong.

"The outage on Saturday was caused by a problem with a Frequentis system that enables our controllers to talk to other parts of the operation," a spokesman at NATS said.

"It uses a touch screen interface that automatically loads all the contacts - around NATS and in other agencies involved in the air navigation network - that a controller will need for the particular piece of airspace that they’re controlling at that time.

"It therefore ensures they can always immediately reach the person they need to speak to and will reconfigure itself with settings specific to the sector that the controller is responsible for when they log in for their shift."

But during Saturday's routine shift change, the system – which has been used by NATS for 11 years – collapsed, forcing the controllers to ground aircraft while engineers attempted to fix the error.

It's understood that the touchscreen telephone system failed to configure correctly during the changeover, so new controller positions could not be opened to split the airspace into the extra sectors needed for daytime control.

Delays were reported at airports serving London, as well as at Cardiff, Edinburgh, Glasgow and Dublin. NATS said at the time that the glitch had not compromised passenger safety, but some questioned why contingency measures didn't fully kick in when the system failed.

NATS said on Saturday:

The technical and operational contingency measures we have had in place all day have enabled us to deliver more than 80 per cent of our normal operation. The reduction in capacity has had a disproportionate effect on southern England because it is extremely complex and busy airspace and we sincerely regret inconvenience to our airline customers and their passengers.

To be clear, this is a very complex and sophisticated system with more than a million lines of software. This is not simply internal telephones, it is the system that controllers use to speak to other ATC agencies both in the UK and Europe and is the biggest system of its kind in Europe.

It added that it had worked closely with Frequentis to get the system back up and running. But by Monday morning, following a weekend of political pressure over the outage, NATS boss Richard Deakin admitted that an inquiry into the resilience of UK airspace was needed.

“We are keen to do all we can at NATS to ensure the aviation industry has a full understanding of the capability that is in place in the UK and to take any further steps our customers and regulators decide are necessary to help avoid a repeat of last Saturday’s problems,” he said.

Deakin added that the error took 14 hours to resolve and claimed that NATS eventually "delivered over 90 per cent of an extremely busy schedule of flights during the day".

It was the first time such a serious technical flaw had occurred since the system was installed in 2002, he said.

But we can't help but agree with the exasperated folk stranded at airports over the weekend who - quite reasonably - asked how such a failure could happen at all in so critical a system. Redundancy, much? ®
