How UK air traffic control system was caught asleep on the job

We reveal the touchy culprit behind major NATS glitch

A big outage that struck Britain's air traffic control system on Saturday was due to a technical fault with a touch screen interface provided by Frequentis, The Register has learned.

On Saturday 7 December, during the run-up to one of the busiest times of the year for the UK's airports, controllers in the NATS (National Air Traffic Services) operations room in Swanwick noticed that their system had suddenly stopped working.

It quickly became clear that a major problem was unfolding, one that went on to cause delays for thousands of passengers on flights into and out of Blighty's airspace over the weekend.

By midday on a typical Saturday, NATS would normally expect to be handling around 2,000 flights. But on the Saturday just gone, it was forced to reduce that load by 20 per cent, while its engineers rushed to resolve the technical cockup.

NATS - which bills itself as a "public private partnership" between its own staff (holding 5 per cent), seven major airlines (holding 42 per cent), operator LHR Airports Ltd (4 per cent) and the UK government (holding a 49 per cent "golden share") - initially, and rather vaguely, said the flaw was connected to an internal telephone system that is used by air traffic controllers.

Naturally, El Reg sought more technical details about what had gone wrong.

"The outage on Saturday was caused by a problem with a Frequentis system that enables our controllers to talk to other parts of the operation," a spokesman at NATS said.

"It uses a touch screen interface that automatically loads all the contacts - around NATS and in other agencies involved in the air navigation network - that a controller will need for the particular piece of airspace that they’re controlling at that time.

"It therefore ensures they can always immediately reach the person they need to speak to and will reconfigure itself with settings specific to the sector that the controller is responsible for when they log in for their shift."

But during Saturday's routine shift change, the system – which has been used by NATS for 11 years – collapsed, forcing the controllers to ground aircraft while engineers attempted to fix the error.

It's understood that the touchscreen telephone system failed to configure itself correctly, which meant that new controller positions could not be opened to split the airspace into the extra sectors needed for daytime control.

Delays were reported at airports including London, Cardiff, Edinburgh, Glasgow and Dublin. NATS said at the time that the glitch had not compromised passenger safety, but some questioned why contingency measures didn't fully kick in when the system failed.

NATS said on Saturday:

The technical and operational contingency measures we have had in place all day have enabled us to deliver more than 80 per cent of our normal operation. The reduction in capacity has had a disproportionate effect on southern England because it is extremely complex and busy airspace and we sincerely regret inconvenience to our airline customers and their passengers.

To be clear, this is a very complex and sophisticated system with more than a million lines of software. This is not simply internal telephones, it is the system that controllers use to speak to other ATC agencies both in the UK and Europe and is the biggest system of its kind in Europe.

It added that it had worked closely with Frequentis to get the system up and running. But by Monday morning, following a weekend of political pressure about the outage, NATS boss Richard Deakin admitted that an inquiry into the resilience of the UK airspace was needed.

“We are keen to do all we can at NATS to ensure the aviation industry has a full understanding of the capability that is in place in the UK and to take any further steps our customers and regulators decide are necessary to help avoid a repeat of last Saturday’s problems,” he said.

Deakin added that the error took 14 hours to resolve and claimed that NATS eventually "delivered over 90 per cent of an extremely busy schedule of flights during the day".

It was the first time such a serious technical flaw had occurred since the system was installed in 2002, he said.

But we can't help but agree with exasperated folk stranded at airports over the weekend who - quite reasonably - asked why such a failure could have happened in the first place with a critical system. Redundancy, much? ®
