Leap second bug cripples Linux servers at airlines, Reddit, LinkedIn
Not a good time to be Australian
The leap second inserted at the weekend crippled Linux-powered servers running one of the world’s largest airline reservation systems - delaying and cancelling flights.
Machines running the mighty Amadeus Altea system were brought down soon after an extra second was added to Coordinated Universal Time (UTC) at midnight on Saturday, 30 June. The bonus second was inserted at the direction of time boffins to keep UTC synchronised with Earth's slowing rotation.
The Altea system was taken offline for an hour, and staff at Qantas and Virgin Australia had to check in passengers manually, disrupting flight plans.
A spokesperson for Amadeus confirmed to The Reg today that the outage had been caused by a bug in the kernel of the open-source Linux operating system, and that the flaw was triggered by the leap-second change on Saturday night. He said the problem had been sidestepped with a workaround within an hour, but Amadeus is investigating how to avoid and detect similar bugs in advance.
Servers run by Mozilla, StumbleUpon, Yelp, FourSquare, Reddit and LinkedIn were also reported to have been hit by the same bug. Mozilla said its implementation of the Java-based Hadoop data processing framework and ElasticSearch weren’t working properly on Saturday evening.
Mozilla’s Eric Ziegenhorn posted at 0517 PT, minutes after the leap second was added:
Servers running Java apps such as Hadoop and ElasticSearch and Java doesn't appear to be working. We believe this is related to the leap second happening tonight because it happened at midnight GMT.
Of all these, however, it’s the Altea outage that was by far the most troubling: Amadeus provides the backend booking and reservation system for a growing number of the world’s airlines. Amadeus claims it is the world’s second-largest processor of online bookings, and it is reported to handle 25 per cent of the world’s 84,000 daily flights.
Amadeus claims 135 airlines have implemented its Altea reservation system, more than 100 have purchased Altea inventory and 60 use Altea departure control. Amadeus rewrote Altea from a mainframe app in 2004 and moved it to Unix-like systems to keep up with changing demands.
Rolled out in 2005, Altea is a set of software modules for booking and reservations that run on Linux and Unix servers, using Java Enterprise Edition, Spring and Apache. Amadeus built the system to move off ageing big iron.
The leap second was added by the International Earth Rotation and Reference Systems Service to compensate for the Earth’s uneven rotation. UTC is the time standard used by clocks, devices and applications, as well as POSIX-compliant operating systems, and a second is periodically inserted to keep it in step with the planet. There have traditionally been three ways of implementing the change, which are described here.
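Why an extra second is awkward for POSIX-style clocks can be seen in a few lines of Python. This is a minimal sketch for illustration only, not code from any of the affected systems, and the variable names are made up for the example. POSIX time counts every day as exactly 86,400 seconds, so the inserted second, 23:59:60 UTC, has no timestamp of its own and lands on the same value as the midnight that follows it.

```python
import calendar

# 23:59:60 UTC on 30 June 2012 is the inserted leap second.
leap_second = calendar.timegm((2012, 6, 30, 23, 59, 60))

# 00:00:00 UTC on 1 July 2012 is the first second of the next day.
next_midnight = calendar.timegm((2012, 7, 1, 0, 0, 0))

# Start of 30 June, used to measure the length of the day in POSIX seconds.
day_start = calendar.timegm((2012, 6, 30, 0, 0, 0))

# Both instants collapse onto one POSIX timestamp.
print(leap_second == next_midnight)

# The day is still counted as 86,400 seconds, despite lasting 86,401.
print(next_midnight - day_start)
```

Because two distinct instants share one number, a system has to repeat, step or stretch a second somewhere to absorb the difference.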
Linux distro biz Red Hat published a patch for its Enterprise Linux here - patches for other flavours of the OS are circulating. It is believed the leap second causes the Linux kernel to livelock when the system attempts to adjust the time and date of the computer, causing processors to spin their wheels doing nothing and hampering services as a result.
Google has modified its internal NTP time servers to gradually add a couple of milliseconds to its regular clock adjustments during a window before the actual leap second is required. ®
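The idea of spreading the extra second over a window can be sketched roughly as follows. This is a hypothetical illustration, not Google's code: the linear ramp, the 20-hour window and the name `smeared_offset` are all assumptions made for the example, since the details of the real adjustment are not given here.

```python
LEAP = 1.0  # one extra second to absorb over the window


def smeared_offset(elapsed: float, window: float) -> float:
    """Return how much of the leap second has been absorbed so far.

    `elapsed` is the number of seconds since the smear window opened and
    `window` is its total length in seconds. A linear ramp is assumed.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Clamp so the offset is 0 before the window and LEAP after it.
    fraction = min(max(elapsed / window, 0.0), 1.0)
    return LEAP * fraction


if __name__ == "__main__":
    window = 20 * 3600.0  # assumed 20-hour window
    for hours in (0, 5, 10, 20):
        offset_ms = smeared_offset(hours * 3600.0, window) * 1000.0
        print(f"{hours:2d}h into window: {offset_ms:.1f} ms absorbed")
```

By the time the leap second arrives, clients following such a server have already absorbed the full second in small steps, so their clocks never have to jump.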