Bug kills Intel gig-E controllers
Detective work at Layer 2
Star2Star CTO Kristian Kielhofner has identified a buggy implementation of the Intel 82574L Ethernet controller that leaves some kit vulnerable to a “packet of death”, which hangs the port that receives it.
His lengthy account of how he found the bug is on his personal blog, but it boils down to this: with the right combination of SIP traffic, a packet carrying the hex value 32 (ASCII 2) at offset 0x47f would crash the interface that received it.
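As a rough illustration of that trigger condition, the following Python sketch flags captured frames that carry the byte 0x32 at offset 0x47f. It assumes the offset is counted from the first byte of the raw frame as captured, which the article does not spell out, and the function and constant names are hypothetical. A match only means the frame has the suspect byte in the suspect position; it does not prove that the frame would hang an interface.

```python
# Values taken from the article; the names are illustrative only.
SUSPECT_OFFSET = 0x47F  # byte position reported as the trigger location
SUSPECT_VALUE = 0x32    # hex 32, i.e. ASCII '2'


def has_suspect_byte(frame: bytes,
                     offset: int = SUSPECT_OFFSET,
                     value: int = SUSPECT_VALUE) -> bool:
    """Return True if `frame` is long enough to reach `offset`
    and holds `value` at that position.

    Assumption: `offset` is measured from the start of the captured frame.
    """
    return len(frame) > offset and frame[offset] == value
```

Run over a packet capture, a check like this could help narrow down which frames deserve a closer look before testing against real hardware.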
Unfortunately, Kielhofner says, an EEPROM fix is required – so what worked in his circumstances might not work for others. He notes that Intel has a fix.
Now for the detective story. The issue arose with customer complaints that Star2Star-branded hardware was crashing randomly:
“The system and ethernet interfaces would appear fine and then after a random amount of traffic the interface would report a hardware error (lost communication with PHY) and lose link. Literally the link lights on the switch and interface would go out.”
After lots of packet captures, he writes, “I ended up tracing this (Asterisk) response to a specific phone manufacturer’s INVITE”. Further investigation led to an SDP quirk in which “Problem packets had just the right Call-ID, tags, and branches to cause the ‘2’ in the ptime to line up with 0x47f.”
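To see how variable-length SIP fields could push the ptime digit onto a particular byte position, here is a minimal Python sketch that reports where the first digit of an SDP `a=ptime:` attribute lands in a captured frame. The helper name is hypothetical, and it assumes the frame is available as raw bytes with the SDP body unencrypted and the offset counted from the start of the frame.

```python
from typing import Optional

PTIME_MARKER = b"a=ptime:"  # SDP attribute that carries the packetisation time


def ptime_digit_offset(frame: bytes) -> Optional[int]:
    """Return the offset of the first byte after 'a=ptime:' in `frame`,
    or None if the attribute is not present.

    Assumption: offsets are measured from the first byte of `frame`.
    """
    idx = frame.find(PTIME_MARKER)
    if idx == -1:
        return None
    return idx + len(PTIME_MARKER)
```

Because Call-ID, tag and branch values vary in length from call to call, this offset shifts between otherwise similar packets, which is consistent with the seemingly random failures customers reported.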
And – here’s where the problem stops being specific and becomes something that can conceivably have a wider impact – with the dangerous packet contents identified, Kielhofner was then able to reproduce it outside the VoIP world:
“With a modified HTTP server configured to generate the data at byte value (based on headers, host, etc) you could easily configure an HTTP 200 response to contain the packet of death - and kill client machines behind firewalls”. Whether that works, he also notes, depends on the firewall type.
He’s posted a test here to allow others to see if they’re using affected interfaces. ®
Update - Intel Responds: Intel has provided this statement to The Register:
Intel was made aware of this issue in September 2012 by the blog’s author. Intel worked with the author as well as the original motherboard manufacturer to investigate and determine root cause. Intel root caused the issue to the specific vendor’s mother board design where an incorrect EEPROM image was programmed during manufacturing. We communicated the findings and recommended corrections to the motherboard manufacturer.
It is Intel’s belief that this is an implementation issue isolated to a specific manufacturer, not a design problem with the Intel 82574L Gigabit Ethernet controller. Intel has not observed this issue with any implementations which follow Intel’s published design guidelines. ®