FYI: Processor bugs are everywhere – just ask Intel and AMD

More chip flaws await

By Thomas Claburn in San Francisco

Posted in Data Centre, 26th January 2018 20:38 GMT

In 2015, Microsoft senior engineer Dan Luu forecast a bountiful harvest of chip bugs in the years ahead.

"We’ve seen at least two serious bugs in Intel CPUs in the last quarter, and it’s almost certain there are more bugs lurking," he wrote. "There was a time when a CPU family might only have one bug per year, with serious bugs happening once every few years, or even once a decade, but we’ve moved past that."

Thanks to growing chip complexity, compounded by hardware virtualization, and reduced design validation efforts, Luu argued, the incidence of hardware problems could be expected to increase.

This month's Meltdown and Spectre security flaws, which affect chip designs from AMD, Arm, and Intel to varying degrees, support that claim. But there are many other examples.

Last March, a bug affecting AMD's Ryzen chips was patched with a workaround. And in June, AMD replaced some Ryzen 7 chips that weren't tuned to perform well under load.

That same summer, problems with hyperthreading surfaced in Intel's Skylake and Kaby Lake processors.

In February last year, clock problems with Intel's Atom C2000 chips surfaced, requiring widespread replacement.

Webpage slinger Cloudflare this month recounted a problem with Intel's Broadwell chips that it encountered last year.

In February 2017, while fixing a security issue the company dubbed Cloudbleed, Cloudflare engineers spotted a number of unexplained NGINX process crashes.

These segmentation faults (SIGSEGV) killed server processes intermittently but often enough to attract attention because the company runs so many servers.

The crashes produced core dumps, and sifting through them takes effort because they can be several gigabytes in size.

After ruling out memory errors, explains Cloudflare systems engineer David Wragg in a blog post, those working on the issue noticed a common factor: the crashes were all occurring on Intel Xeon E5-2650 v4 servers.
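The kind of correlation that exposes a common factor like this can be sketched in a few lines of Python. This is an illustration only, not Cloudflare's tooling: the function name, the list of crashing hosts, and the host-to-CPU inventory are all hypothetical.

```python
from collections import Counter


def crashes_by_cpu_model(crash_hosts, host_cpu_models):
    """Count crash reports per CPU model.

    crash_hosts: iterable of host names, one entry per crash (hypothetical data).
    host_cpu_models: dict mapping host name to CPU model string (hypothetical inventory).
    Hosts missing from the inventory are counted under "unknown".
    """
    return Counter(host_cpu_models.get(host, "unknown") for host in crash_hosts)
```

If one model accounts for nearly every crash while making up only part of the fleet, hardware becomes a plausible suspect.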

Suspicions of a hardware problem were validated when engineers noticed an entry in Intel's errata for that processor model.

"The Specification Update described 85 issues, most of which are obscure issues of interest mainly to the developers of the BIOS and operating systems," said Wragg. "But one caught our eye: 'BDF76 An Intel Hyper-Threading Technology Enabled Processor May Exhibit Internal Parity Errors or Unpredictable System Behavior.'"

Intel fixed issue BDF76 with a microcode patch, which Cloudflare applied via a BIOS update from its server vendor. After the patch was applied, the number of unexplained core dumps dropped significantly.
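For readers who want to see which microcode revision their own machines are running, here is a minimal sketch, assuming a Linux x86 system where /proc/cpuinfo exposes "model name" and "microcode" fields. The function name is our own, and the sketch says nothing about which revision contains the BDF76 fix.

```python
def parse_cpuinfo(text):
    """Return sorted unique (model name, microcode revision) pairs.

    text: contents of /proc/cpuinfo, where each logical CPU is a block of
    "key : value" lines and blocks are separated by a blank line.
    """
    pairs = set()
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not key/value pairs
                fields[key.strip()] = value.strip()
        if "model name" in fields:
            # Not every platform reports a microcode field; fall back to "unknown".
            pairs.add((fields["model name"], fields.get("microcode", "unknown")))
    return sorted(pairs)
```

On a Linux box, passing in the text read from /proc/cpuinfo should return one pair per distinct processor model and revision, which can then be compared before and after a BIOS update.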

Expect more hardware flaws to come. ®
