AMD reveals Opteron crash bugs

Fixes pipelined

AMD has pledged to fix three bugs in its Opteron chips that, if unremedied, could cause host systems to crash in certain circumstances.

In a June 2004 Opteron and Athlon 64 Revision Guide (PDF), the chip maker lists three processor glitches - known in the trade as 'errata' - that could mean the "system may hang".

A misaligned 128-bit store could cause "processor deadlock" when "a 128-bit store operation (MOVUPS, MOVUPD, MOVDQU) occurs to a cacheable memory type. The store is misaligned across two cache lines such that the upper eight bytes span a cache line boundary. The store has retired but not yet written the data cache. The store is followed by two other load or store operations to the same cache index as the second half of the misaligned store (ie. bits 11:6 are the same)."
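
To make those conditions concrete, here is a minimal Python sketch of the address arithmetic. It assumes 64-byte cache lines (implied by the guide's use of bits 11:6 as the cache index) and reads "the upper eight bytes span a cache line boundary" as meaning bytes 8 to 15 of the store straddle the boundary. The function names are our own illustration, not anything from AMD.

```python
LINE_BYTES = 64  # assumption: bits 11:6 as the index implies 64-byte lines


def cache_index(addr: int) -> int:
    """Return bits 11:6 of an address, the cache index quoted in the erratum."""
    return (addr >> 6) & 0x3F


def upper_half_spans_boundary(addr: int) -> bool:
    """True if bytes 8..15 of a 16-byte store at addr straddle a line boundary."""
    first_line = (addr + 8) // LINE_BYTES
    last_line = (addr + 15) // LINE_BYTES
    return first_line != last_line


def same_index_as_second_half(store_addr: int, other_addr: int) -> bool:
    """True if other_addr maps to the same cache index as the store's last byte."""
    return cache_index(other_addr) == cache_index(store_addr + 15)
```

Under these assumptions, a 16-byte store at offset 52 within a line qualifies, while one at offset 56 does not, because its upper half then sits wholly in the second line.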

There's no workaround, but AMD believes it "unlikely" that such a set of circumstances will come together.

Next, there's a potential deadlock with "tightly coupled semaphores" in an MP system, in which "a write location may not become externally visible due to certain internal pipeline conditions involving tightly coupled semaphores across multiple processors.

"1. Processor A does a write to clear processor B's semaphore but that write has not yet become visible to the system.

"2. Processor B is waiting for its semaphore to be released before releasing processor A's semaphore.

"3. Processor A immediately enters a spin loop waiting for its semaphore to be cleared by processor B, and the spin loop must fetch from the instruction cache (IC) on every cycle.

"4. Because the IC is busy every cycle combined with other highly specific internal pipeline conditions, processor A's original write is prevented from being seen by processor B. Additionally, event 3 (above) must follow event 1 closely in time and interrupts must be disabled."

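The four events can be traced with a toy model. The sketch below is purely illustrative: it is ordinary sequential Python, not a reproduction of the hardware, and the `write_drains` flag is a hypothetical stand-in for whether processor A's write ever becomes visible to the rest of the system.

```python
def run(write_drains: bool, max_steps: int = 10) -> str:
    """Model the semaphore handshake; return 'progress' or 'hang'."""
    visible = {"sem_a": 1, "sem_b": 1}  # 1 = held, 0 = cleared
    pending = {"sem_b": 0}  # event 1: A's write, not yet externally visible

    for _ in range(max_steps):
        # In the healthy case the pending write reaches the system.
        if write_drains and pending:
            visible.update(pending)
            pending.clear()

        # Event 2: B releases A's semaphore only once its own is cleared.
        if visible["sem_b"] == 0:
            visible["sem_a"] = 0

        # Event 3: A spins until B clears its semaphore.
        if visible["sem_a"] == 0:
            return "progress"

    # Event 4: the write never drained, so both sides wait indefinitely.
    return "hang"
```

Calling `run(True)` models the normal case, while `run(False)` models the erratum: each processor waits on a release that the other cannot deliver.
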
In this case, AMD provides a BIOS setting that should eliminate the problem, but says it intends to fix the issue in a future chip revision. The same goes for the final potential crash, in which a reverse REP MOVS may yield "unpredictable behaviour".

"In certain situations a REP MOVS instruction may lead to incorrect results," says AMD. "An incorrect address size, data size or source operand segment may be used or a succeeding instruction may be skipped. This may occur under the following conditions:

"1. EFLAGS.DF=1 (the string is being moved in the reverse direction).

"2. The number of items being moved (RCX) is between 1 and 20.

"3. The REP MOVS instruction is preceded by some microcoded instruction that has not completely retired by the time the REP MOVS begins execution. The set of such instructions includes BOUND, CLI, LDS, LES, LFS, LGS, LSS, IDIV, and most microcoded x87 instructions."

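Taken together, the three conditions amount to a simple predicate. The following is a hedged sketch, not a detection tool: the function and parameter names are hypothetical, "between 1 and 20" is assumed to be inclusive, and only the instructions AMD names are listed, since the guide does not enumerate the affected x87 instructions.

```python
from typing import Optional

# Only the instructions named in AMD's text; the x87 cases are not enumerated.
NAMED_MICROCODED = {"BOUND", "CLI", "LDS", "LES", "LFS", "LGS", "LSS", "IDIV"}


def rep_movs_at_risk(direction_flag: int, count: int,
                     unretired_predecessor: Optional[str]) -> bool:
    """True if all three of the quoted conditions hold at once."""
    reverse_move = direction_flag == 1          # condition 1: EFLAGS.DF=1
    small_count = 1 <= count <= 20              # condition 2: RCX in 1..20
    risky_predecessor = (                       # condition 3: unretired microcode
        unretired_predecessor is not None
        and unretired_predecessor.upper() in NAMED_MICROCODED
    )
    return reverse_move and small_count and risky_predecessor
```

For example, a 10-item reverse move issued straight after a still-unretired CLI meets all three conditions, whereas the same move with DF=0 does not.
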
Again, AMD plans to fix the issue in a future chip revision, and advises users and system builders to seek out a BIOS update that will solve the problem.

The reverse REP MOVS issue is the main addition to the Guide's June edition - the other two problems first surfaced in April. The chip maker lists a second addition to the June Guide: a specification violation made by "some processor revisions" in which their Rtt pins do not meet HyperTransport specifications. AMD has said it will fix this too, but at least "there are no known failures related to this problem". ®
