The Linux-AMD AGP bug – who's to blame?

Cache, check or card?

After arranging a tête-à-tête between two representatives of the Linux kernel team and AMD, Gentoo founder Daniel Robbins has drawn some initial conclusions on the bug affecting Linux users on AMD Athlon AGP systems.

Daniel concludes that neither side is to blame: it isn't an AMD bug so much as a feature, but he also recommends that kernel hackers find a new solution to this particular situation. It appears that the GART (Graphics Address Remapping Table), which feeds the AGP card with system memory, isn't cache coherent, although both Linux and, apparently, Windows 2000 expect it to be.

That's a drastic oversimplification of Daniel's postings on the Gentoo front page this morning and his explanation on the kernel mailing list.

The GART presents memory to the AGP card as one contiguous block, but the real memory being addressed, of course, lies all over the place, allocated in 4k pages scattered across main system memory. Intel added 4MB pages, and the temporary workaround for both Linux and Windows 2000 is to disable the 4MB page option.
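For readers who want to picture the remapping, here is a toy model in Python. It is a sketch under our own assumptions rather than how any real chipset or driver does it: the class name `ToyGart`, the aperture base and the physical page addresses are all made up. Each table entry points at one 4k page, so consecutive addresses in the aperture can land on pages scattered anywhere in RAM.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # the toy GART maps memory in 4k pages

class ToyGart:
    """Hypothetical model: one table entry per 4k page of the AGP aperture."""

    def __init__(self, aperture_base: int, physical_pages: list[int]):
        # Backing pages must be page-aligned, but they need not be adjacent.
        for addr in physical_pages:
            if addr % PAGE_SIZE:
                raise ValueError(f"{addr:#x} is not 4k aligned")
        self.aperture_base = aperture_base
        self.table = list(physical_pages)  # index = page number within aperture

    def translate(self, bus_addr: int) -> int:
        """Map an address inside the contiguous aperture to real memory."""
        offset = bus_addr - self.aperture_base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            raise ValueError(f"{bus_addr:#x} is outside the aperture")
        page, within_page = divmod(offset, PAGE_SIZE)
        return self.table[page] + within_page

# Three adjacent aperture pages backed by scattered physical pages (made-up addresses).
gart = ToyGart(0xE0000000, [0x01234000, 0x0009A000, 0x07FFF000])
for bus in (0xE0000000, 0xE0001010, 0xE0002FFF):
    print(hex(bus), "->", hex(gart.translate(bus)))
```

Three consecutive 4k slots in the aperture resolve to three unrelated physical pages: the card sees one block, while the memory controller sees three.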

AMD concludes: "Our conclusion is that the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory...
When the cache-line eviction occurs the stale data written to physical memory has fatal side effects."
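AMD's warning can be acted out in a few lines. This is a deliberately crude Python model of our own devising: the class and method names are hypothetical, and real Athlon cache protocols are far more involved. The CPU holds a copy of a line through a cacheable mapping. The graphics card then updates the same physical memory through the GART without the cache noticing, and evicting the cached copy writes the stale data back over the card's.

```python
class ToyMemorySystem:
    """Hypothetical model: DRAM plus one CPU write-back cache, keyed by line address."""

    def __init__(self):
        self.memory = {}  # line address -> value held in DRAM
        self.cache = {}   # line address -> (value, dirty flag)

    def cpu_read(self, line):
        # Cacheable read: fill the line from DRAM on a miss, otherwise hit in cache.
        if line not in self.cache:
            self.cache[line] = (self.memory.get(line), False)
        return self.cache[line][0]

    def cpu_write(self, line, value):
        # Cacheable write: only the cached copy changes, and it becomes dirty.
        self.cache[line] = (value, True)

    def device_write(self, line, value):
        # AGP card writes through the GART straight to DRAM; in this model
        # the CPU cache is neither snooped nor invalidated.
        self.memory[line] = value

    def evict(self, line):
        # Dirty lines are written back on eviction; clean ones are just dropped.
        value, dirty = self.cache.pop(line)
        if dirty:
            self.memory[line] = value

# Dirty alias: the card's data is overwritten when the stale line is evicted.
m = ToyMemorySystem()
m.cpu_write(0x1000, "cpu data")
m.device_write(0x1000, "texture")
m.evict(0x1000)
print(m.memory[0x1000])
```

In this model, a dirty line clobbers the card's data on eviction. A clean line is simply dropped, although until then the CPU keeps reading the stale copy.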

With our limited knowledge of PC hardware architecture - and we trust Register readers can explain this one for us - we can't quite see how that relates to the 4k/4MB page size option. Why can't a simple flush clear the cache, we wonder? Let us know.

Nevertheless, disabling 4MB pages appears to do the trick, all agree.
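Here is one hedged way to connect the page-size option to caching. On x86, cache-control bits are set per page-table entry, so an attribute such as "uncacheable" can only be applied at the granularity of the page that entry maps. The Python arithmetic below counts how many ordinary 4k pages would share an entry with the GART's pages under each page size; the function names and addresses are hypothetical. It illustrates granularity only and makes no claim about the precise mechanism behind the bug.

```python
SMALL_PAGE = 4 * 1024         # 4k page
LARGE_PAGE = 4 * 1024 * 1024  # 4MB page

def covering_page(phys_addr, page_size):
    """Base address of the page of the given size that contains phys_addr."""
    return phys_addr - (phys_addr % page_size)

def collateral_pages(gart_pages, page_size):
    """Count 4k pages that share a page-table entry with some GART page
    without being GART pages themselves (hypothetical helper)."""
    gart = set(gart_pages)
    for addr in gart:
        if addr % SMALL_PAGE:
            raise ValueError(f"{addr:#x} is not 4k aligned")
    entries = {covering_page(addr, page_size) for addr in gart}
    small_pages_covered = len(entries) * (page_size // SMALL_PAGE)
    return small_pages_covered - len(gart)

gart_pages = [0x01234000, 0x0009A000, 0x07FFF000]  # made-up GART-backed pages
print(collateral_pages(gart_pages, SMALL_PAGE))  # 0
print(collateral_pages(gart_pages, LARGE_PAGE))  # 3069
```

With 4k entries, each GART page can carry its own attribute. With 4MB entries, the same three pages drag 3,069 unrelated neighbours along, unless the large page is split.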

There's a bird's-eye view of where the GART fits into the scheme of things at Anandtech.
