Cache bang wallop! Huge loads spark NetApp flash crash
Sysadmins gnash at FAS6000 array fault
Some high-end NetApp FAS6000 arrays are suffering failures that cause them to halt and restart. NetApp is fixing the problem.
NetApp Flash Cache PCIe card
El Reg understands that several FAS6000 customers in Europe have discovered that their arrays stop working while under heavy load and abruptly restart. This interrupts the arrays' ability to satisfy data I/O requests from accessing servers and so slows down applications running on those machines.
These arrays use NetApp's Flash Cache [PDF], a three-quarter-length PCIe x8 card containing NAND solid-state chips that form a read cache. This memory handles data I/O that would otherwise shift ten times more slowly if it had to be accessed from the array's hard disk drives.
A FAS6000 array can have up to 6TB of flash cache spread across twelve 512GB cards or modules.
The cache has a customisable FPGA chip that controls the system and caching activities. Suspicion has fallen on this chip as the source of the problem.
The company is working hard, as we might imagine, to solve the fault, and said:
"NetApp is aware that an individual hardware component failure has occurred in a limited number of customers’ high-end systems. NetApp's systems are designed to be fault tolerant to enable continuous operation in the event that a failure does occur.
"In this instance, we have identified the hardware component and we are currently introducing resolutions to our customers. NetApp remains committed to providing the best experience with our products and we are working closely with our customers to resolve the situation as quickly as possible."
We have been told that affected customers have agreements with NetApp to not publicly disclose the issue. This has not been confirmed. ®