Boffins say flash disk demands new RAID designs
Wear-levelling levels out
Solid state disks (SSDs) are wonderfully fast, but every time you write to them, the semiconductors involved degrade just a little, a property that means you swap them into established storage rigs at your peril.
That's the conclusion of a new paper, “Stochastic Analysis on RAID Reliability for Solid-State Drives” from a pair of boffins at The Chinese University of Hong Kong.
The paper explains its purpose as follows:
“Traditional storage systems mainly use parity-based RAID to provide reliability guarantees by striping redundancy across multiple devices, but the effectiveness of RAID in SSDs remains debatable as parity updates aggravate the wearing and bit error rates of SSDs.
In particular, an open problem is that how different parity distributions over multiple devices, such as the even distribution suggested by conventional wisdom, or uneven distributions proposed in recent RAID schemes for SSDs, may influence the reliability of an SSD RAID array.”
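The question in that quote can be made concrete with some simple arithmetic. The Python sketch below is our own simplification, not the paper's model. It assumes every logical write is a small random write that rewrites one data chunk in a stripe plus that stripe's parity chunk. From that, it works out how many chunk writes each drive absorbs per logical write for a given parity layout. The function name per_device_write_load and the example parity shares are hypothetical.

```python
def per_device_write_load(parity_shares):
    """Expected chunk writes per logical small write, for each device.

    Hypothetical model: each logical write picks one of the N-1 data chunks
    in a stripe uniformly at random and also rewrites that stripe's parity
    chunk. parity_shares[i] is the fraction of stripes whose parity lives
    on device i.
    """
    n = len(parity_shares)
    if abs(sum(parity_shares) - 1.0) > 1e-9:
        raise ValueError("parity shares must sum to 1")
    loads = []
    for p in parity_shares:
        # A parity write whenever this device holds the stripe's parity,
        # otherwise a 1/(n-1) chance that it holds the chunk being written.
        loads.append(p + (1.0 - p) / (n - 1))
    return loads


even = per_device_write_load([0.2] * 5)                  # parity spread evenly
uneven = per_device_write_load([0.6, 0.1, 0.1, 0.1, 0.1])  # parity skewed to drive 0
print([round(x, 3) for x in even])
print([round(x, 3) for x in uneven])
```

With parity spread evenly, every drive takes 0.4 chunk writes per logical write, and all five wear out in lockstep. With 60 per cent of parity parked on one drive, that drive absorbs 0.7 writes against 0.325 for each of its peers. The drives therefore reach the end of their lives at different times, which is roughly the intuition behind uneven schemes such as Diff-RAID.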
It's worth asking those questions of SSDs in RAID arrays because the all-silicon disks wear out over time. Disk-makers try to give their products the longest possible working life with a trick called wear-levelling. It spreads writes across the flash inside an SSD so that no region is written and erased much more often than the others. The idea is that all parts of an SSD age gracefully at about the same rate, instead of some regions being beaten to death at a young age.
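Wear-levelling algorithms inside real controllers are proprietary and far more involved, but the core idea fits in a few lines of Python. The toy allocator below keeps free flash blocks in a min-heap keyed on erase count and always hands out the least-worn one. Its class and method names are hypothetical.

```python
import heapq


class WearLevellingAllocator:
    """Toy dynamic wear-levelling: always hand out the least-erased free block.

    Hypothetical illustration only; real SSD controllers also perform static
    wear-levelling, garbage collection and bad-block management.
    """

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        # Min-heap of (erase_count, block_id) for blocks ready to be written.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        """Return the free block with the fewest erase cycles."""
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block):
        """Erase a block whose data is stale and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


alloc = WearLevellingAllocator(num_blocks=4)
# Simulate 100 write/erase cycles: write to the least-worn free block,
# then erase it once its data has been superseded.
for _ in range(100):
    blk = alloc.allocate()
    alloc.release(blk)
print(alloc.erase_counts)  # [25, 25, 25, 25]
```

A naive allocator that kept reusing block 0 would pile all 100 erasures onto a single block. This one spreads them evenly across all four.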
Wear-levelling gets better with each passing year, but when data preservation really matters it is still disconcerting to know that bits of your disk might drop off the twig. Doubly so if you've gone to all the trouble of building a RAID-5 array to access the extra level of data protection it affords.
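RAID-5's protection comes from XOR parity. Each stripe stores the XOR of its data chunks, which lets the array rebuild any one lost chunk. The same mechanism means every small write also rewrites the parity chunk, and that extra wear is what the paper worries about. Here is a minimal Python sketch, with made-up four-byte chunks standing in for real stripe units:

```python
from functools import reduce


def xor_bytes(*chunks):
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# One stripe across a hypothetical four-drive RAID-5 set: three data chunks plus parity.
data = [b"SSD!", b"RAID", b"wear"]
parity = xor_bytes(*data)

# If the drive holding data[1] fails, XOR of the surviving chunks rebuilds it.
rebuilt = xor_bytes(data[0], data[2], parity)
assert rebuilt == data[1]

# A small write to data[1] also forces a parity rewrite (read-modify-write):
# new_parity = old_parity XOR old_data XOR new_data.
new_chunk = b"raid"
parity = xor_bytes(parity, data[1], new_chunk)
data[1] = new_chunk
assert parity == xor_bytes(*data)
print("rebuild and parity update OK")
```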
The pair therefore devised tests to see how much data survives, and how much is lost, on SSD arrays configured under the Diff-RAID scheme and under conventional RAID-5. The study offers a very, very complex test and measurement methodology involving many reads, writes and erasures of data on SSDs under various conditions, followed by measurement of how healthy the SSDs are at the end of the process.
Tests assume up to a terabyte of traffic passing through a disk each day, a not-unreasonable assumption given SSDs are often targeted at high-I/O chores.
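To see why a terabyte a day is a meaningful load, here is a back-of-envelope endurance estimate in Python. The drive capacity, rated program/erase (P/E) cycles and write-amplification factor are illustrative assumptions, not figures from the paper. The calculation also assumes perfect wear-levelling.

```python
def endurance_days(capacity_tb, pe_cycles, host_tb_per_day, write_amplification):
    """Rough days until a drive's rated program/erase budget is used up.

    Assumes perfect wear-levelling: writes are spread evenly, so every
    block reaches its P/E limit at the same time.
    """
    total_writable_tb = capacity_tb * pe_cycles                 # raw flash write budget
    flash_tb_per_day = host_tb_per_day * write_amplification    # physical flash writes per day
    return total_writable_tb / flash_tb_per_day


# Hypothetical drive: 512 GB, 3,000 rated P/E cycles, write amplification of 2.
days = endurance_days(capacity_tb=0.512, pe_cycles=3000,
                      host_tb_per_day=1.0, write_amplification=2.0)
print(f"{days:.0f} days (~{days / 365:.1f} years)")  # 768 days (~2.1 years)
```

Under those assumptions, a drive fed a terabyte a day would exhaust its rated P/E budget in a little over two years. Any extra parity writes landing on the drive would shorten that further.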
The results show that under some circumstances all-SSD RAID arrays do well. On other occasions things get nasty, especially as the drives age. And despite wear-levelling, it seems the onset of ageing, and with it bit loss, was detectable in the pair's tests.
The conclusion? “Performance and reliability analysis on RAID in the context of hard disk drives has been extensively studied. SSDs have a distinct property that their error rates increase as they wear down, so a new model is necessary to characterize the reliability of SSD RAID.” The paper offers that new model, for those willing to wade through some rather dense maths.
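The paper's own model is a considerably more elaborate stochastic analysis, and the Python sketch below is not it. It is a deliberately simplified illustration of the core point, built on three assumptions. First, each chunk's chance of an uncorrectable error grows exponentially with wear; the exponential form and the q0 and tau values are made up. Second, chunks fail independently. Third, a five-drive RAID-5 stripe is lost once two or more of its chunks are bad, because parity can only save it from one. The function names are hypothetical.

```python
import math


def chunk_error_prob(pe_cycles, q0=1e-6, tau=1000.0):
    """Hypothetical per-chunk uncorrectable-error probability that grows with wear.

    The exponential form and the q0/tau values are illustrative assumptions,
    not parameters taken from the paper.
    """
    return min(1.0, q0 * math.exp(pe_cycles / tau))


def stripe_loss_prob(chunk_probs):
    """Probability that two or more chunks in a stripe are bad at once,
    assuming chunks fail independently (RAID-5 can repair only one)."""
    # dist[k] = probability that exactly k of the chunks seen so far are bad.
    dist = [1.0]
    for q in chunk_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - q)   # this chunk is fine
            nxt[k + 1] += p * q       # this chunk is bad
        dist = nxt
    # Summing the tail directly avoids cancellation when probabilities are tiny.
    return sum(dist[2:])


# Five drives with identical wear, as with evenly distributed parity.
for cycles in (0, 1000, 2000, 3000):
    q = chunk_error_prob(cycles)
    print(f"{cycles:>5} P/E cycles: stripe loss ~ {stripe_loss_prob([q] * 5):.2e}")
```

With these invented parameters, the stripe-loss probability climbs by a factor of roughly e squared, about 7.4, for every 1,000 P/E cycles the drives accumulate. A model with a constant, hard-disk-style error rate would predict no change at all as the drives age.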
If that's not your bag, another takeaway is simple: you can't assume SSDs are always as reliable as spinning rust, especially when you use them in the same way. ®