Sun tripling RAID protection
Another salvo in the fight against disk failure
Comment The industry-standard RAID level for storage is RAID-6, which can recover from a double drive failure. But it's not going to be good enough as disk capacities increase: failed disk rebuilds take longer, which widens the window in which a third disk failure, arriving before recovery from a double drive failure is complete, becomes unrecoverable.
This point is made by Adam Leventhal of Oracle/Sun's Fishworks in a blog. He says hard drive capacity roughly doubles every year but hard drive bandwidth is pretty constant, so it takes longer and longer to write data to fill up a drive.
Other things being equal, a 500GB drive will take twice as long to write as a 250GB drive. Suppliers are now producing 2TB drives, which take four times as long to fill with data as a 500GB drive; Leventhal implies that will be about eight hours.
Assume 3TB drives are coming, then 4TB ones, and we're looking at 12 hours and 16 hours respectively for a rebuild of a full failed disk. Every added terabyte adds four hours to the rebuild time, so a 3TB rebuild already takes half a day. That raises the chance that a third drive will fail while the first and second failed drives are still being rebuilt.
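The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of the linear scaling described above: the function names are made up for this example, the four-hours-per-terabyte rate is taken from the eight-hour figure for a 2TB drive, and it assumes a full drive written at constant bandwidth with no other load on the array.

```python
# Illustrative sketch only: function names are hypothetical, and the
# 4 hours/TB rate is derived from the article's "2TB in about eight hours".

def rebuild_hours(capacity_tb: float, hours_per_tb: float = 4.0) -> float:
    """Linear estimate of the time to rewrite a full drive at constant bandwidth."""
    return capacity_tb * hours_per_tb


def implied_bandwidth_mb_s(capacity_tb: float, hours: float) -> float:
    """Sustained write rate (decimal MB/s) needed to fill the drive in `hours`."""
    return capacity_tb * 1_000_000 / (hours * 3600)


if __name__ == "__main__":
    for tb in (0.5, 2, 3, 4):
        print(f"{tb} TB drive: about {rebuild_hours(tb):.0f} hours to rebuild")
    # Write rate implied by filling 2TB in eight hours
    print(f"Implied bandwidth: {implied_bandwidth_mb_s(2, 8):.1f} MB/s")
```

On these assumptions the eight-hour figure corresponds to a sustained write rate of a little under 70MB/sec; a busy array that cannot devote full drive bandwidth to the rebuild would take longer.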
Leventhal has added triple-parity RAID to Sun's ZFS filesystem, calling it RAIDz3. He suggests that giving it a generic name such as RAID-7 or RAID-8 might be silly. RAID-6 is often known as RAID-DP though, so RAID-TP would seem logical. Leventhal says it too could be superseded if disk capacities keep on growing.
That has to be logically true, but if the use of 3.5-inch disks switches over to lower-capacity 2.5-inch drives, that would reduce failed disk rebuild times. It would also likely increase the number of drives in an array, putting us back, roughly speaking, at square one.
Triple-parity RAID-Z will be included in the next major software release for Oracle/Sun's 7000 series sometime in the third quarter of this year; in other words, in a few weeks. It's not a first though - Avante Digital had a triple-parity EasyRAID product in 2006.
We might expect triple-parity RAID to start appearing, perhaps as an option, in mainstream enterprise EMC, HDS, IBM and NetApp arrays, and third-party RAID controllers from next year. ®
> Self. Foot. Shoot.
> Enjoy the extra 20% capacity while you can.
See: http://queue.acm.org/detail.cfm?id=1317400 where it says...
DB What are the provocative problems in storage that are still outstanding, and does ZFS help? What’s next? What’s still left? What are the things that you see down the pike that might be the big issues that we’ll be dealing with?
JB There are not just issues, but opportunities, too. I’ll give you an example. We were looking at the spec sheets for one of the newest Seagate drives recently, and they had an awful lot of error-correction support in there to deal with the fact that the media is not perfect.
BM They’re pushing the limits of the physics on these devices so hard that there’s a statistical error rate.
JB Right, so we looked at the data rates coming out of the drive. The delivered bandwidth from the outer tracks was about 80 megabytes per second, but the raw data rate—the rate that is actually coming off the platter—was closer to 100. This tells you that some 20 percent of the bits on that disk are actually error corrections.
BM Error correcting, tracking, bad sector remapping.
JB Exactly, so one of the questions you ask yourself is, “Well, if I’m going to start moving my data-integrity stuff up into the file system anyway—because I can actually get end-to-end data integrity that way, which is always stronger—then why not get some additional performance out of the disk drive? Why not give me an option with this disk drive?” I’ll remap all the bad sectors, because we don’t even have to remap them. It suffices to allocate it elsewhere and basically deliberately leak the block that is defective. It wouldn’t take a whole lot of file-system code to do that.
Then you can say, “Put the drive in this mode,” and you’ve got a drive with 20 percent more capacity and 20 percent higher bandwidth because you’re running ZFS on top of it. That would be pretty cool.
DB That’s a really exciting idea. Have you had those discussions with the drive vendors about whether they would offer that mode?
BM Not quite, because they’re most interested in moving up the margin chain, if you will, and providing more unreliable devices that they sell at a lower cost; it isn’t really something they care to entertain all that thoroughly.
Make Up Your Minds
You went with RAID so you wouldn't have to buy as many disks. If you don't want the RAID rebuild time then mirror instead. ----- sheeshhhh!
and so it begins...
> I would like to see triple parity from ZFS and an option to bypass hardware error correction.
Self. Foot. Shoot.
Enjoy the extra 20% capacity while you can.