Flash and the five-minute rule
NAND then there was disruption
Comment "Flash is a better disk ... and disk is a better tape." So said distinguished Microsoft engineer Jim Gray in 2006.
Almost two decades earlier, in 1987, he had formulated his famous five-minute rule of thumb, which said that it is better to cache disk-held data in DRAM if it is going to be re-used within five minutes. The idea was that, given the relative costs of DRAM, disk capacity, I/Os and data page sizes, it would cost less to cache such data in DRAM than keep it on disk.
This proved to be an enduring rule and the coming of affordable flash now means that it can be used to say when data should be stored in NAND or disk or DRAM. Given the steep price declines of flash, the predictions from it are that flash will become the place to store active data, not disk.
This picture of flash's prospects was presented by David Dale of NetApp at an SNIA data centre event in London on Tuesday. His session was called The Benefits of Solid State in Enterprise Storage Systems, and he set aside NetApp partiality to present an industry picture. That's the value of SNIA sessions like this - vendor reps don't sell or market their products, they educate.
Dale's pitch was fascinating and gave a glimpse of the turmoil that lies in wait for storage and server vendors that don't embrace flash, turmoil that can be predicted by Gray's five-minute rule.
Five-minute rule formula
The rule is worked out like this: you calculate the break-even reference interval (RI) in seconds, and any data that is re-used more often than that is worth caching. The formula is:
RI = (Data pages per MB / IOPS per disk) x (price of disk / price per MB of RAM)
According to this article 1MB of RAM cost $5,000 in 1987, a 15 IOPS disk cost $15,000 and the data page size was 1KB. The RI formula, using these figures, works out at 205 secs. Data used more often than that should be kept in RAM with the rest on disk.
By 1997 prices and IOPS had changed. The MB of RAM cost $15, a 64 IOPS disk cost $2,000 and the page size was 8KB. The RI works out to pretty much the same: 267secs.
Fast forward to 2008, and the MB of RAM now costs 10 cents, the disk plus controller costs $650 and spurts out 183 IOPS - this is a 15K, 2.5-inch enterprise drive - and the RI is 2,273 secs, or 38 minutes. Oops. You can't feasibly hold in RAM all the data that will be re-used within 38 minutes, so the alternative is to increase the page size. A 64KB one gives an RI of 568 seconds (nine minutes), which is better but still too high.
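The arithmetic behind those data points can be checked with a minimal Python sketch of the RI formula. The function and parameter names are illustrative, not taken from Gray's work or Dale's slides, and the 16KB page size used for the first 2008 case is an assumption inferred from the quoted 2,273-second result, since the text does not state it.

```python
def break_even_interval(page_kb, disk_iops, disk_price, ram_price_per_mb):
    """Break-even interval in seconds, following the RI formula in the text."""
    pages_per_mb = 1024 / page_kb
    return (pages_per_mb / disk_iops) * (disk_price / ram_price_per_mb)


# (page size in KB, IOPS per disk, disk price in $, RAM price in $ per MB)
# All figures are those quoted in the article, except the 16KB page size,
# which is an assumption inferred from the 2,273-second result.
scenarios = {
    "1987, 1KB page": (1, 15, 15_000, 5_000),
    "1997, 8KB page": (8, 64, 2_000, 15),
    "2008, 16KB page (assumed)": (16, 183, 650, 0.10),
    "2008, 64KB page": (64, 183, 650, 0.10),
}

for label, args in scenarios.items():
    print(f"{label}: {break_even_interval(*args):.0f} seconds")
```

Run as written, this gives 205, 267, 2273 and 568 seconds, matching the figures above.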
Enter the flash dragon. It's less expensive, generally speaking, than disk drives were in 1987 and this changes things. Flash is the disruptive technology that brings this RI discontinuity back into balance - that was the substance of Dale's SNIA pitch.
It has 100 times better IOPS per dollar and a thousand times better IOPS per milliwatt than disk at random reads. It is 10 times better at bandwidth per milliwatt than disk. It is also 10 times better at MB per milliwatt than DRAM, and wins big in the latency stakes over disk, but DRAM is better still.
Although disks are really good at sequential writes, and flash doesn't really buy you anything there, flash will show up as both a disk and a DRAM replacement.
Dale provided five-minute rule RI numbers for flash against DRAM. A slide stated: "Assuming that the cost of cache is dominated by its capacity, and the cost of backing store is dominated by its access cost (cost per IOPS), then the break even interval for keeping a page of data in cache is given by dividing the backing store cost per IOPS by the cache cost per page."
In 1987, using these metrics, disk cost $2,000/IOPS and RAM was $5/KB. A 1KB page's break-even point was 400 seconds.
In 2008, Dale said, disk was $1 per IOPS, a 2,000x reduction, and DRAM was $50/GB, a 100,000x reduction, meaning it was $0.05/MB. The 50KB page break-even was five minutes, the 4KB one was one hour and the 1KB one was five hours. There needed to be a 50-fold increase in page size to cache for break-even at five minutes.
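Dale's simplified version of the rule, as quoted from his slide, can be sketched in a few lines of Python. The function name is hypothetical, and decimal units (1GB = 1,000,000KB) are an assumption; the slide's own figures appear to be rounded, so the 2008 results land near his numbers rather than exactly on them.

```python
def break_even_seconds(cost_per_iops, cache_cost_per_kb, page_kb):
    """Backing-store cost per IOPS divided by the cache cost of one page."""
    return cost_per_iops / (cache_cost_per_kb * page_kb)


# 1987: disk at $2,000 per IOPS, RAM at $5 per KB, 1KB page.
print(break_even_seconds(2_000, 5, 1))  # 400.0

# 2008: disk at $1 per IOPS, DRAM at $50 per GB.
# Assumption: decimal units, so 1GB = 1,000,000KB.
dram_per_kb = 50 / 1_000_000
for page_kb in (1, 4, 50):
    secs = break_even_seconds(1, dram_per_kb, page_kb)
    print(f"{page_kb}KB page: {secs / 60:.0f} minutes")
```

Under those assumptions the 50KB page breaks even at roughly seven minutes, the 4KB page at about 83 minutes and the 1KB page at about five and a half hours, the same order of magnitude as the five minutes, one hour and five hours on the slide.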
Looking at break-even for flash and hard disk drives (HDDs) in 2010, he said HDDs cost $1/IOPS, single-level cell (SLC) flash around $10/GB and multi-level cell (MLC) flash around $4/GB. A 250KB page break-even with SLC was five minutes, but five hours with a 4KB page size. With MLC flash it was five minutes with a 625KB page size and 13 hours with a 4KB page size.
Again there needed to be a 50-fold increase in page size to cache for break even at five minutes.
Looking at DRAM and flash, his numbers were $0.05/IOPS for 4KB enterprise SLC and $0.02/IOPS for 4KB enterprise MLC, with DRAM at $20/GB. A 6KB page size SLC break-even came out at five minutes, as did a 2KB page size MLC.
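The same relationship can be turned around to ask what page size breaks even at a chosen interval for each pairing of cache and backing store. This is a rough Python sketch only: the function name, the default 300-second target and the decimal-unit conversion are assumptions, and the results differ somewhat from the slide's figures, which look to have been rounded.

```python
def page_kb_for_interval(cost_per_iops, cache_cost_per_gb, target_seconds=300):
    """Page size in KB at which the break-even interval equals target_seconds."""
    # Assumption: decimal units, so 1GB = 1,000,000KB.
    cache_cost_per_kb = cache_cost_per_gb / 1_000_000
    return cost_per_iops / (cache_cost_per_kb * target_seconds)


# (pairing, backing-store $ per IOPS, cache $ per GB), figures from the text
tiers = [
    ("SLC flash cache over HDD", 1.00, 10),
    ("MLC flash cache over HDD", 1.00, 4),
    ("DRAM cache over SLC flash", 0.05, 20),
    ("DRAM cache over MLC flash", 0.02, 20),
]

for label, iops_cost, gb_cost in tiers:
    size_kb = page_kb_for_interval(iops_cost, gb_cost)
    print(f"{label}: {size_kb:.1f}KB page for a five-minute break-even")
```

With a strict 300-second target the HDD-backed cases come out at about 333KB and 833KB. Passing target_seconds=400 yields 250KB and 625KB, which may indicate that the slide's "five minutes" is itself an approximation. Either way, the DRAM-over-flash page sizes stay in single-digit kilobytes, which is the point of the comparison.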
What does this mean?
Flash makes it cost-effective to keep more small random data in a NAND cache than DRAM, say a five-plus hour working set in NAND and a one-hour one in DRAM. The random data working set size in DRAM can be reduced.
Why is this happening now? Dale said you can put flash chips together to get capacity that is not too expensive compared to disk. Secondly, there is a ton of innovation being poured into MLC NAND, which is cheaper than SLC. Thirdly, array vendors are getting much better at tiering, so you can put flash drives in there as tier zero.
He said, all things considered, "We're at the front end of a big discontinuity," and, getting back to the five-minute rule and comparing DRAM and flash, not DRAM and disk, "You can keep five hours worth of data in flash cache ahead of disk now." When treating flash as backing store for DRAM, the five-minute rule holds at a 6KB page size for DRAM.
The result of this is that you won't buy as much DRAM as you do now. Also the random working set in DRAM can be reduced from one hour to five minutes.
Enterprise arrays are affected by this just like host servers. Dale said the application area opportunities include intense random reads, sequential read after random write and low read latency, enabling memory resident apps.
He identified three use cases: flash now showing up as tier zero in arrays, read caching in the controller with the hot working set always there, and having SSD cache in the network. Think Avere.
The flash invasion of arrays means you will buy fewer disks. It's inevitable and it's only the beginning. He thought it would affect vendors more than customers.
Ironically NetApp only has one of these use cases in its product line - caching in the controller with its PAM card - it has yet to introduce tier zero flash into its arrays.
His presentation finished up with a prediction that we will see the introduction of a flash-based Storage Class Memory sitting between DRAM and disk arrays by 2013. He showed a slide predicting that MLC NAND would meet enterprise HDD on cost/GB around 2012-2013. His penultimate slide said:
Over the next five years solid state technologies will have a profound impact on enterprise storage ... The architectural balance of memory, cache and persistent storage will change. Today's solid state implementations in enterprise storage demonstrate these changes. It's only the beginning.
There is a storage discontinuity between DRAM and hard drives and flash is going to fill it - that is now beginning to look like a certainty.
David Dale's presentation is on an SNIA website for the London event. If you are in the SNIA you can request the location and password by emailing Paul dot Trowbridge at Evito dot com. If you are not in the SNIA then we guess you'd better join and then send the email to Paul. ®