Will multi-tier flash arrays come to a data centre near you?
One step at a time
Multi-tier storage is a familiar concept in data centres and large server installations.
In the old days, this was often a simple case of having a single bus type with disks of different speeds, for example SCSI-based disk arrays with 15,000rpm disks in the “fast” set and 7,200rpm or 5,400rpm disks in the “slow” set.
More commonly today, however, it is achieved by having entirely different interfaces and disk types, for example SAS-connected flash disks for the fast set and SATA-connected spinning disks for the slower set.
The thing is, though, flash storage is not a single concept. It is a whole family of technologies which has three members:
- SLC (single level cell): each storage element can store one bit of data;
- MLC (multi-level cell): an appallingly named step up from SLC that should really be called double-level cell as each element can store up to two bits;
- TLC (triple-level cell): quelle surprise, each element can store up to three bits.
As in all computing applications, there is no such thing as the best option – technology is full of compromises. First of all, flash storage wears out: as you write and erase each storage cell it degrades and eventually becomes unusable because it is simply not able to hold charge any more.
Wear and tear
SLC is super-fast and has the longest lifespan, but it is expensive – you need twice as many cells to hold the same amount of data on an SLC medium as you would on MLC, for instance.
TLC has a rubbish lifespan – each cell lives at one of eight voltage levels, compared with SLC's two and MLC's four, and the more levels you have the more likely a wear-related error becomes – but it is far cheaper.
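The arithmetic behind those numbers is simple enough to sketch. The snippet below is purely illustrative: the function names and the 8,192-bit example capacity are made up for the purpose, and only the bits-per-cell figures come from the descriptions above.

```python
# Illustrative arithmetic only; bits per cell are as described in the article.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def voltage_levels(bits_per_cell: int) -> int:
    """Distinct charge levels a cell must tell apart to store that many bits."""
    return 2 ** bits_per_cell


def cells_needed(capacity_bits: int, bits_per_cell: int) -> int:
    """Cells required to hold capacity_bits, rounding up to a whole cell."""
    return -(-capacity_bits // bits_per_cell)  # ceiling division


if __name__ == "__main__":
    example_capacity_bits = 8 * 1024  # hypothetical capacity, for comparison only
    for name, bits in BITS_PER_CELL.items():
        print(name, voltage_levels(bits), cells_needed(example_capacity_bits, bits))
```

The cell counts show where the cost difference comes from, and the level counts show why the denser types are less forgiving of wear.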
Right now MLC tends to be the average choice because it has a half-decent trade-off between lifespan and speed versus cost.
Bearing this in mind, then, let's take a step back and look at the two main reasons why we employ multi-tier storage in our data centres in the first place.
Disk-based backups: more and more companies are taking the approach of using dirt cheap SATA-connected disks as their secondary storage medium instead of tape, using tape only as an archiving technique.
I have worked with companies that don't actually own a tape drive because they have multiple sites and socking big arrays of cheap disks for their backups. When they want an archive they can take away they dump the data to a removable drive.
High-speed storage for niche applications: introducing super-fast disks for a tiny minority of applications that are so data intensive that their average-speed kit can't keep up.
Let's look at each of these scenarios.
For disk-based backups, you won't be employing flash any time soon because of those two magic words: “dirt cheap”. For backups you want cheap storage and lots of it, and it will be a few years yet before flash overtakes spinning disks for value.
For high-speed storage there are two possible scenarios.
- You currently have a load of spinning disks and you need something fast, in which case you will add a flash tier above your spinning tier.
- You currently have a load of flash-based storage (which is mega-fast anyway) and if you need even more speed then you probably need to look not just at the storage technology but at the entire underlying storage subsystem.
On the face of it, then, the need for multi-tier flash arrays is at best limited and at worst non-existent – but perhaps only in a visible sense.
What on earth do I mean by that? Invisible flash arrays?
Who's a clever controller?
Well, one of the key factors with the current flash technologies on the market is that because flash drives wear out, the vendors are making massive efforts to offset this finite lifetime by building cleverness into the controllers to which they attach.
So controllers are designed with features such as “wear levelling”, which spreads writes across the cells on the drives. You don't see this – the controller just does it for you.
Flash drives are already appearing in spinning disk arrays as a pre-storage layer. Just as RAID and SCSI controllers have on-board cache in the form of RAM, we are also seeing high-speed caching disks in storage arrays that sit between the high-speed host machines and the slower spinning disks.
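To make the idea concrete, here is a toy sketch of wear levelling. It is not any vendor's algorithm: the class name, the block-level granularity and the "pick the least-erased free block" policy are all assumptions chosen to keep the example short.

```python
class WearLevellingController:
    """Toy model (hypothetical): remaps each logical write to the least-worn free block."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts = [0] * num_blocks   # erases seen by each physical block
        self.logical_to_physical = {}          # current logical -> physical mapping

    def write(self, logical_block: int) -> int:
        # Blocks holding live data for other (or this) logical block are not free.
        in_use = set(self.logical_to_physical.values())
        candidates = [b for b in range(len(self.erase_counts)) if b not in in_use]
        if not candidates:
            raise RuntimeError("no free physical blocks")
        # Assumed policy: choose the free block with the fewest erases so far.
        target = min(candidates, key=lambda b: self.erase_counts[b])
        self.erase_counts[target] += 1          # erase-before-program costs one cycle
        # Remapping implicitly frees the block that previously held this data.
        self.logical_to_physical[logical_block] = target
        return target


if __name__ == "__main__":
    ctrl = WearLevellingController(num_blocks=8)
    for _ in range(100):
        ctrl.write(0)  # the host hammers one logical block
    print(ctrl.erase_counts)
```

Even though the host rewrites the same logical block every time, the erases end up spread almost evenly across the physical blocks, which is the whole point.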
You can probably see where this is going.
Is it flash or isn't it?
With today's technology, then, it is not a vast leap of logic to conceive of a two-tier array in which we have a small number of high-speed, long-lived SLC disks at the host-facing layer (to which the host machines write) with a larger number of slower, shorter-lived MLC or TLC disks at the back end and a clever controller that decides when the time is right to shuttle data from the outward-facing layer into the background layer.
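A rough sketch of that controller logic might look like the following. Everything in it is an assumption for illustration: the class and method names are hypothetical, capacity is counted in items rather than bytes, and the "demote the oldest data once the fast tier is full" rule stands in for whatever policy a real array would use.

```python
from collections import OrderedDict


class TwoTierFlashArray:
    """Toy model (hypothetical): small SLC front tier, larger MLC/TLC back tier."""

    def __init__(self, fast_capacity: int) -> None:
        self.fast_capacity = fast_capacity
        self.fast_tier = OrderedDict()  # host-facing SLC layer, oldest entry first
        self.slow_tier = {}             # background MLC/TLC layer

    def write(self, key: str, data: str) -> None:
        # All host writes land on the fast tier; drop any stale back-end copy.
        self.slow_tier.pop(key, None)
        self.fast_tier[key] = data
        self.fast_tier.move_to_end(key)
        # Assumed policy: once the fast tier overflows, shuttle the oldest data back.
        while len(self.fast_tier) > self.fast_capacity:
            old_key, old_data = self.fast_tier.popitem(last=False)
            self.slow_tier[old_key] = old_data

    def read(self, key: str) -> str:
        # The host never needs to know which tier holds the data.
        if key in self.fast_tier:
            return self.fast_tier[key]
        return self.slow_tier[key]


if __name__ == "__main__":
    array = TwoTierFlashArray(fast_capacity=2)
    for key in ("a", "b", "c"):
        array.write(key, key.upper())
    print(list(array.fast_tier), list(array.slow_tier), array.read("a"))
```

The host only ever calls write and read; which tier the data lives on at any moment is the controller's business.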
It is hard to imagine, though, that in the next year or two – while flash storage is still in its relative youth – we will have companies going out and deliberately installing multi-tier flash storage.
The investment required to install a flash infrastructure of significant size is considerable – you have to look at not just the disks but the entire storage infrastructure to be sure that the hosts can make best use of the storage at the expensive end of the chain.
So the decision for the near future won't be “what types of flash shall I use?” but “is it flash or isn't it?”.
In three to five years, when flash becomes more of a commodity, we may well start to see people tiering it in their SANs just like they currently do with 15,000rpm and 7,200rpm spinning disks.
For the time being, though, the multi-tiering in a flash world is likely to sit inside the array and be controlled entirely by intelligent controller logic. The system manager will remain a mere technological spectator. ®