Original URL: http://www.theregister.co.uk/2014/06/23/how_much_disruptive_innovation_does_your_flash_storage_rig_really_need/

How much disruptive innovation does your flash storage rig really need?

Random IO? Or just plain random?

By Chris Mellor

Posted in Storage, 23rd June 2014 17:57 GMT

Our technology world is fascinated by disruptive innovation. Every tech startup says its new technology is disruptive and therefore it is bound to succeed.

So it is with all-flash arrays, which can answer data requests in microseconds instead of the milliseconds needed by disk drive arrays.

Startups such as Pure Storage, SolidFire and Violin say they have best-of-breed products in the networked storage array category because they are all-flash with software designed from the ground up to control their arrays.

They provide flash speed at roughly the cost of the fastest performing disks, the 15,000rpm drives. These are many times slower than flash because of their need to move the read/write heads across the surface of the disk platters to the right track and then wait for the right sector to appear under the head as the disk rotates.
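For a rough feel for the numbers, here is a back-of-envelope sketch of one random read on a 15,000rpm drive. Only the spindle speed comes from the text above. The average seek time is an assumed, illustrative figure, and the function names are made up for this example, so treat the result as an order-of-magnitude estimate rather than a measurement of any particular drive.

```python
# Back-of-envelope model of one random read on a spinning disk.
# Only the 15,000rpm spindle speed comes from the article; the seek time
# below is an assumed, illustrative figure, not a vendor specification.

ASSUMED_AVG_SEEK_MS = 3.5  # assumption: time to move the head to the right track


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the right sector: half of one full rotation, in ms."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2.0


def random_iops(avg_seek_ms: float, rpm: float) -> float:
    """Random IOs per second if each IO costs one seek plus the rotational wait."""
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


latency = avg_rotational_latency_ms(15_000)
iops = random_iops(ASSUMED_AVG_SEEK_MS, 15_000)
print(f"average rotational latency: {latency:.1f} ms")
print(f"approximate random IOPS per drive: {iops:.0f}")
```

Under that assumed seek time a single drive manages only a couple of hundred random IOs a second, because every request pays several milliseconds of mechanical delay that flash does not incur.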

If you can't beat them then join them, say the disk-drive array vendors, who have all put SSDs in disk drive slots to create faster reacting storage.

Dell, EMC, Fujitsu, HDS, HP, IBM, NetApp and others have all done this, with some such as EMC and NetApp introducing flash caches as well to speed data on its way.

Not so fast (literally), say the all-flash array startups. The mainstream vendors' arrays with flash storage inside still use disk IO-based control software, legacy stacks of software that assume data is stored in sectors in tracks on platters of spinning disk drives.

"Our software," they will say, "has been designed from the get-go to use flash and be aware that it wears out with repeated writing, unlike disk. It minimises the number of writes by coalescing them and deduplication to get rid of redundant data."

The mainstream disk-drive arrays can't do deduplication at all, or as well, because their disk drives are too slow for all the mapping hash table look-ups needed.
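To make the two write-reduction ideas concrete, here is a minimal sketch of write coalescing followed by hash-based deduplication. It is a generic illustration, not any vendor's implementation: the names `coalesce` and `DedupStore`, the 4KB block size and the choice of SHA-256 as the fingerprint are all assumptions made for the example, and it leaves out reference counting and garbage collection of stale blocks.

```python
import hashlib


def coalesce(writes):
    """Keep only the latest pending write for each logical block address."""
    pending = {}
    for lba, data in writes:
        pending[lba] = data  # a later write to the same address replaces the earlier one
    return pending


class DedupStore:
    """Toy content-addressed block store; the dicts stand in for flash and metadata."""

    def __init__(self):
        self.blocks = {}          # fingerprint -> block data (the "flash")
        self.lba_map = {}         # logical block address -> fingerprint
        self.physical_writes = 0  # how many blocks were actually written

    def write(self, lba, data):
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.blocks:   # the hash table look-up
            self.blocks[fingerprint] = data  # new content: one physical write
            self.physical_writes += 1
        self.lba_map[lba] = fingerprint      # duplicates only update the mapping

    def read(self, lba):
        return self.blocks[self.lba_map[lba]]


# Four incoming writes: address 0 is overwritten, and three blocks share content.
incoming = [
    (0, b"A" * 4096),
    (1, b"A" * 4096),
    (0, b"B" * 4096),
    (2, b"A" * 4096),
]

store = DedupStore()
for lba, data in coalesce(incoming).items():
    store.write(lba, data)

print(f"incoming writes: {len(incoming)}, physical writes: {store.physical_writes}")
```

In this toy run, coalescing drops the overwritten block and deduplication stores the repeated content once, so four incoming writes become two physical ones. Every write costs a fingerprint look-up, which is why the speed of that look-up matters so much.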

Their disk-based software stacks aren't as efficient at reducing the number of writes, and extra steps have to be inserted in the lower-layer code to make the flash storage look like disk to the upper- and mid-level controller software.

Lean and mean

This makes the IO processing slower. "Our software is leaner and more efficient," say the vendors.

That this is true is shown by suppliers such as EMC and IBM buying their own all-flash-array startups: XtremIO for EMC and TMS for IBM.

NetApp is developing its own all-flash array called FlashRay, but Dell, HDS and HP have chosen not to go this route. They rely instead on using all-flash array versions of their existing Compellent (Dell) and StoreServ (HP) arrays and saying "our software is good enough to drive the flash hardware effectively and efficiently".

HDS has a flash acceleration sub-system it has developed for its VSP and HUS VM arrays and is saying pretty much the same thing regarding its array controller software. But these three suppliers say something else as well: that their array controller software has a full set of data and array management features that the all-flash array startups don't have.

For example, their arrays can replicate data between them as a way of protecting against an array failure. They can take snapshots of data and store them as another way of protecting against data loss.

They have highly reliable software, strengthened by years of development, which enterprise customers can rely on to store their data safely. Their arrays have controller and other features to ensure there is no single point of failure.

The management facilities of their arrays are mature and well understood by customers and integrated into upper-level or overall IT management frameworks and with virtual server software domains.

This level of data protection and management maturity and integration is too valuable to be simply discarded because there is a new hot box on the street.

Certainly the new all-flash arrays are disruptive, but so were hovercraft and Segway scooters, and neither of these inventions turned out to have any lasting relevance. Innovation on its own is not sufficient to be disruptive. Simply being new is not enough.

To have and have not

All the mainstream drive array vendors are singing from this hymn sheet, but equally they are all adopting all-flash array technology and a clear-cut divide is opening up between the ones with standalone all-flash array technology and the ones without.

Dell, HDS and HP do not have separate all-flash arrays that stand alongside their existing arrays. Instead their flash storage is inside these arrays and hence inherits the data protection and management benefits that these arrays already have.

The latest HDS VSP G1000 array, in all-flash mode, is the fifth-fastest performer on the industry-standard SPECsfs2008 NFS benchmark, which is designed to test a storage array's responsiveness to random file IO in a business context.

HDS VSP G1000 SPECsfs2008 NFS benchmark ranking

There are no equivalent benchmark results for new all-flash arrays from suppliers such as Pure Storage or SolidFire.

As a rule of thumb, we could say a flash-transformed array and its software could match many all-flash arrays in pure performance, with the benefit of fitting into current data and array management processes and the disadvantage of being more expensive to acquire, power, cool and house in a data centre.

Jay Kidd, NetApp's chief technology officer, thinks that all-flash arrays outside of legacy storage arrays have their place. He believes that extending the capabilities of NetApp's ONTAP storage controller software to cover flash would be a stretch too far, as its primary focus is disk.

The company's coming all-flash FlashRay and existing EF540/550 systems, neither of which runs ONTAP, can use flash better than ONTAP can.

He says NetApp is excited about FlashRay, a fresh design. It will be "a great $/GB offering, with MLC flash and hundreds of thousands of IOPS."

That is one of the vital aspects. New all-flash arrays deliver more IO operations per second, many more than an existing disk-based storage array architecture can do with the same amount of flash storage and at a much lower cost per IO.

This is true for random IO, the kind that sends disk read-write heads in time-consuming skips across the platters. It is not so true for sequential IO, where both flash and disk pour data into and out of an array at roughly comparable speeds.

Suits you, sir

Therefore new all-flash arrays are thought to be a good fit for applications needing lots of random IO – ones that are, in the jargon, latency-sensitive, such as databases, real-time analytics and trading software.

The idea is that an all-flash array can satisfy random IO requests from a much smaller data-centre footprint as measured in rack enclosure sizes and therefore needs less power and cooling, as well as less space, all helping to lower its total cost.

Note that array vendors without a new all-flash array in their product arsenal will disagree with this blanket statement.

Dell, HDS and HP, for example, would assert that their flash-transformed traditional arrays are as good as new all-flash arrays and have their existing storage management and data protection features. That means no new storage silo to acquire and operate and no new supplier relationship to manage.

We can build up the characteristics of the ideal application for (1) new all-flash arrays, (2) flash-transformed traditional arrays, (3) flash-enhanced traditional arrays with SSDs and flash caches and (4) all-disk storage arrays.

These four categories are separated by qualitative differences so businesses and public organisations have to balance their budgets and needs and evaluate their own individual best product fit to requirements.

Over time the new all-flash arrays will get better management and data protection processes and may fit into a supplier's legacy operational and management processes.

As an example, EMC's ViPR software abstraction layer can be used to operate and provision traditional-style EMC arrays such as VMAX and VNX as well as the newer XtremIO all-flash array product.

The conclusion to take away from this discussion is to use a best-of-breed all-flash array when you need speed and a traditional array when operational process fit is your priority. ®