EMC XtremIO has its quirks but rumours of its death are overblown

An embarrassment of all-flash array riches

Comment A rumour last week suggested EMC was considering shutting down its XtremIO product line. How likely is that, and is there any basis for EMC taking this direction?

The rumour seemed to come from a number of separate parties, although we can’t exclude a clever viral marketing campaign from a competitor.

XtremIO history

EMC acquired XtremIO for around $430m in May 2012, before any product had shipped to customers. With only around $25m of funding raised, this was a pretty sweet deal for the XtremIO investors. After a period of directed availability, XtremIO 1.0 was released to General Availability (GA) in November 2013. Version 2.4 went GA in May 2014, with version 3.0 announced soon after and reaching GA in September 2014.

This was the infamous release whose upgrade was both disruptive and destructive. Version 4.0 was announced at EMC World in May 2015 (bringing in revised hardware) with GA on 30 July 2015.

Since then there have been only minor bug-fix releases, with no big announcements on the platform. So there have been no new software or hardware releases in over 12 months, and the pace of releases is slowing.

XtremIO architecture

The XtremIO platform is based on X-bricks. Each X-brick consists of dual controllers and a disk shelf which, combined with UPS support, is marketed as a highly available unit.

Systems then scale out through multiple X-bricks, with a current maximum of eight. All X-brick nodes participate in reading and writing data with new I/O distributed algorithmically across the nodes to ensure an even spread of data. This has the benefit that all components of the cluster are involved in serving data.
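To make "distributed algorithmically" concrete, here is a minimal sketch of one common approach: hash each block and use the result to choose a brick. This illustrates the general idea only, not EMC's actual placement algorithm; the function names, the use of SHA-1 and the modulo mapping are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def pick_brick(block: bytes, brick_count: int) -> int:
    """Choose a brick index from a hash of the block (illustrative scheme only)."""
    digest = hashlib.sha1(block).digest()
    # Use the first 8 bytes of the digest as an integer, then map it to a brick.
    return int.from_bytes(digest[:8], "big") % brick_count


def spread(block_total: int, brick_count: int) -> Counter:
    """Count how many synthetic blocks land on each brick."""
    counts = Counter()
    for i in range(block_total):
        block = i.to_bytes(8, "big")  # stand-in for real block contents
        counts[pick_brick(block, brick_count)] += 1
    return counts


if __name__ == "__main__":
    for brick, count in sorted(spread(8000, 8).items()):
        print(f"brick {brick}: {count} blocks")
```

Because a good hash behaves like a uniform random function, each of the eight bricks ends up with close to an eighth of the blocks, which is why every component in the cluster shares the work.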

However, it also has some negatives. First, the loss of an X-brick leads to a “system down” situation. This is highly unlikely, but anything that takes a single brick down will cause a problem because there’s no X-brick to X-brick redundancy. Remember, as you add more nodes to a cluster without replication, the chance of an outage rises roughly in proportion to the number of nodes: a four-brick system is about twice as likely to be down as a two-brick system, as any one brick could fail and there’s no redundancy. Similarly, an eight-brick system is about twice as likely to be down as a four-brick one, for the same reason.
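To put rough numbers on the availability argument: if bricks fail independently and the cluster needs every brick to be up, cluster availability is the per-brick availability raised to the power of the brick count. The sketch below assumes exactly that model, and the 99.99 per cent per-brick figure is a made-up value for illustration, not an EMC number.

```python
def cluster_availability(brick_availability: float, brick_count: int) -> float:
    """Availability of a cluster that is down whenever any one brick is down.

    Assumes bricks fail independently (a simplification).
    """
    return brick_availability ** brick_count


def cluster_unavailability(brick_availability: float, brick_count: int) -> float:
    """Fraction of time the cluster is expected to be down."""
    return 1.0 - cluster_availability(brick_availability, brick_count)


if __name__ == "__main__":
    per_brick = 0.9999  # hypothetical per-brick availability
    for bricks in (1, 2, 4, 8):
        down = cluster_unavailability(per_brick, bricks)
        print(f"{bricks} brick(s): expected downtime fraction {down:.6%}")
```

In this model it is the expected downtime that roughly doubles each time the brick count doubles; the availability figure itself only slips by a small amount.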

XtremIO writes data as a 25-drive (23+2) full stripe write across all drives. This is the XDP proprietary RAID system. XDP provides very low write amplification, but does mean X-bricks are running with a fixed configuration – except the 5TB starter bricks.
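The capacity cost of the 23+2 layout is simple arithmetic, sketched below. The 6+2 comparison is a generic dual-parity RAID group chosen for illustration, not a configuration taken from any particular product.

```python
from fractions import Fraction


def stripe_efficiency(data_drives: int, parity_drives: int) -> Fraction:
    """Fraction of raw stripe capacity left for data after parity."""
    return Fraction(data_drives, data_drives + parity_drives)


if __name__ == "__main__":
    wide = stripe_efficiency(23, 2)   # the 23+2 stripe described above
    narrow = stripe_efficiency(6, 2)  # illustrative narrower dual-parity group
    print(f"23+2 usable fraction: {float(wide):.0%}")
    print(f"6+2 usable fraction:  {float(narrow):.0%}")
```

A wide stripe keeps parity overhead low, but it is also why the drive count per X-brick is fixed: the stripe geometry assumes all 25 drives are present.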

This is a mixed blessing for expansion. X-bricks can’t currently be expanded with more capacity; if they could, under the current model each would have to grow by an entire disk shelf. To keep cluster performance even, every X-brick in the cluster would need the same upgrade.

The need for a uniform configuration and the inability to expand a single X-brick are a problem for EMC. Each new drive capacity (expressed as an X-brick capacity) has to remain available for many years, unless customers are offered an entire system replacement rather than an upgrade. As yet, EMC hasn’t introduced TLC drives for XtremIO. With the current design, an entire cluster would have to be built from TLC drives, unless XtremIO 5.0 brings in the ability to mix and match.

Just to touch again on that lack of TLC support, it does seem that the rest of the industry is again ahead of XtremIO, as are EMC’s own products!

EMC Unity supports TLC, as do NetApp, Nimble, Dell SC, HPE 3PAR, Kaminario and (I believe) SolidFire. So why not keep cost competitive and use TLC in XtremIO? Are there architectural restrictions? Samsung is predicting a quick move to TLC technology, so vendors not supporting this media will be left behind in price wars.

EMC product portfolio

Not including the software-based solutions (like ScaleIO), EMC now has:

  • DSSD (high end performance)
  • XtremIO
  • All-flash VMAX
  • All-flash Unity
  • All-flash VNX2 (although the hybrid models seem to be pushed more here)
  • After the acquisition by Dell there will be all-flash Dell SC

It’s an embarrassment of riches, so which platforms will remain and which will go?

I can’t imagine the new Dell Technologies division will have room for six all-flash systems. DSSD sits in a specific market segment. Dell SC, Unity and VNX2 all seem to overlap, so presumably one will survive and customers will be directed to that over time.

XtremIO and VMAX overlap, with the new VMAX systems (announced in February 2016) matching XtremIO for performance and exceeding XtremIO in scalability and native features.

So why was VMAX-AF (my terminology) introduced? Presumably EMC has many customers who simply didn’t want to move from their investment in VMAX. The platform is mature, has rock-solid features such as SRDF and companies invest time and effort in training staff and scripting to the platform, including operational procedures.

XtremIO still has no native replication and in reality, RecoverPoint is a workaround solution. Calvin Zito at HPE points out that the VMAX-AF models 450 and 850 are coincidentally named the same as the high-end 3PAR platforms 20450 and 20850…

XtremIO issues

The Register article linked at the top of this page points to potential issues with XtremIO being the reason for the rumoured shutdown of the platform.

Scalability and reliability are said to be the problem. When XtremIO 3.0 was released, we know that block size was increased from 4KB to (presumably) 8KB to cater for larger SSDs. I say “presumably” as all of the public literature was changed to say “a few kilobytes” rather than quote an exact number.

It could be that the engineering change with 3.0 has increased the block size sufficiently to cater for larger drives. With the release of XtremIO 4.0, each X-brick controller was given a DRAM upgrade too. This matters because one key feature of XtremIO is the ability to keep all metadata in memory. Naturally this means there is a scalability limit on both the number of X-bricks and the capacity supported by an X-brick. The way around this is to implement some kind of “metadata swapping” process, moving some metadata to flash or secondary DRAM to get around the problem.

The trade-off with this will always be in compromising performance, as reading metadata on flash will be way slower than accessing it in DRAM.
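The link between block size, capacity and DRAM can be shown with back-of-envelope arithmetic. The sketch below assumes one fixed-size metadata entry per block; the 32-byte entry size and the 40TiB capacity are hypothetical values picked for the example, as EMC's real metadata layout isn't public.

```python
KIB = 2 ** 10
TIB = 2 ** 40


def metadata_bytes(capacity_bytes: int, block_bytes: int, entry_bytes: int) -> int:
    """Memory needed to hold one metadata entry per block in DRAM."""
    blocks = capacity_bytes // block_bytes
    return blocks * entry_bytes


if __name__ == "__main__":
    capacity = 40 * TIB  # hypothetical capacity under management
    entry = 32           # hypothetical bytes of metadata per block
    for block in (4 * KIB, 8 * KIB):
        need = metadata_bytes(capacity, block, entry)
        print(f"{block // KIB}KB blocks: {need / 2**30:.0f} GiB of metadata")
```

Whatever the real entry size is, the proportions hold under this model: doubling the block size halves the metadata for a given capacity, while doubling the capacity doubles it again. That's why larger drives push an in-memory design towards bigger blocks, more DRAM, or both.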

The Architect’s View

EMC is potentially in a bit of a bind with their all-flash platforms. Unity and Dell SC will meet the requirements of midrange customers; DSSD will meet high-end requirements.

If (and I stress if) there are scalability and reliability issues, then moving back to VMAX may be both a defensive position and one to placate VMAX customers not wanting to move.

The real problem here for EMC though is a financial one. EMC claims a billion-dollar run-rate with XtremIO, making any announcement on the future (or not) of XtremIO a share-price changing event, one that EMC will not make lightly and certainly not with the impending Dell acquisition. Better instead to let things drift along for another few months until the Dell transaction completes, then use a portfolio consolidation/rationalisation story to slowly phase XtremIO out.

Every month that a new XtremIO upgrade isn’t announced will add more grist to the rumour mill – though EMC may announce a major upgrade in the next 30 days and we'll all have to eat some humble pie. Which way would you bet? ®
