Original URL: http://www.theregister.co.uk/2008/11/04/emc_maui_another_invista/

Is EMC's Maui another Invista?

Biting off more than it can chew

By Chris Mellor

Posted in Storage, 4th November 2008 08:02 GMT

Comment It was different a year ago - full of confidence, EMC head honcho Joe Tucci blithely told analysts about a slew of oncoming EMC products with codenames. Among them were Hulk and Maui, hardware and software to produce a new kind of clusterable storage system, a global repository scaling up to multiple petabytes in size. The hint then was that six months should see them come out into the open.

Nearly a year has passed and Hulk hardware languishes as the Infiniflex 10,000, a kind of near-also-ran demo product lacking its true software and with all the marketing push behind it of a George W Bush re-election campaign.

Maui has disappeared from the EMC lexicon, and an internal EMC blogger's video revelations about what it could do have been abruptly pulled from his blog site.

It's not that Hulk and Maui are busted flushes, just that preliminary expectations have been set and then... nothing. Staff such as Chuck Hollis, VP technical alliances, won't talk about Maui, but will discuss in general the need for software to run a global storage repository.

The role he envisages for this is mind-blowing. It gives an insight into possible development difficulties that have sprung from Maui seeming to be not just storage array controller software, but a whole new level of storage infrastructure software that front-ends and manages data access and storage for a network of inter-connected global data storage centres.

What follows is my interpretation of what Hollis and other EMC people have said and written over the past year and the questions raised.

Infrastructure system and clustered object/filer

Maui is a storage facility with data containers spread around the globe storing data that is ingested, protected and moved to provide localised access from wherever you are on the planet.

We have been told that Maui is more than a clustered file system and orders of magnitude bigger than anything else available today in terms of capacity. It is built on commodity system components including clusterable storage arrays with commodity hard drives inside them. These storage units hold objects along with what Hollis calls rich semantics. Neither a file-level approach nor a block-level approach will scale enough in his view, and it has to be object-based.
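To make the object-plus-semantics idea concrete, here is a minimal Python sketch. Nothing in it comes from EMC: the class names, the metadata keys and the use of a content hash as the object ID are all assumptions made for illustration. It only shows the general shape of an object store, where data is fetched by ID and found by its metadata rather than by file path or block address.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An opaque blob plus descriptive metadata (hypothetical model)."""
    data: bytes
    metadata: dict = field(default_factory=dict)

    @property
    def object_id(self) -> str:
        # Assumption: objects are identified by a content hash,
        # not by a directory path or a block address.
        return hashlib.sha256(self.data).hexdigest()


class ObjectStore:
    """Flat, in-memory object store keyed by object ID (illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        obj = StoredObject(data, dict(metadata))
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
scan_id = store.put(b"...scan bytes...", kind="x-ray", region="eu", retention_years=7)
store.put(b"...song bytes...", kind="audio", region="us")
print(store.find(kind="x-ray"))  # a one-element list holding scan_id
```

The metadata is what a placement or protection policy would act on: a retention period or a home region travels with the object instead of living in a separate file system or volume layout.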

So we should assume we're talking about billions, even trillions, of objects and their associated metadata, multi-petabytes of storage capacity and millions of users. Costs are a great concern because there are so many darn components - tens of thousands of disk drives, for example - that shaving pennies off their price or increasing utilisation by single-digit percentages can save millions of dollars.
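A back-of-the-envelope sketch of the utilisation point, in Python. Every figure here is an assumption picked for illustration, not an EMC number: 30PB of usable data, utilisation rising from 60 to 65 per cent, and $1,000 per raw terabyte all-in. At 1TB per drive that is 50,000 drives before the improvement, which fits the "tens of thousands" scale.

```python
def raw_capacity_saving(usable_tb, util_before, util_after, cost_per_raw_tb):
    """Dollars saved on raw capacity when the same usable data
    is stored at a higher utilisation rate."""
    raw_before = usable_tb / util_before   # raw TB needed today
    raw_after = usable_tb / util_after     # raw TB needed after the improvement
    return (raw_before - raw_after) * cost_per_raw_tb


# All inputs are illustrative assumptions.
saving = raw_capacity_saving(
    usable_tb=30_000,        # 30PB of usable data
    util_before=0.60,
    util_after=0.65,
    cost_per_raw_tb=1_000,   # dollars per raw TB
)
print(f"${saving:,.0f}")
```

On those assumptions a five-point utilisation gain frees up roughly 3,800 raw terabytes, or a little under $4m.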

We're talking here about creating a Google-class infrastructure from scratch. Not even Google did that and it's taken Sergey Brin's boys years to build out what we see today. It is the UK's National Health IT system but on a global scale with every aspect of it multiplied a million times - I'm guessing but Hollis has used the term 'uber-massive' - and built by one company.

This data is accessed from virtually any kind of internet client device, such as smart phones, netbooks, notebooks, games consoles, desktops and servers. Other mentioned devices are set-top boxes, mobile iTunes devices, RFID-like sensor devices sending in data, VOIP phones, security cameras and satellites. It is universal access. Let your imagination run riot.

The networking infrastructure over which all this runs has to be carrier class, simply 'there' like the phone (landline, not patchy mobile) or electricity.

The accessing devices don't use Maui software themselves but they access objects or files, or their applications access data, that is stored on the Maui infrastructure.

How do they know it is stored on a Maui infrastructure? How are the Maui repository's contents made known to these devices? How are the Maui contents in each node made known to the others? How are object imports and deletions handled, indices updated, and their space provisioned/reclaimed? How are object security levels created, maintained and altered?

What is Maui?

This is what Maui could be - Maui will be a clustered storage node; a file (object) system with, literally, a global name space; an object ingest and classification system. Maui will be an object placement, management, location and protection system; a search process; a real time data mover across global distances. It will be a global repository content representation system with multi-mode client access and ingest request and completion support; a self-tuning, self-healing and self-correcting management system reacting automatically to surges in demand and immune to any single failure scenario.

Maui will be an object access and tracking system with the ability to automatically move content to access hotspots and load-balance between storage nodes in a cluster and data centre nodes; it will be one of the world's most complex business continuity and disaster recovery systems. It will be running inside a global data centre infrastructure comprising physical and virtual data centres, servers, networking and storage of a complexity that hasn't been built before, and will need this virtualisation at multiple levels to provide the provisioning and scaling flexibility needed.

If the paragraph you have just read is Maui then EMC might consider outsourcing it to Google. It might be delivered quicker.

Is Maui the infrastructure or the clustered (object) filer O/S? A clustered object/filer is deliverable in everyday product terms. A global and self-healing/managing/tuning/correcting object repository infrastructure isn't. That's big - really big.

If Maui is the clustered object (filer) O/S then it needs an infrastructure to do the global stuff. What are the boundaries between that infrastructure and Maui? The more that Maui has to do the longer it is going to take to produce it. The less it has to do the longer it will take to define, devise and implement the infrastructure.

It's possible that there are two projects inside EMC - the Maui clustered storage software and the linked data centre infrastructure behind it.

Maui hasn't appeared yet, 12 months after being first revealed. If it's intertwined with the infrastructure component then that's no surprise at all, is it? It isn't going to appear just like that because nothing like that has ever been built before. This whole thing - Maui plus infrastructure - could be one of the most complex coding projects on the planet. Watching EMC's internal development engineering bandwidth cope with this will be like watching a python swallow a moose - a whole moose herd, in fact.

What might happen is that the clustered object/filer O/S part is delivered first with the marketing spin presenting Maui as a clustered storage building block which can be used by customers directly, but which will also be used by EMC as it builds out its own global repository infrastructure.

In this way Maui might be presented as a clustered object/file system with terrific scalability, positioned as an HP ExDS9100 beater, an Isilon buster, a kick in the IBM Scale Out File System (SOFS) butt.

But if it is only this, with no global repository infrastructure behind it, then EMC will have undershot terribly the expectations it has set. Without the infrastructure Maui will be in danger of becoming another Invista (EMC's SAN director-hosted storage virtualisation and management software): a worthy idea oversold and under-delivered. ®