Original URL: http://www.theregister.co.uk/2009/06/05/dell_emc_quantum/

Dell and the dedupe appliance conundrum

What's going on?

By Chris Mellor

Posted in Storage, 5th June 2009 20:33 GMT

Comment Dell is announcing a new deduplication appliance on Monday, the DL2000, while simultaneously saying dedupe will move on from a backup and appliance focus to something broader and more pervasive. What's going on?

Let's try to join up some dots between this release and previous Dell statements, and EMC's statements and activities, and see where it leads us.

The DL2000 release said: "Dell’s deduplication strategy is to foster and encourage the rapid evolution of dedupe technology into a storage environment where the functionality exists everywhere. As deduplication matures quickly, it will move beyond backup storage – where it primarily resides today - to other data types including near-primary, archive, file, and object storage solutions."

The DL2000 is an integrated deduping backup appliance and so represents, we might say, first generation deduplication. The second generation will encompass, in Dell's view, deduplication of near-primary data, archive data, file, and object storage.

Considering that Dell storage, apart from the EqualLogic PS6000 products, comes from EMC, we can note that near-primary storage refers to general, unstructured information stored on primary or tier 1 (Fibre Channel) storage, meaning Clariion CX in Dell/EMC terms. We can also note that the general EMC meaning of file storage is a filer, network-attached storage (NAS). Dell sells EMC's Celerra NX4 NAS product. The general meaning of object storage in EMC terms is Centera, which Dell does not sell.

The implication here is that Dell will provide deduplication for Clariion CX near-primary storage and Celerra NX4 filer storage. But here's a question. How can Dell provide object storage deduplication when it doesn't supply an object storage product? Is there a hint here that Dell is going to take and sell EMC's Centera product?

Let's return to the DL2000 release for a moment and find another telling statement: "Dell believes that incorporating deduplication functionality into ISV, application and storage software can provide significant benefits to customers."

What does this mean? To me it says that Dell will supply deduplication software technology that can be incorporated into ISV software, into application software and into storage software. This seems to be another aspect of second generation deduplication.

Follow the implication here and imagine a piece of application software and its stored data, either on directly-attached storage (DAS) or on a storage array. The application has code that deduplicates its stored data. Okay, where is that done? On the server running the application and storing data on DAS, or on the storage array, if it uses one, using its controller CPU cycles?

We have specific deduplication products now because dedupe is CPU- and disk-intensive, and running it on general-purpose servers could cripple them. Okay, but Nehalem servers are coming and server virtualisation is here already, so couldn't we find the CPU cycles to do it on the servers?
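To see where those CPU and disk costs come from, here is a minimal sketch of block-level dedupe. It assumes fixed-size 4KB blocks, SHA-256 fingerprints and an in-memory dictionary standing in for the fingerprint index. Real products may use variable-length chunking and very different index designs, and the function names are hypothetical. Every block written has to be hashed (the CPU cost) and looked up in an index that, at scale, lives on disk (the disk cost).

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def dedupe(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size blocks, store each unique block once,
    and return the fingerprints (the 'recipe') needed to rebuild it."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        # Hashing every block is where the CPU cycles go.
        fp = hashlib.sha256(block).hexdigest()
        # The index lookup is where memory and disk I/O go at scale.
        if fp not in store:
            store[fp] = block  # only new, unique blocks consume space
        recipe.append(fp)
    return recipe


def rehydrate(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble the original data from its fingerprints."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    # Two 'backups' that share most of their content.
    monday = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
    tuesday = b"A" * BLOCK_SIZE * 3 + b"C" * BLOCK_SIZE
    r1 = dedupe(monday, store)
    r2 = dedupe(tuesday, store)
    logical = len(monday) + len(tuesday)
    physical = sum(len(b) for b in store.values())
    print(f"logical {logical} bytes, stored {physical} bytes")  # logical 32768 bytes, stored 12288 bytes
    assert rehydrate(r1, store) == monday and rehydrate(r2, store) == tuesday
```

Whether this loop runs on an application server or on an array controller's spare Xeon cores is exactly the placement question raised here; the work itself is the same.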

Alternatively, we could parse "storage software" to mean storage array controller software and have the dedupe execute on the array, using spare CPU cycles - they're tending to be multi-core Xeon controllers now - with the ISV and application software telling the array what to dedupe and where to place the deduped data.

That chimes with ideas of multi-tiering inside arrays. So far no one is talking much about multi-tiered DAS, but we can suppose that might come. The situation seems easier to understand with networked storage than with DAS, since deduping DAS-held data would consume application server CPU cycles, which seems a poor use of them.

Dell to ship Centera?

On that basis, it looks as if Dell may be going to ship Clariion and Celerra products with a deduping capability (in the storage software) and, and this is a bit more of a stretch, may be about to ship Centera object stores that somehow dedupe more than they do at the moment. It will also supply dedupe software for inclusion in ISV and application software.

Looking beyond the DL2000 release, can we tie these implications to other things Dell and EMC have said?

In November last year, Dell said it would develop a single block-level dedupe and replication architecture spanning its own, Quantum and EMC storage arrays. Specifically, it was going to develop "a single de-duplication architecture across its PowerVault, EqualLogic and Dell/EMC storage arrays. It will be able to replicate de-duped data between these arrays and, in theory, between them and Quantum and EMC storage arrays, across both LAN and WAN."
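Why would a single dedupe architecture matter for replication? A plausible reason (our inference, not something Dell has spelled out) is that if source and target arrays fingerprint blocks the same way, only blocks the target lacks need to cross the LAN or WAN. The sketch below assumes fixed 4KB blocks, SHA-256 fingerprints and dictionaries standing in for each array's block store; all names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, for illustration only


def chunk_and_index(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store each unique block once, keyed by its SHA-256 fingerprint."""
    recipe = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)
        recipe.append(fp)
    return recipe


def replicate(recipe: list[str], source: dict[str, bytes], target: dict[str, bytes]) -> int:
    """Copy a deduped object to another array, sending only the blocks the
    target lacks. Returns the number of block bytes that would cross the wire."""
    sent = 0
    for fp in dict.fromkeys(recipe):  # unique fingerprints, in order
        if fp not in target:
            target[fp] = source[fp]
            sent += len(source[fp])
    # The recipe itself (a list of fingerprints) must also be sent; not counted here.
    return sent


if __name__ == "__main__":
    site_a: dict[str, bytes] = {}
    site_b: dict[str, bytes] = {}
    v1 = b"X" * BLOCK_SIZE * 4 + b"Y" * BLOCK_SIZE
    v2 = b"X" * BLOCK_SIZE * 4 + b"Z" * BLOCK_SIZE
    print(replicate(chunk_and_index(v1, site_a), site_a, site_b))  # 8192: first copy
    print(replicate(chunk_and_index(v2, site_a), site_a, site_b))  # 4096: only the changed block
```

If the two arrays used different chunking or fingerprinting schemes, the target's index would be useless to the source and the data would have to be rehydrated and re-sent in full.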

Is that still true, given that EMC is pursuing a takeover of Data Domain, whose products are seen to overlap with EMC's own DL 3D backup deduplication products? It is: Paul Kaeley, a Global Practice Executive in Dell's consulting services, said today that Dell has a relationship with both Quantum and EMC and will release a Quantum-based deduplication product from EMC in the second half of the year.

This relationship and strategy have continued since November last year and so predate EMC's very recent Data Domain takeover initiative.

We can categorise both the DL 3D and the Data Domain products as point deduplication products focussed on backup data, in the same generic class as the DL2000, although the DL2000 is aimed primarily at small and medium businesses (SMBs) and is more affordable.

The implication here is that EMC is developing second generation deduplication products, based on its licensed Quantum DXi technology, which will enable Clariion, Celerra and (maybe) Centera storage arrays to deduplicate near-primary, archive, file and object storage data; in Centera's case, more than it does at present, if that's feasible. EMC and Dell will still offer point backup deduplication products, though: DL 3D and maybe Data Domain from EMC, and the DL2000 and whatever follow-on products might come along (DL3000, anybody?) from Dell.

Are there any EMC statements and actions to back this up? For starters, this would provide a solid reason for EMC extending a $100m loan to Quantum. There must have been a very good reason for EMC to do that.

At the time, EMC senior vice president Rich Napolitano said: "Quantum will be able to focus more energy on continued innovation and working with EMC to remain front and centre in one of the storage industry's hottest trends." Front and centre seems especially meaningful given the scenario we are developing in this join-the-dots exercise.

EMC and Dell using same dedupe hymn sheet

EMC's letter to Data Domain announcing its takeover bid included these statements:

The combination of Data Domain’s and EMC’s technologies will provide the basis for the next-generation of disk-based back-up and archiving solutions for customers by providing functionally superior and cost-effective alternatives to tape-based information backup.

Next-generation disk-based back-up and archiving solutions represent key enabling technologies for the build-out and customer use of true high-reliability cloud computing infrastructures for both enterprises’ own virtualized data centers (“private” clouds) and third party providers (“public” clouds).

What does "next-generation disk-based back-up and archiving solutions" mean?

With reference to the Data Domain bid, Chuck Hollis, EMC's global marketing chief technology officer and its chief blogger, has written:

...The more you look at data reduction in all of its various forms, you start to realize that it can play just about everywhere in the extended storage stack: backups, archives, file systems, remote replication, remote caching, primary storage, etc. etc. There’s no single “best approach” simply because there are so many places where dedupe can be intelligently used.

This is starting to echo the Dell second generation dedupe ideas. We can argue that Data Domain isn't going to play outside the backup and archive space in EMC's deduplicating world because this wider near-primary, file and object deduplication spectrum is already being worked on by EMC using Quantum technology and Dell is going to sell it. The Kaeley statement and the DL2000 release make that clear.

Hollis continues his blog, saying:

... it follows that having more different flavors of dedupe technology in your arsenal is a good thing – hence EMC’s interest in Data Domain as well as many other forms of compression, single instancing and data deduplication ... one way of thinking of this proposed acquisition is nothing more than EMC building out our data dedupe portfolio -- and associated orchestration layers -- in much the way as we’ve done before.

... EMC believes that data deduplication – in all of its various and sundry forms – is a very big deal to EMC, our customers and the industry as a whole ... EMC is building a very broad portfolio of data deduplication technologies, just as we’ve done in other areas we think are very strategic. It's a feature that will show up everywhere over time.

Showing up everywhere over time chimes with the statements in the Dell DL2000 release. Can we see the same hymn sheet being used here?

Hollis has previously said that dedupe is a feature. In his blog he reconciles this view with the Data Domain bid thus: "Data deduplication is a feature -- it shows up in many places and many forms, including simple-to-use backup appliances. Placing the right data at the right place at the right time (e.g. Networker, SourceOne, Avamar, Documentum, et al.) is not a feature, it's a product."

What can we deduce by joining up these EMC dedupe dots? First, dedupe is extremely important to EMC and can play in backups, archives, file systems, primary storage and more. This could be Dell talking, using the same hymn sheet again, and it would be identical to Dell's dedupe application spectrum if Hollis had included object storage.

Secondly, EMC is building a broad portfolio of dedupe technologies and these will not be mutually exclusive.

All in all, the joined-up Dell and EMC dots are telling us that EMC will announce deduping Clariion and Celerra storage products, and maybe a Centera with enhanced deduping, using Quantum DXi technology, and that Dell will adopt and sell them in the second half of this year. That's if my dot-joining skills are working well. They could be working badly, though, and I've just drawn a crock of the stuff that sticks to fans.

Still, some kind of Quantum technology-based deduping product or product set is going to come out of EMC later this year and Dell will adopt and sell it. That's now a given. ®