Dell and the dedupe appliance conundrum
What's going on?
Comment Dell is announcing a new deduplication appliance on Monday, the DL2000, while simultaneously saying dedupe will move on from a backup and appliance focus to something broader and more pervasive. What's going on?
Let's try to join up some dots in this release with previous Dell statements - and with EMC's statements and activities - and see where it leads us.
The DL2000 release said: "Dell’s deduplication strategy is to foster and encourage the rapid evolution of dedupe technology into a storage environment where the functionality exists everywhere. As deduplication matures quickly, it will move beyond backup storage – where it primarily resides today - to other data types including near-primary, archive, file, and object storage solutions."
The DL2000 is an integrated deduping backup appliance and so represents, we might say, first-generation deduplication. The second generation will encompass, in Dell's view, deduplication of near-primary data, archive data, file, and object storage.
Considering that Dell storage, apart from the EqualLogic PS6000 products, comes from EMC, we can note that near-primary storage refers to general, unstructured information stored on primary or tier 1 (Fibre Channel) storage, meaning Clariion CX in Dell/EMC terms. We can also note that the general EMC meaning of file storage is a filer, network-attached storage (NAS). Dell sells EMC's Celerra NX4 NAS product. The general meaning of object storage in EMC terms is Centera, which Dell does not sell.
The implication here is that Dell will provide deduplication for Clariion CX near-primary storage and Celerra NX4 filer storage. But here's a question: how can Dell provide object storage deduplication when it doesn't supply an object storage product? Is there a hint here that Dell is going to take on and sell EMC's Centera product?
Let's return to the DL2000 release for a moment and find another telling statement: "Dell believes that incorporating deduplication functionality into ISV, application and storage software can provide significant benefits to customers."
What does this mean? To me it says that Dell will supply deduplication software technology that can be incorporated into ISV software, into application software and into storage software. This seems to be another aspect of second-generation deduplication.
Follow the implication here and imagine a piece of application software and its stored data, either on directly-attached storage (DAS) or on a storage array. The application has code that deduplicates its stored data. Okay, where is that done? On the server running the application and storing data on DAS, or on the storage array, if it uses one, using its controller CPU cycles?
We have specific deduplication products now because dedupe is CPU- and disk-intensive, and running it on general servers could cripple them. Okay, but Nehalem servers are coming and server virtualisation is here already, so couldn't we find the CPU cycles to do the deduping on the servers?
Alternatively, we could parse "storage software" to mean storage array controller software and have the dedupe execute on the array, using spare CPU cycles - they're tending to be multi-core Xeon controllers now - with the ISV and application software telling the array what to dedupe and where to place the deduped data.
That chimes with ideas of multi-tiering inside arrays. So far no-one is talking much about multi-tiered DAS, but we can suppose that might come. The situation is easier to understand with networked storage than with DAS, since DAS implies that deduping runs on the application server, which seems a poor use of its CPU cycles.
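To see why deduping costs CPU cycles and disk activity wherever it runs, here is a minimal sketch of fixed-block deduplication in Python. It is an illustration only, not how the DL2000 or any Dell or EMC product works: the 4KB block size, the choice of SHA-256 and the function names dedupe and rehydrate are all assumptions made for the example. Every block has to be hashed (CPU) and looked up in an index (memory or disk), whether that happens on the application server or on the array controller.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and store each unique block once.

    Returns (store, recipe): store maps a block's SHA-256 digest to the
    block itself; recipe is the ordered list of digests needed to
    rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Hashing every block is where the CPU cycles go.
        digest = hashlib.sha256(block).hexdigest()
        # The index lookup is the part that hits memory or disk at scale.
        if digest not in store:
            store[digest] = block
        recipe.append(digest)
    return store, recipe


def rehydrate(store, recipe):
    """Rebuild the original data from the block store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Hypothetical usage: three identical blocks plus one different block.
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, recipe = dedupe(data)
stored_bytes = sum(len(block) for block in store.values())
```

In this made-up case four blocks go in and only two unique blocks are kept, with the recipe recording the order needed to put the data back together. Real products typically add variable-length chunking, persistent indexes and compression, none of which is shown here.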