Riverbed claims it will de-dupe primary storage
Dam that data flood
"Hey you storage vendors, listen up. We're gonna tell you how to deduplicate primary data." That's the message given out at Riverbed's Vision Day for financial analysts on Monday.
The startling premise of Riverbed's in-development Atlas appliances is that you can strip out up to 90 per cent of the data stored in data centres by using enhanced WAN acceleration appliances to find and deduplicate data flowing between servers and storage arrays. That figure rises to 95 per cent for raw backup data, Riverbed says, because it is chock full of redundant information. Broadly speaking, effective storage capacity goes up ten times, or twenty times in the case of backup data.
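Those multipliers follow directly from the claimed redundancy percentages. A minimal sketch of the arithmetic (the function name and figures are illustrative, taken from Riverbed's claims above):

```python
def capacity_multiplier(redundant_fraction: float) -> float:
    """If a given fraction of stored data is redundant and removed,
    the same disks hold 1 / (1 - fraction) times as much logical data."""
    return 1.0 / (1.0 - redundant_fraction)

# Riverbed's claimed figures:
primary = capacity_multiplier(0.90)   # 90% redundant -> roughly 10x
backup = capacity_multiplier(0.95)    # 95% redundant -> roughly 20x
print(round(primary), round(backup))
```

Note the non-linearity: going from 90 to 95 per cent redundancy doubles the effective capacity gain, which is why backup data, with its repeated fulls, is such an attractive target.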
What the storage array vendors say is impossible - deduplicating primary storage array data - is what Riverbed says is practicable. It's basing this on its Steelhead appliance, which deduplicates data sent across the WAN to speed communications with branch and remote offices. The aim is for those offices to need less local IT infrastructure, with the central data centre providing a consolidated and thus, hopefully, more efficient and cost-effective set-up.
Eric Wolford, Riverbed's marketing and business development SVP, summed up the idea: "With the Atlas appliance, we are doing for data at rest what we have always done for data in motion."
Riverbed's Atlas appliance will sit alongside Steelhead appliances in front of storage arrays. The Steelhead breaks up data coming to it into byte-level patterns. The Atlas maintains an index of master data patterns and will only send new patterns on to the arrays. So, as servers send data to the arrays, it is inspected by the in-band Steelhead and Atlas appliance combination, deduplicated, and sent on - with up to 90 per cent of it removed and replaced by pointers - to the storage array where it rests.
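Riverbed hasn't detailed its chunking or indexing scheme, but the general mechanism described above - split the stream into patterns, index fingerprints of what's been seen, forward only novel data and replace repeats with pointers - can be sketched as follows. This is an illustration only, assuming fixed-size chunks and SHA-256 fingerprints; real appliances typically use variable, content-defined chunk boundaries:

```python
import hashlib

CHUNK = 4096  # fixed-size chunks for simplicity; an assumption, not Riverbed's scheme

def dedupe(stream: bytes, index: dict) -> list:
    """Split the stream into chunks and return a list of records:
    ('new', data) for a first-seen chunk, ('ref', fingerprint) for a repeat."""
    out = []
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in index:
            out.append(('ref', fp))     # redundant: send only a pointer
        else:
            index[fp] = chunk           # first sighting: index it and send in full
            out.append(('new', chunk))
    return out

index = {}
data = b'A' * 8192 + b'B' * 4096 + b'A' * 4096   # repeating pattern
records = dedupe(data, index)
sent_full = sum(1 for kind, _ in records if kind == 'new')
print(f"{len(records)} chunks, {sent_full} sent in full")
```

On this toy stream only two of the four chunks travel in full; the other two become small pointers, which is the effect Atlas aims for on the server-to-array path.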
In an added twist, Atlas can be used to inspect an array's contents and deduplicate it, reclaiming redundant capacity. Initially Atlas will support Windows servers and unstructured/semi-structured data, with Unix servers and structured data coming along later. The first Atlas appliance should be announced next year and will come in a redundant cluster configuration for high availability.
Riverbed is certainly thinking big. As Wolford put it: "When IT infrastructure is overloaded with redundant data, there are efficiency and cost impacts across the organization. Our vision is to eliminate these inefficiencies through removing redundant data at every point between the data center and the end user."
It's a bold idea. Riverbed isn't saying - yet - which server-storage interfaces will be supported. We might presume that the idea is to embrace all the standard storage transports - Fibre Channel and Ethernet - and all the main protocols: block-level SAN plus NAS interfaces such as CIFS and NFS.
We might presume wrong. Riverbed's statement did say: "The Atlas appliance is designed to help scale existing file storage by enabling customers' existing file servers to serve more users and deliver a larger amount of data per device." Ah, files and file servers. Not quite "removing redundant data at every point between the data center and the end user".
Never mind, these are just details. Yesterday was big picture day, with big picture benefits: lower costs, enhanced user experience, improved manageability, scalability, greater productivity and, the presentation being to financial analysts, enhanced Riverbed revenues and profits.
Riverbed, by the way, is in a legal dispute with Quantum over its deduplication technology which, Quantum claims, infringes its patents. The stakes just got higher.
Another note: NetApp has Storage Acceleration Appliances sitting in front of its arrays now and it has its ASIS deduplication technology. Perhaps NetApp could offer the same functionality as Atlas to its customers? It is already saying that deduplication applies to much more than backup data. ®
NetApp way ahead
Give me a scenario where it doesn't work.
If you want to use industry standard backup tools for something like a database against a LUN rather than an NDMP dump :-)
Ignorance is bliss
The main point here is that the author as well as Riverbed and the quoted SVP have not done their homework. NetApp supports deduplication on primary as well as backup and archive, and has done so for a very long time. Recently NetApp was recognized by Dave Russell (Research VP at Gartner) as the market leader in data deduplication, bar none. Data ONTAP provides this capability at the volume level (turned on, turned off, scheduled) for free. Through using the NetApp V-Series, non-NetApp arrays can now be deduplicated in the primary environment as well (as pointed out by others above).
So what is the news here?
Jim S. (yes I work at NetApp)
NetApp way ahead
>"A couple of points, firstly you must buy into the Netapp filesystem, essentially making your array a slave to Netapp's NAS implementation."
Yes, of course... and the vendors who only perform the disk functions whine about this because they want to control the WHOLE environment and resist any open storage platforms. It's a weak argument if the configuration meets or exceeds the end-user expectations.
>"Just because Netapp are pitching this as a supported solution, I don't see any of the other vendors clamouring to sign joint support agreements around this. Who carries the can if it all goes tits up."
Yes, of course... why would EMC certify NetApp as a gateway when EMC wants to own your whole environment and make you buy just EMC? In an open environment, the end-user has to exercise some authority over the role each vendor's piece plays in their environment. If EMC (or any vendor) balks at just being disk to a NetApp gateway, the end-user needs to walk away from EMC and work with a vendor that will cooperate... and trust me... EMC will cooperate if they think they are out of the solution.
>"This only works for certain usage cases"
Give me a scenario where it doesn't work.