HP doubles down with dedupe speed record
Boosted HP no longer needs Sepaton
HP reckons it can claim the dedupe speed king crown, ingesting at 100TB/hour and spitting it out at 40TB/hour, far faster than the dedupe dominator, Data Domain.
The enhanced dedupe performance was announced at the HP Discover event in Las Vegas on Monday. HP gets to such giddy heights by combining its StoreOnce Catalyst software with its high-end disk-to-disk B6200 backup array, announced in Vienna in November 2011.
HP says it "offers the industry’s highest-performing backup-and-restore throughput when combined with the enhanced HP StoreOnce B6200 Backup. Clients can recover in a single day what would take a full workweek with the competition."
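HP's "single day versus a full workweek" claim can be turned into rough arithmetic. The sketch below is a back-of-envelope calculation, not an HP figure. It assumes the 40TB/hour restore rate is sustained for the whole period and that a "workweek" is five days counted with the same hours per day as the "single day". On that assumption the claim implies a competitor restoring at one fifth of HP's rate. `hours_to_transfer` is a hypothetical helper name.

```python
def hours_to_transfer(terabytes: float, tb_per_hour: float) -> float:
    """Hours needed to move `terabytes` at a sustained `tb_per_hour` rate."""
    if tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return terabytes / tb_per_hour


# HP's quoted restore rate for the Catalyst-accelerated B6200.
HP_RESTORE_TB_PER_HOUR = 40.0

# Assumption: a "day" is 24 hours and a "workweek" is five such days.
HOURS_PER_DAY = 24
DAYS_PER_WORKWEEK = 5

one_day_tb = HP_RESTORE_TB_PER_HOUR * HOURS_PER_DAY
implied_competitor_rate = one_day_tb / (DAYS_PER_WORKWEEK * HOURS_PER_DAY)

print(f"Restorable in one day at HP's rate: {one_day_tb:.0f} TB")
print(f"Implied competitor restore rate: {implied_competitor_rate:.1f} TB/hour")
print(f"Hours to restore 100 TB at HP's rate: {hours_to_transfer(100, HP_RESTORE_TB_PER_HOUR):.1f}")
```

Changing `HOURS_PER_DAY` to 8 changes the volumes but not the five-to-one ratio that the claim rests on.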
The B6200 is a virtual tape library or straight backup-to-disk target with 48TB to 768TB of raw capacity (32TB to 512TB usable). It uses 2TB SAS disk drives, has a scale-out cluster architecture supporting up to eight nodes, and offers both Ethernet and Fibre Channel host connectivity.
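The quoted capacity figures are easy to sanity-check. This quick sketch assumes every raw terabyte sits on the 2TB SAS drives mentioned; the article gives no per-configuration drive counts, so the counts below are inferred rather than reported. At both ends of the range, usable capacity works out to two thirds of raw, and the article does not say what consumes the other third.

```python
DRIVE_TB = 2  # 2TB SAS drives, per the article

# (raw TB, usable TB) at the small and large ends of the B6200 range.
CONFIGS = {"minimum": (48, 32), "maximum (8 nodes)": (768, 512)}


def inferred_drive_count(raw_tb: int, drive_tb: int = DRIVE_TB) -> int:
    """Drives needed if all raw capacity is on drives of `drive_tb` each (an assumption)."""
    return raw_tb // drive_tb


def usable_fraction(raw_tb: int, usable_tb: int) -> float:
    """Share of raw capacity that is usable."""
    return usable_tb / raw_tb


for label, (raw, usable) in CONFIGS.items():
    print(
        f"{label}: {raw} TB raw ~ {inferred_drive_count(raw)} drives, "
        f"{usable} TB usable ({usable_fraction(raw, usable):.0%})"
    )
```

Spread evenly across the eight-node maximum, that would be 96TB raw and 64TB usable per node. This again assumes an even split, which the article does not confirm.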
HP claims it "delivers the industry's only large-scale deduplication appliance with fully automated high-availability features." It has an autonomic restart feature to ensure that backup jobs complete even if there is a major hardware failure, which is a reassuring capability.
EMC recently announced its biggest and fastest deduping array, the DD990. It holds up to 65PB of logical data and dedupes at up to 31TB/hour with Boost and 15TB/hour without, comfortably eclipsing the previous high-end system, the DD890, which deduped at 14TB/hour.
HP has sold Sepaton's S2100 ES2 big deduping array to customers with the most demanding dedupe needs, and the ES2 is a clusterable system with up to eight nodes and a 43.2TB/hour ingest rate.
Now, with the Catalyst-accelerated B6200, HP overtakes that, ingesting data more than twice as fast. We understand Sepaton might have an announcement of its own soon, but unless it dedupes significantly faster than 100TB/hour, it looks as if its HP selling relationship is effectively over. What need is there for an eight-node Sepaton system when an eight-node StoreOnce system goes faster?
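Lining up the headline ingest rates quoted above shows where "more than twice as fast" comes from. These are vendor claims taken from different announcements and probably measured under different conditions and dedupe ratios. Treat the ratios as a comparison of marketing numbers, not benchmarks.

```python
# Quoted ingest rates in TB/hour, as reported above (vendor claims).
INGEST_TB_PER_HOUR = {
    "HP B6200 + StoreOnce Catalyst": 100.0,
    "Sepaton S2100 ES2 (8 nodes)": 43.2,
    "EMC DD990 with Boost": 31.0,
    "EMC DD990 without Boost": 15.0,
    "EMC DD890": 14.0,
}

HP = "HP B6200 + StoreOnce Catalyst"


def speedup(faster: str, slower: str) -> float:
    """How many times higher the first system's quoted ingest rate is."""
    return INGEST_TB_PER_HOUR[faster] / INGEST_TB_PER_HOUR[slower]


for name, rate in sorted(INGEST_TB_PER_HOUR.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30} {rate:6.1f} TB/h   HP quote is {speedup(HP, name):.2f}x")
```

Against Sepaton's 43.2TB/hour, the HP figure works out at about 2.31 times. Against the Boosted DD990 it is about 3.2 times.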
Boost has the system sending the backup data, such as a Backup Exec media server, do some of the dedupe work so that it sends less data to the Data Domain box.
What does Catalyst do? It "allows clients to deduplicate data on application servers or backup servers before it is transferred to a centralised HP StoreOnce Backup system." It's effectively Boost.
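The internals of Boost and Catalyst are not described here, but the general idea of source-side deduplication can be sketched. This toy version assumes fixed-size 4KiB chunks identified by SHA-256 digests; shipping products commonly use variable-size, content-defined chunking instead. `DedupeTarget`, `missing` and `put` are hypothetical names, not the Catalyst SDK or the DD Boost API. The client hashes its data, asks the target which chunks it lacks, and sends only those.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed 4 KiB chunks for simplicity


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (SHA-256 hex digest, chunk) pairs for fixed-size chunks of data."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


class DedupeTarget:
    """Hypothetical stand-in for a deduplicating backup appliance."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # digest -> chunk, each stored once

    def missing(self, digests) -> set[str]:
        """Return the digests this target has not stored yet."""
        return {d for d in digests if d not in self.store}

    def put(self, chunks: dict[str, bytes]) -> None:
        self.store.update(chunks)


def source_side_backup(data: bytes, target: DedupeTarget) -> int:
    """Deduplicate on the client, send only unseen chunks, return bytes sent."""
    pairs = list(chunk_digests(data))
    needed = target.missing(d for d, _ in pairs)
    # A dict also collapses repeated chunks within this one backup.
    to_send = {d: c for d, c in pairs if d in needed}
    target.put(to_send)
    return sum(len(c) for c in to_send.values())


target = DedupeTarget()
data = b"".join(i.to_bytes(4, "big") for i in range(4096))  # 16 KiB, four distinct chunks
edited = data[:4096] + bytes(4096) + data[8192:]  # second chunk replaced with zeros

print(source_side_backup(data, target))    # 16384: everything is new
print(source_side_backup(data, target))    # 0: target already holds every chunk
print(source_side_backup(edited, target))  # 4096: only the changed chunk travels
```

A real system would also record the ordered list of digests for each backup so that the data can be reassembled on restore. That step is omitted here.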
Customers can use Data Protector 7 Software, Symantec's NetBackup or Backup Exec to manage deduplication and data movement in a Catalyst environment. Independent software vendors (ISVs) can achieve the same level of control with a StoreOnce Catalyst software development kit.
Data Protector 7
HP has also revved its Data Protector software with an injection of Autonomy’s IDOL (Intelligent Data Operating Layer). The claim is that users can “precisely protect, find and recover information based on the meaning and concepts contained within the data.”
Put another way, it has “governance tools that enable contextual backup and recovery of information.” HP says it is the industry’s first universal information-protection product for enterprises, as opposed to a data-protection product.
It claims users can retrieve all information relevant to a particular idea or topic, regardless of keywords or other search parameters. “All information” is a big, indeed a huge, claim. HP is saying that you can use IDOL to selectively back up and retrieve data based on its meaning, if you wish, and that this goes beyond a process driven by keywords.
This is, El Reg thinks, unique and a first. How useful will it be? Will it enable customers to find things in their backup data that they couldn’t have found before, or could only have found much more slowly? Answering that will require trial use, we would think.
The Data Protector 7 product also “combines cloud-based backup to the world’s largest private cloud with on-premise physical and virtual information protection.”* You can stick the backed-up data in a private cloud and not on a B6200 Backup array. Whether public cloud storage backends will be supported is a moot point.
HP has certainly seized the dedupe speed record and, with the IDOL software, has technology that EMC and other backup vendors have not. That’ll please its sales reps who will probably be very keen to let customers try this out as they work to repel the pirate EMC Data Domain boarders from HP accounts.
Data Protector 7 Software is available now directly from HP or its channel partners. The cloud-based backup option is offered via a separate monthly subscription. StoreOnce Catalyst software is available immediately with a starting US list price of $37,500. It's integrated with HP Data Protector 7 and Symantec's NetBackup, with Backup Exec integration coming in August. ®
* The world's largest private cloud? Could that be HP's own cloud?
Re: Dedupe the answer to the wrong problem?
I don't think you understand what dedupe does. In a backup scenario, lots of the data you back up is legitimately duplicated: file headers for files of the same type, executables that appear on more than one server, emails that contain forwarded text, etc. You aren't going to be able to prevent the duplication of these at source, and neither should you be able to.
Dedupe the answer to the wrong problem?
Where are the billions spent on not pointlessly replicating data to start with?
Have we given up on data normalisation and the correct use of caching?
I know, not a real-world question, but I get antsy when someone says, "buy a server" and then "buy this to fix the problems from incorrect use of the server."
So Boost and Catalyst are effectively the same in theory. But what dedupe ratio did HP use to calculate 100TB/hr?
Symantec does the same thing with its client-side dedupe, quoting throughput that assumes a 10-15X level of compression.
Data Domain, on the other hand, is not client-direct (well, in small cases it is). How about vendors just quote pure raw ingest numbers?
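The commenter's point can be quantified. If a quoted rate is logical, meaning it is measured before client-side dedupe, then the data actually crossing the wire is the quoted rate divided by the dedupe ratio. HP has not said what ratio underlies its 100TB/hour figure. The 10:1 and 15:1 values below simply borrow the range quoted for Symantec and are assumptions, as is treating HP's figure as logical.

```python
def physical_rate(logical_tb_per_hour: float, dedupe_ratio: float) -> float:
    """Data actually transferred per hour when the quoted rate is pre-dedupe."""
    if dedupe_ratio < 1:
        raise ValueError("dedupe ratio must be at least 1")
    return logical_tb_per_hour / dedupe_ratio


QUOTED_TB_PER_HOUR = 100.0  # HP's headline figure, assumed here to be logical

for ratio in (1, 10, 15):
    print(f"{ratio:>2}:1 dedupe -> {physical_rate(QUOTED_TB_PER_HOUR, ratio):.1f} TB/hour actually sent")
```

At 1:1 the two numbers coincide, which is what a "pure raw ingest" figure would report.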