NEC mentions it has a high-end dedupe disk backup box, stretches, yawns for 40 seconds
We don't like to brag or anything, mumbles IT giant
Analysis NEC has the biggest, baddest scale-out deduping backup-to-disk array on the planet and we virtually never hear about it. NEC is not a top-six purpose-built backup supplier, according to IDC, being neither a unit shipment nor a revenue leader.
Yet there are 1,100 customers and 1,800 installations, around three exabytes under management, and still the world passes by.
It scales from one to 165 nodes and has global deduplication, something EMC's Data Domain, the backup-to-disk market leader, has never managed to do, and the world goes ho hum.
Why is it so quiet, reticent almost, about its technology?
The company is a fairly classic huge Japanese conglomerate, making and selling a wide variety of technology gear into global markets. It is 130 years old and there are some 99,000 employees, more than 64,000 patents, nine R&D labs around the globe, and one of those serious but slightly empty-sounding (to our Brit ears) mission strap lines: Orchestrating a brighter world.
NEC says that its technical innovation "allows us to enable people to live brighter lives."
It has a $6.216bn telecom unit, a $6.85bn public sector business, a $6.07bn system platform business and a smaller enterprise business, just $2.25bn.
Unlike Hitachi, NEC has not set up a US-based HDS-style subsidiary with American management and culture. It has set one up with Japanese management instead: NEC Corporation of America, or NECAM.
A press visit to its Silicon Valley facility had presentations from Hide Senta, VP IT platform operations; Nobu Morita, product manager; and Hiroaki Mizumachi, executive principal engineer and HYDRAstor CTO, which gives a sense of its culture. There were also US and European staff on hand.
To use that hoary old cliche, NEC's greatest strength is its Japanese management and culture, and its greatest weakness is ... its Japanese management and culture, with its quiet, measured and understated approach.
Among NEC's multitude of "Smart Enterprise" products are M Series SAN arrays including disk and all-flash products, WB Series Fibre Channel switches, and the HS Series backup and archival storage products, sold as HYDRAstor.
The product started as a research project in 2002, with a beta test period in 2006 and HYDRA gen-one launched in 2007. This was followed by regular development: HYDRA2 in 2008, a MiniHYDRA (HS3) in 2010, HYDRA3 in 2011, HYDRA4 in 2013, then gen 5 and an archive-specific HS6 in 2014. This had a 46x performance boost through deduplicated transfer.
The product has inline global dedupe, online node expansion, protection through NEC's own erasure coding, WAN-optimized async replication with compression, and multi-generation node support. It was originally designed for the high-performance computing market and features parallel ingest for speed.
Front-end access protocols include NFS, CIFS, OST, UEI and REST.
NEC has just launched a software-only version – the HYDRAstor Virtual Appliance (VA). It can be deployed in either vSphere or Hyper-V environments and supports from 1TB to 16TB. It's envisaged that remote and branch offices could use it, with a disaster recovery link back to a central site. The list price starts at $2,000 and includes all software, with replication, encryption, WORM, and deduplication.
The system has two kinds of nodes – hybrid accelerator nodes (x86 servers) and storage nodes – providing independent scaling of performance and capacity. The storage nodes use 6TB SATA drives currently. The system uses object storage with NEC's own erasure coding for data protection. Deduplication applies to all data on all nodes. There is a distributed hash table and both deduplication and hash table processing scale linearly as nodes are added.
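NEC doesn't spell out how its distributed hash table places data, so treat the following as a generic illustration rather than HYDRAstor's design. It is a minimal Python sketch of consistent hashing, one common way to spread a fingerprint index across nodes so that adding a node moves only a slice of the keys. The `HashRing` class, the node names and the 64 points per node are all hypothetical.

```python
import bisect
import hashlib


def _position(label: str) -> int:
    """Map any string to a point on a 64-bit ring."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not NEC's actual placement scheme."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node claims several points so load spreads evenly.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def owner(self, fingerprint: str) -> str:
        # The first ring point at or after the key's position owns it,
        # wrapping round to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_position(fingerprint), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-1", "node-2", "node-3"])
fingerprints = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(1000)]

before = {fp: ring.owner(fp) for fp in fingerprints}
ring.add_node("node-4")  # online expansion
after = {fp: ring.owner(fp) for fp in fingerprints}

moved = [fp for fp in fingerprints if before[fp] != after[fp]]
print(f"{len(moved)} of {len(fingerprints)} fingerprints changed owner")
```

Every fingerprint that changes owner lands on the new node and the rest stay put, which is the property that lets an index of this kind grow without a full reshuffle.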
Let's see what happens when data comes into the array.
First of all it is deduplicated, using variable block sizes. NEC claims HYDRAstor has the fastest single-controller write speed at 63TB/hr and that, at 5.2PB/hr as a full system, it is 25 times faster than any other product.
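To give a feel for what variable-block deduplication means, here is a minimal Python sketch under stated assumptions. It cuts a stream wherever a checksum of the last few bytes hits a chosen pattern, so block boundaries follow the content rather than fixed offsets, and stores each unique block once under its SHA-256 fingerprint. The window size, mask, chunk limits and the `DedupeStore` class are illustrative choices of ours, not NEC's algorithm or parameters.

```python
import hashlib
import random
import zlib

WINDOW = 16      # bytes examined when looking for a boundary (assumed)
MASK = 0x3F      # cut when the low 6 bits of the checksum are zero (assumed)
MIN_CHUNK = 32   # never cut before this many bytes (assumed)
MAX_CHUNK = 512  # always cut by this many bytes (assumed)


def split_variable(data: bytes) -> list:
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        window = data[i - WINDOW + 1 : i + 1]
        at_boundary = (zlib.crc32(window) & MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupeStore:
    """Hypothetical in-memory store keeping one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def write(self, data: bytes) -> list:
        recipe = []
        for piece in split_variable(data):
            fingerprint = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(fingerprint, piece)  # keep first copy only
            recipe.append(fingerprint)
        return recipe  # ordered fingerprints needed to rebuild the data

    def read(self, recipe: list) -> bytes:
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(piece) for piece in self.chunks.values())


rng = random.Random(0)
monday = bytes(rng.randrange(256) for _ in range(20000))
tuesday = monday[:5000] + b"a few new bytes" + monday[5000:]  # small insert

store = DedupeStore()
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)
print(len(monday) + len(tuesday), "bytes written,", store.stored_bytes(), "bytes stored")
```

Because the cut points depend on the bytes themselves, the small insertion in the second backup disturbs only the blocks around it. With fixed-size blocks, everything after the insertion would shift and stop matching.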
Erasure coding principles
Next it is erasure coded using Cauchy-based Reed-Solomon codes. That is, it is split into fragments and mathematically processed to produce a number of extra and redundant fragments equal to the number of drive failures you need to protect against. If incoming data is split into 10 fragments with 6 added protection fragments, and the total of 16 fragments written to 16 separate drives or nodes, then 6 drive failures can take place with all the data recoverable.
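The 10-plus-6 example can be tried out in a few lines of Python. This is a toy Reed-Solomon code in its polynomial-evaluation form over the prime field GF(257), with one symbol standing in for each fragment. It is not the Cauchy-matrix construction NEC uses, and real systems work on large fragments over GF(2^8); the function names and the field choice are our assumptions. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data_symbols, parity_count):
    """Return the data symbols followed by parity_count redundant symbols."""
    k = len(data_symbols)
    points = list(enumerate(data_symbols))  # data sits at x = 0 .. k-1
    parity = [_interpolate(points, x) for x in range(k, k + parity_count)]
    return list(data_symbols) + parity


def recover(fragments, k):
    """Rebuild the k data symbols from any k surviving {index: symbol} pairs."""
    if len(fragments) < k:
        raise ValueError("too many fragments lost")
    survivors = list(fragments.items())[:k]
    return [_interpolate(survivors, x) for x in range(k)]


data = list(b"HYDRAstor!")           # 10 data fragments
encoded = encode(data, 6)            # 16 fragments, one per drive or node

lost = {0, 3, 5, 9, 12, 15}          # any six failures
surviving = {i: s for i, s in enumerate(encoded) if i not in lost}
recovered = recover(surviving, 10)
print(bytes(recovered))
```

Any 10 of the 16 fragments pin down the same degree-9 polynomial, which is why it doesn't matter which six go missing. Lose a seventh and `recover` raises an error, because nine points are no longer enough.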
The mathematical processing is known as forward error correction and is typically based on Reed-Solomon encoding. It has less data overhead than a RAID scheme protecting against an equivalent number of drive failures. The two parameters of interest are the CPU burden involved in computing erasure codes and recovering data, and the overhead burden in terms of extra disk capacity to store the added fragments to the original data.
NEC says the HYDRAstor scheme – Distributed Resilient Data – is more efficient than other erasure coding schemes. The recovery time from drive/node failures is, it claims, 10-50 times faster than RAID. The default setting is to protect against three drive failures, with a 25 per cent capacity overhead, but it can be dialled up to six or down to one. You can have different resiliency levels for different applications. At level three (9 data fragments and 3 parity fragments) NEC says you get 1.5x greater protection than RAID 6 with lower overhead and faster recovery.
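The overhead arithmetic is easy to check. The short Python sketch below works it out two ways for the 9+3 default and for the 10+6 layout from the general example earlier: parity as a share of raw capacity, and parity as extra space on top of the data. The `overhead` helper is ours, and reading NEC's 25 per cent as a share of raw capacity is our assumption; the figures fit that reading.

```python
from fractions import Fraction


def overhead(data_fragments: int, parity_fragments: int) -> dict:
    """Capacity cost of an erasure-coding layout, expressed two ways."""
    total = data_fragments + parity_fragments
    return {
        "tolerated_failures": parity_fragments,
        "share_of_raw": Fraction(parity_fragments, total),
        "extra_over_data": Fraction(parity_fragments, data_fragments),
    }


default_level = overhead(9, 3)    # NEC's default resiliency level
wide_example = overhead(10, 6)    # the 10 + 6 layout used as an illustration

for name, result in [("9+3", default_level), ("10+6", wide_example)]:
    print(
        name,
        "survives", result["tolerated_failures"], "failures,",
        f"{float(result['share_of_raw']):.1%} of raw capacity,",
        f"{float(result['extra_over_data']):.1%} extra over the data",
    )
```

So 9+3 spends a quarter of the raw disk on protection, or a third extra measured against the data itself. How that compares with RAID 6 depends on how wide the RAID group is.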
The technology involves incoming data being distributed in a subspace concept, and then written to log-type buckets on disk. A node can have more than one subspace. (Explore the technology mire by checking out Patent US 8090792 B2.)
Large system customers
How good is HYDRAstor? A South African bank, not named by NECAM but thought to be FNB, replaced 12 x DD890 and 6 x DD990 Data Domain systems and 6 x 5330 NetBackup Appliances with a HYDRAstor implementation.
US-based Global Payments changed from LTO tape to 18 HYDRAstor nodes in 2010. It now has 108 nodes across several data centers.
Our impression is that NECAM gets some of its best wins when existing Data Domain and other backup-to-disk systems can't cope with customer data growth. Further, we can't see any other product that scales so high with this functionality. In 2014 Hitachi Data Systems bought Sepaton, which was a competitor to HYDRAstor. Its product is now called the Hitachi Protection Platform and majors on deduplication and replication features, with RAID 6 [PDF] but not erasure coding.
In a white paper [PDF], HDS claims it is "the most powerful and flexible data protection platform in the industry," with a grid-scalable architecture delivering "unsurpassed scalability of both performance and capacity."
If you are in the market for a high-end, disk-based backup and archive scale-out system then it seems HYDRAstor or the HDS system are the two main choices, and erasure coding will send you NEC's way. ®