Original URL: https://www.theregister.com/2014/07/01/wandisco_isnt_a_disco_but_it_sure_makes_data_dance/

WANdisco plunges into the Hadoop foam party, shakes its replication booty

'This 6ft-4 Indian guy in hotpants is a friggin GENIUS'

By Chris Mellor

Posted in Storage, 1st July 2014 10:53 GMT

WANdisco is bringing continuous wide-area availability to Hadoop big data users.

The joint UK and US-based WANdisco (Wide Area Network DIStributed COmputing) sells software producing active-active data centres at WAN distances, which it says is suited to the Hadoop Big Data world. Its technology uses patented algorithms invented by co-founder Dr Yeturu Aahlad, an ex-Sun distributed systems architect and IBM researcher.

WANdisco co-founder and CEO Dave Richards described meeting Aahlad in an Elite Business interview:

"I was introduced to Dr Yeturu Aahlad at this cocktail party. He's this 6ft 4 Indian guy, with inordinately long legs, who was wearing what I rudely described as hot pants and a t-shirt from a trade show. Nobody else was dressed like that in the room ... of course, he's a genius."

Aahlad's patented software performs active-active replication, copying data between data centres in real time. Because the servers involved are mirrors of each other, work can fail over from one data centre to another, providing what WANdisco describes as continuous availability. It also makes multi-location collaboration easier.

When a new server or set of servers joins the replication scheme, the new system has to be loaded initially, unless you FedEx disks to or from the new centre. Metadata is loaded first and the actual data follows. If a request arrives for data that has not yet been loaded, that data is brought to the head of the transmit queue.
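
To make that ordering concrete, here is a minimal Python sketch of such a transmit queue. It is not WANdisco's code, and the class, method and priority names are all invented for illustration. It also assumes that a requested block jumps ahead of the remaining bulk data but not ahead of metadata still waiting to be sent, a detail the description above leaves open.

```python
import heapq
import itertools


class TransmitQueue:
    """Hypothetical initial-load queue: metadata, then requested data, then the rest."""

    # Lower number = sent sooner. These three levels are an assumption.
    METADATA, REQUESTED, BULK = 0, 1, 2

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order
        self._live = {}                    # item -> its current heap entry

    def add(self, item, priority):
        """Queue an item for transmission at the given priority."""
        entry = [priority, next(self._counter), item, True]
        self._live[item] = entry
        heapq.heappush(self._heap, entry)

    def request(self, item):
        """A client asked for data that has not been loaded yet: promote it."""
        entry = self._live.get(item)
        if entry is None or entry[0] <= self.REQUESTED:
            return                # already sent, or already at the front
        entry[3] = False          # mark the old entry as stale
        self.add(item, self.REQUESTED)

    def pop(self):
        """Return the next item to transmit, or None when the queue is empty."""
        while self._heap:
            _, _, item, valid = heapq.heappop(self._heap)
            if valid:
                del self._live[item]
                return item
        return None


queue = TransmitQueue()
queue.add("meta:namespace", TransmitQueue.METADATA)
for block in ("block-1", "block-2", "block-3"):
    queue.add(block, TransmitQueue.BULK)

queue.request("block-3")  # someone needs block-3 before it has arrived
order = [queue.pop() for _ in range(4)]
print(order)  # ['meta:namespace', 'block-3', 'block-1', 'block-2']
```

The stale-entry trick avoids searching the heap when a block is promoted: the old entry stays where it is and is simply skipped when it surfaces.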

This software can use WAN optimisers like Riverbed's Steelhead to drive the links it uses even faster.

The server mirroring provides continuous backup across sites for non-stop data recovery. HP's purchase of this product to help in then-CIO Randy Mott's data centre consolidation exercise, with 83 data centres becoming six, gave WANdisco some credibility and confidence in its tech.

The customer list includes Apple, Barclays, Cisco, Disney, and Intel alongside HP, which is WANdisco's largest client with 50,000 users throughout the company.

Richards took WANdisco public in 2012, doing so on AIM at the London Stock Exchange, eschewing US alternatives. The float raised £15 million with the company valued at £37 million; seemingly small potatoes in these days of multi-billion dollar valuations. WANdisco is not a typical Silicon Valley, venture-funded startup.

History. In a pot

Here's a potted history table:

WANdisco co-founder and CEO Dave Richards

WANdisco saw an opportunity for its tech in the big data area and used some of the funds from the listing to buy AltoStor, based in Silicon Valley. That was a two-man business: Jagane Sundar and Konstantin Shvachko, two engineers closely involved with Hadoop, which WANdisco says they helped create as part of a 12-person team at Yahoo.

Shvachko implemented Hadoop at eBay where he was the Principal Big Data Architect. Sundar joined WANdisco as chief technology officer and vice president of engineering for Big Data and Shvachko as the chief architect of Big Data, with Shvachko saying:

"When we first spoke with WANdisco, we recognised immediately that WANdisco's patented replication technology, combined with AltoStor's products and knowledge, could create a compelling product offering that virtually every enterprise looking to deploy Hadoop could utilise."

WANdisco applied its so-called Non-Stop technology to Hadoop’s NameNode, and then to availability and performance bottlenecks in HBase’s architecture, its Region and Master Servers, to counter the risk of downtime and data loss. HBase is used for real-time interactive applications built on Hadoop.

AltoStor's software operates above Cloudera and/or Hortonworks. The idea is that Hadoop big data processing nodes can use WANdisco's replication technology to provide continuous availability across data centres, and so prevent a failure in one data centre from stopping operations until it is fixed.

The Non-Stop Hadoop technology supports Hortonworks and is now certified to run on Cloudera 5. Tim Stevens, Cloudera's veep for business and corporate development, cannily said:

"Cloudera 5 together with WANdisco's Non-Stop Hadoop technology enables us to deliver our full suite of real-time data analytics and data management applications for global multi-data centre deployments."

WANdisco also has an Application Lifecycle Management (ALM) product line, covering Apache Subversion and Git. Products like its SVN MultiSite Plus allow distributed development teams to work as if they are in the same location, with continuous availability across the enterprise.

Branko Čibej is WANdisco's Director of Subversion, one of the most amusing corporate titles we have ever come across.

WANdisco looks like a great niche company with technology that will become more popular as real-time Hadoop-based big data analytics becomes more widely used in enterprises. It is now a public company, post-IPO in other words, and joins Neverfail and Vision Solutions' Double-Take in the server continuity business.

It looks to have stolen a march on both of these with this early hop aboard the speeding Hadoop bandwagon. ®