LexisNexis open sources Hadoop challenger

Behold! Thor and Roxie

A supercomputer architecture that crunches big data for banks, police, and spooks will soon be open sourced as a super-fast alternative to the Googlesque Hadoop.

LexisNexis Risk Solutions is opening up its High Performance Computing Cluster (HPCC), a system written in C++ that it claims is four times faster than Hadoop when running data-intensive queries on ordinary Linux servers.

LexisNexis will release a virtual machine for testing, full binaries, and the source code in the next few weeks, the company announced Wednesday.

The company has not yet announced which open-source license it will use, but it said the license will be a copyleft one, permitting derivatives and improvements that bear the HPCC name.

LexisNexis is in talks with Amazon to make HPCC available on the etailer's cloud while also planning to offer its own cloud to customers.

The company – better known for its media database and medical data services – will offer the HPCC code in two flavors: a free Community Edition that includes the platform software, and an Enterprise Edition with support and access to "more advanced" modules and features.

HPCC uses LexisNexis' own data-centric declarative programming language, known as ECL. Developed 10 years ago, it compiles to C++. HPCC includes two data-crunching platforms: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster.

LexisNexis senior vice president and chief technology officer Armando Escalante says Thor is analogous to Hadoop, while Roxie is the component that Hadoop currently lacks. Because the system is written in C++, he says, it is also faster than Hadoop, which is written in Java.

"We['ve] been 10 years perfecting it," Escalante said, "and we tweaked it up the wazoo to get all the performance we can. We can add more use cases and make it better."

"We are four [times] faster than Hadoop on the Thor side. If Hadoop needs 1,000 nodes we can do it with 250 – that means less cooling and data center space."
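Escalante's node count is simple division, provided throughput scales linearly with node count and the claimed four-fold per-node speedup holds for the workload in question (both assumptions, not something the article benchmarks). A minimal Python sketch of that back-of-envelope arithmetic, with a hypothetical helper name:

```python
import math

def nodes_needed(baseline_nodes: int, speedup: float) -> int:
    """Nodes required to match a baseline cluster, assuming each node is
    `speedup` times as fast and work divides evenly (linear scaling).
    `nodes_needed` is a hypothetical helper, not part of HPCC."""
    if baseline_nodes < 0 or speedup <= 0:
        raise ValueError("baseline_nodes must be >= 0 and speedup > 0")
    # Round up: a fractional node still needs a whole server.
    return math.ceil(baseline_nodes / speedup)

# Escalante's example: 1,000 Hadoop nodes at a claimed 4x speedup.
print(nodes_needed(1000, 4))  # 250
```

Real clusters rarely scale perfectly linearly, so the figure is best read as a lower bound on the hardware needed rather than a sizing guide.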

According to Escalante, HPCC is also more tightly coupled than Hadoop, further boosting the performance of complex queries. Nodes talk to each other directly and via Thor, through a single master switch that supports 1,500 Ethernet ports without blocking, opening up the full available bandwidth to large data transfers.

This means there's no single choke point for data, Escalante claimed. The architecture can run queries in memory, on disc, or on both concurrently for greater speed. Queries written in ECL and compiled to C++ can be exposed through JSON and SOAP front ends.

"Hadoop has more of a concept of the racks. It's a little more loosely coupled and you need lots of nodes. When we first saw Hadoop we liked it, but you lose a lot of performance because the nodes are connected to distributed switches that then connect to a central switch. We looked at that and said: 'That's [a] lot of congestion'."
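The congestion Escalante describes is what network engineers call oversubscription: when the nodes in a rack share an uplink to the central switch that is slower than their combined links, traffic leaving the rack gets throttled. A minimal Python sketch compares worst-case per-node bandwidth in that two-tier layout against a single non-blocking switch. The link speeds and rack size are hypothetical figures chosen for illustration, not numbers from LexisNexis or the article.

```python
def per_node_bandwidth_tree(node_link_gbps: float, nodes_per_rack: int,
                            uplink_gbps: float) -> float:
    """Worst-case bandwidth per node for cross-rack traffic when every node
    in a rack transmits at once through the rack's shared uplink."""
    if nodes_per_rack <= 0:
        raise ValueError("nodes_per_rack must be positive")
    fair_share = uplink_gbps / nodes_per_rack
    # A node can never send faster than its own link allows.
    return min(node_link_gbps, fair_share)


def per_node_bandwidth_flat(node_link_gbps: float) -> float:
    """A non-blocking switch can run every port at full line rate at once
    (assuming no two senders target the same receiver), so each node
    keeps its full link speed."""
    return node_link_gbps


# Hypothetical figures: 1 Gbit/s node links, 40 nodes per rack, 10 Gbit/s uplink.
tree = per_node_bandwidth_tree(1.0, 40, 10.0)
flat = per_node_bandwidth_flat(1.0)
print(f"tree: {tree} Gbit/s per node, flat: {flat} Gbit/s per node")
# prints "tree: 0.25 Gbit/s per node, flat: 1.0 Gbit/s per node"
```

In the tree layout, adding nodes to a rack without widening its uplink shrinks each node's share, which is the trade-off Escalante contrasts with HPCC's single-switch design.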

When LexisNexis offers its own service, Escalante said, his company will target ordinary business customers – not the kind of heavy data users who have been LexisNexis Risk Solutions customers until now. LexisNexis has built, delivered, and supported Thor and Roxie systems for telcos that check customers' credit histories to see what service plan they can afford, and for law enforcement officers trying to track down a criminal's network of assets as part of an investigation. It also works with the investigation units of insurance giants that are looking into customers' claims. The Thor and Roxie part of the risk business is worth $10m a year. ®
