LexisNexis open sources Hadoop challenger

Behold! Thor and Roxie

A super-computer architecture that crunches big data for banks, police, and spooks will soon be open sourced as a super-fast alternative to the Googlesque Hadoop.

LexisNexis Risk Solutions is opening up its High Performance Computing Cluster (HPCC), a system written in C++ that it claims is four times faster than Hadoop when running data-intensive queries on ordinary Linux servers.

LexisNexis will release a virtual machine for testing, full binaries, and the source code in the next few weeks, the company announced Wednesday.

The company has not yet announced which open-source license it will use, but said it will be a copyleft license that permits derivations and improvements bearing the HPCC name.

LexisNexis is in talks with Amazon to make HPCC available on the etailer's cloud while also planning to offer its own cloud to customers.

The company – better known for its media database and medical data services – will offer the HPCC code in two flavors: a free Community Edition that comes with the free platform software, and an Enterprise Edition with support and access to "more advanced" modules and features.

HPCC uses LexisNexis' own data-centric declarative programming language, known as ECL. Developed 10 years ago, it compiles to C++. HPCC includes two data-crunching platforms: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster.

LexisNexis senior vice president and chief technology officer Armando Escalante says Thor is analogous to Hadoop, while Roxie is the component that Hadoop is currently missing. Since it's written in C++, he says, the system is also faster than Hadoop, which is written in Java.

"We've been 10 years perfecting it," Escalante said, "and we tweaked it up the wazoo to get all the performance we can. We can add more use cases and make it better."

"We are four times faster than Hadoop on the Thor side. If Hadoop needs 1,000 nodes we can do it with 250 – that means less cooling and data center space."
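
Escalante's node figures follow from simple division, provided throughput scales linearly with node count. That linear scaling is an assumption made here for illustration, not something LexisNexis has demonstrated, and the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
import math


def nodes_needed(baseline_nodes: int, speedup: float) -> int:
    """Nodes required to match a baseline cluster's throughput.

    Assumes throughput scales linearly with node count, which is an
    idealisation for illustration and not a measured result.
    """
    if baseline_nodes <= 0 or speedup <= 0:
        raise ValueError("baseline_nodes and speedup must be positive")
    # Round up: a fraction of a node still means one more machine.
    return math.ceil(baseline_nodes / speedup)


if __name__ == "__main__":
    hadoop_nodes = 1000      # figure quoted by Escalante
    claimed_speedup = 4.0    # LexisNexis's claim, not independently verified
    print(nodes_needed(hadoop_nodes, claimed_speedup))
```

With the quoted inputs the sketch reproduces the 250-node figure. Real clusters rarely scale this cleanly, so the result is best read as an upper bound on the savings the claim implies.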

According to Escalante, HPCC is also more tightly coupled than Hadoop, further boosting performance of complex queries. Nodes talk to each other individually and via Thor, using one master switch that supports 1,500 Ethernet ports without blocking, opening up the full bandwidth available to large data packets.

This means there's no single choke point for data, Escalante claimed. The architecture can run queries in memory, on disc, or concurrently on both for fast speeds. Queries that are written in ECL and compile to C++ can be front-ended with JSON and SOAP.

"Hadoop has more of a concept of the racks. It's a little more loosely coupled and you need lots of nodes. When we first saw Hadoop we liked it but you lose a lot of performance because the nodes are connected to distributed switches that then connect to a central switch. We looked at that and said: 'That's a lot of congestion'."

When LexisNexis offers its own service, Escalante said, his company will target ordinary business customers – not the kind of super data users that have been LexisNexis Risk Solutions customers until now. LexisNexis has built, delivered, and supported Thor and Roxie systems for telcos that check customers' credit histories to see what service plan they can afford, and for law enforcement officers trying to track down a criminal's network of assets as part of an investigation. It also works with the investigation units of insurance giants that are investigating customers' claims. The Thor and Roxie part of the risk business is worth $10m a year. ®
