LexisNexis open sources Hadoop challenger

Behold! Thor and Roxie

A super-computer architecture that crunches big data for banks, police, and spooks will soon be open sourced as a super-fast alternative to the Googlesque Hadoop.

LexisNexis Risk Solutions is opening up its High Performance Computing Cluster (HPCC), a system written in C++ that it claims is four times faster than Hadoop when running data-intensive queries on ordinary Linux servers.

LexisNexis will release a virtual machine for testing, full binaries, and the source code in the next few weeks, the company announced Wednesday.

The company has not yet announced which open-source license it will use, but said it will be a copyleft license that permits derivative works and improvements bearing the HPCC name.

LexisNexis is in talks with Amazon to make HPCC available on the etailer's cloud while also planning to offer its own cloud to customers.

The company, better known for its media database and medical data services, will offer the HPCC code in two flavors: a free Community Edition that includes the platform software, and an Enterprise Edition with support and access to "more advanced" modules and features.

HPCC uses LexisNexis' own data-centric declarative programming language, known as ECL. Developed 10 years ago, it compiles to C++. HPCC includes two data-crunching platforms: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster.

LexisNexis senior vice president and chief technology officer Armando Escalante says Thor is analogous to Hadoop, while Roxie is the component that Hadoop is currently missing. Because the system is written in C++, he says, it is also faster than Hadoop, which is written in Java.

"We've been 10 years perfecting it," Escalante said, "and we tweaked it up the wazoo to get all the performance we can. We can add more use cases and make it better."

"We are four [times] faster than Hadoop on the Thor side. If Hadoop needs 1,000 nodes we can do it with 250 – that means less cooling and data center space."
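As a rough illustration of the arithmetic behind Escalante's claim, the hypothetical Python helper below estimates how many nodes would match a baseline cluster given a claimed per-node speedup. It assumes, purely for illustration, that throughput scales linearly with node count, which real clusters rarely achieve; the function name is invented here and is not part of HPCC or Hadoop.

```python
import math


def equivalent_nodes(baseline_nodes: int, speedup: float) -> int:
    """Estimate nodes needed to match a baseline cluster's throughput.

    Hypothetical sketch: assumes throughput scales linearly with node
    count, so a system `speedup` times faster per node needs
    baseline_nodes / speedup nodes, rounded up to a whole machine.
    """
    if baseline_nodes < 0 or speedup <= 0:
        raise ValueError("baseline_nodes must be >= 0 and speedup > 0")
    return math.ceil(baseline_nodes / speedup)


if __name__ == "__main__":
    # Escalante's example: 1,000 Hadoop nodes at a claimed 4x speedup.
    print(equivalent_nodes(1000, 4))
```

Under that linear-scaling assumption, the claimed 4x speedup turns 1,000 nodes into 250; any coordination overhead that grows with cluster size would change the figure.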

According to Escalante, HPCC is also more tightly coupled than Hadoop, further boosting the performance of complex queries. Nodes talk to each other individually and via Thor, using one master switch that supports 1,500 Ethernet ports without blocking, which opens up the full available bandwidth to large data packets.

This means there's no single choke point for data, Escalante claimed. The architecture can run queries in memory, on disc, or concurrently on both for faster performance. Queries written in ECL and compiled to C++ can be front-ended with JSON and SOAP.

"Hadoop has more of a concept of the racks. It's a little more loosely coupled and you need lots of nodes. When we first saw Hadoop we liked it but you lose a lot of performance because the nodes are connected to distributed switches that then connect to a central switch. We looked at that and said: 'That's a lot of congestion'."

When LexisNexis offers its own service, Escalante said, his company will target ordinary business customers, not the kind of super data users who have been LexisNexis Risk Solutions customers until now. LexisNexis has built, delivered, and supported Thor and Roxie systems for telcos that check customers' credit histories to see what service plan they can afford, and for law enforcement officers trying to track down a criminal's network of assets as part of an investigation. It also works with the investigation units of insurance giants that are looking into customers' claims. The Thor and Roxie part of the risk business is worth $10m a year. ®