LexisNexis open sources Hadoop challenger

Behold! Thor and Roxie

A super-computer architecture that crunches big data for banks, police, and spooks will soon be open sourced as a super-fast alternative to the Googlesque Hadoop.

LexisNexis Risk Solutions is opening up its High Performance Computing Cluster (HPCC), a system written in C++ that it claims is four times faster than Hadoop when running data-intensive queries on ordinary Linux servers.

LexisNexis will release a virtual machine for testing, full binaries, and the source code in the next few weeks, the company announced Wednesday.

The company has not yet announced which open-source license it will use, but it said the license will be a copyleft one that permits derivations and improvements bearing the HPCC name.

LexisNexis is in talks with Amazon to make HPCC available on the etailer's cloud while also planning to offer its own cloud to customers.

The company – better known for its media database and medical data services – will offer the HPCC code in two flavors: a free Community Edition that comes with the free platform software, and an Enterprise Edition with support and access to "more advanced" modules and features.

HPCC uses LexisNexis' own data-centric declarative programming language, known as ECL. Developed 10 years ago, it compiles to C++. HPCC includes two data-crunching platforms: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster.

LexisNexis senior vice president and chief technology officer Armando Escalante says Thor is analogous to Hadoop, while Roxie is the component that Hadoop is currently missing. Since it's written in C++, he says, the system is also faster than Hadoop, which is written in Java.

"We've been 10 years perfecting it," Escalante said, "and we tweaked it up the wazoo to get all the performance we can. We can add more use cases and make it better."

"We are four times faster than Hadoop on the Thor side. If Hadoop needs 1,000 nodes we can do it with 250 – that means less cooling and data center space."
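
Escalante's node arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: it assumes capacity scales linearly with node count, which the company has not claimed, and the function name `nodes_needed` is made up for this example.

```python
import math


def nodes_needed(baseline_nodes: int, speedup: float) -> int:
    """Nodes required when each node does `speedup` times the work.

    Assumes perfectly linear scaling, which real clusters rarely achieve.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Round up: a fraction of a node still means buying a whole server.
    return math.ceil(baseline_nodes / speedup)


# The figures quoted above: a 1,000-node job at a claimed 4x speedup.
print(nodes_needed(1000, 4))  # 250
```

Any real saving would depend on the workload, so the result is a best case rather than a sizing guide.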

According to Escalante, HPCC is also more tightly coupled than Hadoop, further boosting the performance of complex queries. Nodes talk to each other individually and via Thor, using one master switch that supports 1,500 Ethernet ports without blocking, which opens up the full bandwidth available to large data packets.

This means there's no single choke point for data, Escalante claimed. The architecture can run queries in memory, on disc, or concurrently on both for speed. Queries that are written in ECL and compile to C++ can be front-ended with JSON and SOAP.

"Hadoop has more of a concept of the racks. It's a little more loosely coupled and you need lots of nodes. When we first saw Hadoop we liked it but you lose a lot of performance because the nodes are connected to distributed switches that then connect to a central switch. We looked at that and said: 'That's a lot of congestion'."
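
To put a rough number on the congestion Escalante describes, the Python sketch below computes the oversubscription ratio of a rack switch: the bandwidth its nodes can offer divided by the uplink bandwidth to the central switch. Every figure in it (40 nodes per rack, 1Gbit/s node links, a 10Gbit/s uplink) is hypothetical and is not taken from LexisNexis or from any Hadoop deployment. A ratio of 1.0 corresponds to a non-blocking design.

```python
def oversubscription(nodes_per_rack: int, node_gbps: float, uplink_gbps: float) -> float:
    """Ratio of the bandwidth a rack's nodes can offer to its uplink bandwidth."""
    if uplink_gbps <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return (nodes_per_rack * node_gbps) / uplink_gbps


# Hypothetical rack: 40 nodes at 1Gbit/s each behind one 10Gbit/s uplink.
ratio = oversubscription(40, 1.0, 10.0)
print(ratio)  # 4.0

# Worst case, when every node sends out of the rack at once, each node
# gets only this share of its link speed (in Gbit/s).
print(1.0 / ratio)  # 0.25
```

A higher ratio means more traffic competing for the same uplink, which is the effect the quote attributes to distributed switches feeding a central one.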

When LexisNexis offers its own service, Escalante said, his company will target ordinary business customers – not the kind of super data users that have been LexisNexis Risk Solutions customers until now. LexisNexis has built, delivered, and supported Thor and Roxie systems for telcos that check customers' credit histories to see what service plan they can afford, and for law enforcement officers trying to track down a criminal's network of assets as part of an investigation. It also works with the investigation units of insurance giants that are looking into customers' claims. The Thor and Roxie part of the risk business is worth $10m a year. ®
