
Big data elephant mates with RainStor

RainStor Hadoops its storage

Fast and really fast

The RainStor Hadoop product can avoid big data transfers and can run queries against Hadoop data more quickly than other approaches; Bantleman says RainStor can provide a 10-100X performance boost for analytics.

He quotes an extreme case of RainStor analytics acceleration from the New York Stock Exchange, where the analytics task was to calculate the average trading price of a single stock over one day. There were 1.5 billion trades on the day in question in November 2011, and they were stored in a Hadoop data store.

A Hadoop MapReduce batch run took four hours while a RainStor MapReduce run looking at all the data took 80 minutes. With the query treated as an ad hoc query the Hadoop MapReduce time was the same: four hours. A RainStor MapReduce run with filtering took two minutes and a RainStor SQL run took eight seconds.

Bantleman provides these figures with a straight face. Apparently, a four-hour Hadoop MapReduce run to find a single stock's NYSE average price for a day, with 1.5 billion trades in around 8,000 files, ran 1,800 times faster using a SQL query against the Hadoop data stored natively in RainStor.
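The speed-up factors follow directly from the quoted timings. As a quick check, here is a minimal Python sketch that converts each figure to seconds and divides it into the four-hour Hadoop baseline; the dictionary labels are ours, not RainStor's.

```python
# Timings quoted in the article, converted to seconds.
# The labels are descriptive names chosen for this sketch.
timings_s = {
    "Hadoop MapReduce (batch)": 4 * 60 * 60,
    "RainStor MapReduce (all data)": 80 * 60,
    "RainStor MapReduce (filtered)": 2 * 60,
    "RainStor SQL": 8,
}

baseline = timings_s["Hadoop MapReduce (batch)"]

# Speed-up of each run relative to the four-hour Hadoop batch run.
speedups = {name: baseline / seconds for name, seconds in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:g}x")
```

The 1,800x figure is simply 14,400 seconds divided by eight.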

Partition filtering vs brute force

Bantleman said: "We have partition filtering. Most databases have rows and columns and row indices. The RainStor filter tells me what not to read. The query looks at our metadata and asks which partitions contain, for example, IBM. There might be 8 instead of 8,000. Brute force reads everything, taking lots of time; we don't."
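The idea Bantleman describes can be illustrated with a small Python sketch. This is not RainStor's implementation: the Partition class, its symbols metadata field and the helper functions are all hypothetical, and simply assume that each partition carries a summary of the stock symbols it contains, so a query can skip partitions without opening them.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical partition: a metadata summary plus the trade rows."""
    name: str
    symbols: frozenset  # metadata: which stock symbols occur in this partition
    trades: list        # rows of (symbol, price)


def partitions_to_read(partitions, symbol):
    """Consult only the metadata to decide which partitions need reading."""
    return [p for p in partitions if symbol in p.symbols]


def average_price(partitions, symbol):
    """Average price for one symbol, scanning only the relevant partitions."""
    prices = [
        price
        for p in partitions_to_read(partitions, symbol)
        for sym, price in p.trades
        if sym == symbol
    ]
    return sum(prices) / len(prices) if prices else None


partitions = [
    Partition("p0", frozenset({"IBM", "HPQ"}), [("IBM", 100.0), ("HPQ", 25.0)]),
    Partition("p1", frozenset({"ORCL"}), [("ORCL", 30.0)]),
    Partition("p2", frozenset({"IBM"}), [("IBM", 102.0), ("IBM", 104.0)]),
    Partition("p3", frozenset({"MSFT"}), [("MSFT", 27.0)]),
]

selected = partitions_to_read(partitions, "IBM")
print([p.name for p in selected])
print(average_price(partitions, "IBM"))
```

A brute-force scan would read all four partitions; the filter reads two. At the scale quoted, the same idea means opening perhaps eight partitions instead of 8,000.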

When RainStor was forced to read everything in the batch run – all 8,000 partitions – it was still three times faster because its data was compressed 25 times, whereas the raw Hadoop data wasn't: "We ran faster because the I/O overhead was massively reduced."

Other goodies in the RainStor Hadoop product include geo-replication and the ability to set retention and expiration times for data. Data can be ingested under one schema and the product can cope with schema changes, so the data can be viewed through different schemas without having to be re-ingested.

Looking ahead, Bantleman believes machine-to-machine messaging will cause a huge increase in the amount of data organisations have to deal with. He also thinks big data compression and deduplication will be extremely valuable if you need to store big data in flash-based storage. This would enable many concurrent high-speed queries against far less data than you started out with.

RainStor Enterprise Big Data Analytics On Hadoop is available now. ®
