Big data elephant mates with RainStor

RainStor Hadoops its storage

Fast and really fast

The RainStor Hadoop product can avoid big data transfers, and it can run queries against Hadoop data quicker than other approaches; Bantleman says RainStor can provide a 10-100X performance boost for analytics.

He cites an extreme example of RainStor analytics acceleration from the New York Stock Exchange, where the task was to calculate the average trading price of a single stock over one day. There were 1.5 billion trades on the day in question in November 2011, and they were stored in a Hadoop data store.

A Hadoop MapReduce batch run took four hours while a RainStor MapReduce run looking at all the data took 80 minutes. With the query treated as an ad hoc query the Hadoop MapReduce time was the same: four hours. A RainStor MapReduce run with filtering took two minutes and a RainStor SQL run took eight seconds.

Bantleman provides these figures with a straight face. Apparently, a four-hour Hadoop MapReduce run to find a single stock's NYSE average price for a day, with 1.5 billion trades in around 8,000 files, ran 1,800 times faster using a SQL query against the Hadoop data stored natively in RainStor.
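The ratios are easy to check. The short Python sketch below only restates the run times quoted above in seconds and divides them; the variable names and the `speedup` helper are ours, not RainStor's.

```python
# Run times as reported in the article, converted to seconds.
HADOOP_MAPREDUCE_S = 4 * 60 * 60     # four hours
RAINSTOR_FULL_SCAN_S = 80 * 60       # 80 minutes, reading all the data
RAINSTOR_FILTERED_S = 2 * 60         # two minutes, MapReduce with filtering
RAINSTOR_SQL_S = 8                   # eight seconds, SQL query


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate run is than the baseline."""
    return baseline_s / candidate_s


speedups = {
    "full_scan": speedup(HADOOP_MAPREDUCE_S, RAINSTOR_FULL_SCAN_S),
    "filtered": speedup(HADOOP_MAPREDUCE_S, RAINSTOR_FILTERED_S),
    "sql": speedup(HADOOP_MAPREDUCE_S, RAINSTOR_SQL_S),
}

for name, factor in speedups.items():
    print(f"{name}: {factor:g}x faster than Hadoop MapReduce")
```

On these figures the full scan comes out 3 times faster, the filtered MapReduce run 120 times faster and the SQL query 1,800 times faster.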

Partition filtering vs brute force

Bantleman said: "We have partition filtering. Most databases have rows and columns and row indices. The RainStor filter tells me what not to read. The query looks at our metadata and asks which partitions contain, for example, IBM. There might be 8 instead of 8,000. Brute force reads everything, taking lots of time; we don't."
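The idea Bantleman describes can be sketched in a few lines of Python. This is a toy illustration, not RainStor's implementation: the `Partition` class, its `symbols` metadata set, the `average_price` function and the sample trades are all invented here to show how per-partition metadata lets a query skip partitions that cannot contain the value it wants.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A hypothetical partition: trade rows plus metadata listing its symbols."""
    name: str
    rows: list  # list of (symbol, price) tuples
    symbols: set = field(init=False)  # metadata built once, at load time

    def __post_init__(self):
        self.symbols = {symbol for symbol, _ in self.rows}


def average_price(partitions, symbol, use_filter=True):
    """Return (average price, partitions read) for one symbol.

    With use_filter=True the metadata is consulted first and partitions that
    do not contain the symbol are never read; with use_filter=False every
    partition is scanned, which is the brute-force approach.
    """
    partitions_read = 0
    total = 0.0
    count = 0
    for partition in partitions:
        if use_filter and symbol not in partition.symbols:
            continue  # the filter says what not to read
        partitions_read += 1
        for row_symbol, price in partition.rows:
            if row_symbol == symbol:
                total += price
                count += 1
    average = total / count if count else None
    return average, partitions_read


# Made-up sample data: four partitions, two of which contain IBM trades.
partitions = [
    Partition("p0", [("IBM", 100.0), ("HPQ", 20.0)]),
    Partition("p1", [("MSFT", 30.0), ("ORCL", 25.0)]),
    Partition("p2", [("IBM", 110.0), ("MSFT", 31.0)]),
    Partition("p3", [("HPQ", 21.0)]),
]

filtered = average_price(partitions, "IBM")
brute_force = average_price(partitions, "IBM", use_filter=False)
print(filtered, brute_force)
```

Both calls return the same average, but the filtered query reads two partitions where the brute-force scan reads all four. Scale that up to 8 partitions out of 8,000 and most of the I/O disappears.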

When RainStor was forced to read everything in the batch run – all 8,000 partitions – it was still three times faster because its data was compressed 25-fold, whereas the raw Hadoop data wasn't: "We ran faster because the I/O overhead was massively reduced."

Other goodies in the RainStor Hadoop product include geo-replication and the ability to set retention and expiration times for data. Data can be ingested under one schema and, if the schema later changes, viewed through different schemas without having to be re-ingested.

Looking ahead, Bantleman believes machine-to-machine messaging will cause a huge increase in the amount of data organisations have to deal with. He also thinks big data compression and deduplication will be extremely valuable if you need to store big data in flash-based storage. That would enable many concurrent high-speed queries against far less data than you started out with.

RainStor Enterprise Big Data Analytics On Hadoop is available now. ®
