Feeds

Hadoop: Making Linux gobble big data

Growing penguins need petabytes to feast on

Top three mobile application threats

More kinds of data chewing

"As the data in Hadoop becomes more valuable, you will see other forms of computation moving to that data, not just MapReduce," said Collins. Which is funny, considering that the whole mantra of MapReduce was to move computation to data, not the other way around as data processing systems have been doing since the beginning of the computer era in more than six decades ago.

As the Hadoop stack has grown in complexity, the core use cases for the software have expanded, too. Now you can do batch reporting and more sophisticated data processing, and you can also use Hadoop to gather up log files and do real-time systems management. (This is, in fact, where many companies are cutting their teeth on Hadoop before they start using their customer data.) Companies are also using it for content serving and doing real-time aggregates and counters, and oddly enough, Hadoop is becoming a kind of storage controller. "As people use Hadoop for a long time, more of the data gets cold and it starts looking like storage," said Zedlewski.

Looking ahead, Collins said that getting consistency in the Hadoop stack, regardless of who puts together the distro and sells support for it, was going to be a major effort, all brought under the auspices of the BigTop effort. Many components in the Hadoop stack show above have different interfaces and release support levels, which makes it a bit of a nightmare to actually put together a distribution.

You still have to make compromises and choices, and that is not just bad for business customers who don't want to do Hadoop stack integration, but it is also bad for business for Hadoop disties because it increases support costs. There's also a lot of redundancy in the stack components, which only time will shake out. Moreover, HBase has cross-data centre replication, but the underlying HDFS does not. That needs to change not only for the biggest Hadoop users, but for any company that wants a hot site backup of their Hadoop operations. HBase is also expected to get development frameworks to make it more friendly to developers. And because businesses are crazy about security, they want Hadoop to get a more granular security model with access control lists.

The elephant is not exactly wearing a pinstripe suit and wingtips, but it is putting on a pair of khakis and a decent shirt. Unlike many of the Hadoop geeks presenting at the conference, in fact.

The other interesting trend Collins discussed is the underlying hardware. It will soon be common to have a Hadoop host with 40, 64, or 80 cores, and companies are looking at what happens with Hadoop clusters when they move to 10GE or 40GE networks. "One host is now more powerful than what a whole rack of servers was when Google got started," said Collins.

It is also common to have server nodes with 48TB or 60TB of capacity using fat SATA disks. "We even have people running entire Hadoop clusters with just flash," said Collins. Hadoop users are looking at how to make clusters multi-tenant and how server virtualization might fit in to accomplish this and to ease with the underlying management of servers. Companies are interested in low-power X86 processors to boost the node density of their clusters, they want scalable and fault-tolerant Hadoop name nodes, and they are even contemplating how to get MapReduce algorithms to work on GPU coprocessors.

This latter effort is being spearheaded by the oil and gas industry, which already has GPUs in their clusters, said Zedlewski, adding that "this is still a pretty bleeding edge use case". ®

High performance access to file storage

More from The Register

next story
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Kingston DataTraveler MicroDuo: Turn your phone into a 72GB beast
USB-usiness in the front, micro-USB party in the back
Dropbox defends fantastically badly timed Condoleezza Rice appointment
'Nothing is going to change with Dr. Rice's appointment,' file sharer promises
Inside the Hekaton: SQL Server 2014's database engine deconstructed
Nadella's database sqares the circle of cheap memory vs speed
BOFH: Oh DO tell us what you think. *CLICK*
$%%&amp Oh dear, we've been cut *CLICK* Well hello *CLICK* You're breaking up...
Just what could be inside Dropbox's new 'Home For Life'?
Biz apps, messaging, photos, email, more storage – sorry, did you think there would be cake?
Amazon reveals its Google-killing 'R3' server instances
A mega-memory instance that never forgets
Cisco reps flog Whiptail's Invicta arrays against EMC and Pure
Storage reseller report reveals who's selling what
prev story

Whitepapers

Top three mobile application threats
Learn about three of the top mobile application security threats facing businesses today and recommendations on how to mitigate the risk.
Combat fraud and increase customer satisfaction
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
SANS - Survey on application security programs
In this whitepaper learn about the state of application security programs and practices of 488 surveyed respondents, and discover how mature and effective these programs are.