On-Prem

Is there anything sexier than hyper-converged systems?

First MapReduce, then Hive/Impala, but what’s next?

Tue 16 Dec 2014 // 17:03 UTC

Comment Everyone likes hyper-converged systems. They are cool, dense, fast, energy-saving, agile, scalable, manageable, easy-to-use, and whatever else you want. But you know what? They have their limits too.

They are good for average workloads, and that covers a large range of workloads indeed, but not for workloads that need huge amounts of one specific resource to the detriment of the others. Big Data is one example.

Data grows (steadily ... and exponentially) and nothing gets thrown away. Since data adds up, the concept of the “data lake” has taken shape. Even systems created for big data are starting to sense this problem and system architects are beginning to think differently about storage.

I’m going to look at Hadoop because it gives a good example of a hyper-converged infrastructure.

Today, most Hadoop clusters are built on top of HDFS (the Hadoop Distributed File System). HDFS characteristics make this filesystem much cheaper, more reliable, and more scalable than many other solutions but, at the same time, it’s limited by the cluster design itself.

CPU/RAM/network/capacity ratios are important for designing well-balanced systems, but things change so rapidly that what you have implemented today could become very inefficient tomorrow. I know that we are living in a very commodity-hardware world right now, but despite the short lifespan of modern hardware I’m not convinced that enterprises are willing to change their infrastructures (and spend boatloads of money) very often.

Look at what’s happening. Two years ago it was all about MapReduce, then it was all about Hive/Impala and the like, now it’s all about Spark and other in-memory technologies. What’s next?

Whatever, my first question is: “Can they run on the same cluster?”

Yes, of course, because the underlying infrastructure, now Hadoop 2.6, has evolved as well.

But the real question is: “Can they run with the same level of efficiency on the same two-year-old cluster?” Mmmm, probably not.

And then another question arises: “Can you update that cluster to meet the new requirements?”

Well, this is a tough one to answer. Capacity keeps growing, but you don’t normally need to process all the data at the same time. Applications, business needs, and workloads, on the other hand, change very quickly, which makes it difficult to build a single hyper-converged cluster that serves them all efficiently.
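To make the imbalance concrete, here is a minimal back-of-the-envelope sketch. It assumes a converged cluster can only grow by adding whole nodes, each with a fixed bundle of cores, RAM and disk. The `Node` shape and every number in it are hypothetical, chosen purely for illustration and not taken from any real cluster.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One hypothetical converged node; all figures are illustrative."""
    cores: int
    ram_gb: int
    disk_tb: int


def nodes_needed(node: Node, cores: int, ram_gb: int, disk_tb: int) -> int:
    """Whole nodes required so that every resource demand is covered."""
    return max(
        math.ceil(cores / node.cores),
        math.ceil(ram_gb / node.ram_gb),
        math.ceil(disk_tb / node.disk_tb),
    )


def utilisation(node: Node, count: int, cores: int, ram_gb: int, disk_tb: int) -> dict:
    """Fraction of each purchased resource that the demand actually uses."""
    return {
        "cores": cores / (count * node.cores),
        "ram": ram_gb / (count * node.ram_gb),
        "disk": disk_tb / (count * node.disk_tb),
    }


# Assumed node shape: 16 cores, 128 GB RAM, 24 TB of disk.
node = Node(cores=16, ram_gb=128, disk_tb=24)

# Day one: demand matches the node ratio exactly.
day_one = nodes_needed(node, cores=160, ram_gb=1280, disk_tb=240)

# Later: stored data has tripled, compute demand has not moved.
later = nodes_needed(node, cores=160, ram_gb=1280, disk_tb=720)

print(day_one, later)
print(utilisation(node, later, cores=160, ram_gb=1280, disk_tb=720))
```

Under these assumed numbers the cluster has to triple in size just to hold the data, and two thirds of the cores and RAM bought along the way sit idle. A real sizing exercise would also account for replication, network and workload mix, which this sketch ignores.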

Things get even more complicated if the big data analytics cluster becomes an enterprise-wide utility. Classic Hadoop tools are not the only ones in play: different departments in your organisation have different views and need to run different analyses on different data sets (which often derive from the same raw data). That is one of the advantages of a data lake.
