Hadoop Hive stung into action, swarms around SQL

More relational, more useful to humans, we're promised

Hortonworks has unveiled the Stinger Initiative, a project to make Hadoop’s Hive data warehouse friendlier with SQL and faster.

Hortonworks has also unveiled two accompanying Hadoop projects, which it’s submitted to the Apache Software Foundation (ASF) in the hope they become community-supported projects. They are a runtime called Tez and a sign-in and authentication system called Gateway. Both Tez and Gateway are ASF incubator projects. You can read more about them here.

Hadoop services startup Hortonworks said Stinger would “enhance Hive with more SQL and better performance” for what it called “human-time use cases”.

Translated, Stinger should make Hive friendlier and faster to use in data querying and analytics normally undertaken by SQL and relational tools.

Hive, like the rest of the Hadoop architecture, has thrived on crunching batches of data – Hadoop is an open-source implementation of Google’s MapReduce and a NoSQL system.

However, the NoSQL crowd has realised it needs to make its architectures work better with the SQL-like tools used by businesses in the real world.

The standard SQL interface for Hive is HiveQL, but it doesn't match the latest SQL standard - and support for HiveQL is not widespread, so banking your data infrastructure on it is a bit of a gamble. ASF's HiveQL project web page is deprecated, and simply points you to the HiveQL programming manual.

According to Hortonworks, Stinger will make Hive “a more suitable tool for the decision support queries people want to perform on Hadoop”.

This means the addition of analytics features such as the OVER clause, support for subqueries in WHERE and aligning Hive’s type system with the standard SQL model.
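For readers who haven't met these two SQL features, here is a rough sketch of what they look like in practice. It runs against SQLite from Python as a stand-in, not against Hive, so it says nothing about Stinger's eventual syntax or behaviour. The sales table and its columns are invented for illustration, and the window function assumes SQLite 3.25 or later.

```python
import sqlite3


def run_demo():
    """Show an OVER clause and a WHERE subquery on a made-up table.

    Assumption: SQLite stands in for Hive here; the table and column
    names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
        INSERT INTO sales VALUES
            ('east', 'ann', 100),
            ('east', 'bob', 300),
            ('west', 'cat', 200),
            ('west', 'dan', 50);
        """
    )

    # OVER clause: rank each rep inside their region while keeping
    # every row, which a plain GROUP BY cannot do.
    ranked = conn.execute(
        """
        SELECT region, rep, amount,
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
        FROM sales
        ORDER BY region, rnk
        """
    ).fetchall()

    # Subquery in WHERE: keep only reps whose region sold more than 300.
    top_region_reps = conn.execute(
        """
        SELECT rep FROM sales
        WHERE region IN (
            SELECT region FROM sales
            GROUP BY region
            HAVING SUM(amount) > 300
        )
        ORDER BY rep
        """
    ).fetchall()

    conn.close()
    return ranked, top_region_reps


if __name__ == "__main__":
    for result in run_demo():
        print(result)
```

The first query is the sort of per-group ranking analysts expect from a relational warehouse; the second filters on the result of another query without a hand-written join.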

The plan is to speed up Hive, too. There’s a new execution engine to increase the number of records per second Hive can process, a new columnar file format to provide “a more modern, efficient and high performing” means to store Hive data, and the Tez runtime framework to speed up workloads by eliminating unnecessary tasks, synchronization barriers, and reads from and writes to HDFS.
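Why would a columnar format help? The toy sketch below shows the general idea only; it is not the file format Hortonworks is describing, and the records and field names are made up. When a query needs one field, a column layout lets it read just that field's values instead of walking every whole record.

```python
# Hypothetical records; the field names are invented for illustration.
ROWS = [
    {"region": "east", "rep": "ann", "amount": 100},
    {"region": "east", "rep": "bob", "amount": 300},
    {"region": "west", "rep": "cat", "amount": 200},
    {"region": "west", "rep": "dan", "amount": 50},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per field."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_layout(rows):
    # Row layout: every complete record is visited to read one field.
    return sum(row["amount"] for row in rows)


def total_amount_column_layout(columns):
    # Column layout: only the 'amount' values are touched.
    return sum(columns["amount"])


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(total_amount_row_layout(ROWS))
    print(total_amount_column_layout(columns))
```

Both functions return the same total; the difference is how much data each has to touch, which matters far more on disk than it does in this in-memory toy.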

A preview of Stinger is planned ahead of the Hadoop Summit in Amsterdam in March. ®
