Greenplum opens up Big Data control freak: Chorus for all of us

Ties up with Kaggle to head hunt algorithm geeks

Hadoop World As promised, the Greenplum Big Data subsidiary of IT conglomerate EMC is opening up the Chorus control freak that it created to span the Greenplum data warehousing database and its two implementations of the Hadoop Big Data muncher.

At the Hadoop World extravaganza in New York, Greenplum is taking the wraps off the OpenChorus project, which is open-sourcing the Chorus control freak as Chorus Community Edition. Greenplum had promised to open up the Chorus code when Chorus 2.0 was announced back in March of this year. That was also when Greenplum acquired Pivotal Labs, a hot-shot mercenary coding outfit that Greenplum had hired to help it port Chorus from Java to Ruby and get the project back on track after it was delayed. Greenplum liked the results so much that it bought the company for an undisclosed amount.

At the time, Greenplum did not divulge what licensing model it would use, but hinted that it would lean towards open licenses like Apache and away from more restrictive licenses like GPL. And, as it turns out, OpenChorus tapped the Apache 2.0 license for the freebie code. The open-source version is based on Chorus 2.1, and the OpenChorus project says that Chorus 2.2 is in the late stages of development. The code is available on GitHub.

Greenplum is very honest about what it intends for OpenChorus, and said back in the spring that it did not expect a lot of developers to step up and contribute, as happens with the underlying Hadoop project and related tools. Rather, OpenChorus is emulating Android, where one vendor, in that case Google, does most of the work and the open sourcing is about making companies comfortable investing in the technology, not about getting them to code. Nothing will prevent Greenplum's competitors in Hadoop – Hortonworks, Cloudera, Teradata, and IBM – from snagging the code and using it or elbowing their way into the project, of course.

Greenplum will obviously continue to distribute a supported version of the tool, now to be known as Chorus Enterprise Edition, according to Josh Klahr, vice president of product management at Greenplum. The Chorus Community Edition will be distributed freely, but it will have neither update features nor tech support.

In addition to opening up the Chorus tool, Greenplum announced a series of partnerships with Kaggle, GNIP, and Tableau, which all have niches in the Big Data space.

Kaggle hosts data science competitions where some 57,000 algo freaks compete to solve problems for money. (It turns your job into a game show of sorts, but the problems involve big data and it is definitely not like a steady job.) The Chorus 2.0 tool allows data warehouse and Hadoop admins to cordon off a chunk of a machine and sandbox it so algorithm writers can test their code against a subset of real data on real iron. In the long haul, Greenplum and Kaggle hope to integrate algorithm contests with Chorus so you can publish contests directly to Kaggle from the Chorus interface and dispatch work to the data scientists tapped by Kaggle to run their algorithms. At the moment, the integration is looser and more manual, allowing Chorus admins to package up the job around which they want to create a contest – the job description, the data types, and so on – and send invitations through Kaggle for people to take a whack at solving the problem.

Greenplum is also working with GNIP, which dices, slices, and packages the full-on Twitter feed and resells it, so customers who have a GNIP account can suck in JSON-formatted datasets, drop them into Hadoop, and automatically see them pop up inside of Chorus for use in data munching. Eventually GNIP will provide access to raw feeds from YouTube, Flickr, Facebook, Google+, Tumblr, WordPress, and other social media sites so you can get pre-chewed versions of their feeds for munching.

Greenplum is also integrating Chorus with the multidimensional data visualization tools from Tableau Software. With the links between the two programs, Chorus will be able to grab data from Hadoop file systems and Greenplum databases and spit it out into Tableau workbooks, and will also be able to tag and annotate Tableau assets.

The Chorus 2.3 update that features the Kaggle, GNIP, and Tableau integrations will be available in November. ®
