Greenplum opens up Big Data control freak: Chorus for all of us

Ties up with Kaggle to head hunt algorithm geeks


Hadoop World As promised, the Greenplum Big Data subsidiary of IT conglomerate EMC is opening up the Chorus control freak that it created to span the Greenplum data warehousing database and its two implementations of the Hadoop Big Data muncher.

At the Hadoop World extravaganza in New York, Greenplum is taking the wraps off the OpenChorus project, which is open-sourcing the Chorus control freak as Chorus Community Edition. Greenplum had promised to open up the Chorus code when Chorus 2.0 was announced in March of this year. That was also when Greenplum acquired Pivotal Labs, a hot-shot mercenary coding outfit that Greenplum hired to help it port Chorus from Java to Ruby and get the project back on track after it was delayed. Greenplum liked the results so much that it bought the company for an undisclosed amount.

At the time, Greenplum did not divulge what licensing model it would use, but hinted that it would lean towards open licenses like Apache and away from more restrictive licenses like GPL. And, as it turns out, OpenChorus tapped the Apache 2.0 license for the freebie code. The open-source version is based on Chorus 2.1, and the OpenChorus project says that it is in the late stages of development for Chorus 2.2 at this time. The code is available on GitHub.

Greenplum is very honest about what it intends for OpenChorus, and said back in the spring that it did not expect a lot of developers to step up and contribute, as happens with the underlying Hadoop project and related tools, for instance. Rather, OpenChorus is emulating Android, where one vendor, in this case Google, does most of the work and the open sourcing is about making companies comfortable investing in the technology, not about getting them to code. Nothing will prevent Greenplum's competitors in Hadoop – Hortonworks, Cloudera, Teradata, and IBM – from snagging the code and using it or elbowing their way into the project, of course.

Greenplum will obviously continue to distribute a supported version of the tool, now to be known as Chorus Enterprise Edition, according to Josh Klahr, vice president of product management at Greenplum. The Chorus Community Edition will be distributed freely, but it will not have either updating features or tech support.

In addition to opening up the Chorus tool, Greenplum announced a series of partnerships with Kaggle, GNIP, and Tableau, which all have niches in the Big Data space.

Kaggle hosts data science competitions where some 57,000 algo freaks compete to try to solve problems for money. (It turns your job into a game show of sorts, but the problems involve big data and it is definitely not like a steady job.) The Chorus 2.0 tool allows data warehouse and Hadoop admins to cordon off a chunk of a machine and sandbox it for algorithm writers to test their code against a subset of real data on real iron. In the long haul, Greenplum and Kaggle hope to integrate algorithm contests with Chorus so you can publish contests directly to Kaggle from the Chorus interface and dispatch work from data scientists who are tapped by Kaggle to run their algorithms. At the moment, the integration is a bit looser and more manual, allowing Chorus admins to package up the job around which they want to create a contest – the job description, the data types, and so on – and send invitations to Kaggle for people to take a whack at solving the problem.

Greenplum is also working with GNIP, which dices, slices, and packages the full-on Twitter feed and resells it, so customers who have a GNIP account can suck in JSON-formatted datasets, drop them into Hadoop, and automatically see them pop up inside of Chorus for use in data munching. Eventually GNIP will provide access to raw feeds from YouTube, Flickr, Facebook, Google+, Tumblr, WordPress, and other social media sites so you can get pre-chewed versions of their feeds for munching.

Greenplum is also integrating Chorus with the multidimensional data visualization tools from Tableau Software. With the links between the two programs, Chorus will be able to grab data from Hadoop file systems and Greenplum databases and spit it out into Tableau workbooks, and will be able to tag and annotate Tableau assets as well.

The Chorus 2.3 update that features the Kaggle, GNIP, and Tableau integrations will be available in November. ®
