Greenplum opens up Big Data control freak: Chorus for all of us

Ties up with Kaggle to head hunt algorithm geeks

Hadoop World As promised, the Greenplum Big Data subsidiary of IT conglomerate EMC is opening up the Chorus control freak that it created to span the Greenplum data warehousing database and its two implementations of the Hadoop Big Data muncher.

At the Hadoop World extravaganza in New York, Greenplum is taking the wraps off the OpenChorus project, which is open-sourcing the Chorus control freak as Chorus Community Edition. Greenplum had promised to open up the Chorus code when Chorus 2.0 was announced in March of this year. That was also when Greenplum acquired Pivotal Labs, a hot-shot mercenary coding outfit that Greenplum had hired to help it port Chorus from Java to Ruby and get the project back on track after it was delayed. Greenplum liked the results so much that it bought the company for an undisclosed amount.

At the time, Greenplum did not divulge what licensing model it would use, but hinted that it would lean towards open licenses like Apache and away from more restrictive licenses like GPL. And, as it turns out, OpenChorus tapped the Apache 2.0 license for the freebie code. The open-source version is based on Chorus 2.1, and the OpenChorus project says that it is in the late stages of development for Chorus 2.2 at this time. The code is available on GitHub.

Greenplum is very honest about what it intends for OpenChorus, and said back in the spring that it did not expect a lot of developers to step up and contribute, as happens with the underlying Hadoop project and related tools, for instance. Rather, OpenChorus is emulating Android, where one vendor, in this case Google, does most of the work and the open sourcing is about making companies comfortable investing in the technology, not about getting them to code. Nothing will prevent Greenplum's competitors in Hadoop – Hortonworks, Cloudera, Teradata, and IBM – from snagging the code and using it or elbowing their way into the project, of course.

Greenplum will obviously continue to distribute a supported version of the tool, now to be known as Chorus Enterprise Edition, according to Josh Klahr, vice president of product management at Greenplum. The Chorus Community Edition will be distributed freely, but it will not have either updating features or tech support.

In addition to opening up the Chorus tool, Greenplum announced a series of partnerships with Kaggle, GNIP, and Tableau, which all have niches in the Big Data space.

Kaggle hosts data science competitions where some 57,000 algo freaks compete to try to solve problems for money. (It turns your job into a game show of sorts, but the problems involve big data and it is definitely not like a steady job.) The Chorus 2.0 tool allows data warehouse and Hadoop admins to cordon off a chunk of a machine and sandbox it for algorithm writers to test their code against a subset of real data on real iron. In the long haul, Greenplum and Kaggle hope to integrate algorithm contests with Chorus so you can publish contests directly to Kaggle from the Chorus interface and dispatch work from data scientists who are tapped by Kaggle to run their algorithms. At the moment, the integration is a bit looser and more manual, allowing Chorus admins to package up the job around which they want to create a contest – the job description, the data types, and so on – and send invitations to Kaggle for people to take a whack at solving the problem.

Greenplum is also working with GNIP, which dices, slices, and packages the full-on Twitter feed and resells it, so customers who have a GNIP account can suck in JSON-formatted datasets, drop them into Hadoop, and automatically see them pop up inside of Chorus for use in data munching. Eventually GNIP will provide access to raw feeds from YouTube, Flickr, Facebook, Google+, Tumblr, WordPress, and other social media sites so you can get pre-chewed versions of their feeds for munching.

Greenplum is also integrating Chorus with the multidimensional data visualization tools from Tableau Software. With the links between the two programs, Chorus will be able to grab data from Hadoop file systems and Greenplum databases and spit it out into Tableau workbooks, and to tag and annotate Tableau assets as well.

The Chorus 2.3 update that features the Kaggle, GNIP, and Tableau integrations will be available in November. ®
