Greenplum opens up Big Data control freak: Chorus for all of us

Ties up with Kaggle to head hunt algorithm geeks

Hadoop World As promised, the Greenplum Big Data subsidiary of IT conglomerate EMC is opening up the Chorus control freak that it created to span the Greenplum data warehousing database and its two implementations of the Hadoop Big Data muncher.

At the Hadoop World extravaganza in New York, Greenplum is taking the wraps off the OpenChorus project, which is open-sourcing the Chorus control freak as Chorus Community Edition. Greenplum had promised to open up the Chorus code when Chorus 2.0 was announced in March of this year. That was also when Greenplum acquired Pivotal Labs, a hot-shot mercenary coding outfit that Greenplum had hired to help it port Chorus from Java to Ruby and get the project back on track after it was delayed. Greenplum liked the results so much that it bought the company for an undisclosed amount.

At the time, Greenplum did not divulge what licensing model it would use, but hinted that it would lean towards open licenses like Apache and away from more restrictive licenses like GPL. And, as it turns out, OpenChorus tapped the Apache 2.0 license for the freebie code. The open-source version is based on Chorus 2.1, and the OpenChorus project says that it is in the late stages of development for Chorus 2.2 at this time. The code is available on GitHub.

Greenplum is very honest about what it intends for OpenChorus and said back in the spring that it did not expect a lot of developers to step up and contribute, as happens with the underlying Hadoop project and related tools, for instance. Rather, OpenChorus is emulating Android, where one vendor, in this case Google, does most of the work and the open sourcing is about making companies comfortable investing in the technology, not about getting them to code. Nothing will prevent Greenplum's competitors in Hadoop – Hortonworks, Cloudera, Teradata, and IBM – from snagging the code and using it or elbowing their way into the project, of course.

Greenplum will obviously continue to distribute a supported version of the tool, now to be known as Chorus Enterprise Edition, according to Josh Klahr, vice president of product management at Greenplum. The Chorus Community Edition will be distributed freely, but it will have neither updating features nor tech support.

In addition to opening up the Chorus tool, Greenplum announced a series of partnerships with Kaggle, GNIP, and Tableau, which all have niches in the Big Data space.

Kaggle hosts data science competitions where some 57,000 algo freaks compete to try to solve problems for money. (It turns your job into a game show of sorts, but the problems involve big data and it is definitely not like a steady job.) The Chorus 2.0 tool allows data warehouse and Hadoop admins to cordon off a chunk of a machine and sandbox it for algorithm writers to test their code against a subset of real data on real iron. In the long haul, Greenplum and Kaggle hope to integrate algorithm contests with Chorus so you can publish contests directly to Kaggle from the Chorus interface and dispatch work from data scientists who are tapped by Kaggle to run their algorithms. At the moment, the integration is a bit looser and more manual, allowing Chorus admins to package up the job around which they want to create a contest – the job description, the data types, and so on – and send invitations to Kaggle for people to take a whack at solving the problem.

Greenplum is also working with GNIP, which dices, slices, and packages the full-on Twitter feed and resells it, so customers who have a GNIP account can suck in JSON-formatted datasets, drop them into Hadoop, and automatically see them pop up inside of Chorus for use in data munching. Eventually GNIP will provide access to raw feeds from YouTube, Flickr, Facebook, Google+, Tumblr, WordPress, and other social media sites so you can get pre-chewed versions of their feeds for munching.

Greenplum is also integrating Chorus with the multidimensional data visualization tools from Tableau Software. The links between the two programs will let Chorus grab data from Hadoop file systems and Greenplum databases and spit it out into Tableau workbooks, and will allow Chorus to tag and annotate Tableau assets as well.

The Chorus 2.3 update that features the Kaggle, GNIP, and Tableau integrations will be available in November. ®
