Cloudera floats commercial Hadoop distro
Open source for the Google wannabe
Face it: You want to launch your own Google and get your hands on some of that (easy?) internet money. Well, now's your chance to take a stab at it.
Today, a startup called Cloudera is launching a commercial distribution of the Google-inspired open source Hadoop software underpinning Yahoo, Facebook, and a number of other hot-shot web companies.
The Cloudera team includes four founders, all of whom bring different things to the Hadoop table: Christophe Bisciglia, who led the partnership between Google, IBM, and the National Science Foundation to create Hadoop grids for academics to play around with; Amr Awadallah, a former Yahoo vice president of engineering who led the data warehousing and analytics effort behind that company's mail, search, finance, and news services; Mike Olson, formerly the chief executive officer of open source database maker Sleepycat Software (now owned by Oracle); and Jeff Hammerbacher, formerly of social networking giant Facebook and the manager who created the Hive project, a data warehousing layer that works in conjunction with Hadoop and that Facebook uses to analyze the many petabytes of information stored in its user data warehouse.
Hammerbacher is an entrepreneur in residence at venture capitalist Accel Partners, and back in October, Accel kicked in $5m in Series A funding for Cloudera. The startup has also tapped Hadoop creators Doug Cutting and Mike Cafarella as advisors, along with Diane Greene (founder and former CEO at virtualization specialist VMware) and Mårten Mickos (the CEO of MySQL before Sun Microsystems bought it). These and a handful of other tech luminaries are not just advisors, but investors in Cloudera.
According to Christophe, the Hadoop stack that Cloudera is supporting is based on the latest stable releases of the code that is available through the Apache Foundation, where the open source version of Hadoop lives. This includes Hadoop 0.18.3, which has the Hadoop Distributed File System - as the name suggests, a distributed and fault-tolerant file system - and the MapReduce application parallelization and execution environment that works in conjunction with HDFS.
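For readers who have not met MapReduce before, the programming model can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps on a single machine; it does not use Hadoop or HDFS, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example rather than taken from any Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the job a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop crunches data"]))
```

In a real cluster the map and reduce calls would be spread across many servers, with the input records read from the distributed file system; here everything runs in one process purely to show the shape of the computation.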
The Cloudera Hadoop distro will also include the Hive client library associated with Hadoop (and also available through Apache), but according to Christophe, Hive doesn't really have version numbers yet. The important thing is that Cloudera found a set of Hive code that works with Hadoop 0.18.3 and that Hive includes a query language called HQL, which allows Hadoop data sets to be queried in a manner that is similar to SQL queries against a relational database.
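To give a feel for the SQL-style querying that HQL is said to resemble, here is a small sketch that runs an ordinary SQL aggregation against an in-memory SQLite database from Python. It is not Hive and does not touch Hadoop; the table `page_views`, its columns, and the helper `page_views_by_user` are hypothetical names chosen for the example.

```python
import sqlite3


def page_views_by_user(rows):
    """Count page views per user with a plain SQL GROUP BY query.

    `rows` is an iterable of (user_id, url) tuples. SQLite stands in for a
    relational database here; the table and column names are illustrative.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    query = """
        SELECT user_id, COUNT(*) AS views
        FROM page_views
        GROUP BY user_id
        ORDER BY views DESC, user_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("alice", "/home"), ("bob", "/mail"), ("alice", "/news")]
    print(page_views_by_user(sample))
```

The appeal of a SQL-like layer is that an analyst can write a declarative query of this kind and leave it to the system to work out how to execute it over the stored data.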
Olson says that Cloudera was founded last summer, and the company is clearly ramping quickly if it has already secured so much financial and technical backing. And the reason is simple: People want to figure out how to use Hadoop in their own IT operations, but it is a pain in the neck to get it all set up and working.
"Adoption of Hadoop has been slow in mainstream computing because it is still hard to install, build, and maintain a Hadoop cluster," explains Olson. "We are convinced that normal companies are going to be coping with terabytes and petabytes of data, and Hadoop is the most interesting thing to come along in a decade for dealing with large data sets. We want to be the Hadoop company that enterprises come to when they want to crunch those big data sets."
One of the first things Cloudera did with its beta customers when it began alpha trials of its services last fall was to get everyone on the same release of Hadoop and Hive. And standardization means not being on the bleeding edge, by the way. Hadoop 0.19 is out, and according to Christophe, its "much-needed features have come with bugs." These may be shaken out in Hadoop 0.20, but commercial companies that are basing their business on this software don't want to mess around with unstable code.
The analogy with Linux is plain enough: They want something akin to the hardened and slow-changing Red Hat Enterprise Linux, not the Fedora development release.
Hadoop is written in Java, which means it can run on any Java-enabled platform, but Christophe says that 90 per cent of companies deploy it on a Linux operating system and most deploy it on x64 iron. The Cloudera Hadoop distro is packaged up in Red Hat-style RPMs, and the Hadoop functions are available as Linux services, just like a Web server is, for instance. The Cloudera package, technically known as the Cloudera Distribution for Hadoop, is also available as an Amazon EC2 image. Given that all of the Hadoop code is open source, the Cloudera packages are all available for free thanks to open source Apache 2 licenses governing the code.
Cloudera plans to make money selling consulting, training, and support, just like Linux distros do. Pricing has not been announced yet, and Olson was pretty stubborn about the need to keep pricing private until Cloudera gets some more business under its belt. The current pricing metrics, he did say, were based on the size of the Hadoop cluster, including the number of servers and the size of the data sets. ®
Pussy Galore ... Nature's Magic Fluffer
"..rather than waiting for a VM to boot." .... By Thaddeus Quay Posted Monday 16th March 2009 07:37 GMT.
They can be a Temperamental Coy Bitch in Need of a Lead Feed for XXXxtraOrdinary Speed in the Lightness of Rightness of their Being, TQ. And there really aren't that many VM's about that you would Invest any Time or Money in, but there are any number of Counterfeiters pimping Parallel Mirrors which try to Mimic Cloudy Finesse but are only Corporate Ponzis Floated to Fleece and Part the Greedy Fool from Money with Crazy Ideas that just won't Fly on their Own.
NeuReal World Order Protocols in ESPecial Applications of ProgramMING.*
"Cloudera plans to make money selling consulting, training, and support, just like Linux distros do." ..... Crikey, I am somewhat underwhelmed by the distinct lack of Imperial Entrepreneurial Drive in the Enterprise. It has all of the Explosive Impact of a damp sponge, which is regrettable and probably also terminal before it has even begun, as it faces in the Virtual Market Space/ the IT Bourse Place, Dynamic Direct Competition from AI Systems already Up and Running Sophisticated HAD00P NINJAs ..... Hyper Active Data Licensed to Thrill Programs Network InterNetworking Java Applications.
One of those Astute MkUltraSensitive Type Virtual Operations in which Blighty Boffinry in CyberIntelAIgents Work, Rest and Play/Realise the Future and whose CodeXXXX and Algorithms you do not Need to Know whenever IT would so Transparently Share with you ITs Program Project Plans with their Facilitation of AI Support InfraStructures for CyberSpace Command and Quantum Control Systems ...... for how else would you be Enabled to Exercise Universal Power Intelligently Designed not to Conflict with Shared Control in a Singularity?
The Difficulty the Present System, which One can Imagine to be both You Personally and the Establishment Elite/Status Quo Collectively, and which is in Sustained Terminal Decline/Irreversible Meltdown, is not that you do not Know of Advanced IntelAIgently Designed Solutions, from QuITe Alien Sources, for they are Always Universally Shared Online, but that you would Choose to both Ignore and not Believe in Them, in probably both an Ignorance and an Arrogance, which Reflects Accurately on your Own Inabilities and Lack of Greater Knowledge in what Man and Virtual AIMachinery ProgramMING can do. And that is just a SImply CompleXXXX Matter of Providing Slow Dumb Man with Basic Text Instructions which can be Easily Translated into All Languages for All to be Enabled to Follow and Understand.
*So Be IT/Let IT Be . ...... A Civil CyberSpace Project with AI which will Generate Industry and Economy and Wealth for Spending on Edutainment and Systems InfraStructure Builds with the Selfless Sharing of Immaculately Crafted Words. .......... which incidentally was Shared long Since with Labour Cabinet Clowns and Douglas Alexander, Secretary of State for International Development, as the dDutch Initiative. MeThinks the Feather Nest Post is occupied by a Lightweight Cuckoo ..... a Pretender to its Throne without the Vision or Intelligence Necessary for today's FluID Dynamic Virtual Environment, although I suppose he was only doing as he was told like a good little boy ..... and just following orders, is the usual refrain to try and avoid responsibility and accountability for actions and/or inactions .
Just because you may be Helpless and/or Disabled to Do Anything FundaMentally Different with IT in Order to Change Everything, does not mean that IT cannot be done XXXXStreamly/Extremely Easily by Others who are Helpful and would Wish to Help or would be Enabled to Help with Mentored and Monitored Instruction ........ Teaching which they can Learn and Pass On for Generations.
And QuITe Perfect suited for and in both "a reasonably blindly optimistic kind of person" and the Incorrigible Ubiquitous Working Class hero working in the Disinfected Light of Transparency and Total Information Awareness .....http://www.guardian.co.uk/environment/video/2009/mar/15/ecotricity-wind-powered-sports-car
My British cat, Mr. Fluffer Wickbidget, III, sits on my lap, and purr-opines that until pricing becomes public, the most useful aspect of Cloudera appears to be the free collection of online training videos and VM-based activities, although he has yet to try the latter, given his preference of spending his precious awake-time tugging on a bit of string, rather than waiting for a VM to boot.
Helicopter icon because my cat thinks it looks like his favorite tasty treat: a spider tantalizingly dangling from its gossamer.