Big names unleash 1,000-node Hadoop stampede
Coming of age for twee cartoon elephant?
Hadoop, the distributed big-data processing system named after a toy elephant, now has a group of suppliers behind it. The tech giants are collectively aiming to develop the popular big data beast using a 1,000-node cluster, dubbed the Analytics Workbench, as a laboratory.
EMC's Greenplum, Intel, Mellanox Technologies, Micron, Seagate, SuperMicro, Switch, VMware and others have collaborated to set up the cluster. Its purpose is to "act as an environment for running scale validation of the Apache Hadoop code base".
Greenplum says:
[We are] actively working with the Apache Software Foundation to ensure that all results from the Analytics Workbench are available to the open source community in an effort to leverage the resources of the Analytics Workbench to further accelerate the development of Hadoop as a revolutionary technology for Big Data.
Find out more at the GAW website. ®
COMMENTS
Re: Marketing
This may be a newsfeed (press release) article, but I disagree with your comment:
The relevant news is that several companies (including EMC, Intel, Seagate and VMware) have donated materials to this project.
The other relevant news is that the results will be available to the open source community.
Both of these are significant developments and of great interest to those of us interested in Hadoop or even just large scale data processing.
1000 node HDFS cluster? Not so big...
A 1000-node Hadoop cluster isn't all that big by the standards of those who are actively using the technology to store and analyze petabytes of data. 1000 nodes == ~250 production nodes (3x redundancy + 1x control/fail-over); if each has 24TB of disc (12x2TB in a 2U enclosure), you have 6PB of production data. Not a small Hadoop cluster, for sure, but not the biggest by any stretch. So, I think this is probably a decent-sized cluster for research, and for many/most production systems, but not nearly big enough to model installations such as Google, Facebook, et al.
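The back-of-envelope sum above can be sketched in a few lines of Python. This is only a rough illustration of the commenter's arithmetic: the function name and parameter names are made up for the example, the divisor of 4 is the commenter's own "3x redundancy + 1x control/fail-over" assumption, and 1PB is taken as 1000TB.

```python
def estimate_production_capacity(total_nodes, overhead_factor=4,
                                 disks_per_node=12, tb_per_disk=2):
    """Rough estimate following the comment's reasoning (hypothetical helper).

    overhead_factor=4 reflects the assumed 3x redundancy plus 1x
    control/fail-over, so only a quarter of the nodes count as
    "production" nodes.
    """
    production_nodes = total_nodes // overhead_factor
    tb_per_node = disks_per_node * tb_per_disk        # 12 x 2TB = 24TB
    production_pb = production_nodes * tb_per_node / 1000  # 1PB = 1000TB
    return production_nodes, production_pb


print(estimate_production_capacity(1000))  # -> (250, 6.0)
```

Changing the assumed disc size or overhead factor shows how sensitive the 6PB figure is to those guesses.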
Marketing
This one was a touch over the top as far as an advert disguised as a story goes. Did El Reg piss off the suppliers in this article with the Microsoft Research 'hadoop crushing' article earlier today?
There is nothing in here at all except a link to a pre-sales website.
Shame on El Reg
