Big Blue Google cloud injected with $5m
How to simulate an ocean
The US National Science Foundation has tossed $5 million at Google's effort to educate the country's university students in the ways of Big Data.
Back in the fall 2007, Google teamed with IBM to provide various universities with access to a dedicated compute cluster where students could explore the sort of mega-data-crunching techniques that unpin its web-dominating search engine. Both Google and Big Blue shoved between $20m to $25m behind the initiative, and today, the NSF announced a roughly $5 million grant that will fund the data-crunching research of 14 separate institutions, including MIT, Yale, Carnegie Mellon, and University of Utah.
"The computational and storage resources provided by this Google-IBM initiative allows us to perform complicated interactive analysis of a pretty-much unprecedentedly large amount of data," Claudio Silva, associate professor at the University of Utah, tells The Reg. "It has the ability to completely transform the way we do data analysis and visualization...
"The computing centers that companies like Microsoft, Amazon, and Google are using are even larger than anything the government has built."
For instance, Silva says, the university will use Google's distributed compute power to crunch vast amounts of data on behalf of NSF oceanographers. "The project looks to do coastal observation and prediction...We have a lot of sensor and simulated data involving the Columbia River and the Pacific Northwest Ocean, and right now, it takes an enormous amount of time to shift through all the data and answer the questions that need answering."
You see, Google is interested in prepping the country's top computer science students for life at Google. That research compute cluster runs Hadoop, an open source platform based on Google's distributed file system, GFS, and its software framework for distributed data-crunching, known as MapReduce.
According to Christophe Bisciglia - the former Google engineer who recently jumped ship for the Hadoop startup Cloudera - the cluster sits inside one of Google's famously podified data centers. Biciglia has told The Reg that the cluster was set up in a ring-fenced portion of the data center scheduled for "decommissioning" back in 2007.
Before he left Google, Bisciglia taught a course on Googlicious Big Data at his alma mater, the University of Washington, and the Hadoop-happy curriculum - since open sourced under a Creative Commons license - is now taught at several other universities across the country. Meanwhile, IBM has provided students with Eclipse-based open source tools for building their own apps atop Hadoop.
Hadoop was founded by a man named Doug Cutting, who now works at Yahoo!. The company now backs at least a portion of its web operation with Hadoop, and like Google and IBM, it's working to prepare the next generation of computer scientist for interweb-scale data transformations on low-cost distributed machines. Yahoo! offers up its own Hadoop research cluster, the M45, to various American universities.
But as Hadoop educates the world in Big Data, Google continues to keep its veil of secrecy over the particulars of its own GFS and MapReduce. Naturally. ®
Sponsored: Hyper-scale data management