Big Blue Google cloud injected with $5m

How to simulate an ocean

The US National Science Foundation has tossed $5 million at Google's effort to educate the country's university students in the ways of Big Data.

Back in the fall of 2007, Google teamed with IBM to provide various universities with access to a dedicated compute cluster where students could explore the sort of mega-data-crunching techniques that underpin its web-dominating search engine. Both Google and Big Blue shoved between $20m and $25m behind the initiative, and today, the NSF announced a roughly $5 million grant that will fund the data-crunching research of 14 separate institutions, including MIT, Yale, Carnegie Mellon, and the University of Utah.

"The computational and storage resources provided by this Google-IBM initiative allows us to perform complicated interactive analysis of a pretty-much unprecedentedly large amount of data," Claudio Silva, associate professor at the University of Utah, tells The Reg. "It has the ability to completely transform the way we do data analysis and visualization...

"The computing centers that companies like Microsoft, Amazon, and Google are using are even larger than anything the government has built."

For instance, Silva says, the university will use Google's distributed compute power to crunch vast amounts of data on behalf of NSF oceanographers. "The project looks to do coastal observation and prediction... We have a lot of sensor and simulated data involving the Columbia River and the Pacific Northwest Ocean, and right now, it takes an enormous amount of time to sift through all the data and answer the questions that need answering."

You see, Google is interested in prepping the country's top computer science students for life at Google. That research compute cluster runs Hadoop, an open source platform based on Google's distributed file system, GFS, and its software framework for distributed data-crunching, known as MapReduce.

According to Christophe Bisciglia - the former Google engineer who recently jumped ship for the Hadoop startup Cloudera - the cluster sits inside one of Google's famously podified data centers. Bisciglia has told The Reg that the cluster was set up in a ring-fenced portion of the data center scheduled for "decommissioning" back in 2007.

Before he left Google, Bisciglia taught a course on Googlicious Big Data at his alma mater, the University of Washington, and the Hadoop-happy curriculum - since open sourced under a Creative Commons license - is now taught at several other universities across the country. Meanwhile, IBM has provided students with Eclipse-based open source tools for building their own apps atop Hadoop.

Hadoop was founded by a man named Doug Cutting, who now works at Yahoo!. The company now backs at least a portion of its web operation with Hadoop, and like Google and IBM, it's working to prepare the next generation of computer scientists for interweb-scale data transformations on low-cost distributed machines. Yahoo! offers up its own Hadoop research cluster, the M45, to various American universities.

But as Hadoop educates the world in Big Data, Google continues to keep its veil of secrecy over the particulars of its own GFS and MapReduce. Naturally. ®
