Original URL: http://www.theregister.co.uk/2009/03/25/new_internet_archive_data_center/

Sun packs 150 billion web pages into meat locker

Getting your arms around the internet

By Cade Metz

Posted in Servers, 25th March 2009 23:59 GMT

If you believe the Gospel According to Robert X. Cringely, Google pilfered its top-secret modular data center from the Internet Archive.

In a now-famous 2005 online expose, Cringely puts Google co-founder Larry Page at a pitch meeting where the Internet Archive's Bruce Baumgart considers the advantages of stuffing a full-fledged data center into a shipping container. The Archive's "Petabyte Box" presentation is dated November 8, 2003, and on December 30, Google filed for a patent describing its own containerized data center.

Less than four years later, the patent was granted. And according to one former employee, it's now the norm for Google to erect its ultra-hot data centers by piecing together intermodal shipping containers pre-packed with servers and cooling equipment. Inside the Mountain View Chocolate Factory, Page and company call it Project Will Power.

The Internet Archive eventually built the Petabyte Box - though it shrank the name a bit and stopped short of actually packing its compact contraption into a shipping container. A PetaBox planted at the San Francisco Presidio has long hosted the Archive's Wayback Machine - a 150 billion-page web history dating back to 1996.

Wayback Machine - the container

Internet history in a box

Now, more than five years after first pitching the idea, the outfit that launched the container revolution has finally containerized itself. This morning, deep inside the sun-splashed Santa Clara campus of Sun Microsystems, Internet Archive founder Brewster Kahle cut the ribbon on a single Sun Modular Datacenter housing the entire Wayback Machine. That's thirteen years of archived web pages packed into a container significantly smaller than your living room.

"At a metaphysical level, what we're doing today is reconceptualizing what a computer is...We're reconceptualizing what a library is," said Kahle, the MIT-trained computer scientist who sold his Alexa web-ranking engine to Amazon before birthing the not-for-profit Internet Archive.

Wayback Machine - Greg Papadopoulos and Brewster Kahle

Inside the Wayback Machine with Sun's Greg Papadopoulos and the Internet Archive's Brewster Kahle

"You can actually take a tour of this data center and ask 'How big is the web?' You can ask 'How much does it weigh?' These are things you can actually wrap your hands around in a very literal way."

In the beginning...

By all accounts, the notion of a containerized data center originated with Kahle. "The idea of a shipping container came in 2001, out of the absolute frustration of trying to build a data center for the Archive," he told reporters over lunch this afternoon.

Wayback Machine - Jud Cooley, lead engineer

The Project Blackbox director of engineering, Jud Cooley, spies a Thumper

The original thought was that the Internet Archive would ship these containerized copies of its Wayback Machine to strategic spots across the globe - so that Kahle's online database would never suffer the fate of the Library of Alexandria.

"If we could have four or five copies in places around the world and keep them in sync, then as upheavals go up and down and earthquakes happen, we might be able to survive and maintain an insight into our past. If Egypt had made copies of the Library of Alexandria in China or India, we would still have the other works of Aristotle. As it is, we don't."

In theory, you could manufacture these standardized data centers from a central location and ship them across the planet via trains, planes, and automobiles. And all this would be cheaper - and quicker - than building centers from scratch.

Wayback Machine - the back of a rack

Rack in reverse

By 2003, Kahle colleague Bruce Baumgart was pitching the idea to commercial operations, including IBM, which eventually rejected it. At one point, Kahle remembers, Baumgart delivered his stock presentation - still available online as a PDF - at some sort of west coast hackers' conference. And, yes, Google co-founder Larry Page was in the audience.

Sun follows Google

But as Page reached for a patent on the idea, Kahle's brainstorm sparked two other minds over at Sun Microsystems. In the past, Sun has said that its Modular Datacenter - originally code-named Project Blackbox - grew out of a discussion between Sun chief technology officer Greg Papadopoulos and Danny Hillis, now co-chairman and chief technology officer of a California consulting operation called Applied Minds. But this morning, Papadopoulos acknowledged that the project sprang from Kahle, whom he had worked with at the Cambridge supercomputer maker Thinking Machines.

Wayback Machine - cables on movable trays

Blackbox racks are tracked. And cables too

"Danny Hillis and I developed the original concept for Project Blackbox, but our inspiration was Brewster - the first person we know of to say 'Hey, we should put a bunch of circuit boards in a shipping container and blow cold air over them,'" Papadopoulos told the gathered digerati this morning in the heart of California's Silicon Valley.

Papadopoulos and company officially announced the Sun Modular Datacenter, or Sun MD, in January 2008. And according to Jud Cooley, the project's director of engineering, Sun has shipped its shipping containers "in the low double digits" to operations as far flung as the Radboud University Nijmegen Medical Centre in the Netherlands and the Belgian wind turbine outfit Hansen Transmissions.

Wayback Machine - smoke detector

Blackbox fire suppression

And now it's hosting the Wayback Machine in a container tucked between the Spanish tile roofs of its Santa Clara campus, just down the road from Google. Measuring 20 feet long by 8 feet deep by 8 feet high, the modular net history holds two petabytes of data - with space for another two.

Sun's cramped container includes eight server racks on sliding tracks, each racking nine Sun Fire x4500 "Thumper" servers running Solaris 10 and Sun's ZFS file system. And the necessary networking, power, cooling, and fire-fighting hardware is packed in as well. All it needs from the outside world is a power source (25kW per rack) and a cooling-fluid hook-up (ordinary tap water).
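For readers who like to check the sums, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (eight racks, nine Thumpers per rack, 25kW per rack, two petabytes stored with room for two more). The decimal unit conversion, the function name, and the idea that every rack draws its full power budget are our assumptions, not anything Sun or the Archive stated.

```python
# Figures quoted in the article.
RACKS = 8
SERVERS_PER_RACK = 9
POWER_PER_RACK_KW = 25   # power feed quoted per rack
STORED_PB = 2            # data currently held
SPARE_PB = 2             # "space for another two"

# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000


def container_summary():
    """Derive container-wide totals from the per-rack figures."""
    servers = RACKS * SERVERS_PER_RACK
    return {
        "servers": servers,
        # Upper bound: assumes every rack draws its full 25kW feed.
        "max_power_kw": RACKS * POWER_PER_RACK_KW,
        # Average data held per server if the load is spread evenly.
        "stored_tb_per_server": STORED_PB * TB_PER_PB / servers,
        "total_capacity_pb": STORED_PB + SPARE_PB,
    }


if __name__ == "__main__":
    for name, value in container_summary().items():
        print(f"{name}: {value}")
```

On those assumptions, the container works out to 72 servers and a power ceiling of 200kW, with a little under 28TB of archived web sitting on each Thumper.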

As you walk into the container, with the fans whirring and the racks tight on either side, you feel as if you've walked into a meat locker. Though it's slightly warmer. And it smells better. And you know it's crunching data. Holding 2 quadrillion characters of information, the Wayback Machine processes 500 queries per second, and it's growing at a rate of four billion data rows per month.
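Kahle's "How much does it weigh?" can be roughed out in a few lines of Python. The sketch below uses the article's own numbers (2 quadrillion characters, 150 billion pages, 500 queries per second) and assumes one character equals one byte and that the query rate holds steady around the clock. Both are simplifications of ours, and the function names are purely illustrative.

```python
# Figures quoted in the article.
QUERIES_PER_SECOND = 500
ARCHIVE_CHARACTERS = 2 * 10**15   # "2 quadrillion characters"
ARCHIVED_PAGES = 150 * 10**9      # "150 billion-page web history"

SECONDS_PER_DAY = 24 * 60 * 60


def queries_per_day():
    """Daily query volume, assuming the quoted rate is constant."""
    return QUERIES_PER_SECOND * SECONDS_PER_DAY


def average_bytes_per_page():
    """Mean stored size per page, assuming one character is one byte."""
    return ARCHIVE_CHARACTERS / ARCHIVED_PAGES


if __name__ == "__main__":
    print(f"queries per day: {queries_per_day():,}")
    print(f"average bytes per page: {average_bytes_per_page():,.0f}")
```

Taken at face value, that is about 43 million queries a day and roughly 13KB per archived page.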

Wayback Machine - spring-mounted racks

In case of earthquake, spring-mounts - but snug the bolts, lads

The rub is that this particular shipping container won't be shipped. The Wayback Machine will live at Sun forever - or at least until IBM buys the company and pulls the plug. But the 20-foot container is another step towards Kahle's dream of a digital Alexandria capable of surviving a Caesarean fire - and most any other earthly disaster.

"Even if this first data center never moves, it encapsulates engineering efforts in a building that's reproducible," Kahle told us. "It's something that's centrally manufacturable and shippable."

Meanwhile, Google has built an internet archive of its own. "They're storing more than they let on," Kahle says. But the aims of Google's modular data center project are, shall we say, more commercial. ®

Photos and additional reporting by Rik Myslewski