Google doppelgänger casts riddle over interwebs
Why is Google routing the world through 'Googol'?
Update: This story has been updated with comments from Google open source guru Chris DiBona and revised accordingly.
Sometime in the middle of October, Google silently launched a new net domain - a barely-disguised doppelgänger to the familiar google.com - and according to the latest stats from the site watchers at Alexa, this mystery domain is now visited by nearly three per cent of all net users, making it the 44th most visited domain on the interwebs.
In other words, it's bigger than AOL, Apple.com, or the BBC.
Over the past few months, those keeping a close eye on their PC's net traffic have noticed seemingly random connections to this mystery domain. In some cases, the connections arise even before an application is launched, and since the domain name appears - at first glance - to be little more than a hodgepodge of characters, some netizens have blocked it, under the assumption it serves up malware.
But on closer inspection, the domain is obviously Google's, chosen with a mathematician's wink at the search giant's famously misspelled name. This mystery domain is 1e100.net. "1e100" is scientific notation for 10¹⁰⁰, a one followed by 100 zeros, also known as a googol.
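For the numerically inclined, the joke can be checked in a few lines of Python. This is a minimal illustration using only the standard library. Parsing the string "1e100" with the decimal module gives exactly 10¹⁰⁰, while the float literal 1e100 is only the nearest binary approximation.

```python
from decimal import Decimal

# A googol: a one followed by 100 zeros.
GOOGOL = 10 ** 100

# Read as scientific notation, "1e100" means 1 x 10**100. Decimal parses
# the string exactly, so it compares equal to the integer googol.
print(Decimal("1e100") == GOOGOL)        # True

# Written out in full it has 101 digits: the leading one plus 100 zeros.
print(len(str(GOOGOL)))                  # 101
print(str(GOOGOL) == "1" + "0" * 100)    # True

# A binary float cannot hold 10**100 exactly, so the float literal 1e100
# is only the nearest representable value.
print(int(1e100) == GOOGOL)              # False
```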
As pointed out by Sebastian Stadil, founder of the Silicon Valley Cloud Computing Group, 1e100.net translates to "Google Network" - the ever-growing Google private infrastructure that spans nearly forty custom-built data centers worldwide. According to a recent company presentation, Google intends to expand this private interweb to between one million and 10 million servers, spanning “100s to 1000s” of global locations.
Whois records show that Google registered 1e100.net on September 24, and according to data from Alexa, traffic began hitting the domain around the middle of October.
'Googol' goes live
Asked for comment, Google merely said the domain is used to "identify the servers on our network," and it hinted that such identification involves reverse DNS lookup - the process of determining which domain name is associated with a particular IP address. Reverse DNS is often used by anti-spam services to verify email senders, but it's also used as a general means of checking that a network is working as it should.
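Reverse DNS works by querying a special name built from the IP address itself. What follows is a minimal Python sketch using only the standard library. The helper names ptr_query_name and reverse_lookup are hypothetical, and 192.0.2.10 is a reserved documentation address rather than a Google server; what any real address resolves to depends on live DNS data.

```python
import ipaddress
import socket
from typing import Optional


def ptr_query_name(ip: str) -> str:
    """Return the DNS name a resolver asks about in a reverse (PTR) lookup.

    For IPv4 the octets are reversed and placed under in-addr.arpa;
    IPv6 addresses are expanded nibble by nibble under ip6.arpa.
    """
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> Optional[str]:
    """Ask the system resolver for the host name registered against an IP.

    Returns None when there is no PTR record or the lookup fails.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are both OSErrors
        return None


# 192.0.2.10 is a reserved documentation address (RFC 5737), not a Google host.
print(ptr_query_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```

Pointed at one of Google's own addresses, reverse_lookup would simply return whatever PTR record Google publishes, which, as the FeedBurner poster quoted later found, can be a 1e100.net name.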
"That was my first guess," Silicon Valley network architect Richard Bennett tells The Reg, when asked about Google's brief comments on the new domain. "But it doesn't explain why Alexa sees it. I don't know what Reverse DNS has to do with it." Alexa - now owned by Amazon - tracks net traffic via toolbars installed on netizen browsers.
In a post to Slashdot, Google's Chris DiBona says that 1e100.net was launched to provide a single domain to identify servers across the Google network rather than doing so on separate domains. "Starting in October 2009, we started using a single domain name to identify our servers across all Google products, rather than use different product domains such as youtube.com, blogger.com, and google.com," he writes.
"We did this for two reasons: first, to keep things simpler, and second, to proactively improve security by protecting against potential threats such as cross-site scripting attacks. Most typical Internet users will never see 1e100.net, but we picked a Googley name for it just in case."
According to various web posters and Register readers, any number of Google services interface with the domain - from Google Chrome's "safe browsing" feature to YouTube to the company's main search engine. Reg reader David Gray, a UK-based security consultant, sees connections to the domain that apparently involve Google Chat, AdSense, Google Analytics, Gmail, and Google Earth. "Essentially, it seemed to be all Google traffic," he says.
Using a Wireshark network protocol analyzer on a machine at our San Francisco offices, The Reg was unable to identify such connections, but Gray shared a network capture file where the connections are myriad.
Another net poster sees such traffic after boot-up but before launching any local applications. Among his installed services, this poster sees Google's updater software, used to provide software updates for various local Google applications, but he says the traffic occurs when the updater is not running.
Alexa stats also indicate this 1e100 traffic spreads to machines across the globe.
Spanner in the works
Could this new domain have something to do with major upgrades to Google's famously distributed internal infrastructure? Judging from DiBona's comments, it would appear not.
In August, Google began testing a new search infrastructure dubbed Caffeine. Speaking with The Reg, uber Googler Matt Cutts confirmed that this new infrastructure includes a rewrite of the company's proprietary distributed file system - known at least informally as Google File System 2, or GFS2 - as well as other infrastructure platforms that could be applied across the company's back-end.
In early November, Google said it had completed its Caffeine testing, and Cutts announced that the new infrastructure would roll out to a single data center sometime after Christmas.
Separately, Google is deploying a global system designed to automatically move and replicate loads between its mega data centers when traffic and hardware issues arise. Known as Spanner, this custom-built platform is described as a “storage and computation system that spans all our data centers [and that] automatically moves and adds replicas of data and computation based on constraints and usage patterns.” This includes constraints involving bandwidth, packet loss, power, resources, and “failure modes.”
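Google has not said how Spanner decides where replicas go. Purely as a toy illustration of constraint-based placement, the Python sketch below filters hypothetical data centers on packet loss, spare power, and whether they are up, then ranks what remains. The DataCenter fields, the thresholds, and the ranking rule are all assumptions made for this example, not Spanner's design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCenter:
    # Hypothetical per-site metrics; a real system would track far more.
    name: str
    packet_loss: float      # fraction of packets lost on links to this site
    spare_power_kw: float   # power headroom available for new work
    up: bool = True         # False stands in for a failure mode


def choose_replica_sites(
    sites: List[DataCenter],
    replicas: int,
    max_loss: float = 0.01,
    min_power_kw: float = 100.0,
) -> List[str]:
    """Pick `replicas` sites that meet every constraint, best links first."""
    eligible = [
        s for s in sites
        if s.up and s.packet_loss <= max_loss and s.spare_power_kw >= min_power_kw
    ]
    # Prefer low packet loss, then more spare power.
    eligible.sort(key=lambda s: (s.packet_loss, -s.spare_power_kw))
    if len(eligible) < replicas:
        raise RuntimeError("not enough sites satisfy the constraints")
    return [s.name for s in eligible[:replicas]]


sites = [
    DataCenter("dc-a", packet_loss=0.002, spare_power_kw=500),
    DataCenter("dc-b", packet_loss=0.030, spare_power_kw=800),  # too lossy
    DataCenter("dc-c", packet_loss=0.001, spare_power_kw=150),
    DataCenter("dc-d", packet_loss=0.002, spare_power_kw=900),
]
print(choose_replica_sites(sites, 2))  # ['dc-c', 'dc-d']

sites[2].up = False                    # dc-c fails
print(choose_replica_sites(sites, 2))  # ['dc-d', 'dc-a']
```

Marking a site as down and re-running the placement is only a crude analogue of "automatically" moving replicas when a failure mode kicks in; a production system would also have to copy the data itself and avoid thrashing between sites.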
Google publicly mentioned Spanner for the first and only time during a distributed computing symposium in mid-October, around the same time it launched 1e100.net.
Google’s 10-million-server vision
Whatever Google is up to with this new domain, the company has launched it in typical Google fashion - on the sly and with a certain amount of geek humor. For those Wireshark-obsessed netizens, the mystery domain is causing a fair amount of confusion. And there's a certain irony in Google hinting that 1e100.net involves reverse DNS lookup. According to one poster to Google's Webmaster Central help forum, Google's FeedBurner crawler uses 1e100.net, which could make it difficult for some netizens to identify the crawler for what it is.
"The FeedBurner crawler doesn't use the google.com domain in its DNS. I am seeing crawls from 18.104.22.168, which resolves to yx-out-f136.1e100.net. Now I get the joke, but someone else might not, and this doesn't seem to be documented anywhere," the poster writes.
"Does Google plan to use this 1e100.net domain elsewhere, or is it just for Google's network ingress/egress points? If I want to validate the FeedBurner crawler, can I use it reliably with the round-trip DNS technique?"
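The "round-trip DNS technique" the poster mentions is usually called forward-confirmed reverse DNS. You look up the PTR name for the connecting IP, check that the name sits under a domain you trust, then resolve that name forward and confirm it maps back to the same IP. Below is a minimal Python sketch, assuming IPv4 and the standard library resolver. The function names and the allow-list are hypothetical, and whether 1e100.net belongs on such a list for FeedBurner is exactly the undocumented point the poster raises.

```python
import socket
from typing import Callable, Iterable, List


def ptr_lookup(ip: str) -> str:
    """Reverse lookup: IP -> host name (raises OSError if there is no PTR)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward lookup: host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def round_trip_verified(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = ptr_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a single IPv4 address."""
    suffixes = [s.lower().strip(".") for s in allowed_suffixes]
    # Step 1: which name does the address claim to be?
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False
    # Step 2: the name must sit under a domain on our (assumed) allow-list.
    # Matching on ".suffix" stops lookalikes such as "evil1e100.net".
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    # Step 3: the forward lookup must lead back to the original address;
    # anyone can publish a PTR record, but not A records in Google's zone.
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

The resolver functions are passed in as parameters so the logic can be exercised against canned answers without touching the network; in real use you would call something like round_trip_verified(client_ip, ["1e100.net", "google.com"]) with the defaults.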
Clarity, in this case, has taken a backseat to an inside joke. According to Stanford University computer science professor David Koller, Google co-founder Larry Page brainstormed the company's name with fellow graduate student Sean Anderson during a meeting in their Stanford office in September 1997.
"Sean and Larry were in their office, using the whiteboard, trying to think up a good name - something that related to the indexing of an immense amount of data," Koller writes.
"Sean verbally suggested the word 'googolplex' [a one followed by a googol zeros], and Larry responded verbally with the shortened form, 'googol'....Sean was seated at his computer terminal, so he executed a search of the Internet domain name registry database to see if the newly suggested name was still available for registration and use.
"Sean is not an infallible speller, and he made the mistake of searching for the name spelled as 'google.com,' which he found to be available. Larry liked the name, and within hours he took the step of registering the name 'google.com' for himself and [fellow co-founder] Sergey [Brin]."
Google.com was registered on September 15, 1997, little more than 12 years before 1e100.net. ®