
An open source Google - without the ads

But is it legal?



Hoping to return at least one corner of the web to its non-commercial roots, Google watcher Daniel Brandt, who curates the NameBase archive, has released the source code to a Google scraper. Brandt has been making an ad-free proxy available for two years using Google's little-known minimal "ie" interface. By using this proxy, users bypass both Google's notorious "2038" cookie (that's when it expires) and the text ads.

Brandt fully expects Google to throw legal and technical resources at him, but says he welcomes the challenge if only to clarify copyright issues. Google took people's free stuff and made a $50 billion business from it, he argues.

"The commercialization of the web became possible only because tens of thousands of noncommercial sites made the web interesting in the first place," he writes. "All search engines should make a stable, bare-bones, ad-free, easy-to-scrape version of their results available for those who want to set up nonprofit repeaters. Even if it cuts into their ad profits slightly, there's no easier way to give back some of what they stole from us."

He explains in more detail in the source code: "Legally, Google probably has the right to block anyone they want. And legally, we believe that as a tiny nonprofit with an interest in Google's violations of privacy, we have the right to access Google's publicly-available data any way we want. If you want to argue about copyright, then let's start with the fact that Google scrapes billions of web pages and doesn't ask permission before making the cache copies available. This scraping is used as a carrier for the ads that make Google stinkin' rich.

"Now that, in our opinion, is an interesting copyright issue. As this is written, Google has a market cap of $55bn. This exceeds the market cap of General Motors and Ford combined. Google is probably the single largest information resource on the planet, and they're getting rich off of us. It's time for Google to give something back to the public sector."

The source code, which runs on Linux, asks users to use the program only for non-commercial purposes.

"We think it would be splendid if scraping Google for nonprofit purposes, and stripping out their wretched advertising, was established someday as an acceptable, legal practice."

In the week since it launched, the source code has been downloaded about a hundred times a day, says Brandt.

Google would rather you licensed its beta Web API. However, as Charles Ferguson, writing in MIT Technology Review, noted recently, the service is "laughably limited" to 1,000 queries a day and offers little functionality; Google has let the offering languish.

You can find the code here [ZIP archive, 16kb], an explanation here and try out the proxy here. ®

