Feeds

Facebook bars crawls from all but select few

Use the API...bitch

Build a business case: developing custom apps

Facebook has updated its robots.txt file so that the site can only be crawled by a short list of search engines, including Google, Microsoft's Bing, China's Baidu, Russia's Yandex, and a few others.

Previously, Facebook's robot.txt allowed anyone to crawl the site, although the company had threatened to sue at least one developer for crawling, before adding new terms of service that barred scraping without the company's written permission. Some — including programmer and blogger Pete Warden, the man who Facebook threatened to sue — had complained that the social networking site was breaking the rules of the interwebs. The site was allowing unfettered crawling, but the company's legal team was not.

"You've chosen to leave all that information out in the open so you can benefit from the search traffic, and instead try to change the established rules of the web so you can selectively sue anyone you decide is a threat," Warden told the company before it changed its robot.txt.

"The sad fact is, your leadership has decided to change the open rules that have allowed the web to be such an interesting and innovative place for the past decade."

Following Facebook's robot.txt change, Warden is pleased that the situation has been clarified. "I'm very happy that Facebook have done the right thing and abandoned their attempt to change the rules the web has operated under for the last 15 years," he says. "If you could still be sued despite following robots.txt, then the only large corporations with lots of money to pay lawyers could afford to build new search engines and we'd still be using Altavista instead of Google."

Uber Googler Matt Cutts is pleased as well. "A good move by Facebook to bring their robots.txt and related policies into line with internet standards," he said in a Tweet.

So, Facebook is now following the rules. But it's still creating a barrier to entry for new search engines and other crawlers. If you're not a major search engine, you still have to apply for written permission to crawl the site. And that benefits, well, Matt Cutts and Google.

"You're definitely right on that," Warden tells us. "Have the companies mentioned in [Facebook's] robots.txt actually signed the agreement they ask little guys to sign? Or are sites that drive a lot of traffic (including Yandex in Russia!) being given a sweetheart deal? I'll be very impressed if they've persuaded Google to sign up to [its] conditions."

Facebook threatened to sue Warden in April after he built tools that crawled and analyzed Facebook data for a service called fanpageanalytics.com.

Facebook CTO Bret Taylor indicates the company will grant crawling permission to any "legitimate" search outfit. "We will whitelist crawlers when legitimate companies contact us who want to crawl us (presumably search engines)," reads a blog post from Taylor.

Taylor says that the company should have updated the robots.txt sooner. "I think it was bad for us to stray from Internet standards and conventions by having an robots.txt that was open and a separate agreement with additional restrictions. This was just a lapse of judgment." And he says that the company was merely trying to crack down on miscreants. It wants non-search services using the company's data API rather than crawling the site.

"Basically, [Facebook] users have complete control over their data, and as long as [the] user gives an application explicit consent, Facebook doesn't get in the way of the user using their data in your applications beyond basic protections like selling data to ad networks and other sleazy data collectors," he says.

"Crawling is a bit of special case. We have a privacy control enabling users to decide whether they want their profile page to show up in search engines. Many of the other 'crawlers' don't really meet user expectations...Some sleazy crawlers simply aggregate user data en masse and then sell it, which we view as a threat to user privacy."

Facebook did not immediately respond to a request for comment. ®

Build a business case: developing custom apps

More from The Register

next story
Assange™: Hey world, I'M STILL HERE, ignore that Snowden guy
Press conference: ME ME ME ME ME ME ME (cont'd pg 94)
Premier League wants to PURGE ALL FOOTIE GIFs from social media
Not paying Murdoch? You're gonna get a right LEGALLING - thanks to automated software
Online tat bazaar eBay coughs to YET ANOTHER outage
Web-based flea market struck dumb by size and scale of fail
Amazon takes swipe at PayPal, Square with card reader for mobes
Etailer plans to undercut rivals with low transaction fee offer
Caught red-handed: UK cops, PCSOs, specials behaving badly… on social media
No Mr Fuzz, don't ask a crime victim to be your pal on Facebook
US regulators OK sale of IBM's x86 server biz to Lenovo
Now all that remains is for gov't offices to ban the boxes
XBOX One will learn to play media from USB and DLNA sources
Hang on? Aren't those file formats you hardly ever see outside torrents?
Class war! Wikipedia's workers revolt again
Bourgeois paper-shufflers have 'suspended democracy', sniff unpaid proles
prev story

Whitepapers

Endpoint data privacy in the cloud is easier than you think
Innovations in encryption and storage resolve issues of data privacy and key requirements for companies to look for in a solution.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Top 8 considerations to enable and simplify mobility
In this whitepaper learn how to successfully add mobile capabilities simply and cost effectively.
Solving today's distributed Big Data backup challenges
Enable IT efficiency and allow a firm to access and reuse corporate information for competitive advantage, ultimately changing business outcomes.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.