Facebook bars crawls from all but select few

Use the API...bitch

Facebook has updated its robots.txt file so that the site can only be crawled by a short list of search engines, including Google, Microsoft's Bing, China's Baidu, Russia's Yandex, and a few others.
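A whitelist-style robots.txt of this kind can be sketched with Python's standard `urllib.robotparser`. The file contents, the `example.com` URLs, and the `MyCrawler` user agent below are illustrative assumptions, not Facebook's actual rules; the point is only that named crawlers get their own rule groups while the catch-all `*` group is disallowed everywhere.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style robots.txt (NOT Facebook's actual file):
# named crawlers get their own groups, everyone else is barred entirely.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: bingbot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "bingbot", "MyCrawler"):
        allowed = can_crawl(agent, "https://example.com/profile")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A crawler that honours the file would run a check like `can_crawl` before each request. Note that robots.txt is a convention rather than an enforcement mechanism, so the result says only what the site has asked for.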

Previously, Facebook's robots.txt allowed anyone to crawl the site, although the company had threatened to sue at least one developer for crawling, before adding new terms of service that barred scraping without the company's written permission. Some (including programmer and blogger Pete Warden, the man Facebook threatened to sue) had complained that the social networking site was breaking the rules of the interwebs. The site was allowing unfettered crawling, but the company's legal team was not.

"You've chosen to leave all that information out in the open so you can benefit from the search traffic, and instead try to change the established rules of the web so you can selectively sue anyone you decide is a threat," Warden told the company before it changed its robots.txt.

"The sad fact is, your leadership has decided to change the open rules that have allowed the web to be such an interesting and innovative place for the past decade."

Following Facebook's robots.txt change, Warden is pleased that the situation has been clarified. "I'm very happy that Facebook have done the right thing and abandoned their attempt to change the rules the web has operated under for the last 15 years," he says. "If you could still be sued despite following robots.txt, then the only large corporations with lots of money to pay lawyers could afford to build new search engines and we'd still be using Altavista instead of Google."

Uber Googler Matt Cutts is pleased as well. "A good move by Facebook to bring their robots.txt and related policies into line with internet standards," he said in a Tweet.

So, Facebook is now following the rules. But it's still creating a barrier to entry for new search engines and other crawlers. If you're not a major search engine, you still have to apply for written permission to crawl the site. And that benefits, well, Matt Cutts and Google.

"You're definitely right on that," Warden tells us. "Have the companies mentioned in [Facebook's] robots.txt actually signed the agreement they ask little guys to sign? Or are sites that drive a lot of traffic (including Yandex in Russia!) being given a sweetheart deal? I'll be very impressed if they've persuaded Google to sign up to [its] conditions."

Facebook threatened to sue Warden in April after he built tools that crawled and analyzed Facebook data for a service called fanpageanalytics.com.

Facebook CTO Bret Taylor indicates the company will grant crawling permission to any "legitimate" search outfit. "We will whitelist crawlers when legitimate companies contact us who want to crawl us (presumably search engines)," reads a blog post from Taylor.

Taylor says that the company should have updated the robots.txt sooner. "I think it was bad for us to stray from Internet standards and conventions by having an robots.txt that was open and a separate agreement with additional restrictions. This was just a lapse of judgment." And he says that the company was merely trying to crack down on miscreants. It wants non-search services using the company's data API rather than crawling the site.

"Basically, [Facebook] users have complete control over their data, and as long as [the] user gives an application explicit consent, Facebook doesn't get in the way of the user using their data in your applications beyond basic protections like selling data to ad networks and other sleazy data collectors," he says.

"Crawling is a bit of special case. We have a privacy control enabling users to decide whether they want their profile page to show up in search engines. Many of the other 'crawlers' don't really meet user expectations...Some sleazy crawlers simply aggregate user data en masse and then sell it, which we view as a threat to user privacy."

Facebook did not immediately respond to a request for comment. ®
