Publishers punt new web crawler blocking standards

Rusty robots.txt for the scrapheap?

A long-awaited new standard designed to give webmasters more control over how search engines and newsreaders access their content will be unveiled in New York today.

After a year-long pilot the Automated Content Access Protocol (ACAP) will be launched at the headquarters of the Associated Press. It aims to improve on the current robots.txt permission file for spiders and other bots.
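To show how the existing robots.txt arrangement works from a crawler's side, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules, the `ExampleBot` user agent and the `news.example.com` URLs are all hypothetical. The parser only reports what the file permits. Obeying it is up to the crawler, which is why compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example news site (not taken from the article).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Allow: /
"""


def can_crawl(agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network fetch
    return parser.can_fetch(agent, url)


print(can_crawl("ExampleBot", "https://news.example.com/story.html"))    # True
print(can_crawl("ExampleBot", "https://news.example.com/archive/2006"))  # False
```

Python's parser applies the first matching rule in file order. For that reason, the more specific `Disallow` line comes before the catch-all `Allow`.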

ACAP will include commands that let web publishers limit how long their content can be indexed and how much of an article news aggregators may display.

A standard "Follow" command will block or allow crawlers to follow links in a page - the basis of Google's PageRank algorithm. Google currently obeys the non-standard HTML "NOFOLLOW" meta tag.
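A page-level "nofollow" directive is normally written as `<meta name="robots" content="nofollow">` in a page's head. The Python below is a rough sketch of how a crawler might detect it, using the standard library's `html.parser`. The helper names `RobotsMetaParser` and `may_follow_links` are hypothetical. The sketch also ignores the separate per-link `rel="nofollow"` attribute.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, nofollow"
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def may_follow_links(html: str) -> bool:
    """Return False if the page's robots meta tag forbids following links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return "nofollow" not in parser.directives and "none" not in parser.directives


print(may_follow_links('<meta name="robots" content="noindex, NOFOLLOW">'))  # False
```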

Robots.txt was created by consensus way back in 1994 and is voluntary, though all the major search engines comply. The campaign for a new protocol was fired by the emergence of Google News and other aggregators.

More traditional news organisations including AFP and the Telegraph have engaged in sabre-rattling over such indexes, which they said parasitise their journalism.

AFP eventually got what it wanted - a revenue-sharing deal - after it threatened a landmark test case in the US. A Belgian newspaper group has led the anti-indexing charge lately.

ACAP is being pushed by the World Association of Newspapers, the European Publishers Council and the International Publishers Association. It's an attempt to soothe their industry's web worries by handing more control back to the producers of news.

The new standards have been cautiously welcomed by Google, according to AP, but the firm is still "evaluating" the new system.

More information on version 1.0 of ACAP is available from the project. More features are planned, including permissions for indexing web video. ®
