Publishers punt new web crawler blocking standards

Rusty robots.txt for the scrapheap?

A long-awaited new standard designed to give webmasters more control over how search engines and newsreaders access their content will be unveiled in New York today.

After a year-long pilot the Automated Content Access Protocol (ACAP) will be launched at the headquarters of the Associated Press. It aims to improve on the current robots.txt permission file for spiders and other bots.

ACAP will include commands designed to let web publishers limit how long content can be indexed for and how much of an article news aggregators are allowed to display.

A standard "Follow" command will let publishers allow or block crawlers from following links in a page - links being the basis of Google's PageRank algorithm. Google currently obeys the non-standard HTML "NOFOLLOW" meta tag.
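For the curious, here is a rough sketch of how a crawler might honour the existing robots meta tag before following links. It uses only Python's standard `html.parser`; the class name `RobotsMetaParser`, the helper `may_follow_links` and the sample pages are made up for illustration. It does not show ACAP's own syntax or how any particular search engine implements the check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def may_follow_links(html: str) -> bool:
    """Return False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is conventionally shorthand for "noindex, nofollow".
    return not ({"nofollow", "none"} & parser.directives)


# Hypothetical pages, for illustration only.
BLOCKED_PAGE = '<html><head><meta name="robots" content="index, NOFOLLOW"></head></html>'
OPEN_PAGE = '<html><head><meta name="description" content="news"></head></html>'
```

With these samples, `may_follow_links(BLOCKED_PAGE)` is `False` and `may_follow_links(OPEN_PAGE)` is `True`. As with robots.txt, nothing forces a crawler to run a check like this.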

Robots.txt was created by consensus way back in 1994 and is voluntary, though all the major search engines comply. The campaign for a new protocol was fired by the emergence of Google News and other aggregators.
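To show what "voluntary" means in practice, here is a minimal sketch of the check a well-behaved bot runs before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the bot names `ExampleNewsBot` and `OtherBot`, and the `publisher.example` URLs are all invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher: everyone is kept out
# of /archive/, and one named aggregator bot is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/

User-agent: ExampleNewsBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch; nothing enforces the answer.
story = "https://publisher.example/news/story.html"
archive = "https://publisher.example/archive/2006.html"

aggregator_may_fetch_story = rules.can_fetch("ExampleNewsBot", story)
other_may_fetch_story = rules.can_fetch("OtherBot", story)
other_may_fetch_archive = rules.can_fetch("OtherBot", archive)
```

Here the named aggregator is refused everywhere, while any other bot may read the story but not the archive. The file can only say yes or no per path, which is the limitation ACAP's backers want to move beyond with time limits and excerpt lengths.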

More traditional news organisations including AFP and the Telegraph have engaged in sabre-rattling over such indexes, which they said parasitise their journalism.

AFP eventually got what it wanted - a revenue-sharing deal - after it threatened a landmark test case in the US. A Belgian newspaper group has led the anti-indexing charge lately.

ACAP is being pushed by the World Association of Newspapers, the European Publishers Council and the International Publishers Association. It's an attempt to soothe their industry's web worries by handing more control back to the producers of news.

The new standards have been cautiously welcomed by Google, according to AP, but the firm is still "evaluating" the new system.

There's more info on version 1.0 of ACAP here. More features are planned, including permissions for indexing web video. ®
