Publishers punt new web crawler blocking standards

Rusty robots.txt for the scrapheap?

A long-awaited new standard designed to give webmasters more control over how search engines and newsreaders access their content will be unveiled in New York today.

After a year-long pilot, the Automated Content Access Protocol (ACAP) will be launched at the headquarters of the Associated Press. It aims to improve on the current robots.txt permission file for spiders and other bots.

ACAP will include commands designed to let web publishers limit how long content can remain indexed and how much of an article news aggregators are allowed to display.

A standard "Follow" command will tell crawlers whether they may follow links in a page - the basis of Google's PageRank algorithm. Google currently obeys the non-standard HTML "NOFOLLOW" meta tag.
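
For readers who want to see what honouring that meta tag involves, here is a minimal Python sketch of how a crawler might check a page for it. It is not ACAP syntax and not Google's code; the class name RobotsMetaParser, the helper may_follow_links and the sample pages are made up for illustration, and only the standard library's html.parser is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def may_follow_links(html: str) -> bool:
    """Return False if the page asks crawlers not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nofollow" not in parser.directives


# Hypothetical pages, for illustration only.
blocked_page = '<html><head><meta name="robots" content="index, NOFOLLOW"></head></html>'
open_page = "<html><head><title>Story</title></head></html>"

print(may_follow_links(blocked_page))
print(may_follow_links(open_page))
```

A well-behaved crawler would run a check like this before queueing a page's outbound links. Nothing forces it to.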

Robots.txt was created by consensus way back in 1994 and is voluntary, though all the major search engines comply. The campaign for a new protocol was fired by the emergence of Google News and other aggregators.
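
To show what "voluntary" means in practice, the sketch below uses Python's standard urllib.robotparser module to check a URL against a robots.txt file before fetching it. The rules, the bot names and the news.example.com URLs are invented for illustration. The point is that the check happens on the crawler's side, so a bot that skips it is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher: everyone is kept out
# of /archive/, and one named aggregator bot is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/

User-agent: ExampleNewsBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch.
print(parser.can_fetch("SomeCrawler", "https://news.example.com/story.html"))
print(parser.can_fetch("SomeCrawler", "https://news.example.com/archive/2007.html"))
print(parser.can_fetch("ExampleNewsBot", "https://news.example.com/story.html"))
```

Note that the file can only say yes or no per path and per bot. It has no way to express how long a story may stay indexed or how much of it may be shown.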

More traditional news organisations including AFP and the Telegraph have engaged in sabre-rattling over such indexes, which they said parasitise their journalism.

AFP eventually got what it wanted - a revenue-sharing deal - after it threatened a landmark test case in the US. A Belgian newspaper group has led the anti-indexing charge lately.

ACAP is being pushed by the World Association of Newspapers, the European Publishers Council and the International Publishers Association. It's an attempt to soothe their industry's web worries by handing more control back to the producers of news.

The new standards have been cautiously welcomed by Google, according to AP, but the firm is still "evaluating" the new system.

There's more info on version 1.0 of ACAP here. More features are planned, including permissions for indexing web video. ®
