Original URL: http://www.theregister.co.uk/2005/03/01/insecure_indexing/

Insecure indexing risk dissected

How did THAT get out?

By John Leyden

Posted in Security, 1st March 2005 17:32 GMT

It's embarrassing when future PR items, upcoming security advisories or boilerplates for obituaries that are not meant to be visible to external users drift into the public domain. These documents might get accidentally uploaded to the wrong part of a website but mischievous attacks can also play a role.

Web application security researcher Amit Klein this week published a paper explaining how "insecure indexing" allows attackers to expose hidden files on web servers. Some site-installed search engines index files that external search engines are programmed to ignore. Typically an external search engine looks at the root of a domain for a special file called "robots.txt", which tells the robot (spider) which parts of the site it should not download.
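As a rough illustration of how a well-behaved spider reads robots.txt, the sketch below uses Python's standard urllib.robotparser module. The robots.txt contents, the paths and the "ExampleBot" agent name are invented for this example and do not come from Klein's paper. Note that the file is only a request to crawlers: it does not stop anyone from fetching a listed path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /obituaries/
"""

def allowed_paths(robots_txt, paths, agent="ExampleBot"):
    """Return {path: True/False} saying whether a polite robot may fetch each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}

result = allowed_paths(ROBOTS_TXT, ["/index.html", "/drafts/press-release.html"])
for path, ok in result.items():
    print(path, "may be crawled" if ok else "is excluded")
```

A spider that honours the file skips /drafts/ entirely, which is exactly why a local search engine that ignores it can reveal more than an external one.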

If an attacker can query a site's internal search engine, he can find files that the Robots Exclusion Standard keeps away from external crawlers. Klein explains that these attacks are "fundamentally different from exploiting external (remote) search engines".

Klein explains various attack techniques, ranging from guessing a file name based on the names of files that already exist, through targeted search strings, to far more complicated traffic-intensive attacks. He concludes with methods for detecting insecure indexing and suggested defences. "Crawling style indexing should be preferred over direct file indexing. If file-level indexing cannot be avoided, more consideration should be made when deploying a search engine that facilitates it. In particular those search engines should be systematically limited to the visible resources (or at the very least, to accessible resources)," he writes.
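The difference between the two indexing styles can be shown with a toy example. The sketch below is not taken from Klein's paper: the three-page site, the file names and the helper functions are all hypothetical. It builds one index by reading every file directly and another by following links from the home page, then runs the same search against both.

```python
import re
from collections import deque

# Hypothetical site: path -> page body. The draft is not linked from anywhere.
SITE = {
    "/index.html": '<a href="/news.html">News</a> Welcome',
    "/news.html": '<a href="/index.html">Home</a> Quarterly results published',
    "/drafts/advisory.html": "Unreleased advisory: embargoed vulnerability details",
}

def words(body):
    """Crudely strip tags and return the set of lowercase words in a page."""
    return set(re.findall(r"[a-z]+", re.sub(r"<[^>]*>", " ", body).lower()))

def file_level_index(site):
    """Direct file indexing: every file on disk is indexed, linked or not."""
    return {path: words(body) for path, body in site.items()}

def crawl_index(site, start="/index.html"):
    """Crawling-style indexing: only pages reachable by links from the start page."""
    index, queue = {}, deque([start])
    while queue:
        path = queue.popleft()
        if path in index or path not in site:
            continue
        index[path] = words(site[path])
        queue.extend(re.findall(r'href="([^"]+)"', site[path]))
    return index

def search(index, term):
    """Return the sorted paths whose indexed words include the search term."""
    return sorted(path for path, seen in index.items() if term.lower() in seen)

print(search(file_level_index(SITE), "embargoed"))  # finds the unlinked draft
print(search(crawl_index(SITE), "embargoed"))       # finds nothing
```

In this toy setup the file-level index hands the unlinked draft to anyone who guesses a word it contains, while the crawled index only ever knows about pages a visitor could already reach.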

The paper - Insecure Indexing Vulnerability: Attacks Against Local Search Engines - can be found on the Web Application Security Consortium's site. ®
