
The trails left in Web server logs – and who's seeing them

Fear of a million Big Brothers


NEW YORK--The privacy advocates and civil libertarians at the 13th annual Computers, Freedom and Privacy conference sometimes seem dwarfed by the enormity of the projects they oppose -- larger-than-life enterprises worthy of a James Bond villain.

John Poindexter's Total Information Awareness project, if successful, would combine every government and private sector database into a massive data mining system capable of picking out aberrant behavior in the actions of seemingly ordinary citizens. The Department of Homeland Security's CAPPS II program aims to run automatic background checks on every airline passenger in the U.S.

But the day before CFP 2003 began, a smaller invitation-only group of technologists and policy wonks met at the conference site to discuss a matter that some say is just as important to Internet privacy as any of the monolithic omniscient supercomputers being hatched in Washington: the humble Web server log.

Or more to the point, the countless thousands of logs routinely kept by servers throughout the Internet, each marking every visit to a given website, identifying what pages were viewed, what transactions were made, and the IP address of the visitor. Recent laws have made it easier for government agencies to get their hands on server log entries, and civil litigators are increasingly finding logs a valuable target for subpoenas. At the same time, the art of wringing every ounce of useful information out of such logs is advancing, as is the ease of tracking down a user's identity from their IP address by correlating data from different sources.
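
To make concrete what a single log entry records, here is a minimal sketch that splits one line in Apache's Common Log Format into its fields. The sample entry, its address and the requested page are invented for illustration; real servers are often configured to log more than this (referrer and browser, for example).

```python
import re

# Apache Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf_line(line):
    """Return a dict of fields from one Common Log Format line, or None."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical entry; 192.0.2.44 is a reserved documentation address.
SAMPLE_LINE = (
    '192.0.2.44 - - [02/Apr/2003:10:15:32 -0500] '
    '"GET /health/depression.html HTTP/1.0" 200 5120'
)

entry = parse_clf_line(SAMPLE_LINE)
for field, value in entry.items():
    print(f"{field}: {value}")
```

Even this bare-bones entry ties one address to one page at one moment, which is the raw material that subpoenas and correlation techniques work from.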

Last month, scientists at Carnegie Mellon University's Laboratory for International Data Privacy even published a formal algorithm for "re-identifying" a Web surfer from pieces of information left like breadcrumbs on different sites. "The methodology involves constructing trails across locations from small amounts of seemingly anonymous or innocuous evidence the person has been there," the paper reads.
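
The general idea of linking trails can be shown with a toy example. The sketch below is not the Carnegie Mellon algorithm; it is a simplified illustration that assumes one data set records which sites an unnamed IP address visited, and another records which sites a named person is known to have used. All names, addresses and sites are hypothetical.

```python
def link_trails(anonymous_trails, named_trails):
    """Link an anonymous trail to a name when exactly one named trail contains it."""
    links = {}
    for anon_id, visited in anonymous_trails.items():
        # A named trail is a candidate if it covers every site in the anonymous trail.
        candidates = [
            name for name, sites in named_trails.items() if visited <= sites
        ]
        if len(candidates) == 1:  # unambiguous match only
            links[anon_id] = candidates[0]
    return links


# Hypothetical data: sites seen per IP address, and sites used per known person.
anonymous_trails = {
    "192.0.2.44": {"clinic.example", "bank.example"},
    "192.0.2.87": {"bank.example"},
}
named_trails = {
    "alice": {"clinic.example", "bank.example", "shop.example"},
    "bob": {"bank.example", "shop.example"},
}

links = link_trails(anonymous_trails, named_trails)
print(links)
```

In this toy data the first address can be tied to a single person because only one named trail covers both of its sites, while the second stays ambiguous. The more sites a trail touches, the fewer people it can belong to.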

That's a troubling prospect to privacy advocates, at a time when activists and human rights workers in repressive countries are using the Internet to communicate, while ordinary netizens are turning to the Web for things like medical information or personal finance. "It's our sense that certain companies have entire staffs dedicated to handling subpoenas and court orders, and quite often those subpoenas and court orders involve usage logs," says Will Doherty of the Electronic Frontier Foundation.

Smaller companies may be keeping logs without thinking about the potential for misuse, and a careful Google search can turn up random server and proxy logs sitting unprotected on the Web. "Most people don't give it any thought; their default is to just log anything in Apache or IIS," says Richard Smith, a technology and privacy consultant. "At most, they have to worry about how much disk space it's taking up."

It's with that vision of a million tiny surveillance logs growing like weeds that the informal "User Log Data Management Working Group" held its first day-long meeting Tuesday. "We got as far as discovering the extent of the problem, and some sense of who had an interest in it," says Jeff Ubois, the workshop's organizer. The 18-odd attendees included Doherty and Smith, along with Internet archivist Brewster Kahle, FTC consumer-protection attorney Laura Mazzarella, and John Young, curator of the controversial full-disclosure cryptography and intelligence site Cryptome.org. Young, who himself has received at least one broad subpoena for usage log information, takes pride in deleting his logs on a daily basis.

Nobody expects Yahoo or MSNBC.com to delete their logs every day. But attendees say the workshop concluded that companies of all sizes need to become more familiar with the privacy risks of their routine logging. The group plans to launch an education campaign to dispel the notion that Internet surfing is anonymous by default. "If it becomes widely believed that IP addresses are personally identifiable, that has implications for businesses that are logging them," says Ubois.

The group is also working on specifications for a free open-source tool that would allow administrators to easily trim unwanted information from their logs. Smith, who occasionally moonlights as a forensic crime fighter, admits that Web server logs can serve a valuable purpose in tracking down bad guys. But he says webmasters should know the significance of the data they routinely collect. "Most of this is about educating people that this could leave them in the legal line of fire," he says.
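
The working group's tool had not been specified at the time of writing, so the following is only a guess at the kind of trimming such a tool might do: a minimal sketch that zeroes the last octet of the IPv4 address at the start of each log line. The function names and the choice to mask only the final octet are assumptions made for illustration.

```python
import re

# Matches an IPv4 address at the start of a line and captures its first three octets.
LEADING_IPV4 = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}(?=\s)')


def scrub_line(line):
    """Zero the final octet of a leading IPv4 address; leave other lines unchanged."""
    return LEADING_IPV4.sub(r'\1.0', line)


def scrub_log(lines):
    """Apply scrub_line to every line of a log."""
    return [scrub_line(line) for line in lines]


# Hypothetical entries; the second uses a hostname and is left untouched.
sample_log = [
    '192.0.2.44 - - [02/Apr/2003:10:15:32 -0500] "GET / HTTP/1.0" 200 512',
    'host.example - - [02/Apr/2003:10:16:01 -0500] "GET / HTTP/1.0" 200 512',
]

for line in scrub_log(sample_log):
    print(line)
```

Masking one octet still leaves traffic patterns visible to the webmaster while making a single entry point to up to 256 possible addresses rather than one. A real tool would also need to handle hostnames, IPv6 addresses and other identifying fields.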

© SecurityFocus
