The trails left in Web server logs – and who's seeing them

Fear of a million Big Brothers

NEW YORK--The privacy advocates and civil libertarians at the 13th annual Computers, Freedom and Privacy conference sometimes seem dwarfed by the enormity of the projects they oppose -- larger-than-life enterprises worthy of a James Bond villain.

John Poindexter's Total Information Awareness project, if successful, would combine every government and private sector database into a massive data mining system capable of picking out aberrant behavior in the actions of seemingly ordinary citizens. The Department of Homeland Security's CAPPS II program aims to run automatic background checks on every airline passenger in the U.S.

But the day before CFP 2003 began, a smaller invitation-only group of technologists and policy wonks met at the conference site to discuss a matter that some say is just as important to Internet privacy as any of the monolithic omniscient supercomputers being hatched in Washington: the humble Web server log.

Or, more to the point, the countless thousands of logs routinely kept by servers throughout the Internet, each marking every visit to a given website and identifying what pages were viewed, what transactions were made, and the IP address of the visitor. Recent laws have made it easier for government agencies to get their hands on server log entries, and civil litigators are increasingly finding logs a valuable target for subpoenas. At the same time, the art of wringing every ounce of useful information out of such logs is advancing, as is the ease of tracking down a user's identity from their IP address by correlating data from different sources.
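To make concrete what a single log entry can reveal, here is a minimal Python sketch that splits one line into its fields. It assumes the server writes the Apache Common Log Format; the sample entry, the `CLF_PATTERN` name, and the `parse_clf_line` helper are all hypothetical and exist only for illustration.

```python
import re

# Assumed layout (Apache Common Log Format):
#   host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf_line(line):
    """Return a dict of fields from one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical entry; 192.0.2.0/24 is an address range reserved for documentation.
sample = (
    '192.0.2.44 - - [01/Apr/2003:10:15:32 -0500] '
    '"GET /health/symptoms.html HTTP/1.0" 200 5120'
)
entry = parse_clf_line(sample)
```

Even this one line ties a visitor's address to a specific page and a specific moment, which is the kind of detail a subpoena could reach.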

Last month, scientists at Carnegie Mellon University's Laboratory for International Data Privacy even published a formal algorithm for "re-identifying" a Web surfer from pieces of information left like breadcrumbs on different sites. "The methodology involves constructing trails across locations from small amounts of seemingly anonymous or innocuous evidence the person has been there," the paper reads.

That's a troubling prospect to privacy advocates, at a time when activists and human rights workers in repressive countries are using the Internet to communicate, while ordinary netizens are turning to the Web for things like medical information or personal finance. "It's our sense that certain companies have entire staffs dedicated to handling subpoenas and court orders, and quite often those subpoenas and court orders involve usage logs," says Will Doherty of the Electronic Frontier Foundation.

Smaller companies may be keeping logs without thinking about the potential for misuse, and a careful Google search can turn up random server and proxy logs sitting unprotected on the Web. "Most people don't give it any thought; their default is to just log anything in Apache or IIS," says Richard Smith, a technology and privacy consultant. "At most, they have to worry about how much disk space it's taking up."

It's with that vision of a million tiny surveillance logs growing like weeds that the informal "User Log Data Management Working Group" held its first day-long meeting Tuesday. "We got as far as discovering the extent of the problem, and some sense of who had an interest in it," says Jeff Ubois, the workshop's organizer. Besides Doherty and Smith, the 18-odd attendees included Internet archivist Brewster Kahle, FTC consumer-protection attorney Laura Mazzarella, and John Young, curator of the controversial full-disclosure cryptography and intelligence site Cryptome.org. Young, who himself has received at least one broad subpoena for usage log information, takes pride in deleting his logs on a daily basis.

Nobody expects Yahoo or MSNBC.com to delete their logs every day. But attendees say the workshop concluded that companies of all sizes need to become more familiar with the privacy risks of their routine logging. The group plans to launch an education campaign to dispel the notion that Internet surfing is anonymous by default. "If it becomes widely believed that IP addresses are personally identifiable, that has implications for businesses that are logging them," says Ubois.

The group is also working on specifications for a free open-source tool that would allow administrators to easily trim unwanted information from their logs. Smith, who occasionally moonlights as a forensic crime fighter, admits that Web server logs can serve a valuable purpose in tracking down bad guys. But he says webmasters should know the significance of the data they routinely collect. "Most of this is about educating people that this could leave them in the legal line of fire," he says.
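The group's specifications are not described here, so the following Python sketch shows only one possible form of trimming, not the working group's design. It assumes each log line begins with the visitor's address, as in the Apache Common Log Format, and it coarsens that address rather than deleting the line. The function names `mask_ip` and `trim_log_line`, and the choice of /24 and /48 prefixes, are assumptions made for this example.

```python
import ipaddress
import re

# Assumption: the first whitespace-delimited field of each line is the client host.
LEADING_HOST = re.compile(r'^(\S+)(.*)$', re.DOTALL)


def mask_ip(host):
    """Coarsen an address: keep the first 24 bits of IPv4 or 48 bits of IPv6.

    Anything that is not a literal IP address (e.g. a resolved hostname)
    is replaced with "-", the usual placeholder for a missing field.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "-"
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def trim_log_line(line):
    """Return the line with its leading host field masked; other fields are untouched."""
    match = LEADING_HOST.match(line)
    if not match:
        return line  # empty or whitespace-led line: nothing to mask
    return mask_ip(match.group(1)) + match.group(2)
```

Run over a hypothetical entry such as `192.0.2.44 - - [01/Apr/2003:10:15:32 -0500] "GET / HTTP/1.0" 200 512`, `trim_log_line` would rewrite the leading address to `192.0.2.0` and leave the rest intact. Masking is a weaker measure than the daily deletion Young practices: it keeps traffic statistics usable but still records which pages were requested and when.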

© SecurityFocus
