
The trails left in Web server logs – and who's seeing them

Fear of a million Big Brothers



NEW YORK--The privacy advocates and civil libertarians at the 13th annual Computers, Freedom and Privacy conference sometimes seem dwarfed by the enormity of the projects they oppose -- larger-than-life enterprises worthy of a James Bond villain.

John Poindexter's Total Information Awareness project, if successful, would combine every government and private sector database into a massive data mining system capable of picking out aberrant behavior in the actions of seemingly ordinary citizens. The Department of Homeland Security's CAPPS II program aims to run automatic background checks on every airline passenger in the U.S.

But the day before CFP 2003 began, a smaller invitation-only group of technologists and policy wonks met at the conference site to discuss a matter that some say is just as important to Internet privacy as any of the monolithic omniscient supercomputers being hatched in Washington: the humble Web server log.

Or more to the point, the countless thousands of logs routinely kept by servers throughout the Internet, each marking every visit to a given website, identifying what pages were viewed, what transactions were made, and the IP address of the visitor. Recent laws have made it easier for government agencies to get their hands on server log entries, and civil litigators are increasingly finding logs a valuable target for subpoenas. At the same time, the art of wringing every ounce of useful information out of such logs is advancing, as is the ease of tracking down a user's identity from their IP address by correlating data from different sources.
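To make that concrete, here is a minimal Python sketch that parses one entry in Apache's standard "combined" log format. It assumes that format specifically (IIS and custom Apache configurations log differently), and the sample entry is invented, using a documentation IP address. A single line ties the visitor's address to the page requested, the time, and the referring page, which can include a search query.

```python
import re
from typing import Optional

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one combined-format log line, or None if it does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

# Hypothetical sample entry (192.0.2.0/24 is reserved for documentation).
sample = ('192.0.2.17 - - [08/Apr/2003:10:15:32 -0500] '
          '"GET /health/symptoms.html HTTP/1.1" 200 5120 '
          '"http://www.google.com/search?q=symptoms" "Mozilla/4.0"')

entry = parse_line(sample)
print(entry["ip"], entry["request"], entry["referer"])
```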

Last month, scientists at Carnegie Mellon University's Laboratory for International Data Privacy even published a formal algorithm for "re-identifying" a Web surfer from pieces of information left like breadcrumbs on different sites. "The methodology involves constructing trails across locations from small amounts of seemingly anonymous or innocuous evidence the person has been there," the paper reads.
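The core idea of linking trails can be illustrated with a deliberately simplified Python sketch. This is not the Carnegie Mellon algorithm, only an exact-match toy under stated assumptions: that an anonymous identifier (an IP address or cookie, say) can be observed on several sites, that other sites hold named records, and that a person is re-identified when exactly one named trail visits the same set of sites. All identifiers and site names are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical data: where each anonymous identifier was seen...
anonymous_trails: Dict[str, FrozenSet[str]] = {
    "visitor-a": frozenset({"site1", "site2", "site4"}),
    "visitor-b": frozenset({"site2", "site3"}),
}
# ...and where each named person left identified records.
identified_trails: Dict[str, FrozenSet[str]] = {
    "Alice": frozenset({"site1", "site2", "site4"}),
    "Bob": frozenset({"site2", "site3"}),
    "Carol": frozenset({"site2", "site3"}),
}

def reidentify(anon: Dict[str, FrozenSet[str]],
               named: Dict[str, FrozenSet[str]]) -> Dict[str, str]:
    """Link an anonymous trail to a name only when exactly one named trail matches it."""
    links = {}
    for anon_id, trail in anon.items():
        matches = [name for name, t in named.items() if t == trail]
        if len(matches) == 1:  # a unique trail pins down a single person
            links[anon_id] = matches[0]
    return links

result = reidentify(anonymous_trails, identified_trails)
print(result)
```

Here "visitor-a" is linked to Alice because no one else shares that trail, while "visitor-b" stays ambiguous between Bob and Carol. The more sites a trail touches, the more likely it is to be unique.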

That's a troubling prospect to privacy advocates, at a time when activists and human rights workers in repressive countries are using the Internet to communicate, while ordinary netizens are turning to the Web for things like medical information or personal finance. "It's our sense that certain companies have entire staffs dedicated to handling subpoenas and court orders, and quite often those subpoenas and court orders involve usage logs," says Will Doherty of the Electronic Frontier Foundation.

Smaller companies may be keeping logs without thinking about the potential for misuse, and a careful Google search can turn up random server and proxy logs sitting unprotected on the Web. "Most people don't give it any thought; their default is to just log anything in Apache or IIS," says Richard Smith, a technology and privacy consultant. "At most, they have to worry about how much disk space it's taking up."

It's with that vision of a million tiny surveillance logs growing like weeds that the informal "User Log Data Management Working Group" held its first day-long meeting on Tuesday. "We got as far as discovering the extent of the problem, and some sense of who had an interest in it," says Jeff Ubois, the workshop's organizer. Besides Doherty and Smith, the 18 or so attendees included Internet archivist Brewster Kahle, FTC consumer-protection attorney Laura Mazzarella, and John Young, curator of the controversial full-disclosure cryptography and intelligence site Cryptome.org. Young, who himself has received at least one broad subpoena for usage log information, takes pride in deleting his logs on a daily basis.

Nobody expects Yahoo or MSNBC.com to delete their logs every day. But attendees say the workshop concluded that companies of all sizes need to become more familiar with the privacy risks of their routine logging. The group plans to launch an education campaign to dispel the notion that Internet surfing is anonymous by default. "If it becomes widely believed that IP addresses are personally identifiable, that has implications for businesses that are logging them," says Ubois.

The group is also working on specifications for a free open-source tool that would allow administrators to easily trim unwanted information from their logs. Smith, who occasionally moonlights as a forensic crime fighter, admits that Web server logs can serve a valuable purpose in tracking down bad guys. But he says webmasters should know the significance of the data they routinely collect. "Most of this is about educating people that this could leave them in the legal line of fire," he says.
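The group's tool was still only a specification, so the following is a hypothetical Python sketch of the kind of trimming it describes, not that tool. It assumes Apache common or combined log lines and applies one common approach: zero the last octet of IPv4 addresses (the last 80 bits for IPv6), blank the ident and user fields, and drop the referrer and user-agent. Entries logged with resolved hostnames are replaced with "-".

```python
import ipaddress
import re

# Matches the common-format prefix shared by Apache common and combined lines.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ (?P<time>\[[^\]]+\]) '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def truncate_ip(host: str) -> str:
    """Keep only the /24 (IPv4) or /48 (IPv6) network part of an address."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "-"  # a resolved hostname: drop it entirely
    prefix = 24 if ip.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)

def scrub(line: str) -> str:
    """Return a trimmed log line, or "" for lines that cannot be parsed."""
    m = LINE.match(line)
    if m is None:
        return ""  # discard rather than keep an unparsed line verbatim
    return f'{truncate_ip(m["ip"])} - - {m["time"]} "{m["request"]}" {m["status"]} {m["size"]}'

# Hypothetical sample entry in combined format.
sample = ('192.0.2.17 - frank [08/Apr/2003:10:15:32 -0500] "GET /index.html HTTP/1.1" '
          '200 5120 "http://example.com/" "Mozilla/4.0"')
print(scrub(sample))
```

Truncating rather than deleting addresses keeps enough information for coarse traffic statistics while making it much harder to tie an entry to one machine; a site like Cryptome can go further and simply delete its logs.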

© SecurityFocus
