Feeds

Data-mining technique outs authors of anonymous email

Unmasking trolls, one 'write-print' at a time

  • alert
  • submit to reddit

Remote control for virtualized desktops

Engineers and computer scientists say they have devised a novel method for identifying authors of anonymous emails that's reliable enough to be used in courts of law.

In a series of papers published over the past few years, the researchers from Concordia University in Montreal have described what they say is the first ever data-mining algorithm for identifying the most plausible author of an anonymous email. It works by establishing a “write-print” of each suspected author by quantifying unique patterns in each individual's email writings. It can be used to unmask authors of emails used in spam, phishing cyberbullying and other types of offenses.

“Our insight is that the write-print of an individual is the combinations of features that occur frequently in his/her written emails,” the researchers wrote in a paper (PDF) first published in the publication Digital Investigation. “The commonly used features are lexical, syntactical, structural and content-specific attributes. By matching the write-print with the malicious email, the true author can be identified.”

Characteristics include word usage, word sequence, common spelling and grammatical mistakes, vocabulary richness, hyphenation and punctuation.

The new approach differs from previous methods by filtering out characteristics found in two or more of the suspects' writing styles. So-called decision tree methods often attempt to use the same set of features to deduce the write-print of different suspects. By excluding the styles that multiple suspects share, the technique attempts to generate a unique signature for each potential author under investigation.

At the heart of the method is an algorithm known as AuthorMiner. It mathematically extracts frequent patterns found in suspects emails and then filters out those that are common to other suspects. It then compares the anonymous email with each of the generated write-prints to identify the closest match.

To test the method, they used it on a set of more than 200,000 emails written by 158 employees of Enron before the energy company was exposed for financial fraud. When finely tuned, the technique identified the author about 80 percent of the time.

Additional papers from the researchers – who include Farkhund Iqbal, Rachid Hadjidj, Benjamin Fung, and Mourad Debbabi – are available here. ®

Remote control for virtualized desktops

More from The Register

next story
Webcam hacker pervs in MASS HOME INVASION
You thought you were all alone? Nope – change your password, says ICO
You really need to do some tech support for Aunty Agnes
Free anti-virus software, expires, stops updating and p0wns the world
Meet OneRNG: a fully-open entropy generator for a paranoid age
Kiwis to seek random investors for crowd-funded randomiser
USB coding anarchy: Consider all sticks licked
Thumb drive design ruled by almighty buck
Attack reveals 81 percent of Tor users but admins call for calm
Cisco Netflow a handy tool for cheapskate attackers
Privacy bods offer GOV SPY VICTIMS a FREE SPYWARE SNIFFER
Looks for gov malware that evades most antivirus
Patch NOW! Microsoft slings emergency bug fix at Windows admins
Vulnerability promotes lusers to domain overlords ... oops
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Choosing a cloud hosting partner with confidence
Download Choosing a Cloud Hosting Provider with Confidence to learn more about cloud computing - the new opportunities and new security challenges.
New hybrid storage solutions
Tackling data challenges through emerging hybrid storage solutions that enable optimum database performance whilst managing costs and increasingly large data stores.