
Data-mining technique outs authors of anonymous email

Unmasking trolls, one 'write-print' at a time



Engineers and computer scientists say they have devised a novel method for identifying authors of anonymous emails that's reliable enough to be used in courts of law.

In a series of papers published over the past few years, the researchers from Concordia University in Montreal have described what they say is the first-ever data-mining algorithm for identifying the most plausible author of an anonymous email. It works by building a "write-print" for each suspected author, quantifying the patterns that are distinctive to the emails that person has written. It can be used to unmask the authors of emails used in spam, phishing, cyberbullying and other types of offenses.

“Our insight is that the write-print of an individual is the combinations of features that occur frequently in his/her written emails,” the researchers wrote in a paper first published in the journal Digital Investigation. “The commonly used features are lexical, syntactical, structural and content-specific attributes. By matching the write-print with the malicious email, the true author can be identified.”
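To illustrate the "combinations of features that occur frequently" idea, here is a minimal sketch. It assumes each email has already been reduced to a set of discretized feature labels such as "greeting:hi" (a hypothetical encoding). The function name `frequent_patterns` and its `min_support`/`max_size` parameters are illustrative, and the brute-force enumeration stands in for a proper frequent-itemset miner such as Apriori; it is not the researchers' code.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(emails, min_support=0.5, max_size=2):
    """Return feature combinations found in at least min_support of the emails.

    Each email is a set of discretized feature labels (hypothetical encoding).
    Itemsets up to max_size are enumerated by brute force; a real miner
    would use Apriori or FP-growth instead.
    """
    counts = Counter()
    for features in emails:
        items = sorted(features)  # sort so each combination has one canonical tuple form
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    threshold = min_support * len(emails)
    return {combo for combo, n in counts.items() if n >= threshold}
```

With three emails and `min_support=0.6`, a combination must appear in at least two of them to count as part of the author's write-print.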

The characteristics measured include word usage, word sequences, common spelling and grammatical mistakes, vocabulary richness, hyphenation and punctuation.
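A few of these characteristics are easy to compute from raw text. The sketch below is illustrative only: the feature names, the tokenizing regular expression and the use of the type-token ratio as a measure of vocabulary richness are assumptions, not the feature set defined in the papers.

```python
import re
from collections import Counter


def stylometric_features(text):
    """Compute a small, illustrative subset of lexical and structural style markers."""
    # Words may contain apostrophes and internal hyphens ("well-known" is one word).
    words = re.findall(r"[A-Za-z']+(?:-[A-Za-z']+)*", text)
    lowered = [w.lower() for w in words]
    n = len(lowered)
    punct = Counter(ch for ch in text if ch in ",;:!?.")
    return {
        "word_count": n,
        # Vocabulary richness as type-token ratio: distinct words / total words.
        "type_token_ratio": len(set(lowered)) / n if n else 0.0,
        "avg_word_length": sum(len(w) for w in lowered) / n if n else 0.0,
        # Share of words that contain a hyphen.
        "hyphen_rate": sum("-" in w for w in lowered) / n if n else 0.0,
        "punctuation": dict(punct),
    }
```

Numeric values like these would still need to be binned into discrete labels before any frequent-pattern mining could use them.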

The new approach differs from previous methods by filtering out characteristics found in the writing styles of two or more suspects. By contrast, so-called decision-tree methods typically apply the same set of features when deducing the write-prints of different suspects. By excluding the styles that multiple suspects share, the technique attempts to generate a unique signature for each potential author under investigation.
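The filtering step can be sketched in a few lines, assuming each suspect's candidate patterns have already been mined (for example, as tuples of feature labels). A pattern survives only if it appears in exactly one suspect's set. `unique_writeprints` is a hypothetical name, and this is a simplified reading of the description above rather than AuthorMiner itself.

```python
from collections import Counter


def unique_writeprints(candidate_patterns):
    """Drop every pattern that occurs in the pattern sets of two or more suspects.

    candidate_patterns maps a suspect's name to a set of mined patterns.
    """
    # Count how many suspects each pattern belongs to.
    seen = Counter()
    for patterns in candidate_patterns.values():
        seen.update(set(patterns))
    # Keep only the patterns owned by a single suspect.
    return {
        suspect: {p for p in patterns if seen[p] == 1}
        for suspect, patterns in candidate_patterns.items()
    }
```

A shared habit such as opening every email with "Hi" then drops out of both suspects' signatures, since it cannot tell them apart.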

At the heart of the method is an algorithm known as AuthorMiner. It mathematically extracts the frequent patterns found in each suspect's emails and then filters out those that also occur in other suspects' emails. It then compares the anonymous email with each of the resulting write-prints to identify the closest match.
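A simplified version of the final matching step might look like the following. The article does not say how closeness is measured, so the score used here (the fraction of a suspect's write-print patterns that also occur in the anonymous email) is an assumption, as are the function names and the set-of-labels encoding of an email.

```python
from itertools import combinations


def email_patterns(features, max_size=2):
    """Enumerate all feature combinations (up to max_size) present in one email."""
    items = sorted(features)
    return {
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(items, size)
    }


def closest_author(anonymous_features, writeprints, max_size=2):
    """Return the best-matching suspect and every suspect's score.

    Assumed score: share of the suspect's write-print patterns found in the email.
    writeprints must contain at least one suspect.
    """
    found = email_patterns(anonymous_features, max_size)
    scores = {
        suspect: (len(wp & found) / len(wp) if wp else 0.0)
        for suspect, wp in writeprints.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Returning the scores alongside the winner makes it possible to see how close the runner-up was, which matters when the result is offered as evidence.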

To test the method, the researchers applied it to a set of more than 200,000 emails written by 158 employees of Enron before the energy company was exposed for financial fraud. When finely tuned, the technique identified the author about 80 percent of the time.

Additional papers from the researchers, who include Farkhund Iqbal, Rachid Hadjidj, Benjamin Fung and Mourad Debbabi, are also available. ®
