The Register® — Biting the hand that feeds IT

Data-mining technique outs authors of anonymous email

Unmasking trolls, one 'write-print' at a time

Engineers and computer scientists say they have devised a novel method for identifying authors of anonymous emails that's reliable enough to be used in courts of law.

In a series of papers published over the past few years, the researchers from Concordia University in Montreal have described what they say is the first-ever data-mining algorithm for identifying the most plausible author of an anonymous email. It works by establishing a “write-print” for each suspected author, quantifying the patterns that are unique to that individual's email writing. It can be used to unmask the authors of emails used in spam, phishing, cyberbullying and other types of offenses.

“Our insight is that the write-print of an individual is the combinations of features that occur frequently in his/her written emails,” the researchers wrote in a paper (PDF) first published in the publication Digital Investigation. “The commonly used features are lexical, syntactical, structural and content-specific attributes. By matching the write-print with the malicious email, the true author can be identified.”

Characteristics include word usage, word sequence, common spelling and grammatical mistakes, vocabulary richness, hyphenation and punctuation.
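As a rough illustration of how characteristics like these can be turned into numbers, here is a minimal Python sketch. It is not taken from the researchers' papers: the function name `stylometric_features` and the four features chosen are assumptions made for the example, and a real system would measure far more.

```python
import re
import string


def stylometric_features(text):
    """Compute a few illustrative writing-style features for one email.

    Hypothetical helper for illustration only; not the researchers' code.
    """
    # Words may contain apostrophes and internal hyphens (e.g. "write-print").
    words = re.findall(r"[a-z']+(?:-[a-z']+)*", text.lower())
    n_words = len(words)
    n_chars = len(text)
    n_punct = sum(1 for c in text if c in string.punctuation)
    return {
        # Distinct words divided by total words (type-token ratio).
        "vocabulary_richness": len(set(words)) / n_words if n_words else 0.0,
        # How many words contain a hyphen.
        "hyphenated_words": sum(1 for w in words if "-" in w),
        # Punctuation characters as a share of all characters.
        "punctuation_rate": n_punct / n_chars if n_chars else 0.0,
        # Mean word length in characters (hyphens included).
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }
```

Calling `stylometric_features(email_body)` on each message gives one small feature dictionary per email, which can then be compared across suspects.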

The new approach differs from previous methods by filtering out characteristics found in two or more of the suspects' writing styles. So-called decision tree methods often attempt to use the same set of features to deduce the write-print of different suspects. By excluding the styles that multiple suspects share, the technique attempts to generate a unique signature for each potential author under investigation.

At the heart of the method is an algorithm known as AuthorMiner. It mathematically extracts frequent patterns found in each suspect's emails and then filters out those that are common to other suspects. It then compares the anonymous email with each of the generated write-prints to identify the closest match.
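The three steps described above (mine frequent patterns, drop the ones suspects share, match the anonymous email) can be sketched in a few lines of Python. This is a toy under stated assumptions, not AuthorMiner itself: every function name is hypothetical, each email is assumed to have already been reduced to a set of style "items", patterns are limited to combinations of one or two items, and the scoring rule (sum of the supports of matching patterns) is an invention for the example.

```python
from itertools import combinations


def frequent_patterns(emails, min_support=0.6, max_size=2):
    """Return {pattern: support} for item combinations found in at least
    `min_support` of one suspect's emails. Each email is a set of items."""
    counts = {}
    for items in emails:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] = counts.get(combo, 0) + 1
    n = len(emails)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def write_prints(emails_by_suspect, min_support=0.6):
    """Keep, for each suspect, only the frequent patterns that are not
    frequent for any other suspect."""
    freq = {s: frequent_patterns(e, min_support)
            for s, e in emails_by_suspect.items()}
    prints = {}
    for suspect, patterns in freq.items():
        shared = set()
        for other, other_patterns in freq.items():
            if other != suspect:
                shared.update(other_patterns)
        prints[suspect] = {p: sup for p, sup in patterns.items()
                           if p not in shared}
    return prints


def most_plausible_author(anonymous_items, prints):
    """Pick the suspect whose write-print best matches the anonymous email.
    Assumed score: total support of the patterns present in the email."""
    def score(patterns):
        return sum(sup for p, sup in patterns.items()
                   if set(p) <= anonymous_items)
    return max(prints, key=lambda s: score(prints[s]))
```

With two suspects who both open emails with "hi", that shared greeting is filtered out of both write-prints, so only each person's distinguishing habits count toward the match. A score like this ranks suspects; it says nothing on its own about how confident the match is.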

To test the method, the researchers ran it on a set of more than 200,000 emails written by 158 employees of Enron before the energy company was exposed for financial fraud. When finely tuned, the technique identified the author about 80 percent of the time.

Additional papers from the researchers – who include Farkhund Iqbal, Rachid Hadjidj, Benjamin Fung, and Mourad Debbabi – are available here. ®

Correct 80% when finely tuned.

So, wrong 20% when finely tuned and even more wrong when not in perfect lab conditions.

So, hanging at least 1 in 5 innocent men is OK then....... FAIL as this should *never* be accepted as evidence in court!

No really......

...... you underestimate its accuracy. Apparently they tested it on the message boards of the Daily Mail, and it correctly identified that 87.4% of the postings had been written by the Twat-O-Tron.

Anonymous Coward

So, let's see...

"When finely tuned, the technique identified the author about 80 percent of the time."

In other words, they think a 20% failure rate is "reliable enough to be used in courts of law"?

Well, in combination with other evidence it might be, I suppose. But given the "believe anything the computer says" attitude of some people I doubt it.

