Security rEsrchRs find nu way 2 spot TXT spam

Symantec boffins analyse 400,000 TXTs to develop new spam-spotting approach

Tue 2 Sep 2014 // 07:03 UTC

Symantec boffins reckon it's no longer enough to shield users from malicious e-mail: spam and phishing over SMS now deserve some decent defences too. They've even penned a study to back up the proposition, suggesting that up to 97 per cent of SMS spam could be detected with a false positive rate as low as 0.02 per cent.

The researchers, from Symantec offices in the UK, Ireland and the US, have published their paper on arXiv, arguing that although spam detection is harder in SMS than in e-mail, it can be done.

SMS remains popular, even in an era of over-the-top messaging platforms that want to eat the carriers' lunch by shifting texting onto the data channel. The paper argues that several SMS habits make spam detection difficult, citing “lexical variants”, along with contractions, wordplay and other obfuscations, as challenges for anyone wanting to detect malicious messages.

The researchers argue that these problems can be overcome with better baselines, including text normalisation and substring clustering.
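The article doesn't spell out how the normalisation works. As a rough illustration only, the Python sketch below maps lexical variants like those in our headline back to standard English using a small, hypothetical lookup table (`LEXICAL_VARIANTS`); a real system would need a far larger dictionary or a learned model.

```python
import re

# Hypothetical lookup table of SMS lexical variants. The paper's actual
# normalisation resources are not described in the article.
LEXICAL_VARIANTS = {
    "nu": "new",
    "2": "to",
    "4": "for",
    "u": "you",
    "r": "are",
    "txt": "text",
    "pls": "please",
    "resrchrs": "researchers",
}


def normalise(message: str) -> str:
    """Lower-case a message, split it into tokens and replace known variants."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    return " ".join(LEXICAL_VARIANTS.get(tok, tok) for tok in tokens)


print(normalise("Security rEsrchRs find nu way 2 spot TXT spam"))
```

Run on our headline, this prints `security researchers find new way to spot text spam`. Tokens missing from the table pass through unchanged, which is exactly where a dictionary approach starts to struggle.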

Working with an unnamed US carrier, the researchers used a large SMS dataset to test their machine-learning approaches to spam blocking. To keep false positives down, they note, they also used “a combination of behavioural and linguistic information” to make the results more robust.

The researchers had around 400,000 text messages to work with (including 300,000 spam messages), allowing them to test what they describe as “clustered substring tokens from a subset of 100k messages using t-distributed stochastic neighbour embeddings … string similarity functions based on matching n-grams and word co-occurrences.”
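The n-gram matching mentioned in the paper could take many forms. One common option, assumed here rather than taken from the paper, is Jaccard similarity over character trigrams, which scores deliberately misspelt variants as near-duplicates of the original word. The function names are hypothetical.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of overlapping character n-grams in a lower-cased string."""
    text = text.lower()
    if len(text) < n:
        # Short strings become a single "gram" so they can still be compared.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity (0.0 to 1.0) of two strings' character n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(ngram_similarity("winner", "winnner"))
```

Here "winner" and the padded "winnner" share four of five distinct trigrams, so the score is 0.8, while two strings with no trigrams in common score 0.0.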

To expand the total training data set, the researchers also cleaned up 200,000 Twitter messages (removing hashtags and user mentions). Their study used two approaches: MELA (message linguistic analysis) and MPA (messaging pattern analysis).
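The tweet clean-up step is simple to picture. This sketch assumes a hashtag or mention is a `#` or `@` followed by word characters; the paper's exact rules aren't given in the article, and `clean_tweet` is a hypothetical name.

```python
import re

# Assumed rule: '#' or '@' followed by word characters, not preceded by a
# word character (so the '@' inside an e-mail address is left alone).
TAG_OR_MENTION = re.compile(r"(?<!\w)[#@]\w+")


def clean_tweet(tweet: str) -> str:
    """Strip hashtags and user mentions, then collapse leftover whitespace."""
    stripped = TAG_OR_MENTION.sub("", tweet)
    return " ".join(stripped.split())


print(clean_tweet("@bob gr8 deal 2nite #sale #win"))
```

This prints `gr8 deal 2nite`. The `(?<!\w)` lookbehind stops the pattern clipping e-mail addresses such as `a@b.com`.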

The MELA approach showed a false positive rate of 0.05 per cent and a false negative rate of 9.4 per cent, the paper says, while MPA did much better, with 0.02 per cent false positives and just 3.1 per cent false negatives. ®
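For readers wondering how the headline numbers relate: under the standard definitions (assumed here, since the article doesn't define them), the false positive rate is the share of legitimate messages wrongly flagged and the false negative rate is the share of spam that gets through, so detection is one minus the false negative rate. The sketch below shows that arithmetic. MPA's 3.1 per cent false negative rate works out at about 96.9 per cent detection, which appears to be where the 97 per cent figure comes from.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute error and detection rates from confusion-matrix counts.

    tp: spam correctly flagged       fp: legitimate messages wrongly flagged
    tn: legitimate messages passed   fn: spam that slipped through
    """
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "detection_rate": tp / (tp + fn),  # equals 1 - false_negative_rate
    }


# Rates reported in the article, as fractions: (false positive, false negative).
PUBLISHED = {"MELA": (0.0005, 0.094), "MPA": (0.0002, 0.031)}

for name, (fpr, fnr) in PUBLISHED.items():
    print(f"{name}: catches {1 - fnr:.1%} of spam, wrongly flags {fpr:.2%} of legitimate texts")
```

The loop prints 90.6 per cent detection for MELA and 96.9 per cent for MPA. `error_rates` is only needed if you have raw counts rather than published rates.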
