Here's some phish-AI research: Machine-learning code crafts phishing URLs that dodge auto-detection

Humans, keep your eyes out for dodgy web links

An artificially intelligent system has been demonstrated generating URLs for phishing websites that appear to evade detection by security tools.

Essentially, the software can come up with URLs for webpages that masquerade as legit login pages for real websites when, in fact, the webpages simply collect the entered usernames and passwords so the accounts can be hijacked later.

Blacklists and algorithms – intelligent or otherwise – can be used to automatically identify and block links to phishing pages. Humans should be able to spot that the web links are dodgy, but not everyone is so savvy.

Using the Phishtank database, a group of computer scientists from Cyxtera Technologies, a cybersecurity biz based in Florida, USA, have built DeepPhish, which is machine-learning software that, allegedly, generates phishing URLs that beat these defense mechanisms.

“Through intelligent algorithms, intelligent detection systems have been able to identify patterns and detect phishing URLs with 98.7 per cent accuracy, giving the battle advantage to defensive teams,” Cyxtera's Alejandro Bahnsen claimed earlier this month.

"However, if AI is being used to prevent attacks, what is stopping cyber criminals from using the same technology to defeat both traditional and AI-based cyber-defense systems?"

Training

The team inspected more than a million URLs on Phishtank to identify three different phishing miscreants who had generated webpages to steal people's credentials. The team fed these web addresses into an AI-based phishing detection algorithm to measure how effective the URLs were at bypassing the system.

The first scumbag of the trio used 1,007 attack URLs across 106 domains, and only seven slipped past the detector without setting off alarms, a success rate of just 0.69 per cent. The second had 102 malicious web addresses across 19 domains. Only five of them bypassed the threat detection algorithm, making it effective 4.91 per cent of the time.

Next, they fed this information into a Long Short-Term Memory (LSTM) network to learn the general structure and extract features from the malicious URLs. For example, the second threat actor commonly used “tdcanadatrustindex.html” in its addresses.

All the text from the effective URLs was strung together into sentences, encoded as vectors, and fed into the LSTM, which was trained to predict the next character given the previous one.
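To make that encoding step concrete, here is a minimal Python sketch of how URL text might be turned into next-character training examples. It is not Cyxtera's code: the function names, the newline separator, and the `window` parameter (how many previous characters the model sees) are our own assumptions, and the example URLs are made up.

```python
def build_vocab(text):
    """Map each distinct character to an integer index."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def make_training_pairs(urls, window=4, sep="\n"):
    """Turn a list of URLs into (previous characters, next character) pairs.

    Assumption: URLs are joined into one long "sentence" with a newline
    marking where each one ends. The source does not say how this was done.
    """
    text = sep.join(urls) + sep
    vocab = build_vocab(text)
    encoded = [vocab[ch] for ch in text]

    pairs = []
    for i in range(len(encoded) - window):
        context = encoded[i:i + window]   # what the model is shown
        target = encoded[i + window]      # what it must predict
        pairs.append((context, target))
    return vocab, pairs


# Hypothetical URLs, for illustration only.
urls = [
    "http://example.test/login/index.html",
    "http://example.test/secure/index.html",
]
vocab, pairs = make_training_pairs(urls, window=4)
print(len(vocab), "distinct characters,", len(pairs), "training pairs")
```

Each pair would then be handed to a sequence model such as an LSTM, typically after converting the integer indices to one-hot or embedding vectors.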

Over time it learns to generate a stream of text that simulates a list of pseudo URLs similar to the ones used as input. When DeepPhish was trained on data from the first threat actor, it generated 1,007 URLs, and 210 of them were effective at evading detection, bumping up the score from 0.69 per cent to 20.90 per cent.

When it was following the structure from the second threat actor, it produced 102 fake URLs and 37 of them were successful, increasing the likelihood of tricking the existing defense mechanism from 4.91 per cent to 36.28 per cent.
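The effectiveness figures are just the number of evading URLs divided by the total. A short Python sketch of the arithmetic follows; the function name and table layout are ours, for illustration. Straight division lands within about 0.05 percentage points of each quoted figure, so the source presumably rounded slightly differently.

```python
def effectiveness(evading, total):
    """Percentage of URLs that slipped past the detector."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * evading / total


# (label, evading URLs, total URLs, percentage quoted in the article)
RESULTS = [
    ("actor 1, original", 7, 1007, 0.69),
    ("actor 1, DeepPhish", 210, 1007, 20.90),
    ("actor 2, original", 5, 102, 4.91),
    ("actor 2, DeepPhish", 37, 102, 36.28),
]

for label, evading, total, quoted in RESULTS:
    rate = effectiveness(evading, total)
    print(f"{label}: computed {rate:.2f}%, quoted {quoted:.2f}%")
```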

The effectiveness rate isn’t very high, as a lot of what comes out of the LSTM is effectively gibberish, containing strings of forbidden characters.
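The generation loop itself is simple: pick a next character, append it, repeat until an end marker turns up. The Python sketch below shows that loop with a plain bigram frequency table standing in for the trained LSTM, so it is a deliberately cruder model than the one the researchers describe. All names here are hypothetical. Because characters are sampled one at a time, nothing guarantees the result is a well-formed URL, which is one way gibberish creeps in.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(urls, end="\n"):
    """Count which character follows which. A stand-in for LSTM training."""
    counts = defaultdict(Counter)
    for url in urls:
        seq = end + url + end  # the end marker doubles as a start marker
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=80, end="\n"):
    """Sample one pseudo URL, a character at a time."""
    out = []
    prev = end
    while len(out) < max_len:
        options = counts.get(prev)
        if not options:
            break  # nothing ever followed this character in training
        chars = list(options)
        weights = [options[c] for c in chars]
        nxt = rng.choices(chars, weights=weights, k=1)[0]
        if nxt == end:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Hypothetical training URLs, for illustration only.
training = [
    "http://example.test/login/index.html",
    "http://example.test/secure/index.html",
]
model = fit_bigram(training)
rng = random.Random(0)  # seeded so runs are repeatable
for _ in range(3):
    print(generate(model, rng))
```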

“It is important to automate the process of retraining the AI phishing detection system by incorporating the new synthetic URLs that each threat actor may create,” the researchers warned. ®

Hat tip to ex-Reg scribe Jack Clark, who highlighted the project in the latest edition of his weekly AI newsletter.
