Berkeley boffins build better spear-phishing black-box bruiser
Machine learning and code to detect, and alert on, attempts to extract passwords from staff
Nonetheless, the researchers managed to create a system that detected known and previously unknown attacks in the sample data set while keeping the number of alerts down to a reasonable level. Their work is noteworthy for its Directed Anomaly Scoring (DAS) algorithm, which can identify the most suspicious events from an unlabeled dataset, and for achieving a rate of false positives that's 200 times lower than prior work.
The counter-spear-phishing scheme requires enterprise log data, real-time network traffic analysis using a network intrusion detection system (NIDS) like Bro, and rules for a tolerable volume of alerts.
"As each email arrives, for each URL in the email, our detector extracts the feature vector for that URL and saves it in a table indexed by the URL," the paper explains. "Each HTTP request seen by the enterprise’s NIDS is looked up in the table. Each time the detector sees a visit to a URL that was earlier seen in some email (a 'clickin-email event'), it adds that feature vector to a list of events. Finally, our detector uses the DAS algorithm to rank the events and determine which ones to alert on."
The researchers' DAS algorithm outperformed three established machine learning techniques – Kernel Density Estimation (KDE), Gaussian Mixture Models (GMM), and k-Nearest Neighbors (kNN) – because it incorporates additional site-specific information that keeps it from mistaking obviously benign network activity for suspicious activity.
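The article does not spell out how DAS computes its scores. One way a "directed" score could work, assumed here purely for illustration, is dominance counting: if every feature is oriented so that a larger value means more suspicious, an event scores one point for each other event it matches or exceeds in every dimension. The function names below are hypothetical, and the sketch should not be read as the paper's exact algorithm.

```python
from typing import List, Sequence, Tuple

FeatureVector = Tuple[float, ...]


def dominates(a: FeatureVector, b: FeatureVector) -> bool:
    """True if `a` is at least as suspicious as `b` in every dimension.

    Assumes each feature is oriented so that larger means more suspicious.
    """
    return all(x >= y for x, y in zip(a, b))


def directed_scores(events: Sequence[FeatureVector]) -> List[int]:
    """Score each event by how many other events it dominates."""
    return [
        sum(1 for j, other in enumerate(events) if j != i and dominates(e, other))
        for i, e in enumerate(events)
    ]


def top_suspicious(events: Sequence[FeatureVector], budget: int) -> List[int]:
    """Return indices of the highest-scoring events, capped at the alert budget."""
    scores = directed_scores(events)
    order = sorted(range(len(events)), key=lambda i: scores[i], reverse=True)
    return order[:budget]
```

Under this assumption, an event that is extreme in one feature but unremarkable in another, such as `(9, 0)` among `(1, 1)`, `(5, 5)` and `(2, 3)`, dominates nothing and so is not ranked highly. A direction-agnostic density or distance method would tend to treat the same event as an outlier, which is one plausible way to read the article's point about benign activity being mistaken for suspicious activity.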
The approach described has limitations: its reliance on network log data makes it less accurate when that information is not available, and it will miss attacks where the email links to an HTTPS website unless the network implements a man-in-the-middle traffic monitoring scheme. The researchers suggest organizations might deploy endpoint monitoring agents on employee machines to deal with this.
The researchers also say their detector might miss spear-phishing that comes from a compromised personal email account. ®