Adversarial models, already known to defeat the artificial intelligence behind image classifiers and computer audio, are also good at defeating malware detection.
Last year, researchers from NVIDIA, Booz Allen Hamilton, and the University of Maryland probably felt justifiably pleased with themselves when they trained a neural network to ingest EXEs and spot malware samples among them.
Their MalConv software ran a static analysis on executables (that is, it looked at the binaries but didn't run them), and they claimed up to 98 per cent accuracy in malware classification once their neural network had a big enough learning set.
Alas, it's a neural network, and neural networks are subject to adversarial attacks.
On Monday, March 12th, 2018, a paper by boffins from the Technical University of Munich, the University of Cagliari in Italy, and Italian company Pluribus One described one way of defeating MalConv.
The researchers took the by-now-standard approach to adversarial attacks, asking: what's the smallest change needed to disrupt an AI?
They started with simple byte-padding, adding 10,000 bytes to the end of binaries, an approach that degraded MalConv's accuracy by “over 50 per cent”.
Relative to the malware samples, 10 KB of padding was a tiny change: “less than one per cent of the bytes passed as input to the deep network”, the adversarial paper said.
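To make the arithmetic concrete, here is a minimal Python sketch of end-of-file padding. It is not the researchers' code: the helper names `pad_binary` and `padding_fraction` are invented for illustration, the 2 MB all-zero buffer is a hypothetical stand-in for a real sample, and the random padding is only a placeholder for whatever bytes an attacker would actually choose.

```python
import os


def pad_binary(data: bytes, n_pad: int = 10_000) -> bytes:
    """Return a copy of `data` with `n_pad` placeholder bytes appended."""
    return data + os.urandom(n_pad)


def padding_fraction(original_len: int, n_pad: int = 10_000) -> float:
    """Share of the padded file that consists of padding bytes."""
    return n_pad / (original_len + n_pad)


# Hypothetical 2 MB stand-in for an executable (assumption, not a real sample).
sample = bytes(2_000_000)
padded = pad_binary(sample)

print("bytes added:", len(padded) - len(sample))
print(f"padding share: {padding_fraction(len(sample)):.4%}")
```

The original bytes are left untouched, which is what makes this style of attack easy to automate: nothing inside the executable has to be parsed or rewritten.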
Even that attack can be reduced, because instead of padding the end of the binary, the “attack bytes” could be put inside the binary to “drastically increase the success of the attack.”
Operating on bytes inside an executable is, however, more complex and brittle, making it difficult to automate, whereas byte padding is simple.
The European researchers also found that a gradient-based sequence of padding bytes worked better than random padding bytes: “adding random bytes is not really effective to evade the network,” the paper said, but “our gradient-based attack allows evading MalConv in 60 per cent of the cases when 10,000 padding bytes are modified”.
That's because over sufficient training runs, the gradient-based approach created “an organised padding byte pattern specific to each sample” – that is, the malicious model learned what pattern worked best for each of the malware samples it tested.
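The difference between random and gradient-guided padding can be illustrated with a toy model. The Python sketch below is emphatically not MalConv and not the paper's algorithm. It assumes a made-up classifier (random byte embeddings, position-dependent weights, a sigmoid on top) and takes a single first-order step per padding byte, picking the byte whose embedding the gradient predicts will lower the score most. All names and constants (`EMBED`, `WEIGHTS`, `TEMPERATURE`, `gradient_padding`) are hypothetical.

```python
import math
import random

DIM = 4             # embedding width (toy value)
N_SLOTS = 8         # number of position-dependent weight vectors (toy value)
TEMPERATURE = 50.0  # scales the summed activations before the sigmoid

_rng = random.Random(0)
# Hypothetical "trained" parameters: random numbers, NOT MalConv's weights.
EMBED = [[_rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(256)]
WEIGHTS = [[_rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_SLOTS)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(data: bytes) -> float:
    """Toy 'probability of malware' for a byte string."""
    z = sum(dot(WEIGHTS[i % N_SLOTS], EMBED[b]) for i, b in enumerate(data))
    return 1.0 / (1.0 + math.exp(-z / TEMPERATURE))


def gradient_padding(sample: bytes, n_pad: int) -> bytes:
    """Append n_pad bytes, each chosen by one first-order gradient step."""
    padded = bytearray(sample) + bytearray(n_pad)  # start from zero padding
    s = score(bytes(padded))
    for i in range(len(sample), len(padded)):
        # Gradient of the score w.r.t. the embedding at position i (toy model).
        grad = [s * (1.0 - s) * w / TEMPERATURE for w in WEIGHTS[i % N_SLOTS]]
        current = EMBED[padded[i]]
        # Pick the byte whose embedding is predicted to lower the score most.
        padded[i] = min(
            range(256),
            key=lambda b: dot(grad, [e - c for e, c in zip(EMBED[b], current)]),
        )
    return bytes(padded)


def random_padding(sample: bytes, n_pad: int, seed: int = 1) -> bytes:
    """Append n_pad pseudo-random bytes, for comparison."""
    rng = random.Random(seed)
    return sample + bytes(rng.randrange(256) for _ in range(n_pad))


# Hypothetical 2,000-byte "sample"; sizes are scaled down to keep this quick.
sample = bytes(random.Random(42).randrange(256) for _ in range(2000))
print("original:", score(sample))
print("random padding:", score(random_padding(sample, 200)))
print("gradient padding:", score(gradient_padding(sample, 200)))
```

Because this toy model is linear before the sigmoid, the best padding depends only on byte position. In a real deep network the gradient depends on the input itself, which is consistent with the paper's observation that the resulting pattern is specific to each sample.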