Researchers eye machines to tackle malware
Automation eliminates human error
Reverse engineer Dullien takes a different approach. Working with other researchers at Sabre Security, he used automated tools to deconstruct the code of virus and bot software, stripping out any common libraries the code might use and then comparing the relationships between the remaining functions to characterise the software.
In a test case using a database of 200 samples of bot software, the automated process grouped the code into two major families, three smaller groups, and several pairs and singletons. The system also identified variants of bot software not recognised by a signature-based anti-virus system.
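The comparison Dullien describes can be illustrated, very loosely, with a toy sketch: represent each program as a set of caller-callee relationships between its functions, strip out common library functions, and measure how much the remaining structure overlaps. (The function names and the Jaccard measure below are assumptions for illustration; Sabre Security's actual system works on disassembled binaries with far more sophisticated graph matching.)

```python
def call_graph_similarity(edges_a, edges_b, library_funcs=frozenset()):
    """Compare two programs via their function call graphs.

    edges_a, edges_b: sets of (caller, callee) function-name pairs.
    library_funcs: common library functions to strip before comparing,
    so shared runtime code does not inflate the score.
    """
    a = {(c, f) for c, f in edges_a
         if c not in library_funcs and f not in library_funcs}
    b = {(c, f) for c, f in edges_b
         if c not in library_funcs and f not in library_funcs}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)  # Jaccard similarity in [0, 1]

# Two hypothetical bot variants that differ only in library usage:
bot_v1 = {("main", "connect_irc"), ("main", "spread"),
          ("spread", "scan_net"), ("main", "strcpy")}
bot_v2 = {("main", "connect_irc"), ("main", "spread"),
          ("spread", "scan_net"), ("spread", "strcpy")}

# After stripping the common library call, the variants match exactly:
print(call_graph_similarity(bot_v1, bot_v2,
                            library_funcs=frozenset({"strcpy"})))  # 1.0
```

A classifier built on such scores could then cluster samples whose similarity exceeds a threshold into families, which is roughly the kind of grouping the 200-sample test produced.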
Dullien believes that static analysis is a better approach to malware classification than Microsoft's runtime analysis. Malicious actions that a program does not perform right away - delayed by so-called time-delayed triggers - can foil runtime analysis, he said. And virus and attack-tool writers could add a few lines of code to a program to confuse runtime analysis, he added.
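The time-delayed trigger Dullien mentions is simple to illustrate: a program that stays dormant until a chosen date shows only benign behaviour during a bounded observation window. (A minimal sketch; the trigger date is hypothetical.)

```python
from datetime import date

TRIGGER_DATE = date(2007, 1, 1)  # hypothetical activation date

def should_fire(today, trigger_date=TRIGGER_DATE):
    """Time-delayed trigger: remain dormant until the trigger date.

    A runtime analyser that observes the sample for a few hours
    before this date records no malicious behaviour at all.
    """
    return today >= trigger_date

print(should_fire(date(2006, 6, 15)))  # False: sandbox sees nothing
print(should_fire(date(2007, 1, 2)))   # True: payload would run
```

Static analysis sidesteps this because the dormant code path is still present in the binary, whether or not it executes during observation.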
"The approach presented in the paper can be trivially foiled with very minor high-level-language modifications in the source of the program," he stated in a blog entry analysing Microsoft's system.
Microsoft declined to make its researchers available for interviews. However, in the paper, the authors argued that a combination of static and runtime analysis would likely perform best. For example, static analysis appears to deliver results more quickly: Microsoft's behavioral classification requires three hours to cluster 400 files at the 1,000-event limit, according to the paper.
In some ways, software classification resembles the state of biological classification back in the time of Carl Linnaeus. The 18th century botanist pushed the scientific community of his day into accepting a hierarchical classification system for plants and animals. However, early classifications relied on external similarities, much in the way that many of today's classifications rely on external attributes of programs rather than their internal processes.
At least one other project hopes to help human analysts do a better job of classification.
OffensiveComputing.net, a project founded by researchers Val Smith and Danny Quist, aims to create a database of malware that records a number of basic attributes of the code, including checksums, anti-virus scanner results, and what type of packer the malware uses to compress itself. The project started in response to increased code sharing among virus and attack-tool writers, faster development of exploits, and the quicker incorporation of those exploits into existing malicious software, OffensiveComputing's Smith said.
"The biggest benefit is more rapid response to complex threats. As the synergy between viruses, Trojans, worms, rootkits and exploits grows, waiting for a solution becomes more dangerous."
OffensiveComputing's database gives incident response workers and analysts access to meaningful data about malicious software, which is especially necessary until automated analysis programs, such as Microsoft's and Dullien's classification systems, mature. The project strives to be adaptable, involve the community, have measurable results, and remain open, Smith said.
"There is an arms race going on between analysts and malware authors, so any solution will have to keep pace with advances on both sides."
This article originally appeared in Security Focus.
Copyright © 2006, SecurityFocus