Camouflaged code threatens security apps
Evil twin hash bash
Antivirus firms are concerned about the emergence of techniques that could render meaningless the use of checksums to mark applications as safe.
The issue concerns hash functions - one-way mathematical functions that produce a small fixed-length checksum, or message digest, from a much longer input such as a batch of code or an email message. When two different input values produce the same output value, this is called a collision.
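As a rough illustration of the fixed-length property, the sketch below uses Python's standard `hashlib` module. The helper name `md5_hex` and the sample inputs are made up for this example.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hex characters (128 bits)."""
    return hashlib.md5(data).hexdigest()

short_input = b"hello"
long_input = b"x" * 1_000_000  # stand-in for a much longer file

# The digest length does not depend on the input length.
print(len(md5_hex(short_input)), len(md5_hex(long_input)))  # 32 32

# The same input always gives the same digest...
print(md5_hex(b"hello") == md5_hex(b"hello"))  # True

# ...and a one-character change gives a different digest here.
print(md5_hex(b"hello") == md5_hex(b"hellp"))  # False
```

A collision would be two different inputs for which the comparison on the last line came out `True`.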
Weaknesses in hashing algorithms, such as MD5, that allowed the discovery of collisions much more quickly than would be possible using brute-force attacks have been known about by cryptographic researchers for more than three years.
Previous techniques meant only that one junk message might be mistaken for another junk message, a weakness that interested cryptographers but carried little sting in practice. In addition, high-speed computers were needed to discover collisions.
But a recent post on a full disclosure list explains a method to append a few thousand bytes to two arbitrary files such that both files have the same MD5 value. One of the arbitrary files might be malicious. Not only that, but the researchers - Marc Stevens, Arjen K. Lenstra, and Benne de Weger - produced their proof-of-concept files using a single PC in less than two days.
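The idea of padding two different files until their hashes match can be sketched on a deliberately crippled hash. The code below is not the researchers' method and does not attack real MD5: it keeps only the first 16 bits of the MD5 digest (a made-up `toy_hash`) so that a plain search finds matching suffixes almost instantly. All names and the 4-byte suffix format are assumptions for illustration.

```python
import hashlib

TOY_BYTES = 2  # keep only 16 bits of MD5 so the search finishes instantly

def toy_hash(data: bytes) -> bytes:
    """Deliberately weak hash: the first TOY_BYTES bytes of the MD5 digest."""
    return hashlib.md5(data).digest()[:TOY_BYTES]

def colliding_suffixes(file_a: bytes, file_b: bytes, table_size: int = 4096):
    """Find suffixes such that toy_hash(file_a + sa) == toy_hash(file_b + sb)."""
    # Record the toy hash of many padded variants of file_a.
    seen = {}
    for i in range(table_size):
        suffix_a = i.to_bytes(4, "big")
        seen.setdefault(toy_hash(file_a + suffix_a), suffix_a)
    # Try padded variants of file_b until one lands on a recorded hash.
    for j in range(2 ** 32):
        suffix_b = j.to_bytes(4, "big")
        match = seen.get(toy_hash(file_b + suffix_b))
        if match is not None:
            return match, suffix_b
    raise RuntimeError("no collision found")

harmless, malicious = b"harmless program", b"malicious program"
sa, sb = colliding_suffixes(harmless, malicious)
print(toy_hash(harmless + sa) == toy_hash(malicious + sb))  # True
```

Against the full 128-bit digest this kind of blind search is hopeless. The significance of the reported work is that structural weaknesses in MD5 make a comparable result feasible on a single PC.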
Symantec reports that the approach threatens to undermine the use of hash functions to identify applications as safe (whitelisting). Malware authors might submit harmless code, which generates the same MD5 output as a companion (malicious) app, to a classification server to get it whitelisted. That would clear the way to later distribute the malicious companion, which generates an MD5 result previously flagged as safe.
The approach is far from trivial but creates a means to smuggle malicious apps past whitelisting tools. Both the malicious and harmless apps might be digitally signed to make the malware look even more harmless.
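To see why a colliding pair defeats this kind of check, here is a minimal sketch of a digest-based whitelist. The class `DigestWhitelist` and its methods are hypothetical and do not describe any vendor's product. The point is only that the lookup compares digests and never the file contents.

```python
import hashlib

class DigestWhitelist:
    """Hypothetical whitelist that remembers approved files by digest only."""

    def __init__(self, algorithm: str = "md5"):
        self.algorithm = algorithm   # any name accepted by hashlib.new()
        self._approved = set()

    def _digest(self, data: bytes) -> str:
        return hashlib.new(self.algorithm, data).hexdigest()

    def approve(self, data: bytes) -> None:
        """Record the digest of a file that was classified as safe."""
        self._approved.add(self._digest(data))

    def is_approved(self, data: bytes) -> bool:
        """Accept any file whose digest matches a recorded one."""
        return self._digest(data) in self._approved

whitelist = DigestWhitelist("md5")
whitelist.approve(b"harmless installer bytes")
print(whitelist.is_approved(b"harmless installer bytes"))  # True
print(whitelist.is_approved(b"some other program"))        # False
```

Any second file that produced the same MD5 digest as the approved one would also return `True`. The `algorithm` parameter is there so that a stronger function such as `"sha256"` can be swapped in.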
"While what they have achieved is not the same as producing an identical MD5 for an existing file, it's still not a good thing. In particular it causes serious trouble for application white-listing implementations," Symantec notes.
Looking for extra bytes might be a common sense means of detecting the trick. But the extra bytes may look like compressed data in an installer application, or some kind of signature, so that approach to solving the problem is unreliable.
MD5 is not the only hashing function known to have cryptographic weaknesses. SHA-1 also has known weaknesses that make collisions easier to find than by brute force, and is thus potentially subject to the same kinds of trickery. The solution might be to move towards more robust hashing algorithms such as SHA-2, Symantec researcher Peter Ferrie concludes. ®
"If, for example, you consider ONLY files of 4MB in size. There are 2^25 possible such files. A 128-bit hash has 2^7 possible values. Therefore, on average, each hash value can be derived from 2^18 different files."
Really? I don't think so. 2^7 possible values - that's only 128 values. A 128 bit hash has 2^128 possible values. That's over 3.4 X 10^38. For scale, the sun has a mass of approx 1.9 X 10^30 kilos, so that's about 180 million times the mass of the sun in kilos. Or, in file sizes, 3.4 X 10^38 bytes would be a single file of 3.4 X 10^26 terabytes. One file of this size would be needed to generate all the possible combinations.
That's (for current computers) not a space that can be brute-forced.
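The arithmetic is easy to check with Python's arbitrary-precision integers. This is only a sanity check. The sun's mass is the approximate figure quoted in the comment, and the terabyte conversion assumes one byte per hash value and 10^12 bytes per terabyte.

```python
hash_values = 2 ** 128        # distinct outputs of a 128-bit hash
sun_mass_kg = 1.9e30          # approximate figure quoted in the comment
bytes_per_terabyte = 10 ** 12 # assumption: decimal terabytes

ratio_to_sun = hash_values / sun_mass_kg
terabytes = hash_values / bytes_per_terabyte

print(2 ** 7)                  # 128
print(hash_values)             # 340282366920938463463374607431768211456
print(f"{hash_values:.2e}")    # 3.40e+38
print(f"{ratio_to_sun:.2e}")   # 1.79e+08
print(f"{terabytes:.2e}")      # 3.40e+26
```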
There are 10 types of people in this world:
Those who understand binary,
Those who don't and
Those who start counting at 0
re: AV Technology....A known Failure
Eh? Maybe true(ish) for an active scanner that just looks for virus activity on a running PC but anyone who only uses that level of anti-malware protection on a Windows PC deserves viruses.
Minimum basic precautions are:
Properly configure (and use) your PC in as secure a manner as possible.
Scan all dangerous files at download time with a good, up-to-date virus scanner, preferably on a mail/proxy server and certainly before installing anything.
Run an active scanner in case the other precautions failed.
Back-up everything regularly in case you have to flatten your PC and start-over.
Just a note about key sizes (and heuristics)
Not all crypto is equal: the "safe" key sizes for symmetric algorithms (AES et co) and asymmetric, i.e. public-key, algorithms (RSA et co) are vastly different simply because they rest on fundamentally different principles. Nevertheless the problem usually isn't the key space but some implementation/design mistake on a detail that allows the attacker to take a shortcut (or a major breakthrough in mathematics that allows "easy" solutions to problems previously thought to be "hard").
As for heuristics, the engines have been pretty good for a long time and false positives can be managed with a signature list. Nowadays, with increasing computing power, more thorough behaviour analysis is possible, and at least F-Secure is doing a reasonable job at it (http://www.heise-security.co.uk/news/100900)