Linguists use sounds to bypass Skype crypto
And you thought grammar was useless…
Decryption is difficult and computationally expensive. So what if, instead of decrypting the content of a message, you found a correlation between the encrypted data and its meaning – without having to crack the code itself?
Such an approach has been demonstrated by a group of University of North Carolina linguists working with computer scientists on encrypted Skype calls. While the researchers managed only a partial recovery of conversations, an encryption scheme that leaks even some of the data it’s meant to protect is no longer secure.
It works like this: spoken English has a set of known – and quite settled – rules for its phonetic grammar.
For non-linguists, this means the order in which we can and cannot put different sounds together. The “ds” combination of sounds, or phonemes, at the end of the word “sounds” is fairly common at the end of English words, but doesn’t occur at the beginning.
Systems like speech-to-text converters use these rules to break strings of sounds into individual words; they match sounds against a dictionary of legal phoneme combinations and map these into words. What the researchers discovered is that encryption leaves a pattern that can be subjected to this kind of analysis – without decrypting the data.
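To make the dictionary-matching idea concrete, here is a minimal sketch in Python. It is not the researchers' code or any real speech-to-text engine: the lexicon, the ARPAbet-style phoneme labels and the `segment` function are all hypothetical, chosen only to show how a string of phonemes can be split into words by looking up legal combinations.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: phoneme sequences (ARPAbet-style labels) -> words.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("HH", "AA", "N"): "han",
    ("HH", "AA", "N", "Z"): "hans",
    ("S", "OW", "L", "OW"): "solo",
    ("OW",): "oh",
}


def segment(phonemes: List[str],
            lexicon: Dict[Tuple[str, ...], str] = LEXICON) -> Optional[List[str]]:
    """Split a phoneme sequence into dictionary words.

    Returns the segmentation with the fewest words, or None if the
    sequence cannot be covered by lexicon entries.
    """
    n = len(phonemes)
    # best[i] holds the best word list covering phonemes[:i], if any.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is None:
                continue
            word = lexicon.get(tuple(phonemes[start:end]))
            if word is None:
                continue
            candidate = prefix + [word]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


print(segment(["HH", "AA", "N", "S", "OW", "L", "OW"]))
```

A real system would score competing segmentations by probability rather than word count; the shortest-segmentation rule here is just the simplest tie-breaker.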
When you encode spoken English for VoIP using CELP (code-excited linear prediction), as Skype does, you will end up with patterns in the data that match the patterns in the sounds. In particular, those patterns end up being reflected in the size of the data frame: the more complex the sound that’s being encoded, the larger the frame, resulting in a correlation between frame size and the original sounds spoken.
When the data created by CELP is encrypted, it retains the original frame size – and that means that even encrypted Skype data will retain the correlation between the size of the data frame and the original phonemes.
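The leak is easy to demonstrate with any length-preserving cipher. The Python sketch below uses a toy XOR stream cipher built from SHA-256; this is an assumption for illustration only, not Skype's actual encryption, and the frame sizes are made-up placeholders rather than real codec output. The point is simply that an eavesdropper who cannot read a single byte still sees every frame length.

```python
import hashlib


def keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter.

    Illustrative only; not a vetted cipher and not what Skype uses.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(
            key + nonce.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def encrypt_frame(key: bytes, nonce: int, frame: bytes) -> bytes:
    """XOR the frame with the keystream; output length equals input length."""
    stream = keystream(key, nonce, len(frame))
    return bytes(a ^ b for a, b in zip(frame, stream))


# Hypothetical voice frames: zero-filled placeholders of varying size.
frames = [bytes(size) for size in (12, 30, 30, 18)]
encrypted = [encrypt_frame(b"demo-key", i, f) for i, f in enumerate(frames)]

# What a passive observer learns without the key: the frame sizes.
observed_sizes = [len(c) for c in encrypted]
print(observed_sizes)
```

Padding every frame to a fixed size would hide this signal, at the cost of extra bandwidth.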
The technique gets another helping hand: at least some of the time, boundaries between sounds correspond to sudden changes in frame size, hinting at the difference between “Han Solo” and “Hans Solo”.
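As a rough illustration of that boundary hint, the Python sketch below splits a sequence of observed frame sizes wherever the size jumps sharply. The fixed threshold, the function names and the sample sizes are all assumptions made for this example; the article does not describe how the researchers actually detect boundaries, and their method is likely more sophisticated.

```python
from typing import List


def find_boundaries(sizes: List[int], threshold: int) -> List[int]:
    """Return indices where the frame size changes by at least `threshold`."""
    return [
        i for i in range(1, len(sizes))
        if abs(sizes[i] - sizes[i - 1]) >= threshold
    ]


def split_segments(sizes: List[int], threshold: int) -> List[List[int]]:
    """Cut the size sequence into runs separated by sudden changes."""
    segments = []
    start = 0
    for boundary in find_boundaries(sizes, threshold):
        segments.append(sizes[start:boundary])
        start = boundary
    segments.append(sizes[start:])
    return segments


# Hypothetical frame sizes (bytes) observed on the wire.
sizes = [20, 21, 20, 38, 39, 37, 15, 16]
print(find_boundaries(sizes, threshold=10))
print(split_segments(sizes, threshold=10))
```

Each resulting run is a candidate sound; guessing which phoneme it corresponds to is the harder, statistical part of the job.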
The researchers mapped the size of encrypted data frames in the Skype stream back to likely patterns of phonemes, and used that mapping – which they called “Phonotactic Reconstruction” – to reconstruct parts of the call, without decrypting the data.
So how well does it work? Not so well that we should all abandon Skype tomorrow. However, the researchers noted that if an encryption scheme is to be considered secure, “no reconstruction, even a partial one, should be possible; indeed, any cryptographic system that leaked as much information as shown here would immediately be deemed insecure.”
Bigger phoneme-word dictionaries (covering more dialects and languages) and faster processing would improve the accuracy of this kind of analysis. ®