How an ancient printer can spill your most intimate secrets
Needles and pins
Researchers have devised a novel way to recover confidential messages processed in doctors' offices and elsewhere by analyzing the sounds made when documents are reproduced on dot-matrix printers.
This so-called side-channel attack works by recording the “acoustic emanations” of a confidential document as it is printed and then processing the recording with software that translates the sounds into words. The method recovers as much as 95 per cent of the printed words when an attacker has contextual knowledge about the text being printed, such as the words included in a medical prescription or a living-will declaration. Up to 72 per cent of the text can be recovered when no context is known.
The attack, which so far works only on English text, was carried out under what the researchers described as “realistic, and arguably even pessimistic, circumstances,” in which there was no shielding from ambient noise such as that made by people chatting in a nearby waiting room. Despite the wide availability of inkjet and laser printers, about 60 per cent of doctors in Germany continue to use dot-matrix devices. About 30 per cent of banks in Germany do so as well, according to the researchers.
Countries such as Germany, Switzerland, and Austria require carbon-copy-capable dot-matrix printers to be used for printing prescriptions for narcotics, they said.
“We have presented a novel attack that takes as input a sound recording of a dot-matrix printer processing English text,” the authors wrote in a paper to be presented this week at the Usenix Security Symposium in Washington, DC. “If we assume contextual knowledge about the text, the attack achieves recognition rates up to 95 per cent.”
The attack was demonstrated by using a Sennheiser MKH-8040 microphone to record the sounds of an Epson LQ-300+II as it printed several articles from Wikipedia, the medical prescription of a fictitious patient, and declarations from a living will. The sounds were then input into software designed to recognize the characteristic sound features of each entry in a large list of English words. The software then translated sounds into the corresponding text.
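The dictionary-matching step described above can be pictured with a minimal sketch, assuming each printed word has already been cut out of the recording and reduced to a short feature vector. The `TEMPLATES` values and the `rank_candidates` helper are hypothetical, and plain Euclidean nearest-neighbour ranking is a stand-in chosen for illustration, not the researchers' actual feature set or matching method.

```python
import math

# Hypothetical templates: one short feature vector per dictionary word.
# A real system would learn these from training recordings; the numbers
# below are invented purely for illustration.
TEMPLATES = {
    "such": [0.9, 0.1, 0.7],
    "as": [0.3, 0.7, 0.1],
    "of": [0.4, 0.6, 0.2],
    "the": [0.8, 0.2, 0.5],
}


def rank_candidates(features, templates, k=3):
    """Return the k dictionary words whose templates are closest, by
    Euclidean distance, to the features measured for one printed word."""
    scored = sorted(
        (math.dist(features, template), word)
        for word, template in templates.items()
    )
    return [word for _, word in scored[:k]]


# A noisy recording of one word sits between the "of" and "as" templates,
# so both come back as candidates, with "of" ranked first.
print(rank_candidates([0.36, 0.64, 0.16], TEMPLATES, k=2))  # ['of', 'as']
```

Returning a short list of candidates rather than a single word leaves the final choice to a language model, which is the job the Viterbi step takes on.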
To improve accuracy further, the researchers ran the recognized text through the Viterbi algorithm, a widely used technique. Combined with hidden Markov models (HMMs), a statistical method common in speech recognition, it can spot an unlikely phrase such as “such of the” and replace it with “such as the,” a word sequence that is statistically much more likely.
“Intuitively, this technology works well for us because most errors that we encounter in the recognition phase are due to incorrectly recognized words that do not fit the context,” the paper states. “By making use of linguistic knowledge about likely and unlikely sequences of words, we have a good chance of detecting and correcting such errors.”
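A toy version of that correction step is a minimal Viterbi decoder over a word lattice, assuming the recognizer emits a few candidate words per position, each with an acoustic score, plus a bigram table of word-pair probabilities. All names and numbers (`CANDIDATES`, `BIGRAM`, the `floor` probability for unseen pairs) are hypothetical, and a first-order bigram model is a simplification; the article does not say what language model the researchers used.

```python
import math

# Hypothetical recognizer output: for each printed word, the candidate
# words and an acoustic score for each (higher means a better sound match).
CANDIDATES = [
    {"such": 0.9},
    {"of": 0.6, "as": 0.4},  # the acoustics slightly prefer the wrong word
    {"the": 0.9},
]

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("such", "as"): 0.30,
    ("such", "of"): 0.001,
    ("as", "the"): 0.20,
    ("of", "the"): 0.25,
}


def viterbi(candidates, bigram, floor=1e-6):
    """Return the word sequence maximizing the product of acoustic scores
    and bigram probabilities, computed in log space to avoid underflow.
    Word pairs missing from the bigram table get probability `floor`."""
    # best maps each candidate at the current position to
    # (log score of the best path ending in that word, that path).
    best = {w: (math.log(p), [w]) for w, p in candidates[0].items()}
    for position in candidates[1:]:
        new_best = {}
        for word, acoustic in position.items():
            # Keep only the best-scoring way of reaching `word`.
            score, path = max(
                (
                    (prev_score + math.log(bigram.get((prev, word), floor)), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[word] = (score + math.log(acoustic), path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(" ".join(viterbi(CANDIDATES, BIGRAM)))  # such as the
```

Acoustics alone would pick “such of the”; it is the very low probability of “of” following “such” that flips the decision to “such as the”.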
The technique can be used to snoop on the print jobs of a variety of dot-matrix printers. Although the software must be trained to recognize the sounds of each model line, the specific device being targeted need not have been encountered before. The “recognition rate only decreases slightly when using a different printer in the training phase,” the researchers said.
They also said it may one day be possible to use similar techniques to recover text processed by more modern printers.
“Ink-jet printers might be susceptible to similar attacks, as they construct the printout from individual dots, as dot-matrix printers do,” the paper states. “On the one hand, the bubbles of ink might produce shock-waves in the air that potentially can be captured by a microphone.” The researchers, however, said they were unable to capture the emanations, most likely because the faint sounds were drowned out by the noise coming from the mechanical parts of the ink-jet printers they tested.
Recognition of the four printed Wikipedia articles averaged an accuracy rate of about 63 per cent when only the raw acoustic recognition was used, and almost 70 per cent when the HMM-based correction was added. When two known living-will declarations were analyzed using HMM technology that had been fine-tuned, the success rate was as high as 95.5 per cent.
The researchers said the most effective countermeasure is to muffle the printer with acoustic shielding foam. Their experiments also showed that recognition rates drop precipitously as the distance between printer and microphone increases: whereas their results were achieved with the microphone two centimeters from the printer, the rate fell to about four per cent at a distance of two meters.
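A back-of-the-envelope figure suggests why distance matters so much. Assuming an idealized point source radiating into free field (a printer in a furnished room is neither), sound pressure level falls by 20 log10 of the distance ratio, about 6 dB per doubling, so moving the microphone from 2 cm to 2 m gives up roughly 40 dB of signal before any room noise is counted:

```latex
\Delta L = 20 \log_{10}\frac{r_2}{r_1}
         = 20 \log_{10}\frac{200\,\mathrm{cm}}{2\,\mathrm{cm}}
         = 20 \log_{10} 100
         = 40\,\mathrm{dB}
```

This is an inverse-square-law estimate, not a figure from the paper, and it does not by itself predict the four per cent result; it only indicates how quickly the printer's sound approaches the background noise.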
Side-channel attacks, in which potentially sensitive data leaks through emanations from electronic devices, are believed to have been employed as early as World War I, when the Germans spied on French field phone lines. In 1985, the first known attack of this kind was published, showing that electromagnetic radiation from CRT monitors could be used to reconstruct the text they displayed. Such techniques have since been used to fashion all kinds of attacks, including jimmying open keyless entry systems used to secure cars, garages, and office buildings.
The researchers are Michael Backes, Markus Dürmuth, Sebastian Gerling, Manfred Pinkal, and Caroline Sporleder, members of the computer science and computational linguistics departments of Germany's Saarland University. A PDF of their paper is available online. ®
Well there's a stroke of luck
And I only just threw out my DM printer too. Fifteen years ago.
Pedantic, I know
but legally, there is a difference between two copies printed at the same time using multi-part stationery and two copies printed one after another. There is no guarantee that two serially printed sheets are identical, because the second could be a separate print that differs slightly. How would you know unless you minutely compared them?
And yes, I know that the lower copies in a multi-part *could* have been pre-printed, but that is why they come bound together with tear-off sprockets, so that you can tell whether the lower copy has been tampered with.
"the four Wikipedia articles printed averaged an accuracy rate of about 63 per cent"
Seems a tad higher than I expected.
Oh, wait ... Are you referring to the validity of the source material or the accuracy of the transcription?