Machine translation cracks 18th century occult cipher
Secret society members were thrilled by ... eye surgery
Statistical translation techniques have been successfully applied to decode an 18th century document written using an encryption scheme that has baffled scholars for decades.
The Copiale Cipher was found in a book housed in an East Berlin academy after the Cold War. The book’s pages contained about 75,000 neatly hand-written characters featuring abstract symbols and doodles alongside Roman and Greek characters. The mysterious cryptogram, bound in gold and green brocade paper, was inscribed in a 105-page book thought to contain the rituals and writings of an 18th-century secret society in Germany. The manuscript can be dated back to between 1760 and 1780.
The cipher had withstood previous attempts to crack it. But computer scientists from Sweden and the United States found that decryption was possible using statistical translation techniques of the kind used by Google Translate, Wired reports.
University of Southern California Viterbi School of Engineering computer scientist Kevin Knight – and colleagues Beáta Megyesi and Christiane Schaefer of Uppsala University in Sweden – first transcribed a machine-readable version of the document before applying various approaches to cracking the code.
The team first tried isolating the Roman and Greek characters and attempted to uncover their meaning using translation project software and a library of 80 different languages. "It took quite a long time and resulted in complete failure," Knight said, in a statement on the work.
The codebreakers hit on the idea that the recognisable characters might be there just as a smokescreen. They formed a theory that abstract symbols sharing similar shapes might represent the same letter, or a common letter sequence. Testing this theory using German and frequency analysis allied to statistical translation techniques yielded some meaningful words including "Ceremonies of Initiation" and "secret section". More on the code-breaking technique applied can be found here.
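As a rough illustration of the frequency-analysis step, the sketch below ranks cipher symbols by how often they occur and pairs them with German letters in approximate frequency order. It is not the researchers' method: it assumes a simple one-symbol-per-letter substitution, whereas the team suspected several symbols could stand for the same letter, and the letter ordering and function names are assumptions made for this example.

```python
from collections import Counter

# Approximate ordering of German letters from most to least common.
# An assumption for illustration; published frequency tables differ slightly.
GERMAN_BY_FREQUENCY = "enisratdhulcgmobwfkzpvjyxq"


def rank_symbols(ciphertext):
    """Return the cipher symbols ordered from most to least frequent."""
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    # Sort by descending count; break ties by symbol so the result is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symbol for symbol, _ in ordered]


def guess_key(ciphertext, language_order=GERMAN_BY_FREQUENCY):
    """Pair the i-th most frequent symbol with the i-th most frequent letter."""
    return dict(zip(rank_symbols(ciphertext), language_order))


def apply_key(ciphertext, key):
    """Replace each known symbol with its guessed letter; keep the rest as-is."""
    return "".join(key.get(ch, ch) for ch in ciphertext)
```

A first guess produced this way is almost always wrong in places. Its value is as a starting hypothesis that a person, or a statistical language model, can then refine.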
After this breakthrough, the researchers knew they were on the right track and they were subsequently able to decode the book, which has been revealed as the rituals and political thoughts of a German secret society, with a strange fascination for eye surgery and ophthalmology. Members of the secret society were not themselves eye doctors.
"When you get a new code and look at it, the possibilities are nearly infinite," Knight said. "Once you come up with a hypothesis based on your intuition as a human, you can turn over a lot of grunt work to the computer."
Flushed with their success, the group plans to apply their techniques to other documents that have baffled cryptanalysts, such as an unbroken message from the Zodiac Killer, a serial murderer who terrorised northern Californians in the '60s, as well as the medieval Voynich Manuscript.
Knight is an expert in machine translation – teaching computers to turn Chinese into English or Arabic into Korean – not cryptography. "Translation remains a tough challenge for artificial intelligence," said Knight.
With researcher Sujith Ravi, a PhD in computer science, Knight has been approaching translation as a cryptographic problem.
The team hopes the approach will not only improve human language translation but also prove useful in making sense of languages that are not currently spoken by humans, including ancient languages and communication between animals. ®
"a German secret society, with a strange fascination for eye surgery and ophthalmology"
Or maybe that part of the translation is wrong?
My hovercraft is full of eels.
"with a strange fascination for eye surgery and ophthalmology."
Must be the Illuminati
The Voynich is more complex than that.
There's the huge problem of how many characters are used - there's almost no agreement about whether some characters are distinct or whether they are actually different characters with ligatures. Estimates vary, but they put Voynichese at between 20 and 30 characters for the bulk of its text, plus a few other rare characters.
Then when you start doing the number crunching odd things begin to appear - there are definitely word-like groups in the text, but the word lengths do not resemble any known language - there are very few short words and very few ones over 10 characters long. Some words are only found in certain parts of the manuscript. Individual words are often repeated either identically or with slight variations - a pattern not usually found in real texts.
The patterns of characters are definitely not random: there are rules about which characters follow others and which do not, and whether they appear anywhere in a word or only at the beginning.
When you measure the entropy of the whole text (i.e. how predictable the text is), it comes out much lower than most European languages, around the same as English or Latin - but neither of those match the previous patterns found in the text.
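For anyone wanting to try the entropy measurement themselves, here is a minimal sketch. It assumes the text has already been transcribed into a plain string with one character per glyph, which for the Voynich is itself a contested step. The function names are made up for this example. The first function gives the entropy of single characters, the second the entropy of a character given the one before it, which is the figure that reflects the "rules about which characters follow others".

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy of single characters, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(text):
    """Entropy of a character given the previous one, in bits per character."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent character pairs
    firsts = Counter(text[:-1])            # how often each character starts a pair
    total = len(text) - 1                  # number of adjacent pairs
    # Sum over pairs of p(a, b) * log2 p(b | a), then negate.
    return -sum(
        (n / total) * math.log2(n / firsts[a]) for (a, _), n in pairs.items()
    )
```

Lower values mean the next character is easier to predict. A text where every character fully determines the next one scores zero on the second measure.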
It most probably is completely meaningless, but a huge amount of work was put into its creation and it would be wonderful to know more about where this thing came from and why it was made.
The best suggestion is that it was an alchemical fake designed to impress the rich and powerful in Central Europe, but there is a frustrating lack of contemporaneous evidence for the book prior to the early 17th Century (we now know from C-14 that the vellum is early 15th Century, but that does not necessarily mean the book itself is that old).