Codebreaker II Our winner reveals all

Mark Wutka explains his attack

  • alert
  • submit to reddit

Security for virtualized datacentres

In response to readers' requests for our Codebreaker II supremo, Mark Wutka, to explain how he attacked the problem, we are pleased to print the following from the man himself. Enjoy:

When I first saw the Codebreaker II Challenge, I did what any cryptanalyst would do with an unknown cipher - I counted the occurrences of each letter. In normal English text, the letters ETAONIRSH occur most frequently, with E appearing about 13% of the time. You also expect letters like J, Q, X and Z (the high-point Scrabble letters) to appear very infrequently, if at all.

Because Codebreaker II was 587 letters long, you might expect to see some of the low-frequency letters appear once or twice. When I counted the letters, however, I saw the following numbers: Q:28 Y:28 G:28 R:28 U:28 T:27 A:27 I:25 H:25 C:23 F:23 V:22 X:22 W:22 P:21 E:21 S:21 L:21 N:20 Z:20 O:20 B:19 K:19 J:18 D:17 M:13. Notice that all 26 letters appear, and that the lowest frequency is roughly half the highest one. The highest value should have been around 76 (13% of 587). Obviously, this was not a simple substitution.

Instead, it was likely to be a polyalphabetic cipher where a letter can represent different letters at different times. In conversations on the rec.puzzles newsgroup, Jim Gillogly suggested several possibilities and ruled out others. For example, many polyalphabetic ciphers use a 5x5 square to encrypt letters, using the same letter to encrypt both I and J. In this cipher, however, all 26 letters appear, so it couldn't be any of those (like Playfair, for instance).

Jim suggested something in the Vigenere family of ciphers, which use a set of 26 different alphabets, or possibly a mechanical encryption like Enigma. Enigma was an obvious choice since The Register is in the U.K. and Britain is very proud (rightly so) of their work on breaking the Enigma during World War II. At one point, I had been very close to the correct settings. I guessed that the set of rotors might be a permutation of the submission date (04/05/01) and that the initial position for the rings might be REG, but I never would have guessed the 23/06/12 (Alan Turing's birthday) for the ring setting.

It was at this point that Jim Gillogly was too smart for his own good. He reasoned that if this was an Enigma encryption, then the letters ETAONIRSH should appear less frequently than the others because the Enigma never encrypts a letter as itself. That is, if you type an E into an Enigma machine, you'll get anything BUT an E as the output. Jim performed tests on several English texts to verify his hypothesis, and it looked correct. We never suspected that the text might have been encrypted some other way first. Jim is an expert at classical cryptography and has written a program that is quite good at cracking Enigma cipher.

After the smoke cleared, he ran his program on the Codebreaker II and found that it did, in fact, figure out the correct settings, including the plugboard. He also wrote a paper describing his Enigma cracking technique, which is available here. After incorrectly ruling out Enigma, we floundered around with variations of Vigenere ciphers, but obviously never got anywhere. We finally got discouraged and realized that we desperately needed a clue.

When clue time arrived, I immediately recognized the clues as Enigma settings. I didn't know much about the internals of the Enigma when the contest started but by the time the clue arrived, I had read several papers describing its operation. I grabbed an Enigma simulator off the net and decrypted the text. The SO FAR SO GOOD at the end was the only clue that I had done it right, although the supplementary encrypted text also told me that I had the right settings.

Now that I had this newly decrypted text, I went back to the beginning and did a frequency count again. This time, it was much more in line with a simple substitution cipher: U:77 L:57 F:51 D:51 S:50 M:37 K:37 N:36 J:27 O:24 W:21 B:21 X:18 T:17 P:14 Q:11 C:10 E:9 A:5 G:5 R:5 H:2 V:1. Notice that U appears 77 times, just a hair over 13% of the time. I assumed that this was English text and substituted E for U and T for L. I also employed another technique which is to count the number of times various letters contact each other. In the case of L (which I assumed was T), I noticed that it came before M 19 times. I figured that M must represent H, because H frequently appears after T.

At this point, I noticed several occurrences of TH T (where the blank was an S in the ciphertext, making me think that S must represent A. By this time, I was beginning to think that I was wrong about M representing H, because I saw HH appear several times, and you don't normally see that in English text. I did notice, however, that HATH occurred several times, so I got the impression that this was old English.

Next, I looked for the pattern T THE, expecting TO THE to occur somewhere, but I didn't find it. Going back to the contact frequencies, I noticed that J appeared after U 9 times - the most frequent letter to appear after U. In English, the letter that most frequently comes after E is R, so I guessed that J represented R. Now, looking at the beginning of the cryptogram, I had A TER, with a T representing the blank spot. I guessed that T represented F. Next, I looked for the F R pattern, figuring that FOR must appear somewhere. I saw several places, each with F representing the blank spot. I guessed that F must represent O.

At this point, I noticed a piece of a phrase - THATTHEROOTEOFA - that really made me realize I was on the right track. First, I began to suspect that this was something like Chaucer, because of "roote", which you may remember if you ever had to memorize the opening to the Canterbury Tales, that included the line "The Droute of March hath perced to the roote". Since the next two letters after A were the same letter (B), I guessed that B was probably L. I also noticed FOR OOTH, which certainly looked like it should be FORSOOTH, so I substituted S for K. The introduction of S into the text also brought about another piece of phrase that looked solveable - THATHESO LEHATHLOSTTHE. Since ROOT was spelled ROOTE, I thought maybe SOUL was spelled SOULE and substituted U for the P that represented the blank spot.

Now, all is fair in love, war, and Codebreaker challenges, so at this point, I decided to bring the Internet to bear on the problem. I went to www.altavista.com and typed +"that the soule hath lost the" (Actually, I tried Google first, but because it always breaks up the words, it didn't find anything that looked like a match). Altavista gave me three results, all containing Chaucer's "The Parson's Tale". After that, it was a simple matter of completing the substitutions.

Had I not taken that shortcut, my probable next step would have been to substitute ND for DW, since I saw FOU and would assume that it was probably FOUND. You would then be able to solve the rest quite easily.

This was a fun and challenging puzzle and I look forward to see what torture Lester dreams up for Codebreaker III!

If you are interested in cryptograms of all sorts, you might consider joining the American Cryptogram Association. Although the name says "American", they welcome members from all nationalities (although most of their cryptograms are in English).

I'm the author of Special Edition Using Java Server Pages and Servlets, and also Special Edition Using Java 2 Enterprise Edition. I can be reached here.


Thanks to Mark for sharing that with us. You can find the Codebreaker II solution here.

Providing a secure and efficient Helpdesk

More from The Register

next story
Are you a fat boy? Get to university NOW, you PENNILESS SLACKER
Rotund types paid nearly 20% less than people who didn't eat all the pies
Emma Watson should SHUT UP, all this abuse is HER OWN FAULT
... said an anon coward who we really wish hadn't posted on our website
Japan develops robot CHEERLEADERS which RIDE on BALLS
'Will put smiles on faces worldwide', predicts corporate PR chief
Bruges Booze tubes to pump LOVELY BEER underneath city
Belgian booze pumped from underground
Let it go, Steve: Ballmer bans iPads from his LA Clippers b-ball team
Can you imagine the scene? 'Hey guys, it's your new owner – WTF is that on your desk?'
Amazon: Wish in one hand, Twit in the other – see which one fills first
#AmazonWishList A year's supply of Arran scotch, ta
SLOSH! Cops dethrone suspect - by tipping over portaloo with him inside
Talk about raising a stink and soiling your career
Ingredient found in TASTY BEER is GOOD for your BRAIN
You only have to drink 2k litres a day to see the effect...
Oz carrier Tiger Air takes terror alerts to new heights
Don't doodle, it might cost you your flight
prev story


Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.
The next step in data security
With recent increased privacy concerns and computers becoming more powerful, the chance of hackers being able to crack smaller-sized RSA keys increases.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.