# Codebreaker II: A helping hand…

## And let's face it - you need it

Ok, you've convinced us. We're going to give you a helping hand with our fiendish Codebreaker II competition. Let's face it, you need it, if Jeffrey Kane is anything to go by:

I feel that I am very close to breaking the code... I've been using a rather unorthodox methodology. You see, I've replicated the code verbatim on a wall in my living room, and have been using the process of random selection to figure out what letter comes when. I've been doing this by swinging my cat by the tail until he's good and wound up, then tossing him at the wall. Whatever letter he hits is then the next letter in the sequence. Of course if he hits a letter he's already hit, I have to toss him again. He's doing very well, but he's started exhibiting the funniest habit of walking sideways instead of forwards and he won't stop licking himself. Anyway... cheers.

Happily, several readers have been applying rather more scientific methods. A quick round-up might prove illuminating. Jeremy Ardley writes:

I've analysed the data and figure it is a digraphic encryption of some sort. I am pretty sure I have eliminated Playfair and variants as possibilities, so it must be some unknown method of combining two letters to produce two other letters. I've developed 'coarse' and 'fine' Kasiski analysis software to extract a key length and have got a useful set of stats out of it.

The 'fine kasiski' approach looks at letter repeats over different intervals and finds weighted peaks in the distribution. Peaks at particular intervals (and multiples thereof) give a strong indication of key-length. There is distinct structure visible using this approach, and I guess it is only a matter of time to fix the key length and apply stat analysis at key-length intervals to try and break it.
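
One way to picture the 'fine' approach described above is to count, for each shift, how many positions hold the same letter as the position that many places further on. This is only a minimal sketch of the general idea, not Jeremy's software; the function name `coincidences_by_shift` and the unweighted count are assumptions made for illustration.

```python
def coincidences_by_shift(text, max_shift=20):
    """For each shift s, count positions i where letter i equals letter i + s.

    Illustrative only: a real tool would probably weight or normalise
    these counts before looking for peaks.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = {}
    for shift in range(1, max_shift + 1):
        # Pair every letter with the letter `shift` places later.
        counts[shift] = sum(
            1 for a, b in zip(letters, letters[shift:]) if a == b
        )
    return counts


if __name__ == "__main__":
    # A toy text with an obvious period of 3.
    for shift, hits in coincidences_by_shift("ABCABCABC", max_shift=4).items():
        print(shift, hits)
```

Shifts whose counts stand well above their neighbours, along with multiples of those shifts, would be the candidate key lengths.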

It's hard to say how close the solution is but... there is an anomaly in the cipher: the letters of the second half of the alphabet make up roughly 52% of the total (48% for the letters of the first half). This is the exact opposite of how a plain text should split and therefore suggestive of a Porta algorithm, or a variant. Two ways to attack this: probable words (but what is a probable word?) and their inverse pattern matches, or the heavy-handed dictionary+trigraph attack...
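
The half-alphabet split mentioned above is simple to check for any text. The sketch below assumes that "first half" means A to M and "second half" means N to Z; the function name `half_alphabet_split` is made up for this example.

```python
def half_alphabet_split(text):
    """Return the fractions of letters falling in A-M and in N-Z."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return 0.0, 0.0
    first_half = sum(1 for c in letters if c <= "M")
    total = len(letters)
    return first_half / total, (total - first_half) / total


if __name__ == "__main__":
    first, second = half_alphabet_split("THE QUICK BROWN FOX")
    print(f"A-M: {first:.1%}  N-Z: {second:.1%}")
```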

Last up we have Freek Brysse, whose hard work may help eliminate a few possible red herrings:

I haven't found a solution yet. My final attempt is based on the assumption that we're dealing with a substitution cipher, probably polyalphabetic; I'm implementing an algorithm to brute-force frequency-match it. If it's not, well,..

Here's my analysis up until now. Analysis of The Register's crypto II challenge.

1. Monoalphabetic substitution & simple transposition
These are the frequency count statistics of the ciphertext message
Total letter count = 586
Letter use frequencies:
E: 28 4.7%
I: 28 4.7%
N: 28 4.7%
O: 28 4.7%
T: 28 4.7%
A: 27 4.6%
S: 27 4.6%
L: 25 4.2%
R: 25 4.2%
D: 23 3.9%
U: 23 3.9%
C: 22 3.7%
H: 22 3.7%
M: 22 3.7%
F: 21 3.5%
G: 21 3.5%
P: 21 3.5%
Y: 21 3.5%
B: 20 3.4%
K: 20 3.4%
V: 20 3.4%
J: 19 3.2%
W: 19 3.2%
Q: 18 3.0%
X: 17 2.9%
Z: 13 2.2%
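
A count like the one above takes only a few lines to produce. This is a minimal sketch, not Freek's program; the function name `letter_frequencies` is an assumption, and only the letters A to Z are counted.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, count, percent) rows, most frequent first.

    Ties are broken alphabetically, which matches the ordering
    used in the table above.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    total = len(letters)
    counts = Counter(letters)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, n, 100.0 * n / total) for letter, n in ordered]


if __name__ == "__main__":
    for letter, n, pct in letter_frequencies("ATTACK AT DAWN"):
        print(f"{letter}: {n} {pct:.1f}%")
```

Note that this prints rounded percentages, whereas the figures in the table appear to be truncated to one decimal place (28 out of 586 is about 4.78%, shown as 4.7%).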

This pattern does not match the pattern of English at all, as demonstrated below: [count based on Project Gutenberg Etext of Allan Quatermain, by H. Rider Haggard]

Total letter count = 460432
E: 56265 12.2%
T: 42259 9.1%
A: 38746 8.4%
O: 34910 7.5%
N: 31232 6.7%
I: 30014 6.5%
H: 29759 6.4%
S: 28091 6.1%
R: 25741 5.5%
D: 20824 4.5%
L: 18752 4.0%
U: 13395 2.9%
W: 12257 2.6%
F: 10857 2.3%
M: 10738 2.3%
G: 9965 2.1%
C: 9963 2.1%
Y: 8185 1.7%
P: 7525 1.6%
B: 6515 1.4%
V: 4334 0.9%
K: 3757 0.8%
': 2376 0.5%
-: 1746 0.3%
X: 728 0.1%
Z: 565 0.1%
Q: 541 0.1%
J: 392 0.0%

This will rule out any simple transposition cipher, as a transposition will not alter the frequency of occurrence of a character, only its position.
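
The point about transposition can be checked on a toy example. The sketch below uses a very simple columnar transposition (write the text in rows, read it out by columns) chosen purely for illustration; `columnar_transpose` is a hypothetical helper, not a cipher anyone has proposed for this challenge.

```python
from collections import Counter


def columnar_transpose(text, width):
    """Write `text` in rows of `width` letters and read it out column by column."""
    return "".join(text[col::width] for col in range(width))


if __name__ == "__main__":
    plain = "WEAREDISCOVEREDFLEEATONCE"
    scrambled = columnar_transpose(plain, 5)
    print(scrambled)
    # Letters move, but each letter occurs exactly as often as before.
    print(Counter(plain) == Counter(scrambled))
```

Since the letter counts are untouched, a transposition of English would still show E at around 12%, which the ciphertext clearly does not.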

However, note the fairly similar high-to-low frequency sequence of the alphabet....
Cipher EINOTASLRDUCHMFGPYBKV JWQXZ [High to Low]
English ETAONIHSRDLUWFMGCYPBV KXZQJ

The frequency count may be very distorted and off, but my gut feeling tells me that something is going on. If we regroup this alphabet a bit based on the equi-frequency counts:
Cipher EINOTA SLRDUC HMFG PYBKV JWQXZ [High to Low]
English ETAONI HSRDLU WFMG CYPBV KXZQJ

The high-frequency cipher letter groups map almost directly onto the same letters in plain English. This is odd, and I don't quite get it. Apparently there is a process going on that flattens the frequency count, but not in a way that significantly alters the order of the alphabet. I feel that this presents an angle of attack I can't quite see yet. Also, the grouping is around 4-6 characters. Does this represent a polyalphabetic system with a period around 5? I'll delve into that later...

2. Is ElReg sending us off on a wild goose chase?

The folks of ElReg are very capable of sending their readers off on a wild goose chase, so let's see if they are taking the piss.

Friedman has developed a set of tables to figure out if you're dealing with random text or something which is actually valid.

In the cipher text we count the following:
Digrams:
occurring 2x: 117
3x: 28
4x: 5
5x: 1

Friedman expects, for a message of 600 characters (ours is 586), average counts of 110, 32.3, 7.11 and 1.25. I could bother with figuring out the standard deviation, etc. to see if this matches, but it looks pretty much on track.

Trigrams:
occurring 2x: 7
Friedman expects 9.81 for 600 characters and 6.85 for 500 characters; that looks about right too. So, I'd argue that ElReg is doing a proper crypto contest.
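
The digram and trigram tallies above can be gathered with a short routine like the following. It is a sketch under one stated assumption: n-grams are taken from a sliding, overlapping window, which the analysis does not specify. The function name `repeat_histogram` is invented for the example.

```python
from collections import Counter


def repeat_histogram(text, n):
    """Map 'number of occurrences' to 'how many distinct n-grams occur that often'.

    Only n-grams seen at least twice are reported. Overlapping windows
    are assumed.
    """
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    grams = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    return Counter(count for count in grams.values() if count >= 2)


if __name__ == "__main__":
    sample = "ABABAB"
    print(dict(repeat_histogram(sample, 2)))  # digram repeats
    print(dict(repeat_histogram(sample, 3)))  # trigram repeats
```

The resulting counts can then be set beside Friedman's expected values for a message of similar length.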

3. Polyalphabetic cipher.

LANAKI observes that solving poly-alpha ciphers boils down to figuring out the period (let's call it P) of the repeating key, and subsequently breaking up the message into P messages, each of which can be broken as a monoalphabetic cipher. Let's see if that's going to work. [Am still working on that approach... still need to implement (in Java) Jakobsen's Fast Polyalpha-cracker] Jakobsen, A Fast Method for Cryptanalysis of Substitution Ciphers from www.mat.dtu.dk/persons/Jakobsen_Thomas/pub.
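
The "break it into P messages" step can be sketched as below. This covers only the splitting, not the cracking of each stream, and it is not Jakobsen's method; `split_by_period` is a hypothetical name.

```python
def split_by_period(text, period):
    """Split a ciphertext into `period` streams, one per key position.

    Stream k holds letters k, k + period, k + 2 * period, and so on,
    so every letter in a stream was enciphered with the same key letter.
    """
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    return [letters[offset::period] for offset in range(period)]


if __name__ == "__main__":
    for stream in split_by_period("CQBMQ NYDXL XIWKP", 5):
        print(stream)
```

If the guessed period is right, each stream should show the lumpy frequency profile of a monoalphabetic cipher rather than the flat one seen in the whole message.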

4. Computer automated analysis

- Vigenere, Variant Beaufort and Beaufort ciphers
Using Mizra's VigSolve
Index of Coincidence and Periodic Key Find

| Len | Index | Vigenere (c = p+k, p = c-k, k = c-p) | Var. Beaufort (c = p-k, p = c+k, k = p-c) | Beaufort (c = k-p, p = k-c, k = p+c) |
|---|---|---|---|---|
| 2 | 0.038524 | .. | .. | .. |
| 3 | 0.038815 | ... | ... | ... |
| 4 | 0.038309 | .... | .... | .... |
| 5 | 0.037273 | ..... | ..... | ..... |
| 6 | 0.038476 | ...... | ...... | ...... |
| 7 | 0.035978 | ....... | ....... | ....... |
| 8 | 0.039101 | ........ | ........ | ........ |
| 9 | 0.039107 | ......... | ......... | ......... |
| 10 | 0.037692 | .......... | .......... | .......... |
| 11 | 0.038105 | ........... | ........... | ........... |
| 12 | 0.038537 | ............ | ............ | ............ |
| 13 | 0.037796 | .x........... | .d........... | ............. |
| 14 | 0.036154 | .......o...... | .......m...... | .............. |

This yields nothing really. Also note the low index of coincidence. [English has 1.73 when normalised by the alphabet size, or about 0.067 on the scale used here; that's way off]

- Playfair cipher cracker
Using Gunnar Andersson's implementation of the shotgun hill-climbing algorithm, no result was found after 87 minutes of computation on a 700MHz P3. [Results are normally yielded within seconds to minutes.]
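
For anyone wanting to reproduce the index-of-coincidence figures without VigSolve, here is a rough sketch. It uses the standard definition of the index and averages it over the streams for each candidate key length; whether VigSolve computes its "Index" column exactly this way is an assumption, and both function names are invented for the example.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two letters drawn at random from `text` are the same."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def average_ic_for_period(text, period):
    """Average the index of coincidence over the `period` interleaved streams."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    streams = [letters[offset::period] for offset in range(period)]
    return sum(index_of_coincidence(s) for s in streams) / period


if __name__ == "__main__":
    sample = "CQBMQ NYDXL XIWKP HOVOL HXBPZ UHJLQ"
    for period in range(2, 6):
        print(period, round(average_ic_for_period(sample, period), 6))
```

Uniformly random letters give about 1/26, or 0.0385, and English text gives roughly 0.067, so a correct key length would normally show up as a jump towards the higher figure.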

OK, that's the state of play. Well, not quite. We had a few emails that convinced us that several people were on the right track, albeit stumbling along it with a white stick. This might help you find your way:

I-IV-V
23-6-12
(B) R-E-G

But then, you'll need these as well:

S = A
H = B
O = C
W = D
U = E
T = F
N = I
Y = J

Right, see how you get on with that little lot. Be warned that even if you understand how to implement the above, that doesn't mean you're home and dry. Have a look at our original Codebreaker competition.

CQBMQ NYDXL XIWKP HOVOL HXBPZ UHJLQ MUZRG WFZID DXWTL ZOQAT FILBL SJMYA DKVEF UTJZF KFXHH TRTRD PMJVF KFASV ZAXYP DAOYE KPWVW KKXKH YTFYM QCNZM WRLPN IVAHN ZMAKY WHSIT DCDDS UMCQR IKIMT BHPKB ICDCX JDMCO XTOLR PMDJD OWRRC NKBUB XREQG NPSNH YGLAS CHDJR HNWTU LBVBO ITBYN ZLGKS YDOZR IKYOW ZWZJU HGZND VRCLC PQHBG PZJMN NVKYN HHSPI DSFQP LUQVQ LEPDA JKYGN ZJUHD MYUWV LBPXP ODEWL HDFWZ RPLMI OSVNE ICKKS KVVKQ AAEUS FAKDT YTQGH GLWHG ANBLD YNFVV SZQUI AMZEV GJEBY QBUCM HRHAG SAKKD ZNYKJ ODILJ MSKUN UFCLZ QXVZH MZQHI ZYTCA MHQRA WXFVQ JLNKS AZGIB FSNLB KFNKX KMDUN DNWVK EVKHR WYGJS WTLLR OYFKE OZHER YMTFG ZMCYV PSLYX UFCPJ VWAQA YPNTU JSMDL WSNPN UPGVW JYBZO PMUYY EMTWO OYJSP DFZNP DFCKV AUSBK JK

Carry on. We've given you an extra three weeks, so we'll be expecting some results soon. You can find the competition here. The closing date is now 5.00pm BST on Friday, 25th May.