AI slurps, learns millions of passwords to work out which ones you may use next

Get creative – bringbackfirefly! will no longer cut it, nerds


Eggheads have produced a machine-learning system that has studied millions of passwords used by folks online to work out other passphrases people are likely to use.

These AI-guessed passwords could be used with today's tools to crack more hashed passwords, and log into more strangers' accounts on systems, than ever before.

When it comes to cracking a password, you typically start with a hashed version of the passphrase, stolen from a database or similar. Hashed means the password has been scrambled by a one-way function: you can't unscramble it to get the original. Today's tools can brute-force their way through all possible combinations of words and letters (such as AAAAA, AAAAB, AAAAC, etc) for a password, calculating a hash for each combo and comparing it to the stolen hash. If they match, there's your password. This is computationally intensive, especially if the hashes are individually salted, because each salted hash has to be attacked separately.
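As a rough illustration of that brute-force loop, here is a minimal Python sketch. It assumes a plain SHA-256 hash and a tiny lowercase alphabet; the names `sha256_hex` and `brute_force` are made up for this example and are not taken from HashCat or John the Ripper.

```python
import hashlib
import itertools
import string


def sha256_hex(candidate: str, salt: str = "") -> str:
    """Hash salt + candidate with SHA-256 (assumed scheme, for illustration only)."""
    return hashlib.sha256((salt + candidate).encode("utf-8")).hexdigest()


def brute_force(target_hash, max_len, alphabet=string.ascii_lowercase, salt=""):
    """Try every string of 1..max_len characters; return the match or None."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            candidate = "".join(combo)
            if sha256_hex(candidate, salt) == target_hash:
                return candidate
    return None


stolen = sha256_hex("cab")
print(brute_force(stolen, max_len=3))  # prints: cab
```

The search space grows as the alphabet size raised to the password length, which is why this only works for short passwords. With a per-user salt, the whole loop has to be repeated for every hash rather than once for the entire database.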

Alternatively, as an optimized approach, a tool can take a dictionary of words and commonly used passwords – as well as previously cracked passphrases – and turn them into hashes to check against the stolen hash or hashes.
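A dictionary attack can be sketched in a few lines of Python. This is a simplified illustration, assuming unsalted SHA-256 hashes; the `mangle` rules below are a handful of invented variants, far cruder than the rule sets real tools ship with.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Unsalted SHA-256, assumed here purely for illustration."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def mangle(word):
    """Yield the word plus a few hypothetical, hand-picked variants."""
    yield word
    yield word.capitalize()
    for suffix in ("1", "123", "!"):
        yield word + suffix
        yield word.capitalize() + suffix


def dictionary_attack(stolen_hashes, wordlist):
    """Return {hash: plaintext} for every stolen hash a candidate reproduces."""
    remaining = set(stolen_hashes)
    cracked = {}
    for word in wordlist:
        for candidate in mangle(word):
            digest = sha256_hex(candidate)
            if digest in remaining:
                cracked[digest] = candidate
                remaining.discard(digest)
    return cracked


wordlist = ["password", "firefly", "dragon"]
stolen = [sha256_hex("Firefly!"), sha256_hex("dragon123"), sha256_hex("x9$Lq")]
found = dictionary_attack(stolen, wordlist)
print(sorted(found.values()))  # prints: ['Firefly!', 'dragon123']
```

Each candidate is hashed once and looked up in a set of stolen hashes, so one pass over the dictionary tests every unsalted account at the same time. The random-looking password survives because nothing in the wordlist or the rules produces it.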

But what if software could be trained to stay one step ahead and predict the passwords people are going to use, or are using right now, based on what they've all done in the past?

A team at the Stevens Institute of Technology in New Jersey, USA, this month produced a paper [PDF] in which they detail how they used PassGAN, a generative adversarial network in which two machine-learning systems train each other, to double the code-cracking skills of open-source tools HashCat and John the Ripper – and, more importantly, how this can be used to protect against password-stealing attacks.

The researchers took their machine-learning system and fed it 32,603,388 plain-text passwords taken from the 2010 leak from music site RockYou, and let it work out the rules that people were using to generate their passphrases. It then attempted to use this knowledge to crack a hashed list of passwords taken during the 2016 LinkedIn intrusion.
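PassGAN itself is a neural network and far too large to reproduce here. To give a feel for the general idea of learning patterns from leaked passwords and then emitting fresh guesses, the sketch below swaps in a much simpler technique: a character-level Markov chain. It is not the researchers' method, the training list is invented, and it assumes the marker characters `^` and `$` never appear in a password.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # assumed not to occur inside any password


def train(passwords):
    """Record, for each character, which characters followed it in training."""
    followers = defaultdict(list)
    for pw in passwords:
        chars = [START] + list(pw) + [END]
        for current, nxt in zip(chars, chars[1:]):
            followers[current].append(nxt)
    return followers


def generate(followers, rng, max_len=16):
    """Walk the chain from START, sampling each next character by frequency."""
    out = []
    current = START
    while len(out) < max_len:
        current = rng.choice(followers[current])
        if current == END:
            break
        out.append(current)
    return "".join(out)


# Invented stand-in for a leaked plain-text password list
leaked = ["monkey1", "monkey12", "money123", "honey1", "donkey12"]
model = train(leaked)
rng = random.Random(0)  # fixed seed so runs are repeatable
guesses = {generate(model, rng) for _ in range(200)}
novel = sorted(guesses - set(leaked))
print(novel[:5])  # guesses that were never in the training list
```

The point of interest is the `novel` list: strings that follow the habits of the training data but were never in it. Those are the guesses a plain dictionary of known passwords cannot supply.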

At first, the AI correctly guessed 46.85 per cent of the RockYou passwords it was trained on – 2,774,269 out of 5,919,936 – and 11.53 per cent of the LinkedIn passwords – 4,996,980 out of 43,354,871. If you exclude from the correctly guessed LinkedIn passwords any passphrases it saw during the RockYou training, the number of correctly generated passwords drops to 3,890,043 or 9.582 per cent. In other words, the AI was able to crack one in ten hashed LinkedIn passwords it had never seen before.

It outperformed John the Ripper, which was able to crack 6.37 per cent of the LinkedIn passwords (and 4.98 per cent once passwords seen in training are excluded), but was behind HashCat, which cracked 22.9 per cent and 17.67 per cent respectively. When the neural network software was combined with HashCat, it fared better, as you'd expect, cracking 27 per cent and 22.039 per cent of the leaked account database, respectively. In other words, the AI and HashCat together could crack between one in five and one in four LinkedIn password hashes.
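Why combining two guess lists helps can be shown with a few sets in Python. The passwords and guess lists below are toy data invented for this example, not figures from the paper; `match_rate` is a hypothetical helper.

```python
def match_rate(guesses, targets):
    """Fraction of target passwords that appear in the guess set."""
    return len(guesses & targets) / len(targets)


# Toy data, invented for illustration only
targets = {
    "summer17", "qwerty!", "iloveyou2", "dragon99", "p@ssw0rd",
    "zx81fan", "tigger", "hello123", "letmein1", "x9$Lq",
}
rule_based = {"qwerty!", "dragon99", "tigger", "hello123", "abc123"}
learned = {"summer17", "tigger", "iloveyou2", "football7"}

combined = rule_based | learned  # union of both guess lists

print(match_rate(rule_based, targets))  # prints: 0.4
print(match_rate(learned, targets))     # prints: 0.3
print(match_rate(combined, targets))    # prints: 0.6
```

The combined rate is lower than the sum of the two individual rates because some guesses overlap. The gain comes only from passwords one generator produces and the other misses.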

To achieve all this, PassGAN had to come up with 528,834,530 passwords, HashCat generated 441,357,719, and John the Ripper also produced 528,834,530. The combined HashCat and AI produced 947,606,924 passphrases.

The team summarized their work thus:

Our experiments show that this approach is very promising. When we evaluated PassGAN on two large password datasets, we were able to outperform John the Ripper’s SpyderLab rules by a 2x factor, on average, and we were competitive with the best64 and gen2 rules from HashCat — our results were within a 2x factor from HashCat’s rules. More importantly, when we combined the output of PassGAN with the output of HashCat, we were able to match 18%-24% more passwords than HashCat alone. This is remarkable because it shows that PassGAN can generate a considerable number of passwords that are out of reach for current tools.

"Also, our evaluation of training performance suggests that, when supplied with a large enough leaked password set, the performance of PassGAN could surpass that of the best rule-based password generation techniques," they added.

In other words, HashCat is still good. And this early-stage AI can fill in the gaps – until it overtakes the well-known tool. ®

Biting the hand that feeds IT © 1998–2017