Humans best crossword-puzzling computer

Dr Fill is no Deep Blue or Watson – yet


Officially, humans are the only ones who can enter the American Crossword Puzzle Tournament, which was held in Brooklyn, New York over the weekend. But this time artificial intelligence expert Matt Ginsberg of On Time Systems has put his Dr Fill crossword solver to the test. As the results show, you don't have to throw out your pencil just yet.

This week, the puzzle masters at the ACPT put the hurt on Dr Fill, which, according to a report in the New York Times, did terribly on two out of the five crossword puzzles and would have ranked only 141st among the 600 people who competed. The program has done better in the past: in simulated runs of fifteen former tournaments, Dr Fill came out on top three times.

"I'll be back next year," Ginsberg told the Times.

Ginsberg's company, located in Eugene, Oregon, is a niche player in industrial optimization. Among other things, On Time Systems has created algorithms behind the Green Driver application used in the cities of Eugene and Portland to anticipate red lights and to route around them, which can cut commute times by 5 per cent.

The company has also created the algorithms for routing the US Air Force's fleet of cargo aircraft around weather while still obeying air traffic rules. The Worldwide Aeronautical Route Planner was created to run on mainframes and is being modernized to run on laptops. The optimized routing saves the Air Force something on the order of 1 to 2 per cent on its fuel bill each year, which works out to $35m to $90m.

In his spare time, in addition to actually creating crossword puzzles for the Times, Ginsberg has been working on the Dr Fill program. His paper describing Dr Fill was published in the Journal of Artificial Intelligence Research in December 2011. Based on its past performance, Ginsberg reckons that Dr Fill ranks among the top 50 puzzle solvers in the world, even though it did not have a very good day on Sunday.

The Watson Jeopardy! question-answer system built by IBM that took on the two human champs (perhaps chimps or chumps might be more appropriate, given how badly Watson spanked Ken Jennings and Brad Rutter) did terribly at first, too, but with tweaks and tuning and algorithm changes, it got better.

So there's no question that Ginsberg will be back, and it might even turn out that he gets some help. The Gray Lady is Big Blue's hometown rag, so some collaboration might be in order. The puzzles that threw Dr Fill for a loop had words spelled backwards, and some had words running diagonally as well as horizontally and vertically.

As with the Watson QA machine, Dr Fill does not actually understand what it is doing, but uses statistical probabilities to calculate its answers. Technically, what Dr Fill does is convert a crossword puzzle into a singly weighted constraint satisfaction problem. The constraints, of course, are the clues. Otherwise, you could just use a vast dictionary and try to cram every word of every length into every appropriate spot and then create a cross index of where letters match up where words cross.
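To make the idea of a weighted constraint satisfaction problem concrete, here is a minimal sketch in Python. It is not Dr Fill's algorithm: the 2x2 grid, the candidate words, and the integer costs are all made up for illustration, and the search is plain brute force. Each slot has candidate fills with a cost (lower means a better fit for the clue), crossing words must agree on shared cells, and the solver picks the consistent fill with the lowest total cost.

```python
from itertools import product

# Hypothetical toy puzzle: a 2x2 grid with two across and two down slots.
# Each slot lists the (row, col) cells it covers.
SLOTS = {
    "1A": [(0, 0), (0, 1)],
    "3A": [(1, 0), (1, 1)],
    "1D": [(0, 0), (1, 0)],
    "2D": [(0, 1), (1, 1)],
}

# Made-up candidate fills per slot, with made-up costs (lower = better fit).
CANDIDATES = {
    "1A": {"AT": 2, "ON": 5, "IT": 9},
    "3A": {"NO": 3, "SO": 4, "TO": 8},
    "1D": {"AN": 4, "AS": 6, "IS": 5},
    "2D": {"TO": 1, "NO": 7},
}


def fill_grid(assignment):
    """Return the filled grid, or None if two crossing words disagree on a cell."""
    grid = {}
    for slot, word in assignment.items():
        for cell, letter in zip(SLOTS[slot], word):
            if grid.setdefault(cell, letter) != letter:
                return None
    return grid


def solve():
    """Try every combination of candidates and keep the cheapest consistent one."""
    names = list(SLOTS)
    best = None
    for words in product(*(CANDIDATES[name] for name in names)):
        assignment = dict(zip(names, words))
        if fill_grid(assignment) is None:
            continue  # a crossing constraint is violated
        cost = sum(CANDIDATES[name][word] for name, word in assignment.items())
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


best_cost, best_fill = solve()
print(best_cost, best_fill)
```

Brute force is only workable here because the toy grid is tiny; a real puzzle has far too many combinations to enumerate this way, which is why how the search is organized matters so much.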

Such an unconstrained problem is very tough to solve, so if you can "understand" the clues, then you can limit the possible answers and therefore the size of the dictionary you need to rifle through. The clues also carry all kinds of other cues that people process: a clue that ends in a question mark, for example, is usually tricky in some fashion.
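A rough sketch of how a clue can shrink the search space, again in Python and again purely illustrative: the tiny clue database, the word list, and the function name `candidates` are all hypothetical. If a clue has been seen before, only its past answers of the right length are considered; otherwise the solver falls back to every word of that length, optionally filtered by letters already fixed by crossing words.

```python
from collections import Counter

# Hypothetical miniature clue database: clue text -> answers seen with it before.
PAST_CLUES = {
    "Pub order": ["ALE", "PINT", "ALE", "LAGER"],
    "Opposite of yes": ["NO", "NAY"],
}

# Hypothetical fallback word list for clues never seen before.
WORD_LIST = ["ALE", "APE", "ARE", "PINT", "LAGER", "NO", "NAY", "ICE"]


def candidates(clue, length, pattern=None):
    """Rank possible fills for a slot of the given length.

    pattern marks unknown letters with '.', e.g. 'A.E', and is assumed
    to be exactly `length` characters long.
    """
    # Count how often each answer of the right length appeared for this clue.
    seen = Counter(a for a in PAST_CLUES.get(clue, []) if len(a) == length)
    # Known clue: only its past answers. Unknown clue: every word of that length.
    pool = list(seen) if seen else [w for w in WORD_LIST if len(w) == length]
    if pattern:
        pool = [w for w in pool if all(p in (".", c) for p, c in zip(pattern, w))]
    # Most frequently seen answers first, then alphabetical.
    return sorted(pool, key=lambda w: (-seen[w], w))


print(candidates("Pub order", 3))
print(candidates("Never seen before", 3, "A.E"))
```

The known clue narrows the pool to a single word, while the unknown clue leaves every three-letter word matching the pattern in play.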

Behind the Dr Fill program, Ginsberg has amassed a database of over 47,000 crossword puzzles and their solutions, with almost 1.9 million unique clues. The system also makes use of a dictionary with over six million words and a smaller dictionary with 8,542 common words.

The answers to all of these puzzles were then rated in a number of ways, including hand scoring by 100 volunteers and cross-checking them against the number of Google hits, their Scrabble score if you were playing that word game, their length, and other criteria. Based on the 50,000 words, scoring was done for the remaining clues in the 47,000 puzzles.
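Two of the criteria named above, Scrabble score and length, are easy to compute mechanically. The Python sketch below uses the standard English Scrabble letter values; the function name `word_features` is hypothetical, and how Dr Fill actually weighs such features against volunteer scores and Google hits is not described here, so no combined score is attempted.

```python
# Standard English-language Scrabble letter values.
SCRABBLE_VALUES = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def word_features(word):
    """Return simple per-word features; characters outside A-Z are ignored."""
    letters = [ch for ch in word.upper() if ch in SCRABBLE_VALUES]
    return {
        "length": len(letters),
        "scrabble": sum(SCRABBLE_VALUES[ch] for ch in letters),
    }


print(word_features("quiz"))
print(word_features("OREO"))
```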

Dr Fill's data also includes a list of Wikipedia titles, grammatical information about 154,000 words that tells Dr Fill their part of speech, and 1.2 million synonyms.

Armed with all this data and super-secret algorithms, Dr Fill solves crossword puzzles. Watson needed a rack of IBM Power 750 servers with 2,880 cores and 16TB of memory in a high-speed cluster, but Ginsberg says you can run Dr Fill on a notebook with two x86 cores and 8GB of memory with a compressed database that weighs in at 300MB.

Like many others, Ginsberg thinks that Watson cheated a bit. Humans did beat Watson in a dry run, and then IBM sped up the button pushing a bit for the real showdown.

"Watson, too, outperforms humans easily in terms of speed; its much-ballyhooed victory against human Jeopardy! competitors was probably due far more to Watson's mastery of button pushing than to its question-answering ability," writes Ginsberg in his JAIR paper. "In terms of the underlying cognitive task, Watson appears to not yet be a match for the best Jeopardy players, who are in general capable of answering virtually all of the questions without error."

Ginsberg says that the task for Dr Fill is a lot harder, because you can't decline to answer or bet to boost your earnings as you can on Jeopardy!. You have to find the right answers, period. And perhaps more significantly, as Dr Fill gets better and better, Ginsberg is having a harder time figuring out why the program is not working.

Perhaps we need to create Stallman? ®
