Winning at chess, losing at language
This approach is much like computerized chess: make a statistical model of the domain and optimize the hell out of it, ultimately winning by sheer computational horsepower. Like chess (but unlike vision), language is a source of pride, something both complex and uniquely human. For chess, computational optimization worked brilliantly; the best chess-playing computers, like Deep Blue, are better than the best human players. But score-based optimization won't work for language in its current form, even though it does do two really important things right.
The first good thing about statistical machine translation is the statistics. Human brains are statistical-inference engines, and our senses routinely make up for noisy data by interpolating and extrapolating whatever pixels or phonemes we can rely on. Statistical analysis makes better sense of more data than strict rules do, and statistical rules produce more robust outputs. So any ultimate human-quality translation engine must use statistics at its core.
The other good thing is the optimization. As I've argued earlier, the key to understanding and duplicating brain-like behavior lies in optimization, the evolutionary ratchet which lets an accumulation of small, even accidental adjustments slowly converge on a good result. Optimization doesn't need an Einstein, just the right quality metric and an army of engineers.
So Och's team (and their competitors) have the overall structure right: they converted text translation into an engineering problem, and have a software architecture allowing iterative improvement. So they can improve their Black Box - but what's inside it? Och hinted at various trendy algorithms (Discriminative Learning and Expectation Maximization, I'll bet Bayesian Inference too), although our ever-vigilant chaperon from Google Communications wouldn't let him speak in detail. But so what? The optimization architecture lets you swap out this month's algorithm for a better one, so algorithms will change as performance improves.
Or maybe not. The Achilles' Heel of optimization is that everything depends on the performance metric, which in this case clearly misses a lot. That's not a problem for winning contests - the NIST competition used the same "BLEU" (Bilingual Evaluation Understudy) metric as Google practiced on, so Google's dramatic win mostly proved that Google gamed the scoring system better than IBM did. But the worse the metric, the less likely the translations will make sense.
The gist of the problem is that because machines don't yet understand language - that's the original problem, right? - they can't be too good at automatically evaluating language translations either. So researchers have to bootstrap: they take a scheme like BLEU (which merely compares the similarity of two same-language documents) and verify that on average humans prefer reading outputs with high scores. (They compare candidate translations against gold-standard human translations.)
But all BLEU really measures is word-by-word similarity: are the same words present in both documents, somewhere? The same word-pairs, triplets, quadruplets? In obviously extreme cases, BLEU works well; it gives a low score if the documents are completely different, and a perfect score if they're identical. But in between, it can produce some very screwy results.
The most obvious problem is that paraphrases and synonyms score zero; to get any credit with BLEU, you need to produce exactly the same words as the reference translation has: "wander" doesn't get partial credit for "stroll," nor "sofa" for "couch."
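To make the word-matching idea concrete, here is a minimal sketch of the n-gram precision that sits at the heart of BLEU. It is deliberately simplified: the real metric combines precisions for word-singles through quadruplets and adds a penalty for overly short outputs, both of which are left out here. The function names and the toy sentences are invented for illustration and are not taken from any contest data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped, so a candidate n-gram is credited at most as many
    times as it appears in the reference.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


# Hypothetical toy sentences, chosen only to illustrate the synonym problem.
reference = "we stroll to the couch"
exact = "we stroll to the couch"
paraphrase = "we wander to the sofa"

print(ngram_precision(exact, reference, 1))       # 1.0: identical wording
print(ngram_precision(paraphrase, reference, 1))  # 0.6: "wander" and "sofa" earn nothing
print(ngram_precision(paraphrase, reference, 2))  # 0.25: only "to the" survives as a pair
```

The paraphrase means the same thing to a human reader, yet the two synonyms contribute nothing, and the damage grows as the n-grams get longer, because every pair containing a synonym is lost as well.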
The complementary problem is that BLEU can give a high similarity score to nonsensical language which contains the right phrases in the wrong order. Consider first this typical, sensible output from a NIST contest:
"Appeared calm when he was taken to the American plane, which will to Miami, Florida"
Now here is a possible garbled output which would get the very same score:
"was being led to the calm as he was would take carry him seemed quite when taken"
The core problem is that word-counting scores like BLEU - the linchpin of these machine-translation competitions - don't even recognize well-formed language, much less real translated meaning. (A stinging academic critique of BLEU can be found here.)
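The order-blindness is easy to demonstrate for the simplest part of such a score, the single-word matches. The sketch below is an assumption-laden toy, not the contest scorer: the sentences are invented (loosely modeled on the example above), the helper name is hypothetical, and only single words are counted. Matching longer word groups does notice some local reordering, so a full shuffle like this one would lose credit there; the point is only that counting matches never checks whether the sentence as a whole is well formed.

```python
import random
from collections import Counter


def unigram_matches(candidate, reference):
    """Count candidate words that also appear in the reference (clipped)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    return sum(min(count, ref[word]) for word, count in cand.items())


# Hypothetical sentences for illustration only.
reference = "he appeared calm when he was taken to the plane"
sensible = "he seemed calm when he was taken to the plane"

# Scramble the sensible output; a fixed seed keeps the run repeatable.
words = sensible.split()
random.Random(0).shuffle(words)
garbled = " ".join(words)

print(garbled)
print(unigram_matches(sensible, reference))
print(unigram_matches(garbled, reference))  # same count as the sensible version
```

Both versions match the same nine words, because a bag of words has no idea which order it came in.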
A classic example of how the word-by-word translation approach fails comes from German, a language which is too "tough" for Och's team to translate yet (although Och himself is a native speaker). German's problem is its relative-to-English-tangled Wordorder; take this example from Mark Twain's essay "The Awful German Language":
"But when he, upon the street, the (in-satin-and-silk-covered-now-very-unconstrained-after-the-newest-fashioned-dressed) government counselor's wife met, etc"
Until computers deal with the actual language structure (the hyphens and parentheses above), they will have no hope of translating even as well as Mark Twain did here.
So why are computers so much worse at language than at chess? Chess has properties that computers like: a well-defined state and well-defined rules for play. Computers do win at chess, like at calculation, because they are so exact and fussy about rules. Language, on the other hand, needs approximation and inference to extract "meaning" (whatever that is) together from text, context, subject matter, tone, expectations, and so on - and the computer needs yet more approximation to produce a translated version of that meaning with all the right interlocking features. Unlike chess, the game of language is played on the human home-turf of multivariate inference and approximation, so we will continue to beat the machines.
But for Google's purposes, perfect translation may not even be necessary. Google succeeded in web-search partly by avoiding the exact search language of AltaVista in favor of a tool which was fast, easy to use, and displayed most of the right results in mostly the right order. Perhaps it will also be enough for Google to machine-translate most of the right words in mostly the right order, leaving to users the much harder task of extracting meaning from them. ®
Bill Softky has written a neat utility for Excel power users called FlowSheet: it turns cryptic formulae like "SUM(A4:A7)/D5" into pretty, intuitive diagrams. It's free, for now. Check it out.
good start but needs more
Google's approach is a good one. Translation is very similar to code breaking, so use similar algorithms.
However, when you already know things about the languages, you can incorporate this knowledge. For example, give it a dictionary and thesaurus, and teach it a little about grammar, in each language. Then it can put things in (some sort of) context.
But let's look at it this way. Assuming there is life outside of this planet, and we someday meet them, how do we communicate? Would this approach not be a way to get the very first insights into the way they communicate? Sure, it wouldn't be perfect, but it would help.
It will never be perfect. I do believe that language is based on hard and fast rules, but humans don't like rules. It's like my music composition teacher said, "You've got to know the rules, THEN you can break them". We continually go against the rules with language, make up new words, say things wrong. Computers won't keep up with that, but Google's translator can still do its job: giving you a rough guide to what is said.
Rules, yes, but self-adapting rules, and not rules in the form of what most people would consider as "grammar". Language operates at a much deeper level, as you can see from the fact that good translations hardly ever reproduce the most apparent grammatical structures of the original text.
On the UN producing "expert" translation, I wouldn't count on it. Most UN and EU translations better machine translation in degree only, but not in essence. They are by and large atrociously overliteral, and have little in common with natural language.
If language is algorithmic at all (and I don't think it is), it can only be so at a degree of complexity that defies reverse engineering along the lines of an electronic translator. Nobody has ever come close to writing a full grammar of any language, and I suspect the very nature of language (total open-ended versatility) is such that no such grammar can exist. This is because meaning is not encapsulated in the words of the speaker but revealed solely in the response of the listener. Words only mean what people take them to mean.
That is the first insurmountable problem for electronic translation. The second is that meaning is distributed across huge expanses of discourse. In the case of spoken language, it is distributed beyond phonetics into prosody, then beyond prosody into gesture. Written language uses a whole panoply of devices to simulate the effects of prosody and even gesture, and I don't see how an algorithmic approach could possibly allow for this.
Time flies vs fruit flies and white house vs casa blanca
These are examples of the ambiguity of language. The first is a case of the same string of letters representing different words (which may or may not have the same pronunciation). I remember a science fiction story I read years ago in which the blueprint of a device was written in Russian. Due to compartmentalization restrictions, the Russian wording was copied into a word list, which was then translated from Russian to English and passed on to people who could each analyze only part of the translated blueprint (no one saw the complete blueprint, just sections of it). Because the words were translated with no context, there were problems such as the string "lead" standing for both the metal and the technical term for a wire (i.e. "lead lead", as in a wire made of lead). The flies example uses the string "flies" both as a verb denoting movement and as the name of an insect class (modified by "fruit"). Words cannot be translated or interpreted in isolation; they need to be viewed in context so that the proper meaning is assigned to them for purposes of the translation. Science fiction writer Piers Anthony makes use of this type of wordplay in his Xanth series.
As to white house and casa blanca, there was an incident during WWII when there was a secret meeting of all the Allied leaders in Casa Blanca (where they could have been attacked and killed by the Germans if the meeting plans became known). As it happened, a spy reported the plans to the Germans, but due to encoding, decoding, and translation into German, the reference to the meeting being held in Casa Blanca ended up being reported to the spy's controller as being held in [the] White House (i.e. Washington DC, USA).