AI + ML

Boffins build a NAZI AI – wait, let's check that... OK, it's a grammar nazi

How'd you like those 'robots won't steal your job' headlines now, Reg editors? Muahaha

Thu 2 Aug 2018 // 06:02 UTC

Pedants, imagine how much more relaxed your life would be if artificial intelligence automatically corrected grammar mistake's in online forum and social network posts.

Never again would you explode with frustration and anger over misplaced apostrophe's, commas full stop's and exclamation! marks! The faults could be fixed up by machine-learning software, and your soul would be soothed.

Software, you say? Yes, software of the kind built by Mengyi Shan, a mathematics student at Harvey Mudd College in California, USA. She trained recurrent neural networks to restore missing punctuation in text. At the moment, the software can only deal with commas and full stops, the most common and easiest of English's punctuation marks.

“In natural language processing problems such as automatic speech recognition (ASR), the generated text is normally unpunctuated, which is hard for further recognition or analysis. Thus punctuation restoration is a small but crucial problem that deserves our attention,” she explained last month.

Summer project

In a project for the Wolfram Summer School held at Bentley University in Waltham, near Boston, Shan trained her recurrent neural networks using three million words that were gathered from 50 novels, plus Wikipedia pages, and transformed into vectors.

The text was filtered so that question marks, exclamation marks, and colons were replaced with full stops. Each word was then tagged to indicate whether it was followed by a comma, a full stop, or nothing. This information, in the form of complete sentences, was fed into the system to train the models so that they could identify common patterns where commas and full stops should appear.
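As a rough illustration of that tagging step, here is a minimal Python sketch. It is not Shan's code: the label names, the `label_words` helper, the tokenising regex, and the decision to lowercase every word are all assumptions made for the example.

```python
import re

# Hypothetical label names; the article does not say how the tags were encoded.
NONE, COMMA, PERIOD = "O", "COMMA", "PERIOD"


def label_words(text):
    """Return (word, label) pairs; the label says what follows the word."""
    # Replace question marks, exclamation marks, and colons with full stops,
    # as the article describes.
    text = re.sub(r"[?!:]", ".", text)
    pairs = []
    # Each match is a word plus any run of commas or full stops after it.
    for word, mark in re.findall(r"([A-Za-z0-9']+)([.,]*)", text):
        if "." in mark:
            label = PERIOD
        elif "," in mark:
            label = COMMA
        else:
            label = NONE
        # Lowercasing is an assumption: it stops capital letters from
        # giving away where sentences begin.
        pairs.append((word.lower(), label))
    return pairs


print(label_words("Wait, is it late? Yes!"))
```

Run on that sample sentence, the words "wait", "late", and "yes" pick up comma or full-stop tags and the rest are tagged as followed by nothing.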

The AI thus ought to pick up that the word "but" is more likely to be followed by a comma than a full stop, and that words like "the" normally feature at the start of sentences so they're unlikely to be followed by any punctuation at all.

To demonstrate the software, you feed it blocks of sentences, which are converted into sequences of vectors and passed through the neural network. It outputs the same sentences with full stops and commas added wherever it thinks they are necessary.
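The final step, turning per-word predictions back into punctuated text, might look something like the Python sketch below. The neural network itself is left out: the `labels` list stands in for its output, and the `restore` helper and label names are hypothetical, not taken from Shan's project.

```python
# Hypothetical label names standing in for the network's per-word output.
MARKS = {"COMMA": ",", "PERIOD": "."}


def restore(words, labels):
    """Re-attach a comma or full stop after each word, as the labels dictate."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    # Any label that is not in MARKS means "no punctuation here".
    return " ".join(word + MARKS.get(label, "") for word, label in zip(words, labels))


words = "i like apples but i don't like bananas".split()
labels = ["O", "O", "COMMA", "O", "O", "O", "O", "PERIOD"]
print(restore(words, labels))  # i like apples, but i don't like bananas.
```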

Nobody's perfect

Total accuracy isn’t a good measure of performance for the models in this case, she explained. Instead, an F1 score, the harmonic mean of the system’s precision and recall, is a better benchmark.
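To see why accuracy can flatter a model here, consider that most words are followed by no punctuation at all, so a system that never predicts a comma can still be right most of the time. The Python sketch below computes precision, recall, and F1 for a single punctuation mark; the `f1_for` helper, the label names, and the toy data are made up for illustration and are not Shan's results.

```python
def f1_for(mark, truth, predicted):
    """F1 score for one label: the harmonic mean of precision and recall."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == mark and p == mark for t, p in pairs)  # correctly placed
    fp = sum(t != mark and p == mark for t, p in pairs)  # placed in error
    fn = sum(t == mark and p != mark for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: eight words with no punctuation, one comma, one full stop.
truth = ["O"] * 8 + ["COMMA", "PERIOD"]
lazy = ["O"] * 10  # a "model" that never predicts any punctuation

accuracy = sum(t == p for t, p in zip(truth, lazy)) / len(truth)
print(accuracy)                      # 0.8
print(f1_for("COMMA", truth, lazy))  # 0.0
```

The lazy model scores 80 per cent on accuracy while restoring no punctuation whatsoever; its F1 score for commas is zero, which is a fairer reflection of its usefulness.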

The best F1 score hovers around the 70 per cent mark, and that isn’t good enough to be used in real applications yet. A larger training dataset would help boost scores, as would higher-quality material.

Sometimes text from Wikipedia, especially academic citations, contains too many commas, and that can confuse machines and make them inject excessive commas, too. Interestingly, neural networks find it harder to deal with commas than full stops.

“The overall performance on commas is slightly worse than on periods. This also makes sense from a linguistics point of view,” Shan explained.

“There seems to be a concrete linguistics set of rules for the period, but the usage of comma greatly depends on personal writing style. For example, you could say either 'I like apples but I don't like bananas.', or 'I like apples, but I don't like bananas.'

“In this way, it's really hard to build a model for comma prediction with such high accuracy. But fortunately, sometimes adding commas or not doesn't really influence the overall meaning of the sentence. So it's OK to be tolerant to a slightly worse performance on commas.”

Shan also added that restoring punctuation shouldn’t be limited to full stops and commas:

A more rigorous study of the question mark, exclamation mark, colon, and quotation mark is expected. However, we should note that the choice of most punctuations is not restricted to one possibility. In cases like distinguishing a period with an exclamation mark, we cannot expect a high F1-score. But it's still an interesting topic, may be useful for topics like sentimental analysis.
