
From drugs to galaxy hunting, AI is elbowing its way into boffins' labs

Machine learning is cropping up more and more in research papers – does it work?

Feature Powerful artificially intelligent algorithms and models are all the rage. They're knocking it out of the park in language translation and image recognition, but autonomous cars and chatbots? Not so much.

One area where machine learning could do surprisingly well is scientific research. As AI advances, academics are seizing on its potential, and the number of natural science studies that use machine learning is steadily rising.

Two separate papers that show how neural networks can be trained to pinpoint when the precise shuffle of particles leads to a physical phase transition – something that could help scientists understand phenomena like superconductivity – were published on the same day earlier this month in Nature Physics.

Chemistry is a game for AI to play

Science has had an affair with AI for a while, said Marwin Segler, a PhD student studying chemistry under Professor Mark Waller at the University of Münster, Germany. However, until now, the relationship hasn’t been terribly fruitful.

Segler is interested in retrosynthesis, a technique that reveals how a desired molecule can be broken down into simpler chemical building blocks. Chemists can then carry out the necessary reaction steps to craft the required molecule from these building blocks. These molecules can then be used in drugs and other products.

A good analogy would be something like a “cooking recipe,” Segler told The Register. “Imagine you’re trying to make a complicated cake. Retrosynthesis will show you how to make the cake, and the ingredients you need.”
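To make the cake analogy concrete, here is a toy sketch of the idea in Python. It is not real chemistry and not Segler's code: the `rules` table, the ingredient names, and the `decompose` function are all made up for illustration, and the rules are assumed to contain no cycles.

```python
def decompose(target, rules, building_blocks):
    """Return the building blocks needed to make `target`, or None if no route exists.

    `rules` maps a product to a list of alternative precursor lists.
    Assumes the rules are acyclic; a cycle would recurse forever.
    """
    if target in building_blocks:
        return [target]  # already something we can buy off the shelf
    for precursors in rules.get(target, []):
        needed = []
        for precursor in precursors:
            sub_route = decompose(precursor, rules, building_blocks)
            if sub_route is None:
                break  # this alternative fails, try the next one
            needed.extend(sub_route)
        else:
            return needed  # every precursor could be broken down
    return None


# Hypothetical "recipes": each product lists one or more ways to make it.
rules = {
    "cake": [["sponge", "icing"]],
    "sponge": [["flour", "eggs", "sugar"]],
    "icing": [["butter", "sugar"]],
}
building_blocks = {"flour", "eggs", "sugar", "butter"}

shopping_list = decompose("cake", rules, building_blocks)
no_route = decompose("pie", rules, building_blocks)
```

Working backwards from the cake gives the shopping list of basic ingredients; a target with no known recipe gives `None`.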

In the 1990s, before the deep learning hype kicked off, expert systems were used to perform retrosynthesis. Rules for reactions had to be manually programmed in: this was tedious work, and it never delivered any convincing results.

Now things are starting to look more promising with modern AI techniques. Retrosynthesis has strong analogies to puzzle games, particularly Go. Software can attempt to solve retrosynthesis problems in the same way it solves Go challenges: splintering the problem ahead into component parts and finding the best route to the solution.

All the viable moves in a Go match can be fanned out into a large search tree and the winning moves are identified using a Monte Carlo Tree Search – an algorithm used by AlphaGo to defeat Lee Sedol, a Korean Go champion.
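For readers who want to see the shape of a Monte Carlo Tree Search, below is a minimal sketch on a toy game rather than Go. The game (take one to three stones from a pile, whoever takes the last stone wins), the `Node` class, and the plain UCB1 selection rule are our own assumptions for illustration. AlphaGo pairs its search with neural networks, which this sketch leaves out entirely.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones    # stones left for the player about to move
        self.parent = parent
        self.move = move        # the move that led to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0         # wins for the player who made `move`


def ucb1(child, parent_visits, c=1.4):
    """Balance a child's win rate against how rarely it has been tried."""
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(stones, rng):
    """Play random moves; True if the player to move first takes the last stone."""
    first_player_to_move = True
    while stones > 0:
        stones -= rng.choice([m for m in (1, 2, 3) if m <= stones])
        if stones == 0:
            return first_player_to_move
        first_player_to_move = not first_player_to_move
    return False  # pile already empty: the previous player took the last stone


def mcts(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes using UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        won = not rollout(node.stones, rng)  # from the view of whoever moved into `node`
        # 4. Backpropagation: flip the result at each level on the way up.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if won else 0.0
            won = not won
            node = node.parent
    # Recommend the most-visited move from the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

Calling `mcts(5)` returns the number of stones the search recommends taking from a pile of five. The four numbered steps are the standard loop; everything else here is scaffolding for the toy game.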

Just as AlphaGo was trained to triumph in Go games, Segler’s AlphaChem program is trained to pick the best move at each step, finding the puzzle pieces that fit together to build the desired molecule. The code is fed a library of millions of chemical reactions, giving it the bank of knowledge it needs to break molecules down into building blocks.

“Chemists rely on their intuition, which they master during long years of work and study, to prioritize which rules to apply when retroanalyzing molecules. Analogous to master move prediction in Go, we showed recently that, instead of hand-coding, neural networks can learn the ‘master chemist moves’,” the AlphaChem paper [PDF], submitted to AI conference ICLR 2017 in January, reads.

There are hundreds of possible moves per position to play on the Go board, which has 361 points, just as there are many pathways to consider when trying to break down a molecule into simpler components.

AlphaGo and AlphaChem both cut down on computational costs by pruning the search tree, so there are fewer branches to consider. Only the top 50 most-promising moves are played out, so it doesn’t take a fancy supercomputer packing tons of CPU cores and accelerators to perform the retrosynthesis – an Apple MacBook Pro will do.
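A rough sketch of why that pruning matters, in Python. The move names, the scores, and the figure of 1,000 candidates per step are hypothetical; in the real systems the ranking comes from a trained neural network, which we do not attempt to reproduce here.

```python
import heapq


def top_k_moves(scored_moves, k=50):
    """Keep only the k highest-scoring (move, score) pairs, best first."""
    return heapq.nlargest(k, scored_moves, key=lambda pair: pair[1])


def leaf_count(branching, depth):
    """Number of distinct routes in a tree with a fixed branching factor."""
    return branching ** depth


# Hypothetical candidate moves with made-up scores.
candidates = [("rule_a", 0.10), ("rule_b", 0.70), ("rule_c", 0.20)]
best_two = top_k_moves(candidates, k=2)

# Routes to consider three steps deep, before and after pruning.
unpruned = leaf_count(1000, 3)
pruned = leaf_count(50, 3)
```

With 1,000 candidates at every step, a search three steps deep has a billion routes to weigh up. Keep only the top 50 each time and that drops to 125,000, which is the sort of workload a laptop can handle.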

During the testing phase, AlphaChem was pitted against two other more-traditional search algorithms to find the best reactions for 40 molecules. Although AlphaChem proved slower than the best-first search algorithm, it was more accurate, solving the problem up to 95 per cent of the time.

Segler hopes AlphaChem will one day be used to find new ways of making drugs more cheaply or to help chemists manufacture new molecules. It is possible the software will, in future revisions, reveal reactions and techniques humans had not considered.

It’s true that using AI is fashionable right now, and the hype has piqued interest among scientists, he said. “But on the other hand, it’s getting used more because it’s producing better results.”

Investment in AI has led to better algorithms, and a lot of the frameworks, such as TensorFlow, Caffe, and PyTorch, are publicly available, making them easier for non-experts to use.

“I coded the Monte Carlo Tree Search algorithm myself, but for the neural network stuff I used Keras,” Segler told us.

Although AI has been used in chemistry for over 40 years, it’s more challenging to apply it in chemistry than in other subjects, Segler said. “Gathering training data is very expensive in chemistry, because every data point is a laboratory experiment. We cannot simply annotate photos or gather lots of text from the internet, as in computer vision or natural language processing.”

For one thing, a lot of medical data is kept confidential, and companies don’t generally share this information with chemists and biochemists for training systems.
