DeepMind quits playing games with AI, ups the protein stakes with machine-learning code
Meet AlphaFold, an artificially intelligent system to predict crucial biochemical structures
Researchers at DeepMind are using AI software to study how proteins fold, with the hope that it will help scientists design new drugs more quickly.
The human body produces thousands of different proteins, each with a unique form. Although there are only 20 standard amino acids, these can be organised in an astronomical number of ways.
How they are arranged affects how the resulting protein works, and what its task is. For example, antibody proteins hook onto viruses and bacteria to mark them for elimination. By working out how proteins are shaped, eggheads can produce drugs with chemicals that mimic these proteins, and make people better.
It takes a lot of work, though, to figure out the 3D structure of any given protein. Instructions on how to form the proteins are encoded in our DNA, but these recipes only tell your body how to form long polypeptide chains of amino acid residues that eventually fold into the complex protein structures.
How they fold is non-obvious, so the hard part is working out the final folded form from a given chain of amino acids, which is why figuring out how proteins form from genetic sequences is known as the protein folding problem.
Cyrus Levinthal, an American molecular biologist, believed it would take longer than the age of the universe to model all the folding combinations for a single polypeptide chain.
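A back-of-the-envelope version of Levinthal's argument can be sketched in a few lines. The chain length, the number of conformations per residue and the sampling rate below are illustrative assumptions, not figures from Levinthal or from DeepMind.

```python
# Rough Levinthal-style estimate. Every constant here is an assumption
# picked for illustration; none of them comes from the article.
RESIDUES = 100                      # assumed chain length
STATES_PER_RESIDUE = 3              # assumed conformations per residue
SAMPLES_PER_SECOND = 1e13           # assumed sampling rate
AGE_OF_UNIVERSE_SECONDS = 4.35e17   # roughly 13.8 billion years


def exhaustive_search_seconds(residues, states, rate):
    """Time needed to visit every conformation once at a fixed rate."""
    return states ** residues / rate


seconds = exhaustive_search_seconds(
    RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND
)
ratio = seconds / AGE_OF_UNIVERSE_SECONDS
print(f"{seconds:.2e} seconds, or {ratio:.1e} times the age of the universe")
```

Even with these modest assumptions the search time dwarfs the age of the universe, which is why brute-force enumeration is a non-starter.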
AlphaFold enters the fold
This is where DeepMind's AlphaFold may come in handy. AlphaFold is made up of three different neural networks designed to predict the 3D structure of a protein from its amino acid sequence. The networks predict the distances and angles between pairs of amino acids, and a further model estimates how accurate a proposed structure is.
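To make "distances between pairs of amino acids" concrete, here is a minimal sketch that computes a pairwise distance matrix from known 3D positions. This is not AlphaFold's code: the coordinates are made up, and the neural networks have to predict such distances from the sequence alone, without any coordinates to measure.

```python
import math

# Hypothetical 3D positions (in angstroms) for four residues.
# These are made-up values for illustration, not real protein data.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]


def distance_matrix(points):
    """Return the symmetric matrix of Euclidean distances between all pairs."""
    return [[math.dist(p, q) for q in points] for p in points]


matrix = distance_matrix(coords)
for row in matrix:
    print(" ".join(f"{d:5.2f}" for d in row))
```

A matrix like this pins down the overall shape of a chain, which is why predicting it is a useful stepping stone to a full 3D structure.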
Scientists have used techniques from x-ray crystallography to cryo-electron microscopy, or utilized spare processor cycles on a small army of volunteer machines over the years, to work out various protein structures. The Protein Data Bank (PDB) contains 146,000 proteins, and DeepMind used 29,000 of them to train its neural networks.
There are two approaches to predicting protein structures computationally. One is known as “template-based modelling,” and the other as “free modelling.”
“In predicting a structure for a new target sequence, one standard strategy is to look in PDB to see if there’s a protein with a similar sequence and a known structure,” Andrew Senior, team lead on AlphaFold, explained to The Register on Wednesday. "If there is, a good strategy is to use the PDB structure of that similar sequence as a 'template' and adapt it in ways to make it consistent with the target sequence."
Template-based modelling, however, only works if there is a comparable protein whose structure is already known. If there isn’t, developers have to turn to free modelling to construct formations from scratch. This is where DeepMind's neural-network software comes in: it generates new structures from a given sequence of amino acids, and those structures are scored on their accuracy.
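The choice between the two approaches can be sketched as a simple lookup. This is a toy illustration only: the sequences, the names and the similarity threshold are all hypothetical, and Python's `difflib` stands in for the dedicated sequence-alignment tools that real pipelines use.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a structure database: name -> amino acid sequence.
KNOWN_STRUCTURES = {
    "toy_protein_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_protein_b": "GSHMLEDPVRRALELLKAG",
}
SIMILARITY_THRESHOLD = 0.7  # assumed cut-off, chosen for illustration only


def pick_strategy(target, database, threshold=SIMILARITY_THRESHOLD):
    """Return ('template', name) if a similar known sequence exists,
    otherwise ('free', None)."""
    best_name, best_score = None, 0.0
    for name, sequence in database.items():
        score = SequenceMatcher(None, target, sequence).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return ("template", best_name)
    return ("free", None)


# A near-copy of a known sequence can borrow that structure as a template.
print(pick_strategy("MKTAYIAKQRQISFVKSHFSRA", KNOWN_STRUCTURES))
# A sequence unlike anything in the database falls back to free modelling.
print(pick_strategy("WWWWWWWWWW", KNOWN_STRUCTURES))
```

The second case is the one AlphaFold targets: there is no known structure to adapt, so the model has to be built from scratch.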
“We can attempt to model any protein but accuracy will vary according to many factors,” Senior said.
"Longer proteins and those where there are no similar sequences to be found in sequence databases are harder to model, and we have really focused on globular proteins. We do fairly well on membrane proteins, but wouldn’t expect the system to do well on fibrous or disordered proteins."
To test the idea, DeepMind researchers entered AlphaFold in the free-modelling category of CASP (Critical Assessment of protein Structure Prediction), a competition to test the latest protein folding techniques, under the name “A7D” and won.
Demis Hassabis, CEO and cofounder of DeepMind, told El Reg that although their method is state-of-the-art, the problem of protein folding hasn’t been solved yet. “We have to get much more accurate in order for it to be useful for biologists,” he said.
DeepMind has been working on the problem for two years. It hoped that its success in building systems that mastered games such as Go, chess and shogi would transfer to protein folding. But that remains to be seen, as AlphaFold doesn’t use reinforcement learning, unlike AlphaGo and AlphaZero. “We hope to bring in reinforcement learning eventually,” Hassabis said. ®