An artificial intelligence (AI) network developed by DeepMind, Google’s AI research lab, has made a gigantic leap in solving one of biology’s greatest challenges: determining a protein’s 3D shape from its amino-acid sequence. Using the network, the company has created the most comprehensive map of human proteins to date and, in collaboration with other scientists, is releasing the data for free.
Proteins are the building blocks of life. Understanding their structure helps scientists with a wide variety of tasks. For example, the information can help design new medicines, synthesize novel enzymes that break down waste materials and create crops resistant to viruses or extreme weather.
Research on protein structure is time-consuming and costly. Last year, however, DeepMind showed that its AI software, AlphaFold, could accurately predict a protein’s structure. Now it is releasing hundreds of thousands of those predictions to the public.
“I see this as the culmination of the entire 10-year-plus lifetime of DeepMind,” company CEO and co-founder Demis Hassabis told The Verge. “From the beginning, this is what we set out to do: to make breakthroughs in AI, test that on games like Go and Atari, [and] apply that to real-world problems, to see if we can accelerate scientific breakthroughs and use those to benefit humanity.”
More than 350,000 protein structures across 20 different organisms, including animals like mice and fruit flies and bacteria like E. coli, are available through a public database. Researchers say the resource, which is set to grow to 130 million structures by the end of the year, could revolutionize the life sciences. The release covers the human proteome, the set of more than 20,000 proteins encoded by the human genome. It is the most comprehensive and accurate public dataset of human proteins.
Scientists can download the entire human proteome for themselves, says AlphaFold’s technical lead John Jumper. “There is a HumanProteome.zip effectively, I think it’s about 50 gigabytes in size,” Jumper tells The Verge. “You can put it on a flash drive if you want, though it wouldn’t do you much good without a computer for analysis!”
For the first tranche of data, the DeepMind team set out to predict the structures of nearly every known protein encoded by the human genome. The structures are available in a database maintained by EMBL-EBI (the European Molecular Biology Laboratory’s European Bioinformatics Institute) in Hinxton, UK. By the end of the year, DeepMind plans to release predictions for 100 million protein structures, a dataset that will be “transformative for our understanding of how life works,” according to Edith Heard, director-general of the EMBL.
“The data will be free in perpetuity for both scientific and commercial researchers,” says Hassabis.
“Anyone can use it for anything,” the DeepMind CEO noted at a press briefing. “They just need to credit the people involved in the citation.”
DeepMind’s protein predictions are already being used in medical research, including studies of SARS-CoV-2.
However, it will take time to turn this information into real-world results. “I don’t think it’s going to be something that changes the way patients are treated within the year, but it will definitely have a huge impact for the scientific community,” Marcelo C. Sousa, a professor at the University of Colorado’s biochemistry department, told The Verge.
Scientists will have to learn how to work with this much information, says DeepMind senior research scientist Kathryn Tunyasuvunakool. “As a biologist, I can confirm we have no playbook for looking at even 20,000 structures, so this [amount of data] is hugely unexpected,” Tunyasuvunakool told The Verge. “To be analysing hundreds of thousands of structures, it’s crazy.”
Helen Walden, a professor of structural biology at the University of Glasgow, tells The Verge that DeepMind’s work will “significantly ease” research bottlenecks, but “the laborious, resource-draining work of doing the biochemistry and biological evaluation of, for example, drug functions” will continue.
For scientists, Sousa says, the effect will be felt immediately. He has used AlphaFold’s predictions before. “In our collaboration we had with DeepMind, we had a dataset with a protein sample we’d had for 10 years, and we’d never got to the point of developing a model that fits,” he says. “DeepMind agreed to provide us with a structure, and they were able to solve the problem in 15 minutes after we’d been sitting on it for 10 years.”
Proteins are made from chains of amino acids, which come in 20 different varieties in the human body. An individual protein can comprise hundreds of amino acids. Proteins are too small to be examined with ordinary microscopes, so their structures are determined with complex methods like nuclear magnetic resonance and X-ray crystallography.
Determining a protein’s structure from its amino-acid sequence alone is so difficult that many scientists call it a “grand challenge” of biology. In recent years, however, artificial intelligence has made the required analysis possible.
Various groups have worked on the problem for many years, but DeepMind’s AI expertise and resources have recently accelerated progress dramatically. Last year, DeepMind’s program, AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP (Critical Assessment of Structure Prediction). The results were so accurate that computational biologist John Moult, one of CASP’s co-founders, said that “in some sense, the problem [of protein folding] is solved.”
“We can fold an average protein in a matter of minutes, most cases seconds,” says Hassabis. The company also released the underlying code for AlphaFold last week as open-source, allowing others to build on its work in the future.
Liam McGuffin, a professor at Reading University who developed some of the UK’s leading protein-folding software, applauded the performance of AlphaFold but also said that the program’s success rests on decades of prior research. “DeepMind has vast resources to keep this database up to date, and they are better placed to do this than any single academic group,” McGuffin told The Verge. “I think academics would have got there in the end, but it would have been slower because we’re not as well resourced.”
Hassabis told The Verge that the company always planned to make this information freely available and that doing so fulfills its founding ethos. He emphasized that DeepMind’s work is used in various places at Google (“almost anything you use, there’s some of our technology that’s part of that under the hood”) but that the company’s primary goal has always been fundamental research.
“The agreement when we got acquired is that we are here primarily to advance the state of AGI and AI technologies and then use that to accelerate scientific breakthroughs,” says Hassabis. “[Alphabet] has plenty of divisions focused on making money,” he adds, noting that DeepMind’s focus on research “brings all sorts of benefits, in terms of prestige and goodwill for the scientific community. So there are many ways value can be attained.”
Hassabis sees AlphaFold as a sign of things to come: a project that shows how AI could help untangle complex problems such as human biology.
“I think we’re at a really exciting moment,” he says. “In the next decade, we, and others in the AI field, are hoping to produce amazing breakthroughs that will genuinely accelerate solutions to the really big problems we have here on Earth.”