DeepMind and EMBL release the most complete database of predicted 3D structures of human proteins.
Partners use AlphaFold, the AI system recognized last year as a solution to the protein structure prediction problem, to release more than 350,000 protein structure predictions including the entire human proteome to the scientific community.
DeepMind today announced its partnership with the European Molecular Biology Laboratory (EMBL), Europe’s flagship laboratory for the life sciences, to make the most complete and accurate database yet of predicted protein structure models for the human proteome. This will cover all ~20,000 proteins expressed by the human genome, and the data will be freely and openly available to the scientific community. The database and artificial intelligence system provide structural biologists with powerful new tools for examining a protein’s three-dimensional structure, and offer a treasure trove of data that could unlock future advances and herald a new era for AI-enabled biology.
AlphaFold’s recognition in December 2020 by the organizers of the Critical Assessment of protein Structure Prediction (CASP) benchmark as a solution to the 50-year-old grand challenge of protein structure prediction was a stunning breakthrough for the field. The AlphaFold Protein Structure Database builds on this innovation and the discoveries of generations of scientists, from the early pioneers of protein imaging and crystallography, to the thousands of prediction specialists and structural biologists who’ve spent years experimenting with proteins since. The database dramatically expands the accumulated knowledge of protein structures, more than doubling the number of high-accuracy human protein structures available to researchers. Advancing the understanding of these building blocks of life, which underpin every biological process in every living thing, will help enable researchers across a huge variety of fields to accelerate their work.
Last week, the methodology behind the latest highly innovative version of AlphaFold, the sophisticated AI system announced last December that powers these structure predictions, was published in Nature along with its open source code. Today’s announcement coincides with a second Nature paper that provides the fullest picture of proteins that make up the human proteome, and with the release of predicted structures for 20 additional organisms that are important for biological research.
“Our goal at DeepMind has always been to build AI and then use it as a tool to help accelerate the pace of scientific discovery itself, thereby advancing our understanding of the world around us,” said DeepMind Founder and CEO Demis Hassabis, PhD. “We used AlphaFold to generate the most complete and accurate picture of the human proteome. We believe this represents the most significant contribution AI has made to advancing scientific knowledge to date, and is a great illustration of the sorts of benefits AI can bring to society.”
AlphaFold is already helping scientists to accelerate discovery
The ability to predict a protein’s shape computationally from its amino acid sequence — rather than determining it experimentally through years of painstaking, laborious, and often costly techniques — is already helping scientists to achieve in months what previously took years.
“The AlphaFold database is a perfect example of the virtuous circle of open science,” said EMBL Director General Edith Heard. “AlphaFold was trained using data from public resources built by the scientific community so it makes sense for its predictions to be public. Sharing AlphaFold predictions openly and freely will empower researchers everywhere to gain new insights and drive discovery. I believe that AlphaFold is truly a revolution for the life sciences, just as genomics was several decades ago and I am very proud that EMBL has been able to help DeepMind in enabling open access to this remarkable resource.”
AlphaFold is already being used by partners. The Drugs for Neglected Diseases Initiative (DNDi) has advanced its research into life-saving cures for diseases that disproportionately affect the poorer parts of the world, and the Centre for Enzyme Innovation (CEI) is using AlphaFold to help engineer faster enzymes for recycling some of our most polluting single-use plastics. For those scientists who rely on experimental protein structure determination, AlphaFold’s predictions have helped accelerate their research. For example, a team at the University of Colorado Boulder is finding promise in using AlphaFold predictions to study antibiotic resistance, while a group at the University of California San Francisco has used them to increase their understanding of SARS-CoV-2 biology.
The AlphaFold Protein Structure Database
The AlphaFold Protein Structure Database builds on many contributions from the international scientific community, as well as AlphaFold’s sophisticated algorithmic innovations and EMBL-EBI’s decades of experience in sharing the world’s biological data. DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) are providing access to AlphaFold’s predictions so that others can use the system as a tool to enable and accelerate research and open up completely new avenues of scientific discovery.
“This will be one of the most important datasets since the mapping of the Human Genome,” said EMBL Deputy Director General and EMBL-EBI Director Ewan Birney. “Making AlphaFold predictions accessible to the international scientific community opens up so many new research avenues, from neglected diseases to new enzymes for biotechnology and everything in between. This is a great new scientific tool, which complements existing technologies, and will allow us to push the boundaries of our understanding of the world.”
The database launches with ~350,000 structures, covering the human proteome and 20 additional biologically significant organisms such as E. coli, fruit fly, mouse, zebrafish, malaria parasite and tuberculosis bacteria. These organisms have been the subject of countless research papers and numerous major breakthroughs. These structures will enable researchers across a huge variety of fields, from neuroscience to medicine, to accelerate their work.
The future of AlphaFold
The database and system will be periodically updated as we continue to invest in future improvements to AlphaFold, and over the coming months we plan to vastly expand the coverage to almost every sequenced protein known to science — over 100 million structures covering most of the UniProt reference database.
To learn more, please see the Nature papers describing our full method and the human proteome, and read the Authors’ Notes. See the open-source code to AlphaFold if you want to view the workings of the system, and the Colab notebook to run individual sequences. To explore the structures, visit EMBL-EBI’s searchable database, which is open and free to all.
Statements from independent leading scientists:
Paul Nurse, Nobel Laureate for Physiology or Medicine 2001, Director of the Francis Crick Institute and Chair of EMBL Science Advisory Committee
“Computational methods are transforming scientific research, opening up new possibilities for discovery and applications for the public good. Understanding the function of proteins is central to advancing our knowledge of life and will ultimately lead to improvements in health care, food sustainability, new technologies, and much beyond. DeepMind’s release of the AlphaFold Protein Structure Database with EMBL, Europe’s flagship organization for molecular biology, is a great leap for biological innovation that demonstrates the impact of interdisciplinary collaboration for scientific progress. With this resource freely and openly available, the scientific community will be able to draw on collective knowledge to accelerate discovery, ushering in a new era for AI-enabled biology.”
Venki Ramakrishnan, Nobel Laureate for Chemistry 2009 and former President of the Royal Society
“This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred long before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research.”
Elizabeth Blackburn, Nobel Laureate for Physiology or Medicine 2009 and Professor Emerita, University of California San Francisco
“As these revolutionary approaches to protein structures pioneered by DeepMind become accessible, this will open new windows for the scientific community onto the biological meaning of the genome sequence.”
Patrick Cramer, Director at Max Planck Institute for Biophysical Chemistry
“The marvelous resource provided by DeepMind and EMBL will change the way we do structural biology. The predictions demonstrate the power of machine learning and serve the world-wide community, which had provided open data to enable this breakthrough achievement. A seminal example of how science in the 21st century may be done.”
Statements from research partners using AlphaFold:
Ben Perry, Discovery Open Innovation Leader, Drugs for Neglected Diseases Initiative (DNDi)
“We need to supercharge the discovery of new drugs for the millions of people at risk of neglected diseases around the world. AI can be a game changer: by quickly and accurately predicting protein structures, AlphaFold opens new research horizons, improving both the scope and efficiency of R&D and facilitating our research in endemic countries. It is inspiring to see powerful cutting-edge AI enabling work on diseases which are concentrated almost exclusively in impoverished populations.”
Professor John McGeehan, Professor of Structural Biology and Director of the Centre for Enzyme Innovation (CEI) at the University of Portsmouth
“Our mission is to develop enzyme-enabled solutions for circular recycling of plastics. This technology is accelerating our research in a way that no one could have predicted. DeepMind offering to make this open access is going to change the whole community and allow everyone to do these types of experiments. What took us months and years to do, AlphaFold was able to do in a weekend. I feel that we have just jumped at least a year ahead of where we were yesterday.”
Professor Marcelo Sousa, Department of Biochemistry, University of Colorado Boulder
“AlphaFold’s predictions have helped accelerate our research into antibiotic resistance by finally solving experimental data that we’ve been stuck on for more than 10 years. The predictions were so accurate and precise that I initially thought I might have done something wrong with the setup!”
Statements from DeepMind / Alphabet:
Sundar Pichai, CEO, Google and Alphabet
“The AlphaFold database shows the potential for AI to profoundly accelerate scientific progress. Not only has DeepMind’s machine learning system greatly expanded our accumulated knowledge of protein structures and the human proteome overnight, its deep insights into the building blocks of life hold extraordinary promise for the future of scientific discovery.”
Pushmeet Kohli, PhD, Head of AI for Science, DeepMind
“Our team has been working on AlphaFold to decipher and unlock the world of proteins by predicting their structure. We are making AlphaFold’s predictions available to everyone via a database to maximize the scientific progress that can be made from these insights. This database and AlphaFold have the potential to open up new avenues of scientific inquiry that will ultimately advance our understanding of many areas of biology and life itself. We believe that this will have a transformative impact for research on problems related to health and disease, the drug design process and environmental sustainability, and are very excited to see what applications are developed in the coming months and years.”
John Jumper, PhD, AlphaFold Lead, DeepMind
“As the database expands, models will be available for almost every cataloged protein. AlphaFold DB is likely to transform how we approach bioinformatics, the large-scale study of DNA and proteins, as it will enable us to study the proteins of all known organisms with near-atomic precision. We are optimistic that the promise and machine learning advances of AlphaFold will spur the development of an exciting new phase of protein research, where deep learning tools enable quantitative understanding of biology hand-in-hand with experimental methods.”
Kathryn Tunyasuvunakool, PhD, Research Scientist, DeepMind
“AlphaFold models can be used to help determine structures through experimental methods. Having a sufficiently accurate initial prediction of the structure will allow researchers to revisit and solve old X-ray datasets and cryo-EM maps for which model building wasn’t previously possible. This is a great example of how computational methods are complementary to experimental approaches.”
Statements from EMBL:
Prof. Dame Janet Thornton, Director Emeritus of EMBL-EBI
“The power of AI underlies the AlphaFold predictions, based on data gathered by scientists all over the world during the last 50 years. Making these models available will undoubtedly galvanize both the experimental and theoretical protein structure researchers to apply this new knowledge to their own areas of research and to open up new areas of interest. This contributes to our knowledge and understanding of living systems, with all the opportunities for humanity this will unlock.”
Sameer Velankar, PhD, Section Head at EMBL-EBI
“Twenty years on from the human genome revolution, AlphaFold is a significant breakthrough in biological research. Protein function is dictated by its structure, and the AlphaFold Protein Structure Database will deliver millions of predicted protein structures, accelerating the discovery process. The unprecedented scale will unleash a new wave of innovations to help us address challenges from health to climate change.”
Dr. Christoph Müller, Head of Structural and Computational Biology Unit, EMBL
“This is a huge step forward. AlphaFold structure predictions will greatly speed up structural biology research and will put three-dimensional protein structures even more into the limelight in life sciences research.”
“Highly accurate protein structure prediction for the human proteome” by Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michal Zielinski, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, Sameer Velankar, Gerard J. Kleywegt, Alex Bateman, Richard Evans, Alexander Pritzel, Michael Figurnov, Olaf Ronneberger, Russ Bates, Simon A. A. Kohl, Anna Potapenko, Andrew J. Ballard, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Ellen Clancy, David Reiman, Stig Petersen, Andrew W. Senior, Koray Kavukcuoglu, Ewan Birney, Pushmeet Kohli, John Jumper and Demis Hassabis, 22 July 2021, Nature.
“Highly accurate protein structure prediction with AlphaFold” by John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli and Demis Hassabis, 15 July 2021, Nature.