Using machine learning to create new proteins by translating proteins into musical scores.
Proteins are the building blocks of life, and consequently, scientists have long studied how they can improve proteins and design completely new proteins that perform new functions and processes.
Traditionally, new proteins are created by either mimicking existing proteins or manually editing the amino acids that make up the proteins. This process, however, is time-consuming, and it is difficult to predict the impact of changing any one of the amino acid components of a given protein.
In this week’s APL Bioengineering, from AIP Publishing, researchers in the United States and Taiwan explore how to create new proteins by using machine learning to translate protein structures into musical scores, presenting an unusual way to translate physics concepts across disparate domains.
Each of the 20 amino acids that make up proteins has a unique vibrational frequency. The chemical structure of entire proteins can consequently be mapped with audible representations, using known concepts from music theory like note volume, melody, chords and rhythm. The specific sounds generated, determined by the way a protein folds, can be used to train deep learning neural networks.
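As a rough illustration of this kind of mapping, the sketch below assigns each of the 20 standard amino acids (by one-letter code) to a distinct musical pitch and converts a sequence to notes and back. The pitch assignment, the `BASE_MIDI_NOTE` value, and all function names are assumptions made for illustration; they are not the vibrational frequencies or the encoding used in the study, which also draws on volume, chords and rhythm.

```python
# Hypothetical mapping: the 20 standard amino acids (one-letter codes)
# assigned to consecutive MIDI note numbers. These are NOT the
# vibrational frequencies used by the researchers.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASE_MIDI_NOTE = 48  # C3, an arbitrary starting pitch (assumption)

NOTE_OF = {aa: BASE_MIDI_NOTE + i for i, aa in enumerate(AMINO_ACIDS)}
AA_OF = {note: aa for aa, note in NOTE_OF.items()}


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def sequence_to_notes(sequence):
    """Translate an amino acid sequence into a list of MIDI note numbers."""
    return [NOTE_OF[aa] for aa in sequence.upper()]


def notes_to_sequence(notes):
    """Translate a list of MIDI note numbers back into an amino acid sequence."""
    return "".join(AA_OF[n] for n in notes)


melody = sequence_to_notes("MKV")
frequencies = [midi_to_hz(n) for n in melody]
```

Because every amino acid has its own pitch, the mapping is reversible: a melody produced by a model can be translated back into a candidate sequence with `notes_to_sequence`.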
“These networks learn to understand the complex language folded proteins speak at multiple time scales,” said Markus J. Buehler, from the Massachusetts Institute of Technology. “And once the computer has been given a seed of a sequence, it can extrapolate and design entirely new proteins by improvising from this initial idea, while considering various levels of musical variations — controlled through a temperature parameter — during the generation.”
The team compared the new proteins against a large database with information about all known proteins, then examined them using molecular dynamics equilibration and characterized them with normal mode analysis. Through these steps, the researchers demonstrated that the method could design proteins that nature had not yet invented. The new proteins appear to be stable, folded designs, and the scientists created an algorithm to materialize the music, going from sound waves to matter.
“This paves the way for making entirely new biomaterials,” said Buehler. “Or perhaps you find an enzyme in nature and want to improve how it catalyzes or come up with new variations of proteins altogether.”
By adjusting the temperature parameter, researchers can increase the number of variations the algorithm creates. The resulting mutations can then be measured to see which are most effective as enzymes, for example.
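In generative models, a temperature parameter commonly works by rescaling the model's scores before they are turned into probabilities: low temperatures concentrate probability on the top choice, while high temperatures spread it across more options. The sketch below shows that general technique, assuming a softmax over made-up scores for four amino acids. The scores and the function names are hypothetical, and the researchers' actual network is not reproduced here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, flattened or sharpened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(symbols, logits, temperature, rng):
    """Draw one symbol according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(symbols, weights=probs, k=1)[0]


symbols = ["A", "G", "L", "K"]
logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores, for illustration only

cold = softmax_with_temperature(logits, 0.5)  # favors the top-scoring symbol
hot = softmax_with_temperature(logits, 2.0)   # spreads probability more evenly

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [sample_next(symbols, logits, 2.0, rng) for _ in range(10)]
```

Comparing `cold` and `hot` shows the effect: the top-scoring symbol receives a larger share of the probability at the lower temperature, so a higher setting yields more varied output.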
The “protein music” (listen on SoundCloud) the researchers uncovered could also help create new compositional techniques in classical music by illuminating the rhythms and tones of proteins, a method Buehler refers to as materiomusic.
“In the evolution of proteins over thousands of years, nature also gives us new ideas for how sounds can be combined and merged,” said Buehler.
Reference: “Sonification based de novo protein design using artificial intelligence, structure prediction and analysis using molecular modeling” by Markus J. Buehler and Chi Hua Yu, 17 March 2020, APL Bioengineering.