From Chess to AI to Neural Network to Chemistry Nobel Prize 2024
Prof. Homendra Naorem *
As the festive season begins in India, the Royal Swedish Academy of Sciences announced on October 9, 2024 that the Chemistry Nobel Prize 2024 will be shared by David Baker of the University of Washington, Seattle, USA (half the prize money) and Demis Hassabis and John M. Jumper of Google DeepMind, London, UK (the remaining half), in recognition of their pioneering work on computational protein design and protein structure prediction, respectively.
The medal and the prize money will be presented personally to the winners at a solemn ceremony to be held on December 10, 2024, the death anniversary of Alfred Nobel, the noble multi-millionaire chemist who instituted the most prestigious prize on earth. Paradoxical as it may appear, none of the three winners of this year’s Chemistry Nobel Prize had any initial formal training in conventional chemistry, not even in theoretical quantum chemistry calculations!
David Baker, who studied philosophy and social science at Harvard University, switched to biology in his final years after developing an interest in cell biology from reading the textbook Molecular Biology of the Cell as part of a course in evolutionary biology. He eventually earned his PhD from the University of California, Berkeley, studying protein transport in yeast.
Fascinated by the nature of protein folding, he started investigating how different sequences of amino acids form the variety of proteins with definite structures found in cells. Demis Hassabis, on the other hand, was a child prodigy in chess who played at master level at the tender age of 13 and went on to become a successful programmer and game developer. He is a co-founder of DeepMind, a company that developed masterful AI models for popular board games.
He also studied the brain and neuroscience in order to develop better neural networks for Artificial Intelligence (AI). John Jumper studied physics, earning a master's degree from the University of Cambridge. He then worked on simulation methods for protein dynamics and, during his Ph.D. at the University of Chicago, shifted to computational biology, where he applied machine learning to explore the physics of protein folding.
He then joined Google’s DeepMind, an AI development company, in 2017 as a research scientist working on AlphaFold, which predicts the 3-D structure of proteins using machine-learning algorithms. Despite coming from different backgrounds, they all had one common interest, PROTEIN: designing new proteins and predicting their structures with computer algorithms based on machine learning and AI.
Inexplicable as it is, life is composed of lifeless chemicals, among which proteins are perhaps the most important since they carry out the major structural and functional roles in living systems. The major constituents of proteins are carbon, hydrogen, oxygen and nitrogen, with sulfur and phosphorus sometimes present as minor constituents. Seen through the prism of chemistry, proteins are polymers formed by various arrangements of 20 different monomer units known as amino acids.
Amino acids are simple organic (carbon) compounds with an amine (-NH2) group and an acid (-COOH) group covalently linked to a central carbon atom (known as the alpha carbon), which is also bonded to a side chain (the R group). In fact, the term amino acid is a short form of alpha-amino carboxylic acid. Two or more amino acids polymerize when the acid group of one amino acid combines with the alpha-amine group of another to form a peptide bond (-CO-NH-) with the elimination of a water molecule.
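The bookkeeping of this condensation can be illustrated with a small calculation: every peptide bond formed releases one water molecule, so a chain of n amino acids weighs the sum of its monomers minus (n - 1) waters. The sketch below is only an illustration; the function name `peptide_mass` and the two example amino acids (glycine and alanine, with their standard molar masses) are choices made here, not part of the original article.

```python
# Illustrative sketch: molar mass of a linear peptide from its amino acids.
# Each peptide bond (-CO-NH-) is formed with the loss of one water molecule.

WATER = 18.015  # g/mol

# Standard molar masses (g/mol) of two free amino acids, chosen as examples.
AMINO_ACID_MASS = {
    "Gly": 75.067,  # glycine, C2H5NO2
    "Ala": 89.094,  # alanine, C3H7NO2
}

def peptide_mass(residues):
    """Return the molar mass of a linear peptide built from `residues`."""
    if not residues:
        raise ValueError("a peptide needs at least one amino acid")
    total = sum(AMINO_ACID_MASS[name] for name in residues)
    bonds = len(residues) - 1  # n amino acids form n - 1 peptide bonds
    return total - bonds * WATER

# The dipeptide Gly-Ala: two amino acids, one peptide bond, one water lost.
print(f"Gly-Ala: {peptide_mass(['Gly', 'Ala']):.3f} g/mol")
```

A single amino acid forms no peptide bond, so its mass is returned unchanged.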
Depending upon the number of amino acids polymerized, the products are known as dipeptides, tripeptides, oligopeptides (up to about 20 amino acids) or polypeptides (up to about 50 amino acids). When the number of amino acids in the polymer exceeds about 50, they are known as proteins. Of about 300 known amino acids, only 20 are commonly found in the proteins of the human body.
Since proteins are formed by polymerization of these 20 amino acids, a huge number of proteins can be formed by arranging them in different orders. For a chain of 2 amino acids, there are 20^2 or 400 possible dipeptides, for 3 there are 20^3 or 8000 possible tripeptides, and so on. The number becomes frustratingly huge even for 10 amino acids, let alone for a protein!
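The growth of these numbers is easy to check. The short sketch below simply evaluates 20^n for a few chain lengths; the function name `count_sequences` is chosen here for illustration, and it counts ordered sequences only, ignoring whether a given chain could actually exist as a stable molecule.

```python
# Illustrative sketch: how many distinct chains of n amino acids are possible
# when each position can hold any one of the 20 amino acids.

AMINO_ACIDS = 20

def count_sequences(length, alphabet=AMINO_ACIDS):
    """Number of distinct ordered chains of `length` amino acids."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet ** length

for n in (2, 3, 10, 50):
    print(f"{n:>2} amino acids: {count_sequences(n):.3e} possible chains")
```

For 10 amino acids the count is already about ten trillion, and for a small protein of 50 amino acids it exceeds 10^65.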
Despite these colossal possibilities, only a limited number of proteins with particular amino acid sequences and structures are found in the body. How does nature select and design the sequence of amino acids in proteins and their structure? That is a baffling question! Moreover, knowing the sequence of amino acids alone is not enough to understand a protein’s role in life processes unless its structure is also known.
Determining the structure of a protein is intellectually very challenging, since joining 2 or more amino acids through peptide linkages does not yield a straight polymer chain but a nonplanar one, with each alpha carbon retaining its tetrahedral geometry. This leads to twisting or folding of the polymer chain.
Therefore, protein structure is approached through four levels, namely primary, secondary, tertiary and quaternary structure. In brief, the primary structure gives the number and sequence of the amino acids in the protein, the secondary structure gives the local spatial conformation or folding of the chain, the tertiary structure gives the overall three-dimensional structure of the protein, while the quaternary structure describes the arrangement of the different units when more than one polypeptide chain (each known as a subunit) aggregate to form a protein.
Because of the complexities involved, the structures of only some proteins have been fully established experimentally so far, while those of many more are yet to be established. How does a protein fold in a particular pattern? Is it a random process? Christian Anfinsen showed in 1961 that a protein’s three-dimensional structure is entirely governed by the sequence of amino acids in the protein, for which he received the Chemistry Nobel Prize in 1972.
However, a protein with a given sequence of amino acids can in principle fold in an enormous number of ways: with 100 amino acids, it can have as many as ten to the power of 47 (10^47) different 3-D structures! Yet in a cell it adopts only one particular fold! Moreover, if the protein is somehow made to unfold, it returns to the same fold, as if all the information about how the protein is to be folded were present in the amino acid sequence! How does the chain of amino acids fold? Can the folding of a protein be predicted from its sequence? These have remained enigmatic questions in the life sciences.
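A rough back-of-the-envelope calculation shows where a number of this size can come from. The sketch below assumes, purely for illustration, that each amino acid in the chain can adopt 3 distinct conformations and that a protein could try 10^13 conformations per second; both figures are assumptions made here and are not taken from the article.

```python
# Illustrative sketch: why a protein cannot find its fold by random search.
# ASSUMPTIONS (not from the article): 3 conformations per amino acid and
# 1e13 conformations sampled per second.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def conformation_count(residues, states_per_residue=3):
    """Total conformations if every amino acid varies independently."""
    return states_per_residue ** residues

def order_of_magnitude(n):
    """Power of ten of a positive integer (e.g. 5000 -> 3)."""
    return len(str(n)) - 1

def years_to_sample(conformations, samples_per_second=1e13):
    """Years needed to visit every conformation once at the assumed rate."""
    return conformations / samples_per_second / SECONDS_PER_YEAR

total = conformation_count(100)
print(f"about 10^{order_of_magnitude(total)} conformations")
print(f"about {years_to_sample(total):.1e} years to try them all")
```

Under these assumptions, a chain of 100 amino acids has on the order of 10^47 conformations, and trying them one by one would take vastly longer than the age of the universe, which is why folding cannot be a random search.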
Considering the complexities involved in the experimental determination of protein structures, a group known as Critical Assessment of Protein Structure Prediction (CASP) began in 1994 a kind of competition to predict the structures of proteins from their known amino acid sequences. The structures of these proteins have been established experimentally but are not revealed to the competitors.
The prediction accuracy could not get better than about 40% until Demis Hassabis entered the CASP competition in 2018 with his AI model AlphaFold, which raised the accuracy to about 60%, better but still not acceptable. Meanwhile, John Jumper, with his background in simulating protein dynamics, had joined Hassabis at Google’s DeepMind, and together they developed AlphaFold2 using innovative breakthroughs in AI and neural network models.
With the new AI architecture, it delivered strikingly good results in the 14th CASP competition (2020). The predictions had improved so greatly that CASP’s organisers realized in 2020 that the age-old challenge of protein structure prediction was over: AlphaFold2 could predict a structure almost as well as X-ray crystallography can determine it, which left CASP’s founders with the question ‘what now’?
Hassabis and Jumper went on to generate the structures of all human proteins, each prediction taking just seconds or minutes, and eventually predicted the structures of virtually all of the 200 million proteins discovered so far! Baker, too, had participated in the CASP competition, in 1998, using Rosetta, the computer software he developed to predict protein structures from a given sequence of amino acids.
His program did really well but not well enough, which made him change his approach: instead of feeding amino acid sequences into Rosetta to get structures, he started with a desired protein structure to obtain possible amino acid sequences, a kind of reverse engineering! This reverse approach enabled his group to create entirely new proteins with designed sequences of amino acids, which made Baker a de novo protein designer and constructor.
The software proved so successful that when a designed protein (Top7), whose amino acid sequence was proposed by Rosetta and which was then produced in bacteria, had its structure determined by X-ray diffraction, the result matched the design almost exactly. Since then, many spectacular proteins have been created in Baker’s laboratory. In the best scientific traditions, Google DeepMind has made the code for AlphaFold2 publicly available and Baker has also released the code for Rosetta, so that the global research community can not only use them but also continue developing better software for wider use and find new areas of application.
Developing computer programs to predict the structure of a protein or its amino acid sequence requires a deep understanding of the underlying principles of chemical bonding and of the major driving forces behind protein folding, such as electrostatic interactions, hydrogen bonds, hydrophobic effects, disulfide bonds and van der Waals forces.
Despite having no formal training in chemistry, all three Nobel laureates mastered these principles and used them effectively in developing their algorithms, skilfully applying machine learning and AI to predict protein structures or construct new proteins.
In the process, they used the available data on almost all known protein structures, obtained through X-ray diffraction (XRD) or other experimental methods, to extract patterns before developing the AI-driven programs. Large sets of experimental data are necessary for the development of AI-driven predictive programs.
The awarding of the Nobel Prize in Chemistry for using AI to predict protein structure, and of the Nobel Prize in Physics for foundational work in machine learning with artificial neural networks, marks the entry of AI into the sciences in a big way. Soon, AI will be reshaping chemistry and the chemical industry with algorithms that accelerate molecular design and change how chemists solve complex structural problems.
Apparently, the majority of chemists are currently in an exuberant race to synthesize and ascertain the structures of a host of new compounds. The Chemistry Nobel 2024 prompts them to start looking for patterns or design in the formation of these compounds and their structures, with natural if not artificial intelligence, before some outsider trained in AI and machine learning uses their data to develop a program that can predict the whole library of new compounds they have made, including the ones they could still make!
The message of this year’s Nobel Prizes in Physics and Chemistry, therefore, is to make AI and machine learning an integral part of undergraduate courses in physics as well as chemistry. Are we ready for this in India or, nearer home, in Manipur?
(This is a popular article intended for the common reader with an interest in the Chemistry Nobel Prize)
* Prof. Homendra Naorem wrote this article for e-pao.net
The writer is at the Department of Chemistry at Manipur University, Canchipur
and can be contacted at naorem(DOT)homendra(AT)gmail(DOT)com
This article was webcasted on October 29 2024.