Figure - available from: The FEBS Journal
Structure of avidin. (A) The homotetrameric avidins consist of four identical monomers shown in cyan, magenta, red and yellow. Each monomer binds one biotin molecule. (B) The structure of the monomeric subunit of hen egg‐white avidin (representative of all other avidins). The monomer consists of a β‐barrel topology formed by eight anti‐parallel β‐strands connected by hairpin loops. The binding site of biotin (black) is positioned at the wide edge of the β‐barrel. (C) The monomer‐monomer interactions of the tetrameric avidins. The 1–2 interaction is shown, where Trp‐110 (indicated by an arrow) is contributed from one monomer to the biotin‐binding site of the other. The 1–3 interaction involves only 3–4 residues (not shown) from each monomer and exhibits the smallest contact surface. The extensive 1–4 sandwich‐like interaction involves numerous amino acid residues from each monomer (not shown). All molecular graphics presented in the figures were generated using pymol (The PyMOL Molecular Graphics System, Version 1.7; Schrödinger, LLC, New York, NY, USA).
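Since the caption notes that the figures were rendered with PyMOL, a rendering like panel (A)/(B) can be sketched with PyMOL's Python API. This is a minimal sketch, not the authors' actual script: the PDB entry 2AVI (an avidin–biotin complex), the chain identifiers A–D and the ligand code BTN are assumptions to adjust for the structure actually used.

```python
# Minimal PyMOL (Python API) sketch of a panel-(A)-style rendering.
# Assumptions: PDB entry 2avi; chains A-D; biotin ligand code BTN.
# The deposited asymmetric unit may contain fewer chains (the tetramer
# can be crystal-symmetry generated); missing chains are silently skipped.
from pymol import cmd

cmd.fetch("2avi")                      # avidin-biotin complex (assumed entry)
cmd.bg_color("white")
cmd.hide("everything")
cmd.show("cartoon")                    # one beta-barrel cartoon per monomer

# Color the four monomers as in the caption: cyan, magenta, red, yellow.
for chain, color in zip(("A", "B", "C", "D"),
                        ("cyan", "magenta", "red", "yellow")):
    cmd.color(color, f"chain {chain}")

cmd.select("biotin", "resn BTN")       # BTN is the PDB component ID for biotin
cmd.show("sticks", "biotin")
cmd.color("black", "biotin")           # biotin shown in black, as in panel (B)
cmd.orient()
cmd.png("avidin_tetramer.png", dpi=300, ray=1)
```

Run inside PyMOL or with the open-source pymol package; the same commands work interactively from the PyMOL command line without the `cmd.` prefix.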

Source publication
Article
The dimeric avidin family has expanded in recent years to include many new members. All of them lack the intermonomeric Trp that plays a critical role in biotin‐binding; nevertheless, these new members of the avidin family maintain high affinity towards biotin. Additionally, all of the dimeric avidins share a unique property: namely, the cy...

Similar publications

Article
Immunoassays are analytical test methods in which analyte quantitation is based on the signal responses generated by an antibody–antigen interaction. They are the method of choice for measuring a large panel of diagnostic markers. Not only are they fully automated, allowing for a short turnaround time and high throughput, b...

Citations

... The small classes have a dozen or even fewer entries, many of them with high sequence similarity, thus clustering together and providing less additional information. Moreover, the changes needed to shift a sequence from one qs to another may be very small, involving as few as 5 residues and in some cases even a single point mutation [4,27]. ...
Article
Background: Determining a protein’s quaternary state, i.e. the number of monomers in a functional unit, is a critical step in protein characterization. Many proteins form multimers for their activity, and over 50% are estimated to naturally form homomultimers. Experimental quaternary state determination can be challenging and require extensive work. To complement these efforts, a number of computational tools have been developed for quaternary state prediction, often utilizing experimentally validated structural information. Recently, dramatic advances have been made in the field of deep learning for predicting protein structure and other characteristics. Protein language models, such as ESM-2, that apply computational natural-language models to proteins successfully capture secondary structure, protein cell localization and other characteristics, from a single sequence. Here we hypothesize that information about the protein quaternary state may be contained within protein sequences as well, allowing us to benefit from these novel approaches in the context of quaternary state prediction.

Results: We generated ESM-2 embeddings for a large dataset of proteins with quaternary state labels from the curated QSbio dataset. We trained a model for quaternary state classification and assessed it on a non-overlapping set of distinct folds (ECOD family level). Our model, named QUEEN (QUaternary state prediction using dEEp learNing), performs worse than approaches that include information from solved crystal structures. However, it successfully learned to distinguish multimers from monomers, and predicts the specific quaternary state with moderate success, better than simple sequence similarity-based annotation transfer. Our results demonstrate that complex, quaternary state related information is included in such embeddings.

Conclusions: QUEEN is the first to investigate the power of embeddings for the prediction of the quaternary state of proteins. As such, it lays out strengths as well as limitations of a sequence-based protein language model approach, compared to structure-based approaches. Since it does not require any structural information and is fast, we anticipate that it will be of wide use both for in-depth investigation of specific systems, as well as for studies of large sets of protein sequences. A simple colab implementation is available at: https://colab.research.google.com/github/Furman-Lab/QUEEN/blob/main/QUEEN_prediction_notebook.ipynb.
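The pipeline the abstract describes (sequence → ESM-2 embedding → quaternary-state classifier) is easy to sketch. The following is a minimal illustration, assuming the fair-esm and scikit-learn packages; the sequences, labels, and logistic-regression head are placeholders, not QUEEN's actual model or training setup, which the paper itself describes.

```python
import torch
import esm  # fair-esm package (pip install fair-esm)
from sklearn.linear_model import LogisticRegression

# Load ESM-2 (650M-parameter checkpoint, ~2.5 GB download on first use)
# together with its alphabet/tokenizer.
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

def embed(sequence: str) -> torch.Tensor:
    """Mean-pooled per-sequence ESM-2 embedding (final layer, 33)."""
    _, _, tokens = batch_converter([("seq", sequence)])
    with torch.no_grad():
        out = model(tokens, repr_layers=[33])
    reps = out["representations"][33]
    # Skip the BOS token and average over residue positions.
    return reps[0, 1:len(sequence) + 1].mean(dim=0)

# Hypothetical training data: sequences with quaternary-state labels
# (e.g. 1 = monomer, 2 = dimer, 4 = tetramer), QSbio-style annotations.
train_seqs = ["MKTAYIAKQR", "GSHMKLVINE"]   # placeholder sequences
train_labels = [4, 2]                        # placeholder labels

X = torch.stack([embed(s) for s in train_seqs]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)

# Predict the quaternary state of a new (placeholder) sequence.
query = embed("MVHATSPLLL").numpy().reshape(1, -1)
print(clf.predict(query))
```

In practice, QUEEN trains on the full QSbio-derived dataset with ECOD-family-level splits; the scikit-learn head above merely stands in for whatever classifier the authors used on top of the embeddings.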
... information. Moreover, the changes needed to shift a sequence from one qs to another may be very small, involving as few as 5 residues and in some cases even a single point mutation (4,25). In this study we examine the power of pLM embeddings, derived from the pre-trained pLM ESM2 model (23), to capture and consequently classify the qs of proteins. ...
Preprint
Background: Determining a protein's quaternary state, i.e. how many monomers assemble together to form the functioning unit, is a critical step in protein characterization, and deducing it is not trivial. Many proteins form multimers for their activity, and over 50% are estimated to naturally form homomultimers. Experimental quaternary state determination can be challenging and require extensive work. To complement these efforts, a number of computational tools have been developed for quaternary state prediction, often utilizing experimentally validated structural information. Recently, dramatic advances have been made in the field of deep learning for predicting protein structure and other characteristics. Protein language models that apply computational natural-language models to proteins successfully capture secondary structure, protein cell localization and other characteristics, from a single sequence. Here we hypothesize that information about the protein quaternary state may be contained within protein sequences as well, allowing us to benefit from these novel approaches in the context of quaternary state prediction.

Results: We generated embeddings for a large dataset of quaternary state labels, extracted from the curated QSbio dataset. We then trained a model for quaternary state classification and assessed it on a non-overlapping set of distinct folds (ECOD family level). Our model, named QUEEN (QUaternary state prediction using dEEp learNing), performs worse than approaches that include information from solved crystal structures. However, we show that it successfully learned to distinguish multimers from monomers, and that the specific quaternary state is predicted with moderate success, better than a simple model that transfers annotation based on sequence similarity. Our results demonstrate that complex, quaternary state related information is included in these embeddings.

Conclusions: QUEEN is the first to investigate the power of embeddings for the prediction of the quaternary state of proteins. As such, it lays out the strength as well as limitations of a sequence-based protein language model approach compared to structure-based approaches. Since it does not require any structural information and is fast, we anticipate that it will be of wide use both for in-depth investigation of specific systems, as well as for studies of large sets of protein sequences. A simple colab implementation is available at: https://colab.research.google.com/github/Orly-A/QUEEN_prediction/blob/main/QUEEN_prediction_notebook.ipynb.
... The small classes have a dozen or even fewer entries, many of them with high sequence similarity, thus clustering together and providing less additional information. Moreover, the changes needed to shift a sequence from one qs to another may be very small, involving as few as 5 residues and in some cases even a single point mutation (4,25). ...
Preprint
Background: Determining a protein’s quaternary state, i.e. the number of monomers in a functional unit, is a critical step in protein characterization. Many proteins form multimers for their activity, and over 50% are estimated to naturally form homomultimers. Experimental quaternary state determination can be challenging and require extensive work. To complement these efforts, a number of computational tools have been developed for quaternary state prediction, often utilizing experimentally validated structural information. Recently, dramatic advances have been made in the field of deep learning for predicting protein structure and other characteristics. Protein language models, such as ESM-2, that apply computational natural-language models to proteins successfully capture secondary structure, protein cell localization and other characteristics, from a single sequence. Here we hypothesize that information about the protein quaternary state may be contained within protein sequences as well, allowing us to benefit from these novel approaches in the context of quaternary state prediction.

Results: We generated ESM-2 embeddings for a large dataset of proteins with quaternary state labels from the curated QSbio dataset. We trained a model for quaternary state classification and assessed it on a non-overlapping set of distinct folds (ECOD family level). Our model, named QUEEN (QUaternary state prediction using dEEp learNing), performs worse than approaches that include information from solved crystal structures. However, it successfully learned to distinguish multimers from monomers, and predicts the specific quaternary state with moderate success, better than simple sequence similarity-based annotation transfer. Our results demonstrate that complex, quaternary state related information is included in such embeddings.

Conclusions: QUEEN is the first to investigate the power of embeddings for the prediction of the quaternary state of proteins. As such, it lays out strengths as well as limitations of a sequence-based protein language model approach, compared to structure-based approaches. Since it does not require any structural information and is fast, we anticipate that it will be of wide use both for in-depth investigation of specific systems, as well as for studies of large sets of protein sequences. A simple colab implementation is available at: https://colab.research.google.com/github/Orly-A/QUEEN_prediction/blob/main/QUEEN_prediction_notebook.ipynb.