Article (Literature Review)

Prediction of protein–protein interactions by docking method


Abstract

Recently, developments have been made in predicting the structure of docked complexes when the coordinates of the components are known. The process generally consists of a stage during which the components are combined rigidly and then a refinement stage. Several rapid new algorithms have been introduced in the rigid docking problem and promising refinement techniques have been developed, based on modified molecular mechanics force fields and empirical measures of desolvation, combined with minimisations that switch on the short-range interactions gradually. There has also been progress in developing a benchmark set of targets for docking and a blind trial, similar to the trials of protein structure prediction, has taken place.
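The two-stage process described in the abstract (a rigid combination stage, then a refinement stage in which short-range interactions are switched on gradually) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the point-cloud "proteins", the Gaussian contact reward, the 3 Å clash threshold, the translation-only search, and the names `score`, `rigid_stage`, and `refine` are hypothetical and are not taken from any of the reviewed methods.

```python
import numpy as np

def score(receptor, ligand, clash_weight):
    """Toy score (lower is better): reward ~4 A contacts, penalise overlaps."""
    d = np.linalg.norm(receptor[:, None, :] - ligand[None, :, :], axis=-1)
    attraction = -np.sum(np.exp(-((d - 4.0) ** 2) / 2.0))  # contact reward
    repulsion = np.sum(np.clip(3.0 - d, 0.0, None) ** 2)   # short-range clash term
    return float(attraction + clash_weight * repulsion)

def rigid_stage(receptor, ligand, shifts, keep=3, clash_weight=0.1):
    """Rank rigid translations with a soft (clash-tolerant) score; keep the best."""
    ranked = sorted(shifts, key=lambda s: score(receptor, ligand + s, clash_weight))
    return [np.asarray(s, dtype=float) for s in ranked[:keep]]

def refine(receptor, ligand, shift, weights=(0.1, 1.0, 10.0), step=0.25, iters=50):
    """Greedy local search, repeated while the clash weight is ramped up."""
    shift = np.asarray(shift, dtype=float).copy()
    moves = np.vstack([np.eye(3), -np.eye(3)]) * step  # +/- step along x, y, z
    best = score(receptor, ligand + shift, weights[0])
    for w in weights:
        best = score(receptor, ligand + shift, w)
        for _ in range(iters):
            trials = [score(receptor, ligand + (shift + m), w) for m in moves]
            i = int(np.argmin(trials))
            if trials[i] >= best:
                break  # no neighbouring translation improves the score
            shift, best = shift + moves[i], trials[i]
    return shift, best

# Hypothetical three-point "receptor" and two-point "ligand" (coordinates in A).
receptor = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
ligand = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
shifts = np.array([[x, 0.0, z] for x in (-2.0, 0.0, 2.0) for z in (1.0, 3.0, 5.0, 7.0)])

candidates = rigid_stage(receptor, ligand, shifts)
refined_shift, refined_score = refine(receptor, ligand, candidates[0])
```

The rigid stage deliberately uses a small clash weight so that near-correct poses with minor overlaps are not discarded early. The refinement stage then raises that weight step by step, which is one simple way to mimic a gradual switch-on of short-range terms. Real docking programs also search rotations and use far richer energy functions.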


... This approach uses as prediction features the structural representation of a protein, that is, its shape, receptor surfaces, helices, 3D structural information, and many others [Smith and Sternberg 2002; Veber 2007]. ...
... The first consists of generating a large number of possible conformations (three-dimensional representations) for the association of the two proteins. A scoring function is then used to rank the different conformations [Smith and Sternberg 2002; Barradas-Bautista et al. 2018]. Docking methods are certainly very accurate, but they are very costly in computation time. ...
Thesis
Proteins are essential components of the biological cell. The various chemical interactions between proteins, called protein-protein interactions (PPIs), are linked to certain diseases as well as to certain therapies. Their identification therefore has important implications for several processes, including disease prevention, functional annotation, and drug design. Many PPIs have been detected by biological experiments over the past decades, but many remain undiscovered. Moreover, these biological experiments are limited by time and cost constraints. The development of computational tools is therefore strongly recommended. Such tools could accelerate drug discovery by reducing slow and costly biological experiments through faster and cheaper computer simulations, and could annotate protein function from protein sequences. Given the many experimentally detected PPIs whose information is available in protein databases, several inference and machine learning tools have been proposed to predict PPIs. Designing such tools involves two steps: extracting features (descriptors) from sequence information, and classifying interactions with a supervised learning algorithm. However, the features extracted by existing techniques do not allow supervised learning algorithms to be effective and to achieve ideal results.
Our objective is thus to improve predictive performance by designing new complementary techniques that allow supervised learning algorithms to correctly infer interactions from protein sequence data and to achieve ideal results. In this thesis, we first proposed a feature extraction technique called BP (Bigram-Physicochemical). This technique extracts bigram features from protein sequences for effective machine learning. A bigram is a set of two successive letters (here, amino acids) in a text document (here, the protein sequence). For a given protein, BP first computes a matrix from information on the physicochemical properties of the protein. This matrix can be obtained either from a distance (the BP1 approach) or from a function (the BP approach). BP then extracts bigram features using the computed matrix. Unlike some existing bigram feature extraction techniques, BP does not produce strictly sparse vectors and does not depend on a database. Second, we proposed a new approach, called SVOH, for selecting optimal hyperparameter values of a machine learning model. Hyperparameters are the influential parameters of a machine learning model. Generally, grid search is combined with k-fold cross-validation to search for optimal hyperparameter values. Unlike the literature, which fixes k (the number of subsets of the sample) at 5 or 10 on a priori grounds, SVOH learns k in order to determine an optimal number of subsets.
The approach thus makes it possible to search rigorously for optimal hyperparameter values over an adjusted number k of subsets. The techniques described in the thesis, combined with a support vector machine (SVM) method to form the tools SVM-BP and SVM-BP1, are tested and validated on three different real PPI datasets: human PPIs from HPRD, PPIs from the yeast Saccharomyces cerevisiae, and PPIs from the bacterium Helicobacter pylori. Comparative experiments showed that these tools, and SVM-BP in particular, achieved superior performance on the three PPI datasets in terms of accuracy, precision, and sensitivity. We can say that the SVM-BP and SVM-BP1 tools do improve PPI predictive performance and thus provide real help to biologists in identifying protein-protein interactions and in drug research.
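To make the notion of bigram features concrete, here is a minimal sketch of plain bigram-frequency extraction from a protein sequence. It is not the BP technique from the thesis, which derives its features from a physicochemical matrix that the abstract does not specify. The 20-letter alphabet, the normalisation by total count, and the function names `bigram_features` and `pair_features` are assumptions chosen for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
BIGRAMS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def bigram_features(sequence):
    """Return a 400-dimensional vector of bigram frequencies for one sequence."""
    counts = dict.fromkeys(BIGRAMS, 0)
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):  # successive residue pairs
        pair = first + second
        if pair in counts:  # skip non-standard symbols such as X or B
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[bg] / total if total else 0.0 for bg in BIGRAMS]

def pair_features(seq_a, seq_b):
    """Concatenate both proteins' vectors to describe a candidate interaction."""
    return bigram_features(seq_a) + bigram_features(seq_b)

features = bigram_features("ACDA")  # bigrams: AC, CD, DA
```

Plain counts like these are mostly zero for short sequences, which is the sparsity the abstract says BP avoids. A vector produced by `pair_features` could then be passed to any supervised classifier, such as the SVM mentioned above.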
... Therefore, computational modeling can be an efficient way to study the structure of the molecular interactions between various Fc mutants and FcγRIIIa. It is particularly suitable for investigating the structural effect of chain-specific mutation of Fc on binding, which is not easy to examine experimentally [10]. Fc is composed of two identical chains, and in practice it is hardly feasible to introduce a mutation into only one of the chains. ...
... Protein-protein docking simulation is a representative computational modeling method to predict a protein-protein binding mode at the molecular level. Although the prediction accuracy of protein-protein docking varies with the target proteins, optimizing the docking conditions for the target protein-protein complex generally allows the docking tool to be employed quite efficiently for the structural study of protein-protein interactions [10]. Our previous work investigated the computational modeling accuracy of a protein-protein docking tool, HADDOCK (High Ambiguity Driven protein-protein DOCKing) [12,13], for the Fc-FcγRIIIa complex and optimized its docking condition. ...
Article
Engineering of Fc for improved affinity to its receptor, FcγRIIIa, can enhance the therapeutic activity of monoclonal antibodies. The S239D/I332E mutation of Fc has been extensively employed in various Fc engineering studies. Still, it is not clear how the mutations structurally influence the molecular interactions between Fc and FcγRIIIa. In this study, the point or combined mutations of S239D/I332E were introduced computationally into one chain (chain A) or the other (chain B) of the homodimeric Fc domain. Their structural effects on the binding to FcγRIIIa were investigated through a computational docking method. The results showed that the chain-specific point mutation S239D induced a new salt-bridge with the receptor in both the A and B chains of Fc, whereas the I332E mutation generated a new salt-bridge with the receptor only in the A chain. The combined mutation study identified that the Fc variant with four mutations reproduced the three salt-bridges. This showed that the mutation of S239D and I332E in chain A of Fc induced complex salt-bridge formation with the Lys158 of FcγRIIIa. This study is expected to provide more structural insight into the design of Fc variants based on the S239D/I332E mutation.
... Computational modeling is an alternative approach in the structural study of protein-protein complexes. It allows us to study the structures of protein complexes without the laborious or sometimes difficult procedure of crystallization [8]. For instance, the computational approach can be efficiently employed in large-scale investigations of various mutational effects on protein-protein binding. ...
... Protein-protein docking simulation is a representative computational modeling method to identify a protein-protein binding mode at the molecular level [8]. In general, it searches for the stable binding modes of protein pairs using scoring functions based on binding energies, shape complementarity, and other information related to protein-protein interaction. ...
Research
Structural information on the Fc-Fc receptor interaction may contribute to the design of drugs or therapeutic antibodies associated with the interaction. Computational protein-protein docking can be employed in the structural study of protein-protein interactions, but its efficiency and reliability are still unstable and need to be validated and optimized for the respective target protein complexes. In this study, we investigated and assessed the computational modeling efficiency of the Fc-FcγR complex through HADDOCK by defining five different sets of active residues, a major parameter determining the prediction efficiency of HADDOCK. The binding residues identified experimentally or the residues in the binding pocket were confirmed to be efficient active residues for achieving a high prediction efficiency, whereas specifying the active residues too narrowly or too widely led to poor prediction efficiency. Most binding residues and crucial molecular interactions, such as conserved interactions and hydrogen bonds in the crystal structure, were reproduced in the best model. The HADDOCK docking condition determined in this study is expected to be applied to the computational characterization of various Fc-Fc receptor complexes and mutants.
... This has led to the widespread adoption of computational approaches for predicting protein-protein interaction (PPI). PPI refers to highly specific physical contact between protein molecules, including electrostatic forces, hydrogen bonding, and the hydrophobic effect, and plays important roles in various biological processes such as biochemical reactions, signaling, cell cycle control, and neurotransmission [30][31][32][33][34]. Techniques for predicting PPIs include docking and molecular dynamics simulations, which help to understand biological systems and disease mechanisms by structurally predicting the interfaces at which interaction takes place [35,36]. ...
Article
Myosins, a superfamily of motor proteins, obtain the energy they require for movement from ATP hydrolysis and perform various functions by binding to actin filaments. Extensive studies have clarified the diverse functions performed by the different isoforms of myosin. However, the unavailability of resolved structures has made it difficult to understand the way in which their mechanochemical cycle and structural diversity give rise to distinct functional properties. With this study, we seek to further our understanding of the structural organization of the myosin 7A motor domain by modeling the tertiary structure of myosin 7A based on its primary sequence. Multiple sequence alignment and a comparison of the models of different myosin isoforms and myosin 7A enabled us not only to identify highly conserved nucleotide binding sites but also to predict actin binding sites. In addition, the actomyosin-7A complex was predicted from the protein–protein interaction model, from which the core interface sites of actin and the myosin 7A motor domain were defined. Finally, sequence alignment and the comparison of models were used to suggest the possibility of a pliant region existing between the converter domain and lever arm of myosin 7A. The results of this study provide insights into the structure of myosin 7A that could serve as a framework for higher resolution studies in future.
... For the regulation and understanding of these biological processes, the crucial steps are determining the binding efficiencies and the structures of the particular interactions. Significantly, binding affinity, which regulates molecular interactions, determines whether complex formation takes place under certain circumstances (55). To determine the structural mechanisms behind the higher pathogenicity of various mutants of SARS-CoV-2, molecular docking of KPNA2 with WT ORF6 and its various mutants, including V9F, V24A, W27L, and I33T, was performed using the HDOCK server. ...
Article
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) surfaced on 31 December 2019 and was identified as the causative agent of the global COVID-19 pandemic, leading to a pneumonia-like disease. One of its accessory proteins, ORF6, has been found to play a critical role in immune evasion by interacting with KPNA2 to antagonize IFN signaling and production pathways, resulting in the inhibition of IRF3 and STAT1 nuclear translocation. Since various mutations have been observed in ORF6, a comparative binding, biophysical, and structural analysis was used to reveal how these mutations affect the virus’s ability to evade the human immune system. Among the identified mutations, V9F, V24A, W27L, and I33T were found to have a highly destabilizing effect on the protein structure of ORF6. Additionally, molecular docking analysis of wildtype and mutant ORF6 with KPNA2 revealed docking scores of -53.72 kcal/mol for the wildtype and -267.90 kcal/mol, -258.41 kcal/mol, -254.51 kcal/mol, and -268.79 kcal/mol for V9F, V24A, W27L, and I33T, respectively. Compared with the wildtype, V9F showed a stronger binding affinity with KPNA2, which is further verified by the binding free energy (-42.28 kcal/mol) calculation. Furthermore, to disrupt the binding interface of the ORF6-KPNA2 complex, we used a computational molecular search of potential natural products. A multi-step virtual screening of the African natural database identified the top 5 compounds, with best docking scores of -6.40 kcal/mol, -6.10 kcal/mol, -6.09 kcal/mol, -6.06 kcal/mol, and -6.03 kcal/mol for tophit1-5, respectively. Subsequent all-atom simulations of these top hits revealed consistent dynamics, indicating their stability and their potential to interact effectively with the interface residues.
In conclusion, our study represents the first attempt to establish a foundation for understanding the heightened infectivity of new SARS-CoV-2 variants and provides a strong impetus for the development of novel drugs against them.
... Identifying the structural determinants of these interactions and their binding energy is crucial for a deeper understanding and regulation of these processes. Notably, the binding affinity, which determines whether or not complex formation occurs under particular circumstances, holds the key to regulating molecular interactions (e.g., engineering high-affinity interactions), developing novel therapeutics (e.g., guiding rational drug design), or predicting the effect of variations on protein interfaces [50]. The binding affinity has been calculated for decades by different methodologies, ranging from exact approaches (e.g., free energy perturbation) that are precise but computationally expensive to empirical methods (e.g., scoring functions in docking, various regression models), which are fast and accurate [51]. ...
Research
Studies on nonhuman primates and on wild-type and transgenic mice have shown the presence of SARS-CoV-2 RNA components in the brain. Although the Blood-Brain Barrier (BBB) provides protection, there is little evidence on how SARS-CoV-2 crosses it. Given the increase in Omicron reinfection rates, its transmissibility, and its involvement in neurological dysfunction, we set out to investigate how the Omicron variant (B.1.1.529) binds structurally to key BBB-maintaining proteins and can thus possibly challenge BBB integrity and transport to the brain. Using molecular dynamics simulation approaches, we examined the interaction of the Omicron variant (B.1.1.529) with different structural and transporter proteins located at the BBB. Our results show a distinct pattern in the Zona Ocludin 1-RBD complex: Omicron demonstrates a docking score of −88.9 ± 6.8 kcal/mol and six interactions, while the wild type (WT) presents a higher score of −94.0 ± 2.3 kcal/mol, forming eight interactions. Comparing affinities, the WT-RBD displays a stronger preference for Claudin-5, with a docking score of −110.2 ± 3.0 and nine interactions, versus Omicron-RBD's slightly reduced engagement, with a docking score of −105.6 ± 0.2 and seven interactions. Interestingly, the Omicron variant exhibits heightened stability in interactions with Glucose Transporter and ABC transporters, registering docking scores of −110.6 ± 1.9 and −112.0 ± 3.6 kcal/mol, respectively. This surpasses the WT's respective scores of −95.2 ± 2.2 and −104.0 ± 6.2 kcal/mol, reflecting a unique interaction profile. Rigorous molecular dynamics simulations validate our findings. Our study emphasizes the Omicron variant's increased affinity towards transporter proteins, illuminating potential implications for BBB integrity and brain transportation.
While these insights offer a valuable framework, comprehensive experimental validation is indispensable for a comprehensive understanding.
... Black Box Optimization (BBO) is a class of optimization problems featured by its objective function that is either unknown or too intricate to be mathematically formulated. It has a broad range of applications such as hyper-parameter tuning [1], neural architecture searching [2], and protein docking [3]. Due to the black-box nature, the optimizer has no access to the mathematical expression, gradients, or any other structural information related to the problem. ...
Conference Paper
Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox
... Protein function evaluation is a challenging task approached by various sequence-based and structural-based methods [30]. However, the fact that the function of a protein is intrinsically related to its 3D conformation (more than to its primary sequence) motivates the use of structure in predicting protein function [31,32]. During protein-protein interactions, the geometrical structure of the underlying topological manifold plays crucial roles that affect specific biologically related functions, such as driving the cellular immune response [33]. ...
Article
Understanding the binding behavior and conformational dynamics of intrinsically disordered proteins (IDPs) is crucial for unraveling their regulatory roles in biological processes. However, their lack of stable 3D structures poses challenges for analysis. To address this, we propose an algorithm that explores IDP binding behavior with protein complexes by extracting topological and geometric features from the protein surface model. Our algorithm identifies a geometrically favorable binding pose for the IDP and plans a feasible trajectory to evaluate its transition to the docking position. We focus on IDPs from Homo sapiens and Mus musculus, investigating their interaction with the Plasmodium falciparum (PF) pathogen associated with malaria-related deaths. We compare our algorithm with the HawkDock and HDOCK docking tools on quantitative (computation time) and qualitative (binding affinity) measures. Our results indicated that our method outperformed the compared methods in computation performance and binding affinity in experimental conformations.
... Structure-based methods rely on known protein structures to predict potential interactions. These methods typically include docking [15] and domain-based approaches [16]. However, the limited availability of 3D structural information may restrict the applicability of these methods [17]. ...
Article
Most life activities in organisms are regulated through protein complexes, which are mainly controlled via Protein-Protein Interactions (PPIs). Discovering new interactions between proteins and revealing their biological functions are of great significance for understanding the molecular mechanisms of biological processes and identifying potential targets in drug discovery. Current experimental methods only capture stable protein interactions, which leads to limited coverage. In addition, high cost and long experiment times are obvious shortcomings. In recent years, various computational methods have been successfully developed for predicting PPIs based only on protein homology, primary protein sequences, or gene ontology information. Computational efficiency and data complexity are still the main bottlenecks for algorithm generalization. In this study, we proposed a novel computational framework, HNSPPI, to predict PPIs. As a hybrid supervised learning model, HNSPPI comprehensively characterizes the intrinsic relationship between two proteins by integrating amino acid sequence information and the connection properties of the PPI network. The experimental results show that HNSPPI works very well on six benchmark datasets. Moreover, the comparison analysis showed that our model significantly outperforms five other existing algorithms. Finally, we used the HNSPPI model to explore the SARS-CoV-2-Human interaction system and found several potential regulations. In summary, HNSPPI is a promising model for predicting new protein interactions from known PPI data.
... In this situation, a critical step toward a more in-depth comprehension and management of these processes is the discovery of the structural drivers of these interactions and their binding energy. The ability to construct high-affinity contacts, create innovative treatments, guide rational drug design, and forecast the impact of changes to protein interfaces all depend on the binding affinity, which dictates whether or not complex formation happens under specific conditions [44]. For many years, various methodologies have been used to determine the binding affinity. ...
Article
SARS-CoV-2 and its variants are spreading around the world at an alarming speed, owing to their higher transmissibility and the conformational changes caused by mutations. The resulting COVID-19 pandemic has had severe consequences for human health. Several countries, including Pakistan, have studied its genome extensively and provided productive findings. In the current study, the mCSM, DynaMut2, and I-Mutant servers were used to analyze the effect of identified mutations on the structural stability of the spike protein, while molecular docking and simulation approaches were used to evaluate the dynamics of the bonding network between the wild-type and mutant spike proteins and furin. We addressed the mutational modifications in the spike protein of SARS-CoV-2 found in 215 Pakistani isolates from COVID-19 patients to study the influence of mutations on the stability of the protein and its interaction with the host cell. We found 7 single amino acid substitution mutations in various domains of the spike protein. The H49Y, N74K, G181V, and G446V mutations were found in the S1 domain, while the D614A, V622F, and Q677H mutations were found in the central helices of the spike protein. The G181V, G446V, D614A, and V622F mutants were found to be highly destabilizing and responsible for structural perturbation. Protein-protein docking and molecular simulation analysis with furin predicted that all the mutants enhanced the binding efficiency; the V622F mutant, however, greatly altered the binding capacity, which is further verified by the KD value (7.1 E−14), and may therefore enhance spike protein cleavage by furin and increase the rate of infectivity of SARS-CoV-2.
In addition, the total binding energy (TBE) calculated for each complex was −50.57 kcal/mol for the wild type, −52.69 kcal/mol for G181V, −56.44 kcal/mol for G446V, −59.78 kcal/mol for D614A, and −85.84 kcal/mol for V622F. Overall, the current findings show that these mutations have increased the binding of furin to the spike protein and that D614A and V622F have significant effects on binding and infectivity.
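As a worked illustration of how a dissociation constant relates to a binding free energy, the sketch below applies the standard thermodynamic relation ΔG° = RT ln(KD), with KD expressed relative to a 1 M standard state. The abstract gives no unit for its KD value, so treating 7.1 E−14 as molar is an assumption, as are the temperature of 298.15 K and the function name `kd_to_delta_g`.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant R in kcal/(mol*K)

def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)

# KD quoted in the abstract, assumed here (not stated there) to be in mol/L.
delta_g = kd_to_delta_g(7.1e-14)
```

Under these assumptions `delta_g` comes out at roughly −17.9 kcal/mol. This quantity is defined differently from the total binding energies listed in the abstract, which come from simulation-based estimates, so the two sets of numbers are not directly comparable.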
... However, the occurrence of most biologically relevant interactions is in transient protein complexes, which makes the experimental determination of their structures largely difficult, even when structures of the interacting partners are known. Computational docking approaches have therefore been designed for the structural prediction of protein complexes with an accuracy similar to that provided by X-ray crystallography [68,69]. A substantial amount of models with well defined atomic positions are usually generated after protein-protein docking protocols, but the currently available scoring functions possess low predictive accuracy for a reliable discrimination of models, and most often, models closest to the native structure are not easily detected solely through computational tools [69]. ...
Preprint
Protein-peptide and protein-protein interactions play an essential role in different functional and structural cellular organizational aspects. While X-ray crystallography generates the most complete structural characterization, most biological interactions exist in biomolecular complexes that are neither compliant nor responsive to direct experimental analysis. The development of computational docking approaches is therefore necessary. This starts from component protein structures to the prediction of their complexes, preferentially with precision close to complex structures generated by X-ray crystallography. To guarantee faithful chromosomal segregation, there must be a proper assembling of the kinetochore (a protein complex with multiple subunits) at the centromere during the process of cell division. As an important member of the inner kinetochore, defects in any of the subunits making up the CENP-HIKM complex lead to kinetochore dysfunction and an eventual chromosomal mis-segregation and cell death. Previous studies in an attempt to understand the assembly and mechanism devised by the CENP-HIKM in promoting functionality of the kinetochore have reconstituted the protein complex from different organisms including fungi and yeast. Here, we present a detailed computational model of the physical interactions that exist between each component of the human CENP-HIKM, while validating each modeled structure using orthologs with existing crystal structures from the protein data bank. Results from this study substantiate the existing hypothesis that the human CENP-HIK complex shares a similar architecture with its fungal and yeast orthologs, and likewise validate the binding mode of CENP-M to the C-terminus of the human CENP-I based on existing experimental reports.
... Traditional protein-protein docking methods have been of central importance for sampling the conformational space of protein complexes (Smith and Sternberg 2002). In the last 10 years, sophisticated high-precision docking methods such as HADDOCK (van Zundert et al. 2015), ClusPro (Desta et al. 2020), ZDOCK (Pierce et al. 2014), and LightDock (Jiménez-García et al. 2018) have been developed and continually improved. ...
Article
Protein–protein interactions (PPIs), such as protein–protein inhibitor, antibody–antigen complex, and supercomplexes play diverse and important roles in cells. Recent advances in structural analysis methods, including cryo-EM, for the determination of protein complex structures are remarkable. Nevertheless, much room remains for improvement and utilization of computational methods to predict PPIs because of the large number and great diversity of unresolved complex structures. This review introduces a wide array of computational methods, including our own, for estimating PPIs including antibody–antigen interactions, offering both historical and forward-looking perspectives.
... The development of computational methods must be encouraged to complement the identification of PPIs on a wide scale and in a short time. Several computational approaches have been proposed to predict PPIs based on different data sources (Shoemaker and Panchenko 2007; Zhao et al. 2008; Lam and Chan 2012) such as 3D structural data (Smith and Sternberg 2002), Gene Ontology (GO) and annotations (Lee et al. 2006), phylogenetic profile, gene fusion (Marcotte et al. 1999; Enright et al. 1999; Huang and Zheng 2006), and the co-evolution pattern of interacting proteins (Jothi et al. 2006). But these approaches fail to capture novel interactions, as they depend greatly on existing knowledge as a reference. ...
Article
The CRISPR-Cas system, responsible for the bacterial adaptive immune response, has emerged as a game-changer in the field of genome editing and has revolutionized both animal and plant research owing to its efficiency and feasibility. The CRISPR-associated (Cas) protein, an integral component of the CRISPR-Cas toolkit, cuts the target genetic material to make the desired edits. However, unchecked nuclease activity of the Cas protein may lead to unforeseen off-target effects. Anti-CRISPR (Acr) proteins, small proteins usually found in phages and other mobile genetic elements, are the natural inhibitors of the Cas proteins that help phages to escape the immune system of the host. Acr proteins regulate the activity of the Cas nuclease by interacting with its different domains, which results in the blockage of CRISPR activity. Thus, it is essential to understand the interactions between these two rival proteins in order to switch off the cutting machinery when needed. Experimental methods to identify protein–protein interactions are often costly, time-consuming, and labor-intensive. Computational strategies, such as data-driven predictive models, can complement experimental studies by providing fast, efficient, reliable, and cheaper alternatives to predict protein interactions. Herein, we report the first machine learning-based predictive model to identify novel interactions between Acr and Cas proteins using an ensemble strategy. The accuracy of our proposed ensemble model was more than 95%, indicating its high predictive power. The developed model can help automate the process of discovering the natural inhibitors of Cas proteins for controlling off-target cleavage and improving the efficiency of CRISPR-Cas technology. To extend support to diverse levels of end-users, a web application named AcrCasPPI was developed, which is available at http://login1.cabgrid.res.in:5020/.
Alternatively, a Python package named acrcasppi-ml is also available at https://pypi.org/project/acrcasppi-ml/.
... However, most biologically relevant interactions occur in transient protein complexes, which makes the experimental determination of their structures difficult, even when the structures of the interacting partners are known. Computational docking approaches have therefore been designed for the structural prediction of protein complexes with an accuracy similar to that provided by X-ray crystallography [69,70]. A substantial number of models with well-defined atomic positions are usually generated by protein-protein docking protocols, but the currently available scoring functions have low predictive accuracy for reliable discrimination of models, and most often the models closest to the native structure are not easily detected solely through computational tools [70]. ...
Article
Background: Protein–peptide and protein–protein interactions play an essential role in different functional and structural aspects of cellular organization. While cryo-EM and X-ray crystallography generate the most complete structural characterization, most biological interactions exist in biomolecular complexes that are neither compliant nor responsive to direct experimental analysis. The development of computational docking approaches is therefore necessary. These start from component protein structures and predict their complexes, preferably with precision close to that of complex structures generated by X-ray crystallography. Results: To guarantee faithful chromosomal segregation, the kinetochore (a protein complex with multiple subunits) must assemble properly at the centromere during cell division. The CENP-HIKM complex is an important member of the inner kinetochore, and defects in any of its subunits lead to kinetochore dysfunction and eventual chromosomal mis-segregation and cell death. Previous studies attempting to understand the assembly and mechanism by which CENP-HIKM promotes kinetochore function have reconstituted the protein complex from different organisms including fungi and yeast. Here, we present a detailed computational model of the physical interactions that exist between each component of the human CENP-HIKM, while validating each modeled structure using orthologs with existing crystal structures from the Protein Data Bank. Conclusions: Results from this study substantiate the existing hypothesis that the human CENP-HIK complex shares a similar architecture with its fungal and yeast orthologs, and likewise validate the binding mode of CENP-M to the C-terminus of the human CENP-I based on existing experimental reports.
... Figure 8A-C illustrates a model of a Cryptococcus protein with the host cell protein. This ZDOCK protocol used the divide-and-conquer strategy [126,127]. The ZDOCK protocol runs, performed in BIOVIA Discovery Studio version 17, resulted in 2000 binding poses and 59 clusters. Analysis of the interaction pattern between the EV surface protein glutamate dehydrogenase and plasminogen shows that more than 15 amino acids interact with the surface of the plasminogen protein (Figure 8B, C). ...
Article
Fungal extracellular vesicles (EVs) are released during pathogenesis and are found to be an opportunistic infection in most cases. EVs are immunocompetent with their host and have paved the way for new biomedical approaches to drug delivery and the treatment of complex diseases including cancer. With computing and processing advancements, bioinformatics tools for the evaluation of various parameters involved in fungal EVs have blossomed. In this review, we have compiled and explored the bioinformatics tools used to analyze host-pathogen interaction, toxicity, omics and pathogenesis, with an array of specific tools that have depicted the ability of EVs to act as vectors/carriers for therapeutic agents and as a potential theme for immunotherapy. We have also discussed the generation and pathways involved in the production, transport, pathogenic action and immunological interactions of EVs in the host system. The incorporation of network pharmacology approaches has been discussed regarding fungal pathogens and their significance in drug discovery. To give an overview, we have presented and demonstrated an in silico study model to portray the human-Cryptococcus interactions.
... X-ray crystallography is the gold standard for understanding and confirming PPIs; however, the method is complicated and time-consuming [20]. Protein docking is a computational tool that provides a low-cost method of generating potential poses for PPIs that can be validated experimentally [21]. ClusPro provides a means to dock submitted proteins. ...
Article
Within the last few decades, increases in computational resources have contributed enormously to the progress of science and engineering (S & E). To continue making rapid advancements, the S & E community must be able to access computing resources. One way to provide such resources is through High-Performance Computing (HPC) centers. Many academic research institutions offer their own HPC Centers but struggle to make the computing resources easily accessible and user-friendly. Here we present SHABU, a RESTful Web API framework that enables S & E communities to access resources from Boston University's Shared Computing Center (SCC). The SHABU requirements are derived from the use cases described in this work.
... However, some biologically important interactions occur in transient complexes, so experimental structure determination may be very challenging, even when the structures of the component proteins are known. Consequently, computational docking techniques have been developed that start from the structures of the component proteins and attempt to determine the structure of their complexes with an accuracy close to that delivered by X-ray crystallography 23,27. ...
Article
Nephrin is widely known to protect the β cells of islets of Langerhans, podocytes of the kidney and brain. Previous studies reveal mechanisms that involve downregulation of ROS-induced regulation of the expression of MMP-9, an anti-inflammatory marker. The question is whether these communications are based on direct or indirect interactions of proteins. To assess the interaction or affinity of MMP-9 with Nephrin, an in silico protein-protein docking approach was used with ClusPro 2.0 software. PyMOL software was used to visualize the best-docked complexes. M00 was found to be the best model generated based on cluster size and weighted scores for Nephrin with MMP-9. The best binding score is −991.2 for Nephrin with MMP-9. Met 247, Pro 246, Arg 249, Leu 1255, His 1254, Leu 243, Arg 1252, Leu 187 and Asp 282 are some of the prominent residues involved in the interactions between Nephrin and MMP-9. Thus, through in silico protein-protein docking, we have found a significant interaction of Nephrin with MMP-9. Through this study, we propose that Nephrin may protect β cells, kidney podocytes and the brain from damage, accompanied by the downregulation of ROS-induced MMP-9 activation.
... Moreover, each binding model is sorted, with the binding model possessing the highest score suggested as a protein complex formation model. 43 Molecular docking is then performed to ensure that a candidate epitope vaccine could generate a stable immune response. This is achieved by measuring interactions between the candidate and target immune cell receptors such as Toll-like receptor 2 (TLR2), TLR3, and TLR4. ...
Article
Epitope-based DNA vaccine development is one application of bioinformatics or in silico studies, that is, computational methods, including mathematical, chemical, and biological approaches, which are widely used in drug development. Many in silico studies have been conducted to analyze the efficacy, safety, toxicity effects, and interactions of drugs. In the vaccine design process, in silico studies are performed to predict epitopes that could trigger T-cell and B-cell reactions that would produce both cellular and humoral immune responses. Immunoinformatics is the branch of bioinformatics used to study the relationship between immune responses and predicted epitopes. Progress in immunoinformatics has been rapid and has led to the development of a variety of tools that are used for the prediction of epitopes recognized by B cells or T cells as well as the antigenic responses. However, the in silico approach to vaccine design is still relatively new; thus, this review is aimed at increasing understanding of the importance of in silico studies in the design of vaccines and thereby facilitating future research in this field.
... The 3D structure of a protein forms an active site essential for its function. It is possible to predict interactions based on these structures, for instance by docking or threading methods [108][109][110][111][112][113][114][115][116][117][118], and therefore to predict interactions in species lacking experimental interactome data [119]. Methods that combine sequence and structural data can improve prediction compared with sequence or structure alone [104,120,121]. ...
Article
Interactome analyses have traditionally been applied to yeast, human and other model organisms due to the availability of protein-protein interaction data for these species. Recently, these techniques have been applied to more diverse species using computational interaction prediction from genome sequence and other data types. This review describes the various types of computational interactome networks that can be created and how they have been used in diverse eukaryotic species, highlighting some of the key interactome studies in non-model organisms.
... Interactions between proteins were reported in other systems in which aggregations of fluorescent-tagged proteins were tracked under the fluorescence microscope 44,45. The adsorption on the cell surface or the aggregation can result from electrostatic interactions 44,46,47, one of the main factors influencing the folding and binding of proteins and the rate of protein-protein association, which is also supported by simulations with docking methods 48. Therefore, these interactions could result in adsorption or accumulation of lysozyme molecules on the cell surface, contributing to the promotion of the aggregation process during the pre-nucleation process and crystal growth process. ...
Article
Macromolecular protein crystallisation is one of the potential tools to accelerate the biomanufacturing of biopharmaceuticals. This work investigates, for the first time, the roles of biotemplates, Saccharomyces cerevisiae live cells, in the crystallisation of lysozyme, with lysozyme concentrations from 20 to 2.5 mg/mL and Saccharomyces cerevisiae cell concentrations from 0 to 5.0 × 10⁷ cfu/mL, over a period of 96 h. During the crystallisation period, the nucleation possibility in droplets, crystal numbers, cell growth and cell density were observed and analysed. The results indicated a strong interaction between the lysozyme molecules and the cell wall of S. cerevisiae, shown by the crystallisation of lysozyme with fluorescent labels. The biotemplates had a positive or negative influence on nucleation, i.e. a shorter or longer induction time, depending on the concentrations of the lysozyme and the S. cerevisiae cells and the ratios between them. In the biomanufacturing process, target proteins are commonly mixed with various cells, and this work provides novel insights into the design and application of live cells as biotemplates for the purification of macromolecules.
... Experimental data and evolutionary information (conservation or coevolution signals) may help to improve the selection of candidate conformations [29][30][31]. To address this problem, molecular docking algorithms have been developed and improved over the past twenty years, stimulated by the CAPRI competition [32][33][34][35][36]. Nevertheless, a number of challenges remain, including the modelling of large conformational rearrangements associated with binding [32,37,38]. ...
Article
Proteins ensure their biological functions by interacting with each other. Hence, characterising protein interactions is fundamental for our understanding of the cellular machinery, and for improving medicine and bioengineering. Over the past years, a large body of experimental data has been accumulated on who interacts with whom and in what manner. However, these data are highly heterogeneous and sometimes contradictory, noisy, and biased. Ab initio methods provide a means of "blind" protein-protein interaction network reconstruction. Here, we report on a molecular cross-docking-based approach for the identification of protein partners. The docking algorithm uses a coarse-grained representation of the protein structures and treats them as rigid bodies. We applied the approach to a few hundred proteins, in their unbound conformations, and we systematically investigated the influence of several key ingredients, such as the size and quality of the interfaces, and the scoring function. We achieved a significant improvement compared to previous works, and a very high discriminative power on some specific functional classes. We provide a readout of the contributions of shape and physico-chemical complementarity, interface matching, and specificity, in the predictions. In addition, we assessed the ability of the approach to account for multiple usages of protein surfaces, and we compared it with a sequence-based deep learning method. This work may contribute to guiding the exploitation of the large amounts of protein structural models now available toward the discovery of unexpected partners and the characterisation of their complex structures.
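As a rough illustration of the rigid-body, generate-and-score idea described in this abstract, the following Python sketch samples random rigid placements of a bead-level ligand around a receptor and keeps the best-scoring pose. Everything in it is an assumption made for illustration: the function names (`random_rotation`, `toy_score`, `rigid_dock`), the 3 Å and 6 Å cutoffs, the clash weight and the random sampling scheme are hypothetical and are not taken from the cited work, whose actual algorithm and scoring function are not described on this page.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix from the QR factor of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))        # fix the sign convention of each column
    if np.linalg.det(q) < 0:           # force det = +1 (rotation, not reflection)
        q[:, 0] = -q[:, 0]
    return q

def toy_score(receptor, ligand, contact=6.0, clash=3.0):
    """Hypothetical score: reward bead pairs in contact, penalise clashing pairs."""
    d = np.linalg.norm(receptor[:, None, :] - ligand[None, :, :], axis=-1)
    contacts = np.sum((d >= clash) & (d < contact))
    clashes = np.sum(d < clash)
    return float(contacts - 10 * clashes)

def rigid_dock(receptor, ligand, n_poses=2000, seed=0):
    """Sample random rigid poses of the ligand and return (best_score, best_pose)."""
    rng = np.random.default_rng(seed)
    lig0 = ligand - ligand.mean(axis=0)              # ligand centred at the origin
    center = receptor.mean(axis=0)
    reach = (np.linalg.norm(receptor - center, axis=1).max()
             + np.linalg.norm(lig0, axis=1).max())   # largest useful centre distance
    best_score, best_pose = -np.inf, None
    for _ in range(n_poses):
        rot = random_rotation(rng)
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)
        shift = center + direction * rng.uniform(0.0, reach)
        pose = lig0 @ rot.T + shift                  # rigid transform: rotate, then translate
        score = toy_score(receptor, pose)
        if score > best_score:
            best_score, best_pose = score, pose
    return best_score, best_pose
```

Because the ligand is only rotated and translated, its internal distances never change; real docking programs typically replace the random sampling with a systematic search and the toy score with a calibrated energy function.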
... The molecular docking method has been widely used to investigate the interaction between different proteins and predict the structure of their complexes, which can provide convincing theoretical guidance for further experiments. Molecular docking is used to predict the binding mode of the receptor molecule and ligand molecule based on their 3D structures, which are either known or obtained by modeling (Smith and Sternberg 2002). ...
Article
In a previous study, monoclonal antibody 2C12, which specifically recognizes Cry1Ab toxin and has no cross-reaction with Cry1Ac toxin, was prepared using the synthetic polypeptide T4 as immunogen. In this study, 2C12 was used as the gene source, and the single chain antibody scFv-2C12 was successfully constructed and expressed. Indirect ELISA results showed that scFv-2C12 could recognize both Cry1Ab and Cry1Ac toxins. In order to clarify the recognition mechanism of scFv-2C12 with Cry1Ab and Cry1Ac toxins, the interaction models of scFv-2C12 with Cry1Ab and Cry1Ac toxins were built and analyzed using homology modeling and molecular docking techniques. The results showed that scFv-2C12 was able to penetrate deep into Cry1Ac toxin and recognize hidden epitopes that could not be recognized by the conventional monoclonal antibody 2C12. The numbers of hydrogen bonds formed between scFv-2C12 and Cry1Ab/Ac were nine and eight, respectively. Eight hydrophobic interactions were formed between scFv-2C12 and each of Cry1Ab and Cry1Ac. Subsequently, based on scFv-2C12, double-antibody sandwich ELISA (DAS-ELISA) methods were developed to detect Cry1Ab and Cry1Ac toxins. The results showed that the minimum limits of detection and quantification (LOD and LOQ) were 5.014 and 20.45 ng·mL⁻¹ for Cry1Ab, and 7.409 and 22.01 ng·mL⁻¹ for Cry1Ac, respectively. This study provides a basic material for the establishment of detection methods for Cry1Ab and Cry1Ac toxins, and provides a new idea for the preparation of broad-spectrum antibodies.
... Binding affinity, which defines whether a complex is formed under specific conditions, is essential to regulate molecular interactions, create new treatments (i.e. directing rational drug design) and anticipate the impact of variation on protein interfaces [68]. Prior to docking of Spike and ACE2, superimposition of the mutants was performed to check the structural variations caused by the mutations. ...
Article
SARS-CoV-2, an RNA virus, has been prone to high mutations since its first emergence in Wuhan, China, and throughout its spread. Its genome has been sequenced continuously by many countries, including Pakistan, but the results vary. Understanding its genomic patterns and connecting them with phenotypic features will help in devising therapeutic strategies. Thus, in this study, we explored the mutation landscape of 250 Pakistani isolates of SARS-CoV-2 genomes to check the genome diversity and examine the impact of these mutations on protein stability and viral pathogenesis in comparison with a reference sequence (Wuhan NC 045512.2). Our results revealed that structural proteins mainly exhibit more mutations than others in the Pakistani isolates; in particular, the nucleocapsid protein is highly mutated. In comparison, the spike protein is the most mutated protein globally. Furthermore, nsp12 was found to be the most mutated NSP in the Pakistani isolates and worldwide. Regarding accessory proteins, ORF3A is the most mutated in the Pakistani isolates, whereas ORF8 is highly mutated in world isolates. These mutations decrease the structural stability of their proteins and alter different biological pathways. Molecular docking, the dissociation constant (KD), and MM/GBSA analysis showed that mutations in the S protein alter its binding with ACE2. The spike protein mutations D614G-S943T-V622F (−75.17 kcal/mol), D614G-Q677H (−75.78 kcal/mol), and N74K-D614G (−73.84 kcal/mol) exhibit stronger binding energy than the wild type (−66.34 kcal/mol), thus increasing infectivity. Furthermore, the simulation results strongly corroborated the predicted protein servers. Our analysis findings also showed that E, M, ORF6, ORF7A, ORF7B, and ORF10 are the most stable coding genes; they may be suitable targets for vaccine and drug development.
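For readers who want to relate a binding free energy to a dissociation constant, the standard thermodynamic relation ΔG° = RT ln(K_D / 1 M) can be sketched in Python as below. This is textbook thermodynamics and not the procedure used in the study above; the function names `kd_from_delta_g` and `delta_g_from_kd` are hypothetical. Note also that MM/GBSA energies such as those quoted in the abstract are generally not calibrated absolute binding free energies, so feeding them directly into this relation would be an assumption rather than an established practice.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant in mol/L (1 M standard state) from a binding free energy."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)
```

Under these assumptions, a binding free energy of −10 kcal/mol at 298.15 K corresponds to a K_D of roughly 5 × 10⁻⁸ M, and a more negative free energy always gives a smaller K_D (tighter binding).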
... The binding energies and structural determinants of these interactions are important steps to study the regulations of these processes. Binding affinity, which regulates molecular interactions, determines if the formation of complexes occurs under definite conditions (Smith and Sternberg, 2002). Protein-protein dockings of NSP13 WT and its various mutants were performed using HDOCK so that the level of pathogenicity of different mutants of SARS-CoV-2 could be determined. ...
Article
Mutations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have made this virus more infectious. Previous studies have confirmed that non-structural protein 13 (NSP13) plays an important role in immune evasion by physically interacting with TANK binding kinase 1 (TBK1) to inhibit IFNβ production. Mutations have been reported in NSP13; hence, in the current study, biophysical and structural modeling methodologies were adapted to dissect the influence of major mutations in NSP13, i.e., P77L, Q88H, D260Y, E341D, and M429I, on its binding to the TBK1 and to escape the human immune system. The results revealed that these mutations significantly affected the binding of NSP13 and TBK1 by altering the hydrogen bonding network and dynamic structural features. The stability, flexibility, and compactness of these mutants displayed different dynamic features, which are the basis for immune evasion. Moreover, the binding was further validated using the MM/GBSA approach, revealing that these mutations have higher binding energies than the wild-type (WT) NSP13 protein. These findings thus justify the basis of stronger interactions and evasion for these NSP13 mutants. In conclusion, the current findings explored the key features of the NSP13 WT and its mutant complexes, which can be used to design structure-based inhibitors against the SARS-CoV-2 new variants to rescue the host immune system.
... 2 Computational protein docking, currently the most widely used approach for modeling complex structures, takes the tertiary structures of individual proteins as input to build the quaternary structure of the complex as output. [3][4][5][6][7][8][9] Docking methods can be largely divided into two categories including template-based modeling, in which known protein complex structures in the Protein Data Bank (PDB) are used as templates [10][11][12][13][14][15][16][17] to guide modeling, and template-free modeling (ab initio docking), which does not use any known structure as template, and instead searches through a large conformation space for relative orientations of protein chains with minimum binding energy. The binding energy is often roughly approximated by geometric and electrostatic complementarity, inter-chain hydrogen binding, hydrophobic interactions, and residue-residue contact potentials. ...
Article
Predicting the quaternary structure of protein complexes is an important problem. Inter-chain residue-residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures. However, few methods have been developed to build quaternary structures from predicted inter-chain contacts. Here, we develop the first method based on gradient descent optimization (GD) to build quaternary structures of protein dimers utilizing inter-chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true/predicted contacts and monomer structures as input. GD consistently performs better than both simulated annealing and Markov Chain Monte Carlo simulation. Starting from an arbitrary quaternary structure randomly initialized from the tertiary structures of protein chains and using true inter-chain contacts as input, GD can reconstruct high-quality structural models for homodimers and heterodimers with average TM-score ranging from 0.92 to 0.99 and average interface root mean square distance (I-RMSD) from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter-chain contacts as restraints, the average TM-score of the structural models built by GD is 0.76. For 46% of the homodimers, high-quality structural models with TM-score >= 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. Only a moderate precision or recall of inter-chain contact prediction is needed to build good structural models for most homodimers. Moreover, GD improves the quality of quaternary structures predicted by AlphaFold2 on a CASP-CAPRI dataset.
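The idea of using inter-chain contacts as distance restraints and minimising the violation by gradient descent can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden toy and not the published GD method: only a rigid translation of the second chain is optimised (no rotation), the loss is a plain mean squared distance error, and the name `fit_translation`, the learning rate and the step count are all hypothetical.

```python
import numpy as np

def fit_translation(receptor, ligand, pairs, targets, lr=0.2, steps=1000):
    """Gradient descent on a translation t of the ligand so that the distances
    |ligand[j] + t - receptor[i]| match the target distances of the restraints."""
    receptor = np.asarray(receptor, dtype=float)
    ligand = np.asarray(ligand, dtype=float)
    i_idx, j_idx = np.asarray(pairs).T           # restraint (i, j) index arrays
    targets = np.asarray(targets, dtype=float)
    t = np.zeros(3)
    for _ in range(steps):
        diff = ligand[j_idx] + t - receptor[i_idx]        # (K, 3) pair vectors
        dist = np.linalg.norm(diff, axis=1)               # (K,) current distances
        resid = dist - targets                            # restraint violations
        # analytic gradient of mean(resid**2) with respect to t
        grad = np.mean(2.0 * resid[:, None] * diff / (dist[:, None] + 1e-12), axis=0)
        t -= lr * grad
    final = np.linalg.norm(ligand[j_idx] + t - receptor[i_idx], axis=1)
    return t, float(np.mean((final - targets) ** 2))
```

A full rigid-body version would add rotational degrees of freedom and would usually treat contacts as upper-bound restraints rather than exact distances; the sketch keeps only the translation so that the gradient can be written in one line.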
... Interactomes are often represented as networks (graphs) [26], allowing both visual and computational analysis of their structure and connectivity [27][28][29][30][31]. Graph theoretic analyses allow interactome data to be used in a number of ways: detection of protein complexes [32][33][34]; prediction of protein functions [35][36][37]; identification of evolutionary relationships [38][39][40][41]; and inference ... The 3D structure of a protein forms an active site essential for its function. It is possible to predict interactions based on these structures, for instance by docking or threading methods [93][94][95][96][97][98][99][100][101][102][103], and therefore to predict interactions in species lacking experimental interactome data [104]. Methods that combine sequence and structural data can improve prediction compared with sequence or structure alone [86,105,106]. ...
Preprint
Interactome analyses have traditionally been applied to yeast, human and other model organisms due to the availability of protein-protein interactions data for these species. Recently these techniques have been applied to more diverse species using computational interaction prediction from genome sequence and other data types. This review describes the various types of computational interactome networks that can be created and how they have been used in diverse eukaryotic species, highlighting some of the key interactome studies in non-model organisms.
... Thus, the small number of interactions found in this work more likely indicates limitations in currently available structural data. We note that it may be possible to infer many more states from the available structural data via a range of focused methods (Smith & Sternberg, 2002; Franzosa & Xia, 2011; Kaján et al, 2014; Du et al, 2017; Gervasoni et al, 2020). ...
Article
We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further 29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.
... During the last 2 decades, many computational techniques, which are fast and inexpensive, have been developed to generate the quaternary structural models of dimers using the tertiary structures of interacting proteins as input (Smith and Sternberg, 2002;Chen et al., 2003;Gray et al., 2003;Comeau et al., 2004;Tovchigrechko and Vakser, 2006;Lyskov and Gray, 2008;de Vries et al., 2010;Hwang et al., 2010). Although the classic ab initio docking methods have achieved some success for some protein complexes, according to the last several rounds of Critical Assessments of Predictions of Interactions (CAPRI) (Janin, 2005;Janin, 2002), the general accuracy of these approaches is still low (Lensink et al., 2019). ...
Article
Proteins interact to form complexes. Predicting the quaternary structure of protein complexes is useful for protein function analysis, protein engineering, and drug design. However, few user-friendly tools leveraging the latest deep learning technology for inter-chain contact prediction and the distance-based modelling to predict protein quaternary structures are available. To address this gap, we develop DeepComplex, a web server for predicting structures of dimeric protein complexes. It uses deep learning to predict inter-chain contacts in a homodimer or heterodimer. The predicted contacts are then used to construct a quaternary structure of the dimer by the distance-based modelling, which can be interactively viewed and analysed. The web server is freely accessible and requires no registration. It can be easily used by providing a job name and an email address along with the tertiary structure for one chain of a homodimer or two chains of a heterodimer. The output webpage provides the multiple sequence alignment, predicted inter-chain residue-residue contact map, and predicted quaternary structure of the dimer. DeepComplex web server is freely available at http://tulip.rnet.missouri.edu/deepcomplex/web_index.html
... Structural determinants and binding energies determination of these interactions are pivotal steps toward a deeper understanding and regulations of these processes. Importantly, binding affinity, which is the key element for regulating molecular interactions, developing novel therapeutics or predicting the effect of variations on protein interfaces determines whether the complex formation occurs under specific circumstances (Smith and Sternberg, 2002). HADDOCK was used to perform the protein-protein docking of IRF3 with the ORF8 WT and ORF8 mutants including S24L, W45L, V62L, L84S, and V62L and L84S double mutants to unwind the structural mechanisms behind the higher infectivity of different variants of SARS-CoV-2. ...
Article
SARS-CoV-2 has been continuously mutating since its first emergence in early 2020. These alterations have led this virus to gain significant differences in infectivity, pathogenicity and host immune evasion. We previously found that the open reading frame 8 (ORF8) of SARS-CoV-2 can inhibit interferon production by decreasing the nuclear translocation of interferon regulatory factor 3 (IRF3). Since several mutations in ORF8 have been observed, in the present study we adapted structural and biophysical analysis approaches to explore the impact of various mutations of ORF8, such as S24L, L84S, V62L, and W45L (the mutant recently circulating in Pakistan), on its ability to bind IRF3, which may help the virus evade the host immune system. We found that mutations in ORF8 could affect the binding efficiency with IRF3 based on molecular docking analysis, which was further supported by molecular dynamics simulations. Among all the reported mutations, W45L was found to bind most stringently to IRF3. Our analysis revealed that mutations in ORF8 may help the virus evade the immune system by changing its binding affinity with IRF3.
... Computational protein docking, currently the most widely used approach for modeling complex structures, takes the tertiary structures of individual proteins as input to build the quaternary structure of the complex as output [3][4][5][6][7][8][9] . Docking methods can be largely divided into two categories including template-based modeling, in which known protein complex structures in the Protein Data Bank (PDB) are used as templates [10][11][12][13][14][15][16][17] to guide modeling, and template-free modeling (ab initio docking), which does not use any known structure as template, and instead searches through a large conformation space for relative orientations of protein chains with minimum binding energy. ...
Preprint
Predicting the quaternary structure of a protein complex is an important and challenging problem. Inter-chain residue-residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures of protein complexes. However, few methods have been developed to build quaternary structures from predicted inter-chain contacts. Here, we introduce a new gradient descent optimization algorithm (GD) to build quaternary structures of protein dimers utilizing inter-chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true or predicted contacts. GD consistently performs better than a simulated annealing method and a Markov Chain Monte Carlo simulation method. Using true inter-chain contacts as input, GD can reconstruct high-quality structural models for homodimers and heterodimers with average TM-score ranging from 0.92 to 0.99 and average interface root mean square distance (I-RMSD) from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter-chain contacts as input, the average TM-score of the structural models built by GD is 0.76. For 46% of the homodimers, high-quality structural models with TM-score >= 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. If the precision or recall of predicted contacts is >20%, GD can reconstruct good models for most homodimers, indicating only a moderate precision or recall of inter-chain contact prediction is needed to build good structural models for most homodimers. Moreover, the accuracy of reconstructed models positively correlates with the contact density in dimers and depends on the initial model and the probability threshold of selecting predicted contacts for the distance-based structure optimization.
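The precision and recall of predicted inter-chain contacts mentioned in this abstract, and the selection of contacts by a probability threshold, can be computed as in the following minimal Python sketch. The function names `select_contacts` and `contact_precision_recall`, the dictionary input format and the default threshold of 0.5 are assumptions chosen for illustration and are not taken from the paper.

```python
def select_contacts(probabilities, threshold=0.5):
    """Keep the (i, j) residue pairs whose predicted probability reaches the threshold.

    `probabilities` maps (residue index in chain A, residue index in chain B) -> probability.
    """
    return {pair for pair, p in probabilities.items() if p >= threshold}

def contact_precision_recall(predicted, true):
    """Precision and recall of a set of predicted inter-chain contacts."""
    pred = {tuple(p) for p in predicted}
    ref = {tuple(t) for t in true}
    hits = len(pred & ref)                       # correctly predicted contacts
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall
```

Raising the threshold usually trades recall for precision, which is one way to explore how many correct restraints a reconstruction method needs.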
... The prediction of protein-protein interface using docking methods is still an important field of research (Smith and Sternberg, 2002;Lensink et al., 2007) but the predictive power of these methods greatly varies depending on the protein families (Bendell et al., 2014;Wang et al., 2017). As no GEF or RHOA experimental structure was used as target to assess the methods in recent CASP experiments, we benchmarked how these methods could perform on our specific case, using re-docking and cross-docking experiments. ...
Article
Full-text available
The interaction between two proteins may involve local movements, such as the repositioning of small side chains, or more global allosteric movements, such as domain rearrangement. We studied how one can build a precise and detailed protein-protein interface using existing protein-protein docking methods, and how the initial structures can be enhanced using molecular dynamics simulations and data-driven human inspection. We present how this strategy was applied to the modeling of the RHOA-ARHGEF1 interaction, using similar complexes of RHOA bound to other members of the Rho guanine nucleotide exchange factor family for comparative assessment. In parallel, a cruder approach based on structural superimposition and molecular replacement was also assessed. Both models were then successfully refined using molecular dynamics simulations, leading to protein structures in which the major data from the scientific literature could be recovered. We expect that the detailed strategy used in this work will prove useful for the design of other protein-protein interfaces. The RHOA-ARHGEF1 interface modeled here will be extremely useful for the design of inhibitors targeting this protein-protein interaction (PPI).
... these processes. Notably, the binding affinity, which determines whether or not complex formation occurs under particular circumstances, holds the key to regulating molecular interactions (e.g., engineering high-affinity interactions), developing novel therapeutics (e.g., guiding rational drug design), or predicting the effect of variations on protein interfaces (Smith & Sternberg, 2002). The binding affinity has been calculated for decades by different methodologies, ranging from exact approaches (e.g., free energy perturbation), which are precise but computationally expensive, to empirical methods (e.g., scoring functions in docking, various regression models), which are fast and accurate (Sprinzak et al., 2003). ...
Article
Full-text available
The new SARS-CoV-2 variants, reported to be 70% more contagious than the earlier one, are now spreading fast worldwide. There is an urgent need to discover how the new variants interact with the host receptor (ACE2). Among the reported mutations in the Spike glycoprotein of the new variants, three are specific to the receptor-binding domain (RBD) and require careful scrutiny for new therapeutic options. These structural changes in the RBD may play a critical role in the unique pathogenicity of the new SARS-CoV-2 variants. Herein, using structural and biophysical approaches, we found that the specific mutations in the UK (N501Y), South African (K417N-E484K-N501Y), Brazilian (K417T-E484K-N501Y) and hypothetical (N501Y-E484K) variants alter the binding affinity, create new inter-protein contacts and change the internal structural dynamics, thereby increasing the binding and eventually the infectivity. Our investigation highlighted that the South African (K417N-E484K-N501Y) and Brazilian (K417T-E484K-N501Y) variants are more lethal than the UK variant (N501Y). The behaviour of the wild type and N501Y is comparable. Free energy calculations further confirmed that the increased binding of the spike RBD to ACE2 is mainly due to the electrostatic contribution. Further, we find that the unusual virulence of this virus is potentially the consequence of Darwinian selection-driven epistasis in protein evolution. The triple mutants (South African and Brazilian) may pose a serious threat to the efficacy of the already developed vaccine. Our analysis would help to understand the binding and structural dynamics of the new mutations in the RBD of the Spike protein and calls for further investigation in in vitro and in vivo models to design potential therapeutics against the new variants.
... Relative to traditional FFT-based docking, FMFT accelerates calculations 10-fold [10••]. Another shape-based approach is geometric hashing, which indexes point sets or curves to match geometric features under arbitrary transformations like translations, rotations or even scaling [15]. Local 3D Zernike descriptor-based docking (LZerD), one of the top methods in CAPRI, projects 3D surfaces onto spheres to efficiently capture complementarity of protein surfaces [16]. ...
Article
Computational docking methods can provide structural models of protein–protein complexes, but protein backbone flexibility upon association often thwarts accurate predictions. In recent blind challenges, medium or high accuracy models were submitted in less than 20% of the ‘difficult’ targets (with significant backbone change or uncertainty). Here, we describe recent developments in protein–protein docking and highlight advances that tackle backbone flexibility. In molecular dynamics and Monte Carlo approaches, enhanced sampling techniques have reduced time-scale limitations. Internal coordinate formulations can now capture realistic motions of monomers and complexes using harmonic dynamics. And machine learning approaches adaptively guide docking trajectories or generate novel binding site predictions from deep neural networks trained on protein interfaces. These tools poise the field to break through the longstanding challenge of correctly predicting complex structures with significant conformational change.
... The accumulated and annotated proteomic data have created the need for more rapid and less expensive methods for the prediction of protein-protein interactions. Bioinformatic tools are rapidly gaining attention amongst protein scientists because of the computational methods that have been developed to help predict docked complexes (Smith et al, 2002). ...
Thesis
Daunorubicin (DNR) and its C-14 hydroxylated derivative doxorubicin (DXR) are the most widely used anthracyclines as anti-tumour agents. DNR and DXR are produced by the soil bacterium Streptomyces peucetius through a biosynthetic pathway that employs a type II polyketide synthase (PKS). Type II PKSs consist of several discrete, monofunctional proteins that form a dissociable complex. Studies on enzyme complex formation and substrate channelling are essential for a better understanding of metabolism and could lead to the generation of novel compounds by 'combinatorial' biosynthesis. The 21-carbon atom aromatic polyketide, aklanonic acid, is the first enzyme-free intermediate in the DNR/DXR biosynthetic pathway. The DNR/DXR PKS is composed of eight proteins: DpsABCDG form the 'minimal' PKS, responsible for the condensation reactions between the propionate starter unit and nine malonate extender units, whereas DpsEFY catalyse successive modifications of the carbon backbone such as reduction, aromatisation and cyclisation. This type II PKS is intriguing since it contains a type III ketosynthase (DpsC) that selects the starter unit and a putative malonyl/acyltransferase (DpsD) whose role seems obscure. The network of protein interactions within this complex has been investigated using a yeast two-hybrid system (GAL4), affinity chromatography (Tandem Affinity Purification) and computer-aided protein docking simulations (Hex 4.5 software). The results have led to the proposal of a head-to-tail arrangement for the 'minimal' PKS, suggested by the interactions established by DpsA, DpsB and DpsD. A putative role for DpsD is suggested as a physical inhibitor of the incorporation of acetate in the priming reaction, allowing the choice of propionate. Also, a structural role for the cyclase DpsY is proposed, perhaps to maintain the overall structural integrity of the complex. This represents the first study attempting to analyse in vivo protein interactions forming a type II PKS.
The purification method has allowed isolation of DpsA and DpsB and in silico docking simulations have produced results consistent with the proposed arrangements of proteins based on yeast two-hybrid assays.
Article
Full-text available
The prediction of potential protein–protein interactions (PPIs) is a critical step in decoding diseases and understanding cellular mechanisms. Traditional biological experiments have identified plenty of potential PPIs in recent years, but this problem is still far from being solved. Hence, there is an urgent need to develop computational models with good performance and high efficiency to predict potential PPIs. In this study, we propose a multi-source molecular network representation learning model (called MultiPPIs) to predict potential protein–protein interactions. Specifically, we first extract the protein sequence features according to the physicochemical properties of amino acids by utilizing the auto covariance method. Second, a multi-source association network is constructed by integrating the known associations among miRNAs, proteins, lncRNAs, drugs, and diseases. The graph representation learning method, DeepWalk, is adopted to extract the multi-source association information of proteins with other biomolecules. In this way, the known protein–protein interaction pairs can be represented as a concatenation of the protein sequence and the multi-source association features of proteins. Finally, the Random Forest classifier and corresponding optimal parameters are used for training and prediction. In the results, MultiPPIs obtains an average prediction accuracy of 86.03%, with 82.69% sensitivity and an AUC of 93.03%, under five-fold cross-validation. The experimental results indicate that MultiPPIs has a good prediction performance and provides valuable insights into the field of potential protein–protein interaction prediction. MultiPPIs is freely available at https://github.com/jiboyalab/multiPPIs.
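The auto covariance step can be sketched as follows for a single physicochemical property. The property values in SCALE are placeholders invented for this example, not a published index, and the function name and the max_lag parameter are likewise assumptions; the MultiPPIs code may normalise and combine several properties differently.

```python
import numpy as np

# Hypothetical property scale: illustrative numbers only, not a published index.
SCALE = {"A": 0.62, "L": 1.06, "K": -1.5, "D": -0.9, "G": 0.48}

def auto_covariance(seq, scale, max_lag):
    """AC(lag) = 1/(n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean), for lag = 1..max_lag."""
    x = np.array([scale[residue] for residue in seq], dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[:n - lag], x[lag:]) / (n - lag)
                     for lag in range(1, max_lag + 1)])

# A strictly alternating toy sequence gives alternating-sign covariances.
features = auto_covariance("ALALALAL", SCALE, max_lag=3)
```

Because the output length depends only on max_lag (and on the number of properties), sequences of different lengths map to feature vectors of the same size, which is what a fixed-input classifier needs.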
Article
Atrazine (ATR), one of the most used herbicides worldwide, causes persistent contamination of water and soil due to its high resistance to degradation. ATR is associated with low fertility and increased risk of prostate cancer in humans, as well as birth defects, low birth weight and premature delivery. Describing ATR binding to human serum albumin (HSA) is clinically relevant to future studies about pharmacokinetics, pharmacodynamics and toxicity of ATR, as albumin is the most abundant carrier protein in plasma and binds important small biological molecules. In this work we characterize, for the first time, the binding of ATR to HSA by using fluorescence spectroscopy and performing simulations using molecular docking, classical molecular dynamics and quantum biochemistry based on density functional theory (DFT). We determine the most likely binding sites of ATR to HSA, highlighting the fatty acid binding site FA8 (located between subdomains IA-IB-IIA and IIB-IIIA-IIIB) as the most important one, and evaluate the contribution of each nearby amino acid residue to the binding interactions, explaining the fluorescence quenching due to ATR complexation with HSA. The stabilization of the ATR/FA8 complex was also aided by the interaction between the atrazine ring and SER454 (hydrogen bond) and LEU481 (alkyl interaction).
Article
Full-text available
Since the large-scale experimental characterization of protein-protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
Article
Diseases caused by bacterial infections have become a critical problem in public health. Antibiotics, the traditional treatment, are gradually losing their effectiveness due to resistance. Meanwhile, antibacterial proteins are attracting more attention because of their broad spectrum and low harm to host cells. Therefore, exploring new effective antibacterial proteins is urgent and necessary. In this paper, we evaluate the effectiveness of ab initio docking methods in antibacterial protein-protein docking. For this purpose, we constructed a three-dimensional (3D) structure dataset of antibacterial protein complexes, called APCset, which contains 19 protein complexes whose receptors or ligands are homologous to antibacterial peptides from the Antimicrobial Peptide Database. We then selected five representative ab initio protein-protein docking tools (ZDOCK3.0.2, FRODOCK3.0, ATTRACT, PatchDock and Rosetta) to predict the structures of these complexes, and compared their performance in five respects: top/best pose, first hit, success rate, average hit count and running time. Finally, according to different requirements, we assessed and recommended relatively efficient protein-protein docking tools. In terms of computational efficiency and performance, ZDOCK was the most suitable as the preferred computational tool, with an average running time of 6.144 minutes, an average Fnat of the best pose of 0.953 and an average rank of the best pose of 4.158. ZDOCK also yielded better performance on Benchmark 5.0, which showed that ZDOCK is effective for docking on large-scale datasets. Our survey can offer insights into research on the treatment of bacterial infections by utilizing appropriate docking methods.
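Fnat, one of the quantities reported above, is commonly defined as the fraction of native receptor-ligand residue contacts that a docked pose reproduces. A minimal sketch follows, assuming a 5 Å any-atom cutoff and a hypothetical input layout (a dict mapping residue id to an array of atom coordinates); it is not the evaluation code used in the survey.

```python
import numpy as np

def residue_contacts(receptor, ligand, cutoff=5.0):
    """Set of (receptor_residue, ligand_residue) pairs with any atom pair within `cutoff`."""
    pairs = set()
    for rec_id, rec_atoms in receptor.items():
        for lig_id, lig_atoms in ligand.items():
            dists = np.linalg.norm(rec_atoms[:, None, :] - lig_atoms[None, :, :], axis=2)
            if dists.min() <= cutoff:
                pairs.add((rec_id, lig_id))
    return pairs

def fnat(native_contacts, model_contacts):
    """Fraction of native contacts that are also present in the model."""
    if not native_contacts:
        raise ValueError("native complex has no interface contacts")
    return len(native_contacts & model_contacts) / len(native_contacts)

# Toy one-atom residues (assumed data): the model misplaces ligand residue 2.
receptor = {1: np.array([[0.0, 0.0, 0.0]]), 2: np.array([[10.0, 0.0, 0.0]])}
native_ligand = {1: np.array([[3.0, 0.0, 0.0]]), 2: np.array([[12.0, 0.0, 0.0]])}
model_ligand = {1: np.array([[3.0, 0.0, 0.0]]), 2: np.array([[30.0, 0.0, 0.0]])}

native = residue_contacts(receptor, native_ligand)
model = residue_contacts(receptor, model_ligand)
score = fnat(native, model)
```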
Article
Full-text available
Objectives: Leishmaniasis is one of the common forms of neglected parasitic diseases that cause a worldwide disease burden without any effective therapeutic strategy. Control of the disease currently relies on chemotherapy, but most of the available drugs have toxic side-effects and drug-resistant strains have emerged. Therefore, the development of new therapeutic strategies to treat patients for leishmaniasis has become a priority. The first step in drug discovery is to identify an effective drug target by methods such as systems biology. Protein kinases are a promising drug target for different diseases. Due to the lack of a functional Krebs cycle in Leishmania species, they use glycolysis as the only source of ATP generation. Pyruvate kinase is the enzyme involved in the last step of glycolysis and is considered an essential enzyme for Leishmania survival. Materials and methods: This study sought to discover FDA-approved compounds against the leishmanial pyruvate kinase protein. Our approach involved using quantitative proteomics, protein interaction networks and docking to detect new drug targets and potent inhibitors. Results: Pyruvate kinase was determined as the potential drug target based on protein network analysis. The docking studies suggested trametinib and irinotecan, with high binding energies of -10.4 and -10.3 kcal/mol, respectively, as the potential chemotherapeutic agents against L. major. Conclusion: This study demonstrated the importance of integrating protein network analysis and molecular docking to identify new anti-leishmanial drugs. These potential inhibitors constitute novel drug candidates that should be tested in vitro and in vivo to determine their potential as an alternative chemotherapy in the treatment of leishmaniasis.
Article
This review focuses on pharmacophore approaches in researching protein interfaces that bind protein ligands. Pharmacophore descriptions of binding interfaces that employ molecular dynamics simulation can account for effects of solvation and conformational flexibility. In addition, these calculations provide an approximation to entropic considerations and, as such, a better approximation of the free energy of binding. Residue-based pharmacophore approaches can facilitate a variety of drug discovery tasks such as the identification of receptor–ligand partners, identifying their binding poses, designing protein interfaces for selectivity, or defining a reduced mutational combinatorial exploration for subsequent experimental engineering techniques by orders of magnitude.
Article
The complement system is a complex network of soluble and membrane-associated serum proteins that regulate immune response. Activation of complement C5 generates C5a and C5b, which exert a chemoattractive effect on myeloid cells and initiate membrane attack complex (MAC) assembly. However, studies of the evolutionary process and systematic function of C5 are still limited. In this study, we performed an evolutionary analysis of C5. Phylogeny analysis indicated that C5 sequences underwent complete divergence in fish and non-fish vertebrates. It was found that codon usage bias improved and provided evolutionary evidence of C5 in species. Notably, the codon usage bias of grass carp was evolutionarily closer to the zebrafish genome compared with humans and stickleback. This suggested that the zebrafish cell line may provide an alternative environment for heterologous protein expression of grass carp. Sequence comparison showed a higher similarity between human and mouse, grass carp, and zebrafish. Moreover, selective pressure analysis revealed that the C5 genes in fish and non-fish vertebrates exhibited different evolutionary patterns. To study the function of C5, gene co-expression networks of human and zebrafish were built, which revealed the complexity of C5 function networks in different species. The protein structure simulation of C5 indicated that grass carp and zebrafish are more similar to each other than to human; however, differences between species in C5a proteins are much smaller. Spatial conformations of the C5a–C5AR (CD88) protein complex were constructed, which showed that a possible interaction may exist between the C5a and CD88 proteins. Furthermore, the protein docking sites/residues were measured and calculated according to the minimum distance for all atoms from the C5a and CD88 proteins. In summary, this study provides insights into the evolutionary history, function and potential regulatory mechanism of C5 in fish immune responses.
Chapter
Protein-protein docking algorithms are powerful computational tools, capable of analyzing the protein-protein interactions at the atomic-level. In this chapter, we will review the theoretical concepts behind different protein-protein docking algorithms, highlighting their strengths as well as their limitations and pointing to important case studies for each method. The methods we intend to cover in this chapter include various search strategies and scoring techniques. This includes exhaustive global search, fast Fourier transform search, spherical Fourier transform-based search, direct search in Cartesian space, local shape feature matching, geometric hashing, genetic algorithm, randomized search, and Monte Carlo search. We will also discuss the different ways that have been used to incorporate protein flexibility within the docking procedure and some other future directions in this field, suggesting possible ways to improve the different methods.
Chapter
Molecular docking has become an important methodology in the CADD (Computer-Aided Drug Design)-assisted drug discovery process. It is an important computational tool widely used to predict the binding mode, binding affinity and binding free energy of a protein-ligand complex. The important factors responsible for accurate results in docking studies are correct binding site prediction, use of suitable small-molecule databases, consistent docking pose, high dock score with good MD (Molecular Dynamics), clarity whether the compound is an inhibitor or agonist, etc. However, there are still several limitations which make it difficult to obtain accurate results from docking studies. In this chapter, the main focus is on recent advancements in various aspects of molecular docking such as ligand sampling, protein flexibility, scoring functions, fragment docking, post-processing, docking into homology models and protein-protein docking.
Article
Full-text available
Biofilms have a significant role in microbial persistence, antibiotic resistance, and chronic infections; consequently, there is a pressing need for development of novel “anti-biofilm strategies.” One of the fundamental mechanisms involved in biofilm formation is protein–protein interactions of “amyloid-like proteins” (ALPs) in the extracellular matrix. Such interactions could be potential targets for development of novel anti-biofilm strategies; therefore, assessing the structural features of these interactions could be of great scientific value. Characterization of the structural features of protein–protein interactions with conventional structural biology tools, including X-ray diffraction and nuclear magnetic resonance, is technically challenging, expensive, and time-consuming. In contrast, modeling such interactions is time-efficient and economical, and might provide a deeper understanding of the structural basis of interactions. Although it is often acknowledged that molecular modeling methods have varying accuracy, their careful implementation with supplementary verification methods can provide valuable insight and directions for future studies. With this reasoning, during the present study, the protein–protein interaction of TasA(28–261)–TapA(33–253) (which is a decisive process for biofilm formation by Bacillus subtilis) was modeled using in silico approaches, viz., molecular modeling, protein–protein docking, and molecular dynamics simulations. Results obtained here identified amino acid residues present within intrinsically disordered regions of both proteins to be critical for interaction. These results were further supported with principal component analyses (PCA) and free energy landscape (FEL) analyses. The results presented here represent a novel finding, and we hypothesize that amino acid residues identified during the present study could be targeted for inhibition of biofilm formation by B. subtilis.
Article
Full-text available
We used a nonredundant set of 621 protein–protein interfaces of known high-resolution structure to derive residue composition and residue–residue contact preferences. The residue composition at the interfaces, in entire proteins and in whole genomes correlates well, indicating the statistical strength of the data set. Differences between amino acid distributions were observed for interfaces with buried surface area of less than 1,000 Ų versus interfaces with area of more than 5,000 Ų. Hydrophobic residues were abundant in large interfaces while polar residues were more abundant in small interfaces. The largest residue–residue preferences at the interface were recorded for interactions between pairs of large hydrophobic residues, such as Trp and Leu, and the smallest preferences for pairs of small residues, such as Gly and Ala. On average, contacts between pairs of hydrophobic and polar residues were unfavorable, and the charged residues tended to pair subject to charge complementarity, in agreement with previous reports. A bootstrap procedure, lacking from previous studies, was used for error estimation. It showed that the statistical errors in the set of pairing preferences are generally small; the average standard error is ≈0.2, i.e., about 8% of the average value of the pairwise index (2.9). However, for a few pairs (e.g., Ser–Ser and Glu–Asp) the standard error is larger in magnitude than the pairing index, which makes it impossible to tell whether contact formation is favorable or unfavorable. The results are interpreted using physicochemical factors and their implications for the energetics of complex formation and for protein docking are discussed. Proteins 2001;43:89–102. © 2001 Wiley-Liss, Inc.
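The bootstrap error estimate mentioned above can be sketched generically: resample the interfaces with replacement, recompute the statistic on each resample, and take the spread of the replicates as the standard error. The statistic here is a plain mean over hypothetical per-interface values; the paper's pairwise index is not reproduced, and all names and parameters are assumptions.

```python
import numpy as np

def bootstrap_se(samples, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of `statistic` over resampled observations."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = len(samples)
    replicates = [statistic(samples[rng.integers(0, n, size=n)])
                  for _ in range(n_boot)]
    return float(np.std(replicates, ddof=1))

# Hypothetical per-interface values standing in for real contact statistics.
values = np.arange(100.0)
se_of_mean = bootstrap_se(values, np.mean)
```

For a mean, the result should sit close to the textbook value (standard deviation divided by the square root of the sample size), which gives a quick sanity check before applying the same routine to a less tractable statistic.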
Article
Full-text available
We present a rapidly executable minimal binding energy model for molecular docking and use it to explore the energy landscape in the vicinity of the binding sites of four different enzyme inhibitor complexes. The structures of the complexes are calculated starting with the crystal structures of the free monomers, using DOCK 4.0 to generate a large number of potential configurations, and screening with the binding energy target function. In order to investigate possible correlations between energy and variation from the native structure, we introduce a new measure of similarity, which removes many of the difficulties associated with root mean square deviation. The analysis uncovers energy gradients, or funnels, near the binding site, with decreasing energy as the degree of similarity between the native and docked structures increases. Such energy funnels can increase the number of random collisions that may evolve into productive stable complex, and indicate that short-range interactions in the precomplexes can contribute to the association rate. The finding could provide an explanation for the relatively rapid association rates that are observed even in the absence of long-range electrostatic steering. Proteins 1999; 34:255–267. © 1999 Wiley-Liss, Inc.
Article
Full-text available
A geometric recognition algorithm was developed to identify molecular surface complementarity. It is based on a purely geometric approach and takes advantage of techniques applied in the field of pattern recognition. The algorithm involves an automated procedure including (i) a digital representation of the molecules (derived from atomic coordinates) by three-dimensional discrete functions that distinguishes between the surface and the interior; (ii) the calculation, using Fourier transformation, of a correlation function that assesses the degree of molecular surface overlap and penetration upon relative shifts of the molecules in three dimensions; and (iii) a scan of the relative orientations of the molecules in three dimensions. The algorithm provides a list of correlation values indicating the extent of geometric match between the surfaces of the molecules; each of these values is associated with six numbers describing the relative position (translation and rotation) of the molecules. The procedure is thus equivalent to a six-dimensional search but much faster by design, and the computation time is only moderately dependent on molecular size. The procedure was tested and validated by using five known complexes for which the correct relative position of the molecules in the respective adducts was successfully predicted. The molecular pairs were deoxyhemoglobin and methemoglobin, tRNA synthetase-tyrosinyl adenylate, aspartic proteinase-peptide inhibitor, and trypsin-trypsin inhibitor. A more realistic test was performed with the last two pairs by using the structures of uncomplexed aspartic proteinase and trypsin inhibitor, respectively. The results are indicative of the extent of conformational changes in the molecules tolerated by the algorithm.
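The grid correlation at the heart of such a scan can be sketched with NumPy's FFT routines. This is a toy illustration only: the grid size, the interior weight of -15 and the cubic "molecules" are assumptions chosen for the example, only translations are scanned (a full search would repeat this for each ligand orientation), and the shifts wrap around periodically.

```python
import numpy as np

def discretize(mask, interior_value):
    """Surface voxels get 1, interior voxels get `interior_value`, outside stays 0."""
    inside = mask.astype(bool)
    interior = inside.copy()
    for axis in range(inside.ndim):
        # A voxel is interior only if both neighbours along every axis are inside.
        interior &= np.roll(inside, 1, axis=axis) & np.roll(inside, -1, axis=axis)
    grid = np.zeros(inside.shape)
    grid[inside] = 1.0
    grid[interior] = interior_value
    return grid

def correlation_scan(a, b):
    """Return c[s] = sum_x a[x] * b[x + s] for every circular shift s of real grids."""
    return np.real(np.fft.ifftn(np.conj(np.fft.fftn(a)) * np.fft.fftn(b)))

# Toy shapes (assumed): a 4x4x4 receptor cube and a 2x2x2 ligand cube.
receptor_mask = np.zeros((16, 16, 16), dtype=bool)
receptor_mask[2:6, 2:6, 2:6] = True
ligand_mask = np.zeros((16, 16, 16), dtype=bool)
ligand_mask[10:12, 10:12, 10:12] = True

receptor = discretize(receptor_mask, interior_value=-15.0)  # penalise penetration
ligand = discretize(ligand_mask, interior_value=1.0)
scores = correlation_scan(receptor, ligand)
best_score = scores.max()
```

One forward and one inverse transform score all N³ translations at once, which is why this route is much cheaper than evaluating each shift by direct summation.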
Article
Full-text available
This review examines protein complexes in the Brookhaven Protein Databank to gain a better understanding of the principles governing the interactions involved in protein-protein recognition. The factors that influence the formation of protein-protein complexes are explored in four different types of protein-protein complexes: homodimeric proteins, heterodimeric proteins, enzyme-inhibitor complexes, and antibody-protein complexes. The comparison between the complexes highlights differences that reflect their biological roles.
Article
Full-text available
Crystallization of the 1:1 molecular complex between the beta-lactamase TEM-1 and the beta-lactamase inhibitory protein BLIP has provided an opportunity to put a stringent test on current protein-docking algorithms. Prior to the successful determination of the structure of the complex, nine laboratory groups were given the refined atomic coordinates of each of the native molecules. Other than the fact that BLIP is an effective inhibitor of a number of beta-lactamase enzymes (KI for TEM-1 approximately 100 pM) no other biochemical or structural data were available to assist the practitioners in their molecular docking. In addition, it was not known whether the molecules underwent conformational changes upon association or whether the inhibition was competitive or non-competitive. All six of the groups that accepted the challenge correctly predicted the general mode of association of BLIP and TEM-1.
Article
Full-text available
We report the computer generation of a high-density map of the thermodynamic properties of the diffusion-accessible encounter conformations of four receptor-ligand protein pairs, and use it to study the electrostatic and desolvation components of the free energy of association. Encounter complex conformations are generated by sampling the translational/rotational space of the ligand around the receptor, both at 5-Å and zero surface-to-surface separations. We find that partial desolvation is always an important effect, and it becomes dominant for complexes in which one of the reactants is neutral or weakly charged. The interaction provides a slowly varying attractive force over a small but significant region of the molecular surface. In complexes with no strong charge complementarity this region surrounds the binding site, and the orientation of the ligand in the encounter conformation with the lowest desolvation free energy is similar to the one observed in the fully formed complex. Complexes with strong opposite charges exhibit two types of behavior. In the first group, represented by barnase/barstar, electrostatics exerts strong orientational steering toward the binding site, and desolvation provides some added adhesion within the local region of low electrostatic energy. In the second group, represented by the complex of kallikrein and pancreatic trypsin inhibitor, the overall stability results from the rather nonspecific electrostatic attraction, whereas the affinity toward the binding region is determined by desolvation interactions.
Article
A new computationally efficient and automated “soft docking” algorithm is described to assist the prediction of the mode of binding between two proteins, using the three-dimensional structures of the unbound molecules. The method is implemented in a software package called BiGGER (Bimolecular Complex Generation with Global Evaluation and Ranking) and works in two sequential steps: first, the complete 6-dimensional binding space of the two molecules is systematically searched. A population of candidate protein-protein docked geometries is thus generated and selected on the basis of the geometric complementarity and amino acid pairwise affinities between the two molecular surfaces. Most of the conformational changes observed during protein association are treated in an implicit way and test results are equally satisfactory, regardless of starting from the bound or the unbound forms of known structures of the interacting proteins. In contrast to other methods, the entire molecular surfaces are searched during the simulation, using absolutely no additional information regarding the binding sites. In a second step, an interaction scoring function is used to rank the putative docked structures. The function incorporates interaction terms that are thought to be relevant to the stabilization of protein complexes. These include: geometric complementarity of the surfaces, explicit electrostatic interactions, desolvation energy, and pairwise propensities of the amino acid side chains to contact across the molecular interface. The relative functional contribution of each of these interaction terms to the global scoring function has been empirically adjusted through a neural network optimizer using a learning set of 25 protein-protein complexes of known crystallographic structures.
In 22 out of 25 protein-protein complexes tested, near-native docked geometries were found with Cα RMS deviations ≤ 4.0 Å from the experimental structures, of which 14 were found within the 20 top ranking solutions. The program works on widely available personal computers and takes 2 to 8 hours of CPU time to run any of the docking tests herein presented. Finally, the value and limitations of the method for the study of macromolecular interactions, not yet revealed by experimental techniques, are discussed. Proteins 2000;39:372–384. © 2000 Wiley-Liss, Inc.
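The near-native criterion quoted above (Cα RMS deviation ≤ 4.0 Å) can be illustrated with a minimal sketch. It assumes that the predicted and experimental Cα coordinates are already paired atom by atom and expressed in a common reference frame; the abstract does not state the superposition protocol, so that step is left out. The function names `rmsd` and `is_near_native` are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumption: both lists are in the same reference frame and in the same
    atom order (for example, C-alpha atoms of the docked ligand).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def is_near_native(predicted_ca, experimental_ca, cutoff=4.0):
    """True when the RMSD is within the cutoff (4.0 Angstrom in the abstract)."""
    return rmsd(predicted_ca, experimental_ca) <= cutoff
```

Other abstracts on this page use different cutoffs (2.5 Å, 5 Å), which is why the cutoff is a parameter here rather than a constant.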
Article
Rigid-body methods, particularly Fourier correlation techniques, are very efficient for docking bound (co-crystallized) protein conformations using measures of surface complementarity as the target function. However, when docking unbound (separately crystallized) conformations, the method generally yields hundreds of false positive structures with good scores but high root mean square deviations (RMSDs). This paper describes a two-step scoring algorithm that can discriminate near-native conformations (with less than 5 Å RMSD) from other structures. The first step includes two rigid-body filters that use the desolvation free energy and the electrostatic energy to select a manageable number of conformations for further processing, but are unable to eliminate all false positives. Complete discrimination is achieved in the second step that minimizes the molecular mechanics energy of the retained structures, and re-ranks them with a combined free-energy function which includes electrostatic, solvation, and van der Waals energy terms. After minimization, the improved fit in near-native complex conformations provides the free-energy gap required for discrimination. The algorithm has been developed and tested using docking decoys, i.e., docked conformations generated by Fourier correlation techniques. The decoy sets are available on the web for testing other discrimination procedures. Proteins 2000;40:525–537. © 2000 Wiley-Liss, Inc.
Article
We used a nonredundant set of 621 protein–protein interfaces of known high-resolution structure to derive residue composition and residue–residue contact preferences. The residue composition at the interfaces, in entire proteins and in whole genomes correlates well, indicating the statistical strength of the data set. Differences between amino acid distributions were observed for interfaces with buried surface area of less than 1,000 Ų versus interfaces with area of more than 5,000 Ų. Hydrophobic residues were abundant in large interfaces while polar residues were more abundant in small interfaces. The largest residue–residue preferences at the interface were recorded for interactions between pairs of large hydrophobic residues, such as Trp and Leu, and the smallest preferences for pairs of small residues, such as Gly and Ala. On average, contacts between pairs of hydrophobic and polar residues were unfavorable, and the charged residues tended to pair subject to charge complementarity, in agreement with previous reports. A bootstrap procedure, lacking from previous studies, was used for error estimation. It showed that the statistical errors in the set of pairing preferences are generally small; the average standard error is ≈0.2, i.e., about 8% of the average value of the pairwise index (2.9). However, for a few pairs (e.g., Ser–Ser and Glu–Asp) the standard error is larger in magnitude than the pairing index, which makes it impossible to tell whether contact formation is favorable or unfavorable. The results are interpreted using physicochemical factors and their implications for the energetics of complex formation and for protein docking are discussed. Proteins 2001;43:89–102. © 2001 Wiley-Liss, Inc.
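The bootstrap error estimate mentioned above can be sketched as follows. This is a generic illustration, not the authors' procedure: the abstract does not define the pairwise index, so the statistic used here (`contact_fraction`, the share of all interface contacts made by one residue pair) and the per-interface data layout are stand-in assumptions.

```python
import random
import statistics

def bootstrap_standard_error(samples, statistic, n_resamples=1000, seed=0):
    """Standard error of `statistic`, estimated by resampling with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    replicates = []
    for _ in range(n_resamples):
        # Draw n interfaces with replacement and recompute the statistic.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)

def contact_fraction(interfaces):
    """Hypothetical statistic: fraction of all contacts made by one residue pair.

    `interfaces` is a list of (pair_contacts, total_contacts) tuples,
    one tuple per interface, with positive totals.
    """
    pair = sum(p for p, _ in interfaces)
    total = sum(t for _, t in interfaces)
    return pair / total
```

A pair whose standard error exceeds the magnitude of its index, as reported for Ser–Ser and Glu–Asp, cannot be classified as favorable or unfavorable from the data alone.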
Article
Free energy potentials, combining molecular mechanics with empirical solvation and entropic terms, are used to discriminate native and near-native protein conformations from slightly misfolded decoys. Since the functional forms of these potentials vary within the field, it is of interest to determine the contributions of individual free energy terms and their combinations to the discriminative power of the potential. This is achieved in terms of quantitative measures of discrimination that include the correlation coefficient between RMSD and free energy, and a new measure labeled the minimum discriminatory slope (MDS). In terms of these criteria, the internal energy is shown to be a good discriminator on its own, which implies that even well-constructed decoys are substantially more strained than the native protein structure. The discrimination improves if, in addition to the internal energy, the free energy expression includes the electrostatic energy, calculated by assuming non-ionized side chains, and an empirical solvation term, with the classical atomic solvation parameter model providing slightly better discrimination than a structure-based atomic contact potential. Finally, the inclusion of a term representing the side chain entropy change, and calculated by an established empirical scale, is so inaccurate that it makes the discrimination worse. It is shown that both the correlation coefficient and the MDS value (or its dimensionless form) are needed for an objective assessment of a potential, and that together they provide much more information on the origins of discrimination than simple inspection of the RMSD-free energy plots. Proteins 2000;41:518–534. © 2000 Wiley-Liss, Inc.
Article
We present a new shape-based polynomial time algorithm for the rapid docking of rigid ligands into their macromolecular receptors. The method exploits molecular surface complementarity existing between a putative ligand and its receptor protein. Molecular shapes are represented by using a new shape descriptor that is based on local quadratic approximations to the molecular surface. The quadratic shape descriptor is capable of representing a plethora of molecular shapes and is not limited to describing convex or concave regions of molecular surface. A single pair of complementary descriptors is sufficient for computing the transformation matrix that positions a ligand into the receptor site. We demonstrate the capabilities of our algorithm by successfully reproducing the crystallographically determined orientation for a test set of 20 ligand-protein complexes. Proteins 2000;38:79–94. © 2000 Wiley-Liss, Inc.
Article
Empirical residue–residue pair potentials are used to screen possible complexes for protein–protein dockings. A correct docking is defined as a complex with not more than 2.5 Å root-mean-square distance from the known experimental structure. The complexes were generated by “ftdock” (Gabb et al. J Mol Biol 1997;272:106–120) that ranks using shape complementarity. The complexes studied were 5 enzyme-inhibitors and 2 antibody-antigens, starting from the unbound crystallographic coordinates, with a further 2 antibody-antigens where the antibody was from the bound crystallographic complex. The pair potential functions tested were derived both from observed intramolecular pairings in a database of nonhomologous protein domains, and from observed intermolecular pairings across the interfaces in sets of nonhomologous heterodimers and homodimers. Out of various alternate strategies, we found the optimal method used a mole-fraction calculated random model from the intramolecular pairings. For all the systems, a correct docking was placed within the top 12% of the pair potential score ranked complexes. A combined strategy was developed that incorporated “multidock,” a side-chain refinement algorithm (Jackson et al. J Mol Biol 1998;276:265–285). This placed a correct docking within the top 5 complexes for enzyme-inhibitor systems, and within the top 40 complexes for antibody–antigen systems. Proteins 1999;35:364–373. © 1999 Wiley-Liss, Inc.
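One way to read "a mole-fraction calculated random model" is as a log-odds score: the observed number of pairings of two residue types is compared with the number expected if residues paired at random in proportion to their mole fractions. The sketch below follows that reading only; the exact functional form used in the paper is not given in the abstract, and the function names, data layout, and sign convention (higher score means more favorable) are assumptions.

```python
import math

def log_odds_potentials(observed_pairs, composition):
    """Log-odds score per residue pair against a mole-fraction random model.

    observed_pairs: {(res_a, res_b): count}, one entry per unordered pair.
    composition:    {res: count}, used to derive mole fractions.
    """
    total_pairs = sum(observed_pairs.values())
    total_residues = sum(composition.values())
    potentials = {}
    for (a, b), n_obs in observed_pairs.items():
        if n_obs == 0:
            continue  # no evidence; such pairs score 0.0 in score_complex
        x_a = composition[a] / total_residues
        x_b = composition[b] / total_residues
        # An unordered pair of two different types can arise in two ways.
        multiplicity = 1 if a == b else 2
        n_exp = total_pairs * x_a * x_b * multiplicity
        potentials[tuple(sorted((a, b)))] = math.log(n_obs / n_exp)
    return potentials

def score_complex(contacts, potentials):
    """Sum the pair scores over the interface contacts of one docked complex."""
    return sum(potentials.get(tuple(sorted(pair)), 0.0) for pair in contacts)
```

Candidate complexes from a rigid-body search could then be re-ranked by sorting on `score_complex`.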
Article
We present a new computational method of docking pairs of proteins by using spherical polar Fourier correlations to accelerate the search for candidate low-energy conformations. Interaction energies are estimated using a hydrophobic excluded volume model derived from the notion of “overlapping surface skins,” augmented by a rigorous but “soft” model of electrostatic complementarity. This approach has several advantages over former three-dimensional grid-based fast Fourier transform (FFT) docking correlation methods even though there is no analogue to the FFT in a spherical polar representation. For example, a complete search over all six rigid-body degrees of freedom can be performed by rotating and translating only the initial expansion coefficients, many infeasible orientations may be eliminated rapidly using only low-resolution terms, and the correlations are easily localized around known binding epitopes when this knowledge is available. Typical execution times on a single processor workstation range from 2 hours for a global search (5 × 10⁸ trial orientations) to a few minutes for a local search (over 6 × 10⁷ orientations). The method is illustrated with several domain dimer and enzyme–inhibitor complexes and 20 large antibody–antigen complexes, using both the bound and (when available) unbound subunits. The correct conformation of the complex is frequently identified when docking bound subunits, and a good docking orientation is ranked within the top 20 in 11 out of 18 cases when starting from unbound subunits. Proteins 2000;39:178–194. © 2000 Wiley-Liss, Inc.
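For contrast, the Cartesian grid-based FFT correlation that this abstract compares itself against can be sketched in a few lines. This is not the spherical polar method described above. It assumes both proteins have already been discretised onto equal-sized 3D grids (how the grid values are assigned is left open), treats translations as cyclic, and uses NumPy; the function names are hypothetical.

```python
import numpy as np

def fft_translation_scan(receptor, ligand):
    """Correlation score for every cyclic translation of the ligand grid.

    Returns an array c with c[s] = sum over x of receptor[x] * ligand[x - s],
    computed for all shifts s at once via the correlation theorem.
    """
    r_hat = np.fft.fftn(receptor)
    l_hat = np.fft.fftn(ligand)
    return np.real(np.fft.ifftn(r_hat * np.conj(l_hat)))

def best_translation(receptor, ligand):
    """Grid shift (as a tuple of ints) that maximises the correlation score."""
    scores = fft_translation_scan(receptor, ligand)
    index = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in index)
```

In a full rigid-body search this translational scan would be repeated for each sampled ligand rotation, which is the step the spherical polar formulation avoids by rotating and translating expansion coefficients instead.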
Article
Several global optimization algorithms were applied to the problem of molecular docking: random walk and Metropolis Monte Carlo Simulated Annealing as references, and Stochastic Approximation with Smoothing (SAS), and Terminal Repeller Unconstrained Subenergy Tunneling (TRUST) as new methodologies. Of particular interest is whether any of these algorithms could be used to dock a database of typical small molecules in a reasonable amount of time. To address this question, each algorithm was used to dock four small molecules presenting a wide range of sizes, degrees of flexibility, and types of interactions. Of the algorithms tested, only stochastic approximation with smoothing appeared to be sufficiently fast and reliable to be useful for database searches. This algorithm can reliably dock relatively small and fairly rigid molecules in a few seconds, and larger and more flexible molecules in a few minutes. The remaining algorithms tested were able to reliably dock the small and fairly rigid molecules, but showed little or no reliability when docking large or flexible molecules. In addition, to decrease the error in the typical grid-based energy evaluations, a new form of interpolation, logarithmic interpolation, is proposed. This interpolation scheme is shown both to reduce the numerical error quantitatively and to improve the docking results in practice. © 1999 John Wiley & Sons, Inc.
Article
A new program named "DARWIN" has been developed to perform docking calculations with proteins and other biological molecules. The program uses the Genetic Algorithm to optimize the molecule's conformation and orientation under the selective pressure of minimizing the potential energy of the complex. A unique feature of DARWIN is that it communicates with the molecular mechanics program CHARMM to make the energy calculations. A second important feature is its parallel interface, which allows simultaneous use of multiple stand-alone copies of CHARMM to rapidly evaluate large numbers of potential solutions. This permits an "accuracy first" approach to docking, which avoids many of the common assumptions and shortcuts often made to reduce computation time. The method was applied to three protein-carbohydrate complexes: the crystallographically determined structures of Concanavalin A and Fab Se155-4; and a model structure for Fab ME36.1. Conformations close to the crystal structures were obtained with this approach, but some "false positive" solutions were also selected. Many of these could be eliminated by introducing different methods for simulating solvent effects. An effective screening method for docking a database of compounds to a single target enzyme using DARWIN is also presented. Proteins 2000;41:173–191. © 2000 Wiley-Liss, Inc.
Article
The use of computer simulations in investigations of protein−protein interactions is discussed. First, crystallographic analyses of known protein−protein complexes are summarized with particular emphasis being placed on the atomic nature of the interactions. Models available for describing macromolecular association energetics are then discussed, with special reference to the treatment of electrostatic and nonpolar interactions. The use of these models in combination with efficient search methods is discussed in the context of the so-called protein docking problem and in the description of weaker (i.e., noncrystallizable) protein−protein interactions. Finally, simulations of the dynamics of protein−protein association events are outlined. In all cases, differences are stressed between the atomically detailed view of protein−protein interactions and the view implicit in the use of simpler colloidal models.
Article
We provide some tests of the convex global underestimator (CGU) algorithm, which aims to find global minima on funnel-shaped energy landscapes. We use two different potential functions, the reduced Lennard–Jones cluster potential and the modified Sun protein folding potential, to compare the CGU algorithm with the simplest versions of the traditional trajectory-based search methods, simulated annealing (SA), and Monte Carlo (MC). For both potentials, the CGU reaches energies lower on the landscapes than both SA and MC, even when SA and MC are given the same number of starting points as in a full CGU run or when all methods are given the same amount of computer time. The CGU consistently finds the global minima of the Lennard–Jones potential for all cases with up to at least n = 30 degrees of freedom. Finding the global or near-global minimum in the CGU method requires polynomial time [scaling between O(n³) and O(n⁴)], on average. ©1999 John Wiley & Sons, Inc. J Comput Chem 20: 1527–1532, 1999
Chapter
Chapter contents: Introduction (The need for protein–protein and protein–DNA docking; Overview of the computational approach; Scope of this chapter). Structural studies of protein complexes. Methodology of a protein–protein docking strategy (Rigid body docking by Fourier correlation theory; Use of residue pair potentials to re-rank docked complexes; Use of distance constraints; Refinement and additional screening of complexes; Implementation of the docking suite). Results from the protein–protein docking strategy. Modelling protein–DNA complexes (Method; Results). Strategies for protein–protein docking (Evaluation of the results of docking simulations; Fourier correlation methods; Other rigid-body docking approaches; Flexible protein–protein docking; Rigid-body treatment to re-rank putative docked complexes; Introduction of flexibility to re-rank putative docked complexes). Blind trials of protein–protein docking. Energy landscape for protein docking. Conclusions.
Article
Because of their wide use in molecular modeling, methods to compute molecular surfaces have received a lot of interest in recent years. However, most of the proposed algorithms compute the analytical representation of only the solvent-accessible surface. There are a few programs that compute the analytical representation of the solvent-excluded surface, but they often have problems handling singular cases of self-intersecting surfaces and tend to fail on large molecules (more than 10,000 atoms). We describe here a program called MSMS, which is shown to be fast and reliable in computing molecular surfaces. It relies on the use of the reduced surface that is briefly defined here and from which the solvent-accessible and solvent-excluded surfaces are computed. The four algorithms composing MSMS are described and their complexity is analyzed. Special attention is given to the handling of self-intersecting parts of the solvent-excluded surface called singularities. The program has been compared with Connolly's program PQMS [M. L. Connolly (1993) Journal of Molecular Graphics, Vol. 11, pp. 139–141] on a set of 709 molecules taken from the Brookhaven Data Base. MSMS was able to compute topologically correct surfaces for each molecule in the set. Moreover, the actual time spent to compute surfaces is in agreement with the theoretical complexity of the program, which is shown to be O[n log(n)] for n atoms. On a Hewlett-Packard 9000/735 workstation, MSMS takes 0.73 s to produce a triangulated solvent-excluded surface for crambin (1crn, 46 residues, 327 atoms, 4772 triangles), 4.6 s for thermolysin (3tln, 316 residues, 2437 atoms, 26462 triangles), and 104.53 s for glutamine synthetase (2gls, 5676 residues, 43632 atoms, 476665 triangles). © 1996 John Wiley & Sons, Inc.
Article
The methods of continuum electrostatics are used to calculate the binding free energies of a set of protein–protein complexes including experimentally determined structures as well as other orientations generated by a fast docking algorithm. In the native structures, charged groups that are deeply buried were often found to favor complex formation (relative to isosteric nonpolar groups), whereas in nonnative complexes generated by a geometric docking algorithm, they were equally likely to be stabilizing as destabilizing. These observations were used to design a new filter for screening docked conformations that was applied, in conjunction with a number of geometric filters that assess shape complementarity, to 15 antibody–antigen complexes and 14 enzyme-inhibitor complexes. For the bound docking problem, which is the major focus of this paper, native and near-native solutions were ranked first or second in all but two enzyme-inhibitor complexes. Less success was encountered for antibody–antigen complexes, but in all cases studied, the more complete free energy evaluation was able to identify native and near-native structures. A filter based on the enrichment of tyrosines and tryptophans in antibody binding sites was applied to the antibody–antigen complexes and resulted in a native and near-native solution being ranked first and second in all cases. A clear improvement over previously reported results was obtained for the unbound antibody–antigen examples as well. The algorithm and various filters used in this work are quite efficient and are able to reduce the number of plausible docking orientations to a size small enough so that a final more complete free energy evaluation on the reduced set becomes computationally feasible.
Article
We examine a simple kinetic model for association that incorporates the basic features of protein-protein recognition within the rigid body approximation, that is, when no large conformation change occurs. Association starts with random collision at the rate kcoll predicted by the Einstein-Smoluchowski equation. This creates an encounter pair that can evolve into a stable complex if and only if the two molecules are correctly oriented and positioned, which has a probability pr. In the absence of long-range interactions, the bimolecular rate of association is pr kcoll. Long-range electrostatic interactions affect both kcoll and pr. The collision rate is multiplied by qt, a factor larger than 1 when the molecules carry net charges of opposite sign as coulombic attraction makes collisions more frequent, and less than 1 in the opposite case. The probability pr is multiplied by a factor qr that represents the steering effect of electric dipoles, which preorient the molecules before they collide. The model is applied to experimental data obtained by Schreiber and Fersht (Nat. Struct. Biol. 3:427–431, 1996) on the kinetics of barnase-barstar association. When long-range electrostatic interactions are fully screened or mutated away, qtqr ≈ 1, and the observed rate of productive collision is pr kcoll ≈ 10⁵ M⁻¹ · s⁻¹. Under these conditions, pr ≈ 1.5 · 10⁻⁵ is determined by geometric constraints corresponding to a loss of rotational freedom. Its value is compatible with computer docking simulations and implies a rotational entropy loss ΔSrot ≈ 22 e.u. in the transition state. At low ionic strength, long-range electrostatic interactions accelerate barnase-barstar association by a factor qtqr of up to 10⁵ as favorable charge-charge and charge-dipole interactions work together to make it much faster than free diffusion would allow. Proteins 28:153–161, 1997. © 1997 Wiley-Liss Inc.
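The figures quoted in this abstract can be checked against each other with a short calculation. The sketch below takes the association rate as the product qt · qr · pr · kcoll, as the abstract describes, with the rounded values 10⁵ M⁻¹ s⁻¹ and 1.5 · 10⁻⁵. The relation used for the entropy, a loss of magnitude R ln(1/pr), is an assumption made here because it reproduces the quoted 22 e.u.; the abstract itself does not spell it out. Variable and function names are hypothetical.

```python
import math

R_CAL = 1.987  # gas constant in cal / (mol K); 1 e.u. = 1 cal / (mol K)

productive_rate = 1e5  # pr * kcoll with electrostatics screened, M^-1 s^-1
p_r = 1.5e-5           # probability of correct orientation and position

# Collision rate implied by the two figures above.
k_coll = productive_rate / p_r

def association_rate(k_coll, p_r, q_t=1.0, q_r=1.0):
    """Bimolecular association rate in the model: qt * qr * pr * kcoll."""
    return q_t * q_r * p_r * k_coll

# Assumed relation: magnitude of the rotational entropy loss is R * ln(1 / pr).
rotational_entropy_loss = -R_CAL * math.log(p_r)

print(f"kcoll          ~ {k_coll:.2e} M^-1 s^-1")
print(f"entropy loss   ~ {rotational_entropy_loss:.1f} e.u.")
print(f"max rate (qtqr = 1e5) ~ {association_rate(k_coll, p_r, q_t=1e5):.1e} M^-1 s^-1")
```

With these inputs the implied collision rate is about 6.7 · 10⁹ M⁻¹ s⁻¹, and the entropy term comes out close to the 22 e.u. stated in the abstract.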
Article
The goal of this study is to verify the concept of the funnel-like intermolecular energy landscape in protein–protein interactions by use of a series of computational experiments. Our preliminary analysis revealed the existence of the funnel in many protein–protein interactions. However, because of the uncertainties in the modeling of these interactions and the ambiguity of the analysis procedures, the detection of the funnels requires detailed quantitative approaches to the energy landscape analysis. A number of such approaches are presented in this study. We show that the funnel detection problem is equivalent to a problem of distinguishing between distributions of low-energy intermolecular matches in the funnel and in the low-frequency landscape fluctuations. If the fluctuations are random, the decision about whether the minimum is the funnel is equivalent to determining whether this minimum is significantly different from a would-be random one. A database of 475 nonredundant cocrystallized protein–protein complexes was used to re-dock the proteins by use of smoothed potentials. To detect the funnel, we developed a set of sophisticated models of random matches. The funnel was considered detected if the binding area was more populated by the low-energy docking predictions than by the matches generated in the random models. The number of funnels detected by use of different random models varied significantly. However, the results confirmed that the funnel may be the general feature in protein–protein association.
Article
A number of studies have addressed the question of which are the critical residues at protein-binding sites. These studies examined either a single or a few protein–protein interfaces. The most extensive study to date has been an analysis of alanine-scanning mutagenesis. However, although the total number of mutations was large, the number of protein interfaces was small, with some of the interfaces closely related. Here we show that although overall binding sites are hydrophobic, they are studded with specific, conserved polar residues at specific locations, possibly serving as energy “hot spots.” Our results confirm and generalize the alanine-scanning data analysis, despite its limited size. Previously Trp, Arg, and Tyr were shown to constitute energetic hot spots. These were rationalized by their polar interactions and by their surrounding rings of hydrophobic residues. However, there was no compelling reason as to why specifically these residues were conserved. Here we show that other polar residues are similarly conserved. These conserved residues have been detected consistently in all interface families that we have examined. Our results are based on an extensive examination of residues which are in contact across protein interfaces. We utilize all clustered interface families with at least five members and with sequence similarity between the members in the range of 20–90%. There are 11 such clustered interface families, comprising a total of 97 crystal structures. Our three-dimensional superpositioning analysis of the occurrences of matched residues in each of the families identifies conserved residues at spatially similar environments. Additionally, in enzyme inhibitors, we observe that residues are more conserved at the interfaces than at other locations. 
On the other hand, antibody–protein interfaces have similar surface conservation as compared to their corresponding linear sequence alignment, consistent with the suggestion that evolution has optimized protein interfaces for function. Proteins 2000;39:331–342.
Article
Here we carry out an examination of shape complementarity as a criterion in protein–protein docking and binding. Specifically, we examine the quality of shape complementarity as a critical determinant not only in the docking of 26 protein–protein "bound", complexed cases, but in particular, of 19 "unbound" protein–protein cases, where the structures have been determined separately. In all cases, entire molecular surfaces are utilized in the docking, with no consideration of the location of the active site, or of particular residues/atoms in either the receptor or the ligand which participate in the binding. To evaluate the goodness of the strictly geometry-based shape complementarity in the docking process as compared to the main favorable and unfavorable energy components, we study systematically a potential correlation between each of these components and the RMSD of the "unbound" protein–protein cases. Specifically, we examine the non-polar buried surface area, polar b...
Article
Folding funnels have been the focus of considerable attention during the last few years. These have mostly been discussed in the general context of the theory of protein folding. Here we extend the utility of the concept of folding funnels, relating them to biological mechanisms and function. In particular, here we describe the shape of the funnels in light of protein synthesis and folding; flexibility, conformational diversity, and binding mechanisms; and the associated binding funnels, illustrating the multiple routes and the range of complexed conformers. Specifically, the walls of the folding funnels, their crevices, and bumps are related to the complexity of protein folding, and hence to sequential vs. nonsequential folding. Whereas the former is more frequently observed in eukaryotic proteins, where the rate of protein synthesis is slower, the latter is more frequent in prokaryotes, with faster translation rates. The bottoms of the funnels reflect the extent of the flexibility of the proteins. Rugged floors imply a range of conformational isomers, which may be close on the energy landscape. Rather than undergoing an induced fit binding mechanism, the conformational ensembles around the rugged bottoms argue that the conformers, which are most complementary to the ligand, will bind to it with the equilibrium shifting in their favor. Furthermore, depending on the extent of the ruggedness, or of the smoothness with only a few minima, we may infer nonspecific, broad range vs. specific binding. In particular, folding and binding are similar processes, with similar underlying principles. Hence, the shape of the folding funnel of the monomer enables making reasonable guesses regarding the shape of the corresponding binding funnel. 
Proteins having a broad range of binding, such as proteolytic enzymes or relatively nonspecific endonucleases, may be expected to have not only rugged floors in their folding funnels, but their binding funnels will also behave similarly, with a range of complexed conformations. Hence, knowledge of the shape of the folding funnels is biologically very useful. The converse also holds: If kinetic and thermodynamic data are available, hints regarding the role of the protein and its binding selectivity may be obtained. Thus, the utility of the concept of the funnel carries over to the origin of the protein and to its function.
Article
Empirical residue–residue pair potentials are used to screen possible complexes for protein–protein dockings. A correct docking is defined as a complex with not more than 2.5 Å root-mean-square distance from the known experimental structure. The complexes were generated by “ftdock” (Gabb et al. J Mol Biol 1997;272:106–120) that ranks using shape complementarity. The complexes studied were 5 enzyme-inhibitors and 2 antibody-antigens, starting from the unbound crystallographic coordinates, with a further 2 antibody-antigens where the antibody was from the bound crystallographic complex. The pair potential functions tested were derived both from observed intramolecular pairings in a database of nonhomologous protein domains, and from observed intermolecular pairings across the interfaces in sets of nonhomologous heterodimers and homodimers. Out of various alternate strategies, we found the optimal method used a mole-fraction calculated random model from the intramolecular pairings. For all the systems, a correct docking was placed within the top 12% of the pair potential score ranked complexes. A combined strategy was developed that incorporated “multidock,” a side-chain refinement algorithm (Jackson et al. J Mol Biol 1998;276:265–285). This placed a correct docking within the top 5 complexes for enzyme-inhibitor systems, and within the top 40 complexes for antibody–antigen systems. Proteins 1999;35:364–373. © 1999 Wiley-Liss, Inc.
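The 2.5 Å success criterion quoted above can be illustrated with a short sketch. The abstract does not say which atoms enter the calculation or whether the structures are superposed first, so the function below assumes two pre-aligned coordinate arrays of matching atoms; the names `rmsd` and `is_correct_docking` are hypothetical and are not taken from ftdock.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equal-length lists of (x, y, z) points.

    Assumes the two structures are already in the same reference frame;
    no superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def is_correct_docking(predicted, experimental, cutoff=2.5):
    """Apply the abstract's criterion: RMSD of not more than `cutoff` angstroms."""
    return rmsd(predicted, experimental) <= cutoff
```

In a screening run, a function of this kind would be applied to every ranked complex to find the rank of the first correct docking.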
Article
While a number of approaches have been geared toward multiple sequence alignments, to date there have been very few approaches to multiple structure alignment and detection of a recurring substructural motif. Among these, none performs both multiple structure comparison and motif detection simultaneously. Further, none considers all structures at the same time, rather than initiating from pairwise molecular comparisons. We present such a multiple structural alignment algorithm. Given an ensemble of protein structures, the algorithm automatically finds the largest common substructure (core) of Cα atoms that appears in all the molecules in the ensemble. The detection of the core and the structural alignment are done simultaneously. Additional structural alignments also are obtained and are ranked by the sizes of the substructural motifs, which are present in the entire ensemble. The method is based on the geometric hashing paradigm. As in our previous structural comparison algorithms, it compares the structures in an amino acid sequence order-independent way, and hence the resulting alignment is unaffected by insertions, deletions and protein chain directionality. As such, it can be applied to protein surfaces, protein–protein interfaces and protein cores to find the optimally, and suboptimally spatially recurring substructural motifs. There is no predefinition of the motif. We describe the algorithm, demonstrating its efficiency. In particular, we present a range of results for several protein ensembles, with different folds and belonging to the same, or to different, families. Since the algorithm treats molecules as collections of points in three-dimensional space, it can also be applied to other molecules, such as RNA, or drugs. Proteins 2001;43:235–245. © 2001 Wiley-Liss, Inc.
Article
A computer algorithm is presented for calculating the part of the van der Waals surface of a molecule that is accessible to solvent. The solvent molecule is modeled by a sphere. This sphere is, in effect, rolled over the molecule to generate a smooth outer-surface contour. This surface contour is made up of pieces of spheres and tori that join at circular arcs. The spheres, tori and arcs are defined by analytical expressions in terms of the atomic coordinates, van der Waals radii and the probe radius. The area of each surface piece may be calculated analytically and the surface may be displayed on either vector or raster computer-graphics systems. These methods are useful for studying the structure and interactions of proteins and nucleic acids.
Article
Recent studies increasingly point to the importance of structural flexibility and plasticity in proteins, highlighting the evolutionary advantage. There are an increasing number of cases in which given, presumably specific, binding sites have been shown to bind a range of ligands with different compositions and shapes. These studies have also revealed that evolution tends to find convergent solutions for stable intermolecular associations, largely via conservation of polar residues as hot spots of binding energy. On the other hand, the ability to bind multiple ligands at a given site is largely derived from hinge-based motions. The consideration of these two factors in functional epitopes allows more realism and robustness in the description of protein binding surfaces and, as such, in applications to mutants, modeled structures and design. Efficient multiple structure comparison and hinge-bending structure comparison tools enable the construction of combinatorial binding epitope libraries.
Article
A new, fast, and easy-to-implement method, van der Waals–fast Fourier transform (vdW-FFT), for locating possible binding sites on the surface of a protein was developed and tested on a set of 15 different enzyme–ligand complexes. The method scans the whole protein surface and possible ligand orientations in order to find the best geometrical match, which corresponds to the minimum of the modified vdW energy. Two different grids, fine and coarse, and two sets of MM parameters, from the OPLS and Amber-94 force fields, were used. The method has been shown to work accurately on the fine grid. On the coarse grid, the vdW-FFT method failed only on two complexes. The C program implementing the method and test set of proteins is available free on our web site: http://biocomp.anu.edu.au/~aab. J Comput Chem 20: 983–988, 1999
Article
Several global optimization algorithms were applied to the problem of molecular docking: random walk and Metropolis Monte Carlo Simulated Annealing as references, and Stochastic Approximation with Smoothing (SAS), and Terminal Repeller Unconstrained Subenergy Tunneling (TRUST) as new methodologies. Of particular interest is whether any of these algorithms could be used to dock a database of typical small molecules in a reasonable amount of time. To address this question, each algorithm was used to dock four small molecules presenting a wide range of sizes, degrees of flexibility, and types of interactions. Of the algorithms tested, only stochastic approximation with smoothing appeared to be sufficiently fast and reliable to be useful for database searches. This algorithm can reliably dock relatively small and fairly rigid molecules in a few seconds, and larger and more flexible molecules in a few minutes. The remaining algorithms tested were able to reliably dock the small and fairly rigid molecules, but showed little or no reliability when docking large or flexible molecules. In addition, to decrease the error in the typical grid-based energy evaluations a new form of interpolation, logarithmic interpolation, is proposed. This interpolation scheme is shown to both quantitatively reduce the numerical error and practically to improve the docking results. © 1999 John Wiley & Sons, Inc. J Comput Chem 20: 1740–1751, 1999
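As a rough illustration of the Metropolis Monte Carlo simulated annealing reference method named above, the sketch below minimises a one-dimensional toy energy. It is not the authors' docking code: the energy function, step size, geometric cooling schedule and all names are assumptions chosen only to show the accept/reject rule.

```python
import math
import random


def metropolis_anneal(energy, x0, steps=5000, step_size=0.5,
                      t_start=1.0, t_end=0.01, seed=0):
    """Minimise `energy` over one variable with Metropolis simulated annealing.

    Returns the lowest-energy point visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        candidate = x + rng.uniform(-step_size, step_size)
        e_candidate = energy(candidate)
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-dE / T).
        if e_candidate <= e or rng.random() < math.exp(-(e_candidate - e) / t):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    """Hypothetical stand-in for a docking energy, with its minimum at x = 2."""
    return (x - 2.0) ** 2
```

In a real docking search the single variable would be replaced by the ligand's position, orientation and torsion angles, and the energy by a grid-based or force-field evaluation.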
Article
A method for simulating the Brownian dynamics of N particles with the inclusion of hydrodynamic interactions is described. The particles may also be subject to the usual interparticle or external forces (e.g., electrostatic) which have been included in previous methods for simulating Brownian dynamics of particles in the absence of hydrodynamic interactions. The present method is derived from the Langevin equations for the N particle assembly, and the results are shown to be consistent with the corresponding Fokker–Planck results. Sample calculations on small systems illustrate the importance of including hydrodynamic interactions in Brownian dynamics simulations. The method should be useful for simulation studies of diffusion limited reactions, polymer dynamics, protein folding, particle coagulation, and other phenomena in solution.
Article
A computationally tractable strategy has been developed to refine protein-protein interfaces that models the effects of side-chain conformational change, solvation and limited rigid-body movement of the subunits. The proteins are described at the atomic level by a multiple copy representation of side-chains modelled according to a rotamer library on a fixed peptide backbone. The surrounding solvent environment is described by "soft" sphere Langevin dipoles for water that interact with the protein via electrostatic, van der Waals and field-dependent hydrophobic terms. Energy refinement is based on a two-step process in which (1) a probability-based conformational matrix of the protein side-chains is refined iteratively by a mean field method. A side-chain interacts with the protein backbone and the probability-weighted average of the surrounding protein side-chains and solvent molecules. The resultant protein conformations then undergo (2) rigid-body energy minimization to relax the protein interface. Steps (1) and (2) are repeated until convergence of the interaction energy. The influence of refinement on side-chain conformation starting from unbound conformations found improvement in the RMSD of side-chains in the interface of protease-inhibitor complexes, and shows that the method leads to an improvement in interface geometry. In terms of discriminating between docked structures, the refinement was applied to two classes of protein-protein complex: five protease-protein inhibitor and four antibody-antigen complexes. A large number of putative docked complexes have already been generated for the test systems using our rigid-body docking program, FTDOCK. They include geometries that closely resemble the crystal complex, and therefore act as a test for the refinement procedure. 
In the protease-inhibitors, geometries that resemble the crystal complex are ranked in the top four solutions for four out of five systems when solvation is included in the energy function, against a background of between 26 and 364 complexes in the data set. The results for the antibody-antigen complexes are not as encouraging, with only two of the four systems showing discrimination. It would appear that these results reflect the somewhat different binding mechanism dominant in the two types of protein-protein complex. Binding in the protease-inhibitors appears to be "lock and key" in nature. The fixed backbone and mobile side-chain representation provide a good model for binding. Movements in the backbone geometry of antigens on binding represent an "induced-fit" and provides more of a challenge for the model. Given the limitations of the conformational sampling, the ability of the energy function to discriminate between native and non-native states is encouraging. Development of the approach to include greater conformational sampling could lead to a more general solution to the protein docking problem.
Article
The prediction of protein-protein interactions in solution is a major goal of theoretical structural biology. Here, we implement a continuum description of the thermodynamic processes involved. The model differs considerably from previous models in its use of "molecular surface" area to describe the hydrophobic component to the free energy of conformational change in solution. We have applied this model to a data set of alternative docked conformations of protein-protein complexes which were generated independently of this work. It was found previously that commonly used energy evaluation techniques fail to distinguish between near-native and certain non-native complexes in this data set. Here, we found that an energy function that takes into account (1) total electrostatic free energy, (2) hydrophobic free energy and (3) loss in side-chain conformational energy was able to reliably discriminate between near-native and non-native configurations but only when molecular surface is used as a descriptor of the hydrophobic effect. It is shown that the molecular surface and the more conventional surface descriptor "solvent accessible surface" give very different quantitative measures of hydrophobicity. In terms of the contribution of different energy components to the free energy of complex formation it was found that loss in side-chain conformational entropy is a second order effect. Electrostatic interaction energy (which is commonly used to score docked conformations) was a poor indicator of complementarity when starting from unbound conformations. It was found that electrostatic desolvation energy and the hydrophobic contribution (based on a molecular surface area descriptor) are much less sensitive to local fluctuations in atomic structure than point-to-point interaction energies and thus may be more suited for use as a scoring function when docking unbound conformations, where atomic complementarity is much less apparent. 
Whilst a combined energy function was able to distinguish near-native from non-native conformations in the six systems studied here, it remains to be determined to what extent more sizeable conformational changes would influence the results.
Article
A typical problem for a docking procedure is how to match two molecules with known 3-D structure so as to predict the configuration of their complex. A very serious obstacle to docking is an inherent inaccuracy in the 3-D structures of the molecules. In general, existing molecular recognition techniques are not designed for cases where (i) conformational changes upon macromolecular complex formation are substantial or (ii) the X-ray data on one or both (macro) molecules are not available, and the structures, based on alternative sources (NMR, modeling), are not well defined. We designed a direct computer experiment using molecules totally deprived of any structural features smaller than 7 Å. This was performed on the basis of a previously developed docking algorithm. The modified procedure was applied to a number of known protein complexes taken from the Brookhaven Protein Data Bank. In most cases, a pronounced trend towards the correct structure of the molecular complex was clearly indicated and the real binding sites were predicted. The distinction between the prediction of the antigen-antibody complex and other molecular pairs may reflect important differences in the principles of complex formation. The results strongly suggest the use of our recognition procedure for docking studies where the detailed structures of the molecules are lacking.
Article
We have developed a geometry-based suite of processes for molecular docking. The suite consists of a molecular surface representation, a docking algorithm, and a surface inter-penetration and contact filter. The surface representation is composed of a sparse set of critical points (with their associated normals) positioned at the face centers of the molecular surface, providing a concise yet representative set. The docking algorithm is based on the Geometric Hashing technique, which indexes the critical points with their normals in a transformation invariant fashion preserving the multi-element geometric constraints. The inter-penetration and surface contact filter features a three-layer scoring system, through which docked models with high contact area and low clashes are funneled. This suite of processes enables a pipelined operation of molecular docking with high efficacy. Accurate and fast docking has been achieved with a rich collection of complexes and unbound molecules, including protein-protein and protein-small molecule associations. An energy evaluation routine assesses the intermolecular interactions of the funneled models obtained from the docking of the bound molecules by pairwise van der Waals and Coulombic potentials. Applications of this routine demonstrate the goodness of the high scoring, geometrically docked conformations of the bound crystal complexes.
Article
Two tasks must be accomplished when calculating the binding modalities and binding energies of two molecules in solution: the calculation of the interaction energy and the calculation of the effects of solvation. It is the competition between the energy of binding and the energy of remaining solvation which determines the binding properties. It is necessary to calculate (or at least approximate in some manner) the partition function in order to make a theoretical estimate of these effects. An efficient algorithm for performing the energy evaluations necessary for this calculation is presented in this paper. The fast Fourier transform (FFT) is used in combination with a polar factorization of the potentials to calculate the interaction energy at all relative translations between two molecules of fixed orientation. Thermodynamic quantities, including the partition function, internal and free energies can then be estimated from a set of these calculations covering the orientation space.
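The core idea of evaluating a score at all relative translations with one FFT can be sketched as a circular cross-correlation of two grids. This is a generic illustration, not the paper's method: it omits the polar factorization of the potentials, uses a simple overlap score rather than an energy, and the function names and grid layout are assumptions.

```python
import numpy as np


def translation_scan(receptor_grid, ligand_grid):
    """Score every circular translation t of the ligand grid at once.

    Returns an array c with c[t] = sum over x of receptor[x] * ligand[x + t],
    computed through the correlation theorem instead of an explicit loop.
    """
    receptor_ft = np.fft.fftn(receptor_grid)
    ligand_ft = np.fft.fftn(ligand_grid)
    return np.fft.ifftn(np.conj(receptor_ft) * ligand_ft).real


def best_translation(scores):
    """Index of the highest score; an energy-like grid would use argmin instead."""
    flat_index = np.argmax(scores)
    return tuple(int(i) for i in np.unravel_index(flat_index, scores.shape))
```

Repeating the scan for each sampled ligand orientation covers the orientation space mentioned in the abstract. For an N×N×N grid the FFT route costs on the order of N³ log N operations per orientation, against N⁶ for a direct sum over all translations.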
Article
The rapid association of barnase and its intracellular inhibitor barstar has been analysed from the effects of mutagenesis and electrostatic screening. A basal association rate constant of 10⁵ M⁻¹ s⁻¹ is increased to over 5 × 10⁹ M⁻¹ s⁻¹ by electrostatic forces. The association between the oppositely charged proteins proceeds through the rate-determining formation of an early, weakly specific complex, which is dominated by long-range electrostatic interactions, followed by precise docking to form the high affinity complex. This mode of binding is likely to be used widely in nature to increase association rate constants between molecules and its principles may be used for protein design.
Article
This chapter recounts efforts to dissect the cellular and circuit basis of a memory system in the primate cortex with the goal of extending the insights gained from the study of normal brain organization in animal models to an understanding of human cognition and related memory disorders. Primates and humans have developed an extraordinary capacity to process information "on line," a capacity that is widely considered to underlie comprehension, thinking, and so-called executive functions. Understanding the interactions between the major cellular constituents of cortical circuits (pyramidal and nonpyramidal cells) is considered a necessary step in unraveling the cellular mechanisms subserving working memory and, ultimately, cognitive processes. Evidence from a variety of sources is accumulating to indicate that dopamine has a major role in regulating the excitability of the cortical circuitry upon which the working memory function of prefrontal cortex depends. Here, I describe several direct and indirect intercellular mechanisms for modulating working memory function in prefrontal cortex based on the localization of dopamine receptors on the distal dendrites and spines of pyramidal cells and on interneurons in the prefrontal cortex. Interactions between monoamines and a compromised cortical circuitry may hold the key to understanding the variety of memory disorders associated with aging and disease.
Article
Prediction of small molecule binding modes to macromolecules of known three-dimensional structure is a problem of paramount importance in rational drug design (the "docking" problem). We report the development and validation of the program GOLD (Genetic Optimisation for Ligand Docking). GOLD is an automated ligand docking program that uses a genetic algorithm to explore the full range of ligand conformational flexibility with partial flexibility of the protein, and satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Numerous enhancements and modifications have been applied to the original technique resulting in a substantial increase in the reliability and the applicability of the algorithm. The advanced algorithm has been tested on a dataset of 100 complexes extracted from the Brookhaven Protein DataBank. When used to dock the ligand back into the binding site, GOLD achieved a 71% success rate in identifying the experimental binding mode.
Article
We estimated effective atomic contact energies (ACE), the desolvation free energies required to transfer atoms from water to a protein's interior, using an adaptation of a method introduced by S. Miyazawa and R. L. Jernigan. The energies were obtained for 18 different atom types, which were resolved on the basis of the way their properties cluster in the 20 common amino acids. In addition to providing information on atoms at the highest resolution compatible with the amount and quality of data currently available, the method itself has several new features, including its reference state, the random crystal structure, which removes compositional bias, and a scaling factor that makes contact energies quantitatively comparable with experimentally measured energies. The high level of resolution, the explicit accounting of the local properties of protein interiors during determination of the energies, and the very high computational efficiency with which they can be assigned during any computation, should make the results presented here widely applicable. First we used ACE to calculate the free energies of transferring side-chains from protein interior into water. A comparison of the results thus obtained with the measured free energies of transferring side-chains from n-octanol to water, indicates that the magnitude of protein to water transfer free energies for hydrophobic side-chains is larger than that of n-octanol to water transfer free energies. The difference is consistent with observations made by D. Shortle and co-workers, who measured differential free energies of protein unfolding for site-specific mutants in which Ala or Gly was substituted for various hydrophobic side-chains. A direct comparison (calculated versus observed free energy differences) with those experiments finds slopes of 1.15 and 1.13 for Gly and Ala substitutions, respectively. Finally we compared calculated and observed binding free energies of nine protease-inhibitor complexes. 
This requires a full free energy function, which is created by adding direct electrostatic interactions and an appropriate entropic component to the solvation free energy term. The calculated free energies are typically within 10% of the observed values. Taken collectively, these results suggest that ACE should provide a reasonably accurate and rapidly evaluatable solvation component of free energy, and should thus make accessible a range of docking, design and protein folding calculations that would otherwise be difficult to perform.
Article
We examine a simple kinetic model for association that incorporates the basic features of protein-protein recognition within the rigid body approximation, that is, when no large conformation change occurs. Association starts with random collision at the rate k_coll predicted by the Einstein-Smoluchowski equation. This creates an encounter pair that can evolve into a stable complex if and only if the two molecules are correctly oriented and positioned, which has a probability p_r. In the absence of long-range interactions, the bimolecular rate of association is p_r k_coll. Long-range electrostatic interactions affect both k_coll and p_r. The collision rate is multiplied by q_t, a factor larger than 1 when the molecules carry net charges of opposite sign as coulombic attraction makes collisions more frequent, and less than 1 in the opposite case. The probability p_r is multiplied by a factor q_r that represents the steering effect of electric dipoles, which preorient the molecules before they collide. The model is applied to experimental data obtained by Schreiber and Fersht (Nat. Struct. Biol. 3:427-431, 1996) on the kinetics of barnase-barstar association. When long-range electrostatic interactions are fully screened or mutated away, q_t q_r ≈ 1, and the observed rate of productive collision is p_r k_coll ≈ 10⁵ M⁻¹ s⁻¹. Under these conditions, p_r ≈ 1.5 × 10⁻⁵ is determined by geometric constraints corresponding to a loss of rotational freedom. Its value is compatible with computer docking simulations and implies a rotational entropy loss ΔS_rot ≈ 22 e.u. in the transition state. At low ionic strength, long-range electrostatic interactions accelerate barnase-barstar association by a factor q_t q_r of up to 10⁵ as favorable charge-charge and charge-dipole interactions work together to make it much faster than free diffusion would allow.
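The arithmetic implied by this kinetic model can be made explicit in a few lines. The sketch takes only the numbers quoted in the abstract (orientational probability of about 1.5 × 10⁻⁵, basal rate of about 10⁵ M⁻¹ s⁻¹, electrostatic enhancement of up to 10⁵); the function and variable names are hypothetical, and the collision rate is back-calculated from those two figures, not taken from the paper.

```python
def association_rate(k_coll, p_r, q_t=1.0, q_r=1.0):
    """Bimolecular association rate in the model: (q_t * k_coll) * (q_r * p_r).

    k_coll : collision rate in M^-1 s^-1
    p_r    : probability that a collision is correctly oriented and positioned
    q_t    : electrostatic factor applied to the collision rate
    q_r    : electrostatic steering factor applied to p_r
    """
    return (q_t * k_coll) * (q_r * p_r)


# Values quoted in the abstract for fully screened electrostatics.
P_R = 1.5e-5          # orientational probability
BASAL_RATE = 1.0e5    # observed p_r * k_coll, in M^-1 s^-1

# Collision rate implied by those two numbers (back-calculated).
K_COLL = BASAL_RATE / P_R

# Upper bound on the rate when the combined factor q_t * q_r reaches 1e5.
MAX_ENHANCED_RATE = BASAL_RATE * 1.0e5
```

The implied collision rate is about 6.7 × 10⁹ M⁻¹ s⁻¹, and the maximal electrostatic enhancement gives a rate of the order of 10¹⁰ M⁻¹ s⁻¹.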
Article
A protein docking study was performed for two classes of biomolecular complexes: six enzyme/inhibitor and four antibody/antigen. Biomolecular complexes for which crystal structures of both the complexed and uncomplexed proteins are available were used for eight of the ten test systems. Our docking experiments consist of a global search of translational and rotational space followed by refinement of the best predictions. Potential complexes are scored on the basis of shape complementarity and favourable electrostatic interactions using Fourier correlation theory. Since proteins undergo conformational changes upon binding, the scoring function must be sufficiently soft to dock unbound structures successfully. Some degree of surface overlap is tolerated to account for side-chain flexibility. Similarly for electrostatics, the interaction of the dispersed point charges of one protein with the Coulombic field of the other is measured rather than precise atomic interactions. We tested our docking protocol using the native rather than the complexed forms of the proteins to address the more scientifically interesting problem of predictive docking. In all but one of our test cases, correctly docked geometries (interface Cα RMS deviation ≤2 Å from the experimental structure) are found during a global search of translational and rotational space in a list that was always less than 250 complexes and often less than 30. Varying degrees of biochemical information are still necessary to remove most of the incorrectly docked complexes.
Article
The non-covalent assembly of proteins that fold separately is central to many biological processes, and differs from the permanent macromolecular assembly of protein subunits in oligomeric proteins. We performed an analysis of the atomic structure of the recognition sites seen in 75 protein-protein complexes of known three-dimensional structure: 24 protease-inhibitor, 19 antibody-antigen and 32 other complexes, including nine enzyme-inhibitor and 11 that are involved in signal transduction. The size of the recognition site is related to the conformational changes that occur upon association. Of the 75 complexes, 52 have "standard-size" interfaces in which the total area buried by the components in the recognition site is 1600 (±400) Å². In these complexes, association involves only small changes of conformation. Twenty complexes have "large" interfaces burying 2000 to 4660 Å², and large conformational changes are seen to occur in those cases where we can compare the structure of complexed and free components. The average interface has approximately the same non-polar character as the protein surface as a whole, and carries somewhat fewer charged groups. However, some interfaces are significantly more polar and others more non-polar than the average. Of the atoms that lose accessibility upon association, half make contacts across the interface and one-third become fully inaccessible to the solvent. In the latter case, the Voronoi volume was calculated and compared with that of atoms buried inside proteins. The ratio of the two volumes was 1.01 (±0.03) in all but 11 complexes, which shows that atoms buried at protein-protein interfaces are close-packed like the protein interior. This conclusion could be extended to the majority of interface atoms by including solvent positions determined in high-resolution X-ray structures in the calculation of Voronoi volumes.
Thus, water molecules contribute to the close-packing of atoms that ensures complementarity between the two protein surfaces, as well as providing polar interactions between the two proteins.
Article
We present a rapidly executable minimal binding energy model for molecular docking and use it to explore the energy landscape in the vicinity of the binding sites of four different enzyme inhibitor complexes. The structures of the complexes are calculated starting with the crystal structures of the free monomers, using DOCK 4.0 to generate a large number of potential configurations, and screening with the binding energy target function. In order to investigate possible correlations between energy and variation from the native structure, we introduce a new measure of similarity, which removes many of the difficulties associated with root mean square deviation. The analysis uncovers energy gradients, or funnels, near the binding site, with decreasing energy as the degree of similarity between the native and docked structures increases. Such energy funnels can increase the number of random collisions that may evolve into a productive stable complex, and indicate that short-range interactions in the precomplexes can contribute to the association rate. The finding could provide an explanation for the relatively rapid association rates that are observed even in the absence of long-range electrostatic steering.
Article
In this work, we present an algorithm developed to handle biomolecular structural recognition problems, as part of an interdisciplinary research endeavor of the Computer Vision and Molecular Biology fields. A key problem in rational drug design and in biomolecular structural recognition is the generation of binding modes between two molecules, also known as molecular docking. Geometrical fitness is a necessary condition for molecular interaction. Hence, docking a ligand (e.g., a drug molecule or a protein molecule) to a protein receptor (e.g., an enzyme) involves recognition of molecular surfaces. Conformational transitions by "hinge-bending" involve rotational movements of relatively rigid parts with respect to each other. The generation of docked binding modes between two associating molecules depends on their three-dimensional (3-D) structures and their conformational flexibility. In comparison to the particular case of rigid-body docking, the computational difficulty grows considerably when taking into account the additional degrees of freedom intrinsic to the flexible molecular docking problem. Previous docking techniques have enabled hinge movements only within small ligands. Partial flexibility in the receptor molecule is enabled by a few techniques. Hinge-bending motions of protein receptor domains are not addressed by these methods, although these types of transitions are significant, e.g., in enzyme activity. Our approach allows hinge-induced motions to exist in either the receptor or the ligand molecules of diverse sizes. We allow movements of domains, subdomains, or groups of atoms in either of the associating molecules. We achieve this by adapting a technique developed in Computer Vision and Robotics for the efficient recognition of partially occluded articulated objects. These types of objects consist of rigid parts which are connected by rotary joints (hinges).
Our method is based on an extension and generalization of the Hough transform and the Geometric Hashing paradigms for rigid object recognition. We show experimental results obtained by the successful application of the algorithm to cases of bound and unbound molecular complexes, yielding fast matching times. While the "correct" molecular conformations of the known complexes are obtained with small RMS distances, additional, predictive good-fitting binding modes are generated as well. We conclude by discussing the algorithm's implications and extensions, as well as its application to investigations of protein structures in Molecular Biology and recognition problems in Computer Vision.