Article: Literature Review

High-resolution protein-protein docking by global optimization: Recent advances and future challenges


Abstract

A computational protein-protein docking method that predicts atomic details of protein-protein interactions from protein monomer structures is an invaluable tool for understanding the molecular mechanisms of protein interactions and for designing molecules that control such interactions. Compared to low-resolution docking, high-resolution docking explores the conformational space at atomic resolution to provide predictions with atomic details. This allows for applications to more challenging docking problems that involve conformational changes induced by binding. Recently, high-resolution methods have become more promising as additional information, such as global shapes or residue contacts, is now available from experiments or sequence/structure data. In this review article, we highlight developments in high-resolution docking made during the last decade, specifically regarding the global optimization methods employed by the docking methods. We also discuss two major challenges in high-resolution docking: prediction of backbone flexibility and water-mediated interactions. Copyright © 2015 Elsevier Ltd. All rights reserved.


... The transmembrane region of H3R is often the site of ligand and drug interaction. Several H3R antagonists/inverse agonists appear to be promising drug candidates [15][16][17]. Three-dimensional (3D) atomistic models of antagonist-receptor complexes have been used to investigate the details of ligand and drug interactions with H3R and have been successful in providing important insights regarding their binding; additionally, several groups have reported the features of the general H3R pharmacophore. ...
... We used the Ballesteros-Weinstein generic numbering scheme for the amino acid residues of class A GPCRs 16,72 . The numbers attributed correspond to frame 0 of the trajectory. ...
Article
In this work, we studied the mechanisms of classical activation and inactivation of signal transduction by the histamine H3 receptor, a 7-helix transmembrane bundle G-Protein Coupled Receptor through long-time-scale atomistic molecular dynamics simulations of the receptor embedded in a hydrated double layer of dipalmitoyl phosphatidyl choline, a zwitterionic polysaturated ordered lipid. Three systems were prepared: the apo receptor, representing the constitutively active receptor; and two holo-receptors-the receptor coupled to the antagonist/inverse agonist ciproxifan, representing the inactive state of the receptor, and the receptor coupled to the endogenous agonist histamine and representing the active state of the receptor. An extensive analysis of the simulation showed that the three states of H3R present significant structural and dynamical differences as well as a complex behavior given that the measured properties interact in multiple and interdependent ways. In addition, the simulations described an unexpected escape of histamine from the orthosteric binding site, in agreement with the experimental modest affinities and rapid off-rates of agonists.
... Therefore, the importance of accounting for flexibility during the docking process is clear. However, implementing it is highly complex, mainly because of the need for an atomistic representation and the increase in degrees of freedom (Lexa and Carlson, 2012; Park et al., 2015). ...
... Some algorithms aim to improve the side-chain conformations by moving them "on-the-fly" using rotamer libraries or randomly. However, side-chain flexibility is not enough if global conformational changes include backbone rearrangements (Lexa and Carlson, 2012; Park et al., 2015). ...
Thesis
Determination of three-dimensional (3D) structures of protein complexes is crucial for advancing research on biological processes, which helps, for instance, to understand the development of diseases and their possible prevention or treatment. The difficulties and high costs of experimental methods to determine protein 3D structures and the importance of protein complexes for research have encouraged the use of computer science for developing tools to help fill this gap, such as protein docking algorithms. The protein docking problem has been studied for over 40 years. However, developing accurate and efficient protein docking algorithms remains a challenging problem due to the size of the search space, the approximate nature of the scoring functions used, and often the inherent flexibility of the protein structures to be docked. This thesis presents an algorithm to rigidly dock proteins using a series of exhaustive 3D branch-and-bound rotational searches in which non-clashing orientations are scored using ATTRACT. The rotational space is represented as a quaternion “π-ball”, which is systematically sub-divided in a “branch-and-bound” manner, allowing efficient pruning of rotations that will give steric clashes. The contribution of this thesis can be described in three main parts as follows. 1) The algorithm called EROS-DOCK to assemble two proteins. It was tested on 173 Docking Benchmark complexes. According to the CAPRI quality criteria, EROS-DOCK typically gives more acceptable or medium quality solutions than ATTRACT and ZDOCK. 2) The extension of the EROS-DOCK algorithm to allow the use of atom-atom or residue-residue distance restraints. The results show that using even just one residue-residue restraint in each interaction interface is sufficient to increase the number of cases with acceptable solutions within the top-10 from 51 to 121 out of 173 pairwise docking cases. Hence, EROS-DOCK offers a new improved search strategy to incorporate experimental data, of which a proof-of-principle using data-driven computational restraints is demonstrated in this thesis, and this might be especially important for multi-body complexes. 3) The extension of the algorithm to dock trimeric complexes. Here, the proposed method is based on the premise that all of the interfaces in a multi-body docking solution should be similar to at least one interface in each of the lists of pairwise docking solutions. The algorithm was tested on a home-made benchmark of 11 three-body cases. Seven complexes obtained at least one acceptable quality solution in the top-50. In future, the EROS-DOCK algorithm can evolve by integrating improved scoring functions and other types of restraints. Moreover, it can be used as a component in elaborate workflows to efficiently solve complex problems of multi-protein assemblies.
... [8][9][10][11][12][13][14][15][16][17] Validation of all the existing global docking programs has been performed on certain test sets, yielding good results as a direct docking tool for binding structure prediction and/or as an initial docking program for postdocking approaches. 4,6,18,19 Due to the complexity of these procedures, many docking software programs are still being developed in order to extend the application field and reach the highest accuracy; [19][20][21][22][23][24][25] in fact, the large number of available docking programs shows many differences in the results on the basis of different benchmarks making it hard for a nonspecialist to choose an appropriate docking protocol for a given purpose. In addition, current docking programs vary significantly in their search strategies, scoring functions, and/or molecular representation of the proteins. ...
Article
Glyoxalase II (GlxII) is an antioxidant glutathione-dependent enzyme, which catalyzes the hydrolysis of S-D-lactoylglutathione to form D-lactic acid and glutathione (GSH). The last product is the most important thiol reducing agent present in all eukaryotic cells that have mitochondria and chloroplasts. It is generally known that GSH plays a crucial role on the cellular redox state but also on various cellular processes. One of them is protein S-glutathionylation, a process that can occur through an oxidation reaction of proteins thiol groups by GSH. Changes in protein S-glutathionylation have been associated with a range of human diseases such as diabetes, cardiovascular and pulmonary diseases, neurodegenerative diseases and cancer. Within a major project aimed to elucidate the role of GlxII in the mechanism of S-glutathionylation, a reliable computational protocol consisting in a protein-protein docking approach followed by atomistic Molecular Dynamics (MD) simulations was settled out and it was applied to the prediction of molecular associations between human GlxII (in presence and in absence of GSH), with some proteins that are known to be S-glutathionylated in vitro, as actin, malate dehydrogenase (MDH) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The computational results show a high propensity of GlxII to interact with actin and MDH through its active site and a high stability of the GlxII-protein systems when GSH is present. Moreover, close proximities of GSH with actin and MDH cysteine residues have been found, suggesting that GlxII could be able to perform protein S-glutathionylation by using the GSH molecule present in its catalytic site.
... The above example showed the importance of interfacial hydration networks of large protein-peptide complexes for forming interactions between the partners in agreement with previous studies [41,46]. This problem of correct interface hydration is vital as handling explicit water molecules is still an intractable challenge for machine learning technologies [128] and docking methods [129,130]. ...
Preprint
Histones are keys to many epigenetic events and their complexes have therapeutic and diagnostic importance. The determination of the structures of histone complexes is fundamental in the design of new drugs. Computational molecular docking is widely used for the prediction of target-ligand complexes. Large, linear peptides like the tail regions of histones are challenging ligands for docking due to their large conformational flexibility, extensive hydration, and weak interactions with the shallow binding pockets of their reader proteins. Thus, fast docking methods often fail to produce complex structures of such peptide ligands at a level appropriate for drug design. To answer this challenge, and improve the structural quality of the docked complexes, post-docking refinement has been applied using various molecular dynamics (MD) approaches. However, a final consensus has not been reached on the desired MD refinement protocol. In the present study, MD refinement strategies were systematically explored on a set of problematic complexes of histone peptide ligands with relatively large errors in their docked geometries. Six protocols were compared that differ in their MD simulation parameters. In all cases, pre-MD hydration of the complex interface regions was applied to avoid the unwanted presence of empty cavities. The best-performing protocol achieved a median of 32 % improvement over the docked structures in terms of the change of root mean squared deviations from the experimental references. The influence of structural factors and explicit hydration on the performance of post-docking MD refinements was also discussed to help their implementation in future methods and applications.
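The improvement metric quoted in this abstract can be illustrated with a small sketch. It assumes that improvement is the relative decrease in RMSD from the experimental reference, taken pair by pair and then summarized by the median; the abstract does not give the exact definition, so the function names and the formula below are illustrative assumptions, and no structural superposition is performed.

```python
import math
from statistics import median


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def percent_improvement(rmsd_docked, rmsd_refined):
    """Relative RMSD decrease after refinement, in percent (assumed definition)."""
    return 100.0 * (rmsd_docked - rmsd_refined) / rmsd_docked


def median_improvement(rmsd_pairs):
    """Median percent improvement over (docked, refined) RMSD pairs."""
    return median(percent_improvement(d, r) for d, r in rmsd_pairs)
```

A negative value from `percent_improvement` would indicate that refinement moved the complex further from the reference.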
... Purine is a heterocyclic nucleus present in various antimetabolites and modified at positions 2, 4, and 9 in the creation of protein kinase inhibitors. A quinazoline scaffold with a phenylamino pyrimidine moiety is one of the structural characteristics 6 . Pyrazolo-pyrimidine, a bioisostere of purine 7 , was given much consideration while creating anticancer scaffolds, as evidenced by PKI-1668 and erlotinib 9 . ...
Article
The pyrimidine system, a vital component of genetic material, has received great attention as a fundamental source of agents to fight cancer. The pyrazolo(1,5-a)pyrimidines (5a-5j) were designed based on the structural features of antitumor antimetabolites and synthesized, and their chemical structures were confirmed using spectroscopic methods such as IR, 1H NMR, 13C NMR, and mass spectrometry, together with elemental analysis. The cytotoxic activity was evaluated by the DPPH free radical scavenging assay against standard ascorbic acid and by the MTT assay against MCF-7 and HepG-2, with imatinib as standard. The DPPH assay indicated that 5b, 5c, 5e, 5h and 5j were efficient antioxidants, while the MTT assay disclosed potent cytotoxicity of 5b and 5d against MCF-7 (16.61 and 19.67 µg/ml) and of 5c and 5h against HepG-2 (14.32 and 19.24 µg/ml) compared to 5-FU. The ligands 5c and 5h showed promise towards tyrosine kinase and cyclin dependent kinase 2, respectively, with bonding energy similar to that of doxorubicin. In conclusion, the compounds had reasonable cytotoxic potential, and good association was observed between the in vitro and in silico studies.
... Docking involves two key steps: the sampling of the interaction space between the protein molecules to generate docked models and the scoring of the docked conformations to distinguish near-native conformations from the sampled conformations. There has been much recent progress on both sampling as well as scoring [18,19]. ...
Article
Protein–protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking—the so-called scoring problem—still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein–protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein–protein interfacial features and by using ensemble methods to combine multiple scoring functions.
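The score-averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the MetaScore implementation: the min-max normalization of the traditional score, the `lower_is_better` sign convention, and all function names are assumptions introduced here, and the random forest output is represented simply as a list of probabilities.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def averaged_scores(rf_probs, sf_scores, lower_is_better=True):
    """Average a classifier probability with a normalized traditional score.

    rf_probs: per-model probability of being near-native (higher is better).
    sf_scores: per-model traditional scoring-function values.
    lower_is_better: True for energy-like scores (assumed convention).
    """
    if len(rf_probs) != len(sf_scores):
        raise ValueError("both score lists must cover the same models")
    norm = min_max_normalize(sf_scores)
    if lower_is_better:
        norm = [1.0 - n for n in norm]  # flip so that higher is better
    return [(p + n) / 2.0 for p, n in zip(rf_probs, norm)]


def rank_models(names, combined):
    """Model names sorted from best to worst combined score."""
    ordered = sorted(zip(names, combined), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]
```

Calling `averaged_scores` once per traditional scoring function would give one variant per function, which is the kind of input an ensemble over variants would need.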
... Docking involves two key steps: sampling of the interaction space between the protein molecules to generate docked models; and scoring of the docked conformations to distinguish near-native conformations from the sampled conformations. There has been much recent progress on both sampling as well as scoring [18,19]. ...
Preprint
Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking - the so-called scoring problem - still has considerable room for improvement. We present here MetaScore, a new machine-learning based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using a rich set of features extracted from the respective protein-protein interfaces. These include physico-chemical properties, energy terms, interaction propensity-based features, geometric properties, interface topology features, evolutionary conservation and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of nine traditional SFs included in this work in terms of success rate and hit rate evaluated over the top 10 predicted conformations; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by judiciously leveraging machine learning.
... For docking of proteins, detailed information regarding the rotation and translation position of the binding molecules is required. This information can be collected by other CADD techniques [69]. ...
Chapter
Computational approaches efficiently design the drugs to prevent diseases for which no drug is available. These techniques are also used for the development of new drugs. It involves using a variety of computer software for drug modeling and simulation, hence usually known as computer-aided drug designing (CADD). The computational tools provide crucial drug designing in a short period. These techniques are time and cost-effective as compared to conventional drug development methods. Computational methods can effectively model the suitable drug candidate by optimizing ligand-target interactions and observing the deep insight of cellular processes by its powerful tools. Several studies have applied these modern computational techniques to find out the possible therapy against the pandemic disease of COVID-19. The critical proteins of COVID-19, including 3C-like protease, papain-like protease, and RNA polymerase, are targeted to model the effective drug. CADD approaches suggest anti-viral drugs, anti-coagulant, anti-HIV drugs, and anti-fungal drugs to have little effect against COVID-19. This chapter aims to overview the different CADD approaches to design the possible drug for the treatment of COVID-19.
... 1,2 In parallel, template-based modeling and diverse docking strategies have been developed and applied to produce reliable protein complex models. [3][4][5][6][7] CAPRI (Critical Assessment of PRedicted Interactions) has been gauging the accuracy of these computational approaches since 2002. 8,9 In the meantime, CASP has seen substantial developments in the field of protein structure prediction. ...
Article
In CASP14, 39 research groups submitted more than 2,500 3D models on 22 protein complexes. In general, the community performed well in predicting the fold of the assemblies (for 80% of the targets), though it faced significant challenges in reproducing the native contacts. This is especially the case for the complexes without whole-assembly templates. The leading predictor, BAKER-experimental, used a methodology combining classical techniques (template-based modeling, protein docking) with deep learning-based contact predictions and a fold-and-dock approach. The Venclovas team achieved the runner-up position with template-based modeling and docking. By analyzing the target interfaces, we showed that the complexes with depleted charged contacts or dominating hydrophobic interactions were the most challenging ones to predict. We also demonstrated that if AlphaFold2 predictions were at hand, the interface prediction challenge could be alleviated for most of the targets. All in all, it is evident that new approaches are needed for the accurate prediction of assemblies, which undoubtedly will expand on the significant improvements in the tertiary structure prediction field.
... Importantly, certain applications require specific levels of structure quality. For instance, rational drug design (Fernández-Ballester, et al., 2011; and molecular docking applied to predict protein-protein interactions (Movshovitz-Attias, et al., 2010;Park, et al., 2015) rely on the atomic resolution structures. ...
Article
Motivation: X-ray crystallography was used to produce nearly 90% of protein structures. These efforts were supported by numerous sequence-based tools that accurately predict crystallizable proteins. However, protein structures vary widely in their quality, typically measured with resolution and R-free. This impacts the ability to use these structures for some applications including rational drug design and molecular docking and motivates development of methods that accurately predict structure quality. Results: We introduce XRRpred, the first predictor of the resolution and R-free values from protein sequences. XRRpred relies on original sequence profiles, hand-crafted features, empirically selected and parametrized regressors, and modern resampling techniques. Using an independent test dataset, we show that XRRpred provides accurate predictions of resolution and R-free. We demonstrate that XRRpred’s predictions correctly model the relationship between the resolution and R-free and reproduce structure quality relations between structural classes of proteins. We also show that XRRpred significantly outperforms indirect alternative ways to predict the structure quality that include predictors of crystallization propensity and an alignment-based approach. XRRpred is available as a convenient webserver that allows batch predictions and offers informative visualization of the results. Availability: http://biomine.cs.vcu.edu/servers/XRRPred/.
... Innumerable protein-protein docking web-servers have been developed with diverse sampling algorithms and scoring functions in order to accurately predict the binding mode between two protein structures (Gromiha et al., 2017;Porter et al., 2019). Due to varying differences in their docking and scoring strategies, choosing an appropriate protocol for docking is a tricky problem in itself (Huang, 2014;Park et al., 2015;Gromiha et al., 2017;Porter et al., 2019). The CAPRI (Critical Assessment of PRedicted Interactions) community-wide effort attempts to dock the same proteins provided by the assessors in a scientific meeting held every six months for discussing protein-protein docking accuracy (Janin, 2010). ...
Article
Protein-protein interactions are indispensable physiological processes regulating several biological functions. Despite the availability of structural information on protein-protein complexes, deciphering their complex topology remains an outstanding challenge. Raf kinase inhibitory protein (RKIP) has gained substantial attention as a favorable molecular target for numerous pathologies including cancer and Alzheimer’s disease. RKIP interferes with the RAF/MEK/ERK signaling cascade by endogenously binding with C-Raf (Raf-1 kinase) and preventing its activation. In the current investigation, the binding of RKIP with C-Raf was explored by knowledge-based protein-protein docking web-servers including HADDOCK and ZDOCK and a consensus binding mode of C-Raf/RKIP structural complex was obtained. Molecular dynamics (MD) simulations were further performed in an explicit solvent to sample the conformations for when RKIP binds to C-Raf. Some of the conserved interface residues were mutated to alanine, phenylalanine and leucine and the impact of mutations was estimated by additional MD simulations and MM/PBSA analysis for the wild-type (WT) and constructed mutant complexes. Substantial decrease in binding free energy was observed for the mutant complexes as compared to the binding free energy of WT C-Raf/RKIP structural complex. Furthermore, a considerable increase in average backbone root mean square deviation and fluctuation was perceived for the mutant complexes. Moreover, per-residue energy contribution analysis of the equilibrated simulation trajectory by HawkDock and ANCHOR web-servers was conducted to characterize the key residues for the complex formation. One residue each from C-Raf (Arg398) and RKIP (Lys80) were identified as the druggable “hot spots” constituting the core of the binding interface and corroborated by additional long-time scale (300 ns) MD simulation of Arg398Ala mutant complex. 
A notable conformational change in the Arg398Ala mutant occurred near the mutation site as compared to the equilibrated C-Raf/RKIP native state conformation, and an essential hydrogen bonding interaction was lost. The thirteen binding sites assimilated from the overall analysis were mapped onto the complex as a surface and divided into active and allosteric binding sites, depending on their location at the interface. The acquired information on the predicted 3D structural complex and the detected sites may serve as promising targets for designing novel inhibitors to block the C-Raf/RKIP interaction.
... Similarly, structural comparisons between proteins that are known to interact with a common protein make it possible to infer probabilities for mutually exclusive interactions in a protein network [34]. Advances in computational protein-protein docking make it possible to infer protein interactions, and hence with sufficient structural resolution docking can also indicate competing interactions throughout a network [35][36][37][38][39]. ...
Preprint
Protein interactions are fundamental building blocks of biochemical reaction systems underlying cellular functions. The complexity and functionality of these systems emerge not only from the protein interactions themselves but also from the dependencies between these interactions, e.g., allosteric effects, mutual exclusion or steric hindrance. Therefore, formal models for integrating and using information about such dependencies are of high interest. We present an approach for endowing protein networks with interaction dependencies using propositional logic, thereby obtaining constrained protein interaction networks ("constrained networks"). The construction of these networks is based on public interaction databases and on known as well as text-mined interaction dependencies. We present an efficient data structure and algorithm to simulate protein complex formation in constrained networks. The efficiency of the model allows a fast simulation and enables the analysis of many proteins in large networks. Therefore, we are able to simulate perturbation effects (knockout and overexpression of single or multiple proteins, changes of protein concentrations). We illustrate how our model can be used to analyze a partially constrained human adhesome network. Comparing complex formation with and without the known dependencies, we find that interaction dependencies limit the resulting complex sizes. Further, we demonstrate that our model enables us to investigate how the interplay of network topology and interaction dependencies influences the propagation of perturbation effects. Our simulation software CPINSim (for Constrained Protein Interaction Network Simulator) is available under the MIT license at http://github.com/BiancaStoecker/cpinsim and via Bioconda (https://bioconda.github.io).
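The idea of attaching propositional constraints to interactions can be sketched in a few lines. This is not the CPINSim data structure or algorithm; the dictionary layout, the predicate representation, and the names `can_bind` and `interaction_allowed` are hypothetical choices made only to show how mutual exclusion and an allosteric requirement could be expressed as logical conditions on a protein's current partners.

```python
def can_bind(protein, partner, bound, constraints):
    """Check the constraint that `protein` places on binding `partner`.

    bound: dict mapping a protein to the set of its current partners.
    constraints: dict mapping (protein, partner) to a predicate over the
        set of current partners of `protein`; no entry means unconstrained.
    """
    rule = constraints.get((protein, partner))
    if rule is None:
        return True
    return rule(bound.get(protein, set()))


def interaction_allowed(a, b, bound, constraints):
    """An interaction forms only if both partners' constraints hold."""
    return can_bind(a, b, bound, constraints) and can_bind(b, a, bound, constraints)


# Hypothetical example rules for a protein "A":
example_constraints = {
    # Mutual exclusion: A binds B only while A is not bound to C.
    ("A", "B"): lambda partners: "C" not in partners,
    # Allosteric requirement: A binds D only while A is bound to C.
    ("A", "D"): lambda partners: "C" in partners,
}
```

With `bound = {"A": {"C"}}`, the A-B interaction is blocked and the A-D interaction is permitted; with no partners bound, the outcome is reversed.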
... Many more protein complex structures could in principle be predicted by computational approaches, in a scenario where protein-protein docking takes on a crucial role complementary to classical structural biology techniques. However, reliably predicting the three-dimensional structure of protein-protein complexes by molecular docking is still an open challenge, with one of the critical steps being the scoring, i.e. the ability to discriminate between correct and incorrect solutions within a wide pool of generated models [9]. ...
Article
Background: Properly scoring protein-protein docking models to single out the correct ones is an open challenge, also object of assessment in CAPRI (Critical Assessment of PRedicted Interactions), a community-wide blind docking experiment. We introduced in the field CONSRANK (CONSensus RANKing), the first pure consensus method. Also available as a web server, CONSRANK ranks docking models in an ensemble based on their ability to match the most frequent inter-residue contacts in it. We have been blindly testing CONSRANK in all the latest CAPRI rounds, where we showed it to perform competitively with the state-of-the-art energy and knowledge-based scoring functions. More recently, we developed Clust-CONSRANK, an algorithm introducing a contact-based clustering of the models as a preliminary step of the CONSRANK scoring process. In the latest CASP13-CAPRI joint experiment, we participated as scorers with a novel pipeline, combining both our scoring tools, CONSRANK and Clust-CONSRANK, with our interface analysis tool COCOMAPS. Selection of the 10 models for submission was guided by the strength of the emerging consensus, and their final ranking was assisted by results of the interface analysis. Results: As a result of the above approach, we were by far the first scorer in the CASP13-CAPRI top-1 ranking, having high/medium quality models ranked at the top-1 position for the majority of targets (11 out of the total 19). We were also the first scorer in the top-10 ranking, on a par with another group, and the second scorer in the top-5 ranking. Further, we topped the ranking relative to the prediction of binding interfaces, among all the scorers and predictors. Using the CASP13-CAPRI targets as case studies, we illustrate here in detail the approach we adopted. 
Conclusions: Introducing some flexibility in the final model selection and ranking, as well as differentiating the adopted scoring approach depending on the targets were the key assets for our highly successful performance, as compared to previous CAPRI rounds. The approach we propose is entirely based on methods made available to the community and could thus be reproduced by any user.
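The consensus idea described in this abstract, ranking models by how well they match the most frequent inter-residue contacts in the ensemble, can be sketched as follows. This is a simplified illustration rather than the CONSRANK code: scoring each model by the mean ensemble frequency of its contacts is an assumed form of the score, and the representation of a contact as a pair of residue labels is hypothetical.

```python
from collections import Counter


def contact_frequencies(models):
    """Fraction of models in which each contact occurs.

    models: dict mapping a model name to a set of contacts, where a contact
        is any hashable pair such as ("A10", "B5").
    """
    counts = Counter(contact for contacts in models.values() for contact in contacts)
    n_models = len(models)
    return {contact: k / n_models for contact, k in counts.items()}


def consensus_scores(models):
    """Mean ensemble frequency of the contacts present in each model."""
    freq = contact_frequencies(models)
    return {
        name: (sum(freq[c] for c in contacts) / len(contacts) if contacts else 0.0)
        for name, contacts in models.items()
    }


def consensus_rank(models):
    """Model names ordered from highest to lowest consensus score."""
    scores = consensus_scores(models)
    return sorted(scores, key=scores.get, reverse=True)
```

Because the score depends only on the ensemble itself, a sketch like this needs no energy function; a model with rare contacts ranks low even if it is the only correct one, which is the usual caveat of consensus approaches.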
... The cartesian term 39 is also the basis for a cartesian_ddG method, which has been used to calculate ΔΔG values of mutations (where ΔG is the free energy of folding) to assess changes in protein stability. Only the backbones and side chains of residues near the mutation site are allowed to move 40 . Due to the local optimization, this protocol is much faster than the previous gold-standard ddg_monomer 41 while retaining the same level of accuracy. ...
Article
The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.
... 19 Structural refinement by various minimization protocols improves the accuracy of the predictions. 4,17,[20][21][22][23][24] In terms of the intermolecular energy, the scan in the free docking methodology typically means an intermolecular energy landscape-wide search for the binding funnel. The free docking success depends heavily on a number of factors (the force field, characteristics of the intermolecular energy landscape, and such). ...
Article
Protein docking is essential for structural characterization of protein interactions. Besides providing the structure of protein complexes, modeling of proteins and their complexes is important for understanding the fundamental principles and specific aspects of protein interactions. The accuracy of protein modeling, in general, is still less than that of the experimental approaches. Thus, it is important to investigate the applicability of docking techniques to modeled proteins. We present new comprehensive benchmark sets of protein models for the development and validation of protein docking, as well as a systematic assessment of free and template‐based docking techniques on these sets. As opposed to previous studies, the benchmark sets reflect the real case modeling/docking scenario where the accuracy of the models is assessed by the modeling procedure, without reference to the native structure (which would be unknown in practical applications). We also expanded the analysis to include docking of protein pairs where proteins have different structural accuracy. The results show that, in general, the template‐based docking is less sensitive to the structural inaccuracies of the models than the free docking. The near‐native docking poses generated by the template‐based approach, typically, also have higher ranks than those produced by the free docking (although the free docking is indispensable in modeling the multiplicity of protein interactions in a crowded cellular environment). The results show that docking techniques are applicable to protein models in a broad range of modeling accuracy. The study provides clear guidelines for practical applications of docking to protein models. This article is protected by copyright. All rights reserved.
... Despite rapid and regular increases in computer speed, parallel processing, acceleration of computation through the use of GPUs, and the development of improved algorithms and scoring functions, accurate modeling of protein complexes remains challenging, even starting with high-quality structural models (e.g., X-ray crystal structures) of the binding partners. 21 This is due in part to the large number of potential complex structures that must be evaluated in the absence of additional information (sampling problem) 22 and to limitations in the ability to score accurately the models generated (scoring problem). 23,24 An integrated platform 25−28 that utilizes experiment-derived information (i.e., HDX-MS and XL-MS data) to define restraints that guide the sampling stage of protein−protein docking has the potential to provide models with high accuracy and resolution. ...
Article
We describe an integrated approach of using hydrogen deuterium exchange mass spectrometry (HDX-MS), chemical crosslinking mass spectrometry (XL-MS), and molecular docking to characterize the binding interface and to predict the three-dimensional quaternary structure of a protein-protein complex in solution. Interleukin 7 (IL-7) and its α-receptor, IL-7Rα, serving as essential mediators in the immune system, are the model system. HDX kinetics report widespread protection on IL-7R but show no differential evidence of binding-induced protection or remote conformational change. Crosslinking with reagents that differ in spacer lengths and targeting residues increases the spatial resolution. Using five cross-links as distance restraints for protein-protein docking, we generated a high-confidence model of the IL-7/IL-7Rα complex. Both the predicted binding interface and regions with direct contacts agree well with those in the solid-state structure, as confirmed by previous X-ray crystallography. An additional binding region was revealed to be the C-terminus of helix B of IL-7, highlighting the value of solution-based characterization. To generalize the integrated approach, protein-protein docking was executed with a different number of cross-links. Combining cluster analysis and HDX kinetics adjudication, we found that two intermolecular crosslink-derived restraints are sufficient to generate a high-confidence model with root mean square distance (r.m.s.d.) value of all alpha carbons below 2.0 Å relative to the crystal structure. The remarkable results of binding-interface determination and quaternary structure prediction highlight the effectiveness and capability of the integrated approach, which will allow more efficient and comprehensive analysis of inter-protein interactions with broad applications in the multiple stages of design, implementation, and evaluation for protein therapeutics.
... Experimental structure determination of protein-protein interactions has greatly improved our understanding of cellular processes (Jones and Thornton, 1996;Nooren and Thornton, 2003). Current experimental methods, such as X-ray crystallography and electron microscopy (EM), only provide a slow accumulation of new experimental evidence and do not allow for high throughput (Wang et al., 2015;Marsh and Teichmann, 2015 (Park et al., 2015;Vajda et al., 2013); and secondly, methods have to reliably rank the docked poses from currently thousands of generated solutions to identify ensembles, also known as clusters, or single poses that resemble native like binding. The work discussed in this chapter focuses on the correct identification of near-native clusters. ...
Conference Paper
Proteins are involved in all processes of life and their shapes, interactions and functions are governed by physical forces. A model with atomic resolution is pivotal for the understanding of their mechanisms and how mutations perturb these. However, given the large variation of proteins and the limitations of experimental methods, in-silico approaches are the only viable solution. Presented here are a number of computational methods to predict their structure and binary interactions with atomic detail. Firstly, a machine-learning method was developed that models the recognition process of protein-protein binding to improve the identification of near-native binding sites. Secondly, a refinement method was developed to improve the structural accuracy of predicted monomers. An intra residue-residue contact map space was defined to perform more directed conformational exploration with metadynamics in order to find solutions that better resemble the native state. This method was extended to perform refinement of pre-docked heterodimers in order to predict the conformational transition from unbound to bound. Here, an inter residue-residue contact map space was defined between the interface of a receptor and a ligand. Following this extensive sampling of protein conformations by simulation, a recurrent neural network was defined and trained to predict the state changes during the sampling such that improved quality conformations can be identified. Finally, extensive in-silico biophysical experiments were performed to understand the mechanism of auto-phosphorylation for RET-kinase in wild-type and its deregulation by an oncogenic mutation.
... Further, a new energy term has been added that takes into consideration non-ideality of bond lengths and angles in cartesian space 170 . The cartesian term 170 is also the basis for a cartesian_ddG method that has been used to calculate ΔΔGs of mutation to probe protein stability, in which the backbones and sidechains of residues nearby the mutation site are allowed to move 171 . Due to the local optimization, this protocol is much faster than ddg_monomer 172 , while retaining the same level of accuracy. ...
Preprint
The Rosetta software suite for macromolecular modeling, docking, and design is widely used in pharmaceutical, industrial, academic, non-profit, and government laboratories. Despite its broad modeling capabilities, Rosetta remains consistently among leading software suites when compared to other methods created for highly specialized protein modeling and design tasks. Developed for over two decades by a global community of over 60 laboratories, Rosetta has undergone multiple refactorings, and now comprises over three million lines of code. Here we discuss methods developed in the last five years in Rosetta, involving the latest protocols for structure prediction; protein–protein and protein–small molecule docking; protein structure and interface design; loop modeling; the incorporation of various types of experimental data; modeling of peptides, antibodies and proteins in the immune system, nucleic acids, non-standard chemistries, carbohydrates, and membrane proteins. We briefly discuss improvements to the energy function, user interfaces, and usability of the software. Rosetta is available at www.rosettacommons.org.
... 30,31 Moreover, the introduction of protein flexibility has been shown to improve sampling through a more complete search of the conformational space. 32 Despite all the developments in docking algorithms, present methods still struggle to identify near-native poses. 24,33,34 In particular, the success rate for finding an acceptable or better pose in the top-ten is a maximum of 58%, or 27% at the top position (considering 115 scoring functions). ...
Article
The development of docking algorithms to predict near-native structures of protein:protein complexes from the structure of the isolated monomers is of paramount importance for molecular biology and drug discovery. In this study, we assessed the capacity of the interfacial area of protein:protein complexes and of Molecular Mechanics-Poisson Boltzmann Surface Area (MM-PBSA)-derived properties to rank docking poses. We used a set of 48 protein:protein complexes, and a total of 67 docking experiments distributed among bound:bound, bound:unbound, and unbound:unbound test cases. The MM-PBSA binding free energy of protein monomers has been shown to be very convenient to predict high-quality structures with a high success rate. In fact, considering solely the top-ranked pose of more than 200 docking solutions of each of 39 protein:protein complexes, the success rate was 77% in the prediction of high-quality poses, or 90% if considering high- or medium-quality poses. If considering high- or medium-quality poses in the top-one prediction, a success rate of 87% was obtained for a scoring scheme based on computational alanine scanning mutagenesis data. Such ranking accuracy highlights the ability of these properties to predict near-native poses in protein:protein docking.
... Here, we highlight the use of evolutionary information and focus on methods for: a) interface prediction, b) prediction of interface-related properties, and c) template-based modelling (TBM) of complexes. We do not review the docking field as it has been addressed in several other instances [104][105][106][107][108]. ...
Chapter
Structural characterization of proteins and their complexes is a fundamental part in understanding any biological phenomena. Yet, the experimental determination of the three‐dimensional (3D) structure of proteins and their complexes remains a challenging undertaking. In order to complement the experimental approaches, computational methods have been developed based on a variety of algorithms and models to fill the gap between the amount of sequences and structures. In this article, we review the most common methodological approaches currently used in the field, highlighting the ab initio structure prediction methods and methods for the prediction and structural modeling of protein–protein interfaces (PPIs). We particularly focus on the use of evolutionary information to guide the modeling process.
... At over 4 Å resolution the secondary structure of the protein cannot be determined and as a result the 3-D structure is inaccurate. Apart from the observable impact on the quality of the structure, at least good quality is desirable when modeling with the protein structure, for instance for rational drug design [41] and to computationally model protein-protein interactions [42]. ...
Article
Selection of proper targets for the X-ray crystallography will benefit biological research community immensely. Several computational models were proposed to predict propensity of successful protein production and diffraction quality crystallization from protein sequences. We reviewed a comprehensive collection of 22 such predictors that were developed in the last decade. We found that almost all of these models are easily accessible as webservers and/or standalone software and we demonstrated that some of them are widely used by the research community. We empirically evaluated and compared the predictive performance of seven representative methods. The analysis suggests that these methods produce quite accurate propensities for the diffraction-quality crystallization. We also summarized results of the first study of the relation between these predictive propensities and the resolution of the crystallizable proteins. We found that the propensities predicted by several methods are significantly higher for proteins that have high resolution structures compared to those with the low resolution structures. Moreover, we tested a new meta-predictor, MetaXXC, which averages the propensities generated by the three most accurate predictors of the diffraction-quality crystallization. MetaXXC generates putative values of resolution that have modest levels of correlation with the experimental resolutions and it offers the lowest mean absolute error when compared to the seven considered methods. We conclude that protein sequences can be used to fairly accurately predict whether their corresponding protein structures can be solved using X-ray crystallography. Moreover, we also ascertain that sequences can be used to reasonably well predict the resolution of the resulting protein crystals.
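The meta-predictor described above averages the propensities produced by three individual predictors. As a minimal sketch of that averaging step only, the snippet below takes an unweighted mean; the predictor names and numeric values are placeholders invented for illustration, and how MetaXXC maps the averaged propensity to a putative resolution is not stated in the abstract and is not modelled here.

```python
from statistics import mean

def meta_propensity(propensities):
    """Unweighted mean of per-predictor crystallization propensities.

    propensities: dict mapping a predictor name to its score for one protein.
    """
    return mean(propensities.values())

# Hypothetical scores from three hypothetical predictors for a single sequence.
example = {"predictor_a": 0.5, "predictor_b": 0.7, "predictor_c": 0.9}
combined = meta_propensity(example)
```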
... The docking calculations were performed for the ADAM17-TMD interaction with the WT Rhbdf2-TMD and the I387F mutant (Fig. 5A,B). While the well known limitations of docking protocols preclude a detailed determination of the interface (Park et al., 2015;Ritchie, 2008), the pattern of inter-helix interactions resulting from the docking calculations has produced the expected set of predictions of specific local interactions for the planned testing in cell-based assays with mutated proteins (see below). We note further that when the Rhbdf2-TMD in the Rhbdf2-TMD-ADAM17-TMD complex predicted from the docking, is superimposed on TMD1 of the cognate GlpG crystal structure (PDB 2IC8), the segments of the interacting helices identified as the interface are positioned for interaction as predicted, while simultaneously allowing additional interactions with other parts of the Rhbdf2 transmembrane bundle without significant steric clashes (Fig. 5C). ...
Article
A disintegrin and metalloproteinase 17 (ADAM17) controls the release of the pro-inflammatory cytokine tumor necrosis factor α (TNFα) and is crucial for protecting the skin and intestinal barrier by proteolytic activation of Epidermal growth factor receptor (EGFR)-ligands. The seven-membrane spanning inactive Rhomboid2 (iRhom2; Rhbdf2) is required for ADAM17-dependent TNFα shedding and crosstalk with the EGFR, and a point mutation (sinecure, sin) in the first transmembrane domain (TMD) of Rhbdf2 (Rhbdf2(sin)) blocks TNFα shedding, yet little is known about the underlying mechanism. Here we used a structure/function analysis informed by structural modeling to evaluate the interaction between the TMD of ADAM17 and the first TMD of Rhbdf2 and its role in Rhbdf2/ADAM17-dependent shedding. Moreover, double mutant mice that are homozygous for Rhbdf2(sin/sin) and lack Rhbdf1 closely resemble Rhbdf1/2(-/-) double knockout mice, highlighting the severe functional impact of the Rhbdf2(sin/sin) mutation on ADAM17 during mouse development. Taken together, these findings provide new mechanistic and conceptual insights into the critical role of the TMDs of ADAM17 and Rhbdf2 in the regulation of the ADAM17/EGFR and ADAM17/TNFα pathway.
... The computational simulations provided herein served as guidelines for the additional steps in our experiment design, and the putative models obtained by molecular docking could improve our understanding of this interaction, acting as a baseline for site-directed mutagenesis studies until the exact complex may be obtained through X-ray crystallography. Despite its widespread use, docking has been considered an unreliable technique for the prediction of drug binding (Chen 2015), and macromolecular docking, such as the protein-protein binding predicted here, has improved greatly in the last decade (Musiani and Ciurli 2015; Park et al. 2015). ...
Article
Extracellular heat shock protein 70 (HSP70) is recognized by receptors on the plasma membrane, such as Toll-like receptor 4 (TLR4), TLR2, CD14, and CD40. This leads to activation of nuclear factor-kappa B (NF-κB), release of pro-inflammatory cytokines, enhancement of the phagocytic activity of innate immune cells, and stimulation of antigen-specific responses. However, the specific characteristics of HSP70 binding are still unknown, and all HSP70 receptors have not yet been described. Putative models for HSP70 complexation to the receptor for advanced glycation endproducts (RAGEs), considering both ADP- and ATP-bound states of HSP70, were obtained through molecular docking and interaction energy calculations. This interaction was detected and visualized by a proximity fluorescence-based assay in A549 cells and further analyzed by normal mode analyses of the docking complexes. The interacting energy of the complexes showed that the most favored docking situation occurs between HSP70 ATP-bound and RAGE in its monomeric state. The fluorescence proximity assay presented a higher number of detected spots in the HSP70 ATP treatment, corroborating with the computational result. Normal-mode analyses showed no conformational deformability in the interacting interface of the complexes. Results were compared with previous findings in which oxidized HSP70 was shown to be responsible for the differential modulation of macrophage activation, which could result from a signaling pathway triggered by RAGE binding. Our data provide important insights into the characteristics of HSP70 binding and receptor interactions, as well as putative models with conserved residues on the interface area, which could be useful for future site-directed mutagenesis studies.
... However, experimental structures of protein-protein complexes are still underrepresented [5]. Many more protein complex structures could in principle be predicted by computational approaches, specifically by macromolecular docking. However, reliably predicting the three-dimensional structure of protein-protein complexes is still challenging, with one of the critical steps being the scoring, i.e. the ability to discriminate between correct and incorrect solutions within a pool of models [6][7][8]. ...
Article
Correctly scoring protein-protein docking models to single out native-like ones is an open challenge. It is also an object of assessment in CAPRI (Critical Assessment of PRedicted Interactions), the community-wide blind docking experiment. We introduced in the field the first pure consensus method, CONSRANK, which ranks models based on their ability to match the most conserved contacts in the ensemble they belong to. In CAPRI, scorers are asked to evaluate a set of available models and select the top ten ones, based on their own scoring approach. Scorers’ performance is ranked based on the number of targets/interfaces for which they could provide at least one correct solution. In such terms, blind testing in CAPRI Round 30 (a joint prediction round with CASP11) has shown that critical cases for CONSRANK are represented by targets showing multiple interfaces or for which only a very small number of correct solutions are available. To address these challenging cases, CONSRANK has now been modified to include a contact-based clustering of the models as a preliminary step of the scoring process. We used an agglomerative hierarchical clustering based on the number of common inter-residue contacts within the models. Two criteria, with different thresholds, were explored in the cluster generation, setting either the number of common contacts or of total clusters. For each clustering approach, after selecting the top (most populated) ten clusters, CONSRANK was run on these clusters and the top-ranked model for each cluster was selected, in the limit of 10 models per target. We have applied our modified scoring approach, Clust-CONSRANK, to SCORE_SET, a set of CAPRI scoring models made recently available by CAPRI assessors, and to the subset of homodimeric targets in CAPRI Round 30 for which CONSRANK failed to include a correct solution within the ten selected models. 
Results show that, for the challenging cases, the clustering step typically enriches the ten top ranked models in native-like solutions. The best performing clustering approaches we tested indeed lead to more than double the number of cases for which at least one correct solution can be included within the top ten ranked models.
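The clustering step described in this abstract groups models by the number of inter-residue contacts they share and then keeps the most populated clusters. The sketch below illustrates that idea under stated assumptions: it uses single linkage with a fixed threshold `min_common` on the number of common contacts, whereas the abstract does not specify the linkage; the function name and the contact labels are hypothetical, and this is not the Clust-CONSRANK code.

```python
def cluster_by_common_contacts(models, min_common):
    """Agglomerative clustering on shared contacts (assumed single linkage).

    models: dict mapping a model name to a set of contact labels.
    Two clusters merge if any pair of their members shares at least
    `min_common` contacts. Returns clusters sorted by population.
    """
    clusters = [{name} for name in models]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                linked = any(
                    len(models[a] & models[b]) >= min_common
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if linked:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    # Most populated clusters first, as in the selection step described above.
    return sorted(clusters, key=len, reverse=True)

# Toy ensemble with hypothetical contact labels.
models = {
    "m1": {"c1", "c2", "c3"},
    "m2": {"c1", "c2", "c4"},
    "m3": {"c2", "c4", "c5"},
    "m4": {"c9", "c10"},
}
clusters = cluster_by_common_contacts(models, min_common=2)
top_clusters = clusters[:10]  # keep at most the ten most populated clusters
```

A consensus ranking would then be run within each retained cluster and the top-ranked model of each cluster selected, as the abstract describes.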
... The explicit approach to protein flexibility is subdivided into partially flexible and fully flexible methodologies [35]. Partially flexible methodologies are employed in several ways: some software packages apply rigid-body protein-protein docking followed by refinement of the complex interface to provide partial flexibility. ...
Article
Proteins undergo changes in their form (conformational changes) upon interaction with compounds/substrates. Molecular docking is an important tool used in the study of correlations between structure and function, aiding the understanding of several biological processes, shedding light on drug development. Structural rearrangements can occur during molecular recognition in order to optimize interactions in the complex, leading to local and global conformational changes. Conformational selection and induced fit are models that attempt to explain structural variation effects in molecular recognition. In this review we discuss the different strategies employed for global and local conformational changes, in both protein-ligand and protein-protein docking.
... We should note that unrestrained protein-protein docking cannot provide ultimate information on exact structures of the complexes. This is especially true when at least one of the interacting partners (nAChR) is a multi-domain protein, where large-scale collective motions can occur (Park et al, 2015). In such cases, additional restraints based on experimental information are indispensable. ...
Article
‘Three-finger’ toxin WTX from Naja kaouthia interacts with nicotinic and muscarinic acetylcholine receptors (nAChRs and mAChRs). Mutagenesis and competition experiments with 125I-α-bungarotoxin revealed that Arg31 and Arg32 residues from the WTX loop II are important for binding to T. californica and human α7 nAChRs. Computer modeling suggested that loop II occupies the orthosteric binding site at α7 nAChR. The similar toxin interface was previously described as a major determinant of allosteric interactions with mAChRs.
... Consequently, accumulation of pro-inflammatory mediators in the liver contributes to NASH. or delivering small-molecule compound to modulate PIAS activity as a therapeutic solution. Recent advances in molecular docking-based, computer-aided drug design and material/surface science will likely offer help in this matter [38]. ...
Article
Excessive nutrition promotes the pathogenesis of non-alcoholic steatohepatitis (NASH), characterized by the accumulation of pro-inflammatory mediators in the liver. In the present study we investigated the regulation of pro-inflammatory transcription in hepatocytes by protein inhibitor of activated STAT 4 (PIAS4) in this process and the underlying mechanisms. We report that expression of the class III deacetylase SIRT1 was down-regulated in the livers of NASH mice accompanied by a simultaneous increase in the expression and binding activity of PIAS4. Exposure to high glucose stimulated the expression of PIAS4 in cultured hepatocytes paralleling SIRT1 repression. Estrogen, a known NASH-protective hormone, ameliorated SIRT1 trans-repression by targeting PIAS4. Over-expression of PIAS4 enhanced, while PIAS4 knockdown alleviated, repression of SIRT1 transcription by high glucose. Lentiviral delivery of short hairpin RNA (shRNA) targeting PIAS4 attenuated hepatic inflammation in NASH mice by restoring SIRT1 expression. Mechanistically, PIAS4 promoted NF-κB-mediated pro-inflammatory transcription in a SIRT1-dependent manner. In conclusion, our study indicates that PIAS4-mediated SIRT1 repression in response to nutrient surplus contributes to the pathogenesis of NASH. Therefore, targeting PIAS4 might provide novel therapeutic strategies in the intervention of NASH.
... We focus on developments since 2010 (foundational and earlier algorithms are discussed in [37,46,47]) in four areas: optimization algorithms for protein design, algorithms to search improved flexibility models, multi-state design, and ensemble-based design. Due to constraints on the length of this survey, we exclude related algorithms that are important for therapeutic and assembly protein design that have also been highly productive recently, such as docking algorithms (for a review see [48]), scaffold search algorithms (e.g., [49,50]), and algorithms to optimize libraries for in vitro evolution of designed proteins (e.g., [51,52]). ...
Article
Computational structure-based protein design programs are becoming an increasingly important tool in molecular biology. These programs compute protein sequences that are predicted to fold to a target structure and perform a desired function. The success of a program's predictions largely relies on two components: first, the input biophysical model, and second, the algorithm that computes the best sequence(s) and structure(s) according to the biophysical model. Improving both the model and the algorithm in tandem is essential to improving the success rate of current programs, and here we review recent developments in algorithms for protein design, emphasizing how novel algorithms enable the use of more accurate biophysical models. We conclude with a list of algorithmic challenges in computational protein design that we believe will be especially important for the design of therapeutic proteins and protein assemblies.
Article
Histones are keys to many epigenetic events and their complexes have therapeutic and diagnostic importance. The determination of the structures of histone complexes is fundamental in the design of new drugs. Computational molecular docking is widely used for the prediction of target–ligand complexes. Large, linear peptides like the tail regions of histones are challenging ligands for docking due to their large conformational flexibility, extensive hydration, and weak interactions with the shallow binding pockets of their reader proteins. Thus, fast docking methods often fail to produce complex structures of such peptide ligands at a level appropriate for drug design. To address this challenge, and improve the structural quality of the docked complexes, post-docking refinement has been applied using various molecular dynamics (MD) approaches. However, a final consensus has not been reached on the desired MD refinement protocol. In this present study, MD refinement strategies were systematically explored on a set of problematic complexes of histone peptide ligands with relatively large errors in their docked geometries. Six protocols were compared that differ in their MD simulation parameters. In all cases, pre-MD hydration of the complex interface regions was applied to avoid the unwanted presence of empty cavities. The best-performing protocol achieved a median of 32% improvement over the docked structures in terms of the change in root mean squared deviations from the experimental references. The influence of structural factors and explicit hydration on the performance of post-docking MD refinements are also discussed to help with their implementation in future methods and applications.
Article
Sulfiredoxin (Srx) is the enzyme that restores the peroxidase activity of peroxiredoxins (Prxs) by catalyzing the reduction of hyperoxidized Prxs back to their active forms. This process involves protein-protein interaction in an enzyme-substrate binding manner. The integrity of the Srx-Prx axis contributes to the pathogenesis of various oxidative stress related human disorders including cancer, inflammation, cardiovascular and neurological diseases. The purpose of this study is to understand the structural and molecular biology of the Srx-Prx interaction, which may be of significance for prediction of target sites for novel drug discovery. Homology modeling and protein-protein docking approaches were applied to examine the Srx-Prx interaction using online platforms including ITASSER, Phyre2, Swissmodel, AlphaFold, MZDOCK and ZDOCK. In in-silico studies, a 26-amino acid motif at the C-terminus of Prx1 was predicted to cause a steric hindrance for the kinetics of the Srx-Prx1 interaction. These predictions were tested in-vitro using purified recombinant proteins including Srx, full-length Prxs, and C-terminus deleted Prxs. We confirmed that deletion of the C-terminus of Prxs significantly enhanced its rate of association with Srx (i.e. >1000 fold increase in the ka of the Srx-Prx1 interaction) with minimal effect on the rate of dissociation (kd). Differential interaction of Srx with individual members of the Prx family was further examined in cultured cells. Taken together, these data add novel molecular and structural insights critical for the understanding of the biology of the Srx-Prx interaction that may be of value for the development of targeted therapy for human disorders.
Chapter
The ongoing outbreak of coronavirus-19 (COVID-19) has quickly become a daunting challenge to global health. The ultraviolet radiations (UVR) A wavebands fall in the region of 320-400 nm of the solar spectrum and comprise about 95% of overall ultraviolet (UVs) approaching the biosphere. The UVA is extensively used in phototherapy systems, as a disinfectant, and in various skin-related conditions. This chapter aimed to address the reduction of coronavirus replication by direct application of phototherapy (UVA) in lungs and modulation of nitric oxide (NO) in skin cells following UVA exposures. This NO influx inside the bloodstream to deliver into the lungs endothelial cells where it goes to incur cytoprotection to alveolar lung cells. Moreover, it is also proposed for direct application of therapeutic doses of UVA light inside the lungs. It uses a fiberoptic adapter to modulate the production of NO in lung endothelial cells, which will diffuse into the bronchi and lungs to leave bronchodilatory and vasodilatory effects. How NO reduces inflammatory burst and reactive oxygen species (ROS) in the lungs' alveolar cells is also discussed. Moreover, it is also proposed that UVA radiation application should be limited to physiological doses and applied every 4-8 h, with at least 24 h of therapy before reassessment. The treating physician should determine discontinuation of this direct UVA treatment into the lungs following observation of the patient's condition and the safety and efficacy of the treatment. This study will highlight and emphasize the importance of utilizing UVA radiation to control this epidemic.
Chapter
The coronaviruses that have emerged in the 21st century, namely Severe Acute Respiratory Syndrome-Coronavirus (SARS-CoV), Middle East Respiratory Syndrome-Coronavirus (MERS-CoV), and Severe Acute Respiratory Syndrome-Coronavirus-2 (SARS-CoV-2), have caused significant morbidity and mortality around the world. The lung is the organ most affected by infection with human pathogenic coronaviruses. Data on human signs, symptoms, and modes of transmission are always scarce, so animals need to be used as models, especially early in an epidemic, to study viral pathogenesis and evaluate therapeutic interventions and vaccines. Lesion scoring based on histopathological studies can help in understanding viral pathogenesis and cell damage, and thus in designing effective therapies or vaccines. Histopathology examines cells to determine viral host receptors and viral host tropism and relates them to disease severity and lesions. Moreover, histopathology also plays a role in the qualitative description of affected organs to determine the micro-anatomic location of cells, the type of cells, and the cellular consequences during and after infection. This approach has various limitations, but it remains significant in comparing treatment groups. When comparing groups, semi-quantitative and quantitative tissue scores are used for statistical analysis to increase the reproducibility of the study. This chapter covers the importance of histopathology, its principles, techniques, and scoring methods, and the pathological characteristics of COVID-19, which can be valuable for assessing lung infection caused by SARS-CoV-2 in animal models and in real situations.
Preprint
Being able to quantify the similarity between two protein complexes is essential for numerous applications. Prominent examples are database searches for known complexes with a given query complex, comparison of the output of different protein complex prediction algorithms, or summarizing and clustering protein complexes, e.g., for visualization. While the corresponding problems have received much attention on single proteins and protein families, the question about how to model and compute similarity between protein complexes has not yet been systematically studied. Because protein complexes can be naturally modeled as graphs, in principle general graph similarity measures may be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. Because the connectivity (graph topology) of real complexes is often unknown and hard to obtain experimentally, we use both known protein-protein interaction networks and known interdependencies (constraints) between interactions to simulate more realistic complexes than from interaction networks alone. We empirically show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, but can be much more efficiently computed. It can therefore be used in large-scale studies and simulations and serve as a basis for further refinements of modeling protein complex similarity.
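To make the idea of Weisfeiler-Lehman labeling concrete, the following is a minimal sketch in Python. It is not the parametric family of measures proposed in the abstract above; it is a generic illustration that assumes each complex is given as an adjacency dictionary with protein names as initial node labels, and it compares the multisets of refined labels with a multiset Jaccard index. The function names and the example complexes are hypothetical.

```python
from collections import Counter


def wl_label_counts(adjacency, labels, iterations=2):
    """Count every node label seen during `iterations` rounds of WL refinement."""
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adjacency.items():
            # A node's new label combines its own label with the sorted
            # labels of its neighbours (the WL refinement step).
            neighbour_labels = sorted(current[n] for n in neighbours)
            refined[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = refined
        counts.update(current.values())
    return counts


def wl_similarity(graph_a, graph_b, iterations=2):
    """Multiset Jaccard index of WL labels; 1.0 for identical labeled graphs."""
    counts_a = wl_label_counts(graph_a[0], graph_a[1], iterations)
    counts_b = wl_label_counts(graph_b[0], graph_b[1], iterations)
    shared = sum((counts_a & counts_b).values())
    total = sum((counts_a | counts_b).values())
    return shared / total if total else 1.0


# Hypothetical complexes: chains A-B-C and A-B-D, labeled by protein name.
complex_a = ({"A": ["B"], "B": ["A", "C"], "C": ["B"]},
             {"A": "A", "B": "B", "C": "C"})
complex_b = ({"A": ["B"], "B": ["A", "D"], "D": ["B"]},
             {"A": "A", "B": "B", "D": "D"})
similarity = wl_similarity(complex_a, complex_b)
```

Each refinement round costs time roughly linear in the number of interactions, which is why label-based measures of this kind scale better than graph edit distance.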
Chapter
Mutation of a single amino acid in a protein often has consequences for its interaction with other proteins, which may affect other interaction networks and pathways and ultimately lead to pathological phenotypes. A detailed structural analysis of these altered protein–protein complexes is essential to interpret the impact of a given mutation at the molecular level, which may facilitate intervention with therapeutic purposes. Given current limitations in the structural coverage of the human interactome, computational docking is emerging as a complementary source of information. Structural analysis can help to locate a given mutation at a protein–protein interface, but further characterisation of its impact on binding affinity is needed for a full interpretation. The integration of computational docking methods and energy‐based descriptors is facilitating the characterisation of an increasing number of disease‐related mutations, thus improving our understanding of the consequences of such mutations at the phenotypic level.
Key Concepts
• Protein–protein interactions are key to understanding disease at the molecular level.
• Disease‐related mutations can have a significant structural and energetic impact on protein–protein interactions.
• The 3D structure of a complex is essential to interpret the functional impact of a mutation.
• Computational docking can provide structural models for protein–protein interactions with no available structure.
• In addition to structural data, further energetic description is needed to fully interpret the impact of a mutation.
• Hot‐spot interface residues have the greatest impact on a protein–protein interaction when mutated.
• Altered protein–protein interactions are potentially suitable drug targets for therapeutic purposes.
Article
Integration of template‐based modeling, global sampling and precise scoring is crucial for the development of molecular docking programs with improved accuracy. We combined template‐based modeling and ab‐initio docking protocol as hybrid docking strategy called CoDock for the docking and scoring experiments of the seventh CAPRI edition. For CAPRI rounds 38‐45, we obtained acceptable or better models in the top 10 submissions for 8 out of the 16 evaluated targets as predictors, 9 out of the 16 targets as scorers. Especially, we submitted acceptable models for all of the evaluated protein‐oligosaccharide targets. For the CASP13‐CAPRI experiment (round 46), we obtained acceptable or better models in the top 5 submissions for 10 out of the 20 evaluated targets as predictors, 11 out of the 20 targets as scorers. The failed cases for our group were mainly the difficult targets and the protein‐peptide systems in CAPRI and CASP13‐CAPRI experiments. In summary, this CAPRI edition showed that our hybrid docking strategy can be efficiently adapted to the increasing variety of challenges in the field of molecular interactions. This article is protected by copyright. All rights reserved.
Chapter
Proteins in living cells rarely act alone, but instead perform their functions together with other proteins in so-called protein complexes. Being able to quantify the similarity between two protein complexes is essential for numerous applications, e.g. for database searches of complexes that are similar to a given input complex. While the similarity problem has been extensively studied on single proteins and protein families, there is very little existing work on modeling and computing the similarity between protein complexes. Because protein complexes can be naturally modeled as graphs, in principle general graph similarity measures may be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. We show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, but can be computed more efficiently. It can therefore be used in large-scale studies and serve as a basis for further refinements of modeling protein complex similarity.
Article
We propose a flexible docking simulation based on parallel cascade selection molecular dynamics (PaCS-MD) as a post-processing treatment after a rigid docking simulation. PaCS-MD has been proposed as an enhanced sampling method for generating structural transition pathways from a given reactant to a product. The PaCS-MD cycle consists of two steps: (1) selection of important initial structures and (2) conformational resampling from the selected initial structures. By repeating the conformational resampling from the important initial structures, structural transitions from the reactant to the product are gradually promoted. In the present flexible docking simulation, decoys (protein complexes) are generated a priori by the rigid docking simulation and employed as products of PaCS-MD. PaCS-MD is then applied to reproduce association processes leading from a reactant (completely separated proteins) to the decoys. To judge whether PaCS-MD found an association process, the root-mean-square deviation measured from each decoy (RMSDdecoy) was defined and monitored during the PaCS-MD cycles. By checking the RMSDdecoy values, a subset of the decoys is identified as non-near-native protein complexes. In more detail, PaCS-MD detects near-native protein complexes among the generated decoys by imposing a threshold (cutoff) on RMSDdecoy, i.e. RMSDdecoy < cutoff. As a demonstration, the present flexible docking was applied to the dimerization of the K48-linked ubiquitin dimer without a covalent bond between its monomers. Finally, PaCS-MD screened out non-near-native protein complexes from decoys generated by a rigid docking simulation.
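The screening criterion RMSDdecoy < cutoff can be sketched as follows. This is only an illustration of the thresholding step, not the PaCS-MD protocol itself: it assumes the trajectory snapshots and decoys are already superimposed and given as equal-length lists of (x, y, z) tuples, and the function names, coordinates, and cutoff are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def reached_decoys(trajectory, decoys, cutoff):
    """Return names of decoys that some snapshot approaches within `cutoff`."""
    reached = []
    for name, decoy in decoys.items():
        # Smallest RMSD to this decoy over all sampled snapshots.
        best = min(rmsd(frame, decoy) for frame in trajectory)
        if best < cutoff:
            reached.append(name)
    return reached


# Hypothetical two-atom snapshots and decoys (arbitrary length units).
trajectory = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]
decoys = {"near": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.5)],
          "far": [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]}
kept = reached_decoys(trajectory, decoys, cutoff=1.0)
```

Decoys that never fall below the cutoff are the ones screened out as non-near-native in this simplified picture.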
Article
Protein-protein docking technology is an effective approach to study the molecular mechanisms of essential biological processes mediated by complex protein-protein interactions. The fast Fourier transform (FFT) correlation approach strikes a good balance between exhaustive global sampling and computational efficiency for protein-protein docking. However, it is difficult to integrate a precise knowledge-based scoring function and site constraint information into the FFT-based approach. New docking strategies capable of combining both global sampling and precise scoring are strongly needed. We propose a multistage protein-protein docking strategy called CoDockPP. This program takes full advantage of the sampling efficiency of the FFT-based method to choose valid ligand protein poses with good surface complementarity. The retained poses are transformed to real Cartesian space for the implementation of site constraints and atomic scoring. Site constraints and a rapid table-lookup scoring are applied to gradually reduce the candidate poses to a tractable number. To enhance the accuracy of docking prediction, local sampling is expanded around the best fast-scoring states, and the resulting neighbor poses are further evaluated by the precise knowledge-based scoring function. In tests on protein-protein docking benchmark 5.0, CoDockPP remarkably improves the success rate and hit count in both ab initio docking and site-specific docking, especially in difficult cases. The server is free and open to all users with no login requirement at http://codockpp.schanglab.org.cn.
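For orientation, the generic form of the FFT correlation approach mentioned above can be written as follows. This is the textbook grid-correlation formulation, not the specific scoring function of CoDockPP; the symbols R (receptor grid), L (ligand grid), t (translation vector), and N (grid points per axis) are introduced here for illustration.

```latex
% Correlation score of the ligand grid L translated by t against the receptor grid R
S(\mathbf{t}) = \sum_{\mathbf{x}} R(\mathbf{x})\, L(\mathbf{x} + \mathbf{t})

% Cross-correlation theorem: all translations are scored at once
S = \mathcal{F}^{-1}\!\left[\, \overline{\mathcal{F}[R]} \cdot \mathcal{F}[L] \,\right]
```

For a grid of N points per axis, evaluating the sum directly for every translation costs on the order of N^6 operations, whereas the FFT route costs on the order of N^3 log N per ligand rotation. The price is that the score must be expressible as a correlation of grids, which is why precise knowledge-based scoring functions and site constraints are hard to fold into this step.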
Article
Despite decades of development, protein-protein docking remains a largely unsolved problem. The main difficulties are the immense space spanned by the translational and rotational degrees of freedom and the prediction of the conformational changes of proteins upon binding. FFT is generally the preferred method to exhaustively explore the translation-rotation space at a fine grid resolution, albeit with the tradeoff of approximating force fields with correlation functions. This work presents a direct search alternative that samples the states in Cartesian space at the same resolution and computational cost as standard FFT methods. Operating in real space allows the use of standard force field functional forms used in typical non-FFT methods as well as the implementation of strategies for focused exploration of conformational flexibility. Currently, a few misplaced side chains can cause docking programs to fail. This work specifically addresses the problem of side chain rearrangements upon complex formation. Based on the observation that most side chains retain their unbound conformation upon binding, each rigidly docked pose is initially scored ignoring up to a limited number of side chain overlaps which are resolved in subsequent repacking and minimization steps. On test systems where side chains are altered and backbones held in their bound state, this implementation provides significantly better native pose recovery and higher quality (lower RMSD) predictions when compared with five of the most popular docking programs. The method is implemented in the software program: ProPOSE (Protein Pose Optimization by Systematic Enumeration).
Article
Aim: Scoring functions are an important component of protein-protein docking methods. They need to be evaluated on high-quality benchmarks to reveal their strengths and weaknesses. Evaluation results obtained on such benchmarks can provide valuable guidance for developing more advanced scoring functions. Methodology & results: In our comparative assessment of scoring functions for protein-protein interactions benchmark, the performance of a scoring function was characterized by 'docking power' and 'scoring power'. A high-quality dataset of 273 protein-protein complexes was compiled and employed in both tests. Four scoring functions, FASTCONTACT, ZRANK, dDFIRE and ATTRACT, were tested as a demonstration. ZRANK and ATTRACT exhibited encouraging performance in the docking power test. However, all four scoring functions failed badly in the scoring power test. Conclusion: Our comparative assessment of scoring functions for protein-protein interaction benchmark was created especially for assessing the scoring functions applicable to protein-protein interactions. It is different from other benchmarks for assessing protein-protein docking methods. Our benchmark is available to the public at www.pdbbind-cn.org/download/CASF-PPI/ .
Article
Protein interactions are fundamental building blocks of biochemical reaction systems underlying cellular functions. The complexity and functionality of these systems emerge not only from the protein interactions themselves but also...
Chapter
Protein–protein interactions (PPIs) are responsible for a number of key physiological processes in the living cells and underlie the pathomechanism of many diseases. Nowadays, along with the concept of so-called “hot spots” in protein–protein interactions, which are well-defined interface regions responsible for most of the binding energy, these interfaces can be targeted with modulators. In order to apply structure-based design techniques to design PPIs modulators, a three-dimensional structure of protein complex has to be available. In this context in silico approaches, in particular protein–protein docking, are a valuable complement to experimental methods for elucidating 3D structure of protein complexes. Protein–protein docking is easy to use and does not require significant computer resources and time (in contrast to molecular dynamics) and it results in 3D structure of a protein complex (in contrast to sequence-based methods of predicting binding interfaces). However, protein–protein docking cannot address all the aspects of protein dynamics, in particular the global conformational changes during protein complex formation. In spite of this fact, protein–protein docking is widely used to model complexes of water-soluble proteins and less commonly to predict structures of transmembrane protein assemblies, including dimers and oligomers of G protein-coupled receptors (GPCRs). In this chapter we review the principles of protein–protein docking, available algorithms and software and discuss the recent examples, benefits, and drawbacks of protein–protein docking application to water-soluble proteins, membrane anchoring and transmembrane proteins, including GPCRs.
Article
Protein-protein interactions (PPIs) are responsible for a number of key physiological processes in living cells and underlie the pathomechanism of many diseases. Nowadays, along with the concept of so-called “hot spots” in protein-protein interactions, which are well-defined interface regions responsible for most of the binding energy, these interfaces can be targeted with modulators. In order to apply structure-based design techniques to design PPI modulators, a three-dimensional structure of the protein complex has to be available. In this context, in silico approaches, in particular protein-protein docking, are a valuable complement to experimental methods for elucidating the 3D structure of protein complexes. Protein-protein docking is easy to use and does not require significant computer resources and time (in contrast to molecular dynamics), and it results in a 3D structure of a protein complex (in contrast to sequence-based methods of predicting binding interfaces). However, protein-protein docking cannot address all aspects of protein dynamics, in particular the global conformational changes during protein complex formation. In spite of this fact, protein-protein docking is widely used to model complexes of water-soluble proteins and less commonly to predict structures of transmembrane protein assemblies, including dimers and oligomers of G protein-coupled receptors (GPCRs). In this chapter we review the principles of protein-protein docking and the available algorithms and software, and discuss recent examples, benefits and drawbacks of protein-protein docking applied to water-soluble proteins, membrane-anchoring proteins and transmembrane proteins, including GPCRs.
Article
Reliable identification of near-native poses of docked protein-protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein-protein interactions is challenging for traditional biophysical or knowledge-based potentials, and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest-energy member. Here we present an approach to cluster ranking based not on a single molecular descriptor (e.g. an energy function) but on a large number of descriptors integrated in a machine learning model: an extremely randomized tree classifier trained on 109 molecular descriptors. The protocol first locally enriches clusters with additional poses. The clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near-native from incorrect clusters. The results show that our approach is able to identify clusters containing near-native protein-protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near-native from incorrect clusters and of how data transformations and recursive feature elimination can improve the ranking performance. This article is protected by copyright. All rights reserved.
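The conventional baseline described above, ranking each cluster by its lowest-energy member, can be sketched in a few lines. This illustrates only that baseline, not the machine learning model of the abstract; the cluster identifiers and energy values are hypothetical, and lower energy is assumed to be better.

```python
def rank_clusters_by_best_energy(clusters):
    """Order cluster ids by the energy of their lowest-energy member.

    `clusters` maps a cluster id to a list of pose energies; empty
    clusters are skipped because they have no representative pose.
    """
    non_empty = {cid: energies for cid, energies in clusters.items() if energies}
    return sorted(non_empty, key=lambda cid: min(non_empty[cid]))


# Hypothetical clusters of docked poses with arbitrary energy units.
example_clusters = {
    "c1": [-10.0, -3.0],
    "c2": [-12.5, 0.0],
    "c3": [-1.0],
    "c4": [],
}
ranking = rank_clusters_by_best_energy(example_clusters)
```

A single outlier pose decides a cluster's rank under this scheme, which is one motivation for describing clusters by the distribution of many descriptors instead.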
Article
We report the performance of protein-protein docking predictions by our group for recent rounds of the Critical Assessment of Prediction of Interactions (CAPRI), a community-wide assessment of state-of-the-art docking methods. Our prediction procedure uses a protein-protein docking program named LZerD developed in our group. LZerD represents a protein surface with 3D Zernike descriptors (3DZD), which are based on a mathematical series expansion of a 3D function. The appropriate soft representation of protein surface with 3DZD makes the method more tolerant to conformational change of proteins upon docking, which adds an advantage for unbound docking. Docking was guided by interface residue prediction performed with BindML and cons-PPISP as well as literature information when available. The generated docking models were ranked by a combination of scoring functions, including PRESCO, which evaluates the native-likeness of residues' spatial environments in structure models. First, we discuss the overall performance of our group in the CAPRI prediction rounds and investigate the reasons for unsuccessful cases. Then, we examine the performance of several knowledge-based scoring functions and their combinations for ranking docking models. It was found that the quality of a pool of docking models generated by LZerD, i.e. whether or not the pool includes near-native models, can be predicted by the correlation of multiple scores. Although the current analysis used docking models generated by LZerD, findings on scoring functions are expected to be universally applicable to other docking methods. This article is protected by copyright. All rights reserved.
Article
Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. This article is protected by copyright. All rights reserved.
Article
Leptin administration results in leptin resistance, presenting a significant barrier to the therapeutic use of leptin. Consequently, we examined two hypotheses. The first concerned the relationship between leptin dose and the development of physiological and biochemical signs of leptin resistance. We hypothesized that lower doses of leptin would produce proportional reductions in body weight without the adverse leptin-induced leptin resistance. The second compared pulsed central leptin infusion to continuous leptin infusion. We hypothesized that pulsed infusion at specific times of the day would evoke favorable body weight reductions while tempering the development of leptin-induced leptin resistance. The first experiment examined leptin responsiveness, including food intake, body weight and hypothalamic STAT3 phosphorylation, to increasing doses of viral gene delivery of leptin. Varying the dose proved inconsequential with respect to long-term therapy and demonstrated proportional development of leptin resistance. The second experiment examined leptin responsiveness to pulsed central leptin infusion, comparing pulsed versus constant infusion of 3 µg/day leptin, or a 2 h morning versus a 2 h evening pulsed leptin infusion. Pulsed delivery of the supramaximal dose of 3 µg/day was not different from constant delivery. Morning pulsed infusion of the submaximal dose of 0.25 µg reduced food intake only over the immediately subsequent meal period and was associated with body weight reductions, but resulted in cellular leptin resistance. Evening pulsed infusion did not decrease food intake but reduced body weight and maintained full leptin signaling. The positive benefit of pulsed delivery remains speculative, yet it may potentially provide an alternative mode of leptin therapy.
Article
Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.
Article
Since the 4th CAPRI evaluation, we have made improvements in three major areas in our refinement approach, namely the treatment of conformational flexibility, the binding free energy model, and the search algorithm. First, we incorporated backbone flexibility into our previous approach, which only optimized rigid backbone poses with limited side-chain flexibility. Here, we formulated and solved the conformational search as a hierarchical optimization problem (involving rigid-body poses, backbone flexibility, and side-chain flexibility). Second, we used continuum electrostatic calculations to include solvation effects in the binding free energy model. Last, we eliminated sloppy modes (directions in which the free energy is essentially constant) to improve the efficiency of the search. With these improvements, we produced correct predictions for 6 out of the 10 latest CAPRI targets, including 1 high, 3 medium, and 2 acceptable accuracy predictions. Compared to our previous performance in CAPRI, substantial improvements have been made for targets requiring homology modeling. Proteins 2013. © 2013 Wiley Periodicals, Inc.
Article
Here we report on the assessment results of the third experiment to evaluate the state-of-the-art in protein model refinement, where participants were invited to improve the accuracy of initial protein models for twenty-seven targets. Using an array of complementary evaluation measures, we find that five groups performed better than the naïve (null) method - a marked improvement over CASP9, although only three were significantly better. The leading groups also demonstrated the ability to consistently improve both backbone and side-chain positioning, while other groups reliably enhanced other aspects of protein physicality. The top-ranked group succeeded in improving the backbone in almost 90% of targets, suggesting a strategy that for the first time in CASP refinement is successful in a clear majority of cases. A number of issues remain unsolved: the majority of groups still fail to improve the quality of the starting models; even successful groups were only able to make modest improvements; and no prediction was more similar to the native structure than to the starting model. Successful refinement attempts also often go unrecognized, as suggested by the relatively larger improvements when predictions not submitted as model 1 are also considered. Proteins 2013. © 2013 Wiley Periodicals, Inc.
Article
Background: Prediction of protein tertiary and quaternary structures helps us to understand protein functionality. While tertiary structure prediction techniques have improved considerably over the last two decades, quaternary structure (homo-oligomer) prediction has received little attention. Results: We show the results of the assessment of our simple auto server prediction and manual prediction of protein quaternary structure from its amino acid sequence based on templates. They were tested in the 9th Critical Assessment of Protein Structure Prediction (CASP9) experiment. CASP experiments are the only true blind test for protein tertiary and quaternary structure prediction from amino acid sequence alone, and therefore they are the most severe tests in the field of protein structure prediction. Our simple auto server prediction could generate successful models for 14 out of 58 targets. Human experts could generate successful models for 11 out of 16 targets, and most of them were better than those from our auto server. Conclusions: The results show the efficiency of our template-based protein quaternary structure prediction approaches and provide useful information for improving the accuracy of template-based quaternary structure prediction.
Article
A large number of proteins function as homo-oligomers; therefore, predicting homo-oligomeric structure of proteins is of primary importance for understanding protein function at the molecular level. Here, we introduce a web server for prediction of protein homo-oligomer structure. The server takes a protein monomer structure as input and predicts its homo-oligomer structure from oligomer templates selected based on sequence and tertiary/quaternary structure similarity. Using protein model structures as input, the server shows clear improvement over the best methods of CASP9 in predicting oligomeric structures from amino acid sequences. Availability: http://galaxy.seoklab.org/gemini. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Protein–protein interactions are central to almost all biological functions, and the atomic details of such interactions can yield insights into the mechanisms that underlie these functions. We present a web server that wraps and extends the SwarmDock flexible protein–protein docking algorithm. After uploading PDB files of the binding partners, the server generates low energy conformations and returns a ranked list of clustered docking poses and their corresponding structures. The user can perform full global docking, or focus on particular residues that are implicated in binding. The server is validated in the CAPRI blind docking experiment, against the most current docking benchmark, and against the ClusPro docking server, the highest performing server currently available. Availability: The server is freely available and can be accessed at: http://bmm.cancerresearchuk.org/%7ESwarmDock/. Contact: Paul.Bates@cancer.org.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Interactions between proteins are orchestrated in a precise and time-dependent manner, underlying cellular function. The binding affinity, defined as the strength of these interactions, is translated into physico-chemical terms in the dissociation constant (Kd), the latter being an experimental measure that determines whether an interaction will be formed in solution or not. Predicting binding affinity from structural models has been a matter of active research for more than 40 years because of its fundamental role in drug development. However, all available approaches are incapable of predicting the binding affinity of protein-protein complexes from coordinates alone. Here, we examine both theoretical and experimental limitations that complicate the derivation of structure-affinity relationships. Most work so far has concentrated on binary interactions. Systems of increased complexity are far from being understood. The main physico-chemical measure that relates to binding affinity is the buried surface area, but it does not hold for flexible complexes. For the latter, there must be a significant entropic contribution that will have to be approximated in the future. We foresee that any theoretical modelling of these interactions will have to follow an integrative approach considering the biology, chemistry and physics that underlie protein-protein recognition.
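The dissociation constant mentioned in this abstract is connected to the standard binding free energy by the textbook relation ΔG° = RT ln(Kd / c°). The following is a minimal Python sketch of that conversion, assuming a 1 M standard state and a default temperature of 298.15 K; both choices and the function name `binding_free_energy` are illustrative assumptions, not taken from the abstract.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M.

    Uses dG = RT ln(Kd / 1 M); a smaller Kd gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# Example: a nanomolar binder compared with a micromolar binder.
for kd in (1e-9, 1e-6):
    print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
```

The conversion itself is exact; the hard part discussed in the abstract is estimating Kd (or ΔG°) from coordinates in the first place.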
Article
Macromolecular protein complexes play important roles in a cell and their tertiary structure can help understand key biological processes of their functions. Multiple protein docking is a valuable computational tool for providing structure information of multimeric protein complexes. In a previous study we developed and implemented an algorithm for this purpose, named Multi-LZerD. This method represents a conformation of a multimeric protein complex as a graph, where nodes denote subunits and each edge connecting nodes denotes a pairwise docking conformation of the two subunits. Multi-LZerD employs a genetic algorithm to sample different topologies of the graph and pairwise transformations between subunits, seeking the conformation of the optimal (lowest) energy. In this study we explore different configurations of the genetic algorithm, namely, the population size, whether to include a crossover operation, as well as the threshold for structural clustering, to find the optimal experimental setup. Multi-LZerD was executed to predict the structures of three multimeric protein complexes, using different population sizes, clustering thresholds, and configurations of mutation and crossover. We analyzed the impact of varying these parameters on the computational time and the prediction accuracy. Given that computational resources are key for handling complexes with a large number of subunits and also for computing a large number of protein complexes in a genome-scale study, finding a proper setting for sampling the conformation space is of the utmost importance. Our results show that an excessive sampling of the conformational space by increasing the population size or by introducing the crossover operation is not necessary for improving accuracy for predicting structures of small complexes. The clustering is effective in reducing redundant pairwise predictions, which leads to successful identification of near-native conformations.
Article
The unique physicochemical properties of water make it the most important molecule for life. Water molecules have many roles, direct and indirect, related to both biological structure and function. This paper: 1) reviews tools for the prediction of water conservation in and around protein active sites, by empirical (knowledge-based) algorithms and by methods based on thermodynamics principles; 2) reviews principles and approaches to predict pKa for both protein residue ensembles and for ligands; and 3) discusses the HINT biomolecular interaction model and forcefield based on experimental measurements of LogPo/w, the 1-octanol/water partition coefficient, which implicitly incorporates all solution phenomena like these, and others like tautomerism and entropy. Lastly, it must be considered that the "real" biological environment is a continuum of nano-states and it may not be possible to represent it as a single discrete all-atom model.
Article
Traditional approaches to protein-protein docking sample the binding modes with no regard to similar experimentally determined structures (templates) of protein-protein complexes. Emerging template-based docking approaches utilize such similar complexes to determine the docking predictions. The docking problem assumes the knowledge of the participating proteins' structures. Thus, it provides the possibility of aligning the structures of the proteins and the template complexes. The progress in the development of template-based docking and the vast experience in template-based modeling of individual proteins show that, generally, such approaches are more reliable than the free modeling. The key aspect of this modeling paradigm is the availability of the templates. The current common perception is that due to the difficulties in experimental structure determination of protein-protein complexes, the pool of docking templates is insignificant, and thus a broad application of template-based docking is possible only at some future time. The results of our large scale, systematic study show that, surprisingly, in spite of the limited number of protein-protein complexes in the Protein Data Bank, docking templates can be found for complexes representing almost all the known protein-protein interactions, provided the components themselves have a known structure or can be homology-built. About one-third of the templates are of good quality when they are compared to experimental structures in test sets extracted from the Protein Data Bank and would be useful starting points in modeling the complexes. This finding dramatically expands our ability to model protein interactions, and has far-reaching implications for the protein docking field in general.
Article
There is a great interest in understanding and exploiting protein-protein associations as new routes for treating human disease. However, these associations are difficult to structurally characterize or model although the number of X-ray structures for protein-protein complexes is expanding. One feature of these complexes that has received little attention is the role of water molecules in the interfacial region. A data set of 4741 water molecules abstracted from 179 high-resolution (≤ 2.30 Å) X-ray crystal structures of protein-protein complexes was analyzed with a suite of modeling tools based on the HINT forcefield and hydrogen-bonding geometry. A metric termed Relevance was used to classify the general roles of the water molecules. The water molecules were found to be involved in: a) (bridging) interactions with both proteins (21%), b) favorable interactions with only one protein (53%), and c) no interactions with either protein (26%). This trend is shown to be independent of the crystallographic resolution. Interactions with residue backbones are consistent for all classes and account for 21.5% of all interactions. Interactions with polar residues are significantly more common for the first group and interactions with non-polar residues dominate the last group. Waters interacting with both proteins stabilize on average the proteins' interaction (-0.46 kcal mol⁻¹), but the overall average contribution of a single water to the protein-protein interaction energy is unfavorable (+0.03 kcal mol⁻¹). Analysis of the waters without favorable interactions with either protein suggests that this is a conserved phenomenon: 42% of these waters have SASA ≤ 10 Å² and are thus largely buried, and 69% of these are within predominantly hydrophobic environments or "hydrophobic bubbles". Such water molecules may have an important biological purpose in mediating protein-protein interactions.
Article
Rotamer libraries are used in protein structure determination, prediction, and design. The backbone-dependent rotamer library consists of rotamer frequencies, mean dihedral angles, and variances as a function of the backbone dihedral angles. Structure prediction and design methods that employ backbone flexibility would strongly benefit from smoothly varying probabilities and angles. A new version of the backbone-dependent rotamer library has been developed using adaptive kernel density estimates for the rotamer frequencies and adaptive kernel regression for the mean dihedral angles and variances. This formulation allows for evaluation of the rotamer probabilities, mean angles, and variances as a smooth and continuous function of phi and psi. Continuous probability density estimates for the nonrotameric degrees of freedom of amides, carboxylates, and aromatic side chains have been modeled as a function of the backbone dihedrals and rotamers of the remaining degrees of freedom. New backbone-dependent rotamer libraries at varying levels of smoothing are available from http://dunbrack.fccc.edu.
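The idea of rotamer probabilities that vary smoothly with phi and psi can be illustrated with a kernel-weighted frequency estimate. The sketch below is a simplified stand-in, not the published method: it assumes a fixed-bandwidth product of two von Mises kernels (the abstract describes adaptive kernels), and the function name, the `kappa` parameter, and the toy observations are all hypothetical.

```python
import math


def rotamer_probabilities(phi, psi, observations, kappa=20.0):
    """Kernel-weighted rotamer frequencies at backbone angles (phi, psi) in degrees.

    observations: list of (phi_i, psi_i, rotamer_label) tuples.
    kappa: concentration of the von Mises kernel (fixed here, not adaptive).
    """
    if not observations:
        raise ValueError("need at least one observation")
    weights = {}
    total = 0.0
    for phi_i, psi_i, label in observations:
        # Product of two von Mises kernels, periodic in both angles.
        # Subtracting 2 keeps the exponent <= 0; it cancels on normalization.
        exponent = kappa * (math.cos(math.radians(phi - phi_i))
                            + math.cos(math.radians(psi - psi_i)) - 2.0)
        w = math.exp(exponent)
        weights[label] = weights.get(label, 0.0) + w
        total += w
    return {label: w / total for label, w in weights.items()}


# Toy observations (hypothetical values, for illustration only).
toy_data = [(-60, -40, "g+"), (-65, -45, "g+"),
            (-120, 130, "t"), (-130, 140, "t"), (-60, 140, "g-")]
print(rotamer_probabilities(-62, -42, toy_data))
```

Because the kernel depends on the cosine of the angular difference, the estimate is continuous across the -180/180 degree boundary, which is the property that matters for backbone-flexible sampling.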
Article
The FALC-Loop web server provides an online interface for protein loop modeling by employing an ab initio loop modeling method called FALC (fragment assembly and analytical loop closure). The server may be used to construct loop regions in homology modeling, to refine unreliable loop regions in experimental structures or to model segments of designed sequences. The FALC method is computationally less expensive than typical ab initio methods because the conformational search space is effectively reduced by the use of fragments derived from a structure database. The analytical loop closure algorithm allows efficient search for loop conformations that fit into the protein framework starting from the fragment-assembled structures. The FALC method shows prediction accuracy comparable to other state-of-the-art loop modeling methods. Top-ranked model structures can be visualized on the web server, and an ensemble of loop structures can be downloaded for further analysis. The web server can be freely accessed at http://falc-loop.seoklab.org/.
Article
Here is presented an investigation of the use of normal modes in protein-protein docking, both in theory and in practice. Upper limits of the ability of normal modes to capture the unbound to bound conformational change are calculated on a large test set, with particular focus on the binding interface, the subset of residues from which the binding energy is calculated. Further, the SwarmDock algorithm is presented, to demonstrate that the modelling of conformational change as a linear combination of normal modes is an effective method of modelling flexibility in protein-protein docking.
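Modelling conformational change as a linear combination of normal modes amounts to adding weighted mode vectors to the starting coordinates. A minimal Python sketch follows, assuming flat coordinate lists and precomputed modes; the function name and the two-atom example are hypothetical, and the way amplitudes are chosen during docking is not shown.

```python
def displace_along_modes(coords, modes, amplitudes):
    """Return coords + sum_k amplitudes[k] * modes[k].

    coords: flat list of 3N Cartesian coordinates.
    modes: list of mode vectors, each a flat list of length 3N.
    amplitudes: one coefficient per mode.
    """
    if len(modes) != len(amplitudes):
        raise ValueError("need one amplitude per mode")
    new_coords = list(coords)
    for mode, amplitude in zip(modes, amplitudes):
        if len(mode) != len(coords):
            raise ValueError("mode length must match coordinate length")
        for i, component in enumerate(mode):
            new_coords[i] += amplitude * component
    return new_coords


# Two atoms on the x axis and a single stretching mode (toy example).
start = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
stretch = [-1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(displace_along_modes(start, [stretch], [0.5]))
```

In a flexible docking search the amplitudes would be treated as additional search variables alongside the rigid-body position and orientation, so a handful of modes adds only a handful of degrees of freedom.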
Article
High resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and to make rational choices for antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously structurally optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions-CAPRI rating) and seven acceptable lowest-interface-energy predictions in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Article
Noncovalent binding interactions between proteins are the central physicochemical phenomenon underlying biological signaling and functional control on the molecular level. Here, we perform an extensive structural analysis of a large set of bound and unbound ubiquitin conformers and study the level of residual induced fit after conformational selection in the binding process. We show that the region surrounding the binding site in ubiquitin undergoes conformational changes that are significantly more pronounced compared with the whole molecule on average. We demonstrate that these induced-fit structural adjustments are comparable in magnitude to conformational selection. Our final model of ubiquitin binding blends conformational selection with the subsequent induced fit and provides a quantitative measure of their respective contributions.
Article
Similarly to protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We describe a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward due to the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, center-to-center distance of the two proteins, and five angular descriptors of the relative orientations of the molecules. The removal of the center-to-center distance turns out to vastly improve the efficiency of the search, because the five-dimensional space now exhibits a well-behaved energy surface suitable for underestimation. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima and shows similarity to the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared to Monte Carlo methods.
Article
Here, we describe two freely available web servers for molecular docking. The PatchDock method performs structure prediction of protein–protein and protein–small molecule complexes. The SymmDock method predicts the structure of a homomultimer with cyclic symmetry given the structure of the monomeric unit. The inputs to the servers are either protein PDB codes or uploaded protein structures. The services are available at http://bioinfo3d.cs.tau.ac.il. The methods behind the servers are very efficient, allowing large-scale docking experiments.
Article
Maltose-binding protein is a periplasmic binding protein responsible for transport of maltooligosaccharides through the periplasmic space of Gram-negative bacteria, as a part of the ABC transport system. The molecular mechanisms of the initial ligand binding and induced large-scale motion of the protein's domains still remain elusive. In this study we use a new docking protocol which combines a recently proposed explicit water placement algorithm based on the 3D-RISM-KH molecular theory of solvation and conventional docking software (AutoDock Vina) to explain the mechanisms of maltotriose binding to the apo-open state of maltose-binding protein. We confirm the predictions of previous NMR spectroscopic experiments on binding modes of the ligand. We provide the molecular details on the binding mode which was not previously observed in the X-ray experiments. We show that this mode, which is defined by the fine balance between the protein-ligand direct interactions and solvation effects, can trigger the protein's domain motion resulting in the holo-closed structure of the maltotriose-maltose-binding protein in excellent agreement with the experimental data. We also discuss the role of water in blocking unfavorable binding sites and water-mediated interactions contributing to stability of observable binding modes of maltotriose.
Article
Protein-protein interactions lie at the heart of most cellular processes. Many experimental and computational studies aim to deepen our understanding of these interactions and improve our capacity to predict them. In this respect, the evolutionary perspective is most interesting, since the preservation of structure and function puts constraints on the evolution of proteins and their interactions. However, uncovering these constraints remains a challenge, and the description and detection of evolutionary signals in protein-protein interactions is currently a very active field of research. Here, we review recent works dissecting the mechanisms of protein-protein interaction evolution and exploring how to use evolutionary information to predict interactions, both at the global level of the interactome and at the detailed level of protein-protein interfaces. We first present to what extent protein-protein interactions are found to be conserved within interactomes and which properties can influence their conservation. We then discuss the evolutionary and co-evolutionary pressures applied on protein-protein interfaces. Finally, we describe how the computational prediction of interfaces can benefit from evolutionary inputs.
Conference Paper
Femtocell base stations are required to set a proper downlink transmit power under various building environments. Conventional power-setting techniques use a fixed power offset over the received power level of the strongest macrocell base station to expand indoor femtocell coverage while mitigating interference leakage to the outdoors. However, the power offset has not been adequately optimized for various interference conditions, leading to degradation of macrocell or femtocell throughput. We propose a scheme that auto-tunes the power offset adaptively to various interference conditions, such as the size of the buildings where femtocell mobile stations are located and the distance to a street where macrocell mobile stations are located. The proposed scheme automatically tunes the power offset, based on macrocell mobile stations' interference detection reports and their totalization, so that the femtocell throughput can increase while the macrocell throughput is maintained. According to Long-Term Evolution system-level simulations, the proposed scheme tuned the power offset to a proper level depending on various building conditions and can improve the throughput. If the power offset is tuned in common among femtocells in a macrocell, a newly deployed femtocell can employ a proper setting from the beginning of its operation.
Article
Protein interactions define the homeostatic state of the cell. Our ability to understand these interactions and their role in both health and disease is tied to our knowledge of the three-dimensional atomic structure of the interacting partners and their complexes. Despite advances in experimental structure determination methods, the majority of known protein interactions are still missing an atomic structure. High-resolution methods such as x-ray crystallography and nuclear magnetic resonance spectroscopy struggle with the high-throughput demand, while low-resolution techniques such as cryo-electron microscopy or small angle x-ray scattering provide too coarse data. Computational structure prediction of protein complexes, or docking, was first developed to complement experimental research and has since blossomed into an independent and lively research field. Its most successful products are hybrid approaches that combine powerful algorithms with experimental data from various sources to generate high-resolution models of protein complexes. This mini-review introduces the concept of docking and docking with the help of experimental data, compares and contrasts the available integrative docking methods, and provides a guide for the experimental researcher for what types of data and which particular software can be used to model a protein complex. This article is protected by copyright. All rights reserved.
Article
We report the first assessment of blind predictions of water positions at protein-protein interfaces, performed as part of the CAPRI (Critical Assessment of Predicted Interactions) community-wide experiment. Groups submitting docking predictions for the complex of the DNase domain of colicin E2 and Im2 immunity protein (CAPRI target 47) were invited to predict the positions of interfacial water molecules using the method of their choice. The predictions - 20 groups submitted a total of 195 models - were assessed by measuring the recall fraction of water-mediated protein contacts. Of the 176 high or medium quality docking models - a very good docking performance per se - only 44% had a recall fraction above 0.3, and a mere 6% above 0.5. The actual water positions were in general predicted to an accuracy level no better than 1.5 Å, and even in good models about half of the contacts represented false positives. This notwithstanding, three hotspot interface water positions were quite well predicted, and so was one of the water positions that is believed to stabilize the loop that confers specificity in these complexes. Overall, the best interface water predictions were achieved by groups that also produced high quality docking models, indicating that accurate modelling of the protein portion is a determinant factor. The use of established molecular mechanics force fields, coupled to sampling and optimization procedures, also seemed to confer an advantage. Insights gained from this analysis should help improve the prediction of protein-water interactions and their role in stabilizing protein complexes. Proteins 2013. © 2013 Wiley Periodicals, Inc.
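The recall fraction used in this assessment can be read as the share of native water-mediated contacts that a model reproduces, with the false-positive share as its counterpart on the predicted side. The Python sketch below assumes that contacts have already been reduced to unordered residue pairs; how a water-mediated contact is detected (distance cutoffs and so on) is not specified here, and the function names and residue labels are hypothetical.

```python
def _as_pair_set(contacts):
    """Normalize contacts to unordered residue pairs."""
    return {frozenset(pair) for pair in contacts}


def recall_fraction(predicted_contacts, native_contacts):
    """Fraction of native water-mediated contacts reproduced by the model."""
    native = _as_pair_set(native_contacts)
    if not native:
        raise ValueError("native contact set is empty")
    return len(native & _as_pair_set(predicted_contacts)) / len(native)


def false_positive_fraction(predicted_contacts, native_contacts):
    """Fraction of predicted contacts that are absent from the native set."""
    predicted = _as_pair_set(predicted_contacts)
    if not predicted:
        raise ValueError("predicted contact set is empty")
    return len(predicted - _as_pair_set(native_contacts)) / len(predicted)


# Hypothetical residue labels, for illustration only.
native = [("A:R54", "B:D33"), ("A:K97", "B:E30"),
          ("A:N75", "B:Y55"), ("A:S84", "B:D51")]
model = [("B:D33", "A:R54"), ("A:K97", "B:E30"), ("A:T10", "B:G20")]
print(recall_fraction(model, native), false_positive_fraction(model, native))
```

Reporting both numbers matters because a model can reach a high recall simply by predicting many waters, at the cost of a large false-positive share.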
Article
We present the 5th evaluation of docking and related scoring methods used in the community-wide experiment on the Critical Assessment of Predicted Interactions (CAPRI). The evaluation examined predictions submitted for a total of 15 targets in eight CAPRI rounds held during the years 2010-2012. The targets represented one of the most diverse sets tackled by the CAPRI community so far. They included only 10 'classical' docking and scoring problems. In one of the classical targets the new challenge was to predict the position of water molecules in the protein-protein interface. The remaining 5 targets represented other new challenges that involved estimating the relative binding affinity and the effect of point mutations on the stability of designed and natural protein-protein complexes. Although the 10 'classical' CAPRI targets included two difficult multi-component systems, and a protein-oligosaccharide complex with which CAPRI participants had little experience, this evaluation indicates that the performance of docking and scoring methods has remained quite robust. More remarkably, we find that automatic docking servers exhibit a significantly improved performance, with some servers now performing on par with predictions done by humans. The performance of CAPRI participants in the new challenges, briefly reviewed here, was mediocre overall, but some groups did relatively well and their approaches suggested ways of improving methods for designing binders and for estimating the free energies of protein assemblies, which should impact the field of protein modeling and design as a whole. Proteins 2013. © 2013 Wiley Periodicals, Inc.
Article
In this article, an enhanced version of GalaxyDock protein-ligand docking program is introduced. GalaxyDock performs conformational space annealing (CSA) global optimization to find the optimal binding pose of a ligand both in the rigid-receptor mode and the flexible-receptor mode. Binding pose prediction has been improved compared to the earlier version by the efficient generation of high-quality initial conformations for CSA using a predocking method based on a beta-complex derived from the Voronoi diagram of receptor atoms. Binding affinity prediction has also been enhanced by using the optimal combination of energy components, while taking into consideration the energy of the unbound ligand state. The new version has been tested in terms of binding mode prediction, binding affinity prediction, and virtual screening on several benchmark sets, showing improved performance over the previous version and AutoDock, on which the GalaxyDock energy function is based. GalaxyDock2 also performs better than or comparable to other state-of-the-art docking programs. GalaxyDock2 is freely available at http://galaxy.seoklab.org/softwares/galaxydock.html. © 2013 Wiley Periodicals, Inc.
Article
Oligomeric proteins are more abundant in nature than monomeric proteins, and involved in all biological processes. In the absence of an experimental structure, their subunits can be modeled from their sequence like monomeric proteins, but reliable procedures to build the oligomeric assembly are scarce. Template-based methods, which start from known protein structures, are commonly applied to model subunits. We present a method to model homodimers that relies on a structural alignment of the subunits, and test it on a set of 511 target structures recently released by the Protein Data Bank, taking as templates the earlier released structures of 3108 homodimeric proteins (H-set), and 2691 monomeric proteins that form dimer-like assemblies in crystals (M-set). The structural alignment identifies an H-set template for 97% of the targets, and in half of the cases, it yields a correct model of the dimer geometry and residue-residue contacts in the target. It also identifies an M-set template for most of the targets, and some of the crystal dimers are very similar to the target homodimers. The procedure efficiently detects homology at low levels of sequence identity, and points to erroneous quaternary structures in the Protein Data Bank. The high coverage of the target set suggests that the content of the Protein Data Bank already approaches the structural diversity of protein assemblies in nature, and that template-based methods should become the method of choice for modeling oligomeric as well as monomeric proteins.
Article
We derive, implement, and apply equilibrium solvation-site analysis for biomolecules. Our method utilizes 3D-RISM calculations to quickly obtain equilibrium solvent distributions without either necessity of simulation or limits of solvent sampling. Our analysis of these distributions extracts highest likelihood poses of solvent as well as localized entropies, enthalpies and solvation free energies. We demonstrate our method on a structure of HIV-1 protease where excellent structural and thermodynamic data is available for comparison. Our results, obtained within minutes, show systematic agreement with available experimental data. Further, our results are in good agreement with established simulation-based solvent analysis methods. This method can be used not only for visual analysis of active site solvation but also for virtual screening methods and experimental refinement.
Article
A molecular dynamics (MD) simulation based protocol for structure refinement of template-based model predictions is described. The protocol involves the application of restraints, ensemble averaging of selected subsets, interpolation between initial and refined structures, and assessment of refinement success. It is found that sub-microsecond MD-based sampling when combined with ensemble averaging can produce moderate but consistent refinement for most systems in the CASP targets considered here.
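Two of the steps named in this abstract, ensemble averaging and interpolation between initial and refined structures, reduce to simple coordinate arithmetic once the snapshots are superimposed. The Python sketch below assumes flat coordinate lists that are already superimposed and already selected; the selection criterion, the restraints, and the interpolation fraction used by the authors are not given in the abstract, so the function names and values here are hypothetical.

```python
def average_structure(snapshots):
    """Average Cartesian coordinates over superimposed snapshots.

    snapshots: list of flat coordinate lists, all of the same length.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    n_coords = len(snapshots[0])
    if any(len(s) != n_coords for s in snapshots):
        raise ValueError("all snapshots must have the same length")
    return [sum(s[i] for s in snapshots) / len(snapshots)
            for i in range(n_coords)]


def interpolate(initial, refined, fraction):
    """Linear interpolation: fraction=0 gives initial, fraction=1 gives refined."""
    if len(initial) != len(refined):
        raise ValueError("structures must have the same length")
    return [(1.0 - fraction) * a + fraction * b
            for a, b in zip(initial, refined)]


# Toy one-atom example: two snapshots, then a half-way interpolation.
mean_coords = average_structure([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
print(mean_coords, interpolate([0.0, 0.0, 0.0], mean_coords, 0.5))
```

Averaged coordinates can have distorted local geometry, so in practice a step like this would normally be followed by some form of energy minimization.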
Article
The nitrogen-related phosphoenolpyruvate phosphotransferase system (PTSNtr) is involved in controlling ammonia assimilation and nitrogen fixation. The additional role of PTSNtr as a regulatory link between nitrogen and carbon utilization in Escherichia coli is assumed to be closely related to molecular functions of IIANtr in potassium homeostasis. We have determined the crystal structure of IIANtr from Burkholderia pseudomallei (BpIIANtr), the causative agent of melioidosis. The crystal structure of dimeric BpIIANtr determined at 3.0 Å revealed that its active sites are mutually blocked. This dimeric state is stabilized by charge and weak hydrophobic interactions. Overall monomeric structure and the active site residues, Arg51 and His67, of BpIIANtr are well conserved with those of IIANtr enzymes from E. coli and Neisseria meningitidis. Interestingly, His113 of BpIIANtr, which corresponds to a key residue in another phosphoryl group relay in the mannitol-specific enzyme EIIA family (EIIAMtl), is located away from the active site due to the loop connecting β5 and α3. Combined with other differences in molecular surface properties, these structural signatures distinguish the IIANtr family from the EIIAMtl family. Since there is no gene for NPr in the chromosome of B. pseudomallei, modeling and docking studies of the BpIIANtr-BpHPr complex have been performed to support the proposal on the NPr-like activity of BpHPr. A potential dual role of BpHPr as a non-specific phosphocarrier protein interacting with both sugar EIIAs and IIANtr in B. pseudomallei has been discussed. Proteins 2013. © 2013 Wiley Periodicals, Inc.
Article
Network-centered approaches are increasingly used to understand the fundamentals of biology. However, the molecular details contained in the interaction networks, often necessary to understand cellular processes, are very limited, and the experimental difficulties surrounding the determination of protein complex structures make computational modeling techniques paramount. Here we present Interactome3D, a resource for the structural annotation and modeling of protein-protein interactions. Through the integration of interaction data from the main pathway repositories, we provide structural details at atomic resolution for over 12,000 protein-protein interactions in eight model organisms. Unlike static databases, Interactome3D also allows biologists to upload newly discovered interactions and pathways in any species, select the best combination of structural templates and build three-dimensional models in a fully automated manner. Finally, we illustrate the value of Interactome3D through the structural annotation of the complement cascade pathway, rationalizing a potential common mechanism of action suggested for several disease-causing mutations.
Article
Structural characterization of protein-protein interactions across the broad spectrum of scales is key to our understanding of life at the molecular level. A low-resolution approach to protein interactions is needed for modeling large interaction networks, given the significant level of uncertainty in large biomolecular systems and the high-throughput nature of the task. Since only a fraction of protein structures in the interactome are determined experimentally, protein docking approaches are increasingly focusing on modeled proteins. The current rapid advancement of template-based modeling of protein-protein complexes is following a long-standing trend in structure prediction of individual proteins. Protein-protein templates are already available for almost all interactions of structurally characterized proteins, and about one third of such templates are likely correct.
Article
HADDOCK is one of the few docking programs that can explicitly account for water molecules in the docking process. Its solvated docking protocol starts from hydrated molecules, and a fraction of the resulting interfacial waters is subsequently removed in a biased Monte Carlo procedure based on water-mediated contact probabilities. The latter were derived from an analysis of water contact frequencies from high-resolution crystal structures. Here, we introduce a simple water-mediated amino acid-amino acid contact probability scale derived from the Kyte-Doolittle hydrophobicity scale and assess its performance on the largest high-resolution dataset developed to date for solvated docking. Both scales yield high-quality docking results. The novel and simple hydrophobicity scale, which should better reflect the physico-chemical principles underlying contact propensities, leads to a performance improvement of around 10% in ranking, cluster quality and water recovery at the interface compared to the statistics-based original solvated docking protocol.
Article
Modeling conformational changes in protein docking calculations is challenging. To make the calculations tractable, most current docking algorithms typically treat proteins as rigid bodies and use soft scoring functions that implicitly accommodate some degree of flexibility. Alternatively, ensembles of structures generated from molecular dynamics (MD) may be cross-docked. However, such combinatorial approaches can produce many thousands or even millions of docking poses, and require fast and sensitive scoring functions to distinguish them. Here, we present a novel approach called "EigenHex," which is based on normal mode analyses (NMAs) of a simple elastic network model of protein flexibility. We initially assume that the proteins to be docked are rigid, and we begin by performing conventional soft docking using the Hex polar Fourier correlation algorithm. We then apply a pose-dependent NMA to each of the top 1000 rigid body docking solutions, and we sample and re-score multiple perturbed docking conformations generated from linear combinations of up to 20 eigenvectors using a multi-threaded particle swarm optimization algorithm. When applied to the 63 "rigid body" targets of the Protein Docking Benchmark version 2.0, our results show that sampling and re-scoring from just one to three eigenvectors gives a modest but consistent improvement for these targets. Thus, pose-dependent NMA avoids the need to sample multiple eigenvectors and it offers a promising alternative to combinatorial cross-docking.
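The sampling step described above, perturbing a docking pose along linear combinations of eigenvectors, can be illustrated with a minimal sketch. The function name `perturb_along_modes`, the data layout (one 3-vector per atom per mode) and the amplitudes are assumptions made for illustration; they are not taken from EigenHex, whose actual implementation and particle swarm search are not described in the abstract.

```python
def perturb_along_modes(coords, modes, amplitudes):
    """Return coordinates displaced along a linear combination of modes.

    coords     -- list of (x, y, z) tuples, one per atom
    modes      -- list of modes; each mode is a list of (dx, dy, dz) per atom
    amplitudes -- one scalar weight per mode (hypothetical sampling variables)
    """
    if len(modes) != len(amplitudes):
        raise ValueError("need exactly one amplitude per mode")
    perturbed = []
    for i, atom in enumerate(coords):
        # Sum the weighted displacement of atom i over all selected modes.
        shift = [sum(a * mode[i][k] for a, mode in zip(amplitudes, modes))
                 for k in range(3)]
        perturbed.append(tuple(atom[k] + shift[k] for k in range(3)))
    return perturbed


# Toy example: two atoms, two modes.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
modes = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)],
]
new_coords = perturb_along_modes(coords, modes, [0.5, 2.0])
```

An optimizer such as particle swarm optimization would then search over the amplitude vector and re-score each perturbed conformation; that search loop is omitted here.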
Article
Treating flexibility in molecular docking is a major challenge in cell biology research. Here we describe the background and the principles of existing flexible protein-protein docking methods, focusing on the algorithms and their rationale. We describe how protein flexibility is treated in different stages of the docking process: in the preprocessing stage, rigid and flexible parts are identified and their possible conformations are modeled. This preprocessing provides information for the subsequent docking and refinement stages. In the docking stage, an ensemble of pre-generated conformations or the identified rigid domains may be docked separately. In the refinement stage, small-scale movements of the backbone and side-chains are modeled and the binding orientation is improved by rigid-body adjustments. For clarity of presentation, we divide the different methods into categories. This should allow the reader to focus on the most suitable method for a particular docking problem.
Article
Understanding protein interactions has broad implications for the mechanism of recognition, protein design, and assigning putative functions to uncharacterized proteins. Studying protein flexibility is a key component in the challenge of describing protein interactions. In this work, we characterize the observed conformational change for a set of 20 proteins that undergo large conformational change upon association (>2 Å Cα RMSD) and ask what features of the motion are successfully reproduced by the normal modes of the system. We demonstrate that normal modes can be used to identify mobile regions and, in some proteins, to reproduce the direction of conformational change. In 35% of the proteins studied, a single low-frequency normal mode was found that describes well the direction of the observed conformational change. Finally, we find that, for a set of 134 proteins from a docking benchmark, the characteristic frequencies of normal modes can be used to predict reliably the extent of observed conformational change. We discuss the implications of the results for the mechanics of protein recognition. Keywords: conformational selection; elastic network model; induced fit; protein interactions; protein recognition
Article
Accommodating backbone flexibility continues to be the most difficult challenge in computational docking of protein-protein complexes. Towards that end, we simulate four distinct biophysical models of protein binding in RosettaDock, a multiscale Monte-Carlo-based algorithm that uses a quasi-kinetic search process to emulate the diffusional encounter of two proteins and to identify low-energy complexes. The four binding models are as follows: (1) key-lock (KL) model, using rigid-backbone docking; (2) conformer selection (CS) model, using a novel ensemble docking algorithm; (3) induced fit (IF) model, using energy-gradient-based backbone minimization; and (4) combined conformer selection/induced fit (CS/IF) model. Backbone flexibility was limited to the smaller partner of the complex, structural ensembles were generated using Rosetta refinement methods, and docking consisted of local perturbations around the complexed conformation using unbound component crystal structures for a set of 21 target complexes. The lowest-energy structure contained >30% of the native residue-residue contacts for 9, 13, 13, and 14 targets for KL, CS, IF, and CS/IF docking, respectively. When applied to 15 targets using nuclear magnetic resonance ensembles of the smaller protein, the lowest-energy structure recovered at least 30% native residue contacts in 3, 8, 4, and 8 targets for KL, CS, IF, and CS/IF docking, respectively. CS/IF docking of the nuclear magnetic resonance ensemble performed equally well or better than KL docking with the unbound crystal structure in 10 of 15 cases. The marked success of CS and CS/IF docking shows that ensemble docking can be a versatile and effective method for accommodating conformational plasticity in docking and serves as a demonstration for the CS theory--that binding-competent conformers exist in the unbound ensemble and can be selected based on their favorable binding energies.
Article
Molecular Dynamics (MD) simulations have been performed on a set of rigid-body docking poses, carried out over 25 protein-protein complexes. The results show that fully flexible relaxation increases the fraction of native contacts (NC) by up to 70% for certain docking poses. The largest increase in the fraction of NC is observed for docking poses where anchor residues are able to sample their bound conformation. For each MD simulation, structural snap-shots were clustered and the centre of each cluster used as the MD-relaxed docking pose. A comparison between two energy-based scoring schemes, the first calculated for the MD-relaxed poses, the second for energy minimized poses, shows that the former are better in ranking complexes with large hydrophobic interfaces. Furthermore, complexes with large interfaces are generally ranked well, regardless of the type of relaxation method chosen, whereas complexes with small hydrophobic interfaces remain difficult to rank. In general, the results indicate that current force-fields are able to correctly describe direct intermolecular interactions between receptor and ligand molecules. However, these force-fields still fail in cases where protein-protein complexes are stabilized by subtle energy contributions.
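The fraction of native contacts used as the quality measure above can be computed as the share of native residue-residue contacts that a docking pose reproduces. The sketch below assumes that contacts have already been extracted as (receptor residue, ligand residue) pairs; the residue labels and the function name `fraction_native_contacts` are hypothetical, and the distance cutoff used to define a contact is left to the caller.

```python
def fraction_native_contacts(native_contacts, model_contacts):
    """Fraction of native interface contacts that are present in the model.

    Both arguments are iterables of (receptor_residue, ligand_residue) pairs.
    """
    native = set(native_contacts)
    if not native:
        raise ValueError("the native complex has no contacts")
    # Contacts present in both the native complex and the model.
    recovered = native & set(model_contacts)
    return len(recovered) / len(native)


# Toy example with made-up residue labels.
native = [("A10", "B5"), ("A11", "B5"), ("A12", "B7"), ("A30", "B9")]
model = [("A10", "B5"), ("A12", "B7"), ("A50", "B1")]
fnc = fraction_native_contacts(native, model)
```

Non-native contacts in the model (here the pair `("A50", "B1")`) do not lower the score, which is why this measure is usually reported alongside RMSD-based criteria.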
Article
Binding-induced backbone and large-scale conformational changes represent one of the major challenges in the modeling of biomolecular complexes by docking. To address this challenge, we have developed a flexible multidomain docking protocol that follows a "divide-and-conquer" approach to model both large-scale domain motions and small- to medium-scale interfacial rearrangements: the flexible binding partner is treated as an assembly of subparts/domains that are docked simultaneously making use of HADDOCK's multidomain docking ability. For this, the flexible molecules are cut at hinge regions predicted using an elastic network model. The performance of this approach is demonstrated on a benchmark covering an unprecedented range of conformational changes of 1.5 to 19.5 Å. We show from a statistical survey of known complexes that the cumulative sum of eigenvalues obtained from the elastic network has some predictive power to indicate the extent of the conformational change to be expected.
Article
We updated our protein-protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are nonredundant at the family-family pair level, for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, representing an increase of 42%. Thus, benchmark 4.0 provides 176 unbound-unbound cases that can be used for protein-protein docking method development and assessment. Seventeen of the newly added cases are enzyme-inhibitor complexes, and we found no new antigen-antibody complexes. Classifying the new cases according to expected difficulty for protein-protein docking algorithms gives 33 rigid body cases, 11 cases of medium difficulty, and 8 cases that are difficult. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/.
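The counts quoted in this abstract are mutually consistent, as a short arithmetic check shows. The variable names are illustrative only; the numbers are those stated above.

```python
previous_cases = 124   # Benchmark 3.0
new_cases = 52         # added in Benchmark 4.0

total_cases = previous_cases + new_cases
percent_increase = round(100 * new_cases / previous_cases)

# Difficulty classes reported for the newly added cases.
new_by_difficulty = {"rigid body": 33, "medium": 11, "difficult": 8}
classified_new_cases = sum(new_by_difficulty.values())
```

The total comes to 176 cases, the relative increase rounds to 42%, and the three difficulty classes add up to the 52 new complexes.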
Article
Single molecule and NMR measurements of protein dynamics increasingly uncover the complexity of binding scenarios. Here, we describe an extended conformational selection model that embraces a repertoire of selection and adjustment processes. Induced fit can be viewed as a subset of this repertoire, whose contribution is affected by the bond types stabilizing the interaction and the differences between the interacting partners. We argue that protein segments whose dynamics are distinct from the rest of the protein ('discrete breathers') can govern conformational transitions and allosteric propagation that accompany binding processes and, as such, might be more sensitive to mutational events. Additionally, we highlight the dynamic complexity of binding scenarios as they relate to events such as aggregation and signalling, and the crowded cellular environment.
Article
Upon binding, proteins undergo conformational changes. These changes often prevent rigid-body docking methods from predicting the 3D structure of a complex from the unbound conformations of its proteins. Handling protein backbone flexibility is a major challenge for docking methodologies, as backbone flexibility adds a huge number of degrees of freedom to the search space, and therefore considerably increases the running time of docking algorithms. Normal mode analysis permits description of protein flexibility as a linear combination of discrete movements (modes). Low-frequency modes usually describe the large-scale conformational changes of the protein. Therefore, many docking methods model backbone flexibility by using only a few modes, those with the lowest frequencies. However, studies show that due to molecular interactions, many proteins also undergo local and small-scale conformational changes, which are described by high-frequency normal modes. Here we present a new method, FiberDock, for docking refinement which models backbone flexibility by an unlimited number of normal modes. The method iteratively minimizes the structure of the flexible protein along the most relevant modes. The relevance of a mode is calculated according to the correlation between the chemical forces applied on each atom and the translation vector of each atom according to the normal mode. The results show that the method successfully models backbone movements that occur during molecular interactions and considerably improves the accuracy and the ranking of rigid-docking models of protein-protein complexes. A web server for the FiberDock method is available at: http://bioinfo3d.cs.tau.ac.il/FiberDock.
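The idea of ranking modes by how well they align with the forces acting on the atoms can be sketched as follows. The abstract does not give FiberDock's exact formula, so this example substitutes a plain normalized dot product (cosine similarity) between the concatenated force vectors and the concatenated mode displacements; the function names and the toy data are likewise assumptions.

```python
import math


def mode_relevance(forces, mode):
    """Absolute cosine similarity between per-atom forces and a mode.

    forces, mode -- lists of (x, y, z) tuples, one entry per atom.
    This is an illustrative stand-in for the correlation described above.
    """
    dot = sum(f[k] * v[k] for f, v in zip(forces, mode) for k in range(3))
    norm_f = math.sqrt(sum(c * c for f in forces for c in f))
    norm_v = math.sqrt(sum(c * c for v in mode for c in v))
    if norm_f == 0.0 or norm_v == 0.0:
        return 0.0
    # A mode can be followed in either direction, so the sign is ignored.
    return abs(dot) / (norm_f * norm_v)


def rank_modes(forces, modes):
    """Indices of the modes, most relevant first."""
    return sorted(range(len(modes)),
                  key=lambda k: mode_relevance(forces, modes[k]),
                  reverse=True)


# Toy example: forces pull the two atoms apart along x.
forces = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
modes = [
    [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)],   # rigid shift along y
    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],  # stretching along x
]
order = rank_modes(forces, modes)
```

Because relevance is recomputed from the current forces, a high-frequency mode can outrank a low-frequency one whenever it matches the local interactions better, which is the motivation given above for not restricting the search to the lowest-frequency modes.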
Article
The association of two biological macromolecules is a fundamental biological phenomenon and an unsolved theoretical problem. Docking methods for ab initio prediction of association of two independently determined protein structures usually fail when they are applied to a large set of complexes, mostly because of inaccuracies in the scoring function and/or difficulties in simulating the rearrangement of the interface residues on binding. In this work we present an efficient pseudo-Brownian rigid-body docking procedure followed by Biased Probability Monte Carlo Minimization of the ligand interacting side-chains. The use of a soft interaction energy function precalculated on a grid, instead of the explicit energy, drastically increased the speed of the procedure. The method was tested on a benchmark of 24 protein-protein complexes in which the three-dimensional structures of their subunits (bound and free) were available. The rank of the near-native conformation in a list of candidate docking solutions was <20 in 85% of complexes with no major backbone motion on binding. Among them, as many as 7 out of 11 (64%) protease-inhibitor complexes can be successfully predicted as the highest rank conformations. The presented method can be further refined to include the binding site predictions and applied to the structures generated by the structural proteomics projects. All scripts are available on the Web.
Article
The structure determination of protein-protein complexes is a rather tedious and lengthy process, by both NMR and X-ray crystallography. Several methods based on docking to study protein complexes have also been well developed over the past few years. Most of these approaches are not driven by experimental data but are based on a combination of energetics and shape complementarity. Here, we present an approach called HADDOCK (High Ambiguity Driven protein-protein Docking) that makes use of biochemical and/or biophysical interaction data such as chemical shift perturbation data resulting from NMR titration experiments or mutagenesis data. This information is introduced as Ambiguous Interaction Restraints (AIRs) to drive the docking process. An AIR is defined as an ambiguous distance between all residues shown to be involved in the interaction. The accuracy of our approach is demonstrated with three molecular complexes. For two of these complexes, for which both the complex and the free protein structures have been solved, NMR titration data were available. Mutagenesis data were used in the last example. In all cases, the best structures generated by HADDOCK, that is, the structures with the lowest intermolecular energies, were the closest to the published structure of the respective complexes (within 2.0 Å backbone RMSD).
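An ambiguous distance of this kind is commonly evaluated as an effective distance with inverse sixth-power averaging over all candidate atom pairs, so that the restraint is satisfied as soon as any one pair comes close. The abstract itself does not state the formula; the sketch below assumes that form, and the function name and the sample distances are illustrative.

```python
def effective_distance(distances):
    """Effective distance d_eff = (sum_i d_i**-6) ** (-1/6).

    distances -- positive pairwise distances (in Å) between all candidate
                 atom pairs covered by one ambiguous restraint.
    """
    values = list(distances)
    if not values or any(d <= 0 for d in values):
        raise ValueError("need at least one strictly positive distance")
    return sum(d ** -6 for d in values) ** (-1.0 / 6.0)


# One close pair dominates: the far pair barely changes the result.
d_eff = effective_distance([3.0, 20.0])
```

The effective distance is never larger than the shortest individual distance, which is what lets a single well-placed contact satisfy a restraint defined over many possible partner residues.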
Article
A protein-protein docking approach has been developed based on a reduced protein representation with up to three pseudo atoms per amino acid residue. Docking is performed by energy minimization in rotational and translational degrees of freedom. The reduced protein representation allows an efficient search for docking minima on the protein surfaces within. During docking, an effective energy function between pseudo atoms has been used based on amino acid size and physico-chemical character. Energy minimization of protein test complexes in the reduced representation results in geometries close to experiment with backbone root mean square deviations (RMSDs) of approximately 1 to 3 Å for the mobile protein partner from the experimental geometry. For most test cases, the energy-minimized experimental structure scores among the top five energy minima in systematic docking studies when using both partners in their bound conformations. To account for side-chain conformational changes in case of using unbound protein conformations, a multicopy approach has been used to select the most favorable side-chain conformation during the docking process. The multicopy approach significantly improves the docking performance, using unbound (apo) binding partners without a significant increase in computer time. For most docking test systems using unbound partners, and without accounting for any information about the known binding geometry, a solution within approximately 2 to 3.5 Å RMSD of the full mobile partner from the experimental geometry was found among the 40 top-scoring complexes. The approach could be extended to include protein loop flexibility, and might also be useful for docking of modeled protein structures.
Article
The development of scoring functions is of great importance to protein docking. Here we present a new scoring function for the initial stage of unbound docking. It combines our recently developed pairwise shape complementarity with desolvation and electrostatics. We compare this scoring function with three other functions on a large benchmark of 49 nonredundant test cases and show its superior performance, especially for the antibody-antigen category of test cases. For 44 test cases (90% of the benchmark), we can retain at least one near-native structure within the top 2000 predictions at the 6° rotational sampling density, with an average of 52 near-native structures per test case. The remaining five difficult test cases can be explained by a combination of poor binding affinity, large backbone conformational changes, and our algorithm's strong tendency for identifying large concave binding pockets. All four scoring functions have been integrated into our Fast Fourier Transform based docking algorithm ZDOCK, which is freely available to academic users at http://zlab.bu.edu/~rong/dock.
Article
Water-mediated hydrogen bonds play critical roles at protein-protein and protein-nucleic acid interfaces, and the interactions formed by discrete water molecules cannot be captured using continuum solvent models. We describe a simple model for the energetics of water-mediated hydrogen bonds, and show that, together with knowledge of the positions of buried water molecules observed in X-ray crystal structures, the model improves the prediction of free-energy changes upon mutation at protein-protein interfaces, and the recovery of native amino acid sequences in protein interface design calculations. We then describe a "solvated rotamer" approach to efficiently predict the positions of water molecules, at protein-protein interfaces and in monomeric proteins, that is compatible with widely used rotamer-based side-chain packing and protein design algorithms. Finally, we examine the extent to which the predicted water molecules can be used to improve prediction of amino acid identities and protein-protein interface stability, and discuss avenues for overcoming current limitations of the approach.
Article
The popular docking programs AutoDock, FlexX, and GOLD were used to predict binding modes of ligands in crystallographic complexes including X-ray water molecules or computationally predicted water molecules. Isoenzymes of two different enzyme systems were used, namely cytochromes P450 (n = 19) and thymidine kinases (n = 19) and three different "water" scenarios: i.e., docking (i) into water-free active sites, (ii) into active sites containing crystallographic water molecules, and (iii) into active sites containing water molecules predicted by a novel approach based on the program GRID. Docking accuracies were determined in terms of the root-mean-square deviation (RMSD) accuracy and, newly defined, in terms of the ligand catalytic site prediction (CSP) accuracy. Consideration of both X-ray and predicted water molecules and the subsequent pooling and rescoring of all solutions (generated by all three docking programs) with the SCORE scoring function significantly improved the quality of prediction of the binding modes both in terms of RMSD and CSP accuracy.
Article
We apply conformational space annealing (CSA), an efficient global optimization method, to the study of protein-protein interaction. The CSA is incorporated into the Tinker molecular modeling package along with a B-spline method for CAPRI Round 5 experiments. We have used an energy function for the protein-protein interaction that consists of electrostatic interaction, van der Waals interaction, and solvation energy terms represented by the occupancy desolvation method. The parameters of the AMBER94 all-atom empirical force field are used. Each energy term is calculated by precalculated grid potentials and B-spline method approximation. The ligand protein is placed inside a sphere of 50 Å radius centered at an appropriate location, and the CSA rigid docking studies are carried out to find stable complexes. Up to 10 complexes are selected using the K-means clustering method and biological information when available. These complexes are energy-minimized for further refinement by considering the flexibility of interacting proteins. The results show that the CSA method has potential for the study of protein-protein interaction.
Article
Although reliable docking can now be achieved for systems that do not undergo important induced conformational change upon association, the presence of flexible surface loops, which must adapt to the steric and electrostatic properties of a partner, generally presents a major obstacle. We report here the first docking method that allows large loop movements during a systematic exploration of the possible arrangements of the two partners in terms of position and rotation. Our strategy consists of taking into account an ensemble of possible loop conformations by a multi-copy representation within a reduced protein model. The docking process starts from regularly distributed positions and orientations of the ligand around the whole receptor. Each starting configuration is submitted to energy minimization during which the best-fitting loop conformation is selected based on mean-field theory. Trials were carried out on proteins with significant differences in the main-chain conformation of the binding loop between the isolated form and the complexed form, which were docked to their partner considered in its bound form. The method is able to predict complexes very close to the crystal complex both in terms of relative position of the two partners and of the geometry of the flexible loop. We also show that introducing loop flexibility on the isolated protein form during systematic docking largely improves the predictions of relative position of the partners in comparison with rigid-body docking.