Article

Development of a New Benchmark for Assessing the Scoring Functions Applicable to Protein–Protein Interactions

Taylor & Francis
Future Medicinal Chemistry
Authors:
  • Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences

Abstract

Aim: Scoring functions are an important component of protein-protein docking methods. They need to be evaluated on high-quality benchmarks to reveal their strengths and weaknesses. Evaluation results obtained on such benchmarks can provide valuable guidance for developing more advanced scoring functions. Methodology & results: In our Comparative Assessment of Scoring Functions for Protein-Protein Interactions (CASF-PPI) benchmark, the performance of a scoring function is characterized by its 'docking power' and 'scoring power'. A high-quality dataset of 273 protein-protein complexes was compiled and employed in both tests. Four scoring functions (FASTCONTACT, ZRANK, dDFIRE and ATTRACT) were tested as a demonstration. ZRANK and ATTRACT exhibited encouraging performance in the docking power test. However, all four scoring functions failed badly in the scoring power test. Conclusion: The CASF-PPI benchmark was created specifically for assessing scoring functions applicable to protein-protein interactions. It differs from other benchmarks, which are designed for assessing protein-protein docking methods. Our benchmark is available to the public at www.pdbbind-cn.org/download/CASF-PPI/ .
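The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes, following common usage, that docking power is the fraction of complexes for which a near-native pose is ranked among the top poses, and that scoring power is the Pearson correlation between computed scores and experimental binding data. The function names, the toy data, the "lower score is better" convention and the RMSD cutoff are illustrative assumptions, not values taken from the paper.

```python
from math import sqrt


def docking_success_rate(poses_by_complex, rmsd_cutoff, top_n=1):
    """Fraction of complexes with a near-native pose among the top_n ranked poses.

    poses_by_complex maps a complex ID to a list of (score, rmsd) tuples.
    Assumption: a lower score means a more favourable pose.
    """
    hits = 0
    for poses in poses_by_complex.values():
        ranked = sorted(poses, key=lambda pose: pose[0])
        if any(rmsd <= rmsd_cutoff for _, rmsd in ranked[:top_n]):
            hits += 1
    return hits / len(poses_by_complex)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: two complexes, three decoy poses each (score, RMSD in angstroms).
toy_poses = {
    "complex_A": [(-12.0, 1.5), (-8.0, 14.0), (-5.0, 20.0)],
    "complex_B": [(-9.0, 18.0), (-7.5, 2.0), (-3.0, 25.0)],
}

# The 5.0 angstrom cutoff is an arbitrary illustrative choice.
top1_rate = docking_success_rate(toy_poses, rmsd_cutoff=5.0, top_n=1)
top2_rate = docking_success_rate(toy_poses, rmsd_cutoff=5.0, top_n=2)
```

In this toy case only complex_A has its near-native pose ranked first, while both complexes have one within the top two, which shows how the success rate depends on the chosen `top_n`.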


... This is explored in a recent review [79]. Furthermore, Han et al. [80] created a new benchmark, named CASF-PPI, specifically for assessing the SFs applicable to protein-protein docking tasks [80]. A high-quality dataset of 273 protein-protein complexes was compiled and employed in both tests. ...
... [79] Furthermore, Han et al. [80] created a new benchmark, named CASF-PPI, specifically for assessing the SFs applicable to protein-protein docking tasks [80]. A high-quality dataset of 273 protein-protein complexes was compiled and employed in both tests. This is based on a larger, nonredundant set of protein-protein complexes with carefully examined 3D structures and experimental binding data. ...
Article
Molecular docking can be used to predict how strongly small‐molecule binders and their chemical derivatives bind to a macromolecular target using its available three‐dimensional structures. Scoring functions (SFs) are employed to rank these molecules by their predicted binding affinity (potency). A classical SF assumes a predetermined theory‐inspired functional form for the relationship between the features characterizing the structure of the protein–ligand complex and its predicted binding affinity (this relationship is almost always assumed to be linear). Recent years have seen the prosperity of machine‐learning SFs, which are fast regression models built instead with contemporary supervised learning algorithms. In this review, we analyzed machine‐learning SFs for drug lead optimization in the 2015–2019 period. The performance gap between classical and machine‐learning SFs was large and has now broadened owing to methodological improvements and the availability of more training data. Against the expectations of many experts, SFs employing deep learning techniques were not always more predictive than those based on more established machine learning techniques and, when they were, the performance gain was small. More codes and webservers are available and ready to be applied to prospective structure‐based drug lead optimization studies. These have exhibited excellent predictive accuracy in compelling retrospective tests, outperforming in some cases much more computationally demanding molecular simulation‐based methods. A discussion of future work completes this review. This article is categorized under: Computer and Information Science > Chemoinformatics
... In 2018 Han et al. published a benchmark data set of binary protein-protein complexes with known (experimentally determined) structures and binding affinities. They used this set to test four different scoring functions for estimating protein-protein interactions (PPI) [44]. In the best case, using the ATTRACT scoring function, the authors found a success rate of 0.78 in identifying the correct binding pose out of a set of decoys. ...
Article
Proteolysis targeting chimeras represent a class of drug molecules with a number of attractive properties, most notably the potential to work on targets that have so far been inaccessible to conventional small-molecule inhibitors. Because of their different mechanism of action and physicochemical properties, many of the methods that have been designed and applied for computer-aided design of traditional small-molecule drugs are not applicable to proteolysis targeting chimeras. Here we review recent developments in this field, focusing on three aspects: de novo linker design, estimation of absorption for beyond-rule-of-5 compounds, and the generation and ranking of ternary complex structures. Although this field is still young, we find that a good number of models and algorithms are available, with the potential to assist the in silico design of such compounds and accelerate applied pharmaceutical research.
... In living cells, only a few proteins perform their biological functions independently, and the vast majority (more than 80%) of proteins function through interacting with other molecules (Keskin et al., 2016; Wang et al., 2018b). It is estimated that there are approximately 130,000 to 650,000 protein-protein interactions (PPIs) in the human interactome (Venkatesan et al., 2009; Sheng et al., 2015; Tortorella et al., 2016), and targeting PPIs with small drug-like molecules (Sheng et al., 2015; Shin et al., 2017; Han et al., 2018) has become one of the most promising methods in modern drug discovery (Tang et al., 2019a; Tang et al., 2019b). If a drug strengthens or disrupts a PPI, the function of that PPI will inevitably be affected. ...
Article
Modulating protein–protein interactions (PPIs) with small drug-like molecules shows great promise in modern drug discovery. G protein-coupled receptors (GPCRs) are the largest family of targeted proteins and can form dimers in living cells through PPIs. However, compared with drug development at the orthosteric site, the druggability of the PPI interface of GPCRs and its functional implications have been little investigated. To address these issues, we constructed a novel computational strategy, combining molecular dynamics simulation, virtual screening and protein structure network (PSN) analysis, to study one representative GPCR homodimer (CXCR4). One druggable pocket was identified in the PPI interface, and one small molecule targeting it was found by screening; this molecule could strengthen the PPI mainly through hydrophobic interactions between its benzene rings and TM4 of the receptor. The PSN results further reveal that the PPI molecule increases the number of allosteric regulation pathways between the druggable pocket at the dimer interface and the orthosteric site for subunit A but plays only a minor role for subunit B, leading to an asymmetric change in the volume of the binding pockets of the two subunits (an increase for subunit A and a minor change for subunit B). Consequently, the screening performance of subunit A for antagonists is enhanced while that of subunit B is nearly unchanged, implying that the PPI molecule may help enhance the efficacy of antagonists. In addition, one main regulation pathway with the highest frequency was identified for subunit A, consisting of Trp195(5.34)–Tyr190(ECL2)–Val196(5.35)–Gln200(5.39)–Asp262(6.58)–Cys28(N-term), which reveals the importance of these residues in the allosteric regulation induced by the PPI molecule. The observations from this work could provide valuable information for the development of PPI drug-like molecules for GPCRs.
... In the existing technologies, continuous improvements in proteomics technology have led to an increase in protein behavior information in cellular processes. The massive accumulation of proteomics data can be used to compare changes in prostate cells under normal and pathological conditions to guide the treatment and prognosis of PCa [3]. In general, proteomics research strategies aim to compare the proteomic characteristics of normal and abnormal states, and finally screen out a number of proteins with differential expression levels. ...
Article
In order to deeply explore the interaction between prostate cancer (PCa)-related proteins and to screen out effective targets for clinical practice, data mining of PCa proteomics literature is conducted, 41 differentially expressed seed proteins are identified, and a protein interaction network is constructed. The extended network consists of a mega network and three separate small parts, which are used to find key nodes and build a backbone network through connectivity screening. Topological analysis of these networks reveals that solute carrier family 2 (glucose transporter) member 4 (SLC2A4) and tubulin β-2C (TUBB2C) are centrally located in the protein interaction network. In addition, by using the module analysis, the dense connection area is found. Functional annotations indicate that the biological processes of Ras protein signaling, mitogen-activated protein kinase (MAPK), and neurotrophin and gonadotropin-releasing hormone (GnRH) signaling pathways play important roles in the pathogenesis of PCa. Therefore, further studies of SLC2A4 and TUBB2C proteins, and these biological processes and pathways may provide potential targets for the diagnosis and treatment of PCa.
... Curating high-quality compilations of data often requires tedious manual work, digging into original literature and being able to judge potential sources of errors which may have been overlooked by the authors themselves. R Wang and co-workers contributed a new benchmark set to evaluate scoring functions for docking, comprising 273 protein-protein complexes [9]. Since modulation of protein-protein interaction is an emerging field in drug discovery, this is a particularly valuable addition for further theoretical development. ...
Article
Understanding the thermodynamic signature of protein–peptide binding events is a major challenge in computational chemistry. The complexity generated by both components possessing many degrees of freedom poses a significant issue for methods that attempt to directly compute the enthalpic contribution to binding. Indeed, the prevailing assumption has been that the errors associated with such approaches would be too large for them to be meaningful. Nevertheless, we currently have no indication of how well the present methods would perform in terms of predicting the enthalpy of binding for protein–peptide complexes. To that end, we carefully assembled and curated a set of 11 protein–peptide complexes where there is structural and isothermal titration calorimetry data available and then computed the absolute enthalpy of binding. The initial “out of the box” calculations were, as expected, very modest in terms of agreement with the experiment. However, careful inspection of the outliers allows for the identification of key sampling problems such as distinct conformations of peptide termini not being sampled or suboptimal cofactor parameters. Additional simulations guided by these aspects can lead to a respectable correlation with isothermal titration calorimetry (ITC) experiments (R² of 0.88 and an RMSE of 1.48 kcal/mol overall). Although one cannot know prospectively whether computed ITC values will be correct or not, this work shows that if experimental ITC data are available, then this in conjunction with computed ITC, can be used as a tool to know if the ensemble being simulated is representative of the true ensemble or not. That is important for allowing the correct interpretation of the detailed dynamics of the system with respect to the measured enthalpy. The results also suggest that computational calorimetry is becoming increasingly feasible. 
We provide the data set as a resource for the community, which could be used as a benchmark to help further progress in this area.
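The agreement statistics quoted above (R² and RMSE against ITC data) can be computed as in the following minimal sketch. It assumes that R² denotes the squared Pearson correlation, which the abstract does not state explicitly; the function names and the numbers in the example are hypothetical and are not data from the study.

```python
from math import sqrt


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(predicted, measured):
    """Squared Pearson correlation (assumed definition of R^2 here)."""
    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(measured) / len(measured)
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov * cov / (var_p * var_m)


# Hypothetical binding enthalpies in kcal/mol (illustrative only).
computed_dh = [1.0, 2.0, 3.0]
measured_dh = [1.0, 2.0, 4.0]

example_rmse = rmse(computed_dh, measured_dh)
example_r2 = r_squared(computed_dh, measured_dh)
```

Reporting both values is useful because a high R² can coexist with a large systematic offset, which only the RMSE would expose.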
Article
Protein‐protein interactions (PPIs) are ubiquitous and functionally of great importance in biological systems. Hence, the accurate prediction of PPIs by protein‐protein docking and scoring tools is highly desirable in order to characterize their structure and biological function. Ab initio docking protocols are divided into the sampling of docking poses to produce at least one near‐native structure, then to evaluate the vast candidate structures by scoring. Concurrent development in both sampling and scoring is crucial for the deployment of protein‐protein docking software. In the present work, we apply a machine learning model on pairwise potentials to refine the task of protein quaternary structure native structure detection among decoys. A decoy set was featurized using the Knowledge and Empirical Combined Scoring Algorithm 2 (KECSA2) pairwise potential. The highly unbalanced decoy set was then balanced using a comparison concept between native and decoy structures. The resultant comparison descriptors were used to train a logistic regression (LR) classifier. The LR model yielded the optimal performance for native detection among decoys compared to conventional scoring functions, while exhibiting lesser performance for the detection of low root mean square deviation (RMSD) decoy structures. Its deployment on an independent benchmark set confirms that the scoring function performs competitively relative to other scoring functions. Scripts used are available at: https://github.com/TanemuraKiyoto/PPI‐native‐detection‐via‐LR .
Article
Molecular docking plays an indispensable role in predicting the receptor-ligand interactions in which the protein receptor is usually kept rigid while the ligand is treated as being flexible. Due to the inherent flexibility of proteins, the binding pocket of apo receptors might undergo significant conformational rearrangement upon ligand binding, which limits the prediction accuracy of docking. Here, we present an iterative Anisotropic Network Model (iterANM)-based ensemble docking approach which generates multiple holo-like receptor structures starting from the apo receptor and incorporates protein flexibility into docking. In a validation dataset consisting of 233 chemically diverse CDK2 inhibitors, the iterANM-based ensemble docking achieves higher capacity to reproduce native-like binding poses compared with those using single apo receptor conformation or conformational ensemble from molecular dynamics (MD) simulations. The prediction success rate within top5-ranked binding poses produced by iterANM can further be improved through re-ranking with the molecular mechanics–Poisson Boltzmann/surface area (MMPBSA) method. In a smaller dataset with 58 CDK2 inhibitors, the iterANM-based ensemble shows higher success rate compared with the flexible-receptor-based docking procedure AutoDockFR and other receptor conformation generation approaches. Further, an additional docking test consisting of ten diverse receptor/ligand combinations shows that the iterANM is robustly applicable for different receptor structures. These results suggest the iterANM-based ensemble docking as an accurate, efficient, and practical framework to predict the binding mode of a ligand for receptors with flexibility.
Article
Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: the RF-Score-VS top 1% provides a 55.6% hit rate, whereas that of Vina provides only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate versus 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observe comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
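The "hit rate in the top x%" quoted above can be sketched as follows: molecules are ranked by predicted score, and the hit rate is the share of true actives among the top-ranked fraction. This is a minimal illustration, assuming that a higher score means a better predicted binder; the function name, the rounding rule for the cutoff and the toy data are assumptions, not details taken from the RF-Score-VS study.

```python
def hit_rate_at_top(scores, is_active, fraction):
    """Share of actives among the top-scoring fraction of a screened library.

    scores: predicted scores, one per molecule (assumption: higher is better).
    is_active: booleans or 0/1 flags marking the known actives.
    fraction: portion of the ranked list to inspect, e.g. 0.01 for the top 1%.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    # Always inspect at least one molecule, even for very small fractions.
    n_top = max(1, round(len(ranked) * fraction))
    n_hits = sum(1 for _, active in ranked[:n_top] if active)
    return n_hits / n_top


# Hypothetical library of ten molecules, two of which are active.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_actives = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

top20_hit_rate = hit_rate_at_top(toy_scores, toy_actives, 0.2)
baseline_hit_rate = hit_rate_at_top(toy_scores, toy_actives, 1.0)
```

Comparing the top-fraction hit rate with the baseline over the whole library gives a quick sense of how much the ranking enriches actives.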
Article
Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure‐based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine‐learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine‐learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert‐selected structural features can be strongly improved by a machine‐learning approach based on nonlinear regression allied with comprehensive data‐driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405–424. doi: 10.1002/wcms.1225
Article
Protein-protein interactions are difficult therapeutic targets, and inhibiting pathologically relevant interactions without disrupting other essential ones presents an additional challenge. Herein we report how this might be achieved for the potential anticancer target, the TPX2-importin-α interaction. Importin-α is a nuclear transport protein that regulates the spindle assembly protein TPX2. It has two binding sites, major and minor, to which partners bind. Most nuclear transport cargoes use the major site, whereas TPX2 binds principally to the minor site. Fragment-based approaches were used to identify small molecules that bind importin-α, and crystallographic studies identified a lead series that was observed to bind specifically to the minor site, representing the first ligands specific for this site. Structure-guided synthesis informed the elaboration of these fragments to explore the source of ligand selectivity between the minor and major sites. These ligands are starting points for the development of inhibitors of this protein-protein interaction. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Article
Motivation: Molecular recognition between biological macromolecules and organic small molecules plays an important role in various life processes. Both structural information and binding data of biomolecular complexes are indispensable for depicting the underlying mechanism in such an event. The PDBbind database was created to collect experimentally measured binding data for the biomolecular complexes throughout the Protein Data Bank (PDB). It thus provides the linkage between structural information and energetic properties of biomolecular complexes, which is especially desirable for computational studies or statistical analyses. Results: Since its first public release in 2004, the PDBbind database has been updated on an annual basis. The latest release (version 2013) provides experimental binding affinity data for 10,776 biomolecular complexes in PDB, including 8302 protein-ligand complexes and 2474 other types of complexes. In this article, we will describe the current methods used for compiling PDBbind and the updated status of this database. We will also review some typical applications of PDBbind published in the scientific literature. Availability and implementation: All contents of this database are freely accessible at the PDBbind-CN Web server at http://www.pdbbind-cn.org/. Contact: wangrx@mail.sioc.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling. We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top 10 success rates of up to 58%. Hierarchical clustering is performed, so as to group together functions which identify near-natives in similar subsets of complexes. Three set theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically. All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
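The idea of finding pairs of scoring functions that succeed on different complexes can be illustrated with simple set operations. The abstract does not name its three set-theoretic measures, so the quantities below (joint coverage, complexes solved by only one function, and Jaccard overlap) are assumed stand-ins chosen for illustration; the function names and toy data are hypothetical.

```python
def solved_complexes(near_native_flags_by_complex, top_n=10):
    """Complexes for which a near-native pose appears among the top_n ranks.

    near_native_flags_by_complex maps a complex ID to a list of booleans,
    ordered by rank, where True marks a near-native pose.
    """
    return {
        complex_id
        for complex_id, flags in near_native_flags_by_complex.items()
        if any(flags[:top_n])
    }


def complementarity(solved_a, solved_b, n_complexes):
    """Compare the success sets of two scoring functions."""
    union = solved_a | solved_b
    overlap = solved_a & solved_b
    return {
        "joint_coverage": len(union) / n_complexes,
        "only_a": sorted(solved_a - solved_b),
        "only_b": sorted(solved_b - solved_a),
        "jaccard": len(overlap) / len(union) if union else 0.0,
    }


# Hypothetical ranked results for two scoring functions on three complexes.
ranks_a = {"cplx1": [False, True, False], "cplx2": [False, False, False], "cplx3": [True]}
ranks_b = {"cplx1": [False, False, False], "cplx2": [False, True], "cplx3": [True]}

solved_a = solved_complexes(ranks_a, top_n=2)
solved_b = solved_complexes(ranks_b, top_n=2)
summary = complementarity(solved_a, solved_b, n_complexes=3)
```

A low Jaccard overlap combined with high joint coverage is the pattern that would suggest two functions capture different aspects of binding and might be worth combining.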
Article
Since the 4th CAPRI evaluation, we have made improvements in three major areas in our refinement approach, namely the treatment of conformational flexibility, the binding free energy model, and the search algorithm. First, we incorporated backbone flexibility into our previous approach, which only optimized rigid backbone poses with limited side-chain flexibility. Here, we formulated and solved the conformational search as a hierarchical optimization problem (involving rigid-body poses, backbone flexibility, and side-chain flexibility). Second, we used continuum electrostatic calculations to include solvation effects in the binding free energy model. Last, we eliminated sloppy modes (directions in which the free energy is essentially constant) to improve the efficiency of the search. With these improvements, we produced correct predictions for 6 out of the 10 latest CAPRI targets, including 1 high, 3 medium, and 2 acceptable accuracy predictions. Compared to our previous performance in CAPRI, substantial improvements have been made for targets requiring homology modeling. Proteins 2013. © 2013 Wiley Periodicals, Inc.
Article
The β-chemokine receptor CCR5 is considered to be an attractive target for inhibition of macrophage-tropic (CCR5-using or R5) HIV-1 replication because individuals having a nonfunctional receptor (a homozygous 32-bp deletion in the CCR5 coding region) are apparently normal but resistant to infection with R5 HIV-1. In this study, we found that TAK-779, a nonpeptide compound with a small molecular weight (Mr 531.13), antagonized the binding of RANTES (regulated on activation, normal T cell expressed and secreted) to CCR5-expressing Chinese hamster ovary cells and blocked CCR5-mediated Ca2+ signaling at nanomolar concentrations. The inhibition of β-chemokine receptors by TAK-779 appeared to be specific to CCR5 because the compound antagonized CCR2b to a lesser extent but did not affect CCR1, CCR3, or CCR4. Consequently, TAK-779 displayed highly potent and selective inhibition of R5 HIV-1 replication without showing any cytotoxicity to the host cells. The compound inhibited the replication of R5 HIV-1 clinical isolates as well as a laboratory strain at a concentration of 1.6–3.7 nM in peripheral blood mononuclear cells, though it was totally inactive against T-cell line-tropic (CXCR4-using or X4) HIV-1.
Article
Protein–protein interactions are central to almost all biological functions, and the atomic details of such interactions can yield insights into the mechanisms that underlie these functions. We present a web server that wraps and extends the SwarmDock flexible protein–protein docking algorithm. After uploading PDB files of the binding partners, the server generates low energy conformations and returns a ranked list of clustered docking poses and their corresponding structures. The user can perform full global docking, or focus on particular residues that are implicated in binding. The server is validated in the CAPRI blind docking experiment, against the most current docking benchmark, and against the ClusPro docking server, the highest performing server currently available. Availability: The server is freely available and can be accessed at: http://bmm.cancerresearchuk.org/%7ESwarmDock/. Contact: Paul.Bates@cancer.org.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Proteins in the B cell CLL/lymphoma 2 (BCL-2) family are key regulators of the apoptotic process. This family comprises proapoptotic and prosurvival proteins, and shifting the balance toward the latter is an established mechanism whereby cancer cells evade apoptosis. The therapeutic potential of directly inhibiting prosurvival proteins was unveiled with the development of navitoclax, a selective inhibitor of both BCL-2 and BCL-2-like 1 (BCL-XL), which has shown clinical efficacy in some BCL-2-dependent hematological cancers. However, concomitant on-target thrombocytopenia caused by BCL-XL inhibition limits the efficacy achievable with this agent. Here we report the re-engineering of navitoclax to create a highly potent, orally bioavailable and BCL-2-selective inhibitor, ABT-199. This compound inhibits the growth of BCL-2-dependent tumors in vivo and spares human platelets. A single dose of ABT-199 in three patients with refractory chronic lymphocytic leukemia resulted in tumor lysis within 24 h. These data indicate that selective pharmacological inhibition of BCL-2 shows promise for the treatment of BCL-2-dependent hematological cancers.
Article
Empirical models for the prediction of how changes in sequence alter protein-protein binding kinetics and thermodynamics can garner insights into many aspects of molecular biology. However, such models require empirical training data and proper validation before they can be widely applied. Previous databases contained few stabilizing mutations and no discussion of their inherent biases or how this impacts model construction or validation. We present SKEMPI, a database of 3047 binding free energy changes upon mutation assembled from the scientific literature, for protein-protein heterodimeric complexes with experimentally determined structures. This represents over four times more data than previously collected. Changes in 713 association and dissociation rates and 127 enthalpies and entropies were also recorded. The existence of biases towards specific mutations, residues, interfaces, proteins and protein families is discussed in the context of how the data can be used to construct predictive models. Finally, a cross-validation scheme is presented which is capable of estimating the efficacy of derived models on future data in which these biases are not present. Availability: The database is available online at http://life.bsc.es/pid/mutation_database/ Contact: juanf@bsc.es
Article
Accurate binding free energy functions for protein-protein interactions are imperative for a wide range of purposes. Their construction is predicated upon ascertaining the factors that influence binding and their relative importance. A recent benchmark of binding affinities has allowed, for the first time, the evaluation and construction of binding free energy models using a diverse set of complexes, and a systematic assessment of our ability to model the energetics of conformational changes. We construct a large set of molecular descriptors using commonly available tools, introducing the use of energetic factors associated with conformational changes and disorder to order transitions, as well as features calculated on structural ensembles. The descriptors are used to train and test a binding free energy model using a consensus of four machine learning algorithms, whose performance constitutes a significant improvement over the other state of the art empirical free energy functions tested. The internal workings of the learners show how the descriptors are used, illuminating the determinants of protein-protein binding. The molecular descriptor set and descriptor values for all complexes are available in the Supplementary Material. A web server for the learners and coordinates for the bound and unbound structures can be accessed from the website: http://bmm.cancerresearchuk.org/~Affinity. Contact: paul.bates@cancer.org.uk. Supplementary data are available at Bioinformatics online.
Article
The CCP4 (Collaborative Computational Project, Number 4) software suite is a collection of programs and associated data and software libraries which can be used for macromolecular structure determination by X-ray crystallography. The suite is designed to be flexible, allowing users a number of methods of achieving their aims. The programs are from a wide variety of sources but are connected by a common infrastructure provided by standard file formats, data objects and graphical interfaces. Structure solution by macromolecular crystallography is becoming increasingly automated and the CCP4 suite includes several automation pipelines. After giving a brief description of the evolution of CCP4 over the last 30 years, an overview of the current suite is given. While detailed descriptions are given in the accompanying articles, here it is shown how the individual programs contribute to a complete software package.
Article
Over the last two decades, an increasing research effort in academia and industry has focused on the modulation (both inhibition and stabilization) of protein-protein interactions (PPIs) in order to develop novel therapeutic approaches and target-selective agents in drug discovery. The diversity and complexity of highly dynamic systems such as PPIs present many challenges for the identification of drug-like molecules with the ability to modulate the PPI with the necessary selectivity and potency. In this review, a number of these strategies will be presented along with a critical overview of the challenges and potential solutions relating to the exploitation of PPIs as molecular targets. Both traditional drug discovery approaches and some more recently developed innovative strategies have already provided valuable tools for the discovery of PPI modulators, and a number of successful examples have highlighted the potential of targeting PPIs for therapeutic intervention, especially in the oncology area.
Article
A protein-protein docking decoy set is built for the Dockground unbound benchmark set. The GRAMM-X docking scan was used to generate 100 non-native and at least one near-native match per complex for 61 complexes. The set is a publicly available resource for the development of scoring functions and knowledge-based potentials for protein docking methodologies. Availability: The decoys are freely available for download at http://dockground.bioinformatics.ku.edu/UNBOUND/decoy/decoy.php
Article
Hemostasis and thrombosis (blood clotting) involve fibrinogen binding to integrin αIIbβ3 on platelets, resulting in platelet aggregation. αvβ3 binds fibrinogen via an Arg-Gly-Asp (RGD) motif in fibrinogen's α subunit. αIIbβ3 also binds to fibrinogen; however, it does so via an unstructured RGD-lacking C-terminal region of the γ subunit (γC peptide). These distinct modes of fibrinogen binding enable αIIbβ3 and αvβ3 to function cooperatively in hemostasis. In this study, crystal structures reveal the integrin αIIbβ3-γC peptide interface, and, for comparison, integrin αIIbβ3 bound to a lamprey γC primordial RGD motif. Compared with RGD, the GAKQAGDV motif in γC adopts a different backbone configuration and binds over a more extended region. The integrin metal ion-dependent adhesion site (MIDAS) Mg²⁺ ion binds the γC Asp side chain. The adjacent to MIDAS (ADMIDAS) Ca²⁺ ion binds the γC C terminus, revealing a contribution for ADMIDAS in ligand binding. Structural data from this natively disordered γC peptide enhance our understanding of the involvement of γC peptide and integrin αIIbβ3 in hemostasis and thrombosis.
Article
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
Article
The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu ) is a database that documents experimentally determined protein–protein interactions. This database is intended to provide the scientific community with a comprehensive and integrated tool for browsing and efficiently extracting information about protein interactions and interaction networks in biological processes. Beyond cataloging details of protein–protein interactions, the DIP is useful for understanding protein function and protein–protein relationships, studying the properties of networks of interacting proteins, benchmarking predictions of protein–protein interactions, and studying the evolution of protein–protein interactions.
Article
mentation will be kept publicly available and the distribution sites will mirror the PDB archive using identical contents and subdirectory structure. However, each member of the wwPDB will be able to develop its own web site, with a unique view of the primary data, providing a variety of tools and resources for the global community. An Advisory Board consisting of appointees from the wwPDB, the International Union of Crystallography and the International Council on Magnetic Resonance in Biological Systems will provide guidance through annual meetings with the wwPDB consortium. This board is responsible for reviewing and determining policy as well as providing a forum for resolving issues related to the wwPDB. Specific details about the Advisory Board can be found in the wwPDB charter, available on the wwPDB web site. The RCSB is the 'archive keeper' of wwPDB. It has sole write access to the PDB archive and control over directory structure and contents, as well as responsibility for distributing new PDB identifiers to all deposition sites. The PDB archive is a collection of flat files in the legacy PDB file format 3 and in the mmCIF 4 format that follows the PDB exchange dictionary (http://deposit.pdb.org/mmcif/). This dictionary describes the syntax and semantics of PDB data that are processed and exchanged during the process of data annotation. It was designed to provide consistency in data produced in structure laboratories, processed by the wwPDB members and used in bioinformatics applications. The PDB archive does not include the websites, browsers, software and database query engines developed by researchers worldwide. The members of the wwPDB will jointly agree to any modifications or extensions to the PDB exchange dictionary. As data technology progresses, other data formats (such as XML) and delivery methods may be included in the official PDB archive if all the wwPDB members concur on the alteration.
Any new formats will follow the naming and description conventions of the PDB exchange dictionary. In addition, the legacy PDB format would not be modified unless there is a compelling reason for a change. Should such a situation occur, all three wwPDB members would have to agree on the changes and give the structural biology community 90 days advance notice. The creation of the wwPDB formalizes the international character of the PDB and ensures that the archive remains single and uniform. It provides a mechanism to ensure consistent data for software developers and users worldwide. We hope that this will encourage individual creativity in developing tools for presenting structural data, which could benefit the scientific research community in general.
Article
IntAct provides an open source database and toolkit for the storage, presentation and analysis of protein interactions. The web interface provides both textual and graphical representations of protein interactions, and allows exploring interaction networks in the context of the GO annotations of the interacting proteins. A web service allows direct computational access to retrieve interaction networks in XML format. IntAct currently contains ∼2200 binary and complex interactions imported from the literature and curated in collaboration with the Swiss‐Prot team, making intensive use of controlled vocabularies to ensure data consistency. All IntAct software, data and controlled vocabularies are available at http://www.ebi.ac.uk/intact.
Article
We present the sixth report evaluating the performance of methods for predicting the atomic resolution structures of protein complexes offered as targets to the community-wide initiative on the Critical Assessment of Predicted Interactions (CAPRI). The evaluation is based on a total of 20670 predicted models for 8 protein-peptide complexes, a novel category of targets in CAPRI, and 12 protein-protein targets in CAPRI prediction Rounds held during the years 2013-2016. For two of the protein-protein targets, the focus was on the prediction of side-chain conformation and positions of interfacial water molecules. Seven of the protein-protein targets were particularly challenging owing to their multi-component nature, to conformational changes at the binding site, or to a combination of both. Encouragingly, the very large multi-protein complex with the nucleosome was correctly predicted, and correct models were submitted for the protein-peptide targets, but not for some of the challenging protein-protein targets. Models of acceptable quality or better were obtained for 14 of the 20 targets, including medium quality models for 13 targets and high quality models for 8 targets, indicating tangible progress of present-day computational methods in modeling protein complexes with increased accuracy. Our evaluation suggests that the progress stems from better integration of different modeling tools with docking procedures, as well as the use of more sophisticated evolutionary information to score models. Nonetheless, adequate modeling of conformational flexibility in interacting proteins remains an important area with a crucial need for improvement.
Article
A computational protein-protein docking method that predicts atomic details of protein-protein interactions from protein monomer structures is an invaluable tool for understanding the molecular mechanisms of protein interactions and for designing molecules that control such interactions. Compared to low-resolution docking, high-resolution docking explores the conformational space in atomic resolution to provide predictions with atomic details. This allows for applications to more challenging docking problems that involve conformational changes induced by binding. Recently, high-resolution methods have become more promising as additional information such as global shapes or residue contacts are now available from experiments or sequence/structure data. In this review article, we highlight developments in high-resolution docking made during the last decade, specifically regarding global optimization methods employed by the docking methods. We also discuss two major challenges in high-resolution docking: prediction of backbone flexibility and water-mediated interactions.
Article
Predicting protein binding affinities from structural data has remained elusive, a difficulty owing to the variety of protein binding modes. Using the structure-affinity-benchmark (SAB, 144 cases with bound/unbound crystal structures and experimental affinity measurements), prediction has been undertaken either by fitting a model using a handful of pre-defined variables, or by training a complex model from a large pool of parameters (typically hundreds). The former route unnecessarily restricts the model space, while the latter is prone to overfitting. We design models in a third tier, using twelve variables describing enthalpic and entropic variations upon binding, and a model selection procedure identifying the best sparse model built from a subset of these variables. Using these models, we report three main results. First, we present models yielding a marked improvement of affinity predictions. For the whole dataset, we present a model predicting Kd within one and two orders of magnitude for 48% and 79% of cases, respectively. These statistics jump to 62% and 89%, respectively, for the subset of the SAB consisting of high resolution structures. Second, we show that this performance is owed to a new parameter encoding interface morphology and packing properties of interface atoms. Third, we argue that interface flexibility and prediction hardness do not correlate, and that for flexible cases, a performance matching that of the whole SAB can be achieved. Overall, our work suggests that the affinity prediction problem could be partly solved using databases of high resolution complexes whose affinity is known.
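As an illustration of the "within one and two orders of magnitude" statistic quoted in this abstract, here is a minimal Python sketch. It is not the authors' code: the function name `fraction_within` and the sample Kd values are hypothetical, chosen only to show how such a percentage could be computed from predicted and measured dissociation constants.

```python
import math


def fraction_within(kd_pred, kd_exp, orders):
    """Fraction of cases whose predicted Kd (mol/L) lies within `orders`
    orders of magnitude of the experimental Kd (mol/L)."""
    if len(kd_pred) != len(kd_exp) or not kd_pred:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        abs(math.log10(p) - math.log10(e)) <= orders
        for p, e in zip(kd_pred, kd_exp)
    )
    return hits / len(kd_pred)


# Hypothetical example values, not taken from the benchmark.
predicted = [1e-9, 5e-7, 2e-6, 1e-12]
measured = [3e-9, 1e-8, 1e-9, 1e-12]

within_one = fraction_within(predicted, measured, 1)
within_two = fraction_within(predicted, measured, 2)
print(f"within 1 order: {within_one:.0%}, within 2 orders: {within_two:.0%}")
```

Working on the log10 scale makes the criterion symmetric: over-prediction and under-prediction by the same factor count equally.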
Article
We present an updated and integrated version of our widely used protein-protein docking and binding affinity benchmarks. The benchmarks consist of non-redundant, high quality structures of protein-protein complexes along with the unbound structures of their components. Fifty-five new complexes were added to the docking benchmark, 35 of which have experimentally-measured binding affinities. These updated docking and affinity benchmarks now contain 230 and 179 entries, respectively. In particular, the number of antibody-antigen complexes has increased significantly, by 67% and 74% in the docking and affinity benchmarks, respectively. We tested previously developed docking and affinity prediction algorithms on the new cases. Considering only the top ten docking predictions per benchmark case, a prediction accuracy of 38% is achieved on all 55 cases, and up to 50% for the 32 rigid-body cases only. Predicted affinity scores are found to correlate with experimental binding energies up to r=0.52 overall, and r=0.72 for the rigid complexes.
Article
CAPRI (Critical Assessment of PRedicted Interactions) has proven to be a catalyst for the development of docking algorithms. An essential step in docking is the scoring of predicted binding modes in order to identify stable complexes. In 2005, CAPRI introduced the scoring experiment, where upon completion of a prediction round, a larger set of models predicted by different groups and comprising both correct and incorrect binding modes, is made available to all participants for testing new scoring functions independently from docking calculations. Here we present an expanded benchmark data set for testing scoring functions, which comprises the consolidated ensemble of predicted complexes made available in the CAPRI scoring experiment since its inception. This consolidated scoring benchmark contains predicted complexes for 15 published CAPRI targets. These targets were subjected to 23 CAPRI assessments, due to existence of multiple binding modes for some targets. The benchmark contains more than 19000 protein complexes. About 10% of the complexes represent docking predictions of acceptable quality or better, the remainder represent incorrect solutions (decoys). The benchmark set contains models predicted by 47 different predictor groups including web servers, which use different docking and scoring procedures, and is arguably as diverse as one may expect, representing the state of the art in protein docking. The data set is publicly available at the following URL: http://cb.iri.univ-lille1.fr/Users/lensink/Score_set.
Article
Scoring functions are often applied in combination with molecular docking methods to predict ligand binding poses, ligand binding affinities, or identify active compounds through virtual screening. An objective benchmark for assessing the performance of current scoring functions is expected to provide practical guidance for the users to make smart choices among available methods. It can also elucidate the common weakness in current methods for future improvements. The primary goal of our Comparative Assessment of Scoring Functions (CASF) project is to provide a high-standard, publicly accessible benchmark of this type. Our latest study, i.e. CASF-2013, evaluated 20 popular scoring functions on an updated set of protein-ligand complexes. This data set was selected out of 8302 protein-ligand complexes recorded in the PDBbind database (version 2013) through a fairly complicated process. Sample selection was made by considering the quality of complex structures as well as binding data. Finally, qualified complexes were clustered by 90% similarity in protein sequences. Three representative complexes were chosen from each cluster to control sample redundancy. The final outcome, namely the PDBbind core set (version 2013), consists of 195 protein-ligand complexes in 65 clusters with binding constants spanning nearly 10 orders of magnitude. In this data set, 82% of the ligand molecules are "drug-like" and 78% of the protein molecules are validated or potential drug targets. Correlation between binding constants and several key properties of ligands are discussed. Methods and results of scoring function evaluation will be described in a following article.
Article
Our comparative assessment of scoring functions (CASF) benchmark is created to provide an objective evaluation of current scoring functions. The key idea of CASF is to compare the general performance of scoring functions on a diverse set of protein-ligand complexes. In order to avoid testing scoring functions in the context of molecular docking, the scoring process is separated from the docking (or sampling) process by using ensembles of ligand binding poses that are generated in advance. Here, we describe the technical methods and evaluation results of the latest CASF-2013 study. The PDBbind core set (version 2013) was employed as the primary test set in this study, which consists of 195 protein-ligand complexes with high-quality 3D structures and reliable binding constants. A panel of 20 scoring functions, most of which are implemented in main-stream commercial software, were evaluated in terms of "scoring power" (binding affinity prediction), "ranking power" (relative ranking prediction), "docking power" (binding pose prediction), and "screening power" (discrimination of true binders from random molecules). Our results reveal that the performance of these scoring functions is generally more promising in the docking/screening power tests than in the scoring/ranking power tests. Top-ranked scoring functions in the scoring power test, such as X-ScoreHM, ChemScore@SYBYL, ChemPLP@GOLD, and PLP@DS, are also top-ranked in the ranking power test. Top-ranked scoring functions in the docking power test, such as ChemPLP@GOLD, ChemScore@GOLD, GlideScore-SP, LigScore@DS, and PLP@DS, are also top-ranked in the screening power test. Our results obtained on the entire test set and its subsets suggest that the real challenge in protein-ligand binding affinity prediction lies in polar interactions and the associated desolvation effect. Non-additive features observed among high-affinity protein-ligand complexes also need attention.
Article
We present the 5th evaluation of docking and related scoring methods used in the community-wide experiment on the Critical Assessment of Predicted Interactions (CAPRI). The evaluation examined predictions submitted for a total of 15 targets in eight CAPRI rounds held during the years 2010-2012. The targets represented one of the most diverse sets tackled by the CAPRI community so far. They included only 10 'classical' docking and scoring problems. In one of the classical targets the new challenge was to predict the position of water molecules in the protein-protein interface. The remaining 5 targets represented other new challenges that involved estimating the relative binding affinity and the effect of point mutations on the stability of designed and natural protein-protein complexes. Although the 10 'classical' CAPRI targets included two difficult multi-component systems, and a protein-oligosaccharide complex with which CAPRI participants had little experience, this evaluation indicates that the performance of docking and scoring methods has remained quite robust. More remarkably, we find that automatic docking servers exhibit a significantly improved performance, with some servers now performing on par with predictions done by humans. The performance of CAPRI participants in the new challenges, briefly reviewed here, was mediocre overall, but some groups did relatively well and their approaches suggested ways of improving methods for designing binders and for estimating the free energies of protein assemblies, which should impact the field of protein modeling and design as a whole.
Article
The computational evaluation of protein-protein interactions will play an important role in organising the wealth of data being generated by high-throughput initiatives. Here we discuss future applications, report recent developments and identify areas requiring further investigation. Many functions have been developed to quantify the structural and energetic properties of interacting proteins, finding use in interrelated challenges revolving around the relationship between sequence, structure and binding free energy. These include loop modelling, side-chain refinement, docking, multimer assembly, affinity prediction, affinity change upon mutation, hotspots location and interface design. Information derived from models optimised for one of these challenges can be used to benefit the others, and can be unified within the theoretical frameworks of multi-task learning and Pareto-optimal multi-objective learning.
Article
We developed a method called Residue Contact Frequency (RCF), which uses the complex structures generated by the protein-protein docking algorithm ZDOCK to predict interface residues. Unlike interface prediction algorithms that are based on monomers alone, RCF is binding partner specific. We evaluated the performance of RCF using the Area Under the Precision-Recall (PR) Curve (AUC) on a large protein docking Benchmark. RCF (AUC=0.44) performed as well as meta-PPISP (AUC=0.43), which is one of the best monomer-based interface prediction methods. In addition, we tested a Support Vector Machine (SVM) to combine RCF with meta-PPISP and another monomer-based interface prediction algorithm, Evolutionary Trace, to further improve the performance. We found that the SVM that combined RCF and meta-PPISP achieved the best performance (AUC=0.47). We used RCF to predict the binding interfaces of proteins that can bind to multiple partners and RCF was able to correctly predict interface residues that are unique for the respective binding partners. Furthermore, we found that residues that contributed greatly to binding affinity (hotspot residues) had significantly higher RCF than other residues.
Article
Advances in biophysics and biochemistry have pushed back the limits for the structural characterization of biomolecular assemblies. Large efforts have been devoted to increase both resolution and accuracy of the methods, probe into the smallest biomolecules as well as the largest macromolecular machineries, unveil transient complexes along with dynamic interaction processes, and, lately, dissect whole organism interactomes using high‐throughput strategies. However, the atomic description of such interactions, rarely reached by large‐scale projects in structural biology, remains indispensable to fully understand the subtleties of the recognition process, measure the impact of a mutation or predict the effect of a drug binding to a complex. Mixing even a limited amount of experimental and/or bioinformatic data with modeling methods, such as macromolecular docking, presents a valuable strategy to predict the three‐dimensional structures of complexes. Recent developments indicate that the docking community is seething to tackle the greatest challenge of adding the structural dimension to interactomes.
Article
The maximum achievable accuracy of in silico models depends on the quality of the experimental data. Consequently, experimental uncertainty defines a natural upper limit to the predictive performance possible. Models that yield errors smaller than the experimental uncertainty are necessarily overtrained. A reliable estimate of the experimental uncertainty is therefore of high importance to all originators and users of in silico models. The data deposited in ChEMBL was analyzed for reproducibility, i.e., the experimental uncertainty of independent measurements. Careful filtering of the data was required because ChEMBL contains unit-transcription errors, undifferentiated stereoisomers, and repeated citations of single measurements (90% of all pairs). The experimental uncertainty is estimated to yield a mean error of 0.44 pKi units, a standard deviation of 0.54 pKi units, and a median error of 0.34 pKi units. The maximum possible squared Pearson correlation coefficient (R²) on large data sets is estimated to be 0.81.
Article
Novel discoveries in molecular disease pathways within the cell, combined with increasing information regarding protein binding partners, have led to a new approach in drug discovery. There is interest in designing drugs to modulate protein-protein interactions as opposed to solely targeting the catalytic active site within a single enzyme or protein. There are many challenges in this new approach to drug discovery, particularly since the protein-protein interface has a larger surface area, can comprise a discontinuous epitope, and is more amorphous and less well defined than the typical drug design target, a small contained enzyme-binding pocket. Computational methods to predict modes of protein-protein interaction, as well as protein interface hot spots, have garnered significant interest, in order to facilitate the development of drugs to successfully disrupt and inhibit protein-protein interactions. This review summarizes some current methods available for computational protein-protein docking, as well as tabulating some examples of the successful design of antagonists and small molecule inhibitors for protein-protein interactions. Several of these drugs are now beginning to appear in the clinic.
Article
We have assembled a nonredundant set of 144 protein-protein complexes that have high-resolution structures available for both the complexes and their unbound components, and for which dissociation constants have been measured by biophysical methods. The set is diverse in terms of the biological functions it represents, with complexes that involve G-proteins and receptor extracellular domains, as well as antigen/antibody, enzyme/inhibitor, and enzyme/substrate complexes. It is also diverse in terms of the partners' affinity for each other, with Kd ranging between 10⁻⁵ and 10⁻¹⁴ M. Nine pairs of entries represent closely related complexes that have a similar structure, but a very different affinity, each pair comprising a cognate and a noncognate assembly. The unbound structures of the component proteins being available, conformation changes can be assessed. They are significant in most of the complexes, and large movements or disorder-to-order transitions are frequently observed. The set may be used to benchmark biophysical models aiming to relate affinity to structure in protein-protein interactions, taking into account the reactants and the conformation changes that accompany the association reaction, instead of just the final product.
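To give a sense of what this affinity range means energetically, the dissociation constant can be converted to a standard binding free energy with the textbook relation ΔG° = RT ln(Kd), taking a 1 M standard state. The following Python sketch is illustrative only: the function name `binding_free_energy` is hypothetical, and the temperature of 298.15 K is an assumption, since the abstract does not state measurement temperatures.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation
    constant in mol/L, using dG = R * T * ln(Kd) (1 M standard state).
    The default temperature is an assumption, not a value from the source."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# The two ends of the affinity range quoted in the abstract.
weakest = binding_free_energy(1e-5)
tightest = binding_free_energy(1e-14)
print(f"Kd = 1e-5 M  -> dG = {weakest:.2f} kcal/mol")
print(f"Kd = 1e-14 M -> dG = {tightest:.2f} kcal/mol")
```

Under these assumptions the nine orders of magnitude in Kd correspond to a spread of roughly 12 kcal/mol in binding free energy, about 1.4 kcal/mol per order of magnitude.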
Article
Protein docking algorithms are assessed by evaluating blind predictions performed during 2007-2009 in Rounds 13-19 of the community-wide experiment on critical assessment of predicted interactions (CAPRI). We evaluated the ability of these algorithms to sample docking poses and to single out specific association modes in 14 targets, representing 11 distinct protein complexes. These complexes play important biological roles in RNA maturation, G-protein signal processing, and enzyme inhibition and function. One target involved protein-RNA interactions not previously considered in CAPRI, several others were hetero-oligomers, or featured multiple interfaces between the same protein pair. For most targets, predictions started from the experimentally determined structures of the free (unbound) components, or from models built from known structures of related or similar proteins. To succeed they therefore needed to account for conformational changes and model inaccuracies. In total, 64 groups and 12 web-servers submitted docking predictions of which 4420 were evaluated. Overall our assessment reveals that 67% of the groups, more than ever before, produced acceptable models or better for at least one target, with many groups submitting multiple high- and medium-accuracy models for two to six targets. Forty-one groups including four web-servers participated in the scoring experiment with 1296 evaluated models. Scoring predictions also show signs of progress evidenced from the large proportion of correct models submitted. But singling out the best models remains a challenge, which also adversely affects the ability to correctly rank docking models. With the increased interest in translating abstract protein interaction networks into realistic models of protein assemblies, the growing CAPRI community is actively developing more efficient and reliable docking and scoring methods for everyone to use.
Article
We updated our protein-protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are nonredundant at the family-family pair level, for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, representing an increase of 42%. Thus, benchmark 4.0 provides 176 unbound-unbound cases that can be used for protein-protein docking method development and assessment. Seventeen of the newly added cases are enzyme-inhibitor complexes, and we found no new antigen-antibody complexes. Classifying the new cases according to expected difficulty for protein-protein docking algorithms gives 33 rigid body cases, 11 cases of medium difficulty, and 8 cases that are difficult. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/.
Article
Docking algorithms build multimolecular assemblies based on the subunit structures. "Unbound" docking, which starts with the free molecules and allows for conformation changes, may be used to predict the structure of a protein-protein complex. This requires at least two steps, a rigid-body search that determines the relative position and orientation of the subunits, and a refinement step. The methods developed in the past twenty years yield native-like models in most cases, but always with many false positives that must be filtered out, and they fail when the conformation changes are large. CAPRI (Critical Assessment of PRedicted Interactions) is a community-wide experiment set up to monitor progress in the field. It offers participants the opportunity to test their methods in blind predictions that are assessed against an unpublished experimental structure. The models submitted by predictor groups are judged depending on how well they reproduce the geometry and the residue-residue contacts seen in the target structure. In nine years of CAPRI, 42 target complexes have been subjected to prediction based on the components' unbound structures. Good models have been submitted for 28 targets, and prediction has failed on 6. Both these successes and these failures have been fruitful, as they stimulated participant groups to develop new score functions to identify native-like solutions, and new algorithms that allow the molecules to be flexible during docking.
Article
The design of an ideal scoring function for protein-protein docking that would also predict the binding affinity of a complex is one of the challenges in structural proteomics. Such a scoring function would open the route to in silico, large-scale annotation and prediction of complete interactomes. Here we present a protein-protein binding affinity benchmark consisting of binding constants (Kd values) for 81 complexes. This benchmark was used to assess the performance of nine commonly used scoring algorithms along with a free-energy prediction algorithm in their ability to predict binding affinities. Our results reveal a poor correlation between binding affinity and scores for all algorithms tested. However, the diversity and validity of the benchmark is highlighted when binding affinity data are categorized according to the methodology by which they were determined. By further classifying the complexes into low, medium and high affinity groups, significant correlations emerge, some of which are retained after dividing the data into more classes, showing the robustness of these correlations. Despite this, accurate prediction of binding affinity remains outside our reach due to the large associated standard deviations of the average score within each group. All the above-mentioned observations indicate that improvements of existing scoring functions or design of new consensus tools will be required for accurate prediction of the binding affinity of a given protein-protein complex. The benchmark developed in this work will serve as an indispensable source to reach this goal.
Article
Scoring functions are widely applied to the evaluation of protein-ligand binding in structure-based drug design. We have conducted a comparative assessment of 16 popular scoring functions implemented in mainstream commercial software or released by academic research groups. A set of 195 diverse protein-ligand complexes with high-resolution crystal structures and reliable binding constants was selected through a systematic nonredundant sampling of the PDBbind database and used as the primary test set in our study. All scoring functions were evaluated in three aspects, that is, "docking power", "ranking power", and "scoring power", and all evaluations were independent from the context of molecular docking or virtual screening. As for "docking power", six scoring functions, including GOLD::ASP, DS::PLP1, DrugScore(PDB), GlideScore-SP, DS::LigScore, and GOLD::ChemScore, achieved success rates over 70% when the acceptance cutoff was root-mean-square deviation < 2.0 Å. Combining these scoring functions into consensus scoring schemes improved the success rates to 80% or even higher. As for "ranking power" and "scoring power", the top four scoring functions on the primary test set were X-Score, DrugScore(CSD), DS::PLP, and SYBYL::ChemScore. They were able to correctly rank the protein-ligand complexes containing the same type of protein with success rates around 50%. Correlation coefficients between the experimental binding constants and the binding scores computed by these scoring functions ranged from 0.545 to 0.644. Besides the primary test set, each scoring function was also tested on four additional test sets, each consisting of a certain number of protein-ligand complexes containing one particular type of protein. Our study serves as an updated benchmark for evaluating the general performance of today's scoring functions. Our results indicate that no single scoring function consistently outperforms others in all three aspects. Thus, it is important in practice to choose the appropriate scoring functions for different purposes.
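As a rough illustration of the two metrics named in this abstract, the sketch below computes a "docking power" success rate (fraction of complexes whose top-ranked pose falls under an RMSD cutoff) and a "scoring power" Pearson correlation between experimental binding data and computed scores. The function names and the toy inputs are hypothetical, and the abstract does not describe how the authors implemented either measure, so this should be read as one plausible reading under those assumptions.

```python
from math import sqrt


def success_rate(top_pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD strictly below cutoff (in Å)."""
    if not top_pose_rmsds:
        raise ValueError("need at least one complex")
    hits = sum(1 for rmsd in top_pose_rmsds if rmsd < cutoff)
    return hits / len(top_pose_rmsds)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: RMSD (Å) of the best-scored pose for four complexes.
example_rmsds = [0.8, 1.9, 2.0, 3.5]
docking_power = success_rate(example_rmsds)  # 2.0 itself does not pass "< 2.0"
```

Whether the cutoff is applied strictly (`<`) or inclusively (`<=`) changes borderline cases; the strict form is used here because the abstract states the criterion as "< 2.0 Å".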
Article
During the past year, many new antibody structures have been determined, increasing our understanding of these immunologically important molecules. Of special interest are new catalytic antibodies, antibody-peptide and antibody-virus complexes, NMR structures, and structures illustrating conformational changes and antibody cross-reactivity.
Article
A protein docking study was performed for two classes of biomolecular complexes: six enzyme/inhibitor and four antibody/antigen. Biomolecular complexes for which crystal structures of both the complexed and uncomplexed proteins are available were used for eight of the ten test systems. Our docking experiments consist of a global search of translational and rotational space followed by refinement of the best predictions. Potential complexes are scored on the basis of shape complementarity and favourable electrostatic interactions using Fourier correlation theory. Since proteins undergo conformational changes upon binding, the scoring function must be sufficiently soft to dock unbound structures successfully. Some degree of surface overlap is tolerated to account for side-chain flexibility. Similarly for electrostatics, the interaction of the dispersed point charges of one protein with the Coulombic field of the other is measured rather than precise atomic interactions. We tested our docking protocol using the native rather than the complexed forms of the proteins to address the more scientifically interesting problem of predictive docking. In all but one of our test cases, correctly docked geometries (interface Cα RMS deviation ≤ 2 Å from the experimental structure) are found during a global search of translational and rotational space in a list that was always less than 250 complexes and often less than 30. Varying degrees of biochemical information are still necessary to remove most of the incorrectly docked complexes.
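The Fourier correlation idea mentioned in this abstract can be sketched on a toy one-dimensional periodic grid: the score for every relative shift of the two molecules is obtained at once from the product of their Fourier transforms. Everything below is an assumption made for illustration, including the function names, the grid values (surface cells scored +1, core cells penalised with -5) and the use of a naive O(N²) transform in place of a real FFT. The published protocol works on three-dimensional grids with rotations and an electrostatic term, none of which is reproduced here.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(N^2)); stands in for an FFT in this toy."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def correlation_scores(receptor, ligand):
    """score[s] = sum_n receptor[n] * ligand[(n + s) % N], via the correlation theorem."""
    if len(receptor) != len(ligand):
        raise ValueError("grids must have the same length")
    f_receptor = dft(receptor)
    f_ligand = dft(ligand)
    product = [a.conjugate() * b for a, b in zip(f_receptor, f_ligand)]
    return [c.real for c in dft(product, inverse=True)]


def direct_scores(receptor, ligand):
    """Same scores computed by explicit summation, used only as a cross-check."""
    n = len(receptor)
    return [sum(receptor[i] * ligand[(i + s) % n] for i in range(n))
            for s in range(n)]


# Hypothetical grids: +1 marks receptor surface, -5 penalises overlap with its core.
receptor_grid = [0, 0, 1, -5, -5, 1, 0, 0]
ligand_grid = [1, 1, 0, 0, 0, 0, 0, 0]
fourier = correlation_scores(receptor_grid, ligand_grid)
direct = direct_scores(receptor_grid, ligand_grid)
```

In this toy, a milder core penalty corresponds to a "softer" score that tolerates more overlap, which is the property the abstract says is needed when docking unbound structures.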
Article
Recently, developments have been made in predicting the structure of docked complexes when the coordinates of the components are known. The process generally consists of a stage during which the components are combined rigidly and then a refinement stage. Several rapid new algorithms have been introduced in the rigid docking problem and promising refinement techniques have been developed, based on modified molecular mechanics force fields and empirical measures of desolvation, combined with minimisations that switch on the short-range interactions gradually. There has also been progress in developing a benchmark set of targets for docking and a blind trial, similar to the trials of protein structure prediction, has taken place.
Article
Protein interaction databases represent unique tools to store, in a computer readable form, the protein interaction information disseminated in the scientific literature. Well organized and easily accessible databases permit the easy retrieval and analysis of large interaction data sets. Here we present MINT, a database (http://cbm.bio.uniroma2.it/mint/index.html) designed to store data on functional interactions between proteins. Beyond cataloguing binary complexes, MINT was conceived to store other types of functional interactions, including enzymatic modifications of one of the partners. Release 1.0 of MINT focuses on experimentally verified protein-protein interactions. Both direct and indirect relationships are considered. Furthermore, MINT aims at being exhaustive in the description of the interaction and, whenever available, information about kinetic and binding constants and about the domains participating in the interaction is included in the entry. MINT consists of entries extracted from the scientific literature by expert curators assisted by 'MINT Assistant', a software that targets abstracts containing interaction information and presents them to the curator in a user-friendly format. The interaction data can be easily extracted and viewed graphically through 'MINT Viewer'. Presently MINT contains 4568 interactions, 782 of which are indirect or genetic interactions.
Article
The distance-dependent structure-derived potentials developed so far all employed a reference state that can be characterized as a residue (atom)-averaged state. Here, we establish a new reference state called the distance-scaled, finite ideal-gas reference (DFIRE) state. The reference state is used to construct a residue-specific all-atom potential of mean force from a database of 1011 nonhomologous (less than 30% homology) protein structures with resolution less than 2 Å. The new all-atom potential recognizes more native proteins from 32 multiple decoy sets, and raises the average Z-score by 1.4 units more than two previously developed, residue-specific, all-atom knowledge-based potentials. When only backbone and Cβ atoms are used in scoring, the performance of the DFIRE-based potential, although worse than that of the all-atom version, is comparable to those of the previously developed potentials on the all-atom level. In addition, the DFIRE-based all-atom potential provides the most accurate prediction of the stabilities of 895 mutants among three knowledge-based all-atom potentials. Comparison with several physical-based potentials is made.
Article
The Biomolecular Interaction Network Database (BIND: http://bind.ca) archives biomolecular interaction, complex and pathway information. A web-based system is available to query, view and submit records. BIND continues to grow with the addition of individual submissions as well as interaction data from the PDB and a number of large-scale interaction and complex mapping experiments using yeast two hybrid, mass spectrometry, genetic interactions and phage display. We have developed a new graphical analysis tool that provides users with a view of the domain composition of proteins in interaction and complex records to help relate functional domains to protein interactions. An interaction network clustering tool has also been developed to help focus on regions of interest. Continued input from users has helped further mature the BIND data specification, which now includes the ability to store detailed information about genetic interactions. The BIND data specification is available as ASN.1 and XML DTD.
Article
We have developed a nonredundant benchmark for testing protein-protein docking algorithms. Currently it contains 59 test cases: 22 enzyme-inhibitor complexes, 19 antibody-antigen complexes, 11 other complexes, and 7 difficult test cases. Thirty-one of the test cases, for which the unbound structures of both the receptor and ligand are available, are classified as follows: 16 enzyme-inhibitor, 5 antibody-antigen, 5 others, and 5 difficult. Such a centralized resource should benefit the docking community not only as a large curated test set but also as a common ground for comparing different algorithms. The benchmark is available at (http://zlab.bu.edu/~rong/dock/benchmark.shtml).
Article
CAPRI is a communitywide experiment to assess the capacity of protein-docking methods to predict protein-protein interactions. Nineteen groups participated in rounds 1 and 2 of CAPRI and submitted blind structure predictions for seven protein-protein complexes based on the known structure of the component proteins. The predictions were compared to the unpublished X-ray structures of the complexes. We describe here the motivations for launching CAPRI, the rules that we applied to select targets and run the experiment, and some conclusions that can already be drawn. The results stress the need for new scoring functions and for methods handling the conformation changes that were observed in some of the target systems. CAPRI has already been a powerful drive for the community of computational biologists who develop docking algorithms. We hope that this issue of Proteins will also be of interest to the community of structural biologists, which we call upon to provide new targets for future rounds of CAPRI, and to all molecular biologists who view protein-protein recognition as an essential process.