Article

Using Multiple Structure Alignments, Fast Model Building, and Energetic Analysis in Fold Recognition and Homology Modeling

Authors:
  • Entangible, Inc

Abstract

We participated in the fold recognition and homology sections of CASP5 using primarily in-house software. The central feature of our structure prediction strategy was the ability to generate good sequence-to-structure alignments and to quickly transform them into models that could be evaluated both with energy-based methods and manually. The in-house tools we used include: a) HMAP (Hybrid Multidimensional Alignment Profile), a profile-to-profile alignment method that is derived from sequence-enhanced multiple structure alignments in core regions and from sequence motifs in non-structurally conserved regions; b) NEST, a fast model building program that applies an "artificial evolution" algorithm to construct a model from a given template and alignment; c) GRASP2, a new structure and alignment visualization program incorporating multiple structure superposition and domain database scanning modules. These methods were combined with model evaluation based on all-atom and simplified physical-chemical energy functions. All of these methods were under development during CASP5, and consequently a great deal of manual analysis was carried out at each stage of the prediction process. This interactive model building procedure has several advantages and suggests important ways in which our and other methods can be improved, examples of which are provided.

... BLAST was used to identify proteins in the PDB with sequences similar to the query sequence. For BLAST e-value ≤ 10⁻¹², a homology model for the query sequence with the PDB structure as template was created with Nest (Petrey et al., 2003). If no template was identified, remote sequence homologs within the PDB were identified by HHblits (Remmert et al., 2012) with 5 iterations, and, for e-value ≤ 10⁻¹², a homology model was created with Nest (Petrey et al., 2003). Otherwise, a homology model for the query was not created. ...
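The template-selection cascade described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, the function names, and the rule of taking the lowest e-value among passing hits are hypothetical and do not come from the cited work; only the 10⁻¹² cutoff and the BLAST-then-HHblits order are taken from the excerpt. Running BLAST, HHblits, or Nest is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

E_VALUE_CUTOFF = 1e-12  # threshold quoted in the excerpt


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one search hit against the PDB."""
    template_id: str
    e_value: float


def best_hit(hits: Sequence[Hit], cutoff: float = E_VALUE_CUTOFF) -> Optional[Hit]:
    """Return the lowest e-value hit at or below the cutoff, or None.

    Picking the single lowest e-value is an assumption made for this sketch.
    """
    passing = [h for h in hits if h.e_value <= cutoff]
    return min(passing, key=lambda h: h.e_value) if passing else None


def choose_template(
    blast_hits: Sequence[Hit], hhblits_hits: Sequence[Hit]
) -> Optional[Tuple[str, Hit]]:
    """Try BLAST hits first, fall back to HHblits hits, else give up."""
    hit = best_hit(blast_hits)
    if hit is not None:
        return ("BLAST", hit)
    hit = best_hit(hhblits_hits)
    if hit is not None:
        return ("HHblits", hit)
    return None  # no homology model is built for this query


if __name__ == "__main__":
    blast = [Hit("1abc_A", 3e-5)]    # too weak to pass the cutoff
    remote = [Hit("2xyz_B", 1e-20)]  # passes in the fallback search
    print(choose_template(blast, remote))
```

A `None` result corresponds to the last sentence of the excerpt, where no model is created for the query.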
Article
We describe the Predicting Protein–Compound Interactions (PrePCI) database which comprises over 5 billion predicted interactions between 6.8 million chemical compounds and 19,797 human proteins. PrePCI relies on a proteome‐wide database of structural models based on both traditional modeling techniques and the AlphaFold Protein Structure Database. Sequence‐ and structural similarity‐based metrics are established between template proteins, T, in the Protein Data Bank that bind compounds, C, and query proteins in the model database, Q. When the metrics exceed threshold values, it is assumed that C also binds to Q with a likelihood ratio (LR) derived from machine learning. If the relationship is based on structural similarity, the LR is based on a scoring function that measures the extent to which C is compatible with the binding site of Q as described in the LT‐scanner algorithm. For every predicted complex derived in this way, chemical similarity based on the Tanimoto coefficient identifies other small molecules that may bind to Q. An overall LR for the binding of C to Q is obtained from Naive Bayesian statistics. The PrePCI database can be queried by entering a UniProt ID or gene name for a protein to obtain a list of compounds predicted to bind to it along with associated LRs. Alternatively, entering an identifier for the compound outputs a list of proteins it is predicted to bind. Specific applications of the database to lead discovery, elucidation of drug mechanism of action, and biological function annotation are described.
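Two of the quantities named in this abstract have simple textbook forms, illustrated below. The sketch assumes that fingerprints are represented as sets of "on" bit indices and that independent likelihood ratios are combined by multiplication, which is the standard naive Bayes rule; the abstract does not say how PrePCI computes either quantity internally, so the function names and data layout here are hypothetical.

```python
from math import prod
from typing import Iterable, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits.

    Defined as |A ∩ B| / |A ∪ B|; two empty fingerprints score 0.0 here
    (a convention chosen for this sketch).
    """
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def combined_likelihood_ratio(lrs: Iterable[float]) -> float:
    """Combine per-evidence likelihood ratios under the naive Bayes
    assumption of conditional independence: the overall LR is the product."""
    return prod(lrs)


if __name__ == "__main__":
    a = {1, 2, 3}
    b = {2, 3, 4}
    print(tanimoto(a, b))                         # 2 shared bits / 4 total bits
    print(combined_likelihood_ratio([2.0, 5.0]))  # product of the two ratios
```

In practice, fingerprints would come from a cheminformatics toolkit rather than hand-written sets.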
... Chain A is the MSH2 protein and Chain B is the MSH6 protein. The profix module from the Jackal package was used to rebuild the missing heavy atoms and short loops in the MSH2 and MSH6 proteins (Figure 5) [47]. Overall, four VUS missense mutations, p.Tyr43Cys, p.Ala272Val, p.Asn547Ser and p.Met592Val, were selected for this study. ...
Article
This study suggests that two newly discovered variants in the MSH2 gene, which codes for a DNA mismatch repair (MMR) protein, can be associated with a high risk of breast cancer. While variants in the MSH2 gene are known to be linked with an elevated cancer risk, the MSH2 gene is not a part of the standard kit for testing patients for elevated breast cancer risk. Here we used the results of genetic testing of women who were diagnosed with breast cancer but did not have variants in the BRCA1 and BRCA2 genes. Instead, the test identified four variants of unknown significance (VUS) in the MSH2 gene. We carried out an in silico analysis to develop a classifier that can distinguish pathogenic from benign mutations in MSH2 genes taken from ClinVar. The classifier was then used to classify VUS in MSH2 genes, and two of them, p.Ala272Val and p.Met592Val, were predicted to be pathogenic mutations. These two mutations were found in women with breast cancer who did not have mutations in the BRCA1 and BRCA2 genes, and thus they are suggested as new biomarkers for the early detection of elevated breast cancer risk. However, before this is done, an in vitro validation of mutation pathogenicity is needed and, moreover, the presence of these mutations should be demonstrated in a larger number of patients or in families with a history of breast cancer.
... They experimented with two different approaches, sequence similarity and structural similarity, to retrieve homologous templates from their ProtCom database in order to identify true interacting pairs of proteins. They used the sequence search tool PSI-BLAST and the structural alignment program SKA (Yang and Honig, 2000) to search for templates, and they used the NEST program (Petrey et al., 2003) to model the 3D structure of a protein/domain. They considered a pair of query sequences to interact if there was at least one database template involving both query sequences. ...
... In general, sequence-based methods tend to give larger numbers of interface types than structure-based methods. GWIDD (Kundrotas et al., 2010) is a database of experimentally-determined 3D structures of protein-protein complexes as well as 3D models obtained by homology modelling using the NEST program (Petrey et al., 2003). For every PPI in the BIND and DIP databases, if there is no corresponding 3D structure in the PDB, GWIDD searches for a homologous PDB complex from which it builds by homology a 3D model of the complex. ...
Thesis
Understanding how the protein interactome works at a structural level could provide useful insights into the mechanisms of diseases. Comparative homology modelling and ab initio protein docking are two computational methods for modelling the three-dimensional (3D) structures of protein-protein interactions (PPIs). Previous studies have shown that both methods give significantly better predictions when they incorporate experimental PPI information. However, PPI information is often not available in an easily accessible form and cannot be re-used by 3D PPI modelling algorithms. Hence, there is currently a need to develop a reliable framework to facilitate the reuse of PPI data. This thesis presents a systematic knowledge-based approach for representing, describing and manipulating 3D interactions to study PPIs on a large scale and to facilitate knowledge-based modelling of protein-protein complexes. The main contributions of this thesis are: (1) it describes an integrated database of non-redundant 3D hetero domain interactions; (2) it presents a novel method of describing and clustering domain-domain interactions (DDIs) according to the spatial orientations of the binding partners, thus introducing the notion of "domain family-level binding sites" (DFBS); (3) it proposes a structural classification of DFBSs similar to the CATH classification of protein folds, and it presents a study of secondary structure propensities of DFBSs and interaction preferences; (4) it introduces a systematic case-based reasoning approach to model on a large scale the 3D structures of protein complexes from existing structural DDIs. All these contributions have been made publicly available through a web server (http://kbdock.loria.fr).
... The three-dimensional structures of pLM7p04, pUA140_p1, pUA140_p3 and pUA140_p4 proteins were also predicted by homology modeling using the program NEST, a fast model building program that applies an "artificial evolution" algorithm to construct a model from a given template and alignment [14]. The NEST option -tune 2 was used to refine the alignment, avoiding the unlikely occurrence of insertions and deletions within template secondary structure elements and the presence of "zig-zag" gaps in the alignment. ...
... The results obtained by submitting the structural model of pUA140_p1 to the Dali server are reported in Table 2, and these show that the most significant structural similarity (Z-score = 13.5, rmsd = 3.0 Å, lali = 208, nres = 280 and %id = 13) is detected with the dimeric enzyme phosphopantothenoylcysteine synthetase (PDB code: 1P9O). In order to find out what the likely function of pUA140_p1 could be, the structure of the enzyme phosphopantothenoylcysteine synthetase was used as a template to build the structural model of pUA140_p1 using NEST [14]. Although the resulting model appears more structured than that predicted by Rosetta, it is not possible to identify a protein region that shows features of a site responsible for binding of metal ions, an essential characteristic for the catalytic activity of rep proteins. ...
Article
The Gram-positive bacterium Streptococcus mutans is the principal causative agent of human tooth decay, an oral disease that affects the majority of the world’s population. Although the complete S. mutans genome is known, approximately 700 proteins are still annotated as hypothetical proteins, as no three-dimensional structure or homology with known proteins exists for them. Thus, the significant portion of genomic sequences coding for unknown-function proteins makes the knowledge of pathogenicity and survival mechanisms of S. mutans still incomplete. Plasmids are found in virtually every species of Streptococcus, and some of these mediate resistance to antibiotics and pathogenesis. However, there are strains of S. mutans that contain plasmids, such as LM7 and UA140, to which no function has been assigned yet. In this work, we describe an in silico study of the structure and function of all the S. mutans proteins encoded by pLM7 and pUA140 plasmids to gain insight into their biological function. A combination of different structural bioinformatics methodologies led to the identification of plasmidic proteins potentially required for the bacterial survival and pathogenicity. The structural information obtained on these proteins can be used to select novel targets for the design of innovative therapeutic agents towards S. mutans.
... Interactive surfaces were obtained from PISA (www.ebi.ac.uk/pdbe/pisa/). Homology models for identified antibody sequences were generated using NEST (Petrey et al., 2003) or SWISS-MODEL (Arnold et al., 2006). Structure figures were prepared using PyMOL (The PyMOL Molecular Graphics System, DeLano Scientific). ...
... Homology models for identified antibody sequences were generated using NEST (Petrey et al., 2003) or SWISS-MODEL (Arnold et al., 2006) using the corresponding VRC 310 antibody crystal structure. ...
Article
Antibodies capable of neutralizing divergent influenza A viruses could form the basis of a universal vaccine. Here, from subjects enrolled in an H5N1 DNA/MIV-prime-boost influenza vaccine trial, we sorted hemagglutinin cross-reactive memory B cells and identified three antibody classes, each capable of neutralizing diverse subtypes of group 1 and group 2 influenza A viruses. Co-crystal structures with hemagglutinin revealed that each class utilized characteristic germline genes and convergent sequence motifs to recognize overlapping epitopes in the hemagglutinin stem. All six analyzed subjects had sequences from at least one multidonor class, and—in half the subjects—multidonor-class sequences were recovered from >40% of cross-reactive B cells. By contrast, these multidonor-class sequences were rare in published antibody datasets. Vaccination with a divergent hemagglutinin can thus increase the frequency of B cells encoding broad influenza A-neutralizing antibodies. We propose the sequence signature-quantified prevalence of these B cells as a metric to guide universal influenza A immunization strategies.
... Smith-Waterman dynamic programming is used to align the target sequence to the main-chain paths. Finally, all-atom structure models are based on the top-10 C-alpha models using the ctrip program (Xiang and Honig, 2001;Petrey et al., 2003) and refined by energy minimization using Amber (Case et al., 2005). Like CR-I-TASSER, DeepMM can only handle building single protein chains and segmentation tools, such as Segger (Pintilie et al., 2010) are needed to preprocess multichain density maps into individual chain map segments. ...
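For readers unfamiliar with the Smith-Waterman step mentioned in this excerpt, the following is a minimal, generic sketch of the local-alignment recurrence on two strings. It is not the DeepMM implementation: the match, mismatch, and linear gap scores are illustrative assumptions, and DeepMM aligns a sequence to predicted main-chain paths with its own scoring, which is not reproduced here.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    H[i][j] holds the best score of any local alignment ending at
    a[i-1] and b[j-1]; a cell is reset to 0 when every extension is negative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The shared core "ACGT" is found regardless of the flanking residues.
    print(smith_waterman_score("XXACGTYY", "ACGT"))
```

Recovering the alignment itself, rather than only its score, would require a traceback from the highest-scoring cell, which is omitted to keep the sketch short.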
Article
Ion channels are expressed in almost all living cells, where they control communication into and out of the cell, making them ideal drug targets, especially for central nervous system diseases. However, owing to their dynamic nature and the presence of a membrane environment, ion channels have remained difficult targets over the past decades. Recent advances in cryo-electron microscopy and computational methods have shed light on this issue. An explosion in high-resolution ion channel structures has paved the way for structure-based rational drug design, and state-of-the-art simulation and machine learning techniques have dramatically improved the efficiency and effectiveness of computer-aided drug design. Here we present an overview of how simulation and machine learning-based methods have fundamentally changed ion channel-related drug design at different levels, as well as the emerging trends in the field.
... The resulting multiple Cα models are ranked by their alignment scores. Finally, the all-atom structures are constructed from the top Cα models using the ctrip program in the Jackal modeling package (Petrey et al., 2003; Xiang and Honig, 2001) and refined by energy minimization using Amber (Case et al., 2005). ...
Article
Motivation: Advances in microscopy instruments and image processing algorithms have led to an increasing number of cryo-EM maps. However, building accurate models for the EM maps at 3-5 Å resolution remains a challenging and time-consuming process. With the rapid growth of deposited EM maps, there is an increasing gap between the maps and reconstructed/modeled 3-dimensional (3D) structures. Therefore, automatic reconstruction of atomic-accuracy full-atom structures from EM maps is pressingly needed. Results: We present a semi-automatic de novo structure determination method using a deep learning-based framework, named DeepMM, which builds atomic-accuracy all-atom models from cryo-EM maps at near-atomic resolution. In our method, the main-chain and Cα positions as well as their amino acid and secondary structure types are predicted in the EM map using Densely Connected Convolutional Networks. DeepMM was extensively validated on 40 simulated maps at 5 Å resolution and 30 experimental maps at 2.6-4.8 Å resolution as well as an EMDB-wide data set of 2931 experimental maps at 2.6-4.9 Å resolution, and compared with state-of-the-art algorithms including RosettaES, MAINMAST, and Phenix. Overall, our DeepMM algorithm obtained a significant improvement over existing methods in terms of both accuracy and coverage in building full-length protein structures on all test sets, demonstrating the efficacy and general applicability of DeepMM. Availability: http://huanglab.phys.hust.edu.cn/DeepMM. Supplementary information: Supplementary data are available at Bioinformatics online.
... The homology model was built with Nest [30]. Finally, the water molecules and the bound ligand were removed from each CYP1 3D structure, and protons were added using Reduce 3.24 [31]. Compounds 13−26 were docked to the prepared CYP1 3D structures using Plants 1.2 [32] and the ChemPLP [33] scoring function. ...
Article
Of the three enzymes in the human cytochrome P450 family 1, CYP1A2 is an important enzyme mediating metabolism of xenobiotics including drugs in the liver, while CYP1A1 and CYP1B1 are expressed in extrahepatic tissues. Currently used CYP substrates, such as 7-ethoxycoumarin and 7-ethoxyresorufin, are oxidized by all individual CYP1 forms. The main aim of this study was to find profluorescent coumarin substrates that are more selective for the individual CYP1 forms. Eleven 3-phenylcoumarin derivatives were synthesized, their enzyme kinetic parameters were determined, and their interactions in the active sites of CYP1 enzymes were analyzed by docking and molecular dynamics simulations. All coumarin derivatives and 7-ethoxyresorufin and 7-pentoxyresorufin were oxidized by at least one CYP1 enzyme. 3-(3-Methoxyphenyl)-6-methoxycoumarin (19) was 7-O-demethylated with similarly high efficiency [21–30 ML/(min·mol CYP)] by all CYP1 forms and displayed similar binding in the enzyme active sites. 3-(3-Fluoro-4-acetoxyphenyl)coumarin (14) was selectively 7-O-demethylated by CYP1A1, but with low efficiency [0.16 ML/(min mol)]. This was explained by better orientation and stronger H-bond interactions in the active site of CYP1A1 than in those of CYP1A2 and CYP1B1. 3-(4-Acetoxyphenyl)-6-chlorocoumarin (20) was 7-O-demethylated most efficiently by CYP1B1 [53 ML/(min·mol CYP)], followed by CYP1A1 [16 ML/(min·mol CYP)] and CYP1A2 [0.6 ML/(min·mol CYP)]. Variations in stabilities of complexes between 20 and the individual CYP enzymes explained these differences. Compounds 14, 19, and 20 are candidates to replace traditional substrates in measuring activity of human CYP1 enzymes.
... Thus, homology modeling of A. niger lipases is a good alternative to its crystal structure. However, a reliable structural model using homology modeling can be obtained only when the amino acid sequence to be modeled is more than 30% similar to the template sequence (Petrey et al., 2003). ...
Article
In this study, an sn-1,3 extracellular lipase from Aspergillus niger GZUF36 (PEXANL1) was expressed in Pichia pastoris and characterized, and the predicted structural model was analyzed. The optimized culture conditions of P. pastoris showed that the highest lipase activity of 66.5 ± 1.4 U/mL (P < 0.05) could be attained with 1% methanol and 96 h induction time. The purified PEXANL1 exhibited the highest activity at pH 4.0 and 40°C, and its original activity remained unaltered in the majority of the organic solvents (20% v/v concentration). Triton X-100, Tween 20, Tween 80, and SDS at a concentration of 0.01% (w/v) enhanced the activity of purified PEXANL1, whereas all the metal ions tested inhibited it. The results of ultrasound-assisted, PEXANL1-catalyzed synthesis of 1,3-diglycerides showed that the content of 1,3-diglycerides rapidly increased to 36.90% with 25 min of ultrasound duration (P < 0.05) and later decreased to 19.93% with 35 min of ultrasound duration. The structure of PEXANL1 modeled by comparative modeling showed an α/β hydrolase fold. Structural superposition and molecular docking results validated that the Ser162, His274, and Asp217 residues of PEXANL1 were involved in the catalysis. Small-angle X-ray scattering analysis indicated that PEXANL1 is monomeric in solution. The ab initio model of PEXANL1 overlapped with its modeled structure. This work presents a reliable structural model of A. niger lipase based on homology modeling and small-angle X-ray scattering. In addition, the data from this study will benefit the rational design of suitable crystalline lipase variants in the future.
... Homology modeling can give spatial structures with the highest accuracy (Werner et al., 2012) and thus has been widely applied for rational analysis of interactions between a small organic molecule (ligand) and its target protein during docking and virtual screening for drug discovery (Cheng et al., 2012). Homology models can be built by four methods: rigid-body assembly [by tools like SWISS-MODEL (Arnold et al., 2006)], segment matching [by tools like SEGMOD/ENCAD (Levitt, 1992)], spatial restraint [by tools like MODELER (Sali and Blundell, 1993)], and artificial evolution [by tools like NEST (Petrey et al., 2003)]. ...
Article
Owing to its high mortality and rate of spread, the infectious disease caused by SARS-CoV-2 has become a major threat to public health and the economy, leading to over 70 million infections and 1.6 million deaths to date. Since there are currently no effective therapeutics or widely available vaccines, there is an urgent need for new strategies to treat SARS-CoV-2 infection. Binding of a viral protein to cell surface heparan sulfate (HS) is generally the first step in a cascade of interactions that is required for viral entry and the initiation of infection. Meanwhile, interactions of selectins and cytokines (e.g., IL-6 and TNF-α) with HS expressed on endothelial cells are crucial in controlling the recruitment of immune cells during inflammation. Thus, structurally defined heparin/HS and their mimetics might serve as potential drugs by competing with cell surface HS to prevent viral adhesion and modulate the inflammatory reaction. In this review, we elaborate on coronavirus invasion mechanisms and summarize the latest advances in HS–protein interactions, especially for proteins relevant to the process of coronavirus infection and subsequent inflammation. The experimental and computational techniques involved are emphasized.
... Most of the templates for invertebrate-infecting viruses came from vertebrate and bacterial proteins, likely reflecting the limited structural coverage of non-vertebrate viruses and invertebrate hosts in the PDB. In order to measure structure similarity between protein structures we utilized Ska, an extensively utilized and validated tool for inference of structure-based functional relationships even in the absence of detectable sequence similarity (Hwang et al., 2017; Lasso et al., 2019; Zhang et al., 2012; Petrey et al., 2003, 2009; Yang and Honig, 2000). In addition to Ska, we employed a conservative global structural similarity criterion (structural alignment score, SAS < 2.5 Å) (Budowski-Tal et al., 2010; Kolodny et al., 2005; Subbiah et al., 1993) to infer structural mimics and minimize biases imposed by local structural similarities (see STAR Methods). ...
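The SAS cutoff in this excerpt can be illustrated with a short sketch. It assumes the commonly cited definition of the structural alignment score, SAS = 100 × RMSD / (number of aligned residues). The excerpt itself does not spell out the formula, so treat that definition and the function names below as assumptions rather than the authors' exact procedure.

```python
def structural_alignment_score(rmsd: float, n_aligned: int) -> float:
    """SAS in Å, assuming SAS = 100 * RMSD / N_aligned.

    Lower is better: the score rewards low RMSD over many aligned residues.
    """
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return 100.0 * rmsd / n_aligned


def passes_sas_cutoff(rmsd: float, n_aligned: int, cutoff: float = 2.5) -> bool:
    """Apply the SAS < 2.5 Å criterion quoted in the excerpt."""
    return structural_alignment_score(rmsd, n_aligned) < cutoff


if __name__ == "__main__":
    # 2.0 Å RMSD over 100 aligned residues passes the cutoff;
    # the same RMSD over only 50 residues does not.
    print(passes_sas_cutoff(2.0, 100), passes_sas_cutoff(2.0, 50))
```

Normalizing by alignment length is what makes the criterion global: a short, locally similar fragment with a low RMSD still scores poorly.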
Article
Viruses deploy genetically encoded strategies to coopt host machinery and support viral replicative cycles. Here, we use protein structure similarity to scan for molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, across thousands of cataloged viruses and hosts spanning broad ecological niches and taxonomic range, including bacteria, plants and fungi, invertebrates, and vertebrates. This survey identified over 6,000,000 instances of structural mimicry; more than 70% of viral mimics cannot be discerned through protein sequence alone. We demonstrate that the manner and degree to which viruses exploit molecular mimicry varies by genome size and nucleic acid type and identify 158 human proteins that are mimicked by coronaviruses, providing clues about cellular processes driving pathogenesis. Our observations point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome. A record of this paper’s transparent peer review process is included in the Supplemental Information.
... Homology modeling is a method for prediction of 3D structures of target AA sequences based on the known structures of templates with homologous sequences [33,55]. Examples of software programs for homology modeling include Modeller [64], Nest [37], and SegMod/ENCAD [27]. First, after searching for template structures in the PDB, homology modeling programs can designate parts of known, but related 3D structures as templates for sequence alignments with the target sequences. ...
Article
The use of in silico strategies to develop the structural basis for a rational optimization of glycan-protein interactions remains a great challenge. This problem derives, in part, from the lack of technologies to quantitatively and qualitatively assess the complex assembling between a glycan and the targeted protein molecule. Since there is an unmet need for developing new sugar-targeted therapeutics, many investigators are searching for technology platforms to elucidate various types of molecular interactions within glycan-protein complexes and aid in the development of glycan-targeted therapies. Here we discuss three important technology platforms commonly used in the assessment of the complex assembly of glycosylated biomolecules, such as glycoproteins or glycosphingolipids: Biacore analysis, molecular docking, and molecular dynamics simulations. We will also discuss the structural investigation of glycosylated biomolecules, including conformational changes of glycans and their impact on molecular interactions within the glycan-protein complex. For glycoproteins, secreted protein acidic and rich in cysteine (SPARC), which is associated with various lung disorders, such as chronic obstructive pulmonary disease (COPD) and lung cancer, will be taken as an example showing that the core fucosylation of N-glycan in SPARC regulates protein-binding affinity with extracellular matrix collagen. For glycosphingolipids (GSLs), Globo H ceramide, an important tumor-associated GSL which is being actively investigated as a target for new cancer immunotherapies, will be used to demonstrate how glycan structure plays a significant role in enhancing angiogenesis in tumor microenvironments.
... The final step of the DeepMM workflow is to construct and refine all-atom structures from predicted DeepMM Cα models [36-38]. To investigate the impact of this modeling process, Figure 4d gives the RMSDs of DeepMM Cα models and final all-atom structures. ...
Preprint
Motivation and Results: Advances in microscopy instruments and image processing algorithms have led to an increasing number of cryo-EM maps. However, building accurate models for the EM maps at 3-5 Å resolution remains challenging and time-consuming. Here, we present a fully automatic de novo structure determination method using a deep learning-based framework, named DeepMM, which automatically builds atomic-accuracy all-atom models from cryo-EM maps at near-atomic resolution. In our method, the main-chain and Cα positions as well as their amino acid and secondary structure types are predicted in the EM map using Densely Connected Convolutional Networks. DeepMM was extensively validated on 40 simulated maps at 5 Å resolution and 30 experimental maps at 2.6-4.8 Å resolution as well as an EMDB-wide data set of 2931 experimental maps at 2.6-4.9 Å resolution. DeepMM built correct models for >60% of the cases, and it outperformed existing state-of-the-art algorithms including RosettaES, MAINMAST, and Phenix. Availability: http://huanglab.phys.hust.edu.cn/DeepMM/
... Its structure as part of the 60S ribosomal subunit has been well established (Anger et al. 2013). As the amino acid positions of all six missense variants that we identified in RPL3L were conserved in both paralogs, we constructed a homology-based structural model of RPL3L using the structure of RPL3 in the human ribosome to gain further insights into the pathogenic effects of the identified RPL3L missense variants (Altschul et al. 1997; Petrey et al. 2003). Three mutated residues were located in regions directly contributing to RNA binding. ...
Article
Dilated cardiomyopathy (DCM) belongs to the most frequent forms of cardiomyopathy mainly characterized by cardiac dilatation and reduced systolic function. Although most cases of DCM are classified as sporadic, 20–30% of cases show a heritable pattern. Familial forms of DCM are genetically heterogeneous, and mutations in several genes have been identified that most commonly play a role in cytoskeleton and sarcomere-associated processes. Still, a large number of familial cases remain unsolved. Here, we report five individuals from three independent families who presented with severe dilated cardiomyopathy during the neonatal period. Using whole-exome sequencing (WES), we identified causative, compound heterozygous missense variants in RPL3L (ribosomal protein L3-like) in all the affected individuals. The identified variants co-segregated with the disease in each of the three families and were absent or very rare in the human population, in line with an autosomal recessive inheritance pattern. They are located within the conserved RPL3 domain of the protein and were classified as deleterious by several in silico prediction software applications. RPL3L is one of the four non-canonical riboprotein genes and it encodes the 60S ribosomal protein L3-like protein that is highly expressed only in cardiac and skeletal muscle. Three-dimensional homology modeling and in silico analysis of the affected residues in RPL3L indicate that the identified changes specifically alter the interaction of RPL3L with the RNA components of the 60S ribosomal subunit and thus destabilize its binding to the 60S subunit. In conclusion, we report that bi-allelic pathogenic variants in RPL3L are causative of an early-onset, severe neonatal form of dilated cardiomyopathy, and we show for the first time that cytoplasmic ribosomal proteins are involved in the pathogenesis of non-syndromic cardiomyopathies.
... It is worth mentioning that the accuracy of template-based modelling increases when more than one template is utilized to construct a protein 3D structure, as reported by Venclovas et al. [24] and Sanchez et al. [25], and then each template is evaluated according to a scoring function such as the energy function [26]. The resulting model predictions outperform models that were based on the single best template [27]. When several templates are utilized to model the protein, they generally are superposed with each other and, later on, the multiple template-based alignment is utilized [28,29]. ...
Article
We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive-regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set.
... Computational structure prediction has been widely used as a tool to propose hypothetical models from algorithms generated for different softwares that are based on considering forces applied to the system from the primary sequences, spatial distribution and comparison with experimentally isolated structures of which their conservation with their evolutionary phase is known [28], [29]. ...
Article
Full-text available
American trypanosomiasis, commonly known as Chagas disease, is a disease with the highest prevalence in the tropics and is caused by the parasite Trypanosoma cruzi, whose vector is an insect from the Rhodnius prolixus family. The pathology of this disease is characterized by the presence of cardiopathies and gastrointestinal problems in patients during chronic phases. A structural model of the orthosteric site that explains its function and plausible reaction mechanism is important for understanding the design of molecular targets or possible resistance generated in chronic phases of the disease. For this purpose, structural biology offers tools such as homology modelling and structural assembly by sequence-based fold recognition to construct a model. The proposed models, obtained by comparison with reported structures, are validated with energetic and stereochemical software that produces quantitative data characterizing the structural models. This validation allows two predictive and structural refinement methods to be compared in order to identify the best methodology of elucidation.
... The quality of the resulting model depends on the degree of homology with the structure used as template [122]. Different methods can be used for model building [123], like rigid-body assembly (implemented in SwissModel web server [124]), segment matching [125], spatial restraint (implemented in Modeller software [126]) or artificial evolution (implemented in Nest program [127]). In addition, databases of homology models were developed, such as ModBase [128] that currently comprises over 6 million unique sequences modeled and over 37 million models. ...
Article
Full-text available
Background: Alzheimer's disease (AD) is considered a severe, irreversible and progressive neurodegenerative disorder. Currently, the pharmacological management of AD is based on a few clinically approved acetylcholinesterase (AChE) and N-methyl-D-aspartate (NMDA) receptor ligands, with unclear molecular mechanisms and severe side effects. Methods: Here, we reviewed the most recent bioinformatics, cheminformatics (SAR, drug design, molecular docking, friendly databases, ADME-Tox) and experimental data on relevant structure-biological activity relationships and molecular mechanisms of some natural and synthetic compounds with possible anti-AD effects (inhibitors of AChE, NMDA receptors, beta-secretase, amyloid beta (Aβ), redox metals) or acting on multiple AD targets at once. We considered: (i) in silico studies supported by experimental studies regarding the pharmacological potential of natural compounds such as resveratrol, natural alkaloids, flavonoids isolated from various plants and donepezil, galantamine, rivastigmine and memantine derivatives, (ii) the most important pharmacokinetic descriptors of natural compounds in comparison with donepezil, memantine and galantamine. Results: In silico and experimental methods applied to synthetic compounds led to the identification of new AChE inhibitors, NMDA antagonists, multipotent hybrids targeting different AD processes and metal-organic compounds acting as Aβ inhibitors. Natural compounds appear as multipotent agents, acting on several AD pathways: cholinesterases, NMDA receptors, secretases or Aβ, but their efficiency in vivo and their correct dosage are yet to be determined. Conclusion: Bioinformatics, cheminformatics and ADME-Tox methods can be very helpful in the quest for an effective anti-AD treatment, allowing the identification of novel drugs, enhancing the druggability of molecular targets and providing a deeper understanding of AD pathological mechanisms.
... The homology model of FLNA6 was built in two steps: 1) the sequence alignment of FLNA6 and template structure of FLNA5 (Protein Data Bank (PDB): 4M9P) (14) were constructed using the MALIGN tool in Bodil software (25) by employing a structure-based matrix (26) with a gap penalty of 40, and 2) the sequence-alignment-based model of FLNA6 was built using the nest tool in Jackal software (Honig Lab, New York, NY) (27). ...
Article
Mitral valve diseases affect ∼3% of the population and are the most common reasons for valvular surgery because no drug-based treatments exist. Inheritable genetic mutations have now been established as the cause of mitral valve insufficiency, and four different missense mutations in the filamin A gene (FLNA) have been found in patients suffering from nonsyndromic mitral valve dysplasia (MVD). The filamin A (FLNA) protein is expressed, in particular, in endocardial endothelia during fetal valve morphogenesis and is key in cardiac development. The FLNA-MVD-causing mutations are clustered in the N-terminal region of FLNA. How the mutations in FLNA modify its structure and function has mostly remained elusive. In this study, using NMR spectroscopy and interaction assays, we investigated FLNA-MVD-causing V711D and H743P mutations. Our results clearly indicated that both mutations almost completely destroyed the folding of the FLNA5 domain, where the mutation is located, and also affect the folding of the neighboring FLNA4 domain. The structure of the neighboring FLNA6 domain was not affected by the mutations. These mutations also completely abolish FLNA's interactions with protein tyrosine phosphatase nonreceptor type 12, which has been suggested to contribute to the pathogenesis of FLNA-MVD. Taken together, our results provide an essential structural and molecular framework for understanding the molecular bases of FLNA-MVD, which is crucial for the development of new therapies to replace surgery.
... Besides MODELLER, there are other comparative modelling programs available such as nest (Petrey et al., 2003), 3D-JIGSAW (Bates et al., 2001), Builder (Koehl and Delarue, 1995), SWISS-MODEL (Kopp and Schwede, 2004), and SegMod/ENCAD (Levitt, 1983, 1992). A comparison was done between all these methods by Wallner and Elofsson (2005). ...
Thesis
Functional families (FunFams) are a sub-classification of CATH protein domain superfamilies that cluster relatives likely to have very similar structures and functions. The functional purity of FunFams has been demonstrated by comparing against experimentally determined Enzyme Commission annotations and by checking whether known functional sites coincide with highly conserved residues in the multiple sequence alignments of FunFams. We hypothesised that clustering relatives into FunFams may help in protein structure modelling. In the first work chapter, we demonstrate the structural coherence of domains in FunFams. We then explore the usage of FunFams in protein monomer modelling. The FunFam based protocol produced higher percentages of good models compared to an HHsearch (the state-of-the-art HMM based sequence search tool) based protocol for both close and remote homologs. We developed a modelling pipeline that utilises the FunFam protocol and is able to model up to 70% of domain sequences from human and fly genomes. In the second work chapter, we explore the usage of FunFams in protein complex modelling. Our analysis demonstrated that domain-domain interfaces in FunFams tend to be conserved. The FunFam based complex modelling protocol produced significantly more good quality models when compared to a BLAST based protocol and slightly better than a HHsearch based protocol. In the final work chapter, we employ the FunFam based structural modelling tool to understand the implications of alternative splicing. We focused on isoforms derived from mutually exclusive exons (MXEs), for which there is enrichment in proteomics data. MXEs which could be mapped to structure show a significant tendency to be exposed to the solvent, are likely to exhibit a significant change in their physicochemical properties and to lie close to known/predicted functional sites. Our results suggest that MXE events may have a number of important roles in cells generally.
... NEST builds a model using an artificial evolution algorithm in which changes from the template structure, such as substitutions, insertions, and deletions, are made one at a time, and each mutation is followed by energy minimization. This process is repeated until the whole query is modeled (Petrey et al., 2003). Modeller is one of the most popular tools. ...
Article
Full-text available
The SP3 transcription factor has a mass of 81,925 Da and is a member of the Kruppel-like zinc finger protein family that is clinically relevant for many neuronal transmission diseases. Considering the functional importance and lack of an X-ray crystal structure of the SP3 TF protein, the present work was undertaken to build the 3D structure of the protein using homology modeling with a multi-template approach. In this study, three different SP3 templates (PDB ID: 3EBT, 4M9E, and 2WBS) were used for homology modeling. Five models were developed with the help of multiple sequence alignment with respect to the templates using Modeller 8.0.0 software. All models were refined and ranked as per their overall DOPE score. The top-ranked predicted model of the SP3 TF had 93.8% of residues in favored regions as revealed by the Ramachandran plot, and the ERRAT score was 100%, which indicated an accurate model. The results of the homology modeling study and the proposed model can be further used for understanding the structural and functional characteristics of SP3 and to gain more insight into the molecular basis of SP3 inhibition through docking and molecular dynamics simulation studies. Index Terms: Specificity Protein 3 (SP3), Multi-template Homology Modeling, Modeller.
... The gap between known protein sequences and identified protein structures is significantly growing. Given an enormous amount of data through a vast array of DNA sequencing techniques available, experimental structure identification techniques require attention (180). Computational techniques are actively exploited in the pharmaceutical industry for the prediction of 3D ...
Article
Ebola virus disease (EVD), caused by Ebola viruses, resulted in more than 11500 deaths according to a recent 2018 WHO report. With mortality rates up to 90 %, it is nowadays one of the most deadly infectious diseases. However, no FDA approved Ebola drugs or vaccines are available yet with the mainstay of therapy being supportive care. The high fatality rate and absence of effective treatment or vaccination makes Ebola virus a category A biothreat pathogen. Fortunately, a series of investigational countermeasures have been developed to control and prevent this global threat. This review summarizes the recent therapeutic advances and ongoing research progress from R&D to clinical trials in the development of small‐molecule antiviral drugs, small interference RNA molecules, phosphorodiamidate morpholino oligomers, full‐length monoclonal antibodies and vaccines. Moreover, difficulties are highlighted in the search for effective countermeasures against EVD with additional focus on the interplay between available in silico prediction methods and their evidenced potential in antiviral drug discovery.
... At a lower level, homology can be seen as either a four-phase (Martí-Renom et al., 2000) or a five-phase procedure (Floudas, 2007). Schwede, Kopp, Guex, & Peitsch, 2003), MODELLER (Sali & Blundell, 1993;Benjamin Webb & Sali, 2016), NEST (Petrey et al., 2003), OPLS (Jacobson et al., 2004), SABERTOOTH (Teichert, Minning, Bastolla, & Porto, 2010), and FUGUE (J. Shi, Blundell, & Mizuguchi, 2001). ...
Thesis
Full-text available
Proteins play critical biochemical roles in all living organisms; in human beings, they are the targets of 50% of all drugs. Although the first protein structure was determined 60 years ago, experimental techniques are still time and cost consuming. Consequently, in silico protein structure prediction, which is considered a main challenge in computational biology, is fundamental to decipher conformations of protein targets. This thesis contributes to the state of the art of fragment-assembly protein structure prediction. This category has been widely and thoroughly studied due to its application to any type of targets. While the majority of research focuses on enhancing the functions that are used to score fragments by incorporating new terms and optimising their weights, another important issue is how to pick appropriate fragments from a large pool of candidate structures. Since prediction of the main structural classes, i.e. mainly-alpha, mainly-beta and alpha-beta, has recently reached quite a high level of accuracy, we have introduced a novel approach by decreasing the size of the pool of candidate structures to comprise only proteins that share the same structural class a target is likely to adopt. Picking fragments from this customised set of known structures not only has contributed in generating decoys with higher level of accuracy but also has eliminated irrelevant parts of the search space which makes the selection of first models a less complicated process, addressing the inaccuracies of energy functions. In addition to the challenge of adopting a unique template structure for all targets, another one arises whenever relying on the same amount of corrections and fine tunings; such a phase may be damaging to “easy” targets, i.e. those that comprise a relatively significant percentage of alpha helices. 
Owing to the sequence-structure correlation based on which fragment-based protein structure prediction was born, we have also proposed a customised phase of correction based on the structural class prediction of the target in question. After using secondary structure prediction as a “global feature” of a target, i.e. structural classes, we have also investigated its usage as a “local feature” to customise the number of candidate fragments, which is currently the same at all positions. Relying on the known facts regarding diversity of short fragments of helices, sheets and loops, the fragment insertion process has been adjusted to make “changes” relative to the expected complexity of each region. We have proved in this thesis the extent to which secondary structure features can be used implicitly or explicitly to enhance fragment assembly protein structure prediction.
... In some cases, even though there is a publicly available structural alignment server, it is not fast enough for navigating structure space; for these, one may prefer to pre-calculate all-against-all comparisons (e.g., using the parallel power of a computer cluster). We list just a few examples of comparison methods that were used in a similar context: HHSearch [30], Matt [74], CE [75], Mammoth [76], 3D-BLAST [77], FragBag [78], TM-align [71], SSM [68], GRASP [79], and STRUCTAL [80]. Third, the structural alignment servers do not offer a global perspective of structure space, only a local one, and one may be interested in this global perspective. ...
Chapter
Full-text available
Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, or focusing on structural similarity, are likely attributable to physicochemical constraints, whereas sequence reuse, or focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights to protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design.
... BLAST was used for template search and alignment, while NEST (http://honig.c2b2.columbia.edu/nest/) was used to model the structure of the BOMV GP1,2 [40]. A non-redundant set of sequences was assembled, corresponding to the proteins in the NCBI PDB, using a sequence identity cut-off of 1.0 with CD-HIT [43] (http://weizhongli-lab.org/cd-hit/). ...
Article
Full-text available
Here we describe the complete genome of a new ebolavirus, Bombali virus (BOMV) detected in free-tailed bats in Sierra Leone (little free-tailed (Chaerephon pumilus) and Angolan free-tailed (Mops condylurus)). The bats were found roosting inside houses, indicating the potential for human transmission. We show that the viral glycoprotein can mediate entry into human cells. However, further studies are required to investigate whether exposure has actually occurred or if BOMV is pathogenic in humans.
... We used the following 3D structures in our analysis and visualization of Gα-RGS complexes (with PDB codes for each structure): Gαi1-RGS4 (1AGR) [11], Gαi1-RGS16 (2IK8) [14], Gαi1-RGS1 (2GTP) [14], Gαq-RGS2 (4EKD) [16], and Gαq-RGS8 (5DO9) [17]. Missing short segments in PDB entry 2IK8 (Gαi1 residues 112-118) were modeled on the basis of the Gαi1-RGS4 structure (PDB 1AGR) using the program Nest [56], with partial or missing side chains being modeled using Scap [57]. 3D structural visualization and superimposition were carried out with PyMol (http://pymol.org). ...
Article
Regulators of G protein Signaling (RGS) proteins inactivate Gα subunits, thereby controlling G protein-coupled signaling networks. Among all RGS proteins, RGS2 is unique in interacting only with the Gαq and not with the Gαi sub-family. Previous studies suggested that this specificity is determined by the RGS domain, and in particular by three RGS2-specific residues that lead to a unique mode of interaction with Gαq. This interaction was further proposed to act through contacts with the Gα GTPase domain. Here, we combined energy calculations and GTPase activity measurements to determine which Gα residues dictate specificity toward RGS2. We identified putative specificity-determining residues in the Gα helical domain, which among G proteins is found only in Gα subunits. Replacing these helical domain residues in Gαi with their Gαq counterparts resulted in a dramatic specificity-switch towards RGS2. We further show that Gα-RGS2 specificity is set by Gαi residues that perturb interactions with RGS2, and by Gαq residues that enhance these interactions. These results show, for the first time, that the Gα helical domain is central to dictating specificity towards RGS2, suggesting this domain plays a general role in governing Gα-RGS specificity. Our insights provide new options for manipulating RGS-G protein interactions in vivo, for better understanding of their "wiring" into signaling networks, and for devising novel drugs targeting such interactions.
... We used individual and joined templates based on both crystal structures at the same time to build H3R models. The programs MODELLER 9.14 [81] and Jackal-nest [82] as well as modeling services Swiss-Model [83] and I-TASSER [84] were applied for homology modeling. Modeling parameters are summarized in S1 Table in the supplementary data. ...
Article
Full-text available
The crucial role of G-protein coupled receptors and the significant achievements associated with a better understanding of the spatial structure of known receptors in this family encouraged us to undertake a study on the histamine H3 receptor, whose crystal structure is still unresolved. The latest literature data and availability of different software enabled us to build homology models of higher accuracy than previously published ones. The new models are expected to be closer to crystal structures; and therefore, they are much more helpful in the design of potential ligands. In this article, we describe the generation of homology models with the use of diverse tools and a hybrid assessment. Our study incorporates a hybrid assessment connecting knowledge-based scoring algorithms with a two-step ligand-based docking procedure. Knowledge-based scoring employs probability theory for global energy minimum determination based on information about native amino acid conformation from a dataset of experimentally determined protein structures. For a two-step docking procedure two programs were applied: GOLD was used in the first step and Glide in the second. Hybrid approaches offer advantages by combining various theoretical methods in one modeling algorithm. The biggest advantage of hybrid methods is their intrinsic ability to self-update and self-refine when additional structural data are acquired. Moreover, the diversity of computational methods and structural data used in hybrid approaches for structure prediction limits inaccuracies resulting from theoretical approximations or fuzziness of experimental data. The results of docking to the new H3 receptor model allowed us to analyze ligand-receptor interactions for reference compounds.
... Models were generated using the PDB file 1ZAA [33] for a template with the crystallographic water molecules and counter-ions removed. The Jackal [34] program was used to model the protein, while the DNA component was mutated using Chimera [35] to produce the requisite sequence. The DNA was elongated by 4 bp at each end using X3DNA [36], such that any end effects (termini melting) would not affect the protein-bound nucleotides. ...
Article
Full-text available
Background The C2H2 zinc finger (C2H2-ZF) is the most numerous protein domain in many metazoans, but is not as frequent or diverse in other eukaryotes. The biochemical and evolutionary mechanisms that underlie the diversity of this DNA-binding domain exclusively in metazoans are, however, mostly unknown. Results Here, we show that the C2H2-ZF expansion in metazoans is facilitated by contribution of non-base-contacting residues to DNA binding energy, allowing base-contacting specificity residues to mutate without catastrophic loss of DNA binding. In contrast, C2H2-ZF DNA binding in fungi, plants, and other lineages is constrained by reliance on base-contacting residues for DNA-binding functionality. Reconstructions indicate that virtually every DNA triplet was recognized by at least one C2H2-ZF domain in the common progenitor of placental mammals, but that extant C2H2-ZF domains typically bind different sequences from these ancestral domains, with changes facilitated by non-base-contacting residues. Conclusions Our results suggest that the evolution of C2H2-ZFs in metazoans was expedited by the interaction of non-base-contacting residues with the DNA backbone. We term this phenomenon “kaleidoscopic evolution,” to reflect the diversity of both binding motifs and binding motif transitions and the facilitation of their diversification. Electronic supplementary material The online version of this article (doi:10.1186/s13059-017-1287-y) contains supplementary material, which is available to authorized users.
... A more recent method is called the artificial evolution model building. It was first implemented in the NEST program [100]. In this approach, the target model building in a homology modeling process should be similar to the natural process of evolving a protein that happens in multiple steps. ...
Article
Full-text available
Resolving the three dimensional structure of a protein is a critical step in modern drug discovery today. Homology modeling is a powerful tool that can efficiently predict protein structures from their amino acid sequence. Although it might sound simple enough, homology modeling, in fact, has to pass through several sophisticated steps before it can predict an accurate structure of a protein. These steps include template identification, alignment with the template, model construction and many post-modeling processes. Here, we describe in detail these different steps, discuss the strengths and limitations of the methods and list a number of successful homology modelling applications in the literature. The objective of this review is to shed light on this extremely useful tool and highlight many case studies in this area of active research.
... Afterwards, for each family, Multinest [Petrey(2003)], a homology structure prediction tool, has been employed to predict the structure of individual enzymes with respect to the template. ...
Thesis
Full-text available
The number of sequenced genomes continues to increase; however, experimentally characterized enzymes remain patchy. Experimental characterization of enzymes, e.g. measuring thermostability, is time-consuming, and it requires procedures such as cloning, expression and purification, which make it an expensive task. Predicting the temperature at which enzymes function more effectively, from amino acid sequences, is highly valuable for any enzyme discovery project. In this thesis, a framework to predict thermostability of seven different Glycoside Hydrolase enzymes from their amino acid sequences is presented. Different structural and sequence-based features are used for training the Gaussian process regressor. A novel covariance function for Gaussian processes, specifically for protein sequences, has been introduced, which is suitable for different protein property predictions. Finally, Lasso with a stability selection approach ranked all features, and the most predictive ones are reported. The final model is evaluated through a cross-validation procedure and also by supplying an external evaluation set. Experiments show that the presented model has the potential to be used for enzyme discovery projects.
... SWISS-MODEL [20] generates a core model by averaging template backbone atom positions. NEST [21] implements an artificial evolution algorithm where changes from the template structure such as substitutions, insertions and deletions are made one at a time, and each mutation is followed by energy minimization. This process is repeated until the whole target protein is modelled. ...
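The stepwise procedure described in this excerpt can be sketched as a simple loop. This is a toy illustration under stated assumptions, not NEST's implementation: the model is reduced to a list of residue letters, and `apply_edit`, `build_model`, and the `minimize` callback are hypothetical names introduced here. A real builder would operate on 3D coordinates and use a physical energy function.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical edit record: (kind, position in the current model, residue letter).
Edit = Tuple[str, int, str]


def apply_edit(model: List[str], edit: Edit) -> List[str]:
    """Return a copy of the model with one substitution, insertion, or deletion applied."""
    kind, pos, residue = edit
    out = list(model)
    if kind == "sub":
        out[pos] = residue
    elif kind == "ins":
        out.insert(pos, residue)
    elif kind == "del":
        del out[pos]
    else:
        raise ValueError(f"unknown edit kind: {kind!r}")
    return out


def build_model(
    template: Sequence[str],
    edits: Sequence[Edit],
    minimize: Callable[[List[str]], List[str]],
) -> List[str]:
    """Apply the edits one at a time, calling the minimizer after each change."""
    model = list(template)
    for edit in edits:
        model = minimize(apply_edit(model, edit))
    return model


# Placeholder minimizer (an assumption): records the model length and returns it unchanged.
calls: List[int] = []


def identity_minimize(model: List[str]) -> List[str]:
    calls.append(len(model))
    return model


edits = [("sub", 1, "G"), ("ins", 2, "K"), ("del", 4, "")]
result = build_model("ACDE", edits, identity_minimize)
print("".join(result))
```

Positions in each edit refer to the model as it stands when that edit is applied, so the order of edits matters. The minimizer runs once per edit, mirroring the "each mutation is followed by energy minimization" step in the excerpt.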
Article
Full-text available
Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
... Thus, a sequence alignment between FucP and Fpn was obtained using the program ALIGNME [29]. Once a sequence alignment between FucP and Fpn was obtained, a model for the outward-open conformation of the latter protein was built using the homology modelling program NEST [30]. The two models are qualitatively very similar with an rmsd of only 1.95 Å over 708 superimposed backbone atoms; thus only the model obtained using I-TASSER is presented in the Results section. ...
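For readers unfamiliar with the rmsd figure quoted in this excerpt, the sketch below shows how a root-mean-square deviation over paired atoms is computed. It assumes the two coordinate sets are already superimposed and listed in matching order; the function name `rmsd` and the sample coordinates are illustrative and are not taken from the cited study.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally long, pre-superimposed atom lists."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Two atoms; the second pair differs by 2 units along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(model_1, model_2)
print(round(value, 3))
```

The function does not perform the superposition itself; an optimal rigid-body fit would have to be carried out beforehand with a structure-alignment tool.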
Article
Motivation Ferroportin (Fpn) is a membrane protein belonging to the Major Facilitator Superfamily of transporters. It is the only vertebrate iron exporter known so far. Several Fpn mutations lead to the so-called ‘ferroportin disease’ or type 4 haemochromatosis, characterized by two distinct iron accumulation phenotypes depending on whether the mutations affect the protein’s activity or its degradation pathway (1). Despite a general agreement of the scientific community on a 12 transmembrane helices topology, no experimental data are available on human Fpn (HsFpn) three-dimensional structure. Thus, important features of HsFpn remain to be clarified. Recently, the crystal structures of a HsFpn homologue from the predatory Gram-negative bacterium Bdellovibrio bacteriovorus (BbFPN), in both the outward- (Figure 1 A) and inward-open states (Figure 1 B), have been reported (2). The residues essential for iron binding and transport in HsFpn are conserved in BbFPN (3). The conservation of these functionally relevant residues prompted us to exploit the two BbFPN structures to construct reliable models of HsFPN. Methods The structural models of HsFpn were built in both the outward- and inward-open states through the ab initio/threading strategy implemented in the I-TASSER server (4). The overall quality of the models generated has been evaluated using PROCHECK (5) and the model quality parameters provided in the I-TASSER output, such as the C-score. Putative iron binding sites have been detected using LIBRA (6). Results The models display the typical fold of MFS proteins with 12 TMs spanning the membrane and the N- and C-termini located on the intracellular side (Figure 1). LIBRA analysis of the models has led to the identification of potential iron binding sites in the inward-open state, allowing us to propose an iron translocation mechanism. 
Further, the outward-open model uncovers details of the interaction site of the peptide hormone hepcidin, a regulator of HsFpn function. Finally, the HsFPN models provide a mechanistic interpretation for the disease-related mutations that cause hereditary hemochromatosis.
... References (50)(51)(52)(53)(54)(55)(56)(57)(58)(59) appear in the Supporting Material. ...
Article
Full-text available
Cryo-electron-microscopy (cryo-EM) structures of flaviviruses reveal significant variation in epitope occupancy across different monoclonal antibodies that have largely been attributed to epitope-level differences in conformation or accessibility that affect antibody binding. The consequences of these variations for macroscopic properties such as antibody binding and neutralization are the results of the law of mass action—a stochastic process of innumerable binding and unbinding events between antibodies and the multiple binding sites on the flavivirus in equilibrium—that cannot be directly imputed from structure alone. We carried out coarse-grained spatial stochastic binding simulations for nine flavivirus antibodies with epitopes defined by cryo-EM or x-ray crystallography to assess the role of epitope spatial arrangement on antibody-binding stoichiometry, occupancy, and neutralization. In our simulations, all epitopes were equally competent for binding, representing the upper limit of binding stoichiometry that results from epitope spatial arrangement alone. Surprisingly, our simulations closely reproduced the relative occupancy and binding stoichiometry observed in cryo-EM, without having to account for differences in epitope accessibility or conformation, suggesting that epitope spatial arrangement alone may be sufficient to explain differences in binding occupancy and stoichiometry between antibodies. Furthermore, we found that there was significant heterogeneity in binding configurations even at saturating antibody concentrations, and that bivalent antibody binding may be more common than previously thought. Finally, we propose a structure-based explanation for the stoichiometric threshold model of neutralization.
Article
Full-text available
Many peptide-derived natural products are produced by non-ribosomal peptide synthetases (NRPSs) in an assembly-line fashion. Each amino acid is coupled to a designated peptidyl carrier protein (PCP) through two distinct reactions catalysed sequentially by the single active site of the adenylation domain (A-domain). Accumulating evidence suggests that large-amplitude structural changes occur in different NRPS states; yet how these molecular machines orchestrate such biochemical sequences has remained elusive. Here, using single-molecule Förster resonance energy transfer, we show that the A-domain of gramicidin S synthetase I adopts structurally extended and functionally obligatory conformations for alternating between adenylation and thioester-formation structures during enzymatic cycles. Complementary biochemical, computational and small-angle X-ray scattering studies reveal interconversion among these three conformations as intrinsic and hierarchical where intra-A-domain organizations propagate to remodel inter-A–PCP didomain configurations during catalysis. The tight kinetic coupling between structural transitions and enzymatic transformations is quantified, and how the gramicidin S synthetase I A-domain utilizes its inherent conformational dynamics to drive directional biosynthesis with a flexibly linked PCP domain is revealed.
Chapter
Proteins are the basic biological units of life responsible for almost every function within the body. The three-dimensional structure of the protein that represents its native state is critical for the biochemical activity of a protein. The information for proper folding of a protein is hidden in its primary sequence. Hence, several strategies are commonly used for predicting the tertiary structure of a protein from its sequence. A typical protein structure prediction strategy, homology modeling, is employed for targets that have homologous proteins with high sequence similarity and known structure. It involves the identification of a suitable template structure from which the three-dimensional information for a query sequence can be extrapolated. Some protein targets may share only structure-level homology with proteins with similar folds. Fold recognition methods identify such remote homologs, which requires a more sensitive search for relevant structural folds. If a structural homolog for the target sequence is unavailable, template-free methods including ab initio modeling can be used. However, template-based methods are preferred, as template-free modeling methods are much less reliable and are usually applicable only to smaller proteins. More recent automated hybrid strategies combine template-based and template-free prediction strategies to obtain protein structure models with high accuracy. Advances in computational techniques and the application of deep learning in protein structure prediction have enabled predictions at crystal-structure resolution. In this book chapter, we discuss strategies and highlight various tools for protein tertiary structure prediction.
Article
Full-text available
Alzheimer's disease pathology is characterized by β-amyloid plaques and neurofibrillary tangles. Amyloid precursor protein is processed by β and γ secretase, resulting in the production of β-amyloid peptides with a length ranging from 38 to 43 amino acids. Presenilin 1 (PS1) is the catalytic unit of γ-secretase, and more than 200 PS1 pathogenic mutations have been identified as causative for Alzheimer's disease. A complete monocrystal structure of PS1 has not been determined so far due to the presence of two flexible domains. We have developed a complete structural model of PS1 using a computational approach with structure prediction software. Missing fragments Met1-Glut72 and Ser290-Glu375 were modeled and validated by their energetic and stereochemical characteristics. Then, with the complete structure of PS1, we determined that these fragments do not have a direct effect on the structure of the pore. Next, we used our hypothetical model for the analysis of the functional effects of PS1 mutations Ala246Glu, Leu248Pro, Leu248Arg, Leu250Val, Tyr256Ser, Ala260Val, and Val261Phe, localized in the catalytic pore. For this, we used a quantum mechanics/molecular mechanics (QM/MM) hybrid method, evaluating modifications in the topology, potential surface density, and electrostatic potential map of mutated PS1 proteins. We found that each mutation exerts changes resulting in structural modifications of the active site and in the shape of the pore. We suggest this as a valid approach for functional studies of PS1 in view of the possible impact on substrate processing and for the design of targeted therapeutic strategies.
Chapter
Rational drug discovery relies heavily on molecular docking-based virtual screening, which samples flexibly the ligand binding poses against the target protein’s structure. The upside of flexible docking is that the geometries of the generated docking poses are adjusted to match the residue alignment inside the target protein’s ligand-binding pocket. The downside is that the flexible docking requires plenty of computing resources and, regardless, acquiring a decent level of enrichment typically demands further rescoring or post-processing. Negative image-based screening is a rigid docking technique that is ultrafast and computationally light but also effective as proven by vast benchmarking and screening experiments. In the NIB screening, the target protein cavity’s shape/electrostatics is aligned and compared against ab initio-generated ligand 3D conformers. In this chapter, the NIB methodology is explained at the practical level and both its weaknesses and strengths are discussed candidly.
Article
Full-text available
Alzheimer's disease is a major neurodegenerative illness whose prevalence is increasing worldwide, but its molecular mechanism remains unclear. There is some scientific evidence that the molecular complexity of Alzheimer's pathophysiology is associated with the formation of extracellular amyloid-beta plaques in the brain. A novel cross-phenotype association analysis of imaging genetics reported a brain atrophy susceptibility gene, namely FAM222A, and the protein Aggregatin encoded by FAM222A interacts with amyloid-beta (Aβ)-peptide (1-42) through its N-terminal Aβ binding domain and facilitates Aβ aggregation. The function of Aggregatin protein is unknown, and its three-dimensional structure has not been analyzed experimentally yet. Our goal was to investigate the interaction of Aggregatin with Aβ in detail by in silico analysis, including the 3D structure prediction analysis of Aggregatin protein by homology modeling. Our analysis verified the interaction of the C-terminal domain of model protein with the N-terminal domain of Aβ. This is the first attempt to demonstrate the interaction of Aggregatin with the Aβ. These results are consistent with in vitro and in vivo study reports claiming that FAM222A facilitates the aggregation of the Aβ-peptide.
Preprint
Full-text available
Viruses deploy an array of genetically encoded strategies to coopt host machinery and support viral replicative cycles. Molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, allows viruses to harness or disrupt cellular functions including nucleic acid metabolism and modulation of immune responses. Here, we use protein structure similarity to scan for virally encoded structure mimics across thousands of catalogued viruses and hosts spanning broad ecological niches and taxonomic range, including bacteria, plants and fungi, invertebrates and vertebrates. Our survey identified over 6,000,000 instances of structural mimicry, the vast majority of which (>70%) cannot be discerned through protein sequence. The results point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome. Interrogation of proteins mimicked by human-infecting viruses points to broad diversification of cellular pathways targeted via structural mimicry, identifies biological processes that may underlie autoimmune disorders, and reveals virally encoded mimics that may be leveraged to engineer synthetic metabolic circuits or may serve as targets for therapeutics. Moreover, the manner and degree to which viruses exploit molecular mimicry varies by genome size and nucleic acid type, with ssRNA viruses circumventing limitations of their small genomes by mimicking human proteins to a greater extent than their large dsDNA counterparts. Finally, we identified over 140 cellular proteins that are mimicked by CoV, providing clues about cellular processes driving the pathogenesis of the ongoing COVID-19 pandemic.
Article
While knowledge of protein-protein interactions (PPIs) is critical for understanding virus-host relationships, limitations on the scalability of high-throughput methods have hampered their identification beyond a number of well-studied viruses. Here, we implement an in silico computational framework (pathogen host interactome prediction using structure similarity [P-HIPSTer]) that employs structural information to predict ∼282,000 pan viral-human PPIs with an experimental validation rate of ∼76%. In addition to rediscovering known biology, P-HIPSTer has yielded a series of new findings: the discovery of shared and unique machinery employed across human-infecting viruses, a likely role for ZIKV-ESR1 interactions in modulating viral replication, the identification of PPIs that discriminate between human papilloma viruses (HPVs) with high and low oncogenic potential, and a structure-enabled history of evolutionary selective pressure imposed on the human proteome. Further, P-HIPSTer enables discovery of previously unappreciated cellular circuits that act on human-infecting viruses and provides insight into experimentally intractable viruses.
Article
Full-text available
Many proteins are synthesized as precursors, with propeptides playing a variety of roles such as assisting in folding or preventing them from being active within the cell. While the precise role of the propeptide in fungal lipases is not completely understood, it was previously reported that mutations in the propeptide region of the Rhizomucor miehei lipase have an influence on the activity of the mature enzyme, stressing the importance of the amino acid composition of this region. We here report two structures of this enzyme in complex with its propeptide, which suggests that the latter plays a role in the correct maturation of the enzyme. Most importantly, we demonstrate that the propeptide shows inhibition of lipase activity in standard lipase assays and propose that an important role of the propeptide is to ensure that the enzyme is not active during its expression pathway in the original host.
Article
A long-standing goal in biology is the complete annotation of function and structure on all protein-protein interactions, a large fraction of which is mediated by intrinsically disordered protein regions (IDRs). However, knowledge derived from experimental structures of such protein complexes is disproportionately small due, in part, to challenges in studying interactions of IDRs. Here, we introduce IDRBind, a computational method that by combining gradient boosted trees and conditional random field models predicts binding sites of IDRs with performance approaching state-of-the-art globular interface predictions, making it suitable for proteome-wide applications. Although designed and trained with a focus on molecular recognition features, which are long interaction-mediating-elements in IDRs, IDRBind also predicts the binding sites of short peptides more accurately than existing specialized predictors. Consistent with IDRBind's specificity, a comparison of protein interface categories uncovered uniform trends in multiple physicochemical properties, positioning molecular recognition feature interfaces between peptide and globular interfaces.
Article
Full-text available
Eighty-four throat swabs were obtained from Basrah General Hospital inpatients (N = 34): 17 were suffering from renal failure and the other 17 were diabetics; and from outpatients (N = 50). Throat swabs were cultured first in the selective medium Ashdown’s broth, then subcultured on Ashdown’s agar to isolate Burkholderia pseudomallei, which was recovered from seven cases (8.33%). Four isolates were from renal failure patients (23.53%), two from diabetic patients (11.76%) and the seventh isolate was from an outpatient with tonsillitis. All isolates were able to produce capsules, form filament chains, exhibit swarming motility and were arabinose non-assimilators (Ara-), indicative of their virulence. Additionally, isolated B. pseudomallei were found to produce protease, lipase, hemolysin, and lecithinase and were able to produce biofilm, the root of many troublesome persistent infections that resist antibiotic treatment. Susceptibility of the seven isolates of B. pseudomallei toward 11 antibiotics was assessed; the isolates were found to be resistant to all antibiotics apart from ciprofloxacin. This study confirms for the first time isolation of B. pseudomallei from immunocompromised patients in Basrah city of Iraq and describes their virulence potentials. Key words: B. pseudomallei, virulence potentials, biofilm, antibiotic susceptibility, immunocompromised patients
Article
Full-text available
Homology modeling is one of the computational structure prediction methods that are used to determine protein 3D structure from its amino acid sequence. It is considered to be the most accurate of the computational structure prediction methods. It consists of multiple steps that are straightforward and easy to apply. There are many tools and servers that are used for homology modeling. There is no single modeling program or server which is superior in every aspect to others. Since the functionality of the model depends on the quality of the generated protein 3D structure, maximizing the quality of homology modeling is crucial. Homology modeling has many applications in the drug discovery process. Since drugs interact with receptors, which consist mainly of proteins, protein 3D structure determination, and thus homology modeling, is important in drug discovery. Accordingly, protein interactions have been clarified using 3D structures of proteins built with homology modeling. This contributes to the identification of novel drug candidates. Homology modeling plays an important role in making drug discovery faster, easier, cheaper and more practical. As new modeling methods and combinations are introduced, the scope of its applications widens.
Article
Protein tyrosine phosphatase B (PtpB) from Mycobacterium tuberculosis (Mtb) extends the bacteria's survival in hosts and hence is a potential target for Mtb-specific drugs. To study how Mtb-specific sequence insertions in PtpB may regulate access to its active site through large-amplitude conformational changes, we performed free-energy calculations using an all-atom explicit solvent model. Corroborated by biochemical assays, the results show that PtpB's active site is controlled via an "either/or" compound conformational gating mechanism, an unexpected discovery that Mtb has evolved to bestow a single enzyme with such intricate logical operations. In addition to providing unprecedented insights for its active-site surroundings, the findings also suggest new ways of inactivating PtpB.
Chapter
The comparative study of homologous proteins can provide abundant information about the functional and structural constraints on protein evolution. For example, an amino acid substitution that is deleterious may become permissive in the presence of another substitution at a second site of the protein. A popular approach for detecting coevolving residues is by looking for correlated substitution events on branches of the molecular phylogeny relating the protein-coding sequences. Here we describe a machine learning method (Bayesian graphical models) implemented in the open-source phylogenetic software package HyPhy, http://hyphy.org, for extracting a network of coevolving residues from a sequence alignment.
Article
The crystallizations of the prokaryotic LeuT and of the eukaryotic DAT and SERT transporters represent important steps forward in the comprehension of the molecular physiology of Neurotransmitter:Sodium Symporters, although the molecular determinants of the coupling mechanism and of ion selectivity still remain to be fully elucidated. The insect NSS homologue KAAT1 exhibits unusual physiological features, such as the ability to use K+ as the driver ion, weak chloride dependence, and the ability of the driver ion to influence the substrate selectivity; these characteristics can help to define the molecular determinants of NSS function. Two non-conserved residues are present in the putative sodium binding sites of KAAT1: Ala 66, corresponding to Gly 20 in the Na2 site of LeuT, and Ser 68, corresponding to Ala 22 in the Na1 site. Thr 67 appears also to be significant since it is not conserved among NSS members, is present as threonine only in KAAT1 and in the paralogue CAATCH1 and, according to LeuT structure, is close to the amino acid binding site. Mutants of these residues were functionally characterized in Xenopus oocytes. The T67Y mutant exhibited uptake activity comparable to that of the wild type, but fully chloride-independent and with enhanced stereoselectivity. Interestingly, although dependent on the presence of sodium, the mutant showed reduced transport-associated currents, indicating uncoupling of the driver ion and amino acid fluxes. Thr 67 therefore appears to be a key component in the coupling mechanism, participating in a network that influences the cotransport of Na+ and the amino acid.
Chapter
A prerequisite to understand cell functioning on the system level is the knowledge of three-dimensional protein structures that mediate biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome scale projects, to obtain three dimensional structures for each protein. To achieve this ambitious goal, the costly and slow structure determination experiments are boosted with theoretical approaches. The current state and recent advances in structure modelling approaches are reviewed here, with special emphasis on comparative structure modelling techniques.
Article
In enzymatic C-H activation by hydrogen tunneling, reduced barrier width is important for efficient hydrogen wave function overlap during catalysis. For native enzymes displaying nonadiabatic tunneling, the dominant reactive hydrogen donor-acceptor distance (DAD) is typically ca. 2.7 Å, considerably shorter than normal van der Waals distances. Without a ground state substrate-bound structure for the prototypical nonadiabatic tunneling system, soybean lipoxygenase (SLO), it has remained unclear whether the requisite close tunneling distance occurs through an unusual ground state active site arrangement or by thermally sampling conformational substates. Herein, we introduce Mn(2+) as a spin-probe surrogate for the SLO Fe ion; X-ray diffraction shows Mn-SLO is structurally faithful to the native enzyme. (13)C ENDOR then reveals the locations of (13)C10 and reactive (13)C11 of linoleic acid relative to the metal; (1)H ENDOR and molecular dynamics simulations of the fully solvated SLO model using ENDOR-derived restraints give additional metrical information. The resulting three-dimensional representation of the SLO active site ground state contains a reactive (a) conformer with hydrogen DAD of ∼3.1 Å, approximately van der Waals contact, plus an inactive (b) conformer with even longer DAD, establishing that stochastic conformational sampling is required to achieve reactive tunneling geometries. Tunneling-impaired SLO variants show increased DADs and variations in substrate positioning and rigidity, confirming previous kinetic and theoretical predictions of such behavior. Overall, this investigation highlights the (i) predictive power of nonadiabatic quantum treatments of proton-coupled electron transfer in SLO and (ii) sensitivity of ENDOR probes to test, detect, and corroborate kinetically predicted trends in active site reactivity and to reveal unexpected features of active site architecture.
Article
Full-text available
The PSIPRED protein structure prediction server allows users to submit a protein sequence, perform a prediction of their choice and receive the results of the prediction both textually via e-mail and graphically via the web. The user may select one of three prediction methods to apply to their sequence: PSIPRED, a highly accurate secondary structure prediction method; MEMSAT 2, a new version of a widely used transmembrane topology prediction method; or GenTHREADER, a sequence profile based fold recognition method. Availability: Freely available to non-commercial users at http://globin.bio.warwick.ac.uk/psipred/
Article
Full-text available
By the middle of 1993, > 30,000 protein sequences had been listed. For 1000 of these, the three-dimensional (tertiary) structure has been experimentally solved. Another 7000 can be modelled by homology. For the remaining 21,000 sequences, secondary structure prediction provides a rough estimate of structural features. Predictions in three states range between 35% (random) and 88% (homology modelling) overall accuracy. Using information about evolutionary conservation as contained in multiple sequence alignments, the secondary structure of 4700 protein sequences was predicted by the automatic e-mail server PHD. For proteins with at least one known homologue, the method has an expected overall three-state accuracy of 71.4% (evaluated on 126 unique protein chains).
Article
Full-text available
The three-dimensional (3D) profile of a protein structure is a table computed from the atomic coordinates of the structure that can be used to score the compatibility of the 3D structure model with any amino acid sequence. Three-dimensional profiles computed from correct protein structures match their own sequences with high scores. An incorrectly modeled segment in an otherwise correct structure can be identified by examining the profile score in a moving-window scan. Thus, the correctness of a protein model can be verified by its 3D profile, regardless of whether the model has been derived by X-ray, nuclear magnetic resonance (NMR), or computational procedures. For this reason, 3D profiles are useful in the evaluation of undetermined protein models, based on low-resolution electron-density maps, on NMR spectra with inadequate distance constraints, or on computational procedures. An advantage of using 3D profiles for testing models is that profiles have not themselves been used in the determination of the structure. Traditional R-factor tests in X-ray analysis depend on the comparison of observed properties—that is, the X-ray structure factor magnitudes with the same property calculated from the final protein model.
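The moving-window scan mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the per-residue compatibility scores have already been computed by some 3D-profile method; the function names, the window length of 21, and the threshold of 0.0 are illustrative assumptions, not values taken from the abstract.

```python
def window_scores(per_residue_scores, window=21):
    """Average per-residue scores over every full-length sliding window.

    per_residue_scores: list of floats, one score per residue (assumed input).
    window: number of consecutive residues per window (assumed default).
    """
    n = len(per_residue_scores)
    if window < 1 or window > n:
        raise ValueError("window must be between 1 and the sequence length")
    # Running sum avoids re-adding the whole window at each position.
    total = sum(per_residue_scores[:window])
    averages = [total / window]
    for i in range(window, n):
        total += per_residue_scores[i] - per_residue_scores[i - window]
        averages.append(total / window)
    return averages


def flag_low_windows(per_residue_scores, window=21, threshold=0.0):
    """Return the start indices of windows whose average score is below threshold."""
    averages = window_scores(per_residue_scores, window)
    return [start for start, avg in enumerate(averages) if avg < threshold]
```

Windows flagged by `flag_low_windows` would mark segments worth inspecting; where exactly to place the threshold depends on the profile method and is left open here.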
Article
Full-text available
An interactive protein secondary structure prediction Internet server is presented. The server allows a single sequence or multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences. A consensus prediction is also returned which improves the average Q3 accuracy of prediction by 1% to 72.9%. The server simplifies the use of current prediction algorithms and allows conservation patterns important to structure and function to be identified. Availability: http://barton.ebi.ac.uk/servers/jpred.html Contact: geoff@ebi.ac.uk
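Q3 accuracy is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below computes it and builds a consensus by per-position majority vote. The abstract does not state the server's actual consensus rule, so the majority vote and its tie-breaking are assumptions made for illustration, as are the function names.

```python
from collections import Counter


def q3(predicted, observed):
    """Percentage of positions where the three-state labels agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions):
    """Per-position majority vote over several predictions (assumed rule).

    Ties go to the state proposed by the earliest-listed method.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Scan in method order so ties resolve deterministically.
        result.append(next(s for s in column if counts[s] == best))
    return "".join(result)
```

For example, `consensus(["HHHEEC", "HHCEEC", "HCCEHC"])` yields `"HHCEEC"`, which can then be scored against an observed assignment with `q3`.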
Article
To facilitate understanding of, and access to, the information available for protein structures, we have constructed the Structural Classification of Proteins (scop) database. This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure. It also provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references. Two search facilities are available. The homology search permits users to enter a sequence and obtain a list of any structures to which it has significant levels of sequence similarity. The key word search finds, for a word entered by the user, matches from both the text of the scop database and the headers of Brookhaven Protein Databank structure files. The database is freely accessible on World Wide Web (WWW) with an entry point to URL http://scop.mrc-lmb.cam.ac.uk/scop/ scop: an old English poet or minstrel (Oxford English Dictionary); ckon: pile, accumulation (Russian Dictionary).
Article
As methods for determining protein three-dimensional (3D) structure develop, a continuing problem is how to verify that the final protein model is correct. The revision of several protein models to correct errors has prompted the development of new criteria for judging the validity of X-ray and NMR structures, as well as the formation of energetic and empirical methods to evaluate the correctness of protein models. The challenge is to distinguish between a mistraced or wrongly folded model, and one that is basically correct, but not adequately refined. We show that an effective test of the accuracy of a 3D protein model is a comparison of the model to its own amino-acid sequence, using a 3D profile, computed from the atomic coordinates of the structure. 3D profiles of correct protein structures match their own sequences with high scores. In contrast, 3D profiles for protein models known to be wrong score poorly. An incorrectly modelled segment in an otherwise correct structure can be identified by examining the profile score in a moving-window scan. The accuracy of a protein model can be assessed by its 3D profile, regardless of whether the model has been derived by X-ray, NMR or computational procedures.
Article
We demonstrate in this work that the surface tension, water-organic solvent transfer free energies and the thermodynamics of melting of linear alkanes provide fundamental insights into the nonpolar driving forces for protein folding and protein binding reactions. We first develop a model for the curvature dependence of the hydrophobic effect and find that the macroscopic concept of interfacial free energy is applicable at the molecular level. Application of a well-known relationship involving surface tension and adhesion energies reveals that dispersion forces play little or no net role in hydrophobic interactions; rather, the standard model of disruption of water structure (entropically driven at 25 °C) is correct. The hydrophobic interaction is found, in agreement with the classical picture, to provide a major driving force for protein folding. Analysis of the melting behavior of hydrocarbons reveals that close packing of the protein interior makes only a small free energy contribution to folding because the enthalpic gain resulting from increased dispersion interactions (relative to the liquid) is countered by the freezing of side chain motion. The identical effect should occur in association reactions, which may provide an enormous simplification in the evaluation of binding energies. Protein binding reactions, even between nearly planar or concave/convex interfaces, are found to have effective hydrophobicities considerably smaller than the prediction based on macroscopic surface tension. This is due to the formation of a concave collar region that usually accompanies complex formation. This effect may preclude the formation of complexes between convex surfaces.
Article
To facilitate understanding of, and access to, the information available for protein structures, we have constructed the Structural Classification of Proteins (scop) database. This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure. It also provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references. Two search facilities are available. The homology search permits users to enter a sequence and obtain a list of any structures to which it has significant levels of sequence similarity. The key word search finds, for a word entered by the user, matches from both the text of the scop database and the headers of Brookhaven Protein Databank structure files. The database is freely accessible on World Wide Web (WWW) with an entry point to URL http://scop.mrc-lmb.cam.ac.uk/scop/.
Article
Several fold recognition algorithms are compared to each other in terms of prediction accuracy and significance. It is shown that on standard benchmarks, hybrid methods, which combine scoring based on sequence‐sequence and sequence‐structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, the sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this adds a new significance to fold recognition methods as a possible first step in function prediction. Despite hybrid methods being more accurate at fold prediction than either the sequence or threading methods, each of the methods is correct in some cases where others have failed. This partly reflects a different perspective on sequence/structure relationship embedded in various methods. To combine predictions from different methods, estimates of significance of predictions are made for all methods. With the help of such estimates, it is possible to develop a “jury” method, which has accuracy higher than any of the single methods. Finally, building full three‐dimensional models for all top predictions helps to eliminate possible false positives where alignments, which are optimal in the one‐dimensional sequences, lead to unsolvable sterical conflicts for the full three‐dimensional models.
Article
We have devised and implemented in PrISM (protein informatics system for modeling) a new measure of protein structural relationships, the protein structural distance (PSD). The PSD is designed to describe relationships between protein structures in quantitative rather than descriptive terms and is applicable both when two structures are very similar, and when they are very different. It is calculated with a structural alignment procedure that uses double dynamic programming to align secondary structure elements and an iterative rigid body superposition that minimizes the root-mean-square deviation of Cα atoms. The alignment algorithm, as implemented on a modest workstation, is computationally efficient, allowing for large-scale structural comparisons. PSD scores for more than one and a half million pairs of proteins were calculated and compared to the discrete classification of proteins in the SCOP database. The PSD scores, which were obtained automatically, are in large part consistent with the manually derived classifications in SCOP. Discrepancies do arise, however, due, in part, to the fact that SCOP uses criteria other than structural similarity to derive classifications while the PrISM procedure is exclusively structure based. Analysis of PSD scores suggests that there is a continuous aspect of protein conformation space, even though various classification schemes are extremely useful. The use of a continuous measure for structural distance between all pairs of proteins allows us, as described in the two accompanying papers, to derive sequence/structure relationships in a more quantitative way than has previously been possible. An important strength of the approach implemented in PrISM is its ability to address many different kinds of queries interactively, making its structural comparison procedure a convenient computational tool that complements structural classification databases such as SCOP and CATH.
Article
We develop a protocol for estimating the free energy difference between different conformations of the same polypeptide chain. The conformational free energy evaluation combines the CHARMM force field with a continuum treatment of the solvent. In almost all cases studied, experimentally determined structures are predicted to be more stable than misfolded "decoys." This is due in part to the fact that the Coulomb energy of the native protein is consistently lower than that of the decoys. The solvation free energy generally favors the decoys, although the total electrostatic free energy (sum of Coulomb and solvation terms) favors the native structure. The behavior of the solvation free energy is somewhat counterintuitive and, surprisingly, is not correlated with differences in the burial of polar area between native structures and decoys. Rather, the effect is due to a more favorable charge distribution in the native protein, which, as is discussed, will tend to decrease its interaction with the solvent. Our results thus suggest, in keeping with a number of recent studies, that electrostatic interactions may play an important role in determining the native topology of a folded protein. On this basis, a simplified scoring function is derived that combines a Coulomb term with a hydrophobic contact term. This function performs as well as the more complete free energy evaluation in distinguishing the native structure from misfolded decoys. Its computational efficiency suggests that it can be used in protein structure prediction applications, and that it provides a physically well-defined alternative to statistically derived scoring functions.
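A scoring function of the kind described, a Coulomb term plus a hydrophobic contact term, might be sketched as follows. The abstract does not give the functional form or parameters, so the dielectric constant, the contact cutoff, the contact weight, and the atom representation are all assumptions made for illustration; a realistic version would also exclude bonded atom pairs.

```python
import math

# Converts e^2/Å to kcal/mol for charges in units of the elementary charge.
COULOMB_CONSTANT = 332.06


def simplified_score(atoms, dielectric=1.0, contact_cutoff=6.0, contact_weight=1.0):
    """Coulomb energy minus a weighted count of hydrophobic contacts.

    atoms is a list of (x, y, z, charge, is_hydrophobic) tuples with
    distinct positions in Å. All default parameter values are hypothetical.
    Lower scores are more favorable.
    """
    coulomb = 0.0
    contacts = 0
    for i in range(len(atoms)):
        xi, yi, zi, qi, hi = atoms[i]
        for j in range(i + 1, len(atoms)):
            xj, yj, zj, qj, hj = atoms[j]
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            coulomb += COULOMB_CONSTANT * qi * qj / (dielectric * r)
            # Count a contact when two hydrophobic atoms lie within the cutoff.
            if hi and hj and r <= contact_cutoff:
                contacts += 1
    return coulomb - contact_weight * contacts
```

To rank a native structure against decoys, the same function would be evaluated on each conformation and the scores compared directly.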
Article
Current techniques for the prediction of side-chain conformations on a fixed backbone have an accuracy limit of about 1.0-1.5 Å rmsd for core residues. We have carried out a detailed and systematic analysis of the factors that influence the prediction of side-chain conformation and, on this basis, have succeeded in extending the limits of side-chain prediction for core residues to about 0.7 Å rmsd from native, with 94% and 89% of chi(1) and chi(1+2) dihedral angles, respectively, correctly predicted to within 20 degrees of native. These results are obtained using a force-field that accounts for only van der Waals interactions and torsional potentials. Prediction accuracy is strongly dependent on the rotamer library used; that is, a complete and detailed rotamer library is essential. The greatest accuracy was obtained with an extensive rotamer library, containing over 7560 members, in which bond lengths and bond angles were taken from the database rather than simply assuming idealized values. Perhaps the most surprising finding is that the combinatorial problem normally associated with the prediction of the side-chain conformation does not appear to be important. This conclusion is based on the fact that the prediction of the conformation of a single side-chain with all others fixed in their native conformations is only slightly more accurate than the simultaneous prediction of all side-chain dihedral angles.
Article
In this paper, an approach is described that combines multiple structure alignments and multiple sequence alignments to generate sequence profiles for protein families. First, multiple sequence alignments are generated from sequences that are closely related to each sequence of known three-dimensional structure. These alignments then are merged through a multiple structure alignment of family members of known structure. The merged alignment is used to generate a Hidden Markov Model for the family in question. The Hidden Markov Model can be used to search for new family members or to improve alignments for distantly related family members that already have been identified. Application of a profile generated for SH2 domains indicates that the Janus family of nonreceptor protein tyrosine kinases contains SH2 domains. This conclusion is strongly supported by the results of secondary structure-prediction programs, threading calculations, and the analysis of comparative models generated for these domains. One of the Janus kinases, human TYK2, has an SH2 domain that contains a histidine instead of the conserved arginine at the key phosphotyrosine-binding position, betaB5. Calculations of the pK(a) values of the betaB5 arginines in a number of SH2 domains and of the betaB5 histidine in a homology model of TYK2 suggest that this histidine is likely to be neutral around pH 7, thus indicating that it may have lost the ability to bind phosphotyrosine. If this indeed is the case, TYK2 may contain a domain with an SH2 fold that has a modified binding specificity.
Article
In this paper, we introduce a method to account for the shape of the potential energy curve in the evaluation of conformational free energies. The method is based on a procedure that generates a set of conformations, each with its own force-field energy, but adds a term to this energy that favors conformations that are close in structure (have a low rmsd) to other conformations. The sum of the force-field energy and rmsd-dependent term is defined here as the "colony energy" of a given conformation, because each conformation that is generated is viewed as representing a colony of points. The use of the colony energy tends to select conformations that are located in broad energy basins. The approach is applied to the ab initio prediction of the conformations of all of the loops in a dataset of 135 nonredundant proteins. By using an rmsd from a native criterion based on the superposition of loop stems, the average rmsd of 5-, 6-, 7-, and 8-residue long loops is 0.85, 0.92, 1.23, and 1.45 Å, respectively. For 8-residue loops, 60 of 61 predictions have an rmsd of less than 3.0 Å. The use of the colony energy is found to improve significantly the results obtained from the potential function alone. (The loop prediction program, "Loopy," can be downloaded at http://trantor.bioc.columbia.edu.)
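The colony energy can be sketched as the force-field energy of a conformation plus a term that becomes more negative when many low-energy conformations lie nearby in rmsd. The abstract does not state the functional form, so the Boltzmann-weighted sum below, the thermal energy `rt`, and the `beta` and `power` parameters that control the rmsd falloff are assumptions chosen for illustration.

```python
import math


def colony_energies(energies, rmsd, rt=0.593, beta=1.0, power=3):
    """Return one colony energy per conformation.

    energies[i] is the force-field energy of conformation i and rmsd[i][j]
    is the pairwise rmsd in Å. Assumed form (not taken from the source):
        colony_i = E_i - rt * ln(sum_j exp(-(E_j - E_i)/rt - beta * rmsd_ij**power))
    """
    result = []
    count = len(energies)
    for i in range(count):
        exponents = [
            -(energies[j] - energies[i]) / rt - beta * rmsd[i][j] ** power
            for j in range(count)
        ]
        # Log-sum-exp trick keeps the sum stable when energies differ widely.
        top = max(exponents)
        log_sum = top + math.log(sum(math.exp(x - top) for x in exponents))
        result.append(energies[i] - rt * log_sum)
    return result
```

With equal force-field energies, a conformation that has close neighbors receives a lower colony energy than an isolated one, which is how the term favors broad basins.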
Article
Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based entirely on structural alignments have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary and tertiary structure information. Sequence-based profiles are profiles whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles in which sequence-based profiles are used in loop/motif regions and structural information is used in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.
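The region-dependent combination described above can be sketched in a few lines. This assumes the two profiles have already been placed in the same column frame and that a per-column core/non-core flag is available; the data layout (one dict of residue scores per column) and the function name are hypothetical, and the actual hybrid profiles also incorporate secondary and tertiary structure terms that are not modeled here.

```python
def hybrid_profile(sequence_profile, structure_profile, is_core):
    """Build a hybrid profile column by column.

    Columns flagged as core take the structure-based scores; all other
    columns (loops, motifs) take the sequence-based scores.
    """
    if not (len(sequence_profile) == len(structure_profile) == len(is_core)):
        raise ValueError("profiles and core mask must have the same length")
    return [
        struct_col if core else seq_col
        for seq_col, struct_col, core in zip(sequence_profile, structure_profile, is_core)
    ]


seq_prof = [{"A": 1.0}, {"G": 2.0}, {"L": 0.5}]
struct_prof = [{"A": 0.2}, {"G": 0.1}, {"L": 3.0}]
mixed = hybrid_profile(seq_prof, struct_prof, [False, False, True])
```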
Article
The widespread use of the original version of GRASP revealed the importance of visualizing physicochemical and structural properties on the molecular surface. This chapter describes a new version of GRASP that contains many new capabilities. In terms of analysis tools, the most notable new features are sequence and structure analysis and alignment tools and the graphical integration of sequence and structural information. Not all of the new GRASP2 features could be described here, and more capabilities are continually being added. An on-line manual, details on obtaining the software, and technical notes about the program and the Troll software library can be found at the Honig laboratory Web site (http://trantor.bioc.columbia.edu).