Article

Using Multiple Structure Alignments, Fast Model Building, and Energetic Analysis in Fold Recognition and Homology Modeling

January 2003
Proteins Structure Function and Bioinformatics 53 Suppl 6(S6):430-5

DOI:10.1002/prot.10550

Source
PubMed

Authors:

Chris Tang

Entangible, Inc

Lei Xie

City University of New York - Hunter College

13 authors in total

We participated in the fold recognition and homology sections of CASP5 using primarily in-house software. The central feature of our structure prediction strategy was the ability to generate good sequence-to-structure alignments and to quickly transform them into models that could be evaluated both with energy-based methods and manually. The in-house tools we used include: a) HMAP (Hybrid Multidimensional Alignment Profile), a profile-to-profile alignment method that is derived from sequence-enhanced multiple structure alignments in core regions and from sequence motifs in non-structurally conserved regions; b) NEST, a fast model building program that applies an "artificial evolution" algorithm to construct a model from a given template and alignment; c) GRASP2, a new structure and alignment visualization program incorporating multiple structure superposition and domain database scanning modules. These methods were combined with model evaluation based on all-atom and simplified physical-chemical energy functions. All of these methods were under development during CASP5, and consequently a great deal of manual analysis was carried out at each stage of the prediction process. This interactive model building procedure has several advantages and suggests important ways in which our methods and others can be improved, examples of which are provided.

PrePCI: A structure‐ and chemical similarity‐informed database of predicted protein compound interactions

Article

Mar 2023
PROTEIN SCI

We describe the Predicting Protein–Compound Interactions (PrePCI) database which comprises over 5 billion predicted interactions between 6.8 million chemical compounds and 19,797 human proteins. PrePCI relies on a proteome‐wide database of structural models based on both traditional modeling techniques and the AlphaFold Protein Structure Database. Sequence‐ and structural similarity‐based metrics are established between template proteins, T, in the Protein Data Bank that bind compounds, C, and query proteins in the model database, Q. When the metrics exceed threshold values, it is assumed that C also binds to Q with a likelihood ratio (LR) derived from machine learning. If the relationship is based on structural similarity, the LR is based on a scoring function that measures the extent to which C is compatible with the binding site of Q as described in the LT‐scanner algorithm. For every predicted complex derived in this way, chemical similarity based on the Tanimoto coefficient identifies other small molecules that may bind to Q. An overall LR for the binding of C to Q is obtained from Naive Bayesian statistics. The PrePCI database can be queried by entering a UniProt ID or gene name for a protein to obtain a list of compounds predicted to bind to it along with associated LRs. Alternatively, entering an identifier for the compound outputs a list of proteins it is predicted to bind. Specific applications of the database to lead discovery, elucidation of drug mechanism of action, and biological function annotation are described.
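
The two quantitative ingredients named in this abstract, the Tanimoto coefficient and a Naive Bayesian combination of likelihood ratios, can be illustrated with a minimal sketch. It is not the PrePCI implementation: the fingerprint bit indices and the individual LR values below are hypothetical, the function names are illustrative, and the product rule assumes the evidence sources are conditionally independent (the standard naive Bayes assumption).

```python
from math import prod


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def combine_likelihood_ratios(lrs: list[float]) -> float:
    """Naive Bayes combination: LRs from independent evidence multiply."""
    return prod(lrs)


# Hypothetical on-bit indices for two compounds (illustrative values only).
compound_c = {1, 4, 7, 9}
compound_d = {1, 4, 8, 9, 12}
similarity = tanimoto(compound_c, compound_d)

# Hypothetical LRs from sequence, structure, and chemical-similarity evidence.
overall_lr = combine_likelihood_ratios([2.0, 5.0, 1.5])
print(similarity, overall_lr)
```

With these toy inputs the two compounds share three of six distinct on-bits, and the overall LR is the product of the three individual values.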

Novel Genetic Markers for Early Detection of Elevated Breast Cancer Risk in Women

Article

Sep 2019
INT J MOL SCI

This study suggests that two newly discovered variants in the MSH2 gene, which codes for a DNA mismatch repair (MMR) protein, can be associated with a high risk of breast cancer. While variants in the MSH2 gene are known to be linked with an elevated cancer risk, the MSH2 gene is not part of the standard kit for testing patients for elevated breast cancer risk. Here we used the results of genetic testing of women who were diagnosed with breast cancer but did not have variants in the BRCA1 and BRCA2 genes. Instead, the test identified four variants of unknown significance (VUS) in the MSH2 gene. We carried out an in silico analysis to develop a classifier that can distinguish pathogenic from benign mutations in MSH2 genes taken from ClinVar. The classifier was then used to classify the VUS in MSH2 genes, and two of them, p.Ala272Val and p.Met592Val, were predicted to be pathogenic mutations. These two mutations were found in women with breast cancer who did not have mutations in the BRCA1 and BRCA2 genes, and thus they are suggested as new bio-markers for the early detection of elevated breast cancer risk. However, before this is done, an in vitro validation of mutation pathogenicity is needed and, moreover, the presence of these mutations should be demonstrated in a larger number of patients or in families with a history of breast cancer.

Knowledge-based Approaches for Modelling the 3D Structural Interactome

Thesis

Nov 2012

Anisah W. Ghoorah

Understanding how the protein interactome works at a structural level could provide useful insights into the mechanisms of diseases. Comparative homology modelling and ab initio protein docking are two computational methods for modelling the three-dimensional (3D) structures of protein-protein interactions (PPIs). Previous studies have shown that both methods give significantly better predictions when they incorporate experimental PPI information. However, in general, PPI information is often not available in an easily accessible way, and cannot be re-used by 3D PPI modelling algorithms. Hence, there is currently a need to develop a reliable framework to facilitate the reuse of PPI data. This thesis presents a systematic knowledge-based approach for representing, describing and manipulating 3D interactions to study PPIs on a large scale and to facilitate knowledge-based modelling of protein-protein complexes. The main contributions of this thesis are: (1) it describes an integrated database of non-redundant 3D hetero domain interactions; (2) it presents a novel method of describing and clustering domain-domain interactions (DDIs) according to the spatial orientations of the binding partners, thus introducing the notion of "domain family-level binding sites" (DFBSs); (3) it proposes a structural classification of DFBSs similar to the CATH classification of protein folds, and it presents a study of secondary structure propensities of DFBSs and interaction preferences; (4) it introduces a systematic case-based reasoning approach to model on a large scale the 3D structures of protein complexes from existing structural DDIs. All these contributions have been made publicly available through a web server (http://kbdock.loria.fr).

In silico study of the structure and function of Streptococcus mutans plasmidic proteins

Article

Jan 2017

The Gram-positive bacterium Streptococcus mutans is the principal causative agent of human tooth decay, an oral disease that affects the majority of the world’s population. Although the complete S. mutans genome is known, approximately 700 proteins are still annotated as hypothetical proteins, as no three-dimensional structure or homology with known proteins exists for them. Thus, the significant portion of genomic sequences coding for unknown-function proteins makes the knowledge of pathogenicity and survival mechanisms of S. mutans still incomplete. Plasmids are found in virtually every species of Streptococcus, and some of these mediate resistance to antibiotics and pathogenesis. However, there are strains of S. mutans that contain plasmids, such as LM7 and UA140, to which no function has been assigned yet. In this work, we describe an in silico study of the structure and function of all the S. mutans proteins encoded by pLM7 and pUA140 plasmids to gain insight into their biological function. A combination of different structural bioinformatics methodologies led to the identification of plasmidic proteins potentially required for the bacterial survival and pathogenicity. The structural information obtained on these proteins can be used to select novel targets for the design of innovative therapeutic agents towards S. mutans.

Vaccine-Induced Antibodies that Neutralize Group 1 and Group 2 Influenza A Viruses

Article

Jul 2016

Antibodies capable of neutralizing divergent influenza A viruses could form the basis of a universal vaccine. Here, from subjects enrolled in an H5N1 DNA/MIV-prime-boost influenza vaccine trial, we sorted hemagglutinin cross-reactive memory B cells and identified three antibody classes, each capable of neutralizing diverse subtypes of group 1 and group 2 influenza A viruses. Co-crystal structures with hemagglutinin revealed that each class utilized characteristic germline genes and convergent sequence motifs to recognize overlapping epitopes in the hemagglutinin stem. All six analyzed subjects had sequences from at least one multidonor class, and—in half the subjects—multidonor-class sequences were recovered from >40% of cross-reactive B cells. By contrast, these multidonor-class sequences were rare in published antibody datasets. Vaccination with a divergent hemagglutinin can thus increase the frequency of B cells encoding broad influenza A-neutralizing antibodies. We propose the sequence signature-quantified prevalence of these B cells as a metric to guide universal influenza A immunization strategies.

Simulation and Machine Learning Methods for Ion-Channel Structure Determination, Mechanistic Studies and Drug Design

Article

Jun 2022

Ion channels are expressed in almost all living cells, where they control communication into and out of the cell, making them ideal drug targets, especially for central nervous system diseases. However, owing to their dynamic nature and the presence of a membrane environment, ion channels have remained difficult targets over the past decades. Recent advances in cryo-electron microscopy and computational methods have shed light on this issue. An explosion in high-resolution ion channel structures has paved the way for structure-based rational drug design, and state-of-the-art simulation and machine learning techniques have dramatically improved the efficiency and effectiveness of computer-aided drug design. Here we present an overview of how simulation and machine learning-based methods have fundamentally changed ion channel-related drug design at different levels, as well as the emerging trends in the field.

Full-length de novo protein structure determination from cryo-EM maps using deep learning

Article

May 2021

Motivation: Advances in microscopy instruments and image processing algorithms have led to an increasing number of cryo-EM maps. However, building accurate models for the EM maps at 3-5 Å resolution remains a challenging and time-consuming process. With the rapid growth of deposited EM maps, there is an increasing gap between the maps and reconstructed/modeled 3-dimensional (3D) structures. Therefore, automatic reconstruction of atomic-accuracy full-atom structures from EM maps is pressingly needed. Results: We present a semi-automatic de novo structure determination method using a deep learning-based framework, named DeepMM, which builds atomic-accuracy all-atom models from cryo-EM maps at near-atomic resolution. In our method, the main-chain and Cα positions as well as their amino acid and secondary structure types are predicted in the EM map using Densely Connected Convolutional Networks. DeepMM was extensively validated on 40 simulated maps at 5 Å resolution and 30 experimental maps at 2.6-4.8 Å resolution as well as an EMDB-wide data set of 2931 experimental maps at 2.6-4.9 Å resolution, and compared with state-of-the-art algorithms including RosettaES, MAINMAST, and Phenix. Overall, our DeepMM algorithm obtained a significant improvement over existing methods in terms of both accuracy and coverage in building full-length protein structures on all test sets, demonstrating the efficacy and general applicability of DeepMM. Availability: http://huanglab.phys.hust.edu.cn/DeepMM. Supplementary information: Supplementary data are available at Bioinformatics online.

Substrate Selectivity of Coumarin Derivatives by Human CYP1 Enzymes: In Vitro Enzyme Kinetics and In Silico Modeling

Article

Apr 2021

Of the three enzymes in the human cytochrome P450 family 1, CYP1A2 is an important enzyme mediating metabolism of xenobiotics including drugs in the liver, while CYP1A1 and CYP1B1 are expressed in extrahepatic tissues. Currently used CYP substrates, such as 7-ethoxycoumarin and 7-ethoxyresorufin, are oxidized by all individual CYP1 forms. The main aim of this study was to find profluorescent coumarin substrates that are more selective for the individual CYP1 forms. Eleven 3-phenylcoumarin derivatives were synthesized, their enzyme kinetic parameters were determined, and their interactions in the active sites of CYP1 enzymes were analyzed by docking and molecular dynamics simulations. All coumarin derivatives, as well as 7-ethoxyresorufin and 7-pentoxyresorufin, were oxidized by at least one CYP1 enzyme. 3-(3-Methoxyphenyl)-6-methoxycoumarin (19) was 7-O-demethylated with similarly high efficiency [21–30 ML/(min·mol CYP)] by all CYP1 forms and displayed similar binding in the enzyme active sites. 3-(3-Fluoro-4-acetoxyphenyl)coumarin (14) was selectively 7-O-demethylated by CYP1A1, but with low efficiency [0.16 ML/(min·mol)]. This was explained by better orientation and stronger H-bond interactions in the active site of CYP1A1 than in those of CYP1A2 and CYP1B1. 3-(4-Acetoxyphenyl)-6-chlorocoumarin (20) was 7-O-demethylated most efficiently by CYP1B1 [53 ML/(min·mol CYP)], followed by CYP1A1 [16 ML/(min·mol CYP)] and CYP1A2 [0.6 ML/(min·mol CYP)]. Variations in the stabilities of the complexes between 20 and the individual CYP enzymes explained these differences. Compounds 14, 19, and 20 are candidates to replace traditional substrates in measuring the activity of human CYP1 enzymes.

Gene Expression, Biochemical Characterization of a sn-1, 3 Extracellular Lipase From Aspergillus niger GZUF36 and Its Model-Structure Analysis

Article

Mar 2021

In this study, an sn-1,3 extracellular lipase from Aspergillus niger GZUF36 (PEXANL1) was expressed in Pichia pastoris and characterized, and its predicted structural model was analyzed. The optimized culture conditions of P. pastoris showed that the highest lipase activity of 66.5 ± 1.4 U/mL (P < 0.05) could be attained with 1% methanol and 96 h induction time. The purified PEXANL1 exhibited the highest activity at pH 4.0 and 40°C, and its original activity remained unaltered in the majority of the organic solvents tested (20% v/v concentration). Triton X-100, Tween 20, Tween 80, and SDS at a concentration of 0.01% (w/v) enhanced the activity of purified PEXANL1, whereas all the metal ions tested inhibited it. The results of ultrasound-assisted, PEXANL1-catalyzed synthesis of 1,3-diglycerides showed that the content of 1,3-diglycerides increased rapidly to 36.90% with 25 min of ultrasound duration (P < 0.05) and later decreased to 19.93% with 35 min of ultrasound duration. The structure of PEXANL1 obtained by comparative modeling showed an α/β hydrolase fold. Structural superposition and molecular docking results validated that residues Ser162, His274, and Asp217 of PEXANL1 were involved in catalysis. Small-angle X-ray scattering analysis indicated that PEXANL1 is monomeric in solution. The ab initio model of PEXANL1 overlapped with its modeled structure. This work presents a reliable structural model of A. niger lipase based on homology modeling and small-angle X-ray scattering. In addition, the data from this study will benefit the rational design of suitable crystalline lipase variants in the future.

Elucidating the Interactions Between Heparin/Heparan Sulfate and SARS-CoV-2-Related Proteins—An Important Strategy for Developing Novel Therapeutics for the COVID-19 Pandemic

Article

Jan 2021

Owing to its high mortality and rate of spread, the infectious disease caused by SARS-CoV-2 has become a major threat to public health and the social economy, leading to over 70 million infections and 1.6 million deaths to date. Since there are currently no effective therapeutics or widely available vaccines, there is an urgent need to look for new strategies for treating SARS-CoV-2 infection. Binding of a viral protein onto cell surface heparan sulfate (HS) is generally the first step in a cascade of interactions that is required for viral entry and the initiation of infection. Meanwhile, interactions of selectins and cytokines (e.g., IL-6 and TNF-α) with HS expressed on endothelial cells are crucial in controlling the recruitment of immune cells during inflammation. Thus, structurally defined heparin/HS and their mimetics might serve as potential drugs by competing with cell surface HS, preventing viral adhesion and modulating the inflammatory reaction. In this review, we elaborate on coronavirus invasion mechanisms and summarize the latest advances in HS–protein interactions, especially those involving proteins relevant to the process of coronavirus infection and subsequent inflammation. The experimental and computational techniques involved are emphasized.

A Sweep of Earth’s Virome Reveals Host-Guided Viral Protein Structural Mimicry and Points to Determinants of Human Disease

Article

Oct 2020

Viruses deploy genetically encoded strategies to coopt host machinery and support viral replicative cycles. Here, we use protein structure similarity to scan for molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, across thousands of cataloged viruses and hosts spanning broad ecological niches and taxonomic range, including bacteria, plants and fungi, invertebrates, and vertebrates. This survey identified over 6,000,000 instances of structural mimicry; more than 70% of viral mimics cannot be discerned through protein sequence alone. We demonstrate that the manner and degree to which viruses exploit molecular mimicry varies by genome size and nucleic acid type and identify 158 human proteins that are mimicked by coronaviruses, providing clues about cellular processes driving pathogenesis. Our observations point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome. A record of this paper’s transparent peer review process is included in the Supplemental Information.

Dissecting the conformation of glycans and their interactions with proteins

Article

Sep 2020
J BIOMED SCI

The use of in silico strategies to develop the structural basis for a rational optimization of glycan-protein interactions remains a great challenge. This problem derives, in part, from the lack of technologies to quantitatively and qualitatively assess the complex assembling between a glycan and the targeted protein molecule. Since there is an unmet need for developing new sugar-targeted therapeutics, many investigators are searching for technology platforms to elucidate various types of molecular interactions within glycan-protein complexes and aid in the development of glycan-targeted therapies. Here we discuss three important technology platforms commonly used in the assessment of the complex assembly of glycosylated biomolecules, such as glycoproteins or glycosphingolipids: Biacore analysis, molecular docking, and molecular dynamics simulations. We will also discuss the structural investigation of glycosylated biomolecules, including conformational changes of glycans and their impact on molecular interactions within the glycan-protein complex. For glycoproteins, secreted protein acidic and rich in cysteine (SPARC), which is associated with various lung disorders, such as chronic obstructive pulmonary disease (COPD) and lung cancer, will be taken as an example showing that the core fucosylation of N-glycan in SPARC regulates protein-binding affinity with extracellular matrix collagen. For glycosphingolipids (GSLs), Globo H ceramide, an important tumor-associated GSL which is being actively investigated as a target for new cancer immunotherapies, will be used to demonstrate how glycan structure plays a significant role in enhancing angiogenesis in tumor microenvironments.

Automatic de novo atomic-accuracy structure determination for cryo-EM maps using deep learning

Preprint

Aug 2020

Motivation and Results: Advances in microscopy instruments and image processing algorithms have led to an increasing number of cryo-EM maps. However, building accurate models for the EM maps at 3-5 Å resolution remains challenging and time-consuming. Here, we present a fully automatic de novo structure determination method using a deep learning-based framework, named DeepMM, which automatically builds atomic-accuracy all-atom models from cryo-EM maps at near-atomic resolution. In our method, the main-chain and Cα positions as well as their amino acid and secondary structure types are predicted in the EM map using Densely Connected Convolutional Networks. DeepMM was extensively validated on 40 simulated maps at 5 Å resolution and 30 experimental maps at 2.6-4.8 Å resolution as well as an EMDB-wide data set of 2931 experimental maps at 2.6-4.9 Å resolution. DeepMM built correct models for >60% of the cases, and it outperformed existing state-of-the-art algorithms including RosettaES, MAINMAST, and Phenix. Availability: http://huanglab.phys.hust.edu.cn/DeepMM/

Bi-allelic missense disease-causing variants in RPL3L associate neonatal dilated cardiomyopathy with muscle-specific ribosome biogenesis

Article

Nov 2020
HUM GENET

Dilated cardiomyopathy (DCM) is among the most frequent forms of cardiomyopathy and is mainly characterized by cardiac dilatation and reduced systolic function. Although most cases of DCM are classified as sporadic, 20–30% of cases show a heritable pattern. Familial forms of DCM are genetically heterogeneous, and mutations have been identified in several genes that most commonly play a role in cytoskeleton and sarcomere-associated processes. Still, a large number of familial cases remain unsolved. Here, we report five individuals from three independent families who presented with severe dilated cardiomyopathy during the neonatal period. Using whole-exome sequencing (WES), we identified causative, compound heterozygous missense variants in RPL3L (ribosomal protein L3-like) in all the affected individuals. The identified variants co-segregated with the disease in each of the three families and were absent or very rare in the human population, in line with an autosomal recessive inheritance pattern. They are located within the conserved RPL3 domain of the protein and were classified as deleterious by several in silico prediction software applications. RPL3L is one of the four non-canonical riboprotein genes and it encodes the 60S ribosomal protein L3-like protein that is highly expressed only in cardiac and skeletal muscle. Three-dimensional homology modeling and in silico analysis of the affected residues in RPL3L indicate that the identified changes specifically alter the interaction of RPL3L with the RNA components of the 60S ribosomal subunit and thus destabilize its binding to the 60S subunit. In conclusion, we report that bi-allelic pathogenic variants in RPL3L are causative of an early-onset, severe neonatal form of dilated cardiomyopathy, and we show for the first time that cytoplasmic ribosomal proteins are involved in the pathogenesis of non-syndromic cardiomyopathies.

Prediction of Protein Tertiary Structure via Regularized Template Classification Techniques

Article

May 2020
MOLECULES

We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive-regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set.
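
The optimization step described in this abstract, sampling coordinates in a low-dimensional space with a particle swarm, can be sketched as follows. This is a plain global-best PSO and not the RR-PSO variant the authors use; the `sphere` objective is a toy stand-in for an energy function, and all parameter values and names are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy objective standing in for an energy function (minimum at the origin)."""
    return sum(v * v for v in x)


def pso_minimize(objective, dim, n_particles=20, n_iters=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Standard global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # per-particle best positions
    best_val = [objective(p) for p in pos]     # per-particle best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


best_position, best_value = pso_minimize(sphere, dim=3)
print(best_position, best_value)
```

Running the swarm in a reduced space (here three dimensions) rather than over all atomic coordinates is what keeps the search tractable; in the described method that reduced basis would come from the regularized LDA or the SVD, which this sketch does not implement.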

In Silico Prediction of the Structural Model of the Parasite Trypanosoma cruzi Nitroreductase Enzyme and Its Structural Validation

Article

Feb 2017

American trypanosomiasis, commonly known as Chagas disease, is a disease with the highest prevalence in the tropics and is caused by the parasite Trypanosoma cruzi, whose vector is an insect from the Rhodnius prolixus family. The pathology of this disease is characterized by the presence of cardiopathies and gastrointestinal problems in patients during chronic phases. A structural model of the orthosteric site that can explain the enzyme's function and a plausible reaction mechanism is important for understanding the design of molecular targets and the possible resistance generated in chronic phases of the disease. To this end, structural biology offers tools such as homology modelling and structural assembly by sequence-based fold recognition to construct a model. In addition, the proposed models obtained by comparison with the reported structures are validated with energetic and stereochemical software tools that produce quantitative data characterizing the structural models. This validation makes it possible to compare two predictive and structural refinement methods and to identify the better methodology for structure elucidation.

Potential Therapeutic Approaches to Alzheimer's Disease By Bioinformatics, Cheminformatics And Predicted Adme-Tox Tools

Article

Dec 2019
CURR NEUROPHARMACOL

Background: Alzheimer's disease (AD) is considered a severe, irreversible and progressive neurodegenerative disorder. Currently, the pharmacological management of AD is based on a few clinically approved acetylcholinesterase (AChE) and N-methyl-D-aspartate (NMDA) receptor ligands, with unclear molecular mechanisms and severe side effects. Methods: Here, we reviewed the most recent bioinformatics, cheminformatics (SAR, drug design, molecular docking, friendly databases, ADME-Tox) and experimental data on relevant structure-biological activity relationships and molecular mechanisms of some natural and synthetic compounds with possible anti-AD effects (inhibitors of AChE, NMDA receptors, beta-secretase, amyloid beta (Aβ), redox metals) or acting on multiple AD targets at once. We considered: (i) in silico studies supported by experimental studies regarding the pharmacological potential of natural compounds such as resveratrol, natural alkaloids, and flavonoids isolated from various plants, and of donepezil, galantamine, rivastigmine and memantine derivatives; (ii) the most important pharmacokinetic descriptors of natural compounds in comparison with donepezil, memantine and galantamine. Results: In silico and experimental methods applied to synthetic compounds led to the identification of new AChE inhibitors, NMDA antagonists, multipotent hybrids targeting different AD processes and metal-organic compounds acting as Aβ inhibitors. Natural compounds appear as multipotent agents, acting on several AD pathways: cholinesterases, NMDA receptors, secretases or Aβ, but their efficiency in vivo and their correct dosage remain to be determined. Conclusion: Bioinformatics, cheminformatics and ADME-Tox methods can be very helpful in the quest for an effective anti-AD treatment, allowing the identification of novel drugs, enhancing the druggability of molecular targets and providing a deeper understanding of AD pathological mechanisms.

Critical Structural Defects Explain Filamin A Mutations Causing Mitral Valve Dysplasia

Article

Aug 2019

Mitral valve diseases affect ∼3% of the population and are the most common reasons for valvular surgery because no drug-based treatments exist. Inheritable genetic mutations have now been established as the cause of mitral valve insufficiency, and four different missense mutations in the filamin A gene (FLNA) have been found in patients suffering from nonsyndromic mitral valve dysplasia (MVD). The filamin A (FLNA) protein is expressed, in particular, in endocardial endothelia during fetal valve morphogenesis and is key in cardiac development. The FLNA-MVD-causing mutations are clustered in the N-terminal region of FLNA. How the mutations in FLNA modify its structure and function has mostly remained elusive. In this study, using NMR spectroscopy and interaction assays, we investigated FLNA-MVD-causing V711D and H743P mutations. Our results clearly indicated that both mutations almost completely destroyed the folding of the FLNA5 domain, where the mutation is located, and also affect the folding of the neighboring FLNA4 domain. The structure of the neighboring FLNA6 domain was not affected by the mutations. These mutations also completely abolish FLNA's interactions with protein tyrosine phosphatase nonreceptor type 12, which has been suggested to contribute to the pathogenesis of FLNA-MVD. Taken together, our results provide an essential structural and molecular framework for understanding the molecular bases of FLNA-MVD, which is crucial for the development of new therapies to replace surgery.

A domain based protein structural modelling platform applied in the analysis of alternative splicing

Thesis

Jan 2018

Su Datt Lam

Functional families (FunFams) are a sub-classification of CATH protein domain superfamilies that cluster relatives likely to have very similar structures and functions. The functional purity of FunFams has been demonstrated by comparing against experimentally determined Enzyme Commission annotations and by checking whether known functional sites coincide with highly conserved residues in the multiple sequence alignments of FunFams. We hypothesised that clustering relatives into FunFams may help in protein structure modelling. In the first work chapter, we demonstrate the structural coherence of domains in FunFams. We then explore the usage of FunFams in protein monomer modelling. The FunFam based protocol produced higher percentages of good models compared to an HHsearch (the state-of-the-art HMM based sequence search tool) based protocol for both close and remote homologs. We developed a modelling pipeline that utilises the FunFam protocol and is able to model up to 70% of domain sequences from human and fly genomes. In the second work chapter, we explore the usage of FunFams in protein complex modelling. Our analysis demonstrated that domain-domain interfaces in FunFams tend to be conserved. The FunFam based complex modelling protocol produced significantly more good quality models when compared to a BLAST based protocol and slightly more than an HHsearch based protocol. In the final work chapter, we employ the FunFam based structural modelling tool to understand the implications of alternative splicing. We focused on isoforms derived from mutually exclusive exons (MXEs), which are more enriched in proteomics data. MXEs which could be mapped to structure show a significant tendency to be exposed to the solvent, are likely to exhibit a significant change in their physicochemical properties and to lie close to known/predicted functional sites. Our results suggest that MXE events may have a number of important roles in cells generally.

The discovery of a new Ebolavirus, Bombali virus, adds further support for bats as hosts of Ebolaviruses

Article

Full-text available

Feb 2019
INT J INFECT DIS

Multi-Template based Homology Modeling of specificity protein 3

Article

Full-text available

Feb 2018

The SP3 transcription factor, with a mass of 81,925 Da, is a member of the Kruppel-like zinc finger protein family and is clinically relevant for many neuronal transmission diseases. Considering the functional importance of the SP3 TF protein and the lack of an X-ray crystal structure, the present work was undertaken to build the 3D structure of the protein using homology modeling with a multi-template approach. In the present study, three different SP3 templates (PDB ID: 3EBT, 4M9E, and 2WBS) were used for homology modeling. Five models were developed with the help of multiple sequence alignment with respect to the templates using Modeller 8.0.0 software. All models were refined and ranked according to their overall DOPE score. The top-ranked predicted model of SP3 TF had 93.8% of residues in favored regions, as revealed by the Ramachandran plot, and an ERRAT score of 100%, which indicated an accurate model. The results of the homology modeling study and the proposed model can be further used for understanding the structural and functional characteristics of SP3 and to gain more insight into the molecular basis of SP3 inhibition through docking and molecular dynamics simulation studies. Index Terms: Specificity Protein 3 (SP3), Multi-template Homology Modeling, Modeller.

Perspectives towards antiviral drug discovery against Ebola virus

Article

Nov 2018
J MED VIROL

Ebola virus disease (EVD), caused by Ebola viruses, resulted in more than 11,500 deaths according to a recent 2018 WHO report. With mortality rates up to 90%, it is nowadays one of the deadliest infectious diseases. However, no FDA-approved Ebola drugs or vaccines are available yet, and the mainstay of therapy is supportive care. The high fatality rate and absence of effective treatment or vaccination make Ebola virus a category A biothreat pathogen. Fortunately, a series of investigational countermeasures have been developed to control and prevent this global threat. This review summarizes the recent therapeutic advances and ongoing research progress from R&D to clinical trials in the development of small‐molecule antiviral drugs, small interfering RNA molecules, phosphorodiamidate morpholino oligomers, full‐length monoclonal antibodies and vaccines. Moreover, difficulties are highlighted in the search for effective countermeasures against EVD with additional focus on the interplay between available in silico prediction methods and their evidenced potential in antiviral drug discovery.

Secondary Structure-Based Template Selection for Fragment-Assembly Protein Structure Prediction

Thesis

Full-text available

Jul 2018

Jad Abbass

Proteins play critical biochemical roles in all living organisms; in human beings, they are the targets of 50% of all drugs. Although the first protein structure was determined 60 years ago, experimental techniques are still time-consuming and costly. Consequently, in silico protein structure prediction, which is considered a main challenge in computational biology, is fundamental to decipher conformations of protein targets. This thesis contributes to the state of the art of fragment-assembly protein structure prediction. This category has been widely and thoroughly studied due to its applicability to any type of target. While the majority of research focuses on enhancing the functions that are used to score fragments by incorporating new terms and optimising their weights, another important issue is how to pick appropriate fragments from a large pool of candidate structures. Since prediction of the main structural classes, i.e. mainly-alpha, mainly-beta and alpha-beta, has recently reached quite a high level of accuracy, we have introduced a novel approach by decreasing the size of the pool of candidate structures to comprise only proteins that share the same structural class a target is likely to adopt. Picking fragments from this customised set of known structures has not only contributed to generating decoys with a higher level of accuracy but has also eliminated irrelevant parts of the search space, which makes the selection of first models a less complicated process and addresses the inaccuracies of energy functions. In addition to the challenge of adopting a unique template structure for all targets, another one arises whenever relying on the same amount of corrections and fine tunings; such a phase may be damaging to “easy” targets, i.e. those that comprise a relatively significant percentage of alpha helices. 
Owing to the sequence-structure correlation based on which fragment-based protein structure prediction was born, we have also proposed a customised phase of correction based on the structural class prediction of the target in question. After using secondary structure prediction as a “global feature” of a target, i.e. structural classes, we have also investigated its usage as a “local feature” to customise the number of candidate fragments, which is currently the same at all positions. Relying on the known facts regarding diversity of short fragments of helices, sheets and loops, the fragment insertion process has been adjusted to make “changes” relative to the expected complexity of each region. We have proved in this thesis the extent to which secondary structure features can be used implicitly or explicitly to enhance fragment assembly protein structure prediction.

Navigating Among Known Structures in Protein Space

Chapter

Full-text available

Jan 2019
Meth Mol Biol

Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, or focusing on structural similarity, are likely attributable to physicochemical constraints, whereas sequence reuse, or focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights to protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design.

The discovery of Bombali virus adds further support for bats as hosts of ebolaviruses

Article

Full-text available

Oct 2018

Here we describe the complete genome of a new ebolavirus, Bombali virus (BOMV) detected in free-tailed bats in Sierra Leone (little free-tailed (Chaerephon pumilus) and Angolan free-tailed (Mops condylurus)). The bats were found roosting inside houses, indicating the potential for human transmission. We show that the viral glycoprotein can mediate entry into human cells. However, further studies are required to investigate whether exposure has actually occurred or if BOMV is pathogenic in humans.

Interplay between negative and positive design elements in Gα helical domains of G proteins determines interaction specificity toward RGS2

Article

Jun 2018

Regulators of G protein Signaling (RGS) proteins inactivate Gα subunits, thereby controlling G protein-coupled signaling networks. Among all RGS proteins, RGS2 is unique in interacting only with the Gαq and not with the Gαi sub-family. Previous studies suggested that this specificity is determined by the RGS domain, and in particular by three RGS2-specific residues that lead to a unique mode of interaction with Gαq. This interaction was further proposed to act through contacts with the Gα GTPase domain. Here, we combined energy calculations and GTPase activity measurements to determine which Gα residues dictate specificity toward RGS2. We identified putative specificity-determining residues in the Gα helical domain, which among G proteins is found only in Gα subunits. Replacing these helical domain residues in Gαi with their Gαq counterparts resulted in a dramatic specificity-switch towards RGS2. We further show that Gα-RGS2 specificity is set by Gαi residues that perturb interactions with RGS2, and by Gαq residues that enhance these interactions. These results show, for the first time, that the Gα helical domain is central to dictating specificity towards RGS2, suggesting this domain plays a general role in governing Gα-RGS specificity. Our insights provide new options for manipulating RGS-G protein interactions in vivo, for better understanding of their "wiring" into signaling networks, and for devising novel drugs targeting such interactions.

Hybrid approach to structure modeling of the histamine H3 receptor: Multi-level assessment as a tool for model verification

Article

Full-text available

Oct 2017
PLOS ONE

The crucial role of G-protein coupled receptors and the significant achievements associated with a better understanding of the spatial structure of known receptors in this family encouraged us to undertake a study on the histamine H3 receptor, whose crystal structure is still unresolved. The latest literature data and availability of different software enabled us to build homology models of higher accuracy than previously published ones. The new models are expected to be closer to crystal structures and are therefore much more helpful in the design of potential ligands. In this article, we describe the generation of homology models with the use of diverse tools and a hybrid assessment. Our study incorporates a hybrid assessment connecting knowledge-based scoring algorithms with a two-step ligand-based docking procedure. Knowledge-based scoring employs probability theory for global energy minimum determination based on information about native amino acid conformation from a dataset of experimentally determined protein structures. For the two-step docking procedure, two programs were applied: GOLD was used in the first step and Glide in the second. Hybrid approaches offer advantages by combining various theoretical methods in one modeling algorithm. The biggest advantage of hybrid methods is their intrinsic ability to self-update and self-refine when additional structural data are acquired. Moreover, the diversity of computational methods and structural data used in hybrid approaches for structure prediction limits inaccuracies resulting from theoretical approximations or fuzziness of experimental data. The results of docking to the new H3 receptor model allowed us to analyze ligand-receptor interactions for reference compounds.

Non-base-contacting residues enable kaleidoscopic evolution of metazoan C2H2 zinc finger DNA binding

Article

Full-text available

Sep 2017
GENOME BIOL

Background: The C2H2 zinc finger (C2H2-ZF) is the most numerous protein domain in many metazoans, but is not as frequent or diverse in other eukaryotes. The biochemical and evolutionary mechanisms that underlie the diversity of this DNA-binding domain exclusively in metazoans are, however, mostly unknown. Results: Here, we show that the C2H2-ZF expansion in metazoans is facilitated by contribution of non-base-contacting residues to DNA binding energy, allowing base-contacting specificity residues to mutate without catastrophic loss of DNA binding. In contrast, C2H2-ZF DNA binding in fungi, plants, and other lineages is constrained by reliance on base-contacting residues for DNA-binding functionality. Reconstructions indicate that virtually every DNA triplet was recognized by at least one C2H2-ZF domain in the common progenitor of placental mammals, but that extant C2H2-ZF domains typically bind different sequences from these ancestral domains, with changes facilitated by non-base-contacting residues. Conclusions: Our results suggest that the evolution of C2H2-ZFs in metazoans was expedited by the interaction of non-base-contacting residues with the DNA backbone. We term this phenomenon “kaleidoscopic evolution,” to reflect the diversity of both binding motifs and binding motif transitions and the facilitation of their diversification. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1287-y) contains supplementary material, which is available to authorized users.

Homology Modeling: an Overview of Fundamentals and Tools

Article

Full-text available

Apr 2017

Resolving the three-dimensional structure of a protein is a critical step in modern drug discovery. Homology modeling is a powerful tool that can efficiently predict protein structures from their amino acid sequences. Although it might sound simple enough, homology modeling in fact has to pass through several sophisticated steps before it can predict an accurate structure of a protein. These steps include template identification, alignment with the template, model construction and many post-modeling processes. Here, we describe these steps in detail, discuss the strengths and limitations of the methods and list a number of successful homology modeling applications in the literature. The objective of this review is to shed light on this extremely useful tool and highlight many case studies in this area of active research.

Prediction of Wild Type Enzyme Thermostability from Amino Acid Sequences

Thesis

Full-text available

Apr 2012

Alireza Kashani

The number of sequenced genomes continues to increase; however, experimentally characterized enzymes remain patchy. Experimental characterization of enzymes, e.g. measuring thermostability, is time-consuming, and the required procedures, such as cloning, expression and purification, make it an even more expensive task. Predicting the temperature at which enzymes function most effectively from amino acid sequences is highly valuable for any enzyme discovery project. In this thesis, a framework to predict the thermostability of seven different Glycoside Hydrolase enzymes from their amino acid sequences is presented. Different structural and sequence-based features are used for training the Gaussian process regressor. A novel covariance function for Gaussian processes, designed specifically for protein sequences, has been introduced, which is suited to different protein property predictions. Finally, Lasso with a stability selection approach ranked all features, and the most predictive ones are reported. The final model is evaluated through a cross-validation procedure and also on an external evaluation set. Experiments show that the presented model has the potential to be used for enzyme discovery projects.

A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning

Article

Full-text available

Dec 2016

Haiou Li

Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
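As a reading aid for the GDT-TS figure quoted above, the following is a minimal Python sketch of how a GDT-TS-style score on a 0 to 1 scale can be computed from per-residue Cα distances between a model and a reference structure. It assumes the two structures have already been superimposed once and that distances are given in ångströms; the standard GDT procedure searches over many superpositions to maximise the count at each cutoff, so this simplified version is not the implementation used by the authors. The function name `gdt_ts` and the example distances are hypothetical.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT-TS on a 0-1 scale for one fixed superposition.

    distances: per-residue C-alpha distances (in angstroms) between
    model and reference after superposition. Illustrative only: the
    full GDT algorithm optimises the superposition per cutoff.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # GDT-TS is the mean of the fractions at 1, 2, 4 and 8 angstroms.
    return sum(fractions) / len(cutoffs)


# Hypothetical per-residue distances for a five-residue toy example.
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
score = gdt_ts(example_distances)
print(f"GDT-TS (fixed superposition): {score:.2f}")
```

On this scale, a score above 0.7 means that, averaged over the four cutoffs, more than 70% of residues lie within the cutoff distance of their reference positions.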

Structural models of the human iron exporter ferroportin in the inward- and outward- open states

Article

Jun 2016

Motivation: Ferroportin (Fpn) is a membrane protein belonging to the Major Facilitator Superfamily of transporters. It is the only vertebrate iron exporter known so far. Several Fpn mutations lead to the so-called ‘ferroportin disease’ or type 4 haemochromatosis, characterized by two distinct iron accumulation phenotypes depending on whether the mutations affect the protein’s activity or its degradation pathway (1). Despite a general agreement of the scientific community on a topology of 12 transmembrane helices, no experimental data are available on the three-dimensional structure of human Fpn (HsFpn). Thus, important features of HsFpn remain to be clarified. Recently, the crystal structures of an HsFpn homologue from the predatory Gram-negative bacterium Bdellovibrio bacteriovorus (BbFPN), in both the outward- (Figure 1 A) and inward-open states (Figure 1 B), have been reported (2). The residues essential for iron binding and transport in HsFpn are conserved in BbFPN (3). The conservation of these functionally relevant residues prompted us to exploit the two BbFPN structures to construct reliable models of HsFpn. Methods: The structural models of HsFpn were built in both the outward- and inward-open states through the ab initio/threading strategy implemented in the I-TASSER server (4). The overall quality of the models generated has been evaluated using PROCHECK (5) and the model quality parameters provided in the I-TASSER output, such as the C-score. Putative iron binding sites have been detected using LIBRA (6). Results: The models display the typical fold of MFS proteins with 12 TMs spanning the membrane and the N- and C-termini located on the intracellular side (Figure 1). LIBRA analysis of the models has led to the identification of potential iron binding sites in the inward-open state, allowing us to propose an iron translocation mechanism. 
Further, the outward-open model uncovers details of the interaction site of the peptide hormone hepcidin, a regulator of HsFpn function. Finally, the HsFpn models provide a mechanistic interpretation for the disease-related mutations that cause hereditary hemochromatosis.

Modeling the Role of Epitope Arrangement on Antibody Binding Stoichiometry in Flaviviruses

Article

Full-text available

Oct 2016

Cryo-electron-microscopy (cryo-EM) structures of flaviviruses reveal significant variation in epitope occupancy across different monoclonal antibodies, which has largely been attributed to epitope-level differences in conformation or accessibility that affect antibody binding. The consequences of these variations for macroscopic properties such as antibody binding and neutralization are the results of the law of mass action (a stochastic process of innumerable binding and unbinding events between antibodies and the multiple binding sites on the flavivirus in equilibrium) and cannot be directly imputed from structure alone. We carried out coarse-grained spatial stochastic binding simulations for nine flavivirus antibodies with epitopes defined by cryo-EM or x-ray crystallography to assess the role of epitope spatial arrangement on antibody-binding stoichiometry, occupancy, and neutralization. In our simulations, all epitopes were equally competent for binding, representing the upper limit of binding stoichiometry that results from epitope spatial arrangement alone. Surprisingly, our simulations closely reproduced the relative occupancy and binding stoichiometry observed in cryo-EM, without having to account for differences in epitope accessibility or conformation, suggesting that epitope spatial arrangement alone may be sufficient to explain differences in binding occupancy and stoichiometry between antibodies. Furthermore, we found that there was significant heterogeneity in binding configurations even at saturating antibody concentrations, and that bivalent antibody binding may be more common than previously thought. Finally, we propose a structure-based explanation for the stoichiometric threshold model of neutralization.

Subdomain dynamics enable chemical chain reactions in non-ribosomal peptide synthetases

Article

Full-text available

Dec 2023
NAT CHEM

Many peptide-derived natural products are produced by non-ribosomal peptide synthetases (NRPSs) in an assembly-line fashion. Each amino acid is coupled to a designated peptidyl carrier protein (PCP) through two distinct reactions catalysed sequentially by the single active site of the adenylation domain (A-domain). Accumulating evidence suggests that large-amplitude structural changes occur in different NRPS states; yet how these molecular machines orchestrate such biochemical sequences has remained elusive. Here, using single-molecule Förster resonance energy transfer, we show that the A-domain of gramicidin S synthetase I adopts structurally extended and functionally obligatory conformations for alternating between adenylation and thioester-formation structures during enzymatic cycles. Complementary biochemical, computational and small-angle X-ray scattering studies reveal interconversion among these three conformations as intrinsic and hierarchical where intra-A-domain organizations propagate to remodel inter-A–PCP didomain configurations during catalysis. The tight kinetic coupling between structural transitions and enzymatic transformations is quantified, and how the gramicidin S synthetase I A-domain utilizes its inherent conformational dynamics to drive directional biosynthesis with a flexibly linked PCP domain is revealed.

Identification, morphological, biochemical, and genetic characterization of microorganisms

Chapter

Jan 2023

Computational strategies and tools for protein tertiary structure prediction

Chapter

Jan 2023

Proteins are the basic biological units of life responsible for almost every function within the body. The three-dimensional structure of the protein that represents its native state is critical for the biochemical activity of a protein. The information for proper folding of a protein is hidden in its primary sequence. Hence, several strategies are commonly used for predicting the tertiary structure of a protein from its sequence. A typical protein structure prediction strategy, homology modeling, is employed for targets that have homologous proteins with high sequence similarity and known structure. It involves the identification of a suitable template structure from which the three-dimensional information for a query sequence can be extrapolated. Some protein targets may share only structure-level homology with proteins with similar folds. The fold recognition method comprises identification of such remote homologs, which requires a more sensitive search for relevant structural folds. If a structural homolog for the target sequence is unavailable, template-free methods including ab initio modeling can be used. However, template-based methods are preferred, as template-free modeling methods are much less reliable and are usually applicable only to smaller proteins. More recent automated hybrid strategies combine both template-based and template-free prediction strategies to obtain protein structure models with high accuracy. Advancement in computational techniques and the application of deep learning in protein structure prediction have enabled predictions at crystal-structure resolution. In this book chapter, we discuss strategies and highlight various tools for protein tertiary structure prediction.

Protein Predictive Modeling and Simulation of Mutations of Presenilin-1 Familial Alzheimer's Disease on the Orthosteric Site

Article

Full-text available

Jun 2021

Alzheimer's disease pathology is characterized by β-amyloid plaques and neurofibrillary tangles. Amyloid precursor protein is processed by β and γ secretase, resulting in the production of β-amyloid peptides with a length ranging from 38 to 43 amino acids. Presenilin 1 (PS1) is the catalytic unit of γ-secretase, and more than 200 PS1 pathogenic mutations have been identified as causative for Alzheimer's disease. A complete monocrystal structure of PS1 has not been determined so far due to the presence of two flexible domains. We have developed a complete structural model of PS1 using a computational approach with structure prediction software. Missing fragments Met1-Glut72 and Ser290-Glu375 were modeled and validated by their energetic and stereochemical characteristics. Then, with the complete structure of PS1, we determined that these fragments do not have a direct effect on the structure of the pore. Next, we used our hypothetical model for the analysis of the functional effects of PS1 mutations Ala246Glu, Leu248Pro, Leu248Arg, Leu250Val, Tyr256Ser, Ala260Val, and Val261Phe, localized in the catalytic pore. For this, we used a quantum mechanics/molecular mechanics (QM/MM) hybrid method, evaluating modifications in the topology, potential surface density, and electrostatic potential map of mutated PS1 proteins. We found that each mutation exerts changes resulting in structural modifications of the active site and in the shape of the pore. We suggest this as a valid approach for functional studies of PS1 in view of the possible impact on substrate processing and for the design of targeted therapeutic strategies.

Negative Image-Based Screening: Rigid Docking Using Cavity Information

Chapter

Mar 2021

Rational drug discovery relies heavily on molecular docking-based virtual screening, which flexibly samples the ligand binding poses against the target protein’s structure. The upside of flexible docking is that the geometries of the generated docking poses are adjusted to match the residue alignment inside the target protein’s ligand-binding pocket. The downside is that flexible docking requires plenty of computing resources and, regardless, acquiring a decent level of enrichment typically demands further rescoring or post-processing. Negative image-based (NIB) screening is a rigid docking technique that is ultrafast and computationally light but also effective, as proven by vast benchmarking and screening experiments. In NIB screening, the target protein cavity’s shape/electrostatics is aligned and compared against ab initio-generated ligand 3D conformers. In this chapter, the NIB methodology is explained at the practical level and both its weaknesses and strengths are discussed candidly.

Assessment of the Interaction of Aggregatin Protein with Amyloid-Beta (Aβ) at the Molecular Level via In Silico Analysis

Article

Full-text available

Oct 2020
ACTA CHIM SLOV

Alzheimer's disease is a major neurodegenerative illness whose prevalence is increasing worldwide, but its molecular mechanism remains unclear. There is some scientific evidence that the molecular complexity of Alzheimer's pathophysiology is associated with the formation of extracellular amyloid-beta plaques in the brain. A novel cross-phenotype association analysis of imaging genetics reported a brain atrophy susceptibility gene, namely FAM222A. The protein Aggregatin, encoded by FAM222A, interacts with amyloid-beta (Aβ)-peptide (1-42) through its N-terminal Aβ binding domain and facilitates Aβ aggregation. The function of Aggregatin protein is unknown, and its three-dimensional structure has not been analyzed experimentally yet. Our goal was to investigate the interaction of Aggregatin with Aβ in detail by in silico analysis, including 3D structure prediction of Aggregatin protein by homology modeling. Our analysis verified the interaction of the C-terminal domain of the model protein with the N-terminal domain of Aβ. This is the first attempt to demonstrate the interaction of Aggregatin with Aβ. These results confirm in vitro and in vivo study reports that FAM222A facilitates aggregation of the Aβ-peptide.

A sweep of earth's virome reveals host-guided viral protein structural mimicry; with implications for human disease

Preprint

Full-text available

Jun 2020

Viruses deploy an array of genetically encoded strategies to coopt host machinery and support viral replicative cycles. Molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, allows viruses to harness or disrupt cellular functions including nucleic acid metabolism and modulation of immune responses. Here, we use protein structure similarity to scan for virally encoded structure mimics across thousands of catalogued viruses and hosts spanning broad ecological niches and taxonomic range, including bacteria, plants and fungi, invertebrates and vertebrates. Our survey identified over 6,000,000 instances of structural mimicry, the vast majority of which (>70%) cannot be discerned through protein sequence. The results point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome. Interrogation of proteins mimicked by human-infecting viruses points to broad diversification of cellular pathways targeted via structural mimicry, identifies biological processes that may underlie autoimmune disorders, and reveals virally encoded mimics that may be leveraged to engineer synthetic metabolic circuits or may serve as targets for therapeutics. Moreover, the manner and degree to which viruses exploit molecular mimicry vary by genome size and nucleic acid type, with ssRNA viruses circumventing limitations of their small genomes by mimicking human proteins to a greater extent than their large dsDNA counterparts. Finally, we identified over 140 cellular proteins that are mimicked by CoV, providing clues about cellular processes driving the pathogenesis of the ongoing COVID-19 pandemic.

A Structure-Informed Atlas of Human-Virus Interactions

Article

Aug 2019
CELL

While knowledge of protein-protein interactions (PPIs) is critical for understanding virus-host relationships, limitations on the scalability of high-throughput methods have hampered their identification beyond a number of well-studied viruses. Here, we implement an in silico computational framework (pathogen host interactome prediction using structure similarity [P-HIPSTer]) that employs structural information to predict ∼282,000 pan viral-human PPIs with an experimental validation rate of ∼76%. In addition to rediscovering known biology, P-HIPSTer has yielded a series of new findings: the discovery of shared and unique machinery employed across human-infecting viruses, a likely role for ZIKV-ESR1 interactions in modulating viral replication, the identification of PPIs that discriminate between human papilloma viruses (HPVs) with high and low oncogenic potential, and a structure-enabled history of evolutionary selective pressure imposed on the human proteome. Further, P-HIPSTer enables discovery of previously unappreciated cellular circuits that act on human-infecting viruses and provides insight into experimentally intractable viruses.

Novel Inhibitory Function of the Rhizomucor miehei Lipase Propeptide and Three-Dimensional Structures of Its Complexes with the Enzyme

Article

Full-text available

Jun 2019

Many proteins are synthesized as precursors, with propeptides playing a variety of roles such as assisting in folding or preventing them from being active within the cell. While the precise role of the propeptide in fungal lipases is not completely understood, it was previously reported that mutations in the propeptide region of the Rhizomucor miehei lipase have an influence on the activity of the mature enzyme, stressing the importance of the amino acid composition of this region. Here we report two structures of this enzyme in complex with its propeptide, which suggest that the latter plays a role in the correct maturation of the enzyme. Most importantly, we demonstrate that the propeptide inhibits lipase activity in standard lipase assays and propose that an important role of the propeptide is to ensure that the enzyme is not active during its expression pathway in the original host.

Predicting Protein–Protein Interfaces that Bind Intrinsically Disordered Protein Regions

Article

Jun 2019

A long-standing goal in biology is the complete annotation of function and structure on all protein-protein interactions, a large fraction of which is mediated by intrinsically disordered protein regions (IDRs). However, knowledge derived from experimental structures of such protein complexes is disproportionately small due, in part, to challenges in studying interactions of IDRs. Here, we introduce IDRBind, a computational method that by combining gradient boosted trees and conditional random field models predicts binding sites of IDRs with performance approaching state-of-the-art globular interface predictions, making it suitable for proteome-wide applications. Although designed and trained with a focus on molecular recognition features, which are long interaction-mediating-elements in IDRs, IDRBind also predicts the binding sites of short peptides more accurately than existing specialized predictors. Consistent with IDRBind's specificity, a comparison of protein interface categories uncovered uniform trends in multiple physicochemical properties, positioning molecular recognition feature interfaces between peptide and globular interfaces.

In Vitro Study on Virulence Potentials of Burkholderia pseudomallei Isolated from Immunocompromised Patients

Article

Full-text available

May 2012

Rana Abdulnabi

Eighty-four throat swabs were obtained at Basrah General Hospital: 34 from inpatients (17 with renal failure and 17 with diabetes) and 50 from outpatients. Throat swabs were cultured first in the selective medium Ashdown’s broth, then subcultured on Ashdown’s agar to isolate Burkholderia pseudomallei, which was recovered from seven cases (8.33%). Four isolates were from renal failure patients (23.53%), two from diabetic patients (11.76%), and the seventh isolate was from an outpatient with tonsillitis. All isolates were able to produce capsules, form filament chains, and exhibit swarming motility, and all were arabinose non-assimilators (Ara-), indicative of their virulence. Additionally, the isolated B. pseudomallei were found to produce protease, lipase, hemolysin, and lecithinase, and were able to produce biofilm, the root of many troublesome persistent infections that resist antibiotic treatment. The susceptibility of the seven B. pseudomallei isolates to 11 antibiotics was assessed; the isolates were resistant to all antibiotics apart from ciprofloxacin. This study confirms for the first time the isolation of B. pseudomallei from immunocompromised patients in Basrah city, Iraq, and describes their virulence potentials. Key words: B. pseudomallei, virulence potentials, biofilm, antibiotic susceptibility, immunocompromised patients

Homology Modeling in Drug Discovery: Overview, Current Applications and Future Perspectives

Article

Full-text available

Jan 2019

Homology modeling is one of the computational structure prediction methods that are used to determine protein 3D structure from its amino acid sequence. It is considered to be the most accurate of the computational structure prediction methods. It consists of multiple steps that are straightforward and easy to apply. There are many tools and servers that are used for homology modeling. There is no single modeling program or server which is superior in every aspect to others. Since the functionality of the model depends on the quality of the generated protein 3D structure, maximizing the quality of homology modeling is crucial. Homology modeling has many applications in the drug discovery process. Since drugs interact with receptors, which consist mainly of proteins, protein 3D structure determination, and thus homology modeling, is important in drug discovery. Accordingly, protein interactions have been clarified using 3D structures of proteins built with homology modeling. This contributes to the identification of novel drug candidates. Homology modeling plays an important role in making drug discovery faster, easier, cheaper and more practical. As new modeling methods and combinations are introduced, the scope of its applications widens.

Compound Molecular Logic in Accessing the Active Site of M. Tuberculosis Protein Tyrosine Phosphatase B

Article

Oct 2018

Protein tyrosine phosphatase B (PtpB) from Mycobacterium tuberculosis (Mtb) extends the bacteria's survival in hosts and hence is a potential target for Mtb-specific drugs. To study how Mtb-specific sequence insertions in PtpB may regulate access to its active site through large-amplitude conformational changes, we performed free-energy calculations using an all-atom explicit solvent model. Corroborated by biochemical assays, the results show that PtpB's active site is controlled via an "either/or" compound conformational gating mechanism, an unexpected discovery that Mtb has evolved to bestow a single enzyme with such intricate logical operations. In addition to providing unprecedented insights for its active-site surroundings, the findings also suggest new ways of inactivating PtpB.

Detecting Amino Acid Coevolution with Bayesian Graphical Models

Chapter

Jan 2019
Meth Mol Biol

The comparative study of homologous proteins can provide abundant information about the functional and structural constraints on protein evolution. For example, an amino acid substitution that is deleterious may become permissive in the presence of another substitution at a second site of the protein. A popular approach for detecting coevolving residues is by looking for correlated substitution events on branches of the molecular phylogeny relating the protein-coding sequences. Here we describe a machine learning method (Bayesian graphical models) implemented in the open-source phylogenetic software package HyPhy, http://hyphy.org, for extracting a network of coevolving residues from a sequence alignment.

Threonine 67 is a key component in the coupling of the NSS amino acid transporter KAAT1

Article

Jan 2018
BBA-BIOMEMBRANES

The crystallizations of the prokaryotic LeuT and of the eukaryotic DAT and SERT transporters represent important steps forward in the comprehension of the molecular physiology of Neurotransmitter:Sodium Symporters, although the molecular determinants of the coupling mechanism and of ion selectivity still remain to be fully elucidated. The insect NSS homologue KAAT1 exhibits unusual physiological features, such as the ability to use K+ as the driver ion, weak chloride dependence, and the ability of the driver ion to influence the substrate selectivity; these characteristics can help to define the molecular determinants of NSS function. Two non-conserved residues are present in the putative sodium binding sites of KAAT1: Ala 66, corresponding to Gly 20 in the Na2 site of LeuT, and Ser 68, corresponding to Ala 22 in the Na1 site. Thr 67 appears also to be significant since it is not conserved among NSS members, is present as threonine only in KAAT1 and in the paralogue CAATCH1 and, according to LeuT structure, is close to the amino acid binding site. Mutants of these residues were functionally characterized in Xenopus oocytes. The T67Y mutant exhibited uptake activity comparable to that of the wild type, but fully chloride-independent and with enhanced stereoselectivity. Interestingly, although dependent on the presence of sodium, the mutant showed reduced transport-associated currents, indicating uncoupling of the driver ion and amino acid fluxes. Thr 67 therefore appears to be a key component in the coupling mechanism, participating in a network that influences the cotransport of Na+ and the amino acid.

Comparative Protein Structure Modelling

Chapter

Apr 2017

Andras Fiser

A prerequisite to understand cell functioning on the system level is the knowledge of three-dimensional protein structures that mediate biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome scale projects, to obtain three dimensional structures for each protein. To achieve this ambitious goal, the costly and slow structure determination experiments are boosted with theoretical approaches. The current state and recent advances in structure modelling approaches are reviewed here, with special emphasis on comparative structure modelling techniques.

13C ENDOR Spectroscopy of Lipoxygenase–Substrate Complexes Reveals the Structural Basis for C–H Activation by Tunneling

Article

Jan 2017

In enzymatic C-H activation by hydrogen tunneling, reduced barrier width is important for efficient hydrogen wave function overlap during catalysis. For native enzymes displaying nonadiabatic tunneling, the dominant reactive hydrogen donor-acceptor distance (DAD) is typically ca. 2.7 Å, considerably shorter than normal van der Waals distances. Without a ground state substrate-bound structure for the prototypical nonadiabatic tunneling system, soybean lipoxygenase (SLO), it has remained unclear whether the requisite close tunneling distance occurs through an unusual ground state active site arrangement or by thermally sampling conformational substates. Herein, we introduce Mn(2+) as a spin-probe surrogate for the SLO Fe ion; X-ray diffraction shows Mn-SLO is structurally faithful to the native enzyme. (13)C ENDOR then reveals the locations of (13)C10 and reactive (13)C11 of linoleic acid relative to the metal; (1)H ENDOR and molecular dynamics simulations of the fully solvated SLO model using ENDOR-derived restraints give additional metrical information. The resulting three-dimensional representation of the SLO active site ground state contains a reactive (a) conformer with hydrogen DAD of ∼3.1 Å, approximately van der Waals contact, plus an inactive (b) conformer with even longer DAD, establishing that stochastic conformational sampling is required to achieve reactive tunneling geometries. Tunneling-impaired SLO variants show increased DADs and variations in substrate positioning and rigidity, confirming previous kinetic and theoretical predictions of such behavior. Overall, this investigation highlights the (i) predictive power of nonadiabatic quantum treatments of proton-coupled electron transfer in SLO and (ii) sensitivity of ENDOR probes to test, detect, and corroborate kinetically predicted trends in active site reactivity and to reveal unexpected features of active site architecture.

The PSIPRED protein structure prediction server

Article

Full-text available

Apr 2000

The PSIPRED protein structure prediction server allows users to submit a protein sequence, perform a prediction of their choice and receive the results of the prediction both textually via e-mail and graphically via the web. The user may select one of three prediction methods to apply to their sequence: PSIPRED, a highly accurate secondary structure prediction method; MEMSAT 2, a new version of a widely used transmembrane topology prediction method; or GenTHREADER, a sequence profile based fold recognition method. Availability: Freely available to non-commercial users at http://globin.bio.warwick.ac.uk/psipred/

PHD – an Automatic Mail Server for Protein Secondary Structure Prediction

Article

Full-text available

Feb 1994

By the middle of 1993, > 30,000 protein sequences had been listed. For 1000 of these, the three-dimensional (tertiary) structure has been experimentally solved. Another 7000 can be modelled by homology. For the remaining 21,000 sequences, secondary structure prediction provides a rough estimate of structural features. Predictions in three states range between 35% (random) and 88% (homology modelling) overall accuracy. Using information about evolutionary conservation as contained in multiple sequence alignments, the secondary structure of 4700 protein sequences was predicted by the automatic e-mail server PHD. For proteins with at least one known homologue, the method has an expected overall three-state accuracy of 71.4% (evaluated on 126 unique protein chains).

VERIFY3D: Assessment of Protein Models with Three-Dimensional Profiles

Article

Full-text available

Feb 1997
METHOD ENZYMOL

The three-dimensional (3D) profile of a protein structure is a table computed from the atomic coordinates of the structure that can be used to score the compatibility of the 3D structure model with any amino acid sequence. Three-dimensional profiles computed from correct protein structures match their own sequences with high scores. An incorrectly modeled segment in an otherwise correct structure can be identified by examining the profile score in a moving-window scan. Thus, the correctness of a protein model can be verified by its 3D profile, regardless of whether the model has been derived by X-ray, nuclear magnetic resonance (NMR), or computational procedures. For this reason, 3D profiles are useful in the evaluation of undetermined protein models, based on low-resolution electron-density maps, on NMR spectra with inadequate distance constraints, or on computational procedures. An advantage of using 3D profiles for testing models is that profiles have not themselves been used in the determination of the structure. Traditional R-factor tests in X-ray analysis depend on the comparison of observed properties—that is, the X-ray structure factor magnitudes with the same property calculated from the final protein model.
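The moving-window scan mentioned in this abstract can be illustrated with a short sketch. It assumes that a per-residue compatibility score is already available for each position; the 3D-1D scoring table itself is not reproduced here, and the function names, window length, and threshold are hypothetical choices for illustration only.

```python
def window_scan(scores, window):
    """Return the mean score of every full window of `window` consecutive residues.

    `scores` is a list of per-residue compatibility scores (assumed to be
    precomputed elsewhere); element k of the result covers residues
    k .. k + window - 1.
    """
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    running = sum(scores[:window])
    averages = [running / window]
    for i in range(window, len(scores)):
        # Slide the window by one residue: add the new score, drop the oldest.
        running += scores[i] - scores[i - window]
        averages.append(running / window)
    return averages


def low_scoring_windows(scores, window, threshold):
    """Return start indices of windows whose mean score falls below `threshold`."""
    return [start for start, avg in enumerate(window_scan(scores, window))
            if avg < threshold]
```

A segment that is flagged by several overlapping windows would be a candidate for closer inspection; the appropriate threshold depends on the scoring table and is not specified in the abstract.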

Jpred: A Consensus Secondary Structure Prediction Server

Article

Full-text available

Feb 1998

An interactive protein secondary structure prediction Internet server is presented. The server allows a single sequence or multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences. A consensus prediction is also returned which improves the average Q3 accuracy of prediction by 1% to 72.9%. The server simplifies the use of current prediction algorithms and allows conservation patterns important to structure and function to be identified. Availability: http://barton.ebi.ac.uk/servers/jpred.html Contact: geoff@ebi.ac.uk
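As a rough illustration of what a consensus prediction and a Q3 score involve, the sketch below takes a per-residue majority vote over several three-state predictions and then measures the percentage of residues predicted correctly. This is not the Jpred implementation: the majority-vote rule, the tie-breaking order, and the function names are assumptions made for the example, and inputs are assumed to use only the states H, E, and C.

```python
from collections import Counter


def consensus(predictions):
    """Column-wise majority vote over equal-length strings of H/E/C states."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        # Ties are broken in the order H, E, C (an arbitrary illustrative choice).
        winner = next(s for s in "HEC" if counts.get(s, 0) == top)
        result.append(winner)
    return "".join(result)


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, `consensus(["HHEC", "HEEC", "CHEC"])` gives `"HHEC"`, and scoring that against an observed `"HHCC"` gives a Q3 of 75.0.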

SCOP: A structural classification of proteins database for the investigation of sequences and structures

Article

Apr 1995

To facilitate understanding of, and access to, the information available for protein structures, we have constructed the Structural Classification of Proteins (scop) database. This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure. It also provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references. Two search facilities are available. The homology search permits users to enter a sequence and obtain a list of any structures to which it has significant levels of sequence similarity. The key word search finds, for a word entered by the user, matches from both the text of the scop database and the headers of Brookhaven Protein Databank structure files. The database is freely accessible on World Wide Web (WWW) with an entry point to URL http://scop.mrc-lmb.cam.ac.uk/scop/ scop: an old English poet or minstrel (Oxford English Dictionary); ckon: pile, accumulation (Russian Dictionary).

Assessment of protein models with 3D profiles

Article

Apr 1992

As methods for determining protein three-dimensional (3D) structure develop, a continuing problem is how to verify that the final protein model is correct. The revision of several protein models to correct errors has prompted the development of new criteria for judging the validity of X-ray and NMR structures, as well as the formation of energetic and empirical methods to evaluate the correctness of protein models. The challenge is to distinguish between a mistraced or wrongly folded model, and one that is basically correct, but not adequately refined. We show that an effective test of the accuracy of a 3D protein model is a comparison of the model to its own amino-acid sequence, using a 3D profile, computed from the atomic coordinates of the structure. 3D profiles of correct protein structures match their own sequences with high scores. In contrast, 3D profiles for protein models known to be wrong score poorly. An incorrectly modelled segment in an otherwise correct structure can be identified by examining the profile score in a moving-window scan. The accuracy of a protein model can be assessed by its 3D profile, regardless of whether the model has been derived by X-ray, NMR or computational procedures.

Protein folding and association: Insights from the interfacial and thermodynamic properties of hydrocarbons

Article

Dec 1991

We demonstrate in this work that the surface tension, water-organic solvent transfer free energies, and the thermodynamics of melting of linear alkanes provide fundamental insights into the nonpolar driving forces for protein folding and protein binding reactions. We first develop a model for the curvature dependence of the hydrophobic effect and find that the macroscopic concept of interfacial free energy is applicable at the molecular level. Application of a well-known relationship involving surface tension and adhesion energies reveals that dispersion forces play little or no net role in hydrophobic interactions; rather, the standard model of disruption of water structure (entropically driven at 25 degrees C) is correct. The hydrophobic interaction is found, in agreement with the classical picture, to provide a major driving force for protein folding. Analysis of the melting behavior of hydrocarbons reveals that close packing of the protein interior makes only a small free energy contribution to folding because the enthalpic gain resulting from increased dispersion interactions (relative to the liquid) is countered by the freezing of side chain motion. The identical effect should occur in association reactions, which may provide an enormous simplification in the evaluation of binding energies. Protein binding reactions, even between nearly planar or concave/convex interfaces, are found to have effective hydrophobicities considerably smaller than the prediction based on macroscopic surface tension. This is due to the formation of a concave collar region that usually accompanies complex formation. This effect may preclude the formation of complexes between convex surfaces.

SCOP: A Structural Classification Of Proteins Database For The Investigation Of Sequences And Structures

Article

May 1995

To facilitate understanding of, and access to, the information available for protein structures, we have constructed the Structural Classification of Proteins (scop) database. This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure. It also provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references. Two search facilities are available. The homology search permits users to enter a sequence and obtain a list of any structures to which it has significant levels of sequence similarity. The key word search finds, for a word entered by the user, matches from both the text of the scop database and the headers of Brookhaven Protein Databank structure files. The database is freely accessible on World Wide Web (WWW) with an entry point to URL http://scop.mrc-lmb.cam.ac.uk/scop/.

SSAP: Sequential Structure Alignment Program for Protein Structure Comparison

Article

Feb 1996
METHOD ENZYMOL

Fold prediction by a hierarchy of sequence and threading methods

Article

Jun 1998

Several fold recognition algorithms are compared to each other in terms of prediction accuracy and significance. It is shown that on standard benchmarks, hybrid methods, which combine scoring based on sequence‐sequence and sequence‐structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, the sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this adds a new significance to fold recognition methods as a possible first step in function prediction. Despite hybrid methods being more accurate at fold prediction than either the sequence or threading methods, each of the methods is correct in some cases where others have failed. This partly reflects a different perspective on sequence/structure relationship embedded in various methods. To combine predictions from different methods, estimates of significance of predictions are made for all methods. With the help of such estimates, it is possible to develop a “jury” method, which has accuracy higher than any of the single methods. Finally, building full three‐dimensional models for all top predictions helps to eliminate possible false positives where alignments, which are optimal in the one‐dimensional sequences, lead to unsolvable sterical conflicts for the full three‐dimensional models.

An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance

Article

Sep 2000

We have devised and implemented in PrISM (protein informatics system for modeling) a new measure of protein structural relationships, the protein structural distance (PSD). The PSD is designed to describe relationships between protein structures in quantitative rather than descriptive terms and is applicable both when two structures are very similar, and when they are very different. It is calculated with a structural alignment procedure that uses double dynamic programming to align secondary structure elements and an iterative rigid body superposition that minimizes the root-mean-square deviation of C(alpha) atoms. The alignment algorithm, as implemented on a modest workstation, is computationally efficient, allowing for large-scale structural comparisons. PSD scores for more than one and a half million pairs of proteins were calculated and compared to the discrete classification of proteins in the SCOP database. The PSD scores, which were obtained automatically, are in large part consistent with the manually derived classifications in SCOP. Discrepancies do arise, however, due, in part, to the fact that SCOP uses criteria other than structural similarity to derive classifications while the PrISM procedure is exclusively structure based. Analysis of PSD scores suggests that there is a continuous aspect of protein conformation space, even though various classification schemes are extremely useful. The use of a continuous measure for structural distance between all pairs of proteins allows us, as described in the two accompanying papers to derive sequence/structure relationships in a more quantitative way than has previously been possible. An important strength of the approach implemented in PrISM is its ability to address many different kinds of queries interactively, making its structural comparison procedure a convenient computational tool that complements structural classification databases such as SCOP and CATH.

Free energy determinants of tertiary structure and evaluation of protein models

Article

Dec 2000

We develop a protocol for estimating the free energy difference between different conformations of the same polypeptide chain. The conformational free energy evaluation combines the CHARMM force field with a continuum treatment of the solvent. In almost all cases studied, experimentally determined structures are predicted to be more stable than misfolded "decoys." This is due in part to the fact that the Coulomb energy of the native protein is consistently lower than that of the decoys. The solvation free energy generally favors the decoys, although the total electrostatic free energy (sum of Coulomb and solvation terms) favors the native structure. The behavior of the solvation free energy is somewhat counterintuitive and, surprisingly, is not correlated with differences in the burial of polar area between native structures and decoys. Rather, the effect is due to a more favorable charge distribution in the native protein, which, as is discussed, will tend to decrease its interaction with the solvent. Our results thus suggest, in keeping with a number of recent studies, that electrostatic interactions may play an important role in determining the native topology of a folded protein. On this basis, a simplified scoring function is derived that combines a Coulomb term with a hydrophobic contact term. This function performs as well as the more complete free energy evaluation in distinguishing the native structure from misfolded decoys. Its computational efficiency suggests that it can be used in protein structure prediction applications, and that it provides a physically well-defined alternative to statistically derived scoring functions.

Extending the accuracy limits of prediction for side-chain conformations

Article

Sep 2001

Current techniques for the prediction of side-chain conformations on a fixed backbone have an accuracy limit of about 1.0-1.5 Å rmsd for core residues. We have carried out a detailed and systematic analysis of the factors that influence the prediction of side-chain conformation and, on this basis, have succeeded in extending the limits of side-chain prediction for core residues to about 0.7 Å rmsd from native, with 94% and 89% of chi(1) and chi(1+2) dihedral angles correctly predicted to within 20 degrees of native, respectively. These results are obtained using a force-field that accounts for only van der Waals interactions and torsional potentials. Prediction accuracy is strongly dependent on the rotamer library used. That is, a complete and detailed rotamer library is essential. The greatest accuracy was obtained with an extensive rotamer library, containing over 7560 members, in which bond lengths and bond angles were taken from the database rather than simply assuming idealized values. Perhaps the most surprising finding is that the combinatorial problem normally associated with the prediction of the side-chain conformation does not appear to be important. This conclusion is based on the fact that the prediction of the conformation of a single side-chain with all others fixed in their native conformations is only slightly more accurate than the simultaneous prediction of all side-chain dihedral angles.
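The "within 20 degrees of native" criterion can be made concrete with a small sketch. It assumes that predicted and native chi angles are supplied in degrees as one tuple per residue; passing one-element tuples scores chi(1) alone, and two-element tuples score chi(1+2), where both angles must be within tolerance. The function names are hypothetical, and the sketch ignores the 180-degree symmetry of some side chains, which a careful evaluation would need to handle.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, native, tolerance=20.0):
    """Fraction of residues whose chi angles all lie within `tolerance` of native.

    `predicted` and `native` are equal-length lists of tuples of chi angles,
    one tuple per residue.
    """
    if len(predicted) != len(native) or not native:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = 0
    for pred, nat in zip(predicted, native):
        # A residue counts as correct only if every listed chi angle is close.
        if all(angle_diff(p, n) <= tolerance for p, n in zip(pred, nat)):
            correct += 1
    return correct / len(native)
```

Taking the difference modulo 360 matters here: a predicted angle of -170 degrees and a native angle of 175 degrees differ by only 15 degrees, not 345.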

Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases

Article

Jan 2002

In this paper, an approach is described that combines multiple structure alignments and multiple sequence alignments to generate sequence profiles for protein families. First, multiple sequence alignments are generated from sequences that are closely related to each sequence of known three-dimensional structure. These alignments then are merged through a multiple structure alignment of family members of known structure. The merged alignment is used to generate a Hidden Markov Model for the family in question. The Hidden Markov Model can be used to search for new family members or to improve alignments for distantly related family members that already have been identified. Application of a profile generated for SH2 domains indicates that the Janus family of nonreceptor protein tyrosine kinases contains SH2 domains. This conclusion is strongly supported by the results of secondary structure-prediction programs, threading calculations, and the analysis of comparative models generated for these domains. One of the Janus kinases, human TYK2, has an SH2 domain that contains a histidine instead of the conserved arginine at the key phosphotyrosine-binding position, betaB5. Calculations of the pK(a) values of the betaB5 arginines in a number of SH2 domains and of the betaB5 histidine in a homology model of TYK2 suggest that this histidine is likely to be neutral around pH 7, thus indicating that it may have lost the ability to bind phosphotyrosine. If this indeed is the case, TYK2 may contain a domain with an SH2 fold that has a modified binding specificity.

Evaluating conformational free energies: The colony energy and its application to the problem of loop prediction

Article

Jun 2002

In this paper, we introduce a method to account for the shape of the potential energy curve in the evaluation of conformational free energies. The method is based on a procedure that generates a set of conformations, each with its own force-field energy, but adds a term to this energy that favors conformations that are close in structure (have a low rmsd) to other conformations. The sum of the force-field energy and rmsd-dependent term is defined here as the "colony energy" of a given conformation, because each conformation that is generated is viewed as representing a colony of points. The use of the colony energy tends to select conformations that are located in broad energy basins. The approach is applied to the ab initio prediction of the conformations of all of the loops in a dataset of 135 nonredundant proteins. By using an rmsd from native criterion based on the superposition of loop stems, the average rmsd of 5-, 6-, 7-, and 8-residue long loops is 0.85, 0.92, 1.23, and 1.45 Å, respectively. For 8-residue loops, 60 of 61 predictions have an rmsd of less than 3.0 Å. The use of the colony energy is found to improve significantly the results obtained from the potential function alone. (The loop prediction program, "Loopy," can be downloaded at http://trantor.bioc.columbia.edu.)
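The idea of rewarding conformations that have many structurally similar neighbors can be sketched as follows. The abstract does not give the functional form of the rmsd-dependent term, so the Boltzmann-style sum with a squared-rmsd penalty used below is an assumption chosen for illustration, not the published colony energy; the parameters `kT` and `scale` and the function name are likewise hypothetical.

```python
import math


def colony_energies(energies, rmsd, kT=0.6, scale=1.0):
    """Illustrative colony-style energy for each conformation.

    energies: force-field energy of each conformation.
    rmsd:     square matrix, rmsd[i][j] between conformations i and j.
    Each conformation's score pools the Boltzmann weights of all conformations,
    down-weighted by how far they are from it (assumed squared-rmsd penalty).
    """
    n = len(energies)
    result = []
    for i in range(n):
        terms = [-energies[j] / kT - (rmsd[i][j] / scale) ** 2 for j in range(n)]
        # Log-sum-exp keeps the sum numerically stable for large energies.
        m = max(terms)
        log_sum = m + math.log(sum(math.exp(t - m) for t in terms))
        result.append(-kT * log_sum)
    return result
```

With this form, a conformation with no neighbors keeps its own force-field energy, while a conformation surrounded by low-energy, low-rmsd neighbors receives a lower score, which is the qualitative behavior the abstract describes for broad energy basins.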

On the Role of Structural Information in Remote Homology Detection and Sequence Alignment: New Methods Using Hybrid Sequence Profiles

Article

Jan 2004

Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based entirely on structural alignments have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary and tertiary structure information. Sequence-based profiles are profiles whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles in which sequence-based profiles are used in loop/motif regions and structural information is used in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.

GRASP2: Visualization, Surface Properties, and Electrostatics of Macromolecular Structures and Sequences

Article

Feb 2003
METHOD ENZYMOL

The widespread use of the original version of GRASP revealed the importance of the visualization of physicochemical and structural properties on the molecular surface. This chapter describes a new version of GRASP that contains many new capabilities. In terms of analysis tools, the most notable new features are sequence and structure analysis and alignment tools and the graphical integration of sequence and structural information. Not all of the new GRASP2 capabilities could be described here, and more are continually being added. An on-line manual, details on obtaining the software, and technical notes about the program and the Troll software library can be found at the Honig laboratory Web site (http://trantor.bioc.columbia.edu).

Recommended publications

Protein structure prediction