Article

DALI: A network tool for protein structure comparison



... A DALI search [43] with the catalytic domain of BoGH13A Sus reveals structural similarity with α-amylases from Anoxybacillus sp. SK3-4 (Z-score of 35.2) and Geobacillus thermoleovorans (Z-score of 35.0) [44,45]. ...
... BoGH13A Sus does not hydrolyze α1,6 bonds, however (Fig. 2). Nonetheless, the most structurally related CBM48 with bound ligand to BoCBM48 according to the DALI server is that from the Escherichia coli branching enzyme (Z score: 12.8; PDB ID: 4LQ1 [94]) at the N terminus of the protein [43]. Another closely related CBM48 from the branching enzyme in Cyanothece sp. ...
Article
Full-text available
Members of the Bacteroidetes phylum in the human colon deploy an extensive number of proteins to capture and degrade polysaccharides. Operons devoted to glycan breakdown and uptake are termed polysaccharide utilization loci or PUL. The starch utilization system (Sus) is one such PUL and was initially described in Bacteroides thetaiotaomicron (Bt). BtSus is highly conserved across many species, except for its extracellular α-amylase, SusG. In this work, we show that the Bacteroides ovatus (Bo) extracellular α-amylase, BoGH13ASus, is distinguished from SusG in its evolutionary origin and its domain architecture and by being the most prevalent form in Bacteroidetes Sus. BoGH13ASus is the founding member of both a novel subfamily in the glycoside hydrolase family 13, GH13_47, and a novel carbohydrate-binding module, CBM98. The BoGH13ASus CBM98–CBM48–GH13_47 architecture differs from the CBM58 embedded within the GH13_36 of SusG. These domains adopt a distinct spatial orientation and invoke a different association with the outer membrane. The BoCBM98 binding site is required for Bo growth on polysaccharides and optimal enzymatic degradation thereof. Finally, the BoGH13ASus structure features bound Ca2+ and Mn2+ ions, the latter of which is novel for an α-amylase. Little is known about the impact of Mn2+ on gut bacterial function, much less on polysaccharide consumption, but Mn2+ addition to Bt expressing BoGH13ASus specifically enhances growth on starch. Further understanding of bacterial starch degradation signatures will enable more tailored prebiotic and pharmaceutical approaches that increase starch flux to the gut.
... The presence of disulfide bonds was detected with ChimeraX [52], using the "bond" function for every cysteine residue (select::name = "CYS"; bond sel). Dali [53] was used to identify the peptides displaying the most similar 3D structures within the PDB25 database, considering the results with z-score >2 as indicative of non-spurious structural overlap. ...
... The peptide was synthesized respecting the predicted knottin disulfide connectivity, thereby avoiding the formation of anomalous disulfide bonds that occurred in our previous solid-phase synthesis approach. Moreover, the peptide was C-terminally amidated (APCWPRGCFRDRDCCYGYQCSYRKCMRKR-NH2), reflecting one of the frequently observed post-translational modifications occurring in short molluscan bioactive peptides [54]. Of note, this modification could only be predicted in silico. In line with such observations, the structural comparisons performed with Dali [53] identified a close resemblance to several previously characterized knottins. These included several toxins from spiders (e.g., purotoxin-1 and -6 [59], psalmotoxin-1 [60], J-atracotoxin-HV1C [61] and ceratoxin-1 [62]), cone snails (e.g., the conotoxins GXIA [63], GS [64]) and scorpions (e.g., the U1-liotoxin-Lw1a [65]). ...
Article
Full-text available
Mussels (Mytilus spp.) tolerate infections much better than other species living in the same marine coastal environment thanks to a highly efficient innate immune system, which exploits a remarkable diversification of effector molecules involved in mucosal and humoral responses. Among these, antimicrobial peptides (AMPs) are subjected to massive gene presence/absence variation (PAV), endowing each individual with a potentially unique repertoire of defense molecules. The unavailability of a chromosome-scale assembly has so far prevented a comprehensive evaluation of the genomic arrangement of AMP-encoding loci, preventing an accurate ascertainment of the orthology/paralogy relationships among sequence variants. Here, we characterized the CRP-I gene cluster in the blue mussel Mytilus edulis, which includes about 50 paralogous genes and pseudogenes, mostly packed in a small genomic region within chromosome 5. We further reported the occurrence of widespread PAV within this family in the Mytilus species complex and provided evidence that CRP-I peptides likely adopt a knottin fold. We functionally characterized the synthetic peptide sCRP-I H1, assessing the presence of biological activities consistent with other knottins, revealing that mussel CRP-I peptides are unlikely to act as antimicrobial agents or protease inhibitors, even though they may be used as defense molecules against infections from eukaryotic parasites.
... Proteins that are structurally similar to SARS-CoV2 3CLpro were identified using the DALI (Distance matrix ALIgnment) (Holm and Sander, 1995; Holm, 2020a) server (http://ekhidna2.biocenter.helsinki.fi). ...
... Structural similarity is reported as Z-score, relative to the distribution of all-vs-all pairwise structural similarity scores in the queried structural database. A higher Z-score means the structures have higher structural similarity in their ordered regions (Holm and Sander, 1995). ...
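The idea of scoring a hit relative to a background distribution can be illustrated with a plain standard score. This is only a minimal sketch: the function name `z_score` and the background values are hypothetical, and the sketch assumes a simple mean and standard deviation, which is not a reproduction of how the DALI server itself derives its Z-scores.

```python
from statistics import mean, pstdev


def z_score(raw_score, background_scores):
    """Standardize raw_score against a background distribution of scores.

    Assumption: a plain (score - mean) / standard deviation, used here only
    to illustrate the concept of a Z-score.
    """
    mu = mean(background_scores)
    sigma = pstdev(background_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (raw_score - mu) / sigma


# Hypothetical background of pairwise similarity scores (made-up numbers).
background = [10.0, 12.0, 14.0, 16.0, 18.0]
print(round(z_score(20.0, background), 3))
```

A larger value means the query pair scores further above the typical pair in the background set.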
Article
Full-text available
Considering the significant impact of the recent COVID-19 outbreak, development of broad-spectrum antivirals is a high priority goal to prevent future global pandemics. Antiviral development processes generally emphasize targeting a specific protein from a particular virus. However, some antiviral agents developed for specific viral protein targets may exhibit broad spectrum antiviral activity, or at least provide useful lead molecules for broad spectrum drug development. There is significant potential for repurposing a wide range of existing viral protease inhibitors to inhibit the SARS-CoV2 3C-like protease (3CLpro). If effective even as relatively weak inhibitors of 3CLpro, these molecules can provide a diverse and novel set of scaffolds for new drug discovery campaigns. In this study, we compared the sequence- and structure-based similarity of SARS-CoV2 3CLpro with proteases from other viruses, and identified 22 proteases with similar active-site structures. This structural similarity, characterized by secondary-structure topology diagrams, is evolutionarily divergent within taxonomically related viruses, but appears to result from evolutionary convergence of protease enzymes between virus families. Inhibitors of these proteases that are structurally similar to the SARS-CoV2 3CLpro protease were identified and assessed as potential inhibitors of SARS-CoV2 3CLpro protease by virtual docking. Several of these molecules have docking scores that are significantly better than known SARS-CoV2 3CLpro inhibitors, suggesting that these molecules are also potential inhibitors of the SARS-CoV2 3CLpro protease. Some have been previously reported to inhibit SARS-CoV2 3CLpro. The results also suggest that established inhibitors of SARS-CoV2 3CLpro may be considered as potential inhibitors of other viral 3C-like proteases.
... is the observed intensity of reflection h, and <I(h)> is the average intensity obtained from multiple measurements. To investigate the structural novelty of AcrIF24, structural homologues were searched using the DALI server (51). The closest related structure picked by this server was Aca1 (52), having a Z-score of 6.9 and 2.5 Å root mean square deviation (RMSD) when superimposing 68 amino acids among 73 total amino acids of Aca1 with 72 amino acids among 228 total amino acids of AcrIF24 (Table 2). ...
... Structural similarity search using DALI (51) ...
Article
Full-text available
CRISPR-Cas systems are adaptive immune systems in bacteria and archaea that provide resistance against phages and other mobile genetic elements. To fight against CRISPR-Cas systems, phages and archaeal viruses encode anti-CRISPR (Acr) proteins that inhibit CRISPR-Cas systems. The expression of acr genes is controlled by anti-CRISPR-associated (Aca) proteins encoded within acr-aca operons. AcrIF24 is a recently identified Acr that inhibits the type I-F CRISPR-Cas system. Interestingly, AcrIF24 was predicted to be a dual-function Acr and Aca. Here, we elucidated the crystal structure of AcrIF24 from Pseudomonas aeruginosa and identified its operator sequence within the regulated acr-aca operon promoter. The structure of AcrIF24 has a novel domain composition, with wing, head and body domains. The body domain is responsible for recognition of promoter DNA for Aca regulatory activity. We also revealed that AcrIF24 directly bound to type I-F Cascade, specifically to Cas7 via its head domain as part of its Acr mechanism. Our results provide new molecular insights into the mechanism of a dual functional Acr-Aca protein.
... Structural comparison of kpChbG with its structural homologs. The proposed molecular mechanism of diacetylchitobiose deacetylation was investigated by comparison of dimeric kpChbG to its structural homologs using the Dali server 27. The two most similar proteins (highest Z-score and RMSD) were hypothetical protein EF3048 (PDB 2I5I), having a 35% sequence identity with kpChbG, and hypothetical protein TTHB029 (PDB 2E67) 16 (Table 2). ...
... Structural similarity search using the DALI server 27. ...
Article
Full-text available
The chitobiose (chb) operon is involved in the synthesis of chitooligosaccharide and is comprised of a BCARFG gene cluster. ChbG encodes a chitooligosaccharide deacetylase (CDA) which catalyzes the removal of one acetyl group from N,N’-diacetylchitobiose. It is considered a novel type of CDA due to its lack of sequence homology. Although there are various structural studies of CDAs linked to the kinetic properties of the enzyme, the structural information of ChbG is unavailable. In this study, the crystal structure of ChbG from Klebsiella pneumoniae is provided. The molecular basis of deacetylation of diacetylchitobiose by ChbG is determined based on structural analysis, mutagenesis, biophysical analysis, and in silico docking of the substrate, diacetylchitobiose. This study contributes towards a deeper understanding of chitin and chitosan biology, as well as provides a platform to engineer CDA biocatalysts.
... Methodologically, most studies are based on pairwise comparisons of structures, either by direct comparison of structures via structural alignment, or by comparison of vectors encoding their structural features (e.g. secondary structures, local features, atom density) (22–37). In most cases only the Cα atoms of the protein backbone are considered; the other atoms being ignored. ...
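One common Cα-only representation is the matrix of pairwise Cα–Cα distances within a structure. The sketch below shows how such a matrix could be built. The function name `distance_matrix` and the coordinates are hypothetical, and this illustrates the representation only, not the method of any study cited here.

```python
import math


def distance_matrix(ca_coords):
    """Return the symmetric matrix of pairwise distances between C-alpha atoms.

    ca_coords is a list of (x, y, z) tuples, one per residue, in angstroms.
    """
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) for j in range(n)]
        for i in range(n)
    ]


# Three made-up C-alpha positions, chosen so the distances are easy to check.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
for row in distance_matrix(coords):
    print([round(d, 1) for d in row])
```

Because the matrix depends only on internal distances, it does not change when the structure is rotated or translated, which is why such representations can be compared without first superimposing the structures.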
Article
Full-text available
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while approaches based on three-dimensional structure comparisons are still in their infancy. In this study, we propose a new effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in three-dimensional protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g., classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from ten major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
... The structural scores are as follows: an RMSD of 2.667 Å for 95 Cα atoms with a Z-score of 5.35. Interestingly, a Z-score between 3 and 8 is defined as the twilight zone for structural homology [44]. However, if we consider sequence (P-value 4.3E-05) and structure-based (Z-score 5.35) scorings and the significantly long alignment, we could imply a common evolutionary origin (at least locally) of both proteins. ...
Article
Full-text available
Modular assembly is a compelling pathway to create new proteins, a concept supported by protein engineering and millennia of evolution. Natural evolution provided a repository of building blocks, known as domains, which trace back to even shorter segments that underwent numerous ‘copy‐paste’ processes culminating in the scaffolds we see today. Utilizing the subdomain‐database Fuzzle, we constructed a fold‐chimera by integrating a flavodoxin‐like fragment into a periplasmic binding protein. This chimera is well‐folded and a crystal structure reveals stable interfaces between the fragments. These findings demonstrate the adaptability of α/β‐proteins and offer a stepping stone for optimization. By emphasizing the practicality of fragment databases, our work pioneers new pathways in protein engineering. Ultimately, the results substantiate the conjecture that periplasmic binding proteins originated from a flavodoxin‐like ancestor.
... Structural comparison with other proteins using the DALI server (25) revealed that the closest structural homologs of HspB were P. putida DLL-E4 PnpA (PDB ID 6AIO, Z score = 49.5, RMSD = 1.29 Å for 376 Cα atoms, and sequence identity = 38%), P. putida KT2440 p-Hydroxybenzoate hydroxylase (PDB ID 6DLL, Z score = 41.7, RMSD = 2.13 Å for 370 Cα atoms, and sequence identity = 19%), and Rhodococcus jostii RHA1 3-hydroxybenzoate 6-hydroxylase (PDB ID 4BJY, Z score = 37.3, RMSD = 1.99 Å for 364 Cα atoms, and sequence identity = 19%). ...
Article
Full-text available
Flavoprotein monooxygenases catalyze reactions, including hydroxylation and epoxidation, involved in the catabolism, detoxification, and biosynthesis of natural substrates and industrial contaminants. Among them, the 6-hydroxy-3-succinoyl-pyridine (HSP) monooxygenase (HspB) from Pseudomonas putida S16 facilitates the hydroxylation and C-C bond cleavage of the pyridine ring in nicotine. However, the mechanism for biodegradation remains elusive. Here, we refined the crystal structure of HspB and elucidated the detailed mechanism behind the oxidative hydroxylation and C-C cleavage processes. Leveraging structural information about domains for binding the cofactor flavin adenine dinucleotide (FAD) and HSP substrate, we used molecular dynamics simulations and quantum/molecular mechanics calculations to demonstrate that the transfer of an oxygen atom from the reactive FAD peroxide species (C4a-hydroperoxyflavin) to the C3 atom in the HSP substrate constitutes a rate-limiting step, with a calculated reaction barrier of about 20 kcal/mol. Subsequently, the hydrogen atom was rebounded to the FAD cofactor, forming C4a-hydroxyflavin. The residue Cys218 then catalyzed the subsequent hydrolytic process of C-C cleavage. Our findings contribute to a deeper understanding of the versatile functions of flavoproteins in the natural transformation of pyridine and HspB in nicotine degradation. IMPORTANCE Pseudomonas putida S16 plays a pivotal role in degrading nicotine, a toxic pyridine derivative that poses significant environmental challenges. This study highlights a key enzyme, HspB (6-hydroxy-3-succinoyl-pyridine monooxygenase), in breaking down nicotine through the pyrrolidine pathway. Utilizing dioxygen and a flavin adenine dinucleotide cofactor, HspB hydroxylates and cleaves the substrate’s side chain. Structural analysis of the refined HspB crystal structure, combined with state-of-the-art computations, reveals its distinctive mechanism. 
The crucial function of Cys218 was never discovered in its homologous enzymes. Our findings not only deepen our understanding of bacterial nicotine degradation but also open avenues for applications in both environmental cleanup and pharmaceutical development.
... Thus, a search for proteins of similar structures with these proteins predicted by RoseTTAFold was carried out by DALI, and the top hits had relatively high Z scores correlating with known MCPs and minor capsid proteins (mCPs) (Fig. 5). With a conventional Z score >2 considered significant [62], the two proteins received Z scores of 20 (ORF5) and 11.5 (ORF8) for the Faustovirus MCP and Cafeteriavirus-dependent mavirus mCP, respectively (Table S2). The protein of ORF5 matched the description of an idealized DJR fold, which forms a hexon in assembly of the viral capsid [63], and the protein of ORF8 had a high structural similarity with mCPs, which forms the penton. ...
Article
Twenty complete genomes (29 to 63 kb) and twenty-nine genomes with an estimated completeness of over 90% (30 to 90 kb) were identified for novel dsDNA viruses in the Yangshan Harbor metavirome. These newly discovered viruses contribute to the expansion of viral taxonomy by introducing forty-six potential new families. Except for one virus, all others belong to the class Caudoviricetes. The exceptional virus is a novel member of the recently characterized viral group known as Gossevirus. Fifteen viruses were predicted to be temperate. The predicted hosts for the viruses appear to be involved in various aspects of the nitrogen cycle, including nitrogen fixation, oxidation, and denitrification. Two viruses were identified to have a host of Flavobacterium and Tepidimonas fonticaldi, respectively, by matching CRISPR spacers with viral protospacers. Our findings provide an overview for characterizing and identifying specific viruses from Yangshan Harbor. The Gossevirus-like virus uncovered emphasizes the need for further comprehensive isolation and investigation of polinton-like viruses.
... Protein structures were predicted with local Colabfold (Mirdita et al., 2022), compared with the Dali online server (Holm & Sander, 1995), and visualized with ChimeraX (Pettersen et al., 2021). A Dali alignment Z score of over 2 was considered a significant match (Holm et al., 2006). ...
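Applying a Z score cutoff of this kind to a list of hits can be sketched as follows. The record layout (dictionaries with `id` and `z` keys), the function name `significant_hits`, and the hit values are all hypothetical; the sketch assumes the hits have already been parsed, and it does not describe the Dali server's output format.

```python
def significant_hits(hits, z_cutoff=2.0):
    """Keep hits whose Z score is strictly above z_cutoff, best first."""
    kept = [hit for hit in hits if hit["z"] > z_cutoff]
    return sorted(kept, key=lambda hit: hit["z"], reverse=True)


# Made-up hits for illustration only.
hits = [
    {"id": "hit_A", "z": 11.5},
    {"id": "hit_B", "z": 1.8},
    {"id": "hit_C", "z": 20.0},
    {"id": "hit_D", "z": 2.0},
]
for hit in significant_hits(hits):
    print(hit["id"], hit["z"])
```

The comparison is strict, so a hit with a Z score of exactly 2 is dropped, matching the "over 2" wording. A study that treats the cutoff as inclusive would use `>=` instead.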
Article
Full-text available
The genome of a putative Nitrosopumilaceae virus with a hypothetical spindle‐shaped particle morphology was identified in the Yangshan Harbour metavirome from the East China Sea through protein similarity comparison and structure analysis. This discovery was accompanied by a set of 10 geographically dispersed close relatives found in the environmental virus datasets from typical locations of ammonia‐oxidizing archaeon distribution. Its host prediction was supported by iPHoP prediction and protein sequence similarity. The structure of the predicted major capsid protein, together with the overall N‐glycosylation site, the transmembrane helices prediction, the hydrophilicity profile, and the docking simulation of the major capsid proteins, indicate that these viruses resemble spindle‐shaped viruses. It suggests a similarly assembled structure and, consequently, a possibly spindle‐shaped morphology of these newly discovered archaeal viruses.
... For structural comparisons, the DALI program was used (Holm and Sander 1995). For each structural superposition, we obtained both the RMSD values and the number of overlapping residues. ...
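The two quantities reported per superposition, the RMSD and the number of overlapping residues, can be computed from paired coordinates as in the sketch below. It assumes the two coordinate lists are already superimposed and already paired residue by residue; it does not perform the alignment or the optimal superposition. The function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired coordinate lists.

    Assumption: both lists are already superimposed and have equal length;
    entry i of coords_a is aligned with entry i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two made-up sets of aligned C-alpha positions, offset by 1 angstrom along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print("aligned residues:", len(set_a))
print("RMSD:", rmsd(set_a, set_b))
```

Reporting the aligned-residue count alongside the RMSD matters because a low RMSD over a short alignment is weaker evidence of similarity than the same RMSD over a long one.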
Article
Full-text available
The voltage-sensing domain (VSD) is a module capable of responding to changes in the membrane potential through conformational changes and facilitating electromechanical coupling to open a pore gate, activate proton permeation pathways, or promote enzymatic activity in some membrane-anchored phosphatases. To carry out these functions, this module acts cooperatively through conformational changes. The VSD is formed by four transmembrane segments (S1–S4) but the S4 segment is critical since it carries positively charged residues, mainly Arg or Lys, which require an aqueous environment for its proper function. The discovery of this module in voltage-gated ion channels (VGICs), proton channels (Hv1), and voltage sensor-containing phosphatases (VSPs) has expanded our understanding of the principle of modularity in the voltage-sensing mechanism of these proteins. Here, by sequence comparison and the evaluation of the relationship between sequence composition, intrinsic flexibility, and structural analysis in 14 selected representatives of these three major protein groups, we report five interesting differences in the folding patterns of the VSD both in prokaryotes and eukaryotes. 
Our main findings indicate that this module is highly conserved throughout the evolutionary scale, however: (1) segments S1 to S3 in eukaryotes are significantly more hydrophobic than those present in prokaryotes; (2) the S4 segment has retained its hydrophilic character; (3) in eukaryotes the extramembranous linkers are significantly larger and more flexible in comparison with those present in prokaryotes; (4) the sensors present in the kHv1 proton channel and the ciVSP phosphatase, both of eukaryotic origin, exhibit relationships of flexibility and folding patterns very close to the typical ones found in prokaryotic voltage sensors; and (5) archaeal channels KvAP and MVP have flexibility profiles which are clearly contrasting in the S3–S4 region, which could explain their divergent activation mechanisms. Finally, to elucidate the obscure origins of this module, we show further evidence for a possible connection between voltage sensors and TolQ proteins.
... Superposition of the crystal structure and the AlphaFold model resulted in an rmsd value of 0.63 Å, including 100 aligned pairs of Cα atoms. Superposition with the PPIase domains of PrsA and SurA yielded higher rmsd values than the AlphaFold model, as determined by the DALI program (38). PrsA from Bacillus anthracis (PDB entry 6XD8) and the BB0108 PPIase domain yielded an rmsd value of 1.4 Å over 94 aligned Cα atoms, and SurA from E. coli (PDB entry 1M5Y) yielded an rmsd value of 1.9 Å between 103 aligned Cα atoms. ...
Article
Full-text available
Borrelia burgdorferi, the pathogen of Lyme disease, encodes many conserved proteins of unknown structure or function, including ones that serve essential roles in microbial infectivity. One such protein is BB0238, which folds into a two-domain protein, as we have determined by X-ray crystallography and AlphaFold analysis. The N-terminal domain begins with a helix-turn-helix motif (HTH), previously referred to as a tetratricopeptide repeat (TPR) motif, known to mediate protein-protein interactions. The fold of the C-terminal domain has been seen in proteins with a range of unrelated activities and thus does not infer function. In addition to its previously known binding partner BB0323, another essential borrelial virulence determinant, we show that BB0238 also binds a second protein, BB0108, a borrelial ortholog of the chaperone protein SurA and the peptidyl-prolyl cis/trans isomerase protein PrsA. An in vitro enzymatic assay confirmed the catalytic activity. We also determined the crystal structure of the catalytic domain of BB0108, which revealed the parvulin-type organization of the key catalytic residues. We show that BB0238 influences the proteolytic processing of BB0323, although the TPR/HTH motif is not involved in the process. Instead, we show that the motif stabilizes BB0238 in the host environment and facilitates tick-to-mouse pathogen transmission by aiding spirochete evasion of early host cellular immunity. Taken together, these studies highlight the biological significance of BB0238 and its interactions with multiple B. burgdorferi proteins essential for microbial infection. IMPORTANCE Lyme disease is a major tick-borne infection caused by a bacterial pathogen called Borrelia burgdorferi, which is transmitted by ticks and affects hundreds of thousands of people every year.
These bacterial pathogens are distinct from other genera of microbes because of their distinct features and ability to transmit a multi-system infection to a range of vertebrates, including humans. Progress in understanding the infection biology of Lyme disease, and thus advancements towards its prevention, are hindered by an incomplete understanding of the microbiology of B. burgdorferi, partly due to the occurrence of many unique borrelial proteins that are structurally unrelated to proteins of known functions yet are indispensable for pathogen survival. We herein report the use of diverse technologies to examine the structure and function of a unique B. burgdorferi protein, annotated as BB0238, an essential virulence determinant. We show that the protein is structurally organized into two distinct domains, is involved in multiplex protein-protein interactions, and facilitates tick-to-mouse pathogen transmission by aiding microbial evasion of early host cellular immunity. We believe that our findings will further enrich our understanding of the microbiology of B. burgdorferi, potentially impacting the future development of novel prevention strategies against a widespread tick-transmitted infection.
... Secondary structure predictions were performed using PsiPred (Jones, 1999). The final UBL alignment was obtained using a combination of profile-to-profile comparisons (Soding et al, 2005) and sequence alignments derived from structural superpositions of a selection of UBL domains whose tertiary structure is known (PDB IDs: 1WFY, 5J2R, 2D07, and 3EEC) (Holm & Sander, 1995). Figures were generated using Inkscape (http://inkscape.org/). ...
Article
Full-text available
FAM111A is a replisome-associated protein and dominant mutations within its trypsin-like peptidase domain are linked to severe human developmental syndrome, the Kenny–Caffey syndrome. However, FAM111A functions remain unclear. Here, we show that FAM111A facilitates efficient activation of DNA replication origins. Upon hydroxyurea treatment, FAM111A-depleted cells exhibit reduced single-stranded DNA formation and a better survival rate. Unrestrained expression of FAM111A WT and patient mutants causes accumulation of DNA damage and cell death, only when the peptidase domain remains intact. Unrestrained expression of FAM111A WT also causes increased single-stranded DNA formation that relies on S phase entry, FAM111A peptidase activity but not its binding to proliferating cell nuclear antigen. Altogether, these data unveil how FAM111A promotes DNA replication under normal conditions and becomes harmful in a disease context.
... Domains IV and V (residues 543-715 and 716-789 in Vip3Aa, respectively) of Vip3 proteins have a twisted beta-sheet 'jelly roll' topology resembling that of Domain III of Cry proteins, which has been implicated in target specificity and glycan binding (Byrne et al., 2021). In silico DALI analysis (Holm and Sander, 1995) of these domains for the Vip3Aa, Vip3Ba and Vip3Bc proteins identified a number of structural homologs with glycan binding capabilities (Byrne et al., 2021; Núñez-Ramírez et al., 2020; Zheng et al., 2019). It has then been reasonably suggested that these domains may play a role in receptor engagement. ...
... We used a dataset of 1 million domain pairs randomly sampled from family and superfamily levels of SCOP (v2.08) classification to generate three sets of structure alignment datasets (D_3D) from three alignment programs: (1) MMLigner (Collier et al., 2017), (2) TM-Align (Zhang and Skolnick, 2005), and (3) DALI (Holm and Sander, 1995). For each structure alignment in all three datasets, first, we inferred the divergence time of sequences t_1D using the SeqMMLigner (Sumanaweera et al., 2019) program. ...
Preprint
Full-text available
A complete time-parameterized statistical model quantifying the divergent evolution of protein structures in terms of the patterns of conservation of their secondary structures is inferred from a large collection of protein 3D structure alignments. This provides a better alternative to time-parameterized sequence-based models of protein relatedness, that have clear limitations dealing with twilight and midnight zones of sequence relationships. Since protein structures are far more conserved due to the selection pressure directly placed on their function, divergence time estimates can be more accurate when inferred from structures. We use the Bayesian and information-theoretic framework of Minimum Message Length to infer a time-parameterized stochastic matrix (accounting for perturbed structural states of related residues) and associated Dirichlet models (accounting for insertions and deletions during the evolution of protein domains). These are used in concert to estimate the Markov time of divergence of tertiary structures, a task previously only possible using proxies (like RMSD). By analyzing one million pairs of homologous structures, we yield a relationship between the Markov divergence time of structures and of sequences. Using these inferred models and the relationship between the divergence of sequences and structures, we demonstrate a competitive performance in secondary structure prediction against neural network architectures commonly employed for this task. The source code and supplementary information are downloadable from http://lcb.infotech.monash.edu.au/sstsum.
... The AF2 analysis predicted a high-confidence interaction between the N-terminal HEAT repeats of RIF1 and the C terminus of SHLD3. The C-terminal half of SHLD3 is predicted to form a globular domain with structural homology to the translation initiation factor eIF4E (DALI search; Holm & Sander, 1995). All five models scored highly, with pDockQ scores ≥ 0.5, mean interface PAE ≤ 11.6 Å, and iPTM scores ≥ 0.7, a range of values that discriminates between accurate and inaccurate predictions by previous benchmarking studies (Fig 1C; Bryant et al, 2022; Yin et al, 2022). ...
Article
Full-text available
53BP1 is a chromatin-binding protein that promotes DNA double-strand break repair through the recruitment of downstream effectors including RIF1, shieldin, and CST. The structural basis of the protein-protein interactions within the 53BP1-RIF1-shieldin-CST pathway that are essential for its DNA repair activity is largely unknown. Here, we used AlphaFold2-Multimer (AF2) to predict all possible pairwise combinations of proteins within this pathway and provide structural models of seven previously characterized interactions. This analysis also predicted an entirely novel binding interface between the HEAT-repeat domain of RIF1 and the eIF4E-like domain of SHLD3. Extensive interrogation of this interface through both in vitro pulldown analysis and cellular assays supports the AF2-predicted model and demonstrates that RIF1-SHLD3 binding is essential for shieldin recruitment to sites of DNA damage, and for its role in antibody class switch recombination and PARP inhibitor sensitivity. Direct physical interaction between RIF1 and SHLD3 is therefore essential for 53BP1-RIF1-shieldin-CST pathway activity.
... Protein structure-based searches were performed using the Dali server (http://ekhidna.biocenter.helsinki.fi/dali/) (76). ...
Article
Full-text available
Psychrobacter is an important bacterial genus that is widespread in Antarctic and marine environments. However, to date, only two complete Psychrobacter phage sequences have been deposited in the NCBI database. Here, the novel Psychrobacter phage vB_PmaS_Y8A, infecting Psychrobacter HM08A, was isolated from sewage in the Qingdao area, China. The morphology of vB_PmaS_Y8A was characterized by transmission electron microscopy, revealing an icosahedral head and long tail. The genomic sequence of vB_PmaS_Y8A is linear, double-stranded DNA with a length of 40,226 bp and 44.1% G+C content, and encodes 69 putative open reading frames. Two auxiliary metabolic genes (AMGs) were identified, encoding phosphoadenosine phosphosulfate reductase and MarR protein. The first AMG uses thioredoxin as an electron donor for the reduction of phosphoadenosine phosphosulfate to phosphoadenosine phosphate. MarR regulates multiple antibiotic resistance mechanisms in Escherichia coli and is rarely found in viruses. No tRNA genes were identified and no lysogeny-related feature genes were detected. However, many similar open reading frames (ORFs) were found in the host genome, which may indicate that Y8A also has a lysogenic stage. Phylogenetic analysis based on the amino acid sequences of whole genomes and comparative genomic analysis indicate that vB_PmaS_Y8A contains a novel genomic architecture similar only to that of Psychrobacter phage pOW20-A, although at a low similarity. vB_PmaS_Y8A represents a new family-level virus cluster with 22 metagenomic assembled viral genomes, here named Minviridae. IMPORTANCE Although Psychrobacter is a well-known and important bacterial genus that is widespread in Antarctic and marine environments, genetic characterization of its phages is still rare. This study describes a novel Psychrobacter phage containing an uncharacterized antibiotic resistance gene and representing a new virus family, Minviridae.
The characterization provided here will bolster current understanding of genomes, diversity, evolution, and phage-host interactions in Psychrobacter populations.
... Using pairwise Dali [27] structural alignment of the crystal structures of NMA1982 and human DUSPs, we found that cyclin-dependent kinase inhibitor 3 (CDKN3), also known as kinase-associated phosphatase (KAP), is the most similar human phosphatase to NMA1982, with an amino acid identity of 15% and a Dali Z-score of 14.9 (Fig 2a). Importantly, several conserved loops and amino acid residues critical for phosphatase activity [13] could be identified in NMA1982 (Fig 2b): 1) The phosphate-binding loop (P-loop), which forms the center of the active site, is present in NMA1982, albeit shorter by one amino acid. ...
Preprint
Full-text available
Protein phosphorylation is an integral part of many cellular processes, not only in eukaryotes but also in bacteria. The discovery of both prokaryotic protein kinases and phosphatases has created interest in generating antibacterial therapeutics that target these enzymes. NMA1982 is a putative phosphatase from Neisseria meningitidis, the causative agent of meningitis and meningococcal septicemia. The overall fold of NMA1982 closely resembles that of protein tyrosine phosphatases (PTPs). However, the hallmark C(X)5R PTP signature motif, containing the catalytic cysteine and invariant arginine, is shorter by one amino acid in NMA1982. This has cast doubt about the catalytic mechanism of NMA1982 and its assignment to the PTP superfamily. Here, we demonstrate that NMA1982 indeed employs a catalytic mechanism that is specific to PTPs. Mutagenesis experiments, transition state inhibition, pH-dependence activity, and oxidative inactivation experiments all support that NMA1982 is a genuine phosphatase. Importantly, we show that NMA1982 is secreted by N. meningitidis, suggesting that this protein is a potential virulence factor. Future studies will need to address whether NMA1982 is indeed essential for N. meningitidis survival and virulence. Based on its unique active site conformation, NMA1982 may become a suitable target for developing selective antibacterial drugs.
... It contains a complex four-domain architecture: a β-sandwich domain extends into a smaller two α-helical linker domain connecting to an (α/α)6-barrel domain before terminating in a smaller β-sandwich domain (Fig. 5A). The top hits from a DALI analysis indicate a few close structural homologues in the PDB, giving Z scores in the range of 44.6 to 30.7 (58). For instance, chitobiose phosphorylase, ChBP (PDB ID: 1V7V/1V7W, Z score 44.6), cellobiose phosphorylase, CBP (PDB ID: 2CQS, Z score 43.4), laminaribiose phosphorylase, LBP (PDB ID: 6GGY, Z score 32.6), and cellodextrin phosphorylase, RtCDP (PDB ID: 5NZ7/5NZ8, Z score 30.7) shared 36%, 33%, 18%, and 23% sequence identity to BpGH94 MLG, respectively. ...
Article
The β-glucans are structurally varied, naturally occurring components of the cell walls and storage materials of a variety of plant and microbial species. In the human diet, mixed-linkage glucans [MLG - β-(1,3/4)-glucans] influence the gut microbiome and the host immune system. Although consumed daily, the molecular mechanism by which human gut Gram-positive bacteria utilize MLG largely remains unknown. In this study, we used Blautia producta ATCC 27340 as a model organism to develop understanding of MLG utilization. B. producta encodes a gene locus comprising a multi-modular cell-anchored endo-glucanase (BpGH16MLG), an ABC transporter, and a glycoside phosphorylase (BpGH94MLG) for utilizing MLG, as evidenced by the up-regulation of expression of the enzyme- and solute binding protein (SBP)-encoding genes in this cluster when the organism is grown on MLG. We determined that recombinant BpGH16MLG cleaved various types of β-glucan, generating oligosaccharides suitable for cellular uptake by B. producta. Cytoplasmic digestion of these oligosaccharides is then performed by recombinant BpGH94MLG and β-glucosidases (BpGH3-AR8MLG and BpGH3-X62MLG). Using targeted deletion, we demonstrated BpSBPMLG is essential for B. producta growth on barley β-glucan. Furthermore, we revealed that beneficial bacteria, such as Roseburia faecis JCM 17581T, Bifidobacterium pseudocatenulatum JCM 1200T, Bifidobacterium adolescentis JCM 1275T, and Bifidobacterium bifidum JCM 1254, can also utilize oligosaccharides resulting from the action of BpGH16MLG. Disentangling the β-glucan utilizing capability of B. producta provides a rational basis on which to consider the probiotic potential of this class of organism.
... To identify structural homologues of CshA/B_NR2, searches were conducted using the DALI web server 35 employing the monomeric forms of CshA_NR2 and CshB_NR2 as search models. Known adhesive domains with a DALI Z score of >6.0 were selected for comparative structural analysis. ...
Article
Full-text available
Bacterial fibrillar adhesins are specialised extracellular polypeptides that promote the attachment of bacteria to the surfaces of other cells or materials. Adhesin-mediated interactions are critical for the establishment and persistence of stable bacterial populations within diverse environmental niches and are important determinants of virulence. The fibronectin (Fn) binding fibrillar adhesin CshA, and its paralogue CshB, play important roles in host colonisation by the oral commensal and opportunistic pathogen Streptococcus gordonii. As paralogues are often catalysts for functional diversification, we have probed the early stages of structural and functional divergence in Csh proteins by determining the X-ray crystal structure of the CshB adhesive domain NR2 and characterising its Fn binding properties in vitro. Despite sharing a common fold, CshB_NR2 displays an ~1.7-fold reduction in Fn binding affinity relative to CshA_NR2. This correlates with reduced electrostatic charge in the Fn binding cleft. Complementary bioinformatic studies reveal that homologues of CshA/B_NR2 domains are widely distributed in both Gram-positive and Gram-negative bacteria, where they are found housed within functionally cryptic multi-domain polypeptides. Our findings are consistent with the classification of Csh adhesins and their relatives as members of the recently defined Polymer Adhesin Domain (PAD) family of bacterial proteins.
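The Z-score selection described in the CshA/B_NR2 excerpt above (keeping hits with a DALI Z score of >6.0) can be sketched in a few lines of Python. This is only an illustration: the `Hit` type, the `select_hits` helper, and the example names and values are hypothetical placeholders, not DALI server output or data from the study.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One structural-comparison hit (hypothetical record layout)."""
    name: str
    z_score: float


def select_hits(hits: List[Hit], z_cutoff: float = 6.0) -> List[Hit]:
    """Keep hits whose Z score is strictly above the cutoff, best first."""
    kept = [h for h in hits if h.z_score > z_cutoff]
    return sorted(kept, key=lambda h: h.z_score, reverse=True)


# Placeholder values chosen only to exercise the filter.
hits = [
    Hit("hit_a", 7.2),
    Hit("hit_b", 12.5),
    Hit("hit_c", 6.0),   # exactly at the cutoff, so it is excluded
    Hit("hit_d", 3.1),
]
selected = select_hits(hits)
for hit in selected:
    print(hit.name, hit.z_score)
```

The comparison is strict to match the ">6.0" wording of the excerpt; a different cutoff can be passed as `z_cutoff`.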
... Protein structure determines its function, and proteins with similar structures usually share similar functions even when their sequence similarities are very low (Brenner et al. 1996;Holm and Sander 1996;Rost 1999). Therefore, structure-based methods detect the structure similarity between proteins to determine the functions of target proteins (Holm and Sander 1995;Gibrat et al. 1996;Laskowski et al. 2005). However, it is expensive to determine protein structures, and the amount of protein structure data is small. ...
Article
Full-text available
Motivation: Protein function annotation is fundamental to understanding biological mechanisms. The abundant genome-scale protein–protein interaction (PPI) networks, together with other protein biological attributes, provide rich information for annotating protein functions. As PPI networks and biological attributes describe protein functions from different perspectives, it is highly challenging to cross-fuse them for protein function prediction. Recently, several methods combine the PPI networks and protein attributes via the graph neural networks (GNNs). However, GNNs may inherit or even magnify the bias caused by noisy edges in PPI networks. Besides, GNNs with stacking of many layers may cause the over-smoothing problem of node representations. Results: We develop a novel protein function prediction method, CFAGO, to integrate single-species PPI networks and protein biological attributes via a multi-head attention mechanism. CFAGO is first pre-trained with an encoder–decoder architecture to capture the universal protein representation of the two sources. It is then fine-tuned to learn more effective protein representations for protein function prediction. Benchmark experiments on human and mouse datasets show CFAGO outperforms state-of-the-art single-species network-based methods by at least 7.59%, 6.90%, 11.68% in terms of m-AUPR, M-AUPR, and Fmax, respectively, demonstrating cross-fusion by multi-head attention mechanism can greatly improve the protein function prediction. We further evaluate the quality of captured protein representations in terms of Davies Bouldin Score, whose results show that cross-fused protein representations by multi-head attention mechanism are at least 2.7% better than that of original and concatenated representations. We believe CFAGO is an effective tool for protein function prediction. Availability and implementation: The source code of CFAGO and experiments data are available at: http://bliulab.net/CFAGO/.
... The function of the β-arm and its location in the conserved catalytic core begins to highlight its importance within other PLP-dependent enzymes. Among these enzymes, only the FT I CoA-acyltransferase and aminotransferase II subfamilies feature the β-arm based on known structural homology (Table S1; Holm & Sander, 1995). Understanding how the structural dynamics of this region are controlled may yield insight into the regulation of a broader group of enzymes. ...
Article
Full-text available
5‐Aminolevulinic acid synthase (ALAS) is a pyridoxal 5′‐phosphate (PLP)‐dependent enzyme that catalyzes the first and rate‐limiting step of heme biosynthesis in α‐proteobacteria and several non‐plant eukaryotes. All ALAS homologs contain a highly conserved catalytic core, but eukaryotes also have a unique C‐terminal extension that plays a role in enzyme regulation. Several mutations in this region are implicated in multiple blood disorders in humans. In Saccharomyces cerevisiae ALAS (Hem1), the C‐terminal extension wraps around the homodimer core to contact conserved ALAS motifs proximal to the opposite active site. To determine the importance of these Hem1 C‐terminal interactions, we determined the crystal structure of S. cerevisiae Hem1 lacking the terminal 14 amino acids (Hem1 ΔCT). With truncation of the C‐terminal extension, we show structurally and biochemically that multiple catalytic motifs become flexible, including an antiparallel β‐sheet important to Fold‐Type I PLP‐dependent enzymes. The changes in protein conformation result in an altered cofactor microenvironment, decreased enzyme activity and catalytic efficiency, and ablation of subunit cooperativity. These findings suggest that the eukaryotic ALAS C‐terminus has a homolog‐specific role in mediating heme biosynthesis, indicating a mechanism for autoregulation that can be exploited to allosterically modulate heme biosynthesis in different organisms.
... However, the comparison method remains fairly global. It has been used to classify proteins (the classification is named FSSP (Holm and Sander, 1996b)) and it is made available on a server (Holm and Sander, 1995b). Note that in its server version, DALI additionally uses pre-filters based on secondary structures (Holm and Sander, 1995a), which reduces the sensitivity of the search, since only the database structures that meet the SSE criteria of the query structure are examined. ...
Thesis
Identifying structural similarities in proteins provides a wealth of information about the relationships between sequences, structures and functions. A method for searching for local structural similarities in a structure database was developed. It made it possible to establish a procedure for classifying structures into families. The structural blocks conserved among structures of the same family can then be defined. Three methods for multiple, local structure comparison were implemented, each meeting different criteria. They were used to determine local structural blocks common to the proteins of a given family. These common structural blocks will constitute a relevant set of cores intended to be integrated into fold recognition (threading) software. This method of predicting protein structures from their sequences alone can be used to annotate unknown sequences, since the similarities identified may be undetectable by methods that use sequence information only. Analysis of these structural blocks will also provide considerable information about the conservation of protein structure.
... Alternatively, HHpred [29] was used to annotate the putative ORFs. In addition, AlphaFold2 [30,31], in combination with the Dali server [32,33], was also applied to predict the structural proteins. The circular genome maps were drawn with Proksee (https://proksee. ...
Article
Full-text available
Background: Along with the fast development and urbanization in developing countries, the waterbodies beside the growing cities become heavily polluted and highly eutrophic, thus leading to the seasonal outbreak of cyanobacterial bloom. Systematic isolation and characterization of freshwater cyanophages might provide a biological solution to control the blooms. However, genomic sequences and related investigations on the freshwater cyanophages remain very limited to date. Results: Following our recently reported five cyanophages Pam1~Pam5 from Lake Chaohu in China, here we isolated another five cyanophages, termed Pan1~Pan5, which infect the cyanobacterium Pseudanabaena sp. Chao 1811. Whole-genome sequencing showed that they all contain a double-stranded DNA genome of 37.2 to 72.0 kb in length, with less than half of the putative open reading frames annotated with known functions. Remarkably, the siphophage Pan1 encodes an auxiliary metabolic gene phoH and constitutes, together with the host, a complete queuosine modification pathway. Proteomic analyses revealed that although Pan1~Pan5 are distinct from each other in evolution, Pan1 and Pan3 are somewhat similar to our previously identified cyanophages Pam3 and Pam1 at the genomic level, respectively. Moreover, phylogenetic analyses suggested that Pan1 resembles the α-proteobacterial phage vB_DshS-R5C, revealing direct evidence for phage-mediated horizontal gene transfer between cyanobacteria and α-proteobacteria. Conclusion: In addition to the previous reports of Pam1~Pam5, the present findings on Pan1~Pan5 largely enrich the library of reference freshwater cyanophages. The abundant genomic information provides a pool to identify novel genes and proteins of unknown function. Moreover, we found for the first time the evolutionary traces in the cyanophage that horizontal gene transfer might occur at the level of not only inter-species, but even inter-phylum.
It indicates that the bacteriophage or cyanophage could be developed as a powerful tool for gene manipulation among various species or phyla.
... Another possibility would be to elaborate or extract new variables that could help distinguish these superfamilies. We can also explore new variables using other structural alignment programs, such as DALI (Holm and Sander, 1995), Fast (Zhu and Weng, 2005), and Mammoth (Ortiz et al., 2002). ...
Preprint
Full-text available
Remote homolog detection is a classic problem in Bioinformatics. It attempts to identify distantly related proteins sharing a similar structure. Methods that can accurately detect remote homologs benefit protein functional annotation. Recent computational advances in methods predicting the three-dimensional structure of a protein from amino acid sequences allow the massive use of structural data to develop new tools for identifying remote homologs. In this work, we created a discriminative SVM-based method based on structural alignment algorithms (FATCAT, TM-Align, and LovoAlign) to detect whether a protein is a remote homolog with any proteins in the SCOPe database. The final model showed a ROC AUC of 0.9191.
... Despite the advantages of mmCIF, for legacy reasons, the PDB format is still the only supported format for many bioinformatics applications ranging from side-chain packing (Huang, et al., 2020;) and tertiary structure prediction (Zheng, et al., 2019) to structure alignment (Holm and Sander, 1995;Shindyalov and Bourne, 1998) and function prediction (Laskowski, et al., 2005;Zhang, et al., 2017). Even for some programs that support both mmCIF and PDB formats, PDB is still the preferred format due to smaller input size and faster file reading speed thanks to its fixed-width nature. ...
Preprint
Full-text available
Although mmCIF is the current official format for deposition of protein and nucleic acid structures to the Protein Data Bank (PDB) database, the legacy PDB format is still the primary supported format for many structural bioinformatics tools. Therefore, reliable software to convert mmCIF structure files to PDB files is needed. Unfortunately, existing conversion programs fail to correctly convert many mmCIF files, especially those with many atoms and/or long chain identifiers. This study proposed BeEM, which converts any mmCIF format structure files to PDB format. BeEM conversion faithfully retains all atomic and chain information, including chain IDs with more than 2 characters, which are not supported by any existing mmCIF to PDB converters. The conversion speed of BeEM is at least ten times faster than existing converters such as MAXIT and Phenix. BeEM is available under the BSD licence at https://github.com/kad-ecoli/BeEM/ .
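To illustrate why long chain identifiers are a problem for the legacy format, here is a minimal Python sketch of writing and reading a fixed-width PDB ATOM record, in which the chain ID occupies a single column (column 22). It is not BeEM's implementation; `format_atom_record` and `parse_atom_record` are hypothetical helper names, only a subset of the record's fields is handled, and the coordinates are placeholder values.

```python
def format_atom_record(serial, atom_name, res_name, chain_id, res_seq, x, y, z):
    """Write the first 54 columns of a legacy PDB ATOM record."""
    if len(chain_id) != 1:
        # The fixed-width layout reserves exactly one column for the chain ID.
        raise ValueError("legacy PDB ATOM records hold a one-character chain ID")
    return "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
        "ATOM", serial, atom_name, "", res_name, chain_id, res_seq, "", x, y, z
    )


def parse_atom_record(line):
    """Read fields back purely by column position (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Placeholder atom used only to round-trip one record.
line = format_atom_record(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom_record(line)
print(line)
print(atom["chain_id"], atom["x"], atom["y"], atom["z"])
```

Because every field is located by position, a two-character chain ID such as "AB" cannot be written without shifting the columns that follow, which is why the sketch rejects it instead of emitting a malformed line.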
... Solvent-accessible area and interface residue analysis, accessible and buried surface areas and ΔG for dimer dissociation were calculated using the PISA server [63][64][65]. Structures similar to Li-rLAO were identified using DALI [66][67][68][69], and the top hits were quantitatively compared to the structure of Li-rLAO using the SSM superpose option of Coot [54]. Ramachandran map for the structures was generated using Procheck and Rampage. ...
Article
Full-text available
Amino acid oxidases (AOs) are flavin adenine dinucleotide (FAD)-dependent dimeric enzymes that stereospecifically catalyse the deamination of an α-amino acid leading to an α-keto acid. Putative Leptospira interrogans recombinant L-amino acid oxidase (Li-rLAO; lacking 20 residues corresponding to the N-terminal signal sequence) was cloned, expressed, purified, and its three-dimensional structure was determined by X-ray crystallography at a resolution of 1.8 Å. The active site could be easily identified by the presence of electron density corresponding to a non-covalently bound FAD in both protomers of the dimeric enzyme. Structural analysis of Li-rLAO revealed that its polypeptide fold is similar to those of the previously determined homologous structures as available in the Protein Data Bank. However, a substrate-binding residue found at the active site of other previously determined homologous structures was not conserved in Li-rLAO, suggesting that its specificity may differ from those of earlier reported structures. Not surprisingly, Li-rLAO showed no activity for most amino acids and amines; it exhibited a low activity only with L-arginine as the substrate. The catalytic properties of Li-rLAO could be rationalized in terms of its three-dimensional structure.
... We further investigated MagR homologues in A. thaliana by adopting an approach based on the available three-dimensional structure of MagR from D. melanogaster. The structure of MagR was compared with all available AlphaFold predictions for the Arabidopsis proteome using the Dali tool (Holm and Sander, 1995). Four proteins were found to be highly similar to MagR: IscA-like 3 (At2g36260, Q8L8C0) with a Z-score of 18.1 and 51% identity, IscA-like 1 (At2g16710, Q8LBM4) with a Z-score of 14.7 and 51% identity, IscA-like 2 (At5g03905, Q8LCY2) with a Z-score of 13 and 26% identity, and cpIscA (At1g10500, Q9XIK3) with a Z-score of 12.3 and 28% identity. ...
Article
Iron-sulfur (Fe-S) clusters are involved in fundamental biological reactions and represent a highly regulated process involving a complex sequence of mitochondrial, cytosolic and nuclear-catalyzed protein-protein interactions. Iron-sulfur complex assembly (ISCA) scaffold proteins are involved in Fe-S cluster biosynthesis, nitrogen and sulfur metabolism. ISCA proteins are involved in abiotic stress responses and in the pigeon they act as a magnetic sensor by forming a magnetosensor (MagS) complex with cryptochrome (Cry). MagR gene exists in the genomes of humans, plants, and microorganisms and the interaction between Cry and MagR is highly conserved. Owing to the extensive presence of ISCA proteins in plants and the occurrence of homology between animal and human MagR with at least four Arabidopsis ISCAs and several ISCAs from different plant species, we believe that a mechanism similar to pigeon magnetoperception might be present in plants. We suggest that plant ISCA proteins, homologous of the animal MagR, are good candidates and could contribute to a better understanding of plant magnetic induction. We thus urge more studies in this regard to fully uncover the plant molecular mechanisms underlying MagR/Cry mediated magnetic induction and the possible coupling between light and magnetic induction.
... These diffracted to better than 2Å and allowed the structure of the unliganded protein to be solved using molecular replacement with the LaM domain of LARP3 (21). The DALI server (46) identified LARP7 (PDB 4WKR; 1.1Å RMSD) and LARP3 (PDB 1S29; 1.2Å RMSD) as the closest structural homologs (29,47). ...
Article
Full-text available
La-related proteins (LARPs) comprise a family of RNA-binding proteins involved in a wide range of posttranscriptional regulatory activities. LARPs share a unique tandem of two RNA-binding domains, La motif (LaM) and RNA recognition motif (RRM), together referred to as a La-module, but vary in member-specific regions. Prior structural studies of La-modules reveal they are pliable platforms for RNA recognition in diverse contexts. Here, we characterize the La-module of LARP1, which plays an important role in regulating synthesis of ribosomal proteins in response to mTOR signaling and mRNA stabilization. LARP1 has been well characterized functionally but no structural information exists for its La-module. We show that unlike other LARPs, the La-module in LARP1 does not contain an RRM domain. The LaM alone is sufficient for binding poly(A) RNA with submicromolar affinity and specificity. Multiple high-resolution crystal structures of the LARP1 LaM domain in complex with poly(A) show that it is highly specific for the RNA 3′-end, and identify LaM residues Q333, Y336 and F348 as the most critical for binding. Use of a quantitative mRNA stabilization assay and poly(A) tail-sequencing demonstrate functional relevance of LARP1 RNA binding in cells and provide novel insight into its poly(A) 3′ protection activity.
... Compared to TaGGR, SaGGR contains 60 extra amino acids at the C-terminus forming three α-helices that are considered part of the ligand-binding domain. A DALI search revealed similarities with members of the p-hydroxybenzoate hydroxylase (PHBH) flavin monooxygenase family which adopt a similar two-domain organization and structure, but lacking the 60 C-terminal amino acids found in SaGGR (Holm and Sander 1995;Xu et al. 2010;Sasaki et al. 2011). Notably, the PHBH family is known to have members with three-domain structures that are approximately 50-70 amino acids longer than the two-domain enzyme sequences (See (Sasaki et al. 2011) and references therein). ...
Article
Full-text available
Archaeal glycerophospholipids are the main constituents of the cytoplasmic membrane in the archaeal domain of life and fundamentally differ in chemical composition compared to bacterial phospholipids. They consist of isoprenyl chains ether-bonded to glycerol-1-phosphate. In contrast, bacterial glycerophospholipids are composed of fatty acyl chains ester-bonded to glycerol-3-phosphate. This largely domain-distinguishing feature has been termed the “lipid-divide”. The chemical composition of archaeal membranes contributes to the ability of archaea to survive and thrive in extreme environments. However, ether-bonded glycerophospholipids are not only limited to extremophiles and found also in mesophilic archaea. Resolving the structural basis of glycerophospholipid biosynthesis is a key objective to provide insights in the early evolution of membrane formation and to deepen our understanding of the molecular basis of extremophilicity. Many of the glycerophospholipid enzymes are either integral membrane proteins or membrane-associated, and hence are intrinsically difficult to study structurally. However, in recent years, the crystal structures of several key enzymes have been solved, while unresolved enzymatic steps in the archaeal glycerophospholipid biosynthetic pathway have been clarified providing further insights in the lipid-divide and the evolution of early life.
Article
CRISPR–Cas systems serve as adaptive immune systems in bacteria and archaea, protecting against phages and other mobile genetic elements. However, phages and archaeal viruses have developed countermeasures, employing anti-CRISPR (Acr) proteins to counteract CRISPR–Cas systems. Despite the revolutionary impact of CRISPR–Cas systems on genome editing, concerns persist regarding potential off-target effects. Therefore, understanding the structural and molecular intricacies of diverse Acrs is crucial for elucidating the fundamental mechanisms governing CRISPR–Cas regulation. In this study, we present the structure of AcrIIA28 from Streptococcus phage Javan 128 and analyze its structural and functional features to comprehend the mechanisms involved in its inhibition of Cas9. Our current study reveals that AcrIIA28 is a metalloprotein that contains Zn2+ and abolishes the cleavage activity of Cas9 only from Streptococcus pyogenes (SpyCas9) by directly interacting with the REC3 domain of SpyCas9. Furthermore, we demonstrate that the AcrIIA28 interaction prevents the target DNA from being loaded onto Cas9. These findings indicate the molecular mechanisms underlying AcrIIA28-mediated Cas9 inhibition and provide valuable insights into the ongoing evolutionary battle between bacteria and phages.
Article
Histidine kinases are key bacterial sensors that recognize diverse environmental stimuli. While mechanisms of phosphorylation and phosphotransfer by cytoplasmic kinase domains are relatively well-characterized, the ways in which extracytoplasmic sensor domains regulate activation remain mysterious. The Cpx envelope stress response is a conserved Gram-negative two-component system which is controlled by the sensor kinase CpxA. We report the structure of the Escherichia coli CpxA sensor domain (CpxA-SD) as a globular Per-ARNT-Sim (PAS)-like fold highly similar to that of Vibrio parahaemolyticus CpxA as determined by X-ray crystallography. Because sensor kinase dimerization is important for signaling, we used AlphaFold2 to model CpxA-SD in the context of its connected transmembrane domains, which yielded a novel dimer of PAS domains possessing a distinct dimer organization compared to previously characterized sensor domains. Gain of function cpxA∗ alleles map to the dimer interface, and mutation of other residues in this region also leads to constitutive activation. CpxA activation can be suppressed by mutations that restore inter-monomer interactions, suggesting that inhibitory interactions between CpxA-SD monomers are the major point of control for CpxA activation and signaling. Searching through hundreds of structural homologs revealed the sensor domain of Pseudomonas aeruginosa sensor kinase PfeS as the only PAS structure in the same novel dimer orientation as CpxA, suggesting that our dimer orientation may be utilized by other extracytoplasmic PAS domains. Overall, our findings provide insight into the diversity of the organization of PAS sensory domains and how they regulate sensor kinase activation.
Article
In vertebrates, DNA methyltransferase 1 (DNMT1) contributes to preserving DNA methylation patterns, ensuring the stability and heritability of epigenetic marks important for gene expression regulation and the maintenance of cellular identity. Previous structural studies have elucidated the catalytic mechanism of DNMT1 and its specific recognition of hemimethylated DNA. Here, using solution nuclear magnetic resonance spectroscopy and small-angle X-ray scattering, we demonstrate that the N-terminal region of human DNMT1, while flexible, encompasses a conserved globular domain with a novel α-helical bundle-like fold. This work expands our understanding of the structure and dynamics of DNMT1 and provides a structural framework for future functional studies in relation with this new domain.
Article
Full-text available
Protein phosphorylation is an integral part of many cellular processes, not only in eukaryotes but also in bacteria. The discovery of both prokaryotic protein kinases and phosphatases has created interest in generating antibacterial therapeutics that target these enzymes. NMA1982 is a putative phosphatase from Neisseria meningitidis, the causative agent of meningitis and meningococcal septicemia. The overall fold of NMA1982 closely resembles that of protein tyrosine phosphatases (PTPs). However, the hallmark C(X)5R PTP signature motif, containing the catalytic cysteine and invariant arginine, is shorter by one amino acid in NMA1982. This has cast doubt about the catalytic mechanism of NMA1982 and its assignment to the PTP superfamily. Here, we demonstrate that NMA1982 indeed employs a catalytic mechanism that is specific to PTPs. Mutagenesis experiments, transition state inhibition, pH-dependence activity, and oxidative inactivation experiments all support that NMA1982 is a genuine PTP. Importantly, we show that NMA1982 is secreted by N. meningitidis, suggesting that this protein is a potential virulence factor. Future studies will need to address whether NMA1982 is indeed essential for N. meningitidis survival and virulence. Based on its unique active site conformation, NMA1982 may become a suitable target for developing selective antibacterial drugs.
Preprint
Full-text available
Histidine kinases are key bacterial sensors that recognize diverse environmental stimuli. While mechanisms of phosphorylation and phosphotransfer by cytoplasmic kinase domains are relatively well-characterized, the ways in which extracytoplasmic sensor domains regulate activation remain mysterious. The Cpx envelope stress response is a conserved Gram-negative two-component system which is controlled by the sensor kinase CpxA. We report the structure of the Escherichia coli CpxA sensor domain (CpxA-SD) as a globular Per-ARNT-Sim (PAS)-like fold highly similar to that of Vibrio parahaemolyticus CpxA as determined by X-ray crystallography. Because sensor kinase dimerization is important for signaling, we used AlphaFold2 to model CpxA-SD in the context of its connected transmembrane domains, which yielded a novel dimer of PAS domains possessing a distinct dimer organization compared to previously characterized sensor domains. Gain of function cpxA∗ alleles map to the dimer interface, and mutation of other residues in this region also leads to constitutive activation. CpxA activation can be suppressed by mutations that restore inter-monomer interactions, suggesting that inhibitory interactions between CpxA-SD monomers are the major point of control for CpxA activation and signaling. Searching through hundreds of structural homologues revealed the sensor domain of Pseudomonas aeruginosa sensor kinase PfeS as the only PAS structure in the same novel dimer orientation as CpxA, suggesting that our dimer orientation may be utilized by other extracytoplasmic PAS domains. Overall, our findings provide insight into the diversity of the organization of PAS sensory domains and how they regulate sensor kinase activation.
Significance: Bacterial two-component systems play an essential role in sensing environmental cues, mitigating stress, and regulating virulence. We approach the study of a key Gram-negative sensor kinase, CpxA, with both classical methods in structural biology and genetic analysis and emerging protein-folding prediction software. This approach provides a holistic perspective on the structure and function of histidine kinases as proteins with modular and cellular compartment-spanning domain architectures. We report a novel organization of PAS domains in CpxA, highlighting the versatility and diversity of this sensory fold. Ultimately, these studies will facilitate the continued development of novel antimicrobials against sensor kinases, including CpxA, which is a previously studied target for antimicrobials.
Thesis
Full-text available
There are three hypotheses about the origin of RNA viruses. The first is the Reduction hypothesis, which proposes that viruses originated from parasitic cells that reduced their genetic material and size. The second is the Virus-first hypothesis, which suggests that viruses originated before cells. The third is the Escape hypothesis, which states that viruses originated from the escape of cellular elements. Contrary to the Virus-first hypothesis, the taxonomic distribution of RNA virus hosts indicates that they are of recent origin, since they mainly infect eukaryotes. In addition, some protein domains, such as the jelly roll, occur in both cellular and viral proteins, which supports the hypothesis of a cellular origin of viruses, or escape. The jelly roll domain is a structural fold consisting of eight antiparallel beta strands arranged in two sheets of four strands each, giving it a sandwich or wedge shape. Determining whether the cellular and viral jelly roll domains are homologous requires phylogenetic analysis. Because conservation at the amino acid sequence level is very low, traditional phylogenetic analyses are not very informative. Given that tertiary structure is usually better conserved than primary structure, the analyses were based on tertiary structure. This allowed determining that the jelly roll domains of viruses are homologous to those of cellular proteins, which would support the Escape hypothesis. Additionally, the dendrograms resulting from the comparisons of jelly roll domains of cellular and viral proteins allowed us to make inferences about the evolutionary history of the S and P subdomains of the VP2 protein of Birnaviridae, as well as of the tumor necrosis factor protein family and the mitogen derived from Yersinia.
Preprint
Full-text available
Protein phosphorylation is an integral part of many cellular processes, not only in eukaryotes but also in bacteria. The discovery of both prokaryotic protein kinases and phosphatases has created interest in generating antibacterial therapeutics that target these enzymes. NMA1982 is a putative phosphatase from Neisseria meningitidis, the causative agent of meningitis and meningococcal septicemia. The overall fold of NMA1982 closely resembles that of protein tyrosine phosphatases (PTPs). However, the hallmark C(X)5R PTP signature motif, containing the catalytic cysteine and invariant arginine, is shorter by one amino acid in NMA1982. This has cast doubt about the catalytic mechanism of NMA1982 and its assignment to the PTP superfamily. Here, we demonstrate that NMA1982 indeed employs a catalytic mechanism that is specific to PTPs. Mutagenesis experiments, transition state inhibition, pH-dependence activity, and oxidative inactivation experiments all support that NMA1982 is a genuine PTP. Importantly, we show that NMA1982 is secreted by N. meningitidis, suggesting that this protein is a potential virulence factor. Future studies will need to address whether NMA1982 is indeed essential for N. meningitidis survival and virulence. Based on its unique active site conformation, NMA1982 may become a suitable target for developing selective antibacterial drugs.
Article
Bacterial sugar kinase is a central enzyme for proper sugar degradation in bacteria, essential for survival and growth. Therefore, this enzyme family is a primary target for antibacterial drug development, with YdjH most preferring to phosphorylate higher-order monosaccharides with a carboxylate terminus. Sugar kinases express diverse specificity and functions, making specificity determination of this family a prominent issue. This study examines the YdjH crystal structure from Acinetobacter baumannii (abYdjH), which has an exceptionally high antibiotic resistance and is considered a superbug. Our structural and biochemical study revealed that abYdjH has a widely open lid domain and is a solution dimer. In addition, the putative active site of abYdjH was determined based on structural analysis, sequence comparison, and in silico docking. Finally, we proposed the active site-forming residues that determine various sugar specificities from abYdjH. This study contributes towards a deeper understanding of the phosphorylation process and bacterial sugar metabolism of YdjH family to design the next-generation antibiotics for targeting A. baumannii.
Article
Full-text available
Thioredoxin (Trx) is essential in a redox-control system, with many bacteria containing two Trxs: Trx1 and Trx2. Due to a Trx system’s critical function, Trxs are targets for novel antibiotics. Here, a 1.20 Å high-resolution structure of Trx2 from Acinetobacter baumannii (abTrx2), an antibiotic resistant pathogenic superbug, is elucidated. By comparing Trx1 and Trx2, it is revealed that the two Trxs possess similar activity, although Trx2 contains an additional N-terminal zinc-finger domain and exhibits more flexible properties in solution. Finally, it is shown that the Trx2 zinc-finger domain might be rotatable and that proper zinc coordination at the zinc-finger domain is critical to abTrx2 activity. This study enhances understanding of the Trx system and will facilitate the design of novel antibiotics.
Article
Monoamine oxidases (MAOs) play a key role in the breakdown of primary and secondary amines. In eukaryotic organisms, these enzymes are vital to the regulation of monoamine neurotransmitters and the degradation of dietary monoamines. MAOs have also been identified in prokaryotic species, although their role in these organisms is not well understood. Here, we report the biophysical and structural properties of a promiscuous, bacterial MAO from Corynebacterium ammoniagenes (caMAO). caMAO catalyzes the oxidation of a number of monoamine substrates including dopamine and norepinephrine, as well as exhibiting some activity with polyamine substrates such as cadaverine. The X-ray crystal structures of Michaelis complexes with seven substrates show that conserved hydrophobic interactions and hydrogen-bonding pattern (for polar substrates) allow the broad specificity range. The structure of caMAO identifies an unusual cysteine (Cys424) residue in the so-called "aromatic cage", which flanks the flavin isoalloxazine ring in the active site. Site-directed mutagenesis, steady-state kinetics in air-saturated buffer, and UV-vis spectroscopy revealed that Cys424 plays a role in the pH dependence and modulation of electrostatics within the caMAO active site. Notably, bioinformatic analysis shows a propensity for variation at this site within the "aromatic cage" of the flavin amine oxidase (FAO) superfamily. Structural analysis also identified the conservation of a secondary substrate inhibition site, present in a homologous member of the superfamily. Finally, genome neighborhood diagram analysis of caMAO in the context of the FAO superfamily allows us to propose potential roles for these bacterial MAOs in monoamine and polyamine degradation and catabolic pathways related to scavenging of nitrogen.
Article
Hepatitis E virus (HEV), a major cause of acute viral hepatitis, is a single-stranded, positive-sense RNA virus. As such, it encodes a 1700-residue replication polyprotein pORF1 that directs synthesis of new viral RNA in infected cells. Here we report extensive modeling with AlphaFold2 of the full-length pORF1, and its production by in vitro translation. From this, we give a detailed update on the breakdown into domains of HEV pORF1. We also provide evidence that pORF1's N-terminal domain is likely to oligomerize to form a dodecameric pore, homologously to what has been described for Chikungunya virus. Beyond providing accurate folds for its five domains, our work highlights that there is no canonical protease encoded in pORF1 and that flexibility in several functionally important regions rather than proteolytic processing may serve to regulate HEV RNA synthesis.
Article
Due to an increasing interest in immunity and signal transduction in teleost fish, important key signaling molecules associated with the immune response, including TRAF molecules, have been recently cloned and characterized. To better understand the role of TRAF4 in fish immune signaling and compare it with the human system, our study cloned the TRAF4 gene from the Antarctic yellowbelly rockcod Notothenia coriiceps (ncTRAF4) and purified the protein. Here, we report the first crystal structure of teleost fish TRAF4. Based on biochemical characterization, our findings elucidated the mechanisms through which signaling molecules gain cold adaptivity. Additionally, we identified a platelet receptor GPIbβ homolog in N. coriiceps (ncGPIbβ) and found that the “RRFERLFKEARRTS” region of this homolog directly binds to ncTRAF4, indicating that ncTRAF4 also recognizes the “RLXA” motif for receptor interactions and further TRAF4-mediated cellular signaling. Collectively, our findings provide novel insights into the mechanisms of TRAF4-mediated immune cell and platelet signaling in fish and the structural flexibility-mediated cold adaptiveness of signaling molecules.
Chapter
The study and understanding of proteins is a central topic in the biosciences. The interactions of proteins provide essential information about life. Therefore, many techniques have been developed for this analysis, including in vitro, in vivo, and in silico approaches. Although each technique has its advantages, in silico methods are a versatile alternative for analyzing proteins and their interactions with algorithm-based computer tools. Active sites are of great interest because of their significance in how the structure of a protein allows it to interact with another molecule. This chapter details some of the main techniques currently applied to study the active sites of proteins and the databases and tools where the information is available, such as the Protein Data Bank (PDB), the Dali server, the structural alignment program (SSAP), structural alignment of multiple proteins (STAMP), the catalytic site atlas (CSA), and the protein families database (Pfam). It also describes relevant information about some algorithms that have been developed based on machine learning, such as the PDBSiteScan program, patterns in nonhomologous tertiary structures (PINTS), genetic active site search (GASS), SiteMap, and the computed atlas of surface topography of proteins (CASTp). These programs provide reliable information about active sites and other interactions.
Article
Full-text available
Structure comparison and alignment are of fundamental importance in structural biology studies. We developed the first universal platform, US-align, to uniformly align monomer and complex structures of different macromolecules—proteins, RNAs and DNAs. The pipeline is built on a uniform TM-score objective function coupled with a heuristic alignment searching algorithm. Large-scale benchmarks demonstrated consistent advantages of US-align over state-of-the-art methods in pairwise and multiple structure alignments of different molecules. Detailed analyses showed that the main advantage of US-align lies in the extensive optimization of the unified objective function powered by efficient heuristic search iterations, which substantially improve the accuracy and speed of the structural alignment process. Meanwhile, the universal protocol fusing different molecular and structural types helps facilitate the heterogeneous oligomer structure comparison and template-based protein–protein and protein–RNA/DNA docking. US-align is a universal protocol for monomeric and oligomeric structural alignments of protein, RNA and DNA molecules, built on the coupling of a uniform TM-score objective function and the heuristic iterative searching algorithm.
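The TM-score objective mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard protein TM-score definition, with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 Å, and it assumes the two structures have already been superposed so that per-residue distances are available. The function names `tm_d0` and `tm_score` and the 0.5 Å floor on d0 are illustrative choices, not taken from the abstract, and the sketch does not perform US-align's heuristic alignment search.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 (in Å) for proteins.

    The 0.5 Å floor for short chains is an assumption of this sketch.
    """
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, norm_length):
    """TM-score for a fixed superposition.

    distances: distance (Å) between each aligned residue pair.
    norm_length: length of the structure used for normalization.
    """
    d0 = tm_d0(norm_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length
```

A perfect superposition of all residues gives a score of 1, and each aligned pair separated by exactly d0 contributes half of its maximum.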
Article
Full-text available
Microbial channelrhodopsins are light-gated ion channels widely used for optogenetic manipulation of neuronal activity. ChRmine is a bacteriorhodopsin-like cation channelrhodopsin (BCCR) more closely related to ion pump rhodopsins than other channelrhodopsins. ChRmine displays unique properties favorable for optogenetics including high light sensitivity, a broad, red-shifted activation spectrum, cation selectivity, and large photocurrents, while its slow closing kinetics impedes some applications. The structural basis for ChRmine function, or that of any other BCCR, is unknown. Here, we present cryo-EM structures of ChRmine in lipid nanodiscs in apo (opsin) and retinal-bound (rhodopsin) forms. The structures reveal an unprecedented trimeric architecture with a lipid filled central pore. Large electronegative cavities on either side of the membrane facilitate high conductance and selectivity for cations over protons. The retinal binding pocket structure suggests channel properties could be tuned with mutations and we identify ChRmine variants with ten-fold decreased and two-fold increased closing rates. A T119A mutant shows favorable properties relative to wild-type and previously reported ChRmine variants for optogenetics. These results provide insight into structural features that generate an ultra-potent microbial opsin and provide a platform for rational engineering of channelrhodopsins with improved properties that could expand the scale, depth, and precision of optogenetic experiments.
Article
Full-text available
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
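The maximal segment pair (MSP) score named in this abstract is the best score of any pair of equal-length, ungapped segments taken from the two sequences. The sketch below computes it exhaustively by scanning every diagonal with a running maximum. It is not BLAST's heuristic, which approximates this value far faster, and the match and mismatch values and the function name `msp_score` are assumptions made for illustration.

```python
def msp_score(seq_a, seq_b, match=5, mismatch=-4):
    """Exhaustive maximal segment pair score for two sequences.

    Each diagonal (fixed offset i - j) is scanned once, keeping the best
    running sum of an ungapped segment, reset to zero when it goes negative.
    """
    best = 0
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        i = max(offset, 0)
        j = i - offset
        running = 0
        while i < len(seq_a) and j < len(seq_b):
            step = match if seq_a[i] == seq_b[j] else mismatch
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
    return best
```

The scan costs time proportional to the product of the two sequence lengths, which is the cost that a word-based heuristic search avoids for database-scale comparisons.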
Article
Full-text available
The crystal structure of urease from Klebsiella aerogenes has been determined at 2.2 Å resolution and refined to an R factor of 18.2 percent. The enzyme contains four structural domains: three with novel folds playing structural roles, and an (alpha beta)8 barrel domain, which contains the bi-nickel center. The two active site nickels are 3.5 Å apart. One nickel ion is coordinated by three ligands (with low occupancy of a fourth ligand) and the second is coordinated by five ligands. A carbamylated lysine provides an oxygen ligand to each nickel, explaining why carbon dioxide is required for the activation of urease apoenzyme. The structure is compatible with a catalytic mechanism whereby urea ligates Ni-1 to complete its tetrahedral coordination and a hydroxide ligand of Ni-2 attacks the carbonyl carbon. A surprisingly high structural similarity between the urease catalytic domain and that of the zinc-dependent adenosine deaminase reveals a remarkable example of active site divergence.
Article
SRS (Sequence Retrieval System) is an information indexing and retrieval system designed for libraries with a flat file format such as the EMBL nucleotide sequence databank, the SwissProt protein sequence databank or the Prosite library of protein subsequence consensus patterns. SRS supports the data structure of these libraries by providing special indices for implementing lists of subentities (e.g. feature tables) or hierarchically structured data-fields (e.g. taxonomic classification). A language (ODD) has been designed for the convenient specification of library format and organization, representation of individual data-fields within the system (design of indices) and structuring other data needed during retrieval. This ensures flexibility required for coping with different library formats, which are subject to continuous change. Queries and inspection of retrieved entries can be performed from a user interface with pull-down menus and windows. SRS supports various input and output formats but is particularly well adapted to the GCG programs.
Article
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
Article
The FASTA program can search the NBRF protein sequence library (2.5 million residues) in less than 20 min on an IBM-PC microcomputer and unambiguously detect proteins that shared a common ancestor billions of years in the past. FASTA is both fast and selective because it initially considers only amino acid identities. Its sensitivity is increased not only by using the PAM250 matrix to score and rescore regions with large numbers of identities but also by joining initial regions. The results of searches with FASTA compare favorably with results using NWS-based programs that are 100 times slower. FASTA is slightly less sensitive but considerably more selective. It is not clear that NWS-based programs would be more successful in finding distantly related members of the G-protein-coupled receptor family. The joining step by FASTA to calculate the initn score is especially useful for sequences that share regions of sequence similarity that are separated by variable-length loops.
Article
To determine how different amino acid sequences form similar protein structures, and how proteins adapt to mutations that change the volume of residues buried in their close-packed interiors, we have analysed and compared the atomic structures of nine different globins. The homology of the sequences in the two most distantly related molecules is only 16%. The principal determinants of three-dimensional structure of these proteins are the approximately 59 residues involved in helix to helix and helix to haem packings. Half of these residues are buried within the molecules. The observed variations in the sequence keep the side-chains of buried residues non-polar, but do not maintain their size: the mean variation of the volume among homologous amino acids is 56 Å³. Changes in the volumes of buried residues are accompanied by changes in the geometry of the helix packings. The relative positions and orientations of homologous pairs of helices in the globins differ by rigid body shifts of up to 7 Å and 30°. In order to retain functional activity these shifts are coupled so that the geometry of the residues forming the haem pocket is very similar in all the globins. We discuss the implications of these results for the mechanism of protein evolution.
Article
The number of protein structures known in atomic detail has increased from one in 1960 (Kendrew, J.C., Strandberg, B.E., Hart, R.G., Davies, D.R., Phillips, D.C., Shore, V.C. Nature (London) 185:422-427, 1960) to more than 1000 in 1994. The rate at which new structures are being published exceeds one a day as a result of recent advances in protein engineering, crystallography, and spectroscopy. More and more frequently, a newly determined structure is similar in fold to a known one, even when no sequence similarity is detectable. A new generation of computer algorithms has now been developed that allows routine comparison of a protein structure with the database of all known structures. Such structure database searches are already used daily and they are beginning to rival sequence database searches as a tool for discovering biologically interesting relationships.
Article
A search in the database of known three-dimensional protein structures with the structure of a plant endochitinase revealed a subtle but unambiguous similarity to lysozymes from animals and phages. An evolutionary connection between plant endochitinases and lysozymes is supported by similar overall topology of fold, overlapping substrate specificities and remarkable conservation of some sequence and architectural detail around the active site. Much of the knowledge about lysozyme can now be extended by analogy to endochitinase. New insights into the mechanism of endochitinase are expected to stimulate genetic engineering studies into plant defense mechanisms against pests and pathogens.
Article
With a rapidly growing pool of known tertiary structures, the importance of protein structure comparison parallels that of sequence alignment. We have developed a novel algorithm (DALI) for optimal pairwise alignment of protein structures. The three-dimensional co-ordinates of each protein are used to calculate residue-residue (C alpha-C alpha) distance matrices. The distance matrices are first decomposed into elementary contact patterns, e.g. hexapeptide-hexapeptide submatrices. Then, similar contact patterns in the two matrices are paired and combined into larger consistent sets of pairs. A Monte Carlo procedure is used to optimize a similarity score defined in terms of equivalent intramolecular distances. Several alignments are optimized in parallel, leading to simultaneous detection of the best, second-best and so on solutions. The method allows sequence gaps of any length, reversal of chain direction and free topological connectivity of aligned segments. Sequential connectivity can be imposed as an option. The method is fully automatic and identifies structural resemblances and common structural cores accurately and sensitively, even in the presence of geometrical distortions. An all-against-all alignment of over 200 representative protein structures results in an objective classification of known three-dimensional folds in agreement with visual classifications. Unexpected topological similarities of biological interest have been detected, e.g. between the bacterial toxin colicin A and globins, and between the eukaryotic POU-specific DNA-binding domain and the bacterial lambda repressor.
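The first step described in this abstract, building C alpha-C alpha distance matrices and scoring a set of residue equivalences by how well intramolecular distances agree, can be sketched as follows. This is a simplified illustration and not the DALI implementation: the relative-deviation form of the score, the 0.2 tolerance and the 20 Å distance envelope are assumptions of this sketch, as are the function names `distance_matrix` and `elastic_score`. The decomposition into contact patterns and the Monte Carlo optimization are not shown.

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of pairwise C alpha distances from (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def elastic_score(dist_a, dist_b, equivalences, theta=0.2, envelope=20.0):
    """Score a list of (index_in_A, index_in_B) residue equivalences.

    Every pair of equivalences contributes positively when the two
    intramolecular distances agree to within a relative tolerance `theta`
    (assumed value), down-weighted for long distances by a Gaussian
    envelope (assumed width, in Å).
    """
    total = 0.0
    for ia, ib in equivalences:
        for ja, jb in equivalences:
            if ia == ja:
                total += theta  # a residue paired with itself
                continue
            da, db = dist_a[ia][ja], dist_b[ib][jb]
            mean = 0.5 * (da + db)
            deviation = abs(da - db) / mean if mean > 0 else 0.0
            total += (theta - deviation) * math.exp(-((mean / envelope) ** 2))
    return total
```

For example, scoring a three-residue fragment against itself gives a higher value than scoring it against a bent copy, because the distance between the first and third residues no longer matches.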