Figure 7 - uploaded by Heather M Brewer
Representative identification of a putative translational frameshift. A) Hybridization evidence for oligos labeled A, B, and C. Expression levels are normalized to each oligo's mean (via a Z-score calculation) across a time course/thermal switch (37°C/26°C) experiment for Y. pestis CO92 (YPO) and Y. pestis Pestoides F (YPDSF). Green indicates down-regulation relative to the mean, and red indicates up-regulation relative to the mean. Genome annotations are labeled for YPDSF_1005 and YPO2124, corresponding to annotated coding sequences. Although oligos A and B purportedly lack a corresponding transcript in YPDSF (NA = not applicable), they clearly show hybridization consistent with that of oligo C. B) illustrates the 210 aa translation of YPO2124, and C) illustrates the 63 aa translation of YPDSF_1005. Frame translations are shown below the gene-level detail, with oligo evidence (black) overlaid on each gene and peptide evidence (red) overlaid on the appropriate reading frame. For YPDSF_1005, gene alignment with YPO2124 reveals two oligos upstream of the coding region. Corroborating peptide evidence was also seen upstream, but in a different reading frame from the existing annotation. This evidence supports the expression of YPDSF_1005 as a frameshifted protein. doi:10.1371/journal.pone.0033903.g007
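As a minimal sketch of the per-oligo Z-score normalization described in panel A, assuming each oligo's intensities are held as a list with one value per condition of the time course, and assuming the sample standard deviation is used (the caption does not say which variant). The function name `zscore_per_oligo` and the oligo IDs are hypothetical.

```python
from statistics import mean, stdev

def zscore_per_oligo(expression):
    """Z-score each oligo's profile against its own mean across conditions.

    expression: dict mapping oligo ID -> list of intensities, one per condition
    (e.g. successive time points across the 37°C/26°C shift).
    Negative scores correspond to green (below the oligo's mean), positive to red.
    """
    normalized = {}
    for oligo, values in expression.items():
        mu = mean(values)
        sd = stdev(values)  # sample SD (n - 1); an assumption, the caption does not specify
        if sd == 0:
            # A flat profile carries no relative change, so report zeros.
            normalized[oligo] = [0.0] * len(values)
        else:
            normalized[oligo] = [(v - mu) / sd for v in values]
    return normalized
```

For a hypothetical oligo measured at 40,000, 50,000 and 60,000, the Z-scores are -1, 0 and 1, i.e. green, neutral and red.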

Source publication
Article
Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. The annotation process is now performed almost exclusively in an automated fashion to balance the large number of sequences generated. One possible way of reducing errors i...

Contexts in source publication

Context 1
... predicted pseudogenes in Y. pestis CO92 have experimental evidence in multiple reading frames, suggestive of translational frameshifts, which may explain their misclassification as pseudogenes. Figure 7 illustrates an apparent frameshift in YPDSF_1005, an ortholog of YPO2124. This gene encodes a hypothetical protein in both Y. pestis strains. ...
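Placing peptide evidence on a specific reading frame, as in Figure 7, amounts to translating the nucleotide sequence in each frame and checking which translation contains the peptide. A minimal sketch, assuming the standard genetic code (NCBI table 1, whose amino-acid assignments match bacterial table 11), forward-strand frames only, and exact string matching; `translate` and `frames_containing` are illustrative names, not from the source publication.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, frame=0):
    """Translate the forward strand starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(frame, len(seq) - 2, 3)
    )

def frames_containing(seq, peptide):
    """Return the forward frames whose translation contains the peptide."""
    return [frame for frame in range(3) if peptide in translate(seq, frame)]
```

For the hypothetical sequence `ATGGCCTAA`, frame 0 reads `MA*`, and the peptide `WP` is found only in frame 1, which is the kind of off-frame hit the caption describes.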
Context 2
... generating intensity signals ≥35,000 (≥3σ of control probes across all chips) were considered to have positive hybridization above background and therefore incorporated as experimental measurements. While not presented for all findings, in cases where multiple oligos map to a single open reading frame, expression patterns of annotated mRNA (as shown in Figure 7) can support the identification of anomalous hybridization signals across experimental samples. Transcriptomics data have been deposited in the GEO repository under series accession GSE30634. ...
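The background filter can be sketched as follows. The sketch assumes the control-derived cutoff is the mean plus three standard deviations of the control-probe intensities, which the excerpt states only as "≥3σ of control probes"; the function names are hypothetical, and 35,000 is the cutoff quoted in the text.

```python
from statistics import mean, pstdev

def control_cutoff(control_intensities, n_sd=3.0):
    """Background cutoff, assumed here to be mean + n_sd * SD of control-probe intensities."""
    return mean(control_intensities) + n_sd * pstdev(control_intensities)

def positive_hybridization(signals, cutoff=35_000):
    """Keep oligos whose intensity meets or exceeds the cutoff (>=, as in the text).

    signals: dict mapping oligo ID -> intensity on one chip.
    """
    return {oligo: value for oligo, value in signals.items() if value >= cutoff}
```

A signal of exactly 35,000 is kept, while 34,999 is discarded as background.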

Citations

... Proteomic data set description. Many Y. pestis proteomes have been aggregated and processed by the Pacific Northwest National Laboratory (PNNL) to create a tool that differentiates naturally occurring and laboratory strains of Y. pestis (53), encompassing data from different studies (48,54,55) as well as PNNL archives (56). These processed intensity data sets were kindly shared by Eric D. Merkley. ...
Article
The genus Yersinia includes a large variety of nonpathogenic and life-threatening pathogenic bacteria, which cause a broad spectrum of diseases in humans and animals, such as plague, enteritis, Far East scarlet-like fever (FESLF), and enteric redmouth disease. Like most clinically relevant microorganisms, Yersinia spp. are currently subjected to intense multi-omics investigations whose numbers have increased extensively in recent years, generating massive amounts of data useful for diagnostic and therapeutic developments. The lack of a simple and centralized way to exploit these data led us to design Yersiniomics, a web-based platform allowing straightforward analysis of Yersinia omics data. Yersiniomics contains a curated multi-omics database at its core, gathering 200 genomic, 317 transcriptomic, and 62 proteomic data sets for Yersinia species. It integrates genomic, transcriptomic, and proteomic browsers, a genome viewer, and a heatmap viewer to navigate within genomes and experimental conditions. For streamlined access to structural and functional properties, it directly links each gene to GenBank, the Kyoto Encyclopedia of Genes and Genomes (KEGG), UniProt, InterPro, IntAct, and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and each experiment to Gene Expression Omnibus (GEO), the European Nucleotide Archive (ENA), or the Proteomics Identifications Database (PRIDE). Yersiniomics provides a powerful tool for microbiologists to assist with investigations ranging from specific gene studies to systems biology studies. IMPORTANCE The expanding genus Yersinia is composed of multiple nonpathogenic species and a few pathogenic species, including the deadly etiologic agent of plague, Yersinia pestis. In 2 decades, the number of genomic, transcriptomic, and proteomic studies on Yersinia grew massively, delivering a wealth of data. 
We developed Yersiniomics, an interactive web-based platform, to centralize and analyze omics data sets on Yersinia species. The platform allows user-friendly navigation between genomic data, expression data, and experimental conditions. Yersiniomics will be a valuable tool to microbiologists.
... The primer set VharIR1 to VharIR14 was used to determine mismatched nucleotide bases between the IRs (Additional file 2 Table S1). Discrepancies in the large single-copy (LSC), inverted repeat (IR), reversed inverted repeat (revIR), and small single-copy (SSC) regions were possibly a result of the usual imperfections in computational predictions [33], as well as mis-assembly. Therefore, all the chloroplast genomes in this study were extensively refined before further comparative genomic analysis. ...
... Nevertheless, it is unlikely that all published genomes were assembled and annotated to a high standard, owing to the usual imperfections of computational tools, the lack of standardized bioinformatics workflows, and uncorrected errors in existing genomes [14,33,47]. In this study, we extensively revised the annotation of the published chloroplast genomes of the family Rhamnaceae, as undetected errors would complicate the subsequent evolutionary analysis of Rhamnaceae species. ...
Article
Background Massive parallel sequencing technologies have enabled the rapid elucidation of plant phylogenetic relationships from chloroplast genomes. These include members of the family Rhamnaceae. The current Rhamnaceae phylogenetic tree is based on 13 of the 24 Rhamnaceae chloroplast genomes, and only one chloroplast genome of the genus Ventilago is available. Hence, the phylogenetic relationships in Rhamnaceae remain incomplete, and more representative species are needed. Results The complete chloroplast genome of Ventilago harmandiana Pierre was assembled from long- and short-read data using a hybrid approach. The accuracy and validity of the final genome were confirmed with PCR amplifications and investigation of coverage depth. Sanger sequencing was used to correct differences in lengths and nucleotide bases between the inverted repeats caused by homopolymers. The phylogenetic trees reconstructed using prevalent methods for phylogenetic inference were topologically similar. The clustering based on codon usage was congruent with the molecular phylogenetic tree. The groups of genera in each tribe were in accordance with the tribal classification based on molecular markers. We resolved the phylogenetic relationships among six Hovenia species, three Rhamnus species, and two Ventilago species. Our reconstructed tree provides the most complete and reliable low-level taxonomy to date for the family Rhamnaceae. Similar to other higher plants, RNA editing mostly converted serine to leucine. In addition, most genes were subject to purifying selection. Annotation anomalies, including indel calling errors, unaligned open reading frames of the same gene, inconsistent prediction of intergenic regions, and misannotated genes, were identified in the published chloroplast genomes used in this study. These could be a result of the usual imperfections in computational tools and/or existing errors in reference genomes.
Importantly, these are points of concern with regard to utilizing published chloroplast genomes for comparative genomic analysis. Conclusions In summary, we successfully demonstrated the use of comprehensive genomic data, including DNA and amino acid sequences, to build a reliable and high-resolution phylogenetic tree for the family Rhamnaceae. Additionally, our study indicates that revising genome annotation before comparative genomic analyses is necessary to prevent the propagation of errors and complications in downstream analysis and interpretation.
... A few in silico studies have indicated the presence of eubacterial recoding (31,35), but wet-lab evidence is scarce. Over the past decade, proteomics has been applied to the identification of translated pseudogenes in bacteria, including M. tuberculosis, S. glossinidius, Shewanella oneidensis, and Yersinia strains (36-39). While peptides derived from dozens of pseudogenes were identified, few of these studies inspected the peptide location relative to the disruption site to rule out translational reinitiation at an alternative start codon, or validated pseudogene expression by other methods. ...
Article
Pseudogenes (genes disrupted by frameshifts or in-frame stop codons) are ubiquitous in bacterial genomes and are considered nonfunctional fossils. Here, we used RNA-seq and mass-spectrometry technologies to measure the transcriptomes and proteomes of Salmonella enterica serovars Paratyphi A and Typhi. The mRNA sequences of all pseudogenes remained disrupted and were present at levels comparable to those of their intact homologs. At the protein level, however, 101 of 161 pseudogenes showed evidence of successful translation, albeit at low expression levels, regardless of growth conditions, genetic background, or cause of pseudogenization. The majority of the detected frameshifting compensated for -1 frameshift mutations. Readthrough of in-frame stop codons primarily involved UAG, and cytosine was the most frequent base adjacent to the codon. Using a fluorescence reporter system, fifteen pseudogenes were confirmed to be expressed in vivo in Escherichia coli. Expression of the intact copies of these fifteen pseudogenes in S. Typhi affected bacterial pathogenesis, as revealed in human macrophage and epithelial cell infection models. These findings suggest the need to revisit nonstandard translation mechanisms and the biological role of pseudogenes in bacterial genomes.
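Finding candidate readthrough sites starts with listing premature in-frame stop codons. A minimal sketch that reports each internal stop together with the base immediately 3' of it; the abstract does not say which side "adjacent" refers to, so the 3' choice is an assumption, and `internal_stops` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spellings of UAA, UAG, UGA

def internal_stops(cds):
    """Report premature in-frame stop codons in a coding sequence.

    Returns a list of (0-based offset, stop codon, base immediately 3' of it).
    A stop codon occupying the last three bases (the normal terminator) is skipped.
    """
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        is_terminal = i + 3 == len(cds)
        if codon in STOP_CODONS and not is_terminal:
            # Any non-terminal codon is followed by at least one more base.
            hits.append((i, codon, cds[i + 3]))
    return hits
```

In the hypothetical sequence `ATGTAGCAAGGTTAA`, the TAG (UAG in the mRNA) at offset 3 is reported with C as its following base, while the terminal TAA is ignored.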
... Nonetheless, pseudogenes (frequently defined as protein-coding genes with in-frame stop codons) in pro- and eukaryotic genomes persist on the evolutionary timescale, implying that they are maintained by natural selection [6]. In addition, pseudogenes can be transcribed and translated [3,7]. ...
Article
Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
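The dN/dS argument can be made concrete with the simplest Nei-Gojobori-style estimate, shown here with hypothetical counts rather than data from the study. N and S are the numbers of nonsynonymous and synonymous sites, N_d and S_d the observed differences at those sites, and the Jukes-Cantor correction accounts for multiple substitutions at the same site.

```latex
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}, \qquad
d = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p\right), \qquad
\omega = \frac{d_N}{d_S}

% Hypothetical example: N = 300, N_d = 6, S = 100, S_d = 10
p_N = 0.02, \quad p_S = 0.10, \quad
d_N \approx 0.0203, \quad d_S \approx 0.1073, \quad
\omega \approx 0.19 < 1
```

A value well below 1 is the purifying-selection signature the abstract relies on; a neutrally evolving pseudogene would be expected to drift toward a value near 1.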
... In conclusion, this methodology allowed us to observe slight variations in efflux pump gene expression in Y. pestis. These differences could not be highlighted in previous transcriptomic studies using microarray expression data in Y. pestis [41][42][43]. The risk of bias is limited by using a validated set of RGs to normalize RT-qPCR data and by following the MIQE recommendations with rigorous quality controls. ...
Article
Reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is a very sensitive and widespread technique considered the gold standard for exploring transcriptional variations. While a particular methodology has to be followed to provide accurate results, many published studies are likely to misinterpret results due to a lack of minimal quality requirements. Yersinia pestis is a highly pathogenic bacterium responsible for plague. It has been used to propose a ready-to-use and complete approach to mitigate the risk of technical biases in transcriptomic studies. The selection of suitable reference genes (RGs) among 29 candidates was performed using four different methods (GeNorm, NormFinder, BestKeeper and the Delta-Ct method). An overall comprehensive ranking revealed that the following 12 candidate RGs are suitable for accurate normalization: gmk, proC, fabD, rpoD, nadB, rho, thrA, ribD, mutL, rpoB, adk and tmk. Some frequently used genes, such as 16S rRNA, were even found to be unsuitable for studying Y. pestis. This methodology allowed us to demonstrate, under different temperatures and growth states, significant transcriptional changes in six efflux pump genes involved in physiological aspects such as antimicrobial resistance or virulence. Previous transcriptomic studies done under comparable conditions had not been able to highlight these transcriptional modifications. These results highlight the importance of validating RGs prior to normalizing the transcriptional expression levels of target genes. This accurate methodology can be extended to any gene of interest in Y. pestis. More generally, the same workflow can be applied to identify and validate appropriate RGs in other bacteria to study transcriptional variations.
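Of the four ranking methods, the comparative delta-Ct approach is the simplest to sketch. For every pair of candidate genes, take the standard deviation of their Ct difference across samples, then score each gene by the mean of its pairwise standard deviations, where lower means more stable. The sketch assumes Ct values are aligned by sample and uses the sample standard deviation; it is an illustration, not the authors' code, and the gene names in the example are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

def delta_ct_stability(ct):
    """Rank candidate reference genes by the comparative delta-Ct approach.

    ct: dict mapping gene name -> list of Ct values, one per sample, with the
        same sample order for every gene (at least two genes and two samples).
    Returns (gene, mean pairwise SD) tuples, most stable (lowest) first.
    """
    pairwise_sds = {gene: [] for gene in ct}
    for g1, g2 in combinations(ct, 2):
        # If both genes are stable, their Ct difference stays constant across samples.
        deltas = [a - b for a, b in zip(ct[g1], ct[g2])]
        sd = stdev(deltas)
        pairwise_sds[g1].append(sd)
        pairwise_sds[g2].append(sd)
    scores = {gene: mean(sds) for gene, sds in pairwise_sds.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Two genes whose Ct values shift together across samples (a constant difference) both rank ahead of a gene that fluctuates independently.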
... Our broader research group also had published [20] and unpublished proteomic data from unrelated studies using the avirulent Y. pestis laboratory strain KIMD27 [14]. The data in the EMSL archive, gathered from experiments by various research efforts and collaborations [21,22; unpublished data], were derived from the widely used virulent North American laboratory strain CO92 [15] and three mutant derivatives of CO92. The virulent Y. pestis cultures were produced in biosafety level 3 (BSL3) containment facilities at other institutions, inactivated, and then sent to PNNL for analysis. ...
Article
The rapid pace of bacterial evolution enables organisms to adapt to the laboratory environment with repeated passage and thus diverge from naturally-occurring environmental (“wild”) strains. Distinguishing wild and laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone has thus far not provided a clear signature, perhaps due to lack of understanding of how diverse genome changes lead to convergent phenotypes, difficulty in detecting certain types of mutations, or perhaps because some adaptive modifications are epigenetic. Monitoring protein abundance, a molecular measure of phenotype, can overcome some of these difficulties. We have assembled a collection of Yersinia pestis proteomics datasets from our own published and unpublished work, and from a proteomics data archive, and demonstrated that protein abundance data can clearly distinguish laboratory-adapted from wild. We developed a lasso logistic regression classifier that uses binary (presence/absence) or quantitative protein abundance measures to predict whether a sample is laboratory-adapted or wild that proved to be ~98% accurate, as judged by replicated 10-fold cross-validation. Protein features selected by the classifier accord well with our previous study of laboratory adaptation in Y. pestis. The input data was derived from a variety of unrelated experiments and contained significant confounding variables. We show that the classifier is robust with respect to these variables. The methodology is able to discover signatures for laboratory facility and culture medium that are largely independent of the signature of laboratory adaptation. Going beyond our previous laboratory evolution study, this work suggests that proteomic differences between laboratory-adapted and wild Y. pestis are general, potentially pointing to a process that could apply to other species as well. 
Additionally, we show that proteomics datasets (even archived data collected for different purposes) contain the information necessary to distinguish wild and laboratory samples. This work has clear applications in biomarker detection as well as biodefense.
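A lasso logistic regression classifier of the kind described can be sketched with proximal gradient descent (ISTA): the L1 penalty is applied through soft-thresholding, which drives uninformative protein features to exactly zero. This is a minimal pure-Python illustration under assumed settings (penalty `lam`, step size, iteration count are hypothetical choices); it omits the replicated 10-fold cross-validation used to assess accuracy and is not the authors' implementation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t, snapping small values to 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fit_lasso_logistic(X, y, lam=0.05, step=0.5, n_iter=2000):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    X: feature rows, e.g. 0/1 protein presence/absence calls per sample.
    y: 0/1 labels, e.g. 1 = laboratory-adapted, 0 = wild.
    Returns (weights, intercept); the intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            # Gradient of the mean log-loss: (predicted probability - label) * feature.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        # Gradient step on the loss, then soft-thresholding for the L1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict(w, b, row):
    """Classify one sample: 1 if the linear score is positive, else 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0
```

On a hypothetical toy matrix in which the third feature is equally common in both classes, its weight stays at exactly zero while the two informative features receive opposite-signed weights.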
... Even though these assumed parametric functions may fit the histogram, a per-spectrum goodness-of-fit (GOF) is not provided. Hence, there is no guarantee that such procedures can consistently yield accurate E-values/P-values (Alves et al., 2007a; Segal, 2008). Nevertheless, there exist a few published methods that do not assume any parametric form for the score distribution and are able to compute accurate spectrum-specific significance consistently. ...
... DG-1 contains 12 MS/MS datasets (175 569 spectra) from Escherichia coli K-12; DG-2, 9 MS/MS datasets (141 332 spectra) from Mycobacterium tuberculosis H37Rv; and DG-3, 8 MS/MS datasets (121 787 spectra) from Salmonella typhimurium ATCC 14028. Experimental details concerning sample preparations for DGs 1-3 can be found in (Mottaz-Brewer et al., 2008; Schrimpe-Rutledge et al., 2012). A summary of the datasets downloaded is provided in Supplementary Table S1. ...
Article
Motivation: There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the accuracy of the parameter estimates and reducing the computational cost of this method have two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed. Results: We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases. Availability: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit CONTACT: yyu@ncbi.nlm.nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
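For context, the classic extreme-value recipe fits a Gumbel (type I extreme value) distribution to null scores and reads significance from its upper tail. A minimal sketch using a method-of-moments fit; RAId's more efficient algorithm, which avoids searching large random databases, is not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_moments(null_scores):
    """Method-of-moments Gumbel fit: mean = mu + gamma*beta, variance = (pi*beta)**2 / 6."""
    beta = stdev(null_scores) * math.sqrt(6) / math.pi
    mu = mean(null_scores) - EULER_GAMMA * beta
    return mu, beta

def gumbel_pvalue(score, mu, beta):
    """Upper-tail probability P(S >= score) = 1 - exp(-exp(-(score - mu) / beta))."""
    z = -(score - mu) / beta
    if z > 700:
        # Far below the location parameter: the tail probability is 1 to machine precision.
        return 1.0
    # expm1 keeps precision when the tail probability is tiny.
    return -math.expm1(-math.exp(z))
```

Because the fit uses only the mean and standard deviation of the null scores, the resulting P-values are only as accurate as those two moment estimates, which is the parameter-accuracy concern raised in the abstract.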
... Furthermore, a previous comparative transcriptomic analysis of the epidemic Y. pestis CO92 and the endemic Pestoides F, similar to the strains from foci # 4-7, showed that yopT is highly expressed at 37°C in both strains. Nevertheless, proteomic assays did not reveal the presence of the YopT protein in Pestoides F, indicating possible instability of the truncated versions of the protein [80,81]. Previous biochemical characterization of recombinant YopT protease showed its biological activity in the form of a glutathione S-transferase N-terminal fusion with the full size of 322 aa. ...
Article
Antibiotic therapy of plague is hampered by the recent isolation of a Yersinia pestis strain resistant to all antibiotics recommended for treatment. This has prompted a search for new antimicrobials aimed at alternative targets. Recently, the Y. pestis cysteine protease YopT has been explored as a potential drug target. Targets conserved in pathogen populations should be more efficacious; therefore, we evaluated intraspecies variability in yopT genes and their products. A total of 114 Y. pestis isolates were screened. Only two full-size YopT isoforms were found among them. The endemic allele (N149) was present in biovar caucasica from the Dagestan-highland natural plague focus # 39. The biovar caucasica strains from the Transcaucasian-highland (# 4-6) and Pre-Araks (# 7) plague foci also contained the N149 allele. These strains from foci # 4-7 possessed a truncated version of YopT resulting from a frameshift caused by the deletion of a single nucleotide at position 71. Computational analyses showed that the SNP at position 149 has a minimal effect on the intrinsic disorder propensity of YopT proteins, whereas the N-terminal truncations of YopT detected in bv. caucasica strains (Pestoides F_YopT1 and F_YopT2, and Pestoides G) generated isoforms with significantly modified intrinsic disorder propensities and with a reduced or lost ability to use their N-terminal tails for disorder-based interactions with biological partners. Considering that representatives of biovar caucasica have been reported to cause sporadic cases of human plague, this study supports the need for additional testing of the globally disseminated YopT (S149) isoform as a potential target for treating plague caused by strains producing different YopT isoforms.
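The effect of a single-nucleotide deletion on the reading frame can be illustrated with a toy sequence: every codon downstream of the deletion is read one base later, which can expose a premature stop codon. The sequence used here is hypothetical (it is not yopT), positions are 1-based as in the text, and the sketch only shows the premature-stop step; it does not model how the N-terminally truncated isoforms described above are produced. The helper names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def delete_base(seq, position):
    """Delete the nucleotide at 1-based `position`."""
    return seq[:position - 1] + seq[position:]

def first_stop_codon(seq):
    """Return the 0-based offset of the first stop codon in frame 0, or None if none occurs."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None
```

In the toy wild-type sequence `ATGCTGACCAAATAA`, the only stop is the terminal TAA at offset 12 (a 4-residue ORF); deleting base 4 yields `ATGTGACCAAATAA`, which exposes TGA at offset 3 and truncates the product after the initial Met.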
... This dataset was used to gauge the feasibility of the proposed method in performing microbial identification at genus, species, and strain level. Experimental details and optimized sample preparations used to generate this dataset can be found in previously described studies [57,58]. Here, we briefly mention some important experimental steps that differ between the production of the PNNL dataset and that of the in-house dataset. ...
Article
Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple ‘fingerprinting’; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
... These genetic changes are believed to account for phenotypic distinctions and pathogenic potential observed between lineages of Y. pestis, as genetic diversity is a known source of phenotypic diversity underlying disease dynamics [39,40]. However, the molecular mechanisms that Y. pestis acquired to specifically become a severe respiratory pathogen are not yet established, despite several comparative genomic, transcriptomic and proteomic studies [6,8,41]. Before this study, it was unknown whether ancestral strains of Y. pestis were capable of causing primary pneumonic plague and at what point during the evolution of Y. pestis they became competent to do so. ...
... This observation can best be explained by Pestoides F maintaining specific variations in metabolic pathways that allow for increased fitness within the host. Comparative genomics and transcriptomics have revealed frameshifts, point mutations and pseudogenes, which may account for phenotypic distinctions in nutritional requirements, carbohydrate fermentation and other biochemical properties between CO92 and Pestoides F [35,39,41]. For instance, the ability to catabolize L-aspartate into fumarate by aspartase (AspA) has been lost by modern Y. pestis lineages such as CO92 and KIM, whereas Y. pseudotuberculosis and the ancestral Angola and Pestoides isolates of Y. pestis maintain AspA activity [47]. ...
Article
Yersinia pestis causes the fatal respiratory disease pneumonic plague. Y. pestis recently evolved from the gastrointestinal pathogen Y. pseudotuberculosis; however, it is not known at what point Y. pestis gained the ability to induce a fulminant pneumonia. Here we show that the acquisition of a single gene encoding the protease Pla was sufficient for the most ancestral, deeply rooted strains of Y. pestis to cause pneumonic plague, indicating that Y. pestis was primed to infect the lungs at a very early stage in its evolution. As Y. pestis further evolved, modern strains acquired a single amino-acid modification within Pla that optimizes protease activity. While this modification is unnecessary to cause pneumonic plague, the substitution is instead needed to efficiently induce the invasive infection associated with bubonic plague. These findings indicate that Y. pestis was capable of causing pneumonic plague before it evolved to optimally cause invasive infections in mammals.