Article

Community-wide analysis of microbial genome sequence signatures

... Since GC content provides a limited summary of composition and is not always sufficient to distinguish different organisms [19], short substrings (k-mers) can also be used for unsupervised binning. Separating contigs on the basis of k-mer frequencies and coverage, often with the help of dimensionality reduction, is well-established in metagenomics [20][21][22]. However, the performance of existing tools on mixtures of sequences that include organisms with substantial intragenomic heterogeneity has yet to be explored. ...
... As noted in the introduction, the observation that k-mer counts are useful for separating sequences from different microbes is well-established [20,22]. Indeed, some available binning tools also make use of VAEs [21,30,67]. ...
Preprint
Full-text available
The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well-represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualising two-dimensional representations of read tetranucleotide composition learned by a Variational Autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualisation tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
... For each sample, scaffolds with a minimum length of 2.5 kbp were binned into genome bins using MetaBAT (v.2.12.1), with both tetranucleotide frequencies and scaffold coverage information considered. The clustering of scaffolds from the bins and the unbinned scaffolds was visualized using ESOM with a minimum window length of 2.5 kbp and a maximum window length of 5 kbp, as previously described [58]. Misplaced scaffolds were removed from bins, and unbinned scaffolds for which segments were placed within the bin areas of ESOMs were added to the corresponding bins. ...
... 72). Metagenomic binning was performed on scaffolds with a length of >3,000 bp using ESOM, including a total of 4,939 scaffolds with a combined length of 30,693,002 bp [58,72]. CheckM (v.1.0.5) was used to evaluate the accuracy of the binning approach by determining the percentage of completeness and contamination [67]. ...
Article
Full-text available
In the ongoing debates about eukaryogenesis (the series of evolutionary events leading to the emergence of the eukaryotic cell from prokaryotic ancestors), members of the Asgard archaea play a key part as the closest archaeal relatives of eukaryotes [1]. However, the nature and phylogenetic identity of the last common ancestor of Asgard archaea and eukaryotes remain unresolved [2–4]. Here we analyse distinct phylogenetic marker datasets of an expanded genomic sampling of Asgard archaea and evaluate competing evolutionary scenarios using state-of-the-art phylogenomic approaches. We find that eukaryotes are placed, with high confidence, as a well-nested clade within Asgard archaea and as a sister lineage to Hodarchaeales, a newly proposed order within Heimdallarchaeia. Using sophisticated gene tree and species tree reconciliation approaches, we show that analogous to the evolution of eukaryotic genomes, genome evolution in Asgard archaea involved significantly more gene duplication and fewer gene loss events compared with other archaea. Finally, we infer that the last common ancestor of Asgard archaea was probably a thermophilic chemolithotroph and that the lineage from which eukaryotes evolved adapted to mesophilic conditions and acquired the genetic potential to support a heterotrophic lifestyle. Our work provides key insights into the prokaryote-to-eukaryote transition and a platform for better understanding the emergence of cellular complexity in eukaryotic cells.
... Approaches based on sequence composition, e.g. tetranucleotide frequencies, have been successfully used to reconstruct near-complete genomes from metagenomic contigs without the use of reference genomes, but can generally only discriminate down to the genus level [14,15]. More recently, coverage variation across multiple samples has been used, allowing binning down to species and sometimes strain level [16][17][18][19]. ...
... Compared to the other typically freshwater clades acI (BACL 2,4,15) and Luna (BACL 25, 28), which belong to the order Actinomycetales, acIV MAG clusters have larger genome sizes and contain a significantly lower proportion of genes in the Carbohydrate transport and metabolism COG category (p<0.01), particularly ABC-type sugar transporters (Supplementary Fig. 7). ...
Preprint
Full-text available
Microbes are main drivers of biogeochemical cycles in oceans and lakes. Yet an understanding of the regulation of such processes is hampered by limited genome-context insight into the metabolic potential of bacterial populations. Here we explored an automatic binning approach to reconstruct representative bacterioplankton genomes from metagenomic samples across a time-series in the Baltic Sea. The 30 unique genomes assembled represent novel species within typical marine and freshwater clades. Analysis of the first genomes for abundant lineages entirely lacking reference genomes, such as OM182, acIV and LD19, uncovered divergent ecological adaptations. While phylogenetic patterns in the seasonal succession of the investigated genomes were evident, closely related genomes sometimes displayed distinct seasonal patterns, that could to some extent be explained by gene content. Signs of streamlining were evident in most genomes; and genome sizes correlated with abundance variation across filter size fractions. Comparisons of 86 aquatic metagenomes against the assembled genomes revealed significant fragment recruitment from brackish waters in North America, but little from lakes or oceans, suggesting the existence of a global brackish microbiome. Current estimates of evolutionary rates imply brackish bacteria diverged from freshwater and marine relatives over 100,000 years ago, long before the Baltic Sea was formed (8000 ya), markedly contrasting the evolutionary history of Baltic Sea macro-organisms, which are locally adapted populations of nearby meta-populations. We have thus demonstrated how metagenome-assembled genomes enable an integrated analysis of ecological patterns, functional potential and evolutionary history of several relevant genomes at a time in natural communities.
... Direct, PCR-independent, massive sequencing of environmental DNA (metagenomics) completed this picture by (i) revealing lineages whose rRNA genes escaped PCR amplification and (ii) providing data about the complete gene complement of microbial communities and, hence, about their metabolic potential [1-4]. The possibility to assemble genome sequences from single cell amplified genomes [5] or by binning from complex metagenomes [6] has further yielded genome-based knowledge for specific uncultured groups. Some of these groups are widely diverse and/or have pivotal importance in evolution, such as the eukaryote-related Asgard archaea [7]. ...
... Many of the resulting (57.2 Mb) raw sequences exhibited similarity to those of available Absconditabacteria/SR1 metagenome-assembled genomes (MAGs) and, as expected, some also to Gammaproteobacteria (host-derived sequences) as well as a small proportion of potential contaminants probably present in the original sample (Bacillus- and fungi-like sequences). To bin the Vampirococcus sequences out of this mini-metagenome, we applied tetranucleotide frequency analysis on the whole sequence dataset using emergent self-organizing maps (ESOM) [6]. One of the ESOM sequence bins was enriched in Absconditabacteria/SR1-like sequences and corresponded to the Vampirococcus sequences, which we extracted and assembled independently. ...
Article
Full-text available
The Candidate Phyla Radiation (CPR) constitutes a large group of mostly uncultured bacterial lineages with small cell sizes and limited biosynthetic capabilities. They are thought to be symbionts of other organisms, but the nature of this symbiosis has been ascertained only for cultured Saccharibacteria, which are epibiotic parasites of other bacteria. Here, we study the biology and the genome of Vampirococcus lugosii, which becomes the first described species of Vampirococcus, a genus of epibiotic bacteria morphologically identified decades ago. Vampirococcus belongs to the CPR phylum Absconditabacteria. It feeds on anoxygenic photosynthetic gammaproteobacteria, fully absorbing their cytoplasmic content. The cells divide epibiotically, forming multicellular stalks whose apical cells can reach new hosts. The genome is small (1.3 Mbp) and highly reduced in biosynthetic metabolism genes, but is enriched in genes possibly related to a fibrous cell surface likely involved in interactions with the host. Gene loss has been continuous during the evolution of Absconditabacteria, and generally most CPR bacteria, but this has been compensated by gene acquisition by horizontal gene transfer and de novo evolution. Our findings support parasitism as a widespread lifestyle of CPR bacteria, which probably contribute to the control of bacterial populations in diverse ecosystems.
... Reads from NextSeq sequencing of metagenomic DNA were trimmed and quality-filtered using BBDuk. Metagenomic binning: reconstruction of metagenome-assembled genomes (MAGs). Scaffolds were clustered into genome bins using different algorithms on the basis of i) tetranucleotide frequencies and differential coverage across samples using ABAWACA 1.07 (Brown et al., 2015), ii) tetranucleotide frequencies using Emergent Self-Organizing Maps (tetra-ESOM) (Dick et al., 2009), and iii) differential coverage and tetranucleotide frequency patterns using MaxBin 2 (Wu et al., 2016), using 107 and 40 single-copy marker genes. For ABAWACA analyses, scaffolds were fragmented in two group sizes: 3-5 kb and 5-10 kb. ...
Preprint
Full-text available
Background: The Andean Altiplano hosts a repertoire of high-altitude lakes with harsh conditions for life. These lakes are undergoing a process of desiccation caused by the current climate, leaving terraces exposed to extreme atmospheric conditions and serving as analogs to Martian paleolake basins. Microbiomes in Altiplano lake terraces have been poorly studied, enclosing uncultured lineages and a great opportunity to understand environmental adaptation and the limits of life on Earth. Here we examine the microbial diversity and function in ancient sediments (10.3-11 ky BP (Before Present)) from a terrace profile of Laguna Lejía, a sulfur- and metal/metalloid-rich saline lake in the Chilean Altiplano. We also evaluate the physical and chemical changes of the lake over time by studying the mineralogy and geochemistry of the terrace profile. Results: The mineralogy and geochemistry of the terrace profile revealed large water level fluctuations in the lake, scarcity of organic carbon, and high concentration of SO₄³⁻-S, Na, Cl and Mg. Lipid biomarker analysis indicated the presence of aquatic/terrestrial plant remnants preserved in the ancient sediments, and genome-resolved metagenomics unveiled a diverse prokaryotic community with still active microorganisms based on in silico growth predictions. We reconstructed 591 bacterial and archaeal metagenome-assembled genomes (MAGs), of which 98.8% belonged to previously unreported species. The most abundant and widespread metabolisms among MAGs were the reduction and oxidation of S, N, As and halogenated compounds, as well as CO oxidation, possibly as a key metabolic trait in the organic carbon-depleted sediments. The broad redox and CO₂ fixation pathways among phylogenetically distant bacteria and archaea extended the knowledge of metabolic capacities to previously unknown taxa.
For instance, we identified genomic potential for dissimilatory sulfate reduction in Bacteroidota and α- and γ-Proteobacteria; ammonium oxidation in a novel Actinobacteriota; and we predicted enzymes of the Calvin-Benson-Bassham cycle in Planctomycetota, Gemmatimonadota, and Nanoarchaeota. The presence of genes encoding for enzymes involved in the above metabolic pathways in unexpected taxonomic groups has significant implications for the expansion of microorganisms involved in the biogeochemical cycles of carbon, nitrogen and sulfur.
... In order to identify other contigs within the metagenome from the same taxon, a trimer approach was used (33). In brief, the Python programming language (34) was used to count the proportion of each of the 64 possible trimers in all contigs containing at least 1,000 base pairs. ...
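The trimer-counting step described in the excerpt above can be sketched as follows. This is a minimal illustration, not the cited authors' script: the function name `trimer_proportions`, the overlapping-window counting, and the choice to skip trimers containing ambiguous bases are all assumptions; only the 64 trimers and the 1,000 bp minimum contig length come from the excerpt.

```python
from itertools import product

# All 64 possible trimers over the DNA alphabet, in a fixed order.
TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]


def trimer_proportions(seq, min_length=1000):
    """Return {trimer: proportion} for one contig, or None if it is too short.

    Assumptions (not stated in the source): overlapping windows are counted,
    and windows containing non-ACGT characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return None
    counts = dict.fromkeys(TRIMERS, 0)
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    profile = trimer_proportions("ACG" * 400)  # a toy 1,200 bp "contig"
    print(len(profile), round(sum(profile.values()), 6))
```

Contigs from the same taxon would then be expected to have similar 64-dimensional profiles, which is what makes the vectors usable for grouping.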
Article
Full-text available
Biogenic methane in subsurface coal seam environments is produced by diverse consortia of microbes. Although this methane is useful for global energy security, it remains unclear which microbes can liberate carbon from the coal. Most of this carbon is relatively resistant to biodegradation, as it is contained within aromatic rings. Thus, to explore for coal-degrading taxa in the subsurface, this study reconstructed relevant metagenome-assembled genomes (MAGs) from coal seams by using a key genomic marker for the anaerobic degradation of monoaromatic compounds as a guide: the benzoyl-CoA reductase gene (bcrABCD). Three MAGs were identified with this genetic potential. The first represented a novel taxon from the Krumholzibacteriota phylum, which this study is the first to describe. This Krumholzibacteriota MAG contained a full set of genes for benzoyl-CoA dearomatization, in addition to other genes for anaerobic catabolism of monoaromatics. Analysis of Krumholzibacteriota MAGs from other environments revealed that this genetic potential may be common, and thus, Krumholzibacteriota may be important organisms for the liberation of recalcitrant carbon in a broad range of environments. Moreover, the assembly and characterization of two Syntrophorhabdus aromaticivorans MAGs from different continents and a Syntrophaceae sp. MAG implicate the Deltaproteobacteria class in coal seam monoaromatic degradation. Each of these taxa are potential rate-limiting organisms for subsurface coal-to-methane biodegradation. Their description here provides some understanding of their function within the coal seam microbiome and will help inform future efforts in coal bed methane stimulation, anoxic bioremediation of organic pollutants, and assessments of anoxic, subsurface carbon cycling and emissions. IMPORTANCE Subsurface coal seams are highly anoxic, oligotrophic environments, where the main source of carbon is “locked away” within aromatic rings. 
Despite these challenges, many coal seams accumulate biogenic methane, implying that the coal seam microbiome is “unlocking” this carbon source in situ. For over two decades, researchers have endeavored to understand which organisms perform these processes. This study provides the first descriptions of organisms with this genetic potential from the coal seam environment. Here, we report metagenomic insights into carbon liberation from aromatic molecules and the degradation pathways involved and describe a Krumholzibacteriota, two Syntrophorhabdus aromaticivorans, and a Syntrophaceae MAG that contain this genetic potential. This is also the first time that the Krumholzibacteriota phylum has been implicated in anaerobic dearomatization of aromatic hydrocarbons. This potential is identified here in numerous MAGs from other terrestrial and marine subsurface habitats, implicating the Krumholzibacteriota in carbon-cycling processes across a broad range of environments.
... Here we found that for higher taxonomic levels (phylum to genus) up to 98% of Taxometer annotations could be reproduced by training using only TNFs. This is in concordance with previous findings that TNFs could be used to classify metagenomics fragments at the genus level and that abundance showed better strain-level binning performance compared to TNFs [15,24]. The number of correct species labels predicted by the model that combined both TNFs and abundances was 18%-35% larger than the models that only used TNFs or abundances for MMseqs2 annotations of the CAMI2 Airways dataset. ...
Preprint
Full-text available
For taxonomy based classification of metagenomics assembled contigs, current methods use sequence similarity to identify their most likely taxonomy. However, in the related field of metagenomics binning contigs are routinely clustered using information from both the contig sequences and their abundance. We introduce Taxometer, a neural network based method that improves the annotations and estimates the quality of any taxonomic classifier by combining contig abundance profiles and tetra-nucleotide frequencies. When applied to five short-read CAMI2 datasets, it increased the average share of correct species-level contig annotations of the MMSeqs2 tool from 66.6% to 86.2% and reduced the share of wrong species-level annotations in the CAMI2 Rhizosphere dataset two-fold on average for Metabuli, Centrifuge, and Kraken2. Finally, we applied Taxometer to two complex long-read metagenomics data sets for benchmarking taxonomic classifiers. Taxometer is available as open-source software and can enhance any taxonomic annotation of metagenomic contigs.
... With the development of high-throughput sequencing techniques and the advent of metagenomics, a proliferation of publicly available DNA sequence data from microbial communities residing in diverse ecosystems makes it possible to perform an extensive investigation of gene distribution (Podar et al., 2015;Holert et al., 2018;Viljakainen and Hug, 2021). This culture-independent analysis could also detect genes from organisms that currently are not culturable (Dick et al., 2009). In this study, we screened >2,500 publicly available (pre-)assembled microbial metagenomes to evaluate microbial SAC prevalence, distribution, and taxonomy in natural and engineered environments and in different human body parts. ...
Article
Full-text available
Sialic acids comprise a varied group of nine-carbon amino sugars found mostly in humans and other higher metazoans, playing major roles in cell interactions with external environments as well as other cells. Microbial sialic acid catabolism (SAC) has long been considered a virulence determinant, and appears to be mainly the purview of pathogenic and commensal bacterial species associated with eukaryotic hosts. Here, we used 2,521 (pre-)assembled metagenomes to evaluate the distribution of SAC in microbial communities from diverse ecosystems and human body parts. Our results demonstrated that microorganisms possessing SAC globally existed in non-host associated environments, although much less frequently than in mammal hosts. We also showed that the ecological significance and taxonomic diversity of microbial SAC have so far been largely underestimated. Phylogenetic analysis revealed a strong signal of horizontal gene transfer among distinct taxa and habitats, and also suggested a specific ecological pressure and a relatively independent evolution history in environmental communities. Our study expanded the known diversity of microbial SAC, and has provided the backbone for further studies on its ecological roles and potential pathogenesis.
... In order to assign contigs to genomes, a trimer approach was used (s. l. Dick et al., 2009). In brief, a custom Python script was used to count the proportion of each of the 64 possible trimers in all contigs that were at least 10,000 base pairs in length. ...
... com/najoshi/sickle) was used to dereplicate and trim the raw shotgun metagenomic sequencing reads with the "pe" option and default settings. The dereplicated, trimmed, paired-end DNA reads were assembled using MEGAHIT [43] with the following parameters: k-min 31, k-max 127, and step 4 [44]. The resulting 200,145-561,880 assembled contigs longer than 1 kb were binned into putative taxonomic groups based on abundance information using MaxBin version 2.2.4 with the run_MaxBin.pl ...
Article
Full-text available
Background A large proportion of prokaryotic microbes in marine sediments remains uncultured, hindering our understanding of their ecological functions and metabolic features. Recent environmental metagenomic studies suggested that many of these uncultured microbes contribute to the degradation of organic matter, accompanied by acetogenesis, but the supporting experimental evidence is limited. Results Estuarine sediments were incubated with different types of organic matters under anaerobic conditions, and the increase of uncultured bacterial populations was monitored. We found that (1) lignin stimulated the increase of uncultured bacteria within the class Dehalococcoidia. Their ability to metabolize lignin was further supported by the presence of genes associated with a nearly complete degradation pathway of phenolic monomers in the Dehalococcoidia metagenome-assembled genomes (MAGs). (2) The addition of cellulose stimulated the increase of bacteria in the phylum Ca. Fermentibacterota and family Fibrobacterales, a high copy number of genes encoding extracellular endoglucanase or/and 1,4-beta-cellobiosidase for cellulose decomposition and multiple sugar transporters were present in their MAGs. (3) Uncultured lineages in the order Bacteroidales and the family Leptospiraceae were enriched by the addition of casein and oleic acid, respectively, a high copy number of genes encoding extracellular peptidases, and the complete β-oxidation pathway were found in those MAGs of Bacteroidales and Leptospiraceae, respectively. (4) The growth of unclassified bacteria of the order Clostridiales was found after the addition of both casein and cellulose. Their MAGs contained multiple copies of genes for extracellular peptidases and endoglucanase. Additionally, ¹³C-labeled acetate was produced in the incubations when ¹³C-labeled dissolved inorganic carbon was provided. 
Conclusions Our results provide new insights into the roles of microorganisms during organic carbon degradation in anaerobic estuarine sediments and suggest that these macro and single molecular organic carbons support the persistence and increase of uncultivated bacteria. Acetogenesis is an additional important microbial process alongside organic carbon degradation.
... To determine the taxonomic identity of sequenced cells, we employed a genome-resolved approach. Assignment of scaffolds to genome bins was performed using the tetranucleotide frequencies of all scaffolds ≥5 kbp long over windows of 5 kbp, as described in ref. 78. Results were computed and visualized using the Databionics ESOM Tools software v. 1.1 [79], leading to the reconstruction of 18 genome bins (Supplementary Fig. 3). ...
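As a rough illustration of the windowed tetranucleotide step in the excerpt above, the sketch below splits a scaffold into 5 kbp windows and computes a 256-dimensional frequency vector per window. It is not the cited pipeline: the function names, the use of non-overlapping windows, dropping the trailing partial window, and counting only one strand (no reverse-complement merging) are assumptions made for brevity.

```python
from itertools import product

# Fixed ordering of the 256 tetranucleotides and a lookup into it.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetra_freqs(seq):
    """Relative frequency of each tetranucleotide in seq (single strand)."""
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])  # None for windows with N or other symbols
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]


def windowed_tetra_freqs(seq, window=5000, min_length=5000):
    """One frequency vector per non-overlapping window of a long-enough scaffold.

    Assumption: a trailing fragment shorter than `window` is discarded.
    """
    if len(seq) < min_length:
        return []
    starts = range(0, len(seq) - window + 1, window)
    return [tetra_freqs(seq[s:s + window]) for s in starts]


if __name__ == "__main__":
    vectors = windowed_tetra_freqs("ACGT" * 3000)  # toy 12 kbp scaffold
    print(len(vectors), len(vectors[0]))
```

The resulting per-window vectors are the kind of input a self-organizing map or other clustering method would take; the map training itself is outside the scope of this sketch.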
Article
Full-text available
Much remains to be explored regarding the diversity of uncultured, host-associated microbes. Here, we describe rectangular bacterial structures (RBSs) in the mouths of bottlenose dolphins. DNA staining revealed multiple paired bands within RBSs, suggesting the presence of cells dividing along the longitudinal axis. Cryogenic transmission electron microscopy and tomography showed parallel membrane-bound segments that are likely cells, encapsulated by an S-layer-like periodic surface covering. RBSs displayed unusual pilus-like appendages with bundles of threads splayed at the tips. We present multiple lines of evidence, including genomic DNA sequencing of micromanipulated RBSs, 16S rRNA gene sequencing, and fluorescence in situ hybridization, suggesting that RBSs are bacterial and distinct from the genera Simonsiella and Conchiformibius (family Neisseriaceae), with which they share similar morphology and division patterning. Our findings highlight the diversity of novel microbial forms and lifestyles that await characterization using tools complementary to genomics such as microscopy.
... In such a case, each contig can be represented by the M-dimensional coverage vector only. Furthermore, different organisms usually have different tetramer composition profiles [39,40]. Therefore, the feature matrix of the contigs is denoted as $X_{\mathrm{combo}} \in \mathbb{R}^{N \times (M+T)}$, ...
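To make the dimensions in the excerpt above concrete, the sketch below builds a combined feature matrix for N contigs by concatenating each contig's M coverage values with its T composition values, giving N rows of length M + T. The function name `combine_features` and the plain row-wise concatenation are assumptions for illustration; the excerpt does not say how the cited tool normalizes or weights the two feature types.

```python
def combine_features(coverage, composition):
    """Concatenate per-contig coverage (N x M) and composition (N x T) rows.

    Returns a list of N rows, each of length M + T. No scaling or weighting
    is applied here (an assumption; real binners usually normalize first).
    """
    if len(coverage) != len(composition):
        raise ValueError("coverage and composition must describe the same contigs")
    return [list(cov) + list(comp) for cov, comp in zip(coverage, composition)]


if __name__ == "__main__":
    coverage = [[12.0, 3.5, 0.0], [1.2, 40.1, 7.7]]                # N=2, M=3
    composition = [[0.25, 0.25, 0.25, 0.25], [0.4, 0.1, 0.1, 0.4]]  # N=2, T=4
    x_combo = combine_features(coverage, composition)
    print(len(x_combo), len(x_combo[0]))
```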
Article
Full-text available
Binning aims to recover microbial genomes from metagenomic data. For complex metagenomic communities, the available binning methods are far from satisfactory, which usually do not fully use different types of features and important biological knowledge. We developed a novel ensemble binner, MetaBinner, which generates component results with multiple types of features by k-means and uses single-copy gene information for initialization. It then employs a two-stage ensemble strategy based on single-copy genes to integrate the component results efficiently and effectively. Extensive experimental results on three large-scale simulated datasets and one real-world dataset demonstrate that MetaBinner outperforms the state-of-the-art binners significantly.
... Finally, we obtained 130 bins that met the MIMAG standard (Bowers et al., 2017). Each bin was then combined and visualized using ESOM (Dick et al., 2009). Afterward, gene prediction and annotation were conducted as previously described, and the metabolic pathways for each bin were reconstructed using the KEGG Automatic Annotation Server (KAAS) (Moriya et al., 2007) focusing on genes related to carbon (C) and sulfur (S) metabolism. ...
Article
The fate and removal of ciprofloxacin, a fluoroquinolone antibiotic, during sulfur-mediated biological wastewater treatment has been recently well documented. However, little is known regarding the genetic response of microorganisms to ciprofloxacin. Here, a lab-scale anaerobic sulfate-reducing bioreactor was continuously operated over a long term for ciprofloxacin-contaminated wastewater treatment to investigate the response of the microorganisms to ciprofloxacin by adopting a metagenomics approach. It was found that total organic carbon (TOC) removal and sulfate reduction were promoted by approximately 10% under ciprofloxacin stress, along with the enrichment of functional genera (e.g., Desulfobacter, Geobacter) involved in carbon and sulfur metabolism. The metagenomic analytical results demonstrated that ciprofloxacin triggered the microbial SOS response, as demonstrated by the up-regulation of the multidrug efflux pump genes (8–125-fold higher than that of the control) and ciprofloxacin-degrading genes (4–33-fold higher than that of the control). Moreover, the contents of ATP, NADH, and cytochrome C, as well as related functional genes (including genes involved in energy generation, electron transport, carbon metabolism, and sulfur metabolism) were markedly increased under ciprofloxacin stress. This demonstrated that the carbon and sulfur metabolisms were enhanced for energy (ATP) generation and electron transport in response to ciprofloxacin-induced stress. Interestingly, the microbes tended to cooperate when exposed to exogenous ciprofloxacin according to the reconstructed metabolic network using the NetSeed model. Particularly, the species with higher complementarity indices played more pivotal roles in strengthening microbial metabolism and the SOS response under long-term ciprofloxacin stress.
This study characterized the response mechanisms of microorganisms to ciprofloxacin at the genetic level in sulfur-mediated biological wastewater treatment. These new understandings will contribute to the scientific basis for improving and optimizing the sulfur-mediated bioprocess for antibiotics-laden wastewater treatment.
... com/najoshi/sickle) using the default parameters. Trimmed, paired-end DNA reads were assembled using IDBA-UD (Peng et al. 2012) with the following parameters: mink 52, maxk 92, step 8. Binning of assembled metagenomic sequences was initially performed using tetranucleotide frequency signatures in ESOM, with 5 kb as the fragment length cutoff (Dick et al. 2009). Ribosomal RNA (rRNA) genes were called with EMIRGE individually. ...
Article
Full-text available
Small regulatory RNAs (sRNAs) are present in almost all investigated microbes; they are regarded as modulators and regulators of gene expression and are also known to play regulatory roles in environmentally significant processes. It has been estimated that less than 1% of the microbes in nature are culturable in the laboratory, hindering our understanding of their physiology and living strategies. However, recent advances in DNA sequencing and omics-related data analysis make it possible to understand the genetics, metabolic potential, and even ecological roles of uncultivated microbes. In this study, we used a metagenome and metatranscriptome-based integrated approach to identify small RNAs in the microbiome of Guaymas Basin sediments. Hundreds of environmental sRNAs comprising 228 groups were identified based on their homology, 82% of which displayed high similarity with previously known small RNAs in the Rfam database, whereas 18% are putative novel sRNA motifs. A putative cis-acting sRNA potentially binding to methyl coenzyme M reductase, a key enzyme in methanogenesis or anaerobic oxidation of methane (AOM), was discovered in the genome of ANaerobic MEthane oxidizing archaea group 1 (ANME-1), which was the dominant microbe in the sample. These sRNAs were actively expressed in the local Guaymas Basin hydrothermal environment, suggesting important roles of sRNAs in regulating microbial activity in natural environments.
... First, scaffolds were binned on the basis of % GC content, differential coverage abundance patterns across all samples using Abawaca1, and taxonomic affiliation. Scaffolds that did not associate with any cluster using this method were binned based on tetranucleotide frequency using Emergent Self-Organizing Maps (ESOM) [93]. All genomic bins were manually inspected within ggKbase. ...
Article
Full-text available
Background Archaea play fundamental roles in the environment, for example by methane production and consumption, ammonia oxidation, protein degradation, carbon compound turnover, and sulfur compound transformations. Recent genomic analyses have profoundly reshaped our understanding of the distribution and functionalities of Archaea and their roles in eukaryotic evolution. Results Here, 1179 representative genomes were selected from 3197 archaeal genomes. The representative genomes clustered based on the content of 10,866 newly defined archaeal protein families (that will serve as a community resource) recapitulates archaeal phylogeny. We identified the co-occurring proteins that distinguish the major lineages. Those with metabolic roles were consistent with experimental data. However, two families specific to Asgard were determined to be new eukaryotic signature proteins. Overall, the blocks of lineage-specific families are dominated by proteins that lack functional predictions. Conclusions Given that these hypothetical proteins are near ubiquitous within major archaeal groups, we propose that they were important in the origin of most of the major archaeal lineages. Interestingly, although there were clearly phylum-specific co-occurring proteins, no such blocks of protein families were shared across superphyla, suggesting a burst-like origin of new lineages early in archaeal evolution.
... Maximum likelihood tree was built using IQ-Tree v2.0.3 with model auto-detected (LG + G) and an ultrafast bootstrap of maximum iteration of 1000 [100] and visualized using Interactive Tree Of Life (iTOL) with branch length ignored [101]. Tetranucleotide frequencies of MMA and their viruses were calculated, clustered, and visualized using Emergent Self-Organizing Maps [102]. The correlation coefficients of the tetranucleotide frequencies of all MMA viruses were calculated using Python package pyani. ...
Article
Full-text available
The metabolism of methane in anoxic ecosystems is mainly mediated by methanogens and methane-oxidizing archaea (MMA), key players in global carbon cycling. Viruses are vital in regulating their host fate and ecological function. However, our knowledge about the distribution and diversity of MMA viruses and their interactions with hosts is rather limited. Here, by searching metagenomes containing mcrA (the gene coding for the α-subunit of methyl-coenzyme M reductase) from a wide variety of environments, 140 viral operational taxonomic units (vOTUs) that potentially infect methanogens or methane-oxidizing archaea were retrieved. Four MMA vOTUs (three infecting the order Methanobacteriales and one infecting the order Methanococcales) were predicted to cross-domain infect sulfate-reducing bacteria. By facilitating assimilatory sulfur reduction, MMA viruses may increase the fitness of their hosts in sulfate-depleted anoxic ecosystems and benefit from synthesis of the sulfur-containing amino acid cysteine. Moreover, cell-cell aggregation promoted by MMA viruses may be beneficial for both the viruses and their hosts by improving infectivity and environmental stress resistance, respectively. Our results suggest a potential role of viruses in the ecological and environmental adaptation of methanogens and methane-oxidizing archaea.
... Commonly used signals include phylogenetic profiles (sequence similarity to known organisms, e.g., ref. [14]), sequence composition (GC content or tetranucleotide frequency, e.g., refs. [15,16]), and relative abundance (coverage variation within a sample or across samples, e.g., ref. [17]). Accurate bins draw support from multiple, concordant signals that persist across all the sequences constituting the draft genome [18,19]. ...
Article
Full-text available
The plasticity of bacterial and archaeal genomes makes examining their ecological and evolutionary dynamics both exciting and challenging. The same mechanisms that enable rapid genomic change and adaptation confound current approaches for recovering complete genomes from metagenomes. Here, we use strain-specific patterns of DNA methylation to resolve complex bacterial genomes from long-read metagenomic data of a marine microbial consortium, the “pink berries” of the Sippewissett Marsh (USA). Unique combinations of restriction-modification (RM) systems encoded by the bacteria produced distinctive methylation profiles that were used to accurately bin and classify metagenomic sequences. Using this approach, we finished the largest and most complex circularized bacterial genome ever recovered from a metagenome (7.9 Mb with >600 transposons), the finished genome of Thiohalocapsa sp. PB-PSB1 the dominant bacteria in the consortia. From genomes binned by methylation patterns, we identified instances of horizontal gene transfer between sulfur-cycling symbionts (Thiohalocapsa sp. PB-PSB1 and Desulfofustis sp. PB-SRB1), phage infection, and strain-level structural variation. We also linked the methylation patterns of each metagenome-assembled genome with encoded DNA methyltransferases and discovered new RM defense systems, including novel associations of RM systems with RNase toxins.
... With the development of bioinformatics analyses and high-throughput sequencing, more and more metagenome sequencing data have been obtained from diverse ecological niches and multiple parts of the human body including the oral cavity. Metagenomics is a DNA sequencing methodology based on shotgun sequencing, which sequences the DNA directly separated from the environment and then, assigns the reconstructed genome segments to the genome sketches [24]. ...
Article
Full-text available
The Candidate Phyla Radiation (CPR), a newly discovered and difficult-to-culture bacterial group, accounts for the majority of the bacterial domain and may be related to various oral diseases, including dental caries. Because of the limitations of laboratory culture conditions, knowledge about oral CPR is limited. Advances in metagenomics provide a new way to study CPR through molecular biology. Here, we used metagenomic assembly and binning to reconstruct more and higher quality metagenome-assembled genomes (MAGs) of CPR from oral dental plaque. These MAGs represent novel CPR species, which differed from all known CPR organisms. Relative abundance of different CPR MAGs in the caries and caries-free groups was estimated by mapping metagenomic reads to the newly constructed MAGs. The relative abundance of two CPR MAGs was significantly increased in the caries group, indicating that there might be a relationship with caries activity. The detection of a large number of unclassified CPR MAGs in the dataset implies that the phylogenetic diversity of CPR is enormous. The results provide a reference for exploring the ecological distribution and function of uncultured or difficult-to-culture microorganisms.
... As all 18S rDNAs extracted from individual assemblies were identical, all reads (i.e., Illumina HiSeqs and MiSeq, and Nanopore) were assembled together by SPAdes v3.11.1 using --sc and k-mers of 21, 33, 55, 77, 99, and 121. The resulting assembly was binned and decontaminated using tetraESOM [64] and a BLASTing strategy as described previously [65]. The final assembly was scaffolded using the P_RNA_scaffolder [66]. ...
Article
Full-text available
Background Mitochondria and peroxisomes are the two organelles that are most affected during adaptation to microoxic or anoxic environments. Mitochondria are known to transform into anaerobic mitochondria, hydrogenosomes, mitosomes, and various transition stages in between, collectively called mitochondrion-related organelles (MROs), which vary in enzymatic capacity. Anaerobic peroxisomes were identified only recently, and their putatively most conserved function seems to be the metabolism of inositol. The group Archamoebae includes anaerobes bearing both anaerobic peroxisomes and MROs, specifically hydrogenosomes in free-living Mastigamoeba balamuthi and mitosomes in the human pathogen Entamoeba histolytica, while the organelles within the third lineage represented by Pelomyxa remain uncharacterized. Results We generated high-quality genome and transcriptome drafts from Pelomyxa schiedti using single-cell omics. These data provided clear evidence for anaerobic derivates of mitochondria and peroxisomes in this species, and corresponding vesicles were tentatively identified in electron micrographs. In silico reconstructed MRO metabolism harbors respiratory complex II, electron-transferring flavoprotein, a partial TCA cycle running presumably in the reductive direction, pyruvate:ferredoxin oxidoreductase, [FeFe]-hydrogenases, a glycine cleavage system, a sulfate activation pathway, and an expanded set of NIF enzymes for iron-sulfur cluster assembly. When expressed in the heterologous system of yeast, some of these candidates localized into mitochondria, supporting their involvement in the MRO metabolism. The putative functions of P. schiedti peroxisomes could be pyridoxal 5′-phosphate biosynthesis, amino acid and carbohydrate metabolism, and hydrolase activities. Unexpectedly, out of 67 predicted peroxisomal enzymes, only four were also reported in M. balamuthi, namely peroxisomal processing peptidase, nudix hydrolase, inositol 2-dehydrogenase, and d-lactate dehydrogenase. Localizations in yeast corroborated peroxisomal functions of the latter two. Conclusions This study revealed the presence and partially annotated the function of anaerobic derivates of mitochondria and peroxisomes in P. schiedti using single-cell genomics, localizations in yeast heterologous systems, and transmission electron microscopy. The MRO metabolism resembles that of M. balamuthi and most likely reflects the state in the common ancestor of Archamoebae. The peroxisomal metabolism is strikingly richer in P. schiedti. The presence of myo-inositol 2-dehydrogenase in the predicted peroxisomal proteome corroborates the situation in other Archamoebae, but future experimental evidence is needed to verify additional functions of this organelle.
... The assembly of the genome was performed using Canu 1.8 [21] with corMinCoverage set to zero and corOutCoverage set to 100 000. Following assembly, the data was binned using the tetraESOM method [22] and the eukaryotic bin was checked for bacterial contamination using a combination of blastn and blastp as described previously [3]. The final eukaryotic genome assembly was polished using the ONT reads with Nanopolish [17] followed by polishing using the Illumina short reads with Pilon v1.21 [19]. ...
Article
Full-text available
Monocercomonoides exilis is considered the first known eukaryote to completely lack mitochondria. This conclusion is based primarily on a genomic and transcriptomic study which failed to identify any mitochondrial hallmark proteins. However, the available genome assembly has limited contiguity and around 1.5 % of the genome sequence is represented by unknown bases. To improve the contiguity, we re-sequenced the genome and transcriptome of M. exilis using Oxford Nanopore Technology (ONT). The resulting draft genome is assembled in 101 contigs with an N50 value of 1.38 Mbp, almost 20 times higher than the previously published assembly. Using a newly generated ONT transcriptome, we further improve the gene prediction and add high quality untranslated region (UTR) annotations, in which we identify two putative polyadenylation signals present in the 3′UTR regions and characterise the Kozak sequence in the 5′UTR regions. All these improvements are reflected by higher BUSCO genome completeness values. Regardless of an overall more complete genome assembly without missing bases and a better gene prediction, we still failed to identify any mitochondrial hallmark genes, thus further supporting the hypothesis on the absence of mitochondrion.
... Scaffolds larger than 1 kb were used for downstream analyses. Genome binning was carried out using three binning algorithms-Abawaca v1.07 [15], ESOM [64,65] and Maxbin2 v2.2.4 [66]. The values 3000 and 5000 bp as well as 5000 and 10,000 bp were used as -min and -max parameters to calculate 4-mer frequencies for Abawaca and ESOM (the script esom-Wrapper.pl, ...
Article
Full-text available
Background The highly diverse Cand. Patescibacteria are predicted to have minimal biosynthetic and metabolic pathways, which hinders understanding of how their populations differentiate in response to environmental drivers or host organisms. The mechanisms they employ to cope with oxidative stress are largely unknown. Here, we utilized genome-resolved metagenomics to investigate the adaptive genome repertoire of Patescibacteria in oxic and anoxic groundwaters, and to infer putative host ranges. Results Within six groundwater wells, Cand. Patescibacteria was the most dominant (up to 79%) super-phylum across 32 metagenomes sequenced from DNA retained on 0.2 and 0.1 µm filters after sequential filtration. Of the reconstructed 1275 metagenome-assembled genomes (MAGs), 291 high-quality MAGs were classified as Cand. Patescibacteria. Cand. Paceibacteria and Cand. Microgenomates were enriched exclusively in the 0.1 µm fractions, whereas candidate division ABY1 and Cand. Gracilibacteria were enriched in the 0.2 µm fractions. On average, Patescibacteria enriched in the smaller 0.1 µm filter fractions had 22% smaller genomes, 13.4% lower replication measures, a higher proportion of rod-shape determining proteins, and of genomic features suggesting type IV pili mediated cell–cell attachments. Near-surface wells harbored Patescibacteria with higher replication rates than anoxic downstream wells characterized by longer water residence time. Except for the prevalence of superoxide dismutase genes in Patescibacteria MAGs enriched in oxic groundwaters (83%), no major metabolic or phylogenetic differences were observed. The most abundant Patescibacteria MAG in oxic groundwater encoded a nitrate transporter, nitrite reductase, and F-type ATPase, suggesting an alternative energy conservation mechanism. Patescibacteria consistently co-occurred with one another or with members of phyla Nanoarchaeota, Bacteroidota, Nitrospirota, and Omnitrophota. Among the MAGs enriched in 0.2 µm fractions, only 8% of Patescibacteria showed highly significant one-to-one correlation, mostly with Omnitrophota. Motility and transport related genes in certain Patescibacteria were highly similar to genes from other phyla (Omnitrophota, Proteobacteria and Nanoarchaeota). Conclusion Other than genes to cope with oxidative stress, we found little genomic evidence for niche adaptation of Patescibacteria to oxic or anoxic groundwaters. Given that we could detect specific host preference only for a few MAGs, we speculate that the majority of Patescibacteria are able to attach to multiple hosts just long enough to loot or exchange supplies.
... Genomes were binned using abawaca (github.com/CK7/abawaca), ESOM [95] and MaxBin2 [96], and the resulting bins were aggregated using DAS Tool [97]. Each genomic bin was manually curated using coverage, gene-based taxonomy, and GC content information for each scaffold. ...
Article
Full-text available
Background The hyperarid core of the Atacama Desert is an extremely harsh environment thought to be colonized by only a few heterotrophic bacterial species. Current concepts for understanding this extreme ecosystem are mainly based on the diversity of these few species, yet a substantial area of the Atacama Desert hyperarid topsoil is covered by expansive boulder accumulations, whose underlying microbiomes have not been investigated so far. With the hypothesis that these sheltered soils harbor uniquely adapted microbiomes, we compared metagenomes and geochemistry between soils below and beside boulders across three distantly located boulder accumulations in the Atacama Desert hyperarid core. Results Genome-resolved metagenomics of eleven samples revealed substantially different microbial communities in soils below and beside boulders, despite the presence of shared species. Archaea were found in significantly higher relative abundance below the boulders across all samples within distances of up to 205 km. These key taxa belong to a novel genus of ammonia-oxidizing Thaumarchaeota, Candidatus Nitrosodeserticola. We resolved eight mid-to-high quality genomes of this genus and used comparative genomics to analyze its pangenome and site-specific adaptations. Ca. Nitrosodeserticola genomes contain genes for ammonia oxidation, the 3-hydroxypropionate/4-hydroxybutyrate carbon fixation pathway, and acetate utilization indicating a chemolithoautotrophic and mixotrophic lifestyle. They also possess the capacity for tolerating extreme environmental conditions as highlighted by the presence of genes against oxidative stress and DNA damage. Site-specific adaptations of the genomes included the presence of additional genes for heavy metal transporters, multiple types of ATP synthases, and divergent genes for aquaporins. Conclusion We provide the first genomic characterization of hyperarid soil microbiomes below the boulders in the Atacama Desert, and report abundant and highly adapted Thaumarchaeota with ammonia oxidation and carbon fixation potential. Ca. Nitrosodeserticola genomes provide the first metabolic and physiological insight into a thaumarchaeal lineage found in globally distributed terrestrial habitats characterized by various environmental stresses. We consequently expand not only the known genetic repertoire of Thaumarchaeota but also the diversity and microbiome functioning in hyperarid ecosystems.
... 4B and 4C have been used in various metagenome studies (Hayashi et al., 2005; Uchiyama et al., 2005; Abe et al., 2006b), including the detection of diverse pathogenic virus sequences from tick-derived metagenome sequences (Nakao et al., 2013). Dick et al. (2009) developed a slightly different type of SOM, the ESOM (emergent SOM), and obtained clear phylotype-specific classification of metagenomic sequences by analyzing several acidophilic biofilms from the Richmond Mine in California. ...
Article
Full-text available
In genetics and related fields, huge amounts of data, such as genome sequences, are accumulating, and the use of artificial intelligence (AI) suitable for big data analysis has become increasingly important. Unsupervised AI that can reveal novel knowledge from big data without prior knowledge or particular models is highly desirable for analyses of genome sequences, particularly for obtaining unexpected insights. We have developed a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions that can reveal various novel genome characteristics. Here, we explain the data mining by the BLSOM: an unsupervised AI. As a specific target, we first selected SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) because a large number of viral genome sequences have been accumulated via worldwide efforts. We analyzed more than 0.6 million sequences collected primarily in the first year of the pandemic. BLSOMs for short oligonucleotides (e.g., 4–6-mers) allowed separation into known clades, but longer oligonucleotides further increased the separation ability and revealed subgrouping within known clades. In the case of 15-mers, there is mostly one copy in the genome; thus, 15-mers that appeared after the epidemic started could be connected to mutations, and the BLSOM for 15-mers revealed the mutations that contributed to separation into known clades and their subgroups. After introducing the detailed methodological strategies, we explain BLSOMs for various topics, such as the tetranucleotide BLSOM for over 5 million 5-kb fragment sequences derived from almost all microorganisms currently available and its use in metagenome studies. We also explain BLSOMs for various eukaryotes, including fishes, frogs and Drosophila species, and found a high separation ability among closely related species. When analyzing the human genome, we found enrichments in transcription factor-binding sequences in centromeric and pericentromeric heterochromatin regions. 
The tDNAs (tRNA genes) could be separated according to their corresponding amino acid.
... ://github.com/CK7/abawaca), CONCOCT, MaxBin 2 (Wu, Simmons et al. 2015), MetaBAT (Kang, Froula et al. 2015); and one nucleotide composition tool, tetranucleotide ESOMs (Dick, Andersson et al. 2009)). DAS Tool was then used to optimize the results of these individual binning tools. ...
Preprint
Full-text available
Whole genome shotgun sequencing is a powerful approach to study the microbial community in a given environment. Metagenomic binning offers a genome-centric approach to study microbiomes. Although several tools are available to process metagenomic data from raw reads to interpretation, there is still no standard approach that can be used to process metagenomic data step by step. In this study, CuBi-MeAn (Customizable Binning and Metagenomic Analysis) provides a customizable and flexible pipeline to process metagenomic data and generate results for further interpretation. This study aims to perform metagenomic binning to enhance taxonomic classification and the characterization of functional potential and interactions among microbial populations in environmental systems. The customized pipeline comprises a series of genomic/metagenomic tools designed to recover better-quality results and support reliable interpretation of the system dynamics for the given systems. To this end, a metagenomic data processing pipeline was developed to evaluate metagenomic data from three environmental engineering projects. The use of our pipeline was demonstrated and compared on three datasets that were of different sizes, from different sequencing platforms, and generated from three different environmental sources. By designing and developing a flexible and customized pipeline, this study has shown how to process large metagenomic datasets with limited resources. This result would not only help to uncover new information from environmental samples but could also be applicable to other metagenomic studies across various disciplines.
... Metagenomic reads were assembled with IDBA_UD [80] with default parameters. Metagenomic binning was then performed using tetranucleotide frequency ESOMs [79] and the ggkbase manual binning platform. ...
Preprint
Full-text available
The Chloroflexi superphylum has been investigated primarily from the perspective of reductive dehalogenation of toxic compounds, anaerobic photosynthesis and wastewater treatment, but remains relatively little studied compared to its close relatives within the larger Terrabacteria group, including Cyanobacteria, Actinobacteria, and Firmicutes. Here, we conducted a detailed phylogenetic analysis of the phylum Chloroflexota, the phylogenetically proximal candidate phylum Dormibacteraeota, and a newly defined sibling phylum proposed in the current study, Eulabeiota. These groups routinely root together in phylogenomic analyses, and constitute the Chloroflexi supergroup. Chemoautotrophy is widespread in Chloroflexi. Two Form I Rubisco ancestral subtypes that both lack the small subunit are prevalent in Ca. Eulabeiota and Chloroflexota, suggesting that the predominant modern pathway for CO2 fixation evolved in these groups. The single subunit Form I Rubiscos are inferred to have evolved prior to oxygenation of the Earth's atmosphere and now predominantly occur in anaerobes. Prevalent in both Chloroflexota and Ca. Eulabeiota are capacities related to aerobic oxidation of gases, especially CO and H2. In fact, aerobic and anaerobic CO dehydrogenases are widespread throughout every class-level lineage, whereas traits such as denitrification and reductive dehalogenation are heterogeneously distributed across the supergroup. Interestingly, some Chloroflexota have a novel clade of group 3 NiFe hydrogenases that is phylogenetically distinct from previously reported groups. Overall, the analyses underline the very high level of metabolic diversity in the Chloroflexi supergroup, suggesting the ancestral metabolic platform for this group enabled highly varied adaptation to ecosystems that appeared in the aerobic world.
... In such cases, each contig can be represented by the M dimensional coverage vector only. Furthermore, different organisms usually have different tetra-mer composition profiles [43,44]. Therefore, the feature matrix . ...
Preprint
Full-text available
Binning is an essential procedure during metagenomic data analysis. However, the available individual binning methods usually do not simultaneously fully use different features or biological information. Furthermore, it is challenging to integrate multiple binning results efficiently and effectively. Therefore, we developed an ensemble binner, MetaBinner, which generates component results with multiple types of features and utilizes single-copy gene (SCG) information for k-means initialization. It then utilizes a two-step ensemble strategy based on SCGs to integrate the component results. Extensive experimental results over three large-scale simulated datasets and one real-world dataset demonstrate that MetaBinner outperforms other state-of-the-art individual binners and ensemble binners. MetaBinner is freely available at https://github.com/ziyewang/MetaBinner .
... As is the case with population metagenomics, shotgun metagenomics is complicated by the fact that it is very difficult to determine what sequences belong to which member, particularly in cases of very high relatedness. To begin to address this problem, several "binning" algorithms have been developed to sort sequences based on GC content, read coverage, tetranucleotide frequency, as well as combinations of these (Alneberg et al. 2014; Dick et al. 2009; Wu et al. 2016; Kang et al. 2015). Independently, these tools give varying results; however, DAS Tool has recently been developed that can combine any number of different binning tools to provide consensus bins (Sieber et al. 2018). ...
Chapter
This chapter focuses on “omics” disciplines used in molecular entomology, in the study of triatomines, including the use of genomics, transcriptomics, metagenomics, and metabolomics. We present an initial introduction to the different methodologies including the uptake of these methods in entomology. The challenges associated with the analysis of these big datasets are discussed along with a number of commonly used tools. Subsequently, a summary of studies published to date is presented followed by a perspective for future research utilizing these technologies to answer more complex biological questions, specifically addressing triatomine biology.
... First, scaffolds were binned on the basis of % GC content, differential coverage abundance patterns across all samples using Abawaca1, and taxonomic affiliation. Scaffolds that did not associate with any cluster using this method were binned based on tetranucleotide frequency using Emergent Self-Organizing Maps (ESOM; Dick et al., 2009). All genomic bins were manually inspected within ggKbase. ...
Article
Full-text available
DPANN are small-celled archaea that are generally predicted to be symbionts, and in some cases are known episymbionts of other archaea. As the monophyly of the DPANN remains uncertain, we hypothesized that proteome content could reveal relationships among DPANN lineages, constrain genetic overlap with bacteria, and illustrate how organisms with hybrid bacterial and archaeal protein sets might function. We tested this hypothesis using protein family content that was defined in part using 3,197 genomes including 569 newly reconstructed genomes. Protein family content clearly separates the final set of 390 DPANN genomes from other archaea, paralleling the separation of Candidate Phyla Radiation (CPR) bacteria from all other bacteria. This separation is partly driven by hypothetical proteins, some of which may be symbiosis-related. Pacearchaeota with the most limited predicted metabolic capacities have Form II/III and III-like Rubisco, suggesting metabolisms based on scavenged nucleotides. Intriguingly, the Pacearchaeota and Woesearchaeota with the smallest genomes also tend to encode large extracellular murein-like lytic transglycosylase domain proteins that may bind and degrade components of bacterial cell walls, indicating that some might be episymbionts of bacteria. The pathway for biosynthesis of bacterial isoprenoids is widespread in Woesearchaeota genomes and is encoded in proximity to genes involved in bacterial fatty acids synthesis. Surprisingly, in some DPANN genomes we identified a pathway for synthesis of queuosine, an unusual nucleotide in tRNAs of bacteria. Other bacterial systems are predicted to be involved in protein refolding. For example, many DPANN have the complete bacterial DnaK-DnaJ-GrpE system and many Woesearchaeota and Pacearchaeota possess bacterial group I chaperones. Thus, many DPANN appear to have mechanisms to ensure efficient protein folding of both archaeal and laterally acquired bacterial proteins.
... 4B and 4C have been used in various metagenome studies (Hayashi et al., 2005; Uchiyama et al., 2005; Abe et al., 2006b; Uehara et al., 2011), including the detection of diverse pathogenic virus sequences from tick-derived metagenome sequences (Nakao et al., 2013). Dick et al. (2009) developed a slightly different type of SOM, "ESOM (emergent SOM)", and obtained clear phylotype-specific classification of metagenomic sequences by analyzing several acidophilic biofilms in the Richmond mine. ...
Preprint
Full-text available
In genetics and related fields, huge amounts of data, such as genome sequences, are accumulating, and the use of artificial intelligence (AI) suitable for big data analysis has become increasingly important. Unsupervised AI that can reveal novel knowledge from big data without prior knowledge or particular models is highly desirable for analyses of genome sequences, particularly for obtaining unexpected insights. We have developed a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions that can reveal various novel genome characteristics. Here, we explain the data mining by the BLSOM: unsupervised and explainable AI. As a specific target, we first selected SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) because a large number of viral genome sequences have been accumulated via worldwide efforts. We analyzed more than 0.6 million sequences collected primarily in the first year of the pandemic. BLSOMs for short oligonucleotides (e.g., 4–6-mers) allowed separation into known clades, but longer oligonucleotides further increased the separation ability and revealed subgrouping within known clades. In the case of 15-mers, there is mostly one copy in the genome; thus, 15-mers that appeared after the epidemic started could be connected to mutations. Because BLSOM is an explainable AI, the BLSOM for 15-mers revealed the mutations that contributed to separation into known clades and their subgroups. After introducing the detailed methodological strategies, we explained BLSOMs for various topics, such as the tetranucleotide BLSOM for over 5 million 5-kb fragment sequences derived from almost all microorganisms currently available and its use in metagenome studies. We also explained BLSOMs for various eukaryotes, such as fishes, frogs and Drosophila species, and found a high separation ability among closely related species. 
When analyzing the human genome, we found evident enrichments in transcription factor-binding sequences (TFBSs) in centromeric and pericentromeric heterochromatin regions. The tDNAs (tRNA genes) were separated by the corresponding amino acid.
... The first objective of any genome-resolved approach is to assemble the representative members of a given community into MAGs. This process is termed "binning", in which short reads are de novo assembled into longer contigs, with groups of contigs hypothesized to arise from the same genomic context subsequently clustered based on sequence composition, differential coverage across time and/or space, or a combination of both parameters (Figure 3) (Albertsen et al., 2013; Alneberg et al., 2014; Dick et al., 2009; Sharon et al., 2013). These groups of "binned" contigs can be considered as working models of whole genomes, and are also referred to as population genomes, as the phylogenetic composition of assembled bins can be operationally defined by the minimum sequence identity of reads mapping back to assembled contigs (Jain et al., 2018; Olm et al., 2020). ...
Preprint
Full-text available
Advances in high-throughput sequencing technologies and bioinformatics approaches over almost the last three decades have substantially increased our ability to explore microorganisms and their functions-including those that have yet to be cultivated in pure isolation. Genome-resolved metagenomic approaches have enabled linking powerful functional predictions to specific taxonomical groups with increasing fidelity. Additionally, whole community gene expression surveys and metabolite profiling have permitted direct surveys of community-scale functions in specific environmental settings. These advances have allowed for a shift in microbiome science away from descriptive studies and towards mechanistic and predictive frameworks for designing and harnessing microbial communities for desired beneficial outcomes. Here, we review how modern genome-resolved metagenomic approaches have been applied to a variety of water engineering applications from lab-scale bioreactors to full-scale systems. We describe integrated omics analysis across engineered water systems and the foundations for pairing these insights with modeling approaches. Lastly, we summarize emerging omics-based technologies that we believe will be powerful tools for water engineering applications. Overall, we provide a framework for microbial ecologists specializing in water engineering to apply cutting-edge omics approaches to their research questions to achieve novel functional insights. Successful adoption of predictive frameworks in engineered water systems could enable more economically and environmentally sustainable bioprocesses as demand for water and energy resources increases.
... For each sample only scaffolds larger than 2500 bp were binned using MetaBAT (v.2.12.1) with default parameters, considering both tetranucleotide frequencies (TNF) and scaffold coverage information. The scaffolds from the obtained bins and the unbinned scaffolds were visualized using ESOM with a minimum length of 2500 bp and maximum length of 5000 bp as previously described 62 and the bins were modified by removing any out-of-range scaffolds (indicated by sequence points) or adding any unbinned scaffolds using ESOM related scripts 37 . MAGs from Tibet hot springs with scaffolds ≥1000 bp were uploaded to ggKbase (http://ggkbase.berkeley. ...
Article
Full-text available
Geothermal environments, such as hot springs and hydrothermal vents, are hotspots for carbon cycling and contain many poorly described microbial taxa. Here, we reconstructed 15 archaeal metagenome-assembled genomes (MAGs) from terrestrial hot spring sediments in China and deep-sea hydrothermal vent sediments in Guaymas Basin, Gulf of California. Phylogenetic analyses of these MAGs indicate that they form a distinct group within the TACK superphylum, and thus we propose their classification as a new phylum, ‘Brockarchaeota’, named after Thomas Brock for his seminal research in hot springs. Based on the MAG sequence information, we infer that some Brockarchaeota are uniquely capable of mediating non-methanogenic anaerobic methylotrophy, via the tetrahydrofolate methyl branch of the Wood-Ljungdahl pathway and reductive glycine pathway. The hydrothermal vent genotypes appear to be obligate fermenters of plant-derived polysaccharides that rely mostly on substrate-level phosphorylation, as they seem to lack most respiratory complexes. In contrast, hot spring lineages have alternate pathways to increase their ATP yield, including anaerobic methylotrophy of methanol and trimethylamine, and potentially use geothermally derived mercury, arsenic, or hydrogen. Their broad distribution and their apparent anaerobic metabolic versatility indicate that Brockarchaeota may occupy previously overlooked roles in anaerobic carbon cycling.
... A 2009 paper [19] by Dick et al. describes the use of Self-Organizing Maps [13] (SOMs) for reducing the dimensionality of tetranucleotide genomic signatures belonging to two acidophilic biofilm communities. The unsupervised learning technique SOM uses an artificial neural network to generate a two-dimensional representation of the high-dimensional data. ...
Article
Full-text available
Analysis of metagenomic data is challenging not only because the data are acquired from samples in their natural habitats but also because of their high volume and high dimensionality. The fact that no prior lab-based cultivation is carried out in metagenomics makes inference on the presence of numerous microorganisms all the more challenging, accentuating the need for an informative visualization of this data. In a successful visualization, the congruent reads of the sequences should appear in clusters depending on the diversity and taxonomy of the microorganisms in the sequenced sample. The metagenomic data represented by their oligonucleotide frequency vectors is inherently high dimensional and therefore impossible to visualize as is. This raises the need for a dimensionality reduction technique to convert these higher dimensional sequence data into lower dimensional data for visualization purposes. In this process, preservation of the genomic characteristics must be given highest priority. Currently, for dimensionality reduction purposes in metagenomics, Principal Component Analysis (PCA), which is a linear technique, and t-distributed Stochastic Neighbor Embedding (t-SNE), a non-linear technique, are widely used. Despite their wide use, these techniques have shortcomings and weaknesses that make them less than ideally suited to the domain of metagenomics. Our research explores the possibility of using autoencoders, a deep learning technique that has the potential to overcome the prevailing impediments of the existing dimensionality reduction techniques, eventually leading to richer visualizations.
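The tetranucleotide signatures and oligonucleotide frequency vectors mentioned in the excerpts above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name `tetranucleotide_frequencies` and the choice to skip windows containing ambiguous bases are assumptions made for illustration, and real pipelines often also merge reverse-complement tetramers, which this sketch does not do.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order, so that every
# sequence maps to a vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetranucleotide_frequencies(seq):
    """Return a 256-element list of relative tetranucleotide frequencies.

    Hypothetical helper for illustration only. Overlapping windows of
    length 4 are counted; windows containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        # No valid window (sequence too short or all ambiguous bases).
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = tetranucleotide_frequencies("ACGTACGT")
    print(len(vec), vec[INDEX["ACGT"]])
```

Each contig or read becomes one 256-dimensional vector; it is vectors of this kind that a SOM, PCA, t-SNE, or an autoencoder would then project to two dimensions.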
... MAG quality, including completeness, contamination, and heterogeneity, was estimated using CheckM v1.0.12. 29 To optimize the MAGs, emergent self-organizing maps 30 were used to visualize the bins, and contigs with abnormal coverage or discordant tetranucleotide frequencies were removed manually. Finally, all MAGs were reassembled using SPAdes with the following parameters: --careful -k 21,33,55,77,99,127. ...
Article
Full-text available
The discovery of complete ammonia-oxidizing (comammox) Nitrospira has added an important new process to the microbial nitrogen cycle. While comammox Nitrospira have been detected in various ecosystems, only a few studies have achieved their enrichment over other canonical nitrifiers. Here, we obtained a selective enrichment of comammox Nitrospira in a urine-fed membrane bioreactor in less than 200 days. By using 16S rRNA gene amplicon sequencing and quantitative PCR of the functional marker gene amoA, we observed a dominance (up to 30% relative abundance) of comammox Nitrospira over ammonia-oxidizing bacteria and archaea. Furthermore, the complete genomes of three new clade A comammox Nitrospira were recovered by metagenomics. These three strains were divergent from previously reported comammox species according to comparative genome and amoA-based analyses. In addition to the key genes for ammonia and nitrite oxidation, the three recovered genomes contained a complete urea utilization pathway. Our findings suggest that the urea present in the urine media played a significant role in the selective enrichment of these novel comammox Nitrospira, and support the diversity and versatility of their metabolism.
... All the derived MAGs with hgcAB were then processed with the script esomWrapper.pl [67] to determine tetranucleotide frequency signatures, followed by generating ESOMs with the Databionic emergent self-organizing map tools (http://databionic-esom.sourceforge.net/). The presence of hgcAB fragments in the MAGs was confirmed by manual inspection of the ESOM. ...
Article
Full-text available
Microbes transform aqueous mercury (Hg) into methylmercury (MeHg), a potent neurotoxin that accumulates in terrestrial and marine food webs, with potential impacts on human health. This process requires the gene pair hgcAB, which encodes proteins that actuate Hg methylation, and has been well described for anoxic environments. However, recent studies report potential MeHg formation in suboxic seawater, although the microorganisms involved remain poorly understood. In this study, we conducted large-scale multi-omic analyses to search for putative microbial Hg methylators along defined redox gradients in Saanich Inlet, British Columbia, a model natural ecosystem with previously measured Hg and MeHg concentration profiles. Analysis of gene expression profiles along the redoxcline identified several putative Hg methylating microbial groups, including Calditrichaeota, SAR324 and Marinimicrobia, with the last the most active based on hgc transcription levels. Marinimicrobia hgc genes were identified from multiple publicly available marine metagenomes, consistent with a potential key role in marine Hg methylation. Computational homology modelling predicts that Marinimicrobia HgcAB proteins contain the highly conserved amino acid sites and folding structures required for functional Hg methylation. Furthermore, a number of terminal oxidases from aerobic respiratory chains were associated with several putative novel Hg methylators. Our findings thus reveal potential novel marine Hg-methylating microorganisms with a greater oxygen tolerance and broader habitat range than previously recognized.
Preprint
Full-text available
The ability to differentiate between viable and dead microorganisms in metagenomic samples is crucial for various microbial inferences, ranging from assessing ecosystem functions of environmental microbiomes to inferring the virulence of potential pathogens. While established viability-resolved metagenomic approaches are labor-intensive as well as biased and lacking in sensitivity, we here introduce a new fully computational framework that leverages nanopore sequencing technology to assess microbial viability directly from freely available nanopore signal data. Our approach utilizes deep neural networks to learn features from such raw nanopore signal data that can distinguish DNA from viable and dead microorganisms in a controlled experimental setting. The application of explainable AI tools then allows us to robustly pinpoint the signal patterns in the nanopore raw data that allow the model to make viability predictions at high accuracy. Using the model predictions as well as efficient explainable AI-based rules, we show that our framework can be leveraged in a real-world application to estimate the viability of pathogenic Chlamydia, where traditional culture-based methods suffer from inherently high false negative rates. This application shows that our viability model captures predictive patterns in the nanopore signal that can in principle be utilized to predict viability across taxonomic boundaries and independent of the killing method used to induce bacterial cell death. While the generalizability of our computational framework needs to be assessed in more detail, we here demonstrate for the first time the potential of analyzing freely available nanopore signal data to infer the viability of microorganisms, with many applications in environmental, veterinary, and clinical settings.
Preprint
Full-text available
Few aerobic hyperthermophiles degrade polysaccharides. Here, we describe the genome-enabled enrichment and optical tweezer-based isolation of an aerobic polysaccharide-degrading hyperthermophile, Fervidibacter sacchari, which was originally ascribed to candidate phylum Fervidibacteria. F. sacchari uses polysaccharides and monosaccharides as sole carbon sources from 65-87.5 °C and expresses 191 carbohydrate-active enzymes (CAZymes) according to RNA-Seq and proteomics, including 30 with unusual glycoside hydrolase (GH)109, 177, or 179 domains. Many CAZymes were also expressed in a proteolytic enrichment culture, and fluorescence in situ hybridization and nanoscale secondary ion mass spectrometry confirmed rapid assimilation of 13C-starch in spring sediments. Purified GHs were optimally active at 80-100 °C on eight different polysaccharides. Finally, we reassign Fervidibacteria as a class within phylum Armatimonadota, along with 18 other species, and trace the evolution of aerobic and anaerobic polysaccharide catabolism within the phylum. This study establishes Fervidibacteria as unique hyperthermophilic polysaccharide-degrading specialists in terrestrial geothermal springs.
Article
Full-text available
Recent discoveries of methyl-coenzyme M reductase-encoding genes (mcr) in uncultured archaea beyond traditional euryarchaeotal methanogens have reshaped our view of methanogenesis. However, whether any of these nontraditional archaea perform methanogenesis remains elusive. Here, we report field and microcosm experiments based on 13C-tracer labeling and genome-resolved metagenomics and metatranscriptomics, revealing that nontraditional archaea are predominant active methane producers in two geothermal springs. Archaeoglobales performed methanogenesis from methanol and may exhibit adaptability in using methylotrophic and hydrogenotrophic pathways based on temperature/substrate availability. A five-year field survey found Candidatus Nezhaarchaeota to be the predominant mcr-containing archaea inhabiting the springs; genomic inference and mcr expression under methanogenic conditions strongly suggested that this lineage mediated hydrogenotrophic methanogenesis in situ. Methanogenesis was temperature-sensitive, with a preference for methylotrophic over hydrogenotrophic pathways when incubation temperatures increased from 65° to 75°C. This study demonstrates an anoxic ecosystem wherein methanogenesis is primarily driven by archaea beyond known methanogens, highlighting diverse nontraditional mcr-containing archaea as previously unrecognized methane sources.
Article
Full-text available
While sponges are valuable sources of bioactive natural products, a majority of these compounds are produced in small quantities by uncultured symbionts, hampering the study and clinical development of these unique compounds. Lasonolide A (LSA), isolated from marine sponge Forcepia sp., is a cytotoxic molecule active at nanomolar concentrations, which causes premature chromosome condensation, blebbing, cell contraction, and loss of cell adhesion, indicating a novel mechanism of action and making it a potential anticancer drug lead.
Article
Full-text available
Cyanobacterial harmful algal blooms (CHABs) threaten freshwater ecosystems globally through the production of toxins. Toxin production by cyanobacterial species and strains during CHABs varies widely over time and space, but the ecological drivers of the succession of toxin-producing species remain unclear.
Article
Full-text available
Interactions between heterotrophic bacteria and phytoplankton influence competition and successions between phytoplankton taxa, thereby influencing ecosystem-wide processes such as carbon cycling and algal bloom development. The cyanobacterium Microcystis forms harmful blooms in freshwaters worldwide and grows in buoyant colonies that harbor other bacteria in their phycospheres.
Article
Full-text available
Host-associated phages of the bacterium Ralstonia identified in snow samples can be used to track microbial dispersal over thousands of kilometers across the Antarctic continent, which functions as an extraterrestrial analogue because of its harsh environmental conditions. Due to the presence of these bacteria carrying genome-integrated prophages on space-related equipment and the potential for dispersal of host-associated phages demonstrated here, our work has implications for planetary protection, a discipline in astrobiology interested in preventing contamination of celestial bodies with alien biomolecules or forms of life.
Article
Full-text available
Earth’s mantle releases 38.7 ± 2.9 Tg/yr CO2 along with other reduced and oxidized gases to the atmosphere shaping microbial metabolism at volcanic sites across the globe, yet little is known about its impact on microbial life under non-thermal conditions. Here, we perform comparative metagenomics coupled to geochemical measurements of deep subsurface fluids from a cold-water geyser driven by mantle degassing. Key organisms belonging to uncultivated Candidatus Altiarchaeum show a global biogeographic pattern and site-specific adaptations shaped by gene loss and inter-kingdom horizontal gene transfer. Comparison of the geyser community to 16 other publicly available deep subsurface sites demonstrates a conservation of chemolithoautotrophic metabolism across sites. In silico replication measures suggest a linear relationship of bacterial replication with ecosystem depth with the exception of impacted sites, which show near-surface characteristics. Our results suggest that subsurface ecosystems affected by geological degassing are hotspots for microbial life in the deep biosphere.
Preprint
Full-text available
Extreme Antarctic conditions provide one of the closest analogues of extraterrestrial environments. Since air and snow samples, especially from polar regions, yield DNA amounts in the lower picogram range, binning of prokaryotic genomes is challenging and renders studying the dispersal of biological entities across these environments difficult. Here, we hypothesized that dispersal of host-associated bacteriophages (adsorbed, replicating or prophages) across the Antarctic continent can be tracked via their genetic signatures and benefits our understanding of virus and host dispersal across long distances. Phage genome fragments (PGFs) reconstructed from surface snow metagenomes of three Antarctic stations were assigned to four host genomes, mainly Betaproteobacteria including Ralstonia spp. Betaproteobacteria of this genus have been found in Antarctic snow as well as on space-related equipment. We reconstructed the complete genome of a temperate phage with near-complete alignment to a prophage in the reference genome of Ralstonia pickettii 12D. PGFs from different stations were related to each other at the genus level and matched similar hosts. Metagenomic read mapping and nucleotide polymorphism analysis revealed a wide dispersal of highly identical PGFs, 13 of which appeared in seawater from the Western Antarctic Peninsula, up to 5538 km from the snow sampling stations. Our results suggest that host-associated phages, especially of Ralstonia sp., disperse over long distances despite the harsh conditions of the Antarctic continent. Given the additional identification of 14 phages associated with two R. pickettii draft genomes isolated from space equipment, we infer implications for the spread of biological contaminants in extraterrestrial settings.
Importance: Host-associated phages of the bacterium Ralstonia identified in snow samples can be used to track microbial dispersal over thousands of kilometers across the Antarctic continent, which functions as an extraterrestrial analogue because of its harsh environmental conditions. Due to the presence of this bacterial strain, including genome-integrated prophages, on space-related equipment, and the potential for dispersal of host-associated phages demonstrated here, our work has implications for Planetary Protection, a discipline in Astrobiology interested in preventing contamination of celestial bodies with alien biomolecules or forms of life.
Article
Full-text available
Information theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods, based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses on four k-mer lengths spanning the relevant range (11, 21, 31, 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life with clear boundaries for superkingdom domains and high subtree similarity for named taxa at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
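The pairwise k-mer Jaccard similarity named in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the helper names `kmer_set` and `jaccard_similarity` are hypothetical, the example uses k = 3 on short strings rather than the 21- or 31-mers used on whole genomes, and canonical (reverse-complement-merged) k-mers are not handled.

```python
def kmer_set(seq, k):
    """Return the set of distinct overlapping k-mers in seq (hypothetical helper)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k):
    """Jaccard similarity |A & B| / |A | B| of the two k-mer sets.

    Returns 0.0 when neither sequence contains a k-mer, which is an
    arbitrary convention chosen for this sketch.
    """
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Toy sequences; real comparisons would use whole genomes and k = 21 or 31.
    print(jaccard_similarity("ACGT", "CGTA", 3))
```

A matrix of such pairwise similarities (or the corresponding distances, 1 minus similarity) is the kind of input a hierarchical clustering step would consume.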
Preprint
Full-text available
The highly diverse Cand. Patescibacteria are predicted to have minimal biosynthetic and metabolic pathways, which hinders understanding of how their populations differentiate in response to environmental drivers or host organisms. Their metabolic traits to cope with oxidative stress are largely unknown. Here, we utilized genome-resolved metagenomics to investigate the adaptive genome repertoire of Patescibacteria in oxic and anoxic groundwaters, and to infer putative host ranges. Within six groundwater wells, Cand. Patescibacteria was the most dominant (up to 79%) super-phylum across 32 metagenomes obtained from sequential 0.2 and 0.1 µm filtration. Of the reconstructed 1275 metagenome-assembled genomes (MAGs), 291 high-quality MAGs were classified as Cand. Patescibacteria. Cand. Paceibacteria and Cand. Microgenomates were enriched exclusively in the 0.1 µm fractions, whereas candidate division ABY1 and Cand. Gracilibacteria were enriched in the 0.2 µm fractions. Patescibacteria enriched in the smaller 0.1 µm filter fractions had 22% smaller genomes, 13.4% lower replication measures, a higher fraction of rod-shape determining proteins, and genomic features suggesting type IV pili mediated cell-cell attachments. Near-surface wells harbored Patescibacteria with higher replication rates than anoxic downstream wells characterized by longer water residence time. Except for the prevalence of superoxide dismutase genes in Patescibacteria MAGs enriched in oxic groundwaters (83%), no major metabolic or phylogenetic differences were observed based on oxygen concentrations. The most abundant Patescibacteria MAG in oxic groundwater encoded a nitrate transporter, nitrite reductase, and F-type ATPase, suggesting an alternative energy conservation mechanism. Patescibacteria consistently co-occurred with one another or with members of phyla Nanoarchaeota, Bacteroidota, Nitrospirota, and Omnitrophota. However, only 8% of MAGs showed highly significant one-to-one association, mostly with Omnitrophota. Genes coding for motility and transport functions in certain Patescibacteria were highly similar to genes from other phyla (Omnitrophota, Proteobacteria and Nanoarchaeota). Other than genes to cope with oxidative stress, we found little genomic evidence for niche adaptation of Patescibacteria to oxic or anoxic groundwaters. Given that we could detect specific host preference only for a few MAGs, we propose that the majority of Patescibacteria can attach to multiple hosts just long enough to loot or exchange supplies with an economic lifestyle of little preference for geochemical conditions.
Article
Full-text available
“Candidatus Aenigmarchaeota” (“Ca. Aenigmarchaeota”) represents one of the earliest proposed evolutionary branches within the Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota (DPANN) superphylum. However, their ecological roles and potential host-symbiont interactions are still poorly understood. Here, eight metagenome-assembled genomes (MAGs) were reconstructed from hot spring ecosystems, and further in-depth comparative and evolutionary genomic analyses were conducted on these MAGs and other genomes downloaded from public databases. Although with limited metabolic capacities, we reported that “Ca. Aenigmarchaeota” in thermal environments harbor more genes related to carbohydrate metabolism than “Ca. Aenigmarchaeota” in non-thermal environments. Evolutionary analyses suggested that members from the Thaumarchaeota, Aigarchaeota, Crenarchaeota, and Korarchaeota (TACK) superphylum and Euryarchaeota contribute substantially to the niche expansion of “Ca. Aenigmarchaeota” via horizontal gene transfer (HGT), especially genes related to virus defense and stress responses. Based on co-occurrence network results and recent genetic exchanges among community members, we conjectured that “Ca. Aenigmarchaeota” may be symbionts associated with one MAG affiliated with the genus Pyrobaculum, though host specificity might be wide and variable across different “Ca. Aenigmarchaeota” organisms. This study provides significant insight into possible DPANN-host interactions and ecological roles of “Ca. Aenigmarchaeota.”
Article
Full-text available
"Candidatus Bathyarchaeia" is a phylogenetically diverse and widely distributed lineage often in high abundance in anoxic submarine sediments; however, their evolution and ecological roles in terrestrial geothermal habitats are poorly understood. In the present study, 35 Ca. Bathyarchaeia metagenome-assembled genomes (MAGs) were recovered from hot spring sediments in Tibet and Yunnan, China. Phylogenetic analysis revealed all MAGs of Ca. Bathyarchaeia can be classified into 7 orders and 15 families. Among them, 4 families have been first discovered in the present study, significantly expanding the known diversity of Ca. Bathyarchaeia. Comparative genomics demonstrated Ca. Bathyarchaeia MAGs from thermal habitats to encode a large variety of genes related to carbohydrate degradation, which are likely a metabolic adaptation of these organisms to a lifestyle at high temperatures. At least two families are potential methanogens/alkanotrophs, indicating a potential for the catalysis of short-chain hydrocarbons. Three MAGs from Family-7.3 are identified as alkanotrophs due to the detection of an Mcr complex. Family-2 contains the largest number of genes relevant to alkyl-CoM transformation, indicating the potential for methylotrophic methanogenesis, although their evolutionary history suggests the ancestor of Ca. Bathyarchaeia was unable to metabolize alkanes. Subsequent lineages have acquired the ability via horizontal gene transfer. Overall, our study significantly expands our knowledge and understanding of the metabolic capabilities, habitat adaptations, and evolution of Ca. Bathyarchaeia in thermal environments. IMPORTANCE Ca. Bathyarchaeia MAGs from terrestrial hot spring habitats are poorly revealed, though they have been studied extensively in marine ecosystems. In this study, we uncovered the metabolic capabilities and ecological role of Ca. Bathyarchaeia in hot springs and give a comprehensive comparative analysis between thermal and nonthermal habitats to reveal the thermal adaptability of Ca. Bathyarchaeia. Also, we attempt to determine the evolutionary history of methane/alkane metabolism in Ca. Bathyarchaeia, since it appears to be the first archaea beyond Euryarchaeota which contains the mcrABG genes. The reclassification of Ca. Bathyarchaeia and significant genomic differences among different lineages largely expand our knowledge on these cosmopolitan archaea, which will be beneficial in guiding the future studies.
Article
Mark Achtman introduced the term “genetically monomorphic bacteria” (GM bacteria) for some human and plant pathogens. They displayed a great uniformity in terms of their “genetic” properties. This “uniformity” poses a challenge to microbiologists. To address these problems, we used CodonW and IslandViewer 3 as analytical tools and took Escherichia coli, Salmonella, and Shigella strains as model organisms. We hypothesized that GM bacteria share a common molecular signature. We found a significant correlation regarding the number of protein‐coding genes, predicted highly expressed genes, and the highest length of gene in this regard. On the other hand, the correspondence analysis of pathogenicity‐related genes identified by IslandViewer 3 displayed a somewhat unique pattern in GM bacteria. The probable pathogenic genes are clustered into two separate groups, which is a hallmark of some pattern. Similar genes of non‐monomorphic pathogenic strains clustered almost similarly, but the clusters are joined together rather than completely separated. These features, in our considered view, may be considered as codon usage signatures of these bacteria, and E. coli in particular.