A flowchart describing the general workflow of the software as it relates to metatranscriptomes (METs) and metagenomes (MAGs).

Contexts in source publication

Context 1
... (Krinos et al., 2020) (Figure 1) is an open-source Python-based package designed to simplify taxonomic identification of marine eukaryotes in meta-omic samples. The package is written in Python, but may be installed as a Python module via PyPI, as a standalone tool via conda, or through download of the EUKulele tarball through GitHub. ...
Context 2
... package is written in Python, but may be installed as a Python module via PyPI, as a standalone tool via conda, or through download of the EUKulele tarball through GitHub. User-provided metatranscriptomic or metagenomic samples are aligned against a database of the user's choosing, using a user-chosen aligner (BLAST (Kent, 2002) or DIAMOND (Buchfink et al., 2015)). The "blastx" utility is used by default if metatranscriptomic samples are only provided in nucleotide format, while the "blastp" utility is used for samples available as translated protein sequences. ...
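
The default described above (a translated search for nucleotide input, a protein search for protein input) can be illustrated with a minimal sketch. The function names and the alphabet-based format guess below are hypothetical and are not taken from the EUKulele code base; they only mirror the rule stated in the excerpt.

```python
# Hypothetical helper names; this is an illustration, not EUKulele's API.
NUCLEOTIDE_ALPHABET = set("ACGTUN")


def guess_sequence_format(sequence: str) -> str:
    """Guess 'nucleotide' or 'protein' from the characters in a sequence.

    Assumption: a sequence made up only of A, C, G, T, U, or N is nucleotide.
    """
    residues = set(sequence.strip().upper())
    if not residues:
        raise ValueError("empty sequence")
    return "nucleotide" if residues <= NUCLEOTIDE_ALPHABET else "protein"


def choose_alignment_utility(sequence_format: str) -> str:
    """Return the search utility named in the excerpt for each input format."""
    fmt = sequence_format.strip().lower()
    if fmt == "nucleotide":
        return "blastx"  # translated nucleotide query against a protein database
    if fmt == "protein":
        return "blastp"  # protein query against a protein database
    raise ValueError(f"unknown sequence format: {sequence_format!r}")


if __name__ == "__main__":
    for seq in ("ATGGCCAAGTTGA", "MAKLTSAVPVLT"):
        fmt = guess_sequence_format(seq)
        print(seq, fmt, choose_alignment_utility(fmt))
```

The alphabet check is only a heuristic: a short protein fragment composed entirely of A, C, G, and T residues would be misread as nucleotide, so a real workflow would more likely rely on an explicit user setting.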

Citations

... For functional annotation, eggNOG within eggNOG-mapper (v.2.1.9) (Cantalapiedra et al., 2021) was chosen, while EUKulele (v.2.0.3) (Krinos et al., 2021) was used for taxonomic classification. For prokaryote annotation, GTDB was used within EUKulele, while a second annotation with PhyloDB (default) was used to specify eukaryotic transcripts. ...
Article
The world’s oceans are challenged by climate change linked warming with typically highly populated coastal areas being particularly susceptible to these effects. Many studies of climate change on the marine environment use large, short-term temperature manipulations that neglect factors such as long-term adaptation and seasonal cycles. In this study, a Baltic Sea ‘heated’ bay influenced by thermal discharge since the 1970s from a nuclear reactor (in relation to an unaffected nearby ‘control’ bay) was used to investigate how elevated temperature impacts surface water microbial communities and activities. 16S rRNA gene amplicon based microbial diversity and population structure showed no difference in alpha diversity in surface water microbial communities, while the beta diversity showed a dissimilarity between the bays. Amplicon sequencing variant relative abundances between the bays showed statistically higher values for, e.g., Ilumatobacteraceae and Burkholderiaceae in the heated and control bays, respectively. RNA transcript-derived activities followed a similar pattern in alpha and beta diversity with no effect on Shannon’s H diversity but a significant difference in the beta diversity between the bays. The RNA data further showed more elevated transcript counts assigned to stress related genes in the heated bay that included heat shock protein genes dnaKJ, the co-chaperonin groS, and the nucleotide exchange factor heat shock protein grpE. The RNA data also showed elevated oxidative phosphorylation transcripts in the heated (e.g., atpHG) compared to control (e.g., atpAEFB) bay. Furthermore, genes related to photosynthesis had generally higher transcript numbers in the control bay, such as photosystem I (psaAC) and II genes (psbABCEH). These increased stress gene responses in the heated bay will likely have additional cascading effects on marine carbon cycling and ecosystem services.
... Using the extracted BUSCO proteins for each of the MAGs and a number of reference genomes curated by the JGI [68], we used trimal [69] ... For taxonomic annotation of the MAGs, we compared three approaches. First, we used the suggested NCBI database and utility provided by EukCC ([65]; 0.2) in concert with completeness estimates. Second, we used EUKulele [72] with a custom database comprised of the MMETSP [67,73], MarRef [74], and a selection of reference genomes of eukaryotes from the JGI's Genome Portal [68] to annotate the proteins extracted from the contigs binned into each MAG. ...
Preprint
Protists, single-celled eukaryotic organisms, are critical to food web ecology, contributing to primary productivity and connecting small bacteria and archaea to higher trophic levels. Lake Mendota is a large, eutrophic natural lake that is a Long-Term Ecological Research site and among the world's best-studied freshwater systems. Metagenomic samples have been collected and shotgun sequenced from Lake Mendota for the last twenty years. Here, we analyze this comprehensive time series to infer changes to the structure and function of the protistan community, and to hypothesize about their interactions with bacteria. Based on small subunit rRNA genes extracted from the metagenomes and metagenome-assembled genomes of microeukaryotes, we identify shifts in the eukaryotic phytoplankton community over time, which we predict to be a consequence of reduced zooplankton grazing pressures after the invasion of an invasive predator (the spiny water flea) to the lake. The metagenomic data also reveal the presence of the spiny water flea and the zebra mussel, a second invasive species to Lake Mendota, prior to their visual identification during routine monitoring. Further, we use species co-occurrence and co-abundance analysis to connect the protistan community with bacterial taxa. Correlation analysis suggests that protists and bacteria may interact or respond similarly to environmental conditions. Cryptophytes declined in the second decade of the timeseries, while many alveolate groups (e.g. ciliates and dinoflagellates) and diatoms increased in abundance, changes that have implications for food web efficiency in Lake Mendota. We demonstrate that metagenomic sequence-based community analysis can complement existing efforts to monitor protists in Lake Mendota based on microscopy-based count surveys.
We observed patterns of seasonal abundance in microeukaryotes in Lake Mendota that corroborated expectations from other systems, including high abundance of cryptophytes in winter and diatoms in fall and spring, but with much higher resolution than previous surveys. Our study identified long-term changes in the abundance of eukaryotic microbes, and provided context for the known establishment of an invasive species that catalyzes a trophic cascade involving protists. Our findings are important for decoding potential long-term consequences of human interventions, including invasive species introduction.
... EukProt [108], and PhyloDB (allenlab.ucsd.edu/data). The taxonomic annotations were performed using EUKulele with a last common ancestor approach (LCA) [107]. The LCA conservatively annotated ORFs only when bitscores associated within the top 3% of hits were in taxonomic agreement. ...
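
The last common ancestor rule in this excerpt can be sketched as follows. This is an illustrative reading, not EUKulele's implementation: "top 3% of hits" is interpreted here as hits whose bitscore lies within 3% of the best bitscore, and the function name, the `top_fraction` parameter, and the lineage representation (a tuple ordered from broadest to most specific rank) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# A hit is (bitscore, lineage), with the lineage ordered from broad to specific.
Hit = Tuple[float, Sequence[str]]


def lca_annotation(hits: List[Hit], top_fraction: float = 0.03) -> Tuple[str, ...]:
    """Return the deepest lineage prefix shared by all near-best hits.

    Assumption: "near-best" means bitscore >= best * (1 - top_fraction).
    An empty tuple means the retained hits disagree even at the broadest rank.
    """
    if not hits:
        return ()
    best = max(score for score, _ in hits)
    cutoff = best * (1.0 - top_fraction)
    lineages = [tuple(lineage) for score, lineage in hits if score >= cutoff]

    consensus = []
    # Walk rank by rank and stop at the first rank where the hits disagree.
    for names_at_rank in zip(*lineages):
        if len(set(names_at_rank)) != 1:
            break
        consensus.append(names_at_rank[0])
    return tuple(consensus)


if __name__ == "__main__":
    example_hits = [
        (200.0, ("Eukaryota", "Stramenopiles", "Bacillariophyta")),
        (197.0, ("Eukaryota", "Stramenopiles", "Pelagophyceae")),
        (150.0, ("Bacteria", "Proteobacteria", "Gammaproteobacteria")),
    ]
    print(lca_annotation(example_hits))
```

In the example, the third hit falls below the cutoff and is ignored, and the two retained hits agree only down to the second rank, so the ORF would be annotated at that rank rather than to either more specific group. This is the conservative behavior the excerpt describes.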
Preprint
Protists (microeukaryotes) are key contributors to marine carbon cycling, influencing the transfer of energy to higher trophic levels and the vertical movement of carbon to the ocean interior. Their physiology, ecology, and interactions with the chemical environment are still poorly understood in offshore ecosystems, and especially in the deep ocean. Using the Autonomous Underwater Vehicle (AUV) Clio, the microbial community along a 1,050 km transect in the western North Atlantic Ocean was surveyed at 10-200 m vertical depth increments to capture metabolic microeukaryote signatures spanning a gradient of oligotrophic, continental margin, and productive coastal ecosystems. Plankton biomass was collected along the surface of this transect and across depth features, and taxonomy and metabolic function were examined using a paired metatranscriptomic and metaproteomic approach. A shift in the microeukaryote community composition was observed from the euphotic zone through the mesopelagic and into the bathypelagic ocean. A diverse surface assemblage consisting of haptophytes, stramenopiles, dinoflagellates and ciliates was represented in both the transcript and protein fractions, with foraminifera, radiolaria, picozoa, and discoba proteins enriched at >200 m depth, and fungal proteins emerging in waters >3,000 m depth. In the broad microeukaryote community, nitrogen stress biomarkers were found in productive coastal sites, with phosphorus stress biomarkers in offshore waters where Saharan dust input is thought to supply iron and nitrogen. This multi-omics dataset broadens our understanding of how microeukaryotic taxa and their functional processes are structured along environmental gradients of temperature, light, macronutrients, and trace metals.
... For the metatranscriptome analyses, the pre-filtering of RNA data for quality control was performed by JGI and then the cleaned mRNA reads were coassembled with Megahit [75] and gene calling was performed with Prodigal [76]. The predicted genes were assigned functions with eggNOGmapper [77] against the eggNOG database [78] and taxonomy with EUKulele [79]. BBMap was used to map the reads from each sample back to the assembly and featureCounts [80] was used to summarize the counts for each gene. ...
Article
Besides long-term average temperature increases, climate change is projected to result in a higher frequency of marine heatwaves. Coastal zones are some of the most productive and vulnerable ecosystems, with many stretches already under anthropogenic pressure. Microorganisms in coastal areas are central to marine energy and nutrient cycling and therefore, it is important to understand how climate change will alter these ecosystems. Using a long-term heated bay (warmed for 50 years) in comparison with an unaffected adjacent control bay and an experimental short-term thermal (9 days at 6–35 °C) incubation experiment, this study provides new insights into how coastal benthic water and surface sediment bacterial communities respond to temperature change. Benthic bacterial communities in the two bays reacted differently to temperature increases with productivity in the heated bay having a broader thermal tolerance compared with that in the control bay. Furthermore, the transcriptional analysis showed that the heated bay benthic bacteria had higher transcript numbers related to energy metabolism and stress compared to the control bay, while short-term elevated temperatures in the control bay incubation experiment induced a transcript response resembling that observed in the heated bay field conditions. In contrast, a reciprocal response was not observed for the heated bay community RNA transcripts exposed to lower temperatures indicating a potential tipping point in community response may have been reached. In summary, long-term warming modulates the performance, productivity, and resilience of bacterial communities in response to warming.
... The collation of laboratory transcriptomic data to a single location and format by the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [3,20] began as a repository effort and became one of the most important databases enabling the identification of marine microbial eukaryotes from metatranscriptomic sequences (e.g. [21][22][23][24]). Substantial discoveries have been made using sequenced metatranscriptomes, including novel explanations for persistent gaps in ecological understanding, such as coexistence within a seemingly narrow niche [23], discovering new genes or putative organisms from previously unknown sequences [19], developing a molecular understanding of the basis of coral disease [25], and decoding the complexities of deep-sea hydrothermal vent microbial communities [26]. ...
... While eukrhythmic is primarily designed for assembly, the user may optionally elect to annotate the assembly output as part of the pipeline. Presently, the pipeline provides annotation tools including phylogenetic assessment using EUKulele [21], and basic functional assessment using the companion tool eggNOG-mapper [53]. To characterize KEGG annotations [54], we grouped results by Kegg Orthology ID (KO). ...
... As performed within the eukrhythmic pipeline, we generated taxonomic annotations for both the designer metatranscriptomes and the reassembled products from eukrhythmic with the EUKulele tool (version 2.0.3) using the default reference database of contigs from all MMETSP transcriptomes and the MarRef database [3,20,21,66]. We report differences in the number of annotated species and genera from EUKulele in the reassembled products as compared to the sequences which were prescribed to be included in the designer metatranscriptome using the jEUKebox pipeline. ...
Article
Background Diverse communities of microbial eukaryotes in the global ocean provide a variety of essential ecosystem services, from primary production and carbon flow through trophic transfer to cooperation via symbioses. Increasingly, these communities are being understood through the lens of omics tools, which enable high-throughput processing of diverse communities. Metatranscriptomics offers an understanding of near real-time gene expression in microbial eukaryotic communities, providing a window into community metabolic activity. Results Here we present a workflow for eukaryotic metatranscriptome assembly, and validate the ability of the pipeline to recapitulate real and manufactured eukaryotic community-level expression data. We also include an open-source tool for simulating environmental metatranscriptomes for testing and validation purposes. We reanalyze previously published metatranscriptomic datasets using our metatranscriptome analysis approach. Conclusion We determined that a multi-assembler approach improves eukaryotic metatranscriptome assembly based on recapitulated taxonomic and functional annotations from an in-silico mock community. The systematic validation of metatranscriptome assembly and annotation methods provided here is a necessary step to assess the fidelity of our community composition measurements and functional content assignments from eukaryotic metatranscriptomes.
... Another limiting factor for both microeukaryotic and viral metagenomics is the lack of taxonomy classification with the same rigor as GTDB-Tk. Currently, the only peer-reviewed tool designed for eukaryotic taxonomy classification is EUKulele [139] but there were several barriers we experienced when attempting to incorporate EUKulele. First, many of the existing EUKulele databases are targeted towards marine ecosystems, thus, not practical for alternative environments (e.g., human microbiomes, soil, built-environments), contain eukaryotes which would not be expected to be binned in a metagenome, and contain prokaryotic genomes that increase computational resource demand. ...
Article
Background With the advent of metagenomics, the importance of microorganisms and how their interactions are relevant to ecosystem resilience, sustainability, and human health has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth’s natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ, however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes. Results In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy allowing for sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA’s capabilities are demonstrated by reanalyzing 3 existing public datasets which recovered a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral) including several uncharacterized organisms and organisms with no public genome representatives. 
Conclusions The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. The contributions of VEBA to the metagenomics community include not only seamless end-to-end metagenomics analysis but also the flexibility for users to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.
... It is recommended to include marine bacteria, metazoa, and common contaminants in the databases for purposes of identifying these community members, if present, and avoiding misannotation. For example, the default EUKulele database includes the MarRef database of marine prokaryotic genomes (Klemetsen et al., 2018; Krinos et al., 2021). In addition, databases containing reference transcriptomes that have been screened for multi-species presence and decontaminated are preferred (Van Vlierberghe et al., 2021). ...
Article
Microeukaryotes (protists) serve fundamental roles in the marine environment as contributors to biogeochemical nutrient cycling and ecosystem function. Their activities can be inferred through metatranscriptomic investigations, which provide a detailed view into cellular processes, chemical-biological interactions in the environment, and ecological relationships among taxonomic groups. Established workflows have been individually put forth describing biomass collection at sea, laboratory RNA extraction protocols, and bioinformatic processing and computational approaches. Here, we present a compilation of current practices and lessons learned in carrying out metatranscriptomics of marine pelagic protistan communities, highlighting effective strategies and tools used by practitioners over the past decade. We anticipate that these guidelines will serve as a roadmap for new marine scientists beginning in the realms of molecular biology and/or bioinformatics, and will equip readers with foundational principles needed to delve into protistan metatranscriptomics.
... And yet, our tools for studying the ecology and evolution of eukaryotic microbes are still quite limited, at least in comparison to their prokaryotic neighbors [8]. Several recent developments have greatly advanced our ability to survey the ecology of microbial eukaryotes directly from the environment using metagenomics. Large-scale efforts to augment the sizes of our existing genomic and transcriptomic databases, specifically the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP; [9]), have expanded our ability to use database-dependent approaches for metagenomic analysis for both taxonomic and functional classification (e.g., [10][11][12][13]). At the same time, novel approaches for binning and validation have been applied by multiple groups to reconstruct high-quality metagenome-assembled genomes (MAGs) from environmental datasets [14][15][16]. ...
... We used these annotations, and (similar to above) searched for ribosomal proteins using blastp v2.10.1 [62] against a custom blast database of ribosomal proteins of eukaryotic microbes drawn from the Ribosomal Protein Gene Database [63]. We ran EUKulele v1.0.6 to classify these MAGs taxonomically and omitted any organisms identified as Metazoa from downstream analyses [10]. Division-level classifications were taken as the division assigned as most likely by EUKulele. ...
Preprint
Microbial eukaryotes are ubiquitous in the environment and play important roles in key ecosystem processes, including accounting for a significant portion of global primary production. Yet, our tools for assessing the functional capabilities of eukaryotic microbes in the environment are quite limited because many microbes have yet to be grown in culture. Maximum growth rate is a fundamental parameter of microbial lifestyle that reveals important information about an organism's functional role in a community. We developed and validated a genomic estimator of maximum growth rate for eukaryotic microbes, enabling the assessment of growth potential for both cultivated and yet-to-be-cultivated organisms. We produced a database of over 700 growth predictions from genomes, transcriptomes, and metagenome-assembled genomes, and found that closely related and/or functionally similar organisms tended to have similar maximal growth rates. By comparing the maximal growth rates of existing culture collections with environmentally-derived genomes we found that, unlike for prokaryotes, culture collections of microbial eukaryotes are only minimally biased in terms of growth potential. We then extended our tool to make community-wide estimates of growth potential from over 500 marine metagenomes, mapping growth potential across the global oceans. We found that prokaryotic and eukaryotic communities have highly correlated growth potentials near the ocean surface, but that this relationship disappears deeper in the water column. This suggests that fast growing eukaryotes and prokaryotes thrive under similar conditions at the ocean surface, but that there is a decoupling of these communities as resources become scarce deeper in the water column.
... In addition, 18S sequencing can be skewed by organisms that have many 18S gene copies (de Vargas et al., 2015). Metagenomic sequencing is rarely undertaken for microeukaryotes due to their large genome sizes but recovering eukaryotic genomes from metagenomes is becoming increasingly feasible with the development of new bioinformatic tools (e.g., Delmont and Eren, 2016; Krinos et al., 2021). The community composition of microeukaryotes in OMZs has been assessed in the Arabian Sea (More et al., 2018), the Black Sea (Wylezich et al., 2018), the Cariaco Basin (Edgcomb et al., 2011a,b; Orsi et al., 2011, 2012), the Costa Rica Dome (Jing et al., 2015), the ETNP (Duret et al., 2015), the ETSP (Parris et al., 2014; De la Iglesia et al., 2020), Framvaren Fjord (Behnke et al., 2006; Orsi et al., 2012), Saanich Inlet (Orsi et al., 2012; Torres-Beltrán et al., 2018), and Tolo Harbor (Rocke et al., 2016). ...
Article
Oxygen minimum zones (OMZs) have substantial effects on the global ecology and biogeochemical processes of marine microbes. However, the diversity and activity of OMZ microbes and their trophic interactions are only starting to be documented, especially in regard to the potential roles of viruses and protists. OMZs have expanded over the past 60 years and are predicted to expand due to anthropogenic climate change, furthering the need to understand these regions. This review summarizes the current knowledge of OMZ formation, the biotic and abiotic factors involved in OMZ expansion, and the microbial ecology of OMZs, emphasizing the importance of bacteria, archaea, viruses, and protists. We describe the recognized roles of OMZ microbes in carbon, nitrogen, and sulfur cycling, the potential of viruses in altering host metabolisms involved in these cycles, and the control of microbial populations by grazers and viruses. Further, we highlight the microbial community composition and roles of these organisms in oxic and anoxic depths within the water column and how these differences potentially inform how microbial communities will respond to deoxygenation. Additionally, the current literature on the alteration of microbial communities by other key climate change parameters such as temperature and pH are considered regarding how OMZ microbes might respond to these pressures. Finally, we discuss what knowledge gaps are present in understanding OMZ microbial communities and propose directions that will begin to close these gaps.
Article
The microbiome is a complex community of microorganisms, encompassing prokaryotic (bacterial and archaeal), eukaryotic, and viral entities. This microbial ensemble plays a pivotal role in influencing the health and productivity of diverse ecosystems while shaping the web of life. However, many software suites developed to study microbiomes analyze only the prokaryotic community and provide limited to no support for viruses and microeukaryotes. Previously, we introduced the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite to address this critical gap in microbiome research by extending genome-resolved analysis beyond prokaryotes to encompass the understudied realms of eukaryotes and viruses. Here we present VEBA 2.0 with key updates including a comprehensive clustered microeukaryotic protein database, rapid genome/protein-level clustering, bioprospecting, non-coding/organelle gene modeling, genome-resolved taxonomic/pathway profiling, long-read support, and containerization. We demonstrate VEBA’s versatile application through the analysis of diverse case studies including marine water, Siberian permafrost, and white-tailed deer lung tissues with the latter showcasing how to identify integrated viruses. VEBA represents a crucial advancement in microbiome research, offering a powerful and accessible software suite that bridges the gap between genomics and biotechnological solutions.