Viral classification analysis. (A) Percentage of the input sequences in a vConTACT v2 cluster and (B) percentage of sequences assigned by VPF-class to a genus. The performance of VPF-class was calculated using confidence score (CS) and membership ratio (MR) thresholds of ≥0.2 (Pons et al., 2021). Full-size DOI: 10.7717/peerj.11447/fig-3

Source publication
Article
Full-text available
Background Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre...

Contexts in source publication

Context 1
... we evaluated whether false-positive detections were associated with specific types of non-viral sequences, including other mobile genetic elements and 'novel' microbial genomes. To this end, we generated datasets composed only of archaea, plasmid, or eukaryotic sequences, and measured false-discovery rates across the viral identification tools (Fig. S3). It is important to note, however, that to our knowledge there is currently no 'clean' plasmid database that does not also contain phage/virus genomes. Therefore, our benchmark relies on a cleaning step based on 'complete' plasmids/phages, and primarily examines how genome fragmentation impacts the delineation of plasmid vs phage. ...
Context 2
... v.1.2.0 when using the virome flag (highest in eukaryote datasets, up to >90% false discovery, while other versions of VIBRANT are less affected), MetaPhinder (highest in plasmid datasets, up to >40% false discovery), MARVEL (up to ∼20% false discovery for the plasmid dataset), and VirSorter when using the virome flag (up to ∼24% false discovery in eukaryote datasets) (Fig. S3). This suggests the data used to train these tools may have under-represented eukaryotic and/or plasmid sequences and highlights the importance of including diverse non-viral sequences in a balanced training set when establishing machine-learning based viral contig detection tools, as previously highlighted (Ponsero & Hurwitz, 2019; ...
Context 3
... sized genome fragments against those from complete genomes. Our results showed the percentage of sequences accurately assignable to specific viral taxa increased with fragment length. Specifically, the percentage of sequences clustered in a vConTACT v2 gene-sharing network increased from 61% for 3 kb fragments to >80% for complete genomes (Fig. 3A). This difficulty in robustly integrating short genomes or genome fragments into a gene-sharing network is further illustrated by the network topology itself, which shows much higher fragmentation of the network for 3 kb fragments than for complete genomes, accompanied by an inflated number of 'new VCs' and a higher number of ...
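The fragment-length comparison described above depends on cutting complete genomes into fixed-size pieces. The following is a minimal sketch of one way to do that, assuming consecutive, non-overlapping fragments and a discarded short tail; the excerpt does not state how the authors generated their fragments, and the function name `fragment_genome` is hypothetical.

```python
def fragment_genome(sequence: str, fragment_length: int) -> list[str]:
    """Cut a genome into consecutive, non-overlapping fragments.

    Assumption for illustration: a trailing piece shorter than
    fragment_length is discarded, so every fragment has the same size.
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    last_start = len(sequence) - fragment_length
    return [
        sequence[start:start + fragment_length]
        for start in range(0, last_start + 1, fragment_length)
    ]


# Toy 10,000 bp "genome" cut into 3 kb fragments.
toy_genome = "ACGT" * 2500
fragments = fragment_genome(toy_genome, 3000)
print(len(fragments), {len(f) for f in fragments})
```

Each fragment set can then be passed to the classification tool of interest and compared with the result for the complete genomes.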
Context 4
... this, we performed similar comparisons of taxonomic assignment for varying genome fragment lengths using other viral classification tools including ViPTree (genome-wide similarities-based), VIRIDIC (BLASTN-based), and VPF-class (protein family based). The general results show that the performance of these tools also increased with fragment size (Fig. 3B, Fig. S6, Fig. S8). For VPF-class, the percentage of sequences with a taxonomic assignment increased from ∼46% for 3 kbp fragments to ∼82% for 20 kbp (Fig. 3B), while the percentage of sequences assigned to the correct genus also increased with sequence length (Fig. S6B). For ViPTree and VIRIDIC, an increase in performance was also observed from 3 kbp ...
Context 5
... similarities-based), VIRIDIC (BLASTN-based), and VPF-class (protein family based). The general results show that the performance of these tools also increased with fragment size (Fig. 3B, Fig. S6, Fig. S8). For VPF-class, the percentage of sequences with a taxonomic assignment increased from ∼46% for 3 kbp fragments to ∼82% for 20 kbp (Fig. 3B), while the percentage of sequences assigned to the correct genus also increased with sequence length (Fig. S6B). For ViPTree and VIRIDIC, an increase in performance was also observed from 3 kbp through 20 kbp (Fig. S8). Together these results suggest genome fragmentation negatively impacts virus taxonomic classification for all common ...
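The "percentage of sequences with a taxonomic assignment" reported for VPF-class depends on the confidence score (CS) and membership ratio (MR) thresholds of ≥0.2 given in the figure caption. Here is a minimal sketch of that calculation; the record fields (`sequence_id`, `confidence_score`, `membership_ratio`) and the function name are hypothetical and are not meant to reproduce the tool's actual output format.

```python
def percent_assigned(predictions, total_sequences, min_cs=0.2, min_mr=0.2):
    """Percentage of input sequences with at least one prediction that
    passes both the confidence-score and membership-ratio thresholds."""
    if total_sequences <= 0:
        raise ValueError("total_sequences must be positive")
    assigned = {
        p["sequence_id"]
        for p in predictions
        if p["confidence_score"] >= min_cs and p["membership_ratio"] >= min_mr
    }
    return 100.0 * len(assigned) / total_sequences


# Toy data: four input sequences, three of which received a prediction.
toy_predictions = [
    {"sequence_id": "frag1", "confidence_score": 0.9, "membership_ratio": 0.8},
    {"sequence_id": "frag2", "confidence_score": 0.1, "membership_ratio": 0.5},
    {"sequence_id": "frag3", "confidence_score": 0.3, "membership_ratio": 0.2},
]
result = percent_assigned(toy_predictions, total_sequences=4)
print(result)
```

The denominator is the number of input sequences rather than the number of predictions, so sequences with no prediction at all count as unassigned.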

Similar publications

Article
Full-text available
Viral infections modulate bacterial metabolism and ecology. Here, we investigated the hypothesis that viruses influence the ecology of purple and green sulfur bacteria in anoxic and sulfidic lakes, analogs of euxinic oceans in the geologic past. By screening metagenomes from lake sediments and water column, in addition to publicly-available genomes...
Article
Full-text available
Viruses are key players in the environment, and recent metagenomic studies have revealed their diversity and genetic complexity. Despite progress in understanding the ecology of viruses in extreme environments, viruses’ dynamics and functional roles in dryland ecosystems, which cover about 45% of the Earth’s land surfaces, remain largely unexplored...
Article
Full-text available
Microbial communities have huge impacts on their ecosystems and local environments spanning from marine and soil communities to the mammalian gut. Bacteriophages (phages) are important drivers of population control and diversity in the community, but our understanding of complex microbial communities is halted by biased detection techniques. Metage...
Article
Full-text available
SAR11 bacteria dominate ocean surface bacterioplankton communities, and play an important role in marine carbon and nutrient cycling. The biology and ecology of SAR11 are impacted by SAR11 phages (pelagiphages) that are highly diverse and abundant in the ocean. Among the currently known pelagiphages, HTVC010P represents an extremely abundant but un...
Article
Full-text available
Anthropogenic land use changes have been recognized with significant effects on the abundance and diversity of antibiotic resistance genes (ARGs) in soil, but their impacts on ARGs with potential health risk remained poorly understood. In this study, paired metagenomes and viromes were obtained from soils (Anthrosols and Nitisols) with different la...

Citations

... Several benchmarking studies have compared the performance of various virus identification tools [48][49][50][51][52][53] (Additional file 2: Table S2). Most of them used simulated sequencing data or sequencing data from mock communities as testing datasets. ...
... Another one benchmarked tools that specialize in identifying viruses from clinical samples [51]. Four benchmarking works [49,50,52,53] mainly used simulated viral and non-viral testing datasets that were sampled from publicly available complete viral and microbial genomes (e.g., NCBI RefSeq). A summary of the tested tools and testing datasets of each study can be found in Additional file 2: Table S2. ...
... Most tools performed significantly worse on the mock community dataset than on the simulated dataset derived from reference genomes, illustrating how simulated data sampled from RefSeq could overestimate tools' performance. Pratama et al. 2021 found that viral identification efficiency increases with fragment length, and almost all tools can correctly identify viral contigs of 10 kb or longer [50]. "Gene content based" tools (VirSorter) can maximize the true positive rate and minimize the false positive rate at length > 3 kb. ...
Article
Full-text available
Background As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As all these tools measure different biological signals, and use different algorithms and training and reference databases, it is imperative to conduct an independent benchmarking to give users objective guidance. Results We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0–97%) and false positive rates (0–30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data and all tools, except for Sourmash, find unique viral contigs. Performance of tools improved with adjusted parameter cutoffs, indicating that adjustment of parameter cutoffs before usage should be considered. Conclusions Together, our independent benchmarking facilitates selecting choices of bioinformatic virus identification tools and gives suggestions for parameter adjustments to viromics researchers.
... Given their likely derivation from cellular entities, completely ruling out false-positive predictions concerning viral overlapping populations remains challenging. Therefore, further research, including genomic context assessment and functional analyses of these putative AMGs [77], is essential for a more detailed understanding of viral impacts on the Chaohu Lake ecosystem. ...
Article
Full-text available
Viruses, as the most prolific entities on Earth, constitute significant ecological groups within freshwater lakes, exerting pivotal ecological roles. In this study, we selected Chaohu Lake, a representative eutrophic freshwater lake in China, as our research site to explore the community distribution, driving mechanisms, and potential ecological functions of diverse viral communities, the intricate virus–host interaction systems, and the overarching influence of viruses on global biogeochemical cycling.
... Then, the viral genomes were processed with VirSorter2 (--prep-for-dramv) and subsequently annotated using DRAM v1.4.6 (viral mode, default parameters) [95]. Following the previous protocols [41,54], genes with an auxiliary score ≤ 3 and with the M and without A, V, and T markers were considered putative AMGs and were further manually validated for gene location. Conserved domains of the proteins encoded by the AMGs were examined using PROSITE [96], while protein structural homology modeling was conducted with SWISS-MODEL [97]. ...
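The curation rule quoted above (auxiliary score ≤ 3, the M marker present, and the A, V, and T markers absent) can be sketched as a simple filter. This is an illustration only: the dictionary keys `auxiliary_score` and `amg_flags`, the function name, and the toy records are assumptions, and the manual validation of gene location mentioned in the excerpt is not modeled.

```python
def is_putative_amg(gene: dict) -> bool:
    """Apply the score and flag criteria from the excerpt to one gene record.

    Assumed record layout (hypothetical): an integer 'auxiliary_score' and
    a string 'amg_flags' holding one letter per marker.
    """
    flags = set(gene["amg_flags"])
    return (
        gene["auxiliary_score"] <= 3
        and "M" in flags
        and flags.isdisjoint({"A", "V", "T"})
    )


# Toy records for illustration; not real annotation output.
genes = [
    {"gene_id": "g1", "auxiliary_score": 2, "amg_flags": "MK"},
    {"gene_id": "g2", "auxiliary_score": 4, "amg_flags": "M"},   # score too high
    {"gene_id": "g3", "auxiliary_score": 1, "amg_flags": "MT"},  # carries T
    {"gene_id": "g4", "auxiliary_score": 3, "amg_flags": "K"},   # lacks M
]
putative_amgs = [g["gene_id"] for g in genes if is_putative_amg(g)]
print(putative_amgs)
```

Other excerpts on this page apply different flag combinations, so the marker sets would need to be adjusted to match whichever protocol is being followed.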
Preprint
Full-text available
Background: The gastrointestinal tract (GIT) microbiome of ruminants significantly influences their nutrition metabolism and health. Current understanding is extensive for bacterial and archaeal communities, but limited for viral communities within the GIT. Results: We created the Ruminant Gastrointestinal Virome Catalogue (RGVC), which includes 43,981 non-redundant viral Operational Taxonomic Units (vOTUs), with 89.3% newly identified, derived from 370 samples across 10 GIT regions in seven ruminant species. The composition of viral communities is mainly influenced by the GIT regions rather than by the ruminant species. We identified 4,603 putative prokaryotic hosts across 31 bacterial and three archaeal classes for 5,954 viruses, along with significant variations across GIT regions and a strong correlation between hosts and their associated viruses. Lysogeny, constituting 45.6% of survival strategies, was more prevalent than the lytic cycle (4.08%), and the abundances of these viruses varied regionally. The lysogenic viruses encoded 1,805 auxiliary metabolic genes (AMGs) that play key roles in carbohydrates, amino acids, and other metabolisms in their hosts. The variation in AMG abundance across regions highlights the distinct viral communities and the varied virus-host interactions within the GIT. Conclusion: This study offers a comprehensive view of the spatial heterogeneity of viral communities in the ruminant GIT and indicates that this diversity is driven by the interaction of lysogenic viruses with their prokaryotic hosts through AMGs. These findings set the stage for future research into the ecological and nutritional impacts of the ruminant virome, with the potential to uncover novel roles and mechanisms in various GIT regions.
... Standardizing RNA virus detection would strongly require a community-built consensus about performance evaluation pipelines (sensitivity, recall, F1, precision, algorithm resilience, etc.), similarly to the ongoing efforts in microbial and DNA virus metagenomics (55)(56)(57)(58). Directly linked to this, unequivocal agreements on the plurality of operational taxonomic unit (OTU) definitions, clustering thresholds, and minimal procedure for genome completeness estimation of novel and divergent viruses will help set gold standards for the scientific community (Figure 2). ...
Article
Full-text available
Improved RNA virus understanding is critical to studying animal and plant health, and environmental processes. However, the continuous and rapid RNA virus evolution makes their identification and characterization challenging. While recent sequence-based advances have led to extensive RNA virus discovery, there is growing variation in how RNA viruses are identified, analyzed, characterized, and reported. To this end, an RdRp Summit was organized and a hybrid meeting took place in Valencia, Spain in May 2023 to convene leading experts with emphasis on early career researchers (ECRs) across diverse scientific communities. Here we synthesize key insights and recommendations and offer these as a first effort to establish a consensus framework for advancing RNA virus discovery. First, we need interoperability through standardized methodologies, data-sharing protocols, metadata provision and interdisciplinary collaborations and offer specific examples as starting points. Second, as an emergent field, we recognize the need to incorporate cutting-edge technologies and knowledge early and often to improve omic-based viral detection and annotation as novel capabilities reveal new biology. Third, we underscore the significance of ECRs in fostering international partnerships to promote inclusivity and equity in virus discovery efforts. The proposed consensus framework serves as a roadmap for the scientific community to collectively contribute to the tremendous challenge of unveiling the RNA virosphere.
... According to the previous suggested practices for AMG identification (Pratama et al., 2021), in this work, we identified and curated viral AMGs using the combination of VIBRANT (v1.2.1) (Kieft et al., 2020) and DRAM-V (v1.2.0) (Shaffer et al., 2020) with default parameters. Based on the output files of DRAM-V, the putative AMGs were defined as those with auxiliary scores of 1-3, as well as AMG flags of M (indicating a metabolism-related gene) and F (indicating a gene near the end of a contig). ...
... The putative AMGs were identified and evaluated for viruses recovered from both 982 publicly published metagenomes from a range of environments where microbial MM genes were detected (see next paragraph) (Supplementary Data 1, 2, and 3) and the 11 VLS metagenomes originally constructed in this study, according to our previously established methods 79. Specifically, once viral contigs were recovered ...
Article
Full-text available
Methane is a potent greenhouse gas contributing to global warming. Microorganisms largely drive the biogeochemical cycling of methane, yet little is known about viral contributions to methane metabolism (MM). We analyzed 982 publicly available metagenomes from host-associated and environmental habitats containing microbial MM genes, expanding the known MM auxiliary metabolic genes (AMGs) from three to 24, including seven genes exclusive to MM pathways. These AMGs are recovered on 911 viral contigs predicted to infect 14 prokaryotic phyla including Halobacteriota, Methanobacteriota, and Thermoproteota. Of those 24, most were encoded by viruses from rumen (16/24), with substantially fewer by viruses from environmental habitats (0–7/24). To search for additional MM AMGs from an environmental habitat, we generate metagenomes from methane-rich sediments in Vrana Lake, Croatia. Therein, we find diverse viral communities, with most viruses predicted to infect methanogens and methanotrophs and some encoding 13 AMGs that can modulate host metabolisms. However, none of these AMGs directly participate in MM pathways. Together these findings suggest that the extent to which viruses use AMGs to modulate host metabolic processes (e.g., MM) varies depending on the ecological properties of the habitat in which they dwell and is not always predictable by habitat biogeochemical properties.
... Putative viral-encoded AMGs were identified using DRAM-v (52). Due to the expected increased false positive signal arising from the high non-viral sequence space in the soil metagenomes, strict curation of candidate AMGs was performed, as suggested (76). Briefly, this included genes on viral contigs ≥ 10 kb or complete genomes, with an auxiliary score of 1-3 and with both the "M" flag (corresponding to metabolic function) and the "F" flag (corresponding to genes within 5,000 bases of the end of the viral contig). ...
Article
Full-text available
Soil microbes play pivotal roles in global carbon cycling; however, the fundamental interactions between microbes and their infecting viruses remain unclear. This is exacerbated with soil depth, where the patterns of viral dispersal, ecology, and evolution are markedly underexplored. To investigate viral communities throughout the soil depth profile, we leveraged a publicly available metagenomic data set sampled from grassland soil in Northern California. In total, 10,196 non-redundant viral operational taxonomic units were recovered from soil between 20 cm and 115 cm below the surface. Viral prevalence was high throughout the soil depth profile, with viruses infecting dominant soil hosts, including Actinomycetia. Contrary to leading hypotheses, lysogeny did not dominate in the soil viral communities. Viral diversity was assessed at both the population level (i.e., macrodiversity) and strain level (i.e., microdiversity) to reveal diverse ecological and evolutionary patterns of virus-host interactions in surface and subsurface soils. Investigating viral microdiversity uncovered potential patterns of antagonistic co-evolution across both surface and subsurface soils. Furthermore, we have provided evidence for the potential of soil viruses to augment the remineralization of soil carbon. While we continue to yield a more comprehensive understanding of soil viral ecology, our work appeals to future researchers to further investigate subsurface viral communities. IMPORTANCE Soil viruses can moderate the roles that their host microbes play in global carbon cycling. However, given that most studies investigate the surface layer (i.e., top 20 cm) of soil, the extent to which this occurs in subsurface soil (i.e., below 20 cm) is unknown. Here, we leveraged public sequencing data to investigate the interactions between viruses and their hosts at soil depth intervals, down to 115 cm. While most viruses were detected throughout the soil depth profile, their adaptation to host microbes varied. Nonetheless, we uncovered evidence for the potential of soil viruses to encourage their hosts to recycle plant-derived carbon in both surface and subsurface soils. This work reasons that our understanding of soil viral functions requires us to continue to dig deeper and compare viruses existing throughout soil ecosystems.
... Other identified viral genera showed differential abundances, typically with higher values in the non-amplified samples, as reported at the family-level classification. Although genus-level classification has been obtained using previously utilised confidence scores [43], these findings should be viewed with caution since the limited length of the viral contigs may affect the correct taxonomic classification [55]. ...
Article
Full-text available
Viruses are the most abundant 'biological entities' in the world's oceans. However, technical and methodological constraints limit our understanding of their diversity, particularly in benthic abyssal ecosystems (>4000 m depth). To verify advantages and limitations of analyzing virome DNA subjected either to random amplification or unamplified, we applied shotgun sequencing-by-synthesis to two sample pairs obtained from benthic abyssal sites located in the Northeastern Atlantic Ocean at ca. 4700 m depth. One amplified DNA sample was also subjected to single-molecule long-read sequencing for comparative purposes. Overall, we identified 24,828 viral Operational Taxonomic Units (vOTUs), belonging to 22 viral families. Viral reads were more abundant in the amplified DNA samples (38.5-49.9%) compared to the unamplified ones (4.4-5.8%), with the latter showing a greater viral diversity and 11-16% of dsDNA viruses almost undetectable in the amplified samples. From a procedural point of view, the viromes obtained by direct sequencing (without amplification step) provided a broader overview of both ss and dsDNA viral diversity. Nevertheless, our results suggest that the contextual use of random amplification of the same sample and long-read technology can improve the assessment of viral assemblages by reducing off-target reads.
... 62 The pAMGs identified by DRAM-v with auxiliary scores >3 and flagged with "M" were chosen. 63 The AMG identification by VIBRANT was conducted with default parameters in "virome" mode and 4 ORFs per scaffold to limit input sequences. 62 All pAMGs were annotated based on the Kofam, PFAM, NCBI Viral RefSeq, and VOGDB databases. ...
... Meanwhile, the host prediction results from the three in silico methods were coupled to represent putative phage−host associations based on their distinctive genomic characteristics, and these findings may require further verification due to the intrinsic limitations of bioinformatic analysis. 63 Environmental stress could promote a symbiotic bacterium−phage relationship as phages in the lysogenic state may facilitate host adaptation to the hostile environment, thus benefiting its own survival. 88 The nonmotile phage would be impeded in the encounter and adsorption of host bacterium under the hydraulic environment. ...
... Viruses also contain auxiliary metabolic genes (AMGs) in their genomes that may be used to modulate host metabolisms. AMG-containing viruses were found in various ecosystems [18,19]. However, whether AMGs also mediate the metabolisms of supraglacial microorganisms remains unclear. ...
... Putative AMGs with viral hallmark genes upstream and/or downstream, and viral-specific genes upstream and/or downstream, or that fell within regions enriched by genes with only hypothetical functions were kept, as described by Wu et al. [41] and Shaffer et al. [40]. In addition, the functions of listed AMG examples were further checked with Phyre2 and SWISS-MODEL protein modeling, as previously described [19]. For genomic architecture, the annotation files of vOTUs with putative AMGs were imported into Geneious v2021.2.2 or visualized by gggene (https://github.com/wilkox/ ...