Prediction performances of random forest classifiers based on gut viral abundance. (A) Within and cross study AUROC matrix obtained by using GPD genome-level abundance. The diagonal refers to results of cross validation within each dataset. Off-diagonal values refer to prediction results trained on the study of each row and tested on the study of each column. (B) Within and cross study AUROC matrix obtained by using species-level abundance. See Supplementary Figures S12A, B for genus-level and family-level AUROC. (C) Within and cross study AUROC matrix obtained by using gene-family abundance. See Supplementary Figure S12C for pathway AUROC. (D) LODO results with the x axis indicating the study left out as the validation set and other studies combined as the training set.
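The evaluation scheme in the caption can be sketched with scikit-learn. This is a minimal sketch under stated assumptions. Each study is assumed to be supplied as a matrix of viral abundances (samples × features) with binary labels (1 = CRC, 0 = control). The function names, the 5-fold stratified cross-validation used for the diagonal, and the number of trees are illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def cross_study_auroc(studies, n_estimators=200, seed=0):
    """studies: {name: (X, y)} with y in {0, 1}. Returns (names, AUROC matrix).

    Diagonal [i, i]: stratified 5-fold cross-validation within study i.
    Off-diagonal [i, j]: model trained on all of study i, tested on study j.
    """
    names = list(studies)
    auc = np.zeros((len(names), len(names)))
    for i, train_name in enumerate(names):
        X_tr, y_tr = studies[train_name]
        rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        # Within-study estimate from out-of-fold predicted probabilities
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        oof = cross_val_predict(rf, X_tr, y_tr, cv=cv, method="predict_proba")[:, 1]
        auc[i, i] = roc_auc_score(y_tr, oof)
        # Cross-study estimates: fit once on the full training study
        rf.fit(X_tr, y_tr)
        for j, test_name in enumerate(names):
            if j != i:
                X_te, y_te = studies[test_name]
                auc[i, j] = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
    return names, auc


def lodo_auroc(studies, n_estimators=200, seed=0):
    """Leave-one-dataset-out: train on all other studies combined, test on the held-out one."""
    result = {}
    for held_out in studies:
        others = [name for name in studies if name != held_out]
        X_tr = np.vstack([studies[name][0] for name in others])
        y_tr = np.concatenate([studies[name][1] for name in others])
        rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        rf.fit(X_tr, y_tr)
        X_te, y_te = studies[held_out]
        result[held_out] = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
    return result
```

For example, with `studies = {"S1": (X1, y1), "S2": (X2, y2)}`, `cross_study_auroc(studies)` returns the study names and a 2 × 2 matrix whose row i, column j entry is the AUROC of a model trained on study i and tested on study j, as in panels (A) to (C). `lodo_auroc(studies)` returns one AUROC per held-out study, as in panel (D).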

Source publication
Article
The association between colorectal cancer (CRC) and human gut microbiome dysbiosis has been the focus of several past studies. Many bacterial taxa have been shown to be differentially abundant in CRC patients compared with healthy controls. However, the relationship between CRC and non-bacterial components of the gut microbiome, such as the gut virome, is unde...

Similar publications

Article
Early infancy is critical for the development of an infant's gut flora. Many factors can influence microbiota development during the pre- and postnatal periods, including maternal factors, antibiotic exposure, mode of delivery, dietary patterns, and feeding type. Therefore, investigating the connection between these variables and host and microbiom...

Citations

... Next-generation sequencing technology can rapidly generate large numbers of sequences from a variety of environmental samples, from which a metagenome is constructed [6,7]. To analyze viral-host interactions [8] from metagenomic data and further analyze human diseases [9][10][11][12], such as colorectal cancer (CRC) [13][14][15] and inflammatory bowel disease [16], identifying viral sequences directly from the metagenome is the very first step [17]. Because of the vast number of sequences and the low proportion of viral sequences in metagenomes [18], identifying viral sequences accurately is challenging. ...
Article
Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains the genetic material of all microorganisms in an environmental sample, and correctly identifying viruses among these mixed sequences is critical for viral analyses. It is common to identify long viral sequences that have already passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify each one separately, which ignores the relationships between them and leads to poor performance on long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by constructing relationships among the short subsequences derived from them. VirGrapher treats a long sequence as a graph and uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes, applied after a GCN-based node embedding model. VirGrapher achieves higher AUC and accuracy on the validation set than three benchmark methods.
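The abstract does not specify how VirGrapher builds its graph, so the following is only a generic sketch of the idea under stated assumptions. A long sequence is cut into fixed-length chunks (the nodes), each chunk is described by its k-mer frequencies, consecutive chunks are linked, and standard graph-convolution layers (the Kipf and Welling propagation rule) followed by mean pooling produce one embedding per sequence. The chunk length, k-mer features, path-graph edges and all function names are hypothetical. VirGrapher's own node embedding model, trained weights and classifier are not reproduced here.

```python
from itertools import product

import numpy as np


def sequence_to_graph(seq, chunk_len=100, k=3):
    """Hypothetical construction: non-overlapping chunks as nodes, normalized
    k-mer frequencies as node features, consecutive chunks linked (path graph)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    chunks = [seq[i:i + chunk_len] for i in range(0, len(seq) - chunk_len + 1, chunk_len)]
    H = np.zeros((len(chunks), len(kmers)))
    for n, chunk in enumerate(chunks):
        for i in range(len(chunk) - k + 1):
            j = index.get(chunk[i:i + k])  # k-mers with ambiguous bases (e.g. N) are skipped
            if j is not None:
                H[n, j] += 1
        if H[n].sum() > 0:
            H[n] /= H[n].sum()
    A = np.zeros((len(chunks), len(chunks)))
    for i in range(len(chunks) - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A, H


def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


def graph_embedding(A, H, weights):
    """Apply stacked GCN layers, then mean-pool node features into one vector."""
    for W in weights:
        H = gcn_layer(A, H, W)
    return H.mean(axis=0)
```

In a real model the weight matrices would be learned and the pooled vector passed to a classifier; here they are left as inputs so the propagation step itself stays visible.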
... These phages exhibit distinct patterns in both healthy and diseased microbiomes (Yang et al., 2023). The correlation between the human virome and various health conditions, such as cancer, inflammatory bowel diseases, and diabetes, has been documented (Zhao et al., 2017; Han et al., 2018; Nakatsu et al., 2018; Fernandes et al., 2019; Liang et al., 2020; Zuo et al., 2022). However, deeper research is needed to discern causality and their impact on microbial and host biological processes. ...
Article
Background In the evolving landscape of microbiology and microbiome analysis, the integration of machine learning is crucial for understanding complex microbial interactions, and predicting and recognizing novel functionalities within extensive datasets. However, the effectiveness of these methods in microbiology faces challenges due to the complex and heterogeneous nature of microbial data, further complicated by low signal-to-noise ratios, context-dependency, and a significant shortage of appropriately labeled datasets. This study introduces the ProkBERT model family, a collection of large language models, designed for genomic tasks. It provides a generalizable sequence representation for nucleotide sequences, learned from unlabeled genome data. This approach helps overcome the above-mentioned limitations in the field, thereby improving our understanding of microbial ecosystems and their impact on health and disease. Methods ProkBERT models are based on transfer learning and self-supervised methodologies, enabling them to use the abundant yet complex microbial data effectively. The introduction of the novel Local Context-Aware (LCA) tokenization technique marks a significant advancement, allowing ProkBERT to overcome the contextual limitations of traditional transformer models. This methodology not only retains rich local context but also demonstrates remarkable adaptability across various bioinformatics tasks. Results In practical applications such as promoter prediction and phage identification, the ProkBERT models show superior performance. For promoter prediction tasks, the top-performing model achieved a Matthews Correlation Coefficient (MCC) of 0.74 for E. coli and 0.62 in mixed-species contexts. In phage identification, ProkBERT models consistently outperformed established tools like VirSorter2 and DeepVirFinder, achieving an MCC of 0.85. 
These results underscore the models' exceptional accuracy and generalizability in both supervised and unsupervised tasks. Conclusions The ProkBERT model family is a compact yet powerful tool in the field of microbiology and bioinformatics. Its capacity for rapid, accurate analyses and its adaptability across a spectrum of tasks mark a significant advancement in machine learning applications in microbiology. The models are available on GitHub (https://github.com/nbrg-ppcu/prokbert) and HuggingFace (https://huggingface.co/nerualbioinfo), providing an accessible tool for the community.
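The MCC values quoted above summarize all four cells of a binary confusion matrix: MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). A small helper makes the scale concrete. Returning 0.0 when the denominator is zero is a common convention and is assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 when any row or column of the confusion matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

With hypothetical counts TP = 90, TN = 85, FP = 15 and FN = 10, `mcc(90, 85, 15, 10)` gives roughly 0.75. Unlike accuracy, MCC stays near 0 for a classifier that simply predicts the majority class on an imbalanced set.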
... Zuo et al reported that patients with CRC had elevated pathways for fatty acid biosynthesis and depleted production of chemicals in their gut virome that inhibit CRC cell proliferation (L-methionine) or maintain homoeostasis (acetate). 73 These findings suggest that an altered virome may also have a direct role in oncogenesis and/or tumour progression, but more mechanistic studies are needed to establish its validity. ...
Article
Objective The gut virome is a dense community of viruses inhabiting the gastrointestinal tract and an integral part of the microbiota. The virome coexists with the other components of the microbiota and with the host in a dynamic equilibrium, serving as a key contributor to the maintenance of intestinal homeostasis and functions. However, this equilibrium can be interrupted in certain pathological states, including inflammatory bowel disease, causing dysbiosis that may participate in disease pathogenesis. Nevertheless, whether virome dysbiosis is a causal or bystander event requires further clarification. Design This review seeks to summarise the latest advancements in the study of the gut virome, highlighting its cross-talk with the mucosal microenvironment. It explores how cutting-edge technologies may build upon current knowledge to advance research in this field. An overview of virome transplantation in diseased gastrointestinal tracts is provided along with insights into the development of innovative virome-based therapeutics to improve clinical management. Results Gut virome dysbiosis, primarily driven by the expansion of Caudovirales, has been shown to impact intestinal immunity and barrier functions, influencing overall intestinal homeostasis. Although emerging innovative technologies still need further implementation, they show unprecedented potential to better characterise virome composition and delineate its role in intestinal diseases. Conclusions The field of gut virome research is progressively expanding, thanks to advances in sequencing technologies and bioinformatic pipelines. These have contributed to a better understanding of how virome dysbiosis is linked to intestinal disease pathogenesis and how modulating virome composition may support clinical interventions to improve the management of gut disease.
... Compared to studies of gut bacteria, studies inclusive of viruses (the virome), fungi (the mycobiome) and archaea (the archaeome) are not only under-represented but lack mechanistic insights. While CRC-associated alterations in these subcommunities have been identified 17,[23][24][25][26][27] , the key question of their functional capacity within the gut remains (that is, are they passengers, do they themselves or their metabolites produce direct pro-carcinogenic or anticarcinogenic effects, or do they predominantly mediate their effects by modulating the abundances of carcinogenic or anticarcinogenic bacteria?). Equally pressing is the biology of intratumoural microbiota, seemingly present in diverse human cancers 28,29 and persisting even in the context of metastasis 30 . ...
Article
Colorectal cancer (CRC) is a substantial source of global morbidity and mortality in dire need of improved prevention and treatment strategies. As our understanding of CRC grows, it is becoming increasingly evident that the gut microbiota, consisting of trillions of microorganisms in direct interface with the colon, plays a substantial role in CRC development and progression. Understanding the roles that individual microorganisms and complex microbial communities play in CRC pathogenesis, along with their attendant mechanisms, will help yield novel preventive and therapeutic interventions for CRC. In this Review, we discuss recent evidence concerning global perturbations of the gut microbiota in CRC, associations of specific microorganisms with CRC, the underlying mechanisms by which microorganisms potentially drive CRC development and the roles of complex microbial communities in CRC pathogenesis. While our understanding of the relationship between the microbiota and CRC has improved in recent years, our findings highlight substantial gaps in current research that need to be filled before this knowledge can be used to the benefit of patients.
... The viral family Herelleviridae was found to be depleted in patients with colorectal cancer. [83] Humans ...
Article
The gut microbiota, including bacteria, archaea, fungi, and viruses, compose a diverse mammalian gut environment and are highly associated with host health. Bacteriophages, the viruses that infect bacteria, are the primary members of the gastrointestinal virome, known as the phageome. However, the gut phageome remains poorly understood. In this review, the critical role of the gut phageome and its correlation with mammalian health were summarized. First, an overall profile of phages across the gastrointestinal tract and their dynamic roles in shaping the surrounding microorganisms was elucidated. Further, the impacts of the gut phageome on gastrointestinal fitness and the bacterial community were highlighted, together with the influence of diets on the gut phageome composition. Additionally, new reports on the role of the gut phageome in the association of mammalian health and diseases were reviewed. Finally, a comprehensive update regarding the advanced phage benchwork and contributions of phage-based therapy to prevent/treat mammalian diseases was provided. This study provides insights into the role and impact of the gut phageome in gut environments closely related to mammalian health and diseases. The findings point to potential applications of phage-based diagnosis and therapy in clinical and agricultural fields. Future research is needed to uncover the underlying mechanisms of phage–bacterial interactions in gut environments and explore the maintenance of mammalian health via phage-regulated gut microbiota.
... Firmicutes) and Actinomycetota, all of which have been reported as common members of the human gut microbiome [71]. In particular, the families Drexlerviridae, Salasmaviridae and Herelleviridae have already been identified as frequent members of the human gut virome [72,73]. Interestingly, the sequences classified in the subfamilies Nymbaxtervirinae and Arquatrovirinae corresponded to putative phages targeting bacteria from either the Bacillota or Actinomycetota phyla. ...
... The vast majority of the sequences were assigned to families and subfamilies within the class Caudoviricetes, which agrees with the former classifications reported in the GPD metadata. Furthermore, phages classified in the reported taxa have been previously found to target bacteria within some of the most frequently identified bacterial phyla in the human gut [71][72][73]. However, a few of the representative sequences were classified in the families Iridoviridae and Herpesviridae. ...
Article
The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterization of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterization of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.
... The enteric virome, an essential component and regulator of gut microflora, affects the intestinal microbiota's structure and abundance, potentially impacting CRC occurrence, progression, and outcomes by altering bacterial-host communities. [40][41][42] The abundance and diversity of gut viral microbiota such as Siphoviridae, Myoviridae, and Podoviridae increase significantly, and Herelleviridae is significantly depleted in CRC patients. 40 Intestinal viral dysregulation is associated with early and late stages of CRC, and viruses such as Betabaculovirus, Punalikevirus, and Mulikevirus were associated with clinical outcomes. ...
... [40][41][42] The abundance and diversity of gut viral microbiota such as Siphoviridae, Myoviridae, and Podoviridae increase significantly, and Herelleviridae is significantly depleted in CRC patients. 40 Intestinal viral dysregulation is associated with early and late stages of CRC, and viruses such as Betabaculovirus, Punalikevirus, and Mulikevirus were associated with clinical outcomes. 41 A meta-analysis of metagenomic data showed that bacteriophages of Porphyromonas, Fusobacterium, and Hungatella were enriched in CRC patients. ...
Article
Colorectal cancer (CRC) is the third most common malignant tumor worldwide. The incidence and mortality rates of CRC have been increasing in China, possibly due to economic development, lifestyle, and dietary changes. Evidence suggests that gut microbiota plays an essential role in the tumorigenesis of CRC. Gut dysbiosis, specific pathogenic microbes, metabolites, virulence factors, and microbial carcinogenic mechanisms contribute to the initiation and progression of CRC. Gut microbiota biomarkers have potential translational applications in CRC screening and early diagnosis. Gut microbiota-related interventions could improve the efficacy of anti-tumor therapy and alleviate its severe intestinal toxic effects. Chinese researchers have made many achievements in elucidating the relationship between gut microbiota and CRC, although some challenges remain. This review summarizes the current evidence from China on the role of gut microbiota in CRC, mainly including the gut microbiota characteristics, especially Fusobacterium nucleatum and Parvimonas micra, which have been found to be enriched in CRC patients; microbial pathogens such as F. nucleatum, enterotoxigenic Bacteroides fragilis, and P. micra, which Chinese scientists have extensively studied; diagnostic biomarkers, especially F. nucleatum; and therapeutic effects, including microecological agents represented by certain Lactobacillus strains, fecal microbiota transplantation, and traditional Chinese medicines such as Berberine and Curcumin. More efforts should be focused on exploring the underlying mechanisms of microbial pathogenesis of CRC and providing novel gut microbiota-related therapeutic and preventive strategies.
... In addition, our results showed that crAss-phage and Microviridae had significant positive correlation with fecal urgency score. It has been reported that patients with GI diseases (such as ulcerative colitis 41 , IBS-D 42 , and colorectal cancer 43 ) had significantly more crAss-phage, Microviridae, and Herelleviridae than healthy subjects, suggesting high levels of gut crAss-phage, Microviridae and Herelleviridae sequences might be linked to diarrhea. These results suggested that desirable changes and interactions between gut bacteria and bacteriophages in patients could be a potential explanation for the effectiveness of P9 in relieving diarrhea. ...
Preprint
This study evaluated the beneficial effects of administering Lactiplantibacillus plantarum P9 (P9) on chronic diarrhea. A randomized, double-blind, placebo-controlled trial was performed. Patients were randomly assigned to the probiotic or placebo group. The primary endpoint was the diarrhea symptom severity score; the secondary endpoints were the stool consistency, the number of bowel movements, fecal urgency score, the Depression Anxiety Stress Scales-21 score, fecal metagenome and metabolome. Administering P9 for 4 weeks significantly improved diarrhea symptoms and stool consistency, accompanied by multiple changes in the patients’ gut microbiota and metabolome: increases in several gut short-chain fatty acid (SCFA)-producers and a bile acid metabolizing species; elevation in fecal metabolites of bile acids, amino acids, and short-chain fatty acids; increases in cumulative gene abundances of 15 carbohydrate-active enzyme subfamilies; increases in fecal acetate and butyrate concentrations. P9 administration had a remarkable therapeutic effect on chronic diarrhea, supporting the use of probiotics to alleviate chronic diarrhea.
... [13][14][15] Recent studies of metagenomic CRC datasets have demonstrated the association between particular microbial species and CRC, suggesting these taxa as signatures for early identification of CRC. [16][17][18][19][20][21][22][23][24] While most of the gut microbiota is composed of the phyla Bacteroidetes and Firmicutes, 1,13 the composition of species and strains is unique to each person. 1 Early reports based on isolated cultures indicated a possible role of some organisms based on their metabolic properties, with more recent studies reinforcing the role of some species in CRC oncogenesis, 25 including John Cunningham virus, 26 Streptococcus gallolyticus, 27 Bacteroides fragilis 28,29 and Fusobacterium nucleatum. [30][31][32] One of the most accepted hypotheses for CRC development is a 'driver-passenger' model, 33 which proposes that commonly associated CRC species, such as Fusobacterium nucleatum and Bacteroides fragilis are responsible for the promotion of tumorigenesis (i.e. ...
... These results are in line with previous articles showing the same pattern of few statistically significant differences in species diversity between CRC and normal conditions but large differences according to cohort. 16,21,35,36,38,61 We tested for the effects of all confounders (especially Project) using two-sided Wilcoxon-Mann-Whitney tests, comparing the conditions blocked by each confounder separately. These analyses replicated the univariate results based on sample conditions (Table S8, ESI †). ...
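One way to read "comparing the conditions blocked by each confounder" is to stratify samples by each level of the confounder (for example, each Project) and run a two-sided Wilcoxon-Mann-Whitney test of CRC versus control within every stratum. The sketch below does that with SciPy's `mannwhitneyu`. The function name and the "CRC"/"control" labels are assumptions, and a single combined stratified test (such as the van Elteren test) would be an alternative the excerpt does not rule out.

```python
from collections import defaultdict

from scipy.stats import mannwhitneyu


def blocked_mwu(values, condition, block):
    """Per-stratum two-sided Mann-Whitney U tests of CRC vs control.

    values: one diversity (or abundance) value per sample
    condition: "CRC" or "control" per sample (assumed labels)
    block: the confounder level per sample, e.g. the project ID
    Returns {level: (U statistic, p-value)} for levels containing both groups.
    """
    groups = defaultdict(lambda: {"CRC": [], "control": []})
    for v, c, b in zip(values, condition, block):
        groups[b][c].append(v)
    results = {}
    for level, g in groups.items():
        if g["CRC"] and g["control"]:
            res = mannwhitneyu(g["CRC"], g["control"], alternative="two-sided")
            results[level] = (res.statistic, res.pvalue)
    return results
```

Running one test per stratum keeps cohort-level differences from masking or mimicking a CRC effect; with many strata, the resulting p-values would normally also need multiple-testing correction.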
Article
Colorectal cancer (CRC) is one of the most common types of cancer, with many studies associating its development with changes in the gut microbiota. Recent developments in sequencing technologies and subsequent meta-analyses of the gut metagenome have provided a better understanding of species associations with CRC tumorigenesis. Still, the importance of high-importance taxonomic singletons (i.e. species highly associated with a given condition but observed only in a minority of datasets) and of species interactions and co-abundance across cohorts needs further exploration. It has been shown that the gut metagenome presents high functional redundancy, meaning that species interactions could compensate for the absence of any given species. In a CRC framework, this implies that species co-abundance could play a role in tumorigenesis, even if CRC-associated species show low abundance. We propose to evaluate the impact of these low-prevalence species by first analyzing each dataset individually and then intersecting the results for differentially abundant species in CRC samples. We then identify metabolic pathways from these species based on KEGG orthologs, highlighting metabolic pathways associated with CRC. Our results identify six species with high prevalence across all projects and a strong association with CRC, including members of the genera Bacteroides, Enterocloster and Prevotella, with high potential for methane metabolism. Finally, we show that CRC is also characterized by the co-occurrence of species that do not show significant differential abundance but have been described in the literature as potential CRC biomarkers. These results indicate that between-species interactions could also play a role in CRC tumorigenesis.
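The analyze-each-dataset-then-intersect strategy described above reduces to set operations once each dataset has produced its own list of differentially abundant species. In this minimal sketch, the differential-abundance calls are assumed to be computed beforehand, and the "fewer than half of the datasets" cutoff for low-prevalence hits is a hypothetical threshold, not one taken from the study.

```python
from collections import Counter


def summarize_da_species(da_by_dataset):
    """da_by_dataset: {dataset name: iterable of species called differentially abundant in CRC}.

    Returns (shared, minority, prevalence):
      shared     - species significant in every dataset
      minority   - species significant in fewer than half of the datasets (assumed cutoff)
      prevalence - Counter mapping each species to the number of datasets calling it
    """
    sets = [set(species) for species in da_by_dataset.values()]
    n = len(sets)
    prevalence = Counter(sp for s in sets for sp in s)
    shared = {sp for sp, c in prevalence.items() if c == n}
    minority = {sp for sp, c in prevalence.items() if c < n / 2}
    return shared, minority, prevalence
```

Keeping the full prevalence counts, rather than only the strict intersection, is what lets low-prevalence but strongly associated species be examined separately instead of being discarded.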
... However, the role of the gut virome has been understudied. Only a few studies have revealed an altered gut viral composition in pathological conditions such as cancer 12,13, type I diabetes 14 and inflammatory bowel disease (IBD) 15,16. Little is known about the impact of HIV infection on the gut virome (bacteriophages and eukaryotic viruses). ...
Article
Viruses are the most abundant components of the human gut microbiome and have a significant impact on health and disease. The effects of human immunodeficiency virus (HIV) infection on the gut virome have been scarcely analysed. Several studies have suggested that integrase strand transfer inhibitors (INSTIs) are associated with a healthier gut. Thus, the objective of this work was to evaluate the effects of HIV infection and INSTIs on gut virome composition. Twenty-six non-HIV-infected volunteers, 15 naive HIV-infected patients and 15 INSTI-treated HIV-infected patients were recruited, and their gut virome composition was analysed using shotgun sequencing. Bacteriophages were the most abundant and diverse viruses present in the gut. HIV infection was accompanied by a decrease in phage richness, which was reversed after INSTI-based treatment. β-diversity analysis of phages revealed that samples from HIV-infected patients clustered separately from those of the control group. Differential abundance analysis showed an increase in phages of the class Caudoviricetes in the naive group and a decrease in phages of the class Malgrandaviricetes in the INSTI-treated group compared with the control group. In addition, INSTI-based treatment was not able to reverse the increase in lysogenic phages associated with HIV infection or to modify the decrease observed in the relative abundance of Proteobacteria-infecting phages. Our study describes for the first time the impact of HIV and INSTIs on the gut virome and demonstrates that INSTI-based treatments are able to partially restore gut dysbiosis at the viral level, which opens several opportunities for new studies focused on microbiota-based therapies.
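The richness and β-diversity results above rest on two standard community measures. This sketch assumes observed richness (the number of phage taxa detected in a sample) and Bray-Curtis dissimilarity as the β-diversity metric. The excerpt does not state which metrics the study used, so both are illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def richness(counts):
    """Observed richness: number of taxa with a non-zero count in one sample."""
    return int(np.count_nonzero(counts))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical, 1 = no shared taxa)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())


def bray_curtis_matrix(samples):
    """Symmetric pairwise Bray-Curtis matrix for a (samples x taxa) abundance table."""
    X = np.asarray(samples, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = bray_curtis(X[i], X[j])
    return D
```

A distance matrix like this is typically passed to an ordination such as PCoA, where samples from different groups (for example, HIV-infected patients and controls) that cluster separately appear as distinct clouds of points.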