Estimated species abundances from various virus classification tools for the LabB-6 HeLa cell lysate sample with 0.1 genome copies of each virus per cell (a) and the LabB-5 HeLa cell lysate sample with 3 genome copies of each virus per cell (b). The "Unclassified" category refers to reads that could not be mapped to the host genome or RVDBv16.

Source publication
Article
Adventitious agent detection during the production of vaccines and biotechnology-based medicines is of critical importance to ensure the final product is free from any possible viral contamination. Increasing the speed and accuracy of viral detection is beneficial as a means to accelerate development timelines and to ensure patient safety. Here, se...

Contexts in source publication

Context 1
... unassembled reads based on their phylogenetic distance from a set of reference sequences. The LOD for FeLV for our bioinformatics methods, as well as Lab B's method, was between 0.1 and 3 genome copies per cell because no reads were identified as FeLV in the LabB-6 HeLa cell lysate sample which had viruses spiked-in at 0.1 genome copies per cell (Fig. 5a). The LODs of RSV and EBV for our approach and that of Lab B were <0.1 genome copies per cell as RSV and EBV reads were still identified in this sample. The LOD of Reo1 in HeLa lysate across all bioinformatics approaches, including that of Lab B, was between 3 and 100 genome copies per cell (Fig. 5b). For the HeLa whole cell samples, ...
Context 2
... spiked-in at 0.1 genome copies per cell (Fig. 5a). The LODs of RSV and EBV for our approach and that of Lab B were <0.1 genome copies per cell as RSV and EBV reads were still identified in this sample. The LOD of Reo1 in HeLa lysate across all bioinformatics approaches, including that of Lab B, was between 3 and 100 genome copies per cell (Fig. 5b). For the HeLa whole cell samples, the results showed that the viruses had the same LODs as they did in the HeLa cell lysate samples, except that Reo1 was identified at three genome copies per cell (Fig. 6). These LODs are comparable to ones previously reported for detection of adventitious viruses using NGS (48) and fall within current ...
Context 3
... tools were able to identify the viruses spiked into each sample (see Data Set S4 and Fig. S5). This includes reovirus, which was spiked-in at only 13 virus particles for every 10,000 CHO cells (Fig. 8). Interestingly, the tools also identified some potential cross-contamination of viral spike-ins. For instance, for the Reo3 spike-in (experiment 1), encephalomyocarditis virus (EMCV) and vesicular stomatitis virus (VSV) were ...
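
The limit-of-detection (LOD) reasoning in the contexts above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name lod_bracket is hypothetical, and the detection flags are reconstructed from the excerpts (spike-in levels of 0.1, 3, and 100 genome copies per cell are assumed to be the levels tested). The LOD is bracketed between the highest level below the first detection and the lowest level at which reads were identified.

```python
from typing import Dict, Optional, Tuple


def lod_bracket(detected: Dict[float, bool]) -> Tuple[Optional[float], Optional[float]]:
    """Bracket the LOD from spike-in levels mapped to detection flags.

    Returns (lower, upper): `upper` is the lowest level with a detection,
    `lower` is the highest tested level below it. `None` marks an open
    bound (e.g. detected at every level, or never detected).
    """
    levels = sorted(detected)
    upper = next((lvl for lvl in levels if detected[lvl]), None)
    if upper is None:
        # Never detected: the LOD lies above the highest level tested.
        return (levels[-1] if levels else None, None)
    below = [lvl for lvl in levels if lvl < upper]
    return (below[-1] if below else None, upper)


# Illustrative flags reconstructed from the excerpts (assumed, not raw data).
print(lod_bracket({0.1: False, 3: True, 100: True}))   # FeLV: (0.1, 3)
print(lod_bracket({0.1: True, 3: True, 100: True}))    # RSV/EBV: (None, 0.1), i.e. LOD < 0.1
print(lod_bracket({0.1: False, 3: False, 100: True}))  # Reo1 in lysate: (3, 100)
```

An open lower bound corresponds to the "<0.1 genome copies per cell" statement: the virus was still detected at the lowest level tested, so the true LOD cannot be narrowed further from these data.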

Similar publications

Article
Bryum is recognized as a cosmopolitan genus of mosses in the family Bryaceae that contains the largest diversity of mosses. While around 100 moss species have been reported from Antarctica to date, the actual species diversity remains elusive as the continent remains one of the least explored habitats globally. Here, we describe a new sp...
Article
As genome resources for wheat (Triticum L.) expand at a rapid pace, it is important to update targeted sequencing tools to incorporate improved sequence assemblies and regions of previously unknown significance. Here, we developed an updated regulatory region enrichment capture for wheat and other Triticeae species. The core target space includes s...
Preprint
Aim: Streptococcus pneumoniae and influenza H1N1 virus are common organisms associated with human infections. These infections could play a significant role in immune regulation. The study was performed to analyse the genome sequences of these organisms with human genome and study its functional significance. Methods: The study was performed to ana...
Article
Viruses are well known drivers of several human malignancies. A causative factor for oral cavity squamous cell carcinoma (OSCC) in patients with limited exposure to traditional risk factors, including tobacco use, is yet to be identified. Our study aimed to comprehensively evaluate the role of viral drivers in OSCC patients with low cumulative expo...
Preprint
As genome resources for wheat expand at a rapid pace, it is important to update targeted sequencing tools to incorporate improved sequence assemblies and regions of previously unknown significance. Here, we developed an updated regulatory region enrichment capture for wheat ( Triticum L.) and other Triticeae species. The core target space includes...

Citations

... One of the main developments since INSaFLU's first release [19] focused on upgrading the platform for automated metagenomic virus identification, in order to support both human and veterinary clinical practice and disease outbreak investigations. After reviewing the current state-of-the-art field of bioinformatics pipelines for metagenomic virus diagnostics [18,23-28] and consulting the TELEVIR consortium (Public Health and Veterinary institutes across all Europe), a modular pipeline was designed and developed, incorporating the key steps of NGS metagenomics taxonomic classification and reporting (Figs. 1 and 2), namely: read quality control, viral enrichment/host depletion, de novo assembly, reads/contigs taxonomic classification, and confirmatory reference-based remapping and reporting. The choice of the internal components of the implemented workflows (software, default parameters, etc.) resulted from an extensive benchmarking (next section). ...
... The TELEVIR workflow provides a light-weight but robust approach, in classification and interpretation, to false positives. Firstly, it follows suggestions expressed in the literature to first filter out reads enriched with low complexity regions (e.g., homopolymeric tracts or short-repeats), as well as unwanted material (host or non-viral "contaminants") through host depletion and/or viral enrichment [23,28,33]. These steps aim at decreasing background noise and increasing the performance [28,34,35] and the speed of both read classifiers and assemblers. ...
... Firstly, it follows suggestions expressed in the literature to first filter out reads enriched with low complexity regions (e.g., homopolymeric tracts or short-repeats), as well as unwanted material (host or non-viral "contaminants") through host depletion and/or viral enrichment [23,28,33]. These steps aim at decreasing background noise and increasing the performance [28,34,35] and the speed of both read classifiers and assemblers. In turn, besides the reads classification, the workflow takes advantage of the increased precision of contig classification [36], which provides an additional, robust metric with which to assess the validity of the final results. ...
Article
Background Implementation of clinical metagenomics and pathogen genomic surveillance can be particularly challenging due to the lack of bioinformatics tools and/or expertise. In order to face this challenge, we have previously developed INSaFLU, a free web-based bioinformatics platform for virus next-generation sequencing data analysis. Here, we considerably expanded its genomic surveillance component and developed a new module (TELEVIR) for metagenomic virus identification. Results The routine genomic surveillance component was strengthened with new workflows and functionalities, including (i) a reference-based genome assembly pipeline for Oxford Nanopore technologies (ONT) data; (ii) automated SARS-CoV-2 lineage classification; (iii) Nextclade analysis; (iv) Nextstrain phylogeographic and temporal analysis (SARS-CoV-2, human and avian influenza, monkeypox, respiratory syncytial virus (RSV A/B), as well as a “generic” build for other viruses); and (v) algn2pheno for screening mutations of interest. Both INSaFLU pipelines for reference-based consensus generation (Illumina and ONT) were benchmarked against commonly used command line bioinformatics workflows for SARS-CoV-2, and an INSaFLU snakemake version was released. In parallel, a new module (TELEVIR) for virus detection was developed, after extensive benchmarking of state-of-the-art metagenomics software and following up-to-date recommendations and practices in the field. TELEVIR allows running complex workflows, covering several combinations of steps (e.g., with/without viral enrichment or host depletion), classification software (e.g., Kaiju, Kraken2, Centrifuge, FastViromeExplorer), and databases (RefSeq viral genome, Virosaurus, etc.), while culminating in user- and diagnosis-oriented reports. Finally, to potentiate real-time virus detection during ONT runs, we developed findONTime, a tool aimed at reducing costs and the time between sample reception and diagnosis. 
Conclusions The accessibility, versatility, and functionality of INSaFLU-TELEVIR are expected to supply public and animal health laboratories and researchers with a user-oriented and pan-viral bioinformatics framework that promotes a strengthened and timely viral metagenomic detection and routine genomics surveillance. INSaFLU-TELEVIR is compatible with Illumina, Ion Torrent, and ONT data and is freely available at https://insaflu.insa.pt/ (online tool) and https://github.com/INSaFLU (code).
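
The low-complexity filtering step mentioned in the citation contexts above (removing reads dominated by homopolymeric tracts or short repeats) can be illustrated with a small Python sketch. This is not the TELEVIR implementation, whose internals are not described here. The function names, the use of dinucleotide Shannon entropy, and the 1.5-bit threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (in bits) of the k-mer composition of a sequence."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_low_complexity(seq: str, k: int = 2, min_entropy: float = 1.5) -> bool:
    """Flag a read whose k-mer entropy falls below an (assumed) threshold."""
    return kmer_entropy(seq.upper(), k) < min_entropy


# Hypothetical reads: a homopolymer, a short repeat, and a mixed sequence.
reads = ["AAAAAAAAAAAAAAAAAAAA", "ATATATATATATATATATAT", "ACGTTGCAAGCTTAGGCATC"]
kept = [r for r in reads if not is_low_complexity(r)]
print(kept)
```

A homopolymer contains a single distinct dinucleotide and scores zero entropy, while a two-base repeat scores about one bit, so both fall under the assumed threshold and only the mixed read is kept.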
... Both short- and long-read ntNGS have been used for quality control in bird products and detection of contaminants in biological products and vaccine manufacturing [87]. Similarly, as in farm management, quality control of final products, biologicals, and vaccines normally requires multiple tests for endogenous agents that may be present in cells and media [88,89]. ...
... NGS is already being used informally by vaccine companies to check the quality of vaccines and it may not be long before regulatory agencies require comprehensive NGS-based tests for adventitious agents. MacDonald and collaborators describe the use of this approach to detect adventitious agents in vaccines and biotechnology-based medicines [87]. The approach was also previously used for the detection of minority variants and adventitious viruses in attenuated vaccines [88,89]. ...
Article
Simple Summary Significant progress in next-generation sequencing (NGS) is positioning this technology as a key tool to be utilized in clinical diagnosis of disease agents and/or for veterinary surveillance. Recent advances in direct sequencing of poultry and other avian samples for the detection of microbial agents are reviewed here. This review, although not comprehensive, highlights key developments in avian NGS-based technology for diagnostic uses during the last five years and discusses the future challenges for practical implementation, as well as potential applications in new areas related to poultry production. Abstract Direct-targeted next-generation sequencing (tNGS), with its undoubtedly superior diagnostic capacity over real-time PCR (RT-PCR), and direct-non-targeted NGS (ntNGS), with its higher capacity to identify and characterize multiple agents, are both likely to become diagnostic methods of choice in the future. tNGS is a rapid and sensitive method for precise characterization of suspected agents. ntNGS, also known as agnostic diagnosis, does not require a hypothesis and has been used to identify unsuspected infections in clinical samples. Implemented in the form of multiplexed total DNA metagenomics or as total RNA sequencing, the approach produces comprehensive and actionable reports that allow semi-quantitative identification of most of the agents present in respiratory, cloacal, and tissue samples. The diagnostic benefits of the use of direct tNGS and ntNGS are high specificity, compatibility with different types of clinical samples (fresh, frozen, FTA cards, and paraffin-embedded), production of nearly complete infection profiles (viruses, bacteria, fungus, and parasites), production of “semi-quantitative” information, direct agent genotyping, and infectious agent mutational information. The achievements of NGS in terms of diagnosing poultry problems are described here, along with future applications. 
Multiplexing, development of standard operating procedures, robotics, sequencing kits, automated bioinformatics, cloud computing, and artificial intelligence (AI) are disciplines converging toward the use of this technology for active surveillance in poultry farms. Other advances in human and veterinary NGS sequencing are likely to be adaptable to avian species in the future.
... Initially, the Kraken 2 k-mer mapping method was coupled with confirmation through the BLASTN full-sequence alignment method. The utilisation of WGS data to detect viral integration is reliant on establishing homology between sequencing reads and known reference viral sequences [39]. This can be determined by either using a portion of the sequencing read and finding an exact match in the viral sequence (k-mer mapping), or full alignment of a sequencing read [39]. ...
... The utilisation of WGS data to detect viral integration is reliant on establishing homology between sequencing reads and known reference viral sequences [39]. This can be determined by either using a portion of the sequencing read and finding an exact match in the viral sequence (k-mer mapping), or full alignment of a sequencing read [39]. Kraken 2 uses k-mer mapping to provide a precise method of assigning viral taxonomy with large volumes of sequencing data against numerous vast viral databases including both the JGI and NCBI databases which include over 700,000 viruses [24,26]. ...
... Kraken 2 uses k-mer mapping to provide a precise method of assigning viral taxonomy with large volumes of sequencing data against numerous vast viral databases including both the JGI and NCBI databases which include over 700,000 viruses [24,26]. In addition, Kraken 2 has been shown to have high precision metrics, including high specificity, in benchmarking studies [39,40]. Complementing this approach, BLASTN has shown high sensitivity for the detection of viral reads, with full sequences assessed against a viral reference genome [39]. ...
Article
Viruses are well known drivers of several human malignancies. A causative factor for oral cavity squamous cell carcinoma (OSCC) in patients with limited exposure to traditional risk factors, including tobacco use, is yet to be identified. Our study aimed to comprehensively evaluate the role of viral drivers in OSCC patients with low cumulative exposure to traditional risk factors. Patients under 50 years of age with OSCC, defined using strict anatomic criteria were selected for WGS. The WGS data was interrogated using viral detection tools (Kraken 2 and BLASTN), together examining >700,000 viruses. The findings were further verified using tissue microarrays of OSCC samples using both immunohistochemistry and RNA in situ hybridisation (ISH). 28 patients underwent WGS and comprehensive viral profiling. One 49-year-old male patient with OSCC of the hard palate demonstrated HPV35 integration. 657 cases of OSCC were then evaluated for the presence of HPV integration through immunohistochemistry for p16 and HPV RNA ISH. HPV integration was seen in 8 (1.2%) patients, all middle-aged men with predominant floor of mouth involvement. In summary, a wide-ranging interrogation of >700,000 viruses using OSCC WGS data showed HPV integration in a minority of male OSCC patients and did not carry any prognostic significance.
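
The k-mer mapping idea described in the citation contexts above (taking short substrings of a read and looking for exact matches in viral reference sequences) can be sketched as follows. This is a toy illustration only and not how Kraken 2 is implemented: the function names, the reference names virusA and virusB, the sequences, and the choice of k = 8 are hypothetical, and real classifiers add taxonomy-aware assignment and far more compact indexing.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Set


def build_kmer_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer found in the references to the names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in references.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int,
                  min_hits: int = 1) -> Optional[str]:
    """Assign a read to the reference sharing the most exact k-mers, if any."""
    hits: Counter = Counter()
    read = read.upper()
    for i in range(len(read) - k + 1):
        # .get() avoids adding empty entries to the defaultdict on a miss.
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    if not hits:
        return None
    name, count = hits.most_common(1)[0]
    return name if count >= min_hits else None


# Hypothetical toy references and reads, for illustration only.
refs = {"virusA": "ACGTACGTTAGCCGATAGGCTA", "virusB": "TTGACCAGTTGACCATGGACCA"}
idx = build_kmer_index(refs, k=8)
print(classify_read("TACGTTAGCCGA", idx, k=8))  # exact substring of virusA
print(classify_read("TACGATAGCGGA", idx, k=8))  # same read with three mismatches
```

The second read differs from virusA at three positions, so none of its 8-mers matches exactly and it goes unclassified. That is the weakness with divergent sequences that full-sequence alignment such as BLASTN is meant to offset.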
... K-mer tools find exact matches between small substrings (k-mers) of a read and a viral reference sequence from a database. Due to lower species identification sensitivity and specificity, these methods enable rapid sequence classification but struggle to identify divergent viral sequences [15]. Few tools use high-sensitivity protein alignment, but they need high memory and computational resources. ...
Article
Motivation The study of the Human Virome remains challenging nowadays. Viral metagenomics, through high-throughput sequencing data, is the best choice for virus discovery. The metagenomics approach is culture-independent and sequence-independent, helping search for either known or novel viruses. Though it is estimated that more than 40% of the viruses found in metagenomics analysis are not recognizable, we decided to analyze several tools to identify and discover viruses in RNA-seq samples. Results We have analyzed eight Virus Tools for the identification of viruses in RNA-seq data. These tools were compared using a synthetic dataset of 30 viruses and a real one. Our analysis shows that no tool succeeds in recognizing all the viruses in the datasets. So we can conclude that each of these tools has pros and cons, and their choice depends on the application domain. Availability Synthetic data used through the review and raw results of their analysis can be found at https://zenodo.org/record/6426147. FASTQ files of real data can be found in GEO (https://www.ncbi.nlm.nih.gov/gds) or ENA (https://www.ebi.ac.uk/ena/browser/home). Raw results of their analysis can be downloaded from https://zenodo.org/record/6425917.
Preprint
Background Implementation of clinical metagenomics and pathogen genomic surveillance can be particularly challenging due to the lack of bioinformatics tools and/or expertise. In order to face this challenge, we have previously developed INSaFLU (https://insaflu.insa.pt/), a free web-based bioinformatics platform for virus next-generation sequencing data analysis. Here, we considerably expanded its genomic surveillance component and developed a new module (TELEVIR) for metagenomic virus identification. Results The routine genomic surveillance component was strengthened with new workflows and functionalities, including: i) a reference-based genome assembly pipeline for Oxford Nanopore technologies (ONT) data; ii) automated SARS-CoV-2 lineage classification; iii) Nextclade analysis; iv) Nextstrain phylogeographic and temporal analysis (SARS-CoV-2, human and avian influenza, monkeypox, respiratory syncytial virus (RSV A/B), as well as a “generic” build for other viruses); and, v) algn2pheno (https://github.com/insapathogenomics/algn2pheno) for screening mutations of interest. Both INSaFLU pipelines for reference-based consensus generation (Illumina and ONT) were benchmarked against commonly used command line bioinformatics workflows for SARS-CoV-2, and an INSaFLU snakemake version was released. In parallel, a new module (TELEVIR) for virus detection was developed, after extensive benchmarking of state-of-the-art metagenomics software and following up-to-date recommendations and practices in the field. TELEVIR allows running complex workflows, covering several combinations of steps (e.g., with/without viral enrichment or host depletion), classification software (e.g., Kaiju, Kraken2, Centrifuge, FastViromeExplorer) and databases (RefSeq viral genome, Virosaurus, etc.), while culminating in user- and diagnosis-oriented reports.
Finally, to potentiate real-time virus detection during ONT runs, we developed findONTime (https://github.com/INSaFLU/findONTime), a tool aimed at reducing costs and the time between sample reception and diagnosis. Conclusion The accessibility, versatility and functionality of INSaFLU-TELEVIR are expected to supply public and animal health laboratories and researchers with a user-oriented and pan-viral bioinformatics framework that promotes a strengthened and timely viral metagenomic detection and routine genomics surveillance. INSaFLU-TELEVIR is compatible with Illumina, Ion Torrent and ONT data and is freely available at https://insaflu.insa.pt/ (online tool) and https://github.com/INSaFLU (code).
Article
Whole genome sequencing (WGS) datasets, usually generated for the investigation of the individual animal genome, can be used for additional mining of the fraction of sequencing reads that remains unmapped to the respective reference genome. A significant proportion of these reads contains viral DNA derived from viruses that infected the sequenced animals. In this study, we mined more than 480 billion sequencing reads derived from 1471 WGS datasets produced from cattle, pigs, chickens and rabbits. We identified 367 different viruses among which 14, 11, 12 and 1 might specifically infect the cattle, pig, chicken and rabbit, respectively. Some of them are ubiquitous, avirulent, highly or potentially damaging for both livestock and humans. Retrieved viral DNA information provided a first unconventional and opportunistic landscape of the livestock viromes that could be useful to understand the distribution of some viruses with potential deleterious impacts on the animal food production systems.