Summary of MinION sequencing

Source publication
Article
Background: Short-read sequencing technologies have made microbial genome sequencing cheap and accessible. However, closing genomes is often costly and assembling short reads from genomes that are repetitive and/or have extreme %GC content remains challenging. Long-read, single-molecule sequencing technologies such as the Oxford Nanopore MinION have...

Contexts in source publication

Context 1
... data was subsampled from Av JG3, Fs ARS-166-14, and Ps JKS002128 to determine the minimum read depth required to create contiguous MinION-based assemblies. Fast5-formatted reads for each strain were subsampled in the order in which they were acquired from the MinION sequencer to achieve 10X, 20X, 30X, 40X, and 50X (all three strains), 60X (Fs ARS-166-14 and Ps JKS002128 only), and 70X (Ps JKS002128 only) coverage of the Canu assembly for each strain, calculated using the mean MinION read length for that strain (Table 2). This strategy was used to simulate runs stopped after achieving each level of coverage. ...
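The subsampling described above converts a target coverage into a read count using the mean read length, then keeps reads in acquisition order. A minimal Python sketch of that arithmetic follows. It assumes read lengths are already available as a list in acquisition order (the original work operated on Fast5 files), and the function names and toy numbers are hypothetical illustrations, not the authors' code.

```python
import math
from statistics import mean


def reads_for_coverage(target_cov: float, genome_size: int, mean_read_len: float) -> int:
    """Reads needed so that n * mean_read_len >= target_cov * genome_size."""
    return math.ceil(target_cov * genome_size / mean_read_len)


def subsample_in_order(read_lengths: list[int], target_cov: float, genome_size: int) -> list[int]:
    """Keep the first n reads in acquisition order, simulating a run stopped early.

    n is derived from the mean read length of the whole library. If the
    library is too small for the target, slicing simply returns all reads.
    """
    n = reads_for_coverage(target_cov, genome_size, mean(read_lengths))
    return read_lengths[:n]


if __name__ == "__main__":
    # Toy library: read lengths listed in the order they came off the flow cell.
    reads = [2000, 4000, 6000, 8000, 10000, 2000, 4000, 6000, 8000, 10000]
    genome_size = 12_000  # hypothetical assembly length in bp
    for cov in (1, 2, 3):
        kept = subsample_in_order(reads, cov, genome_size)
        print(f"{cov}X -> {len(kept)} reads")
```

Because n comes from the library-wide mean, the bases actually kept can differ from the target when early reads are shorter or longer than average (in the toy data, the "1X" subset holds only 6,000 bp). Accumulating read lengths until the target number of bases is reached is a common alternative.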
Context 2
... sequenced the genomes of nine bacterial strains using both Oxford Nanopore MinION and Illumina MiSeq technologies, together spanning a wide range of GC content (Flavobacterium: 31%; Aeromonas: 59-61%; Pseudonocardia: 74%). MinION sequencing coverage ranged from 40X to 135X, with median read lengths of 1629-9665 bp (Table 2). Median MinION read lengths for Ah CA-13-1 and Av CIP107763T were considerably shorter than for the other MinION libraries because high-molecular-weight DNA was difficult to extract from these strains. ...
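The coverage and median read length figures quoted above are simple summaries of a read-length distribution. A short sketch, assuming the read lengths have already been parsed (for example from FASTQ) and that a genome or assembly size is known; the function name and toy values are illustrative only.

```python
from statistics import median


def read_summary(read_lengths: list[int], genome_size: int) -> dict[str, float]:
    """Summarise a long-read library: read count, total yield, median length, coverage."""
    total_bases = sum(read_lengths)
    return {
        "reads": len(read_lengths),
        "total_bp": total_bases,
        # The median is robust to the long tail typical of nanopore read lengths.
        "median_len": median(read_lengths),
        # Coverage (depth) = total sequenced bases / genome size.
        "coverage_x": total_bases / genome_size,
    }


if __name__ == "__main__":
    print(read_summary([1000, 2000, 3000, 10000], genome_size=4000))
```

For the toy input this reports a median of 2500 bp and 4.0X coverage. Short median read lengths, as for the two strains with poor DNA extractions, lower the yield per read, so more reads are needed to reach the same coverage.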
Context 3
... assemblies were more contiguous and had higher N50 values than all MiSeq-based assemblies, except for the Av CIP107763T Unicycler-hybrid and SPAdes-hybrid assemblies and the Ah CA-13-1 Unicycler-hybrid assembly (Fig. 1a, b). These two strains had lower-quality MinION libraries (Table 2), which likely compromised their Canu assemblies, even though those assemblies were still more contiguous than the MiSeq-only SPAdes and Unicycler assemblies. Canu assemblies were used as the reference for polishing with either Nanopolish or Pilon, so the number of contigs was the same for the Canu, Canu+Nanopolish, and Canu+Pilon assemblies (Fig. 1). ...
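Contiguity comparisons like this one rest on N50. A minimal sketch of the standard definition: sort contigs from longest to shortest and add lengths until at least half the total assembly length is reached; N50 is the length of the contig at that point, and L50 is the number of contigs needed (a count, not a length). The function name and toy inputs are illustrative.

```python
def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    N50: length of the shortest contig in the smallest set of longest contigs
    that together cover at least half the total assembly length.
    L50: number of contigs in that set.
    """
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches half the total")


if __name__ == "__main__":
    print(n50_l50([5_000_000, 50_000]))   # closed chromosome plus a plasmid
    print(n50_l50([100, 200, 300, 400]))  # fragmented toy assembly
```

Polishing with Nanopolish or Pilon edits bases within existing contigs but does not join them, which is why contig counts (and, to a first approximation, N50) stay the same before and after polishing.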

Similar publications

Article
Cassava brown streak disease is caused by cassava brown streak virus (CBSV) and Uganda cassava brown streak virus (UCBSV). Many of the CBSV and UCBSV diversity studies utilize partial coat protein sequences due to the unavailability of representative full genome sequences. Hence, there is little information on the diversity of cassava brown streak...
Article
Background: Currently available short-read genome assemblies of the tetraploid protozoan parasite Giardia intestinalis are highly fragmented, highlighting the need for improved genome assemblies at a reasonable cost. Long nanopore reads are well suited to resolve repetitive genomic regions resulting in better quality assemblies of eukaryotic genom...

Citations

... Reconstructing complete bacterial genomes using de novo assembly methods had been considered too costly and time-consuming to be widely recommended in most cases, even as recently as 2015 [1]. This was due to the reliance on short-read sequencing technologies, which do not allow regions with repeats or extremely high GC content to be reconstructed [2]. However, since then, advances in long-read sequencing technologies have allowed complete genomes to be constructed automatically using hybrid assembly approaches. ...
Article
Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-read) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter, which allows for the fast, automatic and scalable recovery of near-perfect complete bacterial genomes using a long-read-first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read-only assembler. We compared Hybracter to existing automated hybrid and long-read-only assembly tools using a diverse panel of samples with varying levels of long-read accuracy and manually curated ground-truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold-standard automated hybrid assembler Unicycler. We also show that Hybracter with long reads only is the most accurate long-read-only assembler and is comparable to hybrid methods in accurately recovering small plasmids.
... However, it is challenging to assemble plastid and mitochondrial genomes that harbor repeats longer than the reads produced by a short-read platform used alone [32]. Long reads generated by the ONT or PacBio platforms can improve the accuracy and reliability of the inferred organelle genome structure compared with a short-read-based assembly [33,34]. ...
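The constraint described here, that a repeat can only be resolved by reads longer than the repeat itself, can be made concrete with a length check. The sketch below counts reads long enough to cover a repeat plus some unique flanking sequence on each side. The 500 bp anchor default, the function name, and the toy read lengths are illustrative assumptions; a real assessment would also need read positions from alignments, not just lengths.

```python
def spanning_reads(read_lengths: list[int], repeat_len: int, min_anchor: int = 500) -> int:
    """Count reads long enough to bridge a repeat with unique sequence on both sides.

    A read can only place a repeat copy unambiguously if it covers the whole
    repeat plus at least `min_anchor` bp of unique flank on each side.
    This checks length only and ignores where reads actually map.
    """
    needed = repeat_len + 2 * min_anchor
    return sum(1 for length in read_lengths if length >= needed)


if __name__ == "__main__":
    short_reads = [150] * 1000               # typical short-read length
    mixed = short_reads + [12000, 8000, 3000]  # add three long reads
    print(spanning_reads(short_reads, repeat_len=5000))  # no short read can bridge it
    print(spanning_reads(mixed, repeat_len=5000))        # two long reads can
```

No number of 150 bp reads can bridge a 5 kb repeat, which is why adding even a modest number of long reads changes what an assembler can resolve.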
Article
Background: Corydalis DC., the largest genus in the family Papaveraceae, comprises > 465 species. Complete plastid genomes (plastomes) of Corydalis show evolutionary changes, including syntenic arrangements, gene losses and duplications, and IR boundary shifts. However, little is known about the evolution of the mitochondrial genome (mitogenome) in Corydalis. Both the organelle genomes and transcriptomes are needed to better understand the relationships between the patterns of evolution in mitochondrial and plastid genomes. Results: We obtained complete plastid and mitochondrial genomes from Corydalis pauciovulata using a hybrid assembly of Illumina and Oxford Nanopore Technologies reads to assess the evolutionary parallels between the organelle genomes. The mitogenome and plastome of C. pauciovulata had sizes of 675,483 bp and 185,814 bp, respectively. Three ancestral gene clusters were missing from the mitogenome, and expanded IR (46,060 bp) and miniaturized SSC (202 bp) regions were identified in the plastome. The mitogenome and plastome of C. pauciovulata contained 41 and 67 protein-coding genes, respectively; the loss of genes was a plastid-specific event. We also generated a draft genome and transcriptome for C. pauciovulata. A combination of genomic and transcriptomic data supported the functional replacement of acetyl-CoA carboxylase subunit β (accD) by intracellular transfer to the nucleus in C. pauciovulata. In contrast, our analyses suggested a concurrent loss of the NADH-plastoquinone oxidoreductase (ndh) complex in both the nuclear and plastid genomes. Finally, we performed genomic and transcriptomic analyses to characterize DNA replication, recombination, and repair (DNA-RRR) genes in C. pauciovulata, as well as the transcriptomes of Liriodendron tulipifera and Nelumbo nucifera. We obtained 25 DNA-RRR genes and identified their structure in C. pauciovulata.
Pairwise comparisons of nonsynonymous (dN) and synonymous (dS) substitution rates revealed that several DNA-RRR genes in C. pauciovulata have higher dN and dS values than those in N. nucifera. Conclusions: The C. pauciovulata genomic data generated here provide a valuable resource for understanding the evolution of Corydalis organelle genomes. The first mitogenome of Papaveraceae provides an example that can be explored by other researchers sequencing the mitogenomes of related plants. Our results also provide fundamental information about DNA-RRR genes in Corydalis and their related rate variation, which elucidates the relationships between DNA-RRR genes and organelle genome stability.
... However, it is important to acknowledge that challenges persist with any genome assembly (Peona et al. 2018; Weissensteiner and Suh 2019). Although continuity and completeness have significantly improved, some genomic regions with high GC content and repetitive elements may still present challenges for accurate assembly (Chen et al. 2013; Goldstein et al. 2019). ...
Article
The European green woodpecker, Picus viridis, is a widely distributed species found in the Western Palearctic region. Here, we assembled a highly contiguous genome assembly for this species using a combination of short- and long-read sequencing and scaffolded with chromatin conformation capture (Hi-C). The final genome assembly was 1.28 Gb and features a scaffold N50 of 37 Mb and a scaffold L50 of 39.165 Mb. The assembly incorporates 89.4% of the genes identified in birds in OrthoDB. Gene and repetitive content annotation on the assembly detected 15,805 genes and a ∼30.1% occurrence of repetitive elements, respectively. Analysis of synteny demonstrates the fragmented nature of the P. viridis genome when compared to the chicken (Gallus gallus). The assembly and annotations produced in this study will certainly help further research into the genomics of P. viridis and the comparative evolution of woodpeckers. Five historical and seven contemporary samples have been resequenced and may give insights into the population history of this species.
... Strain panel MP-9 was obtained from ATCC; it comprises six strains representing each of the non-O157 STEC serogroups colloquially referred to as the "Big Six" (Eklund et al., 2001; Johnson et al., 2006; Bettelheim, 2007; Hadler et al., 2011; Hegde et al., 2012; Gould et al., 2013; Vishram et al., 2021; Supplementary Table S1). STEC genomes house an extensive and partly repetitive phage complement that hampers assembly into closed genomes (Goldstein et al., 2019; Jaudou et al., 2022). In response, we applied a hybrid long- and short-read sequencing strategy (Nyong et al., 2020; Allué-Guardia et al., 2022) that allowed us to produce high-quality closed genomes, including carried plasmids (Figures 1, 2; Supplementary Figures S1, S2). ...
Article
Shiga toxin (Stx)-producing Escherichia coli (STEC) of non-O157:H7 serotypes are responsible for global and widespread human food-borne disease. Among these serogroups, O26, O45, O103, O111, O121, and O145 account for the majority of clinical infections and are colloquially referred to as the “Big Six.” The “Big Six” strain panel we sequenced and analyzed in this study consists of reference type cultures comprising six strains, one for each of the non-O157 STEC serogroups, curated and distributed by the American Type Culture Collection (ATCC) as a resource to the research community under panel number ATCC MP-9. The application of long- and short-read hybrid sequencing yielded closed chromosomes and a total of 14 plasmids of diverse functions. Through high-resolution comparative phylogenomics, we cataloged the shared and strain-specific virulence and resistance gene content and established the close relationship of serogroup O26 and O103 strains featuring flagellar H-type 11. Virulence phenotyping revealed statistically significant differences in Stx-production capabilities that correlated with each strain’s individual stx status. Among the carried Stx1a, Stx2a, and Stx2d phages, the Stx2a phage is by far the most responsive upon RecA-mediated phage mobilization, and consequently, stx2a+ isolates produced the highest level of toxin in this panel. The availability of high-quality closed genomes for this “Big Six” reference set, including carried plasmids, along with the recorded genomic virulence profiles and Stx-production phenotypes will provide a valuable foundation to further explore the plasticity in evolutionary trajectories in these emerging non-O157 STEC lineages, which are major culprits of human food-borne disease.
... Beyond CheckM2 completeness and contamination estimates, we also compared how long reads affected the number of high-quality MAGs obtained, as defined by the MIMAG criteria (CheckM completeness >90%, contamination <5%, 5S, 16S, and 23S rRNA genes present, at least 18 tRNA genes present) (51): 44.2% of long-read MAGs were high-quality (50/113), compared with 23.1% of hybrid MAGs (40/173), 3.5% of short-read 20-Gbp MAGs (5/142), and 3.1% of short-read 40-Gbp MAGs (6/191). Secondary metabolite biosynthetic gene clusters (BGCs) can be difficult to assemble using short reads (52). We used antiSMASH to compare how many BGCs were reconstructed by each assembly approach. ...
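The MIMAG thresholds listed above amount to a simple filter, and the quoted percentages follow directly from the counts in parentheses. The sketch below encodes the criteria exactly as stated in the text; the `MAG` record and function names are hypothetical, and in practice the fields would be filled from CheckM/CheckM2 output and rRNA/tRNA annotation.

```python
from dataclasses import dataclass


@dataclass
class MAG:
    completeness: float   # percent, e.g. from CheckM/CheckM2
    contamination: float  # percent
    has_5s: bool
    has_16s: bool
    has_23s: bool
    trna_count: int


def is_high_quality(mag: MAG) -> bool:
    """High-quality draft per the MIMAG thresholds quoted in the text."""
    return (
        mag.completeness > 90
        and mag.contamination < 5
        and mag.has_5s and mag.has_16s and mag.has_23s
        and mag.trna_count >= 18
    )


def percent_high_quality(n_hq: int, n_total: int) -> float:
    """Share of high-quality MAGs, rounded to one decimal place."""
    return round(100 * n_hq / n_total, 1)


if __name__ == "__main__":
    counts = {"long-read": (50, 113), "hybrid": (40, 173),
              "short 20 Gbp": (5, 142), "short 40 Gbp": (6, 191)}
    for name, (hq, total) in counts.items():
        print(name, percent_high_quality(hq, total))
```

The rRNA requirement is usually the limiting one for short-read MAGs, since multi-copy rRNA operons are repeats that short reads struggle to assemble.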
Article
Shotgun metagenomics enables the reconstruction of complex microbial communities at a high level of detail. Such an approach can be conducted using short-read or long-read sequencing data, or a combination of both. To assess the pros and cons of these different approaches, we used 22 fecal DNA extracts collected weekly for 11 weeks from each of two lab mice to study seven performance metrics over four combinations of sequencing depth and technology: (i) 20 Gbp of Illumina short-read data, (ii) 40 Gbp of short-read data, (iii) 20 Gbp of PacBio HiFi long-read data, and (iv) 40 Gbp of hybrid (20 Gbp of short-read + 20 Gbp of long-read) data. No strategy was best for all metrics; instead, each excelled at different metrics. The long-read approach yielded the best assembly statistics, with the highest N50 and lowest number of contigs. The 40 Gbp short-read approach yielded the highest number of refined bins. Finally, the hybrid approach yielded the longest assemblies and the highest mapping rate to the bacterial genomes. Our results suggest that while long-read sequencing significantly improves the quality of reconstructed bacterial genomes, it is more expensive and requires deeper sequencing than short-read approaches to recover a comparable number of reconstructed genomes. The optimal strategy is study-specific and depends on how researchers weigh the trade-off between the quantity and quality of recovered genomes. IMPORTANCE Mice are an important model organism for understanding the gut microbiome. When studying these gut microbiomes using DNA techniques, researchers can choose from technologies that use short or long DNA reads. In this study, we perform an extensive benchmark between short- and long-read DNA sequencing for studying mouse gut microbiomes. We found that no single approach was best for all metrics, and we provide information that can help guide researchers in planning their experiments.
... The development of more robust read assembly and read correction tools and pipelines is still an area to explore. Studies have shown that mixing and matching freely available read assembly and read correction tools significantly improves not only assembly parameters but also antimicrobial resistance gene detection, plasmid identification, and pan-genome analysis, with and without short sequencing reads for read correction [14,16,40-42]. In addition, adjusting the parameters of the read assembly and/or read correction tools could be beneficial. ...
Article
Background: Whole-genome sequencing of eukaryotes is crucial for species identification, gene detection, and protein annotation. Oxford Nanopore Technology (ONT) is an affordable and rapid platform for sequencing eukaryotes; however, its relatively high error rates require computational and bioinformatic effort to produce more accurate genome assemblies. Here, we evaluated the effect of read correction tools on eukaryote genome completeness, gene detection and protein annotation. Methods: ONT reads from four eukaryotes, C. albicans, C. gattii, S. cerevisiae, and P. falciparum, were assembled using minimap2 and underwent three rounds of read correction using flye, medaka and racon. The generated consensus FASTA files were compared for total length (bp), genome completeness, gene detection, and protein annotation by QUAST, BUSCO, BRAKER1 and InterProScan, respectively. Results: Genome completeness depended on the assembly method rather than on the read correction tool; however, medaka performed better than flye and racon. Racon performed significantly better than flye and medaka in gene detection, while both racon and medaka performed significantly better than flye in protein annotation. Conclusion: We show that three rounds of read correction significantly affect gene detection and protein annotation, which depend on assembly quality rather than assembly completeness.
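Multiple rounds of long-read polishing are commonly run as a loop in which reads are re-mapped to the current consensus and the consensus is then updated. The sketch below only builds shell commands for such a loop with minimap2 (`-x map-ont` preset, PAF output) and Racon (reads, overlaps, target); it is a generic illustration, not the pipeline used in the study above, and the file names, thread count, and function name are hypothetical.

```python
import shlex


def racon_rounds(reads: str, draft: str, rounds: int = 3, threads: int = 4) -> list[str]:
    """Build shell commands for `rounds` iterations of minimap2 + Racon polishing.

    Each round maps the reads to the current consensus, then polishes it;
    the output of round i becomes the draft for round i + 1.
    Output file names are hypothetical placeholders.
    """
    commands = []
    current = draft
    for i in range(1, rounds + 1):
        paf = f"round{i}.paf"
        polished = f"racon_round{i}.fasta"
        # Map reads (query) against the current consensus (target).
        commands.append(
            f"minimap2 -x map-ont -t {threads} {shlex.quote(current)} {shlex.quote(reads)} > {paf}"
        )
        # Racon takes: reads, read-to-target overlaps, target sequences.
        commands.append(
            f"racon -t {threads} {shlex.quote(reads)} {paf} {shlex.quote(current)} > {polished}"
        )
        current = polished
    return commands


if __name__ == "__main__":
    for cmd in racon_rounds("reads.fastq", "draft.fasta"):
        print(cmd)  # e.g. subprocess.run(cmd, shell=True, check=True)
```

Returning the commands rather than executing them keeps the round-to-round file chaining easy to inspect before anything runs.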
... There are many programmes currently available for the assembly of bacterial genomes from ONT sequences, and previous studies have compared some of the different assembly approaches (e.g. [38,39,40,41]), yet no clear consensus pipeline has been developed. ...
Article
This study presents the assembly and comparative genomic analysis of luminous Photobacterium strains isolated from the light organs of 12 fish species using Oxford Nanopore Technologies (ONT) sequencing. The majority of assemblies achieved chromosome-level continuity, consisting of one large (>3 Mbp) and one small (~1.5 Mbp) contig, with near-complete BUSCO scores along with varying plasmid sequences. Leveraging this dataset, this study significantly expanded the available genomes for P. leiognathi and its subspecies P. ‘mandapamensis’, enabling a comparative genomic analysis between the two lineages. An analysis of the large and small chromosomes unveiled distinct patterns of core and accessory genes, with a larger fraction of the core genes residing on the large chromosome, supporting the hypothesis of secondary chromosome evolution from megaplasmids in Vibrionaceae. In addition, we discovered a proposed new species, Photobacterium acropomis sp. nov., isolated from an acropomatid host, with an average nucleotide identity (ANI) of 93% compared to the P. leiognathi and P. ‘mandapamensis’ strains. A comparison of the P. leiognathi and P. ‘mandapamensis’ lineages revealed minimal differences in gene content, yet highlighted the former’s larger genome size and potential for horizontal gene transfer. An investigation of the lux-rib operon, responsible for light production, indicated congruence between the presence of luxF and host family, challenging its role in differentiating P. ‘mandapamensis’ from P. leiognathi. Further insights were derived from the identification of metabolic differences, such as the presence of the NADH:quinone oxidoreductase respiratory complex I in P. leiognathi as well as variations in the type II secretion system (T2S) genes between the lineages, potentially impacting protein secretion and symbiosis.
In summary, this study advances our understanding of Photobacterium genome evolution, highlighting subtle differences between closely related lineages, specifically P. leiognathi and P. ‘mandapamensis’. These findings demonstrate the benefit of long-read sequencing for bacterial genome assembly and pangenome analysis and provide a foundation for exploring early bacterial speciation processes of these facultative light organ symbionts.
... but it is important to acknowledge that challenges persist with any genome assembly (Peona et al. 2018; Weissensteiner and Suh 2019). Although continuity and completeness have significantly improved, some genomic regions with high GC content and repetitive elements may still present challenges for accurate assembly (Chen et al. 2013; Goldstein et al. 2019). ...
Preprint
The European Green Woodpecker, Picus viridis, is a widely distributed species found in the Western Palearctic region. Here we assembled a highly contiguous genome assembly for this species using a combination of short- and long-read sequencing and scaffolded with chromatin conformation capture (Hi-C). The final genome assembly was 1.28 Gb and features a scaffold N50 of 37 Mb and a scaffold L50 of 39.165 Mb. The assembly incorporates 89.4% of the genes identified in birds in OrthoDB. Gene and repetitive content annotation on the assembly detected 15,805 genes and a 30.1% occurrence of repetitive elements, respectively. Analysis of synteny demonstrates the fragmented nature of the Picus viridis genome when compared to the chicken (Gallus gallus). The assembly and annotations produced in this study will certainly help further research into the genomics of P. viridis and the comparative evolution of woodpeckers.
... However, they only generate reads of a few hundred bases. Bacterial genomes contain repetitive regions, and this technology cannot disambiguate repetitive regions of the genome when they are longer than the read length (Goldstein et al., 2019). In addition, sample preparation protocols often involve multiple time-consuming steps. ...
Article
Introduction: Whole Genome Sequencing (WGS) implementation in food safety laboratories is a significant advancement in food pathogen control and outbreak tracking. However, the initial investment required to acquire next-generation sequencing platforms and the need for bioinformatic skills have been obstacles to the widespread use of WGS. Long-read technologies, such as the one developed by Oxford Nanopore Technologies, can be easily implemented with a minor initial investment and with simple protocols that can be performed with basic laboratory equipment. Methods: Herein, we report a simple MinION Galaxy-based workflow with analysis parameters that allow its implementation in food safety laboratories with limited computer resources and without previous knowledge of bioinformatics, for rapid Salmonella serotyping and identification of virulence and antimicrobial resistance genes. For that purpose, single-use Flongle flow cells with the MinION Mk1B were used for WGS, and the community-driven web-based analysis platform Galaxy was used for bioinformatic analysis. Three strains belonging to three different serotypes, monophasic S. Typhimurium, S. Grancanaria, and S. Senftenberg, were sequenced. Results: After 24 h of sequencing, enough coverage was achieved to perform de novo assembly of all three strains. After evaluating different tools, Flye de novo assemblies with medaka polishing were shown to be optimal for in silico Salmonella spp. serotyping with the SISTR tool, followed by antimicrobial resistance and virulence gene identification with ABRicate. Discussion: Implementing the present workflow in food safety laboratories with limited computer resources allows rapid characterization of Salmonella spp. isolates.
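Outside Galaxy, the same assembly-then-polish step is typically run as two command-line calls: Flye on the raw ONT reads, then medaka on the resulting assembly. The sketch below only builds argument lists for those two calls. It assumes a single FASTQ of uncorrected reads (hence `--nano-raw`; newer basecaller output may suit `--nano-hq` instead), uses default medaka model selection, and the output directory layout and function name are hypothetical; serotyping and gene screening would then run on the polished consensus.

```python
import shlex


def flye_medaka_commands(reads: str, outdir: str = "asm", threads: int = 4) -> list[list[str]]:
    """Argument lists for a Flye assembly followed by one round of medaka polishing.

    Assumes raw (uncorrected) ONT reads in a single FASTQ file; the output
    directory layout is a hypothetical choice.
    """
    flye_dir = f"{outdir}/flye"
    medaka_dir = f"{outdir}/medaka"
    flye = [
        "flye", "--nano-raw", reads,
        "--out-dir", flye_dir,
        "--threads", str(threads),
    ]
    # Flye writes its final contigs to <out-dir>/assembly.fasta.
    medaka = [
        "medaka_consensus",
        "-i", reads,
        "-d", f"{flye_dir}/assembly.fasta",
        "-o", medaka_dir,
        "-t", str(threads),
    ]
    return [flye, medaka]


if __name__ == "__main__":
    for cmd in flye_medaka_commands("reads.fastq"):
        print(shlex.join(cmd))  # run with subprocess.run(cmd, check=True)
```

Passing argument lists (rather than shell strings) to `subprocess.run` avoids quoting problems with unusual file names.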
... The Illumina sequencing platform has become the most widely used method for metagenomic studies because of its high accuracy (0.1-1% error rates) and throughput [9]. However, Illumina short reads often result in highly fragmented genomes when de novo assemblies are performed for environmental samples and pure cultures, since short reads fail to correctly assemble genomic regions containing longer repetitive elements [4,10]. This fragmentation problem is magnified by intergenomic repeats, especially when sequenced microbial communities contain closely related species or subspecies at different and unknown abundances [4,11,12]. ...
... Represented by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), the recently emerged third-generation sequencing platforms offer a possible solution for partly resolving ambiguous repetitive regions and improving genome contiguity [10,13]. Although these platforms are criticized for their considerably high error rates (> 10%), the long reads they produce (mean read lengths of up to 10-12 kb) can generate genomes with a high degree of completeness [9,14,15]. ...
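Per-base error rates like those quoted above are often expressed as Phred quality scores, Q = -10 log10(p), where p is the probability that a base call is wrong. A small sketch of the conversion in both directions; the function names are illustrative.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality score Q = -10 * log10(p) for per-base error probability p."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse mapping: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for label, p in [("0.1% error", 0.001), ("1% error", 0.01), ("10% error", 0.10)]:
        print(f"{label}: Q{phred_from_error(p):.0f}")
```

On this scale the 0.1-1% Illumina error range corresponds to Q30-Q20, while an error rate above 10% means a per-read accuracy below Q10, a tenfold-or-greater gap in error probability.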
Article
Background: Mangrove wetlands are coastal ecosystems with important ecological features and provide habitats for diverse microorganisms with key roles in nutrient and biogeochemical cycling. However, the overall metabolic potential and ecological roles of the microbial community in mangrove sediment remain unclear. In the current study, the microbial and metabolic profiles of prokaryotic and fungal communities in mangrove sediments were investigated using metagenomic analysis based on PacBio single-molecule real-time (SMRT) and Illumina sequencing techniques. Results: Compared with Illumina short reads alone, incorporating PacBio long reads contributed significantly to more contiguous assemblies, more than doubled the number of high-quality metagenome-assembled genomes (MAGs), and improved the novelty of the MAGs. Further metabolic reconstruction of the recovered MAGs showed that prokaryotes potentially played an essential role in carbon cycling in mangrove sediment, displaying versatile metabolic potential for degrading organic carbon, fermentation, autotrophy, and carbon fixation. Mangrove fungi also played a role in carbon cycling, potentially being involved in the degradation of various carbohydrate and peptide substrates. Notably, a new candidate bacterial phylum with a ubiquitous distribution, named Candidatus Cosmopoliota, is proposed. Genomic analysis revealed that this new phylum is capable of utilizing various types of organic substrates, anaerobic fermentation, and carbon fixation via the Wood-Ljungdahl (WL) pathway and the reverse tricarboxylic acid (rTCA) cycle. Conclusions: This study not only highlights the advantages of HiSeq-PacBio hybrid assembly for more complete profiling of environmental microbiomes but also expands our understanding of the microbial diversity and potential roles of distinct microbial groups in biogeochemical cycling in mangrove sediment.