Assessment of the Aphis glycines v1 (A_gly_v1) genome assembly. (a) BUSCO analysis of published aphid genome assemblies using the Arthropoda gene set of 1,066 conserved single copy genes. Bars show the proportions of genes found in each assembly as a percentage of the total gene set. (b) Taxon-annotated GC content-coverage plot of A_gly_v1. Each circle represents a scaffold in the assembly, scaled by length, and coloured by order-level NCBI taxonomy assigned by BlobTools. The X axis corresponds to the average GC content of each scaffold and the Y axis corresponds to the average coverage based on alignment of pooled A. glycines Illumina MiSeq short-read libraries from Wenger et al. (2017). Marginal histograms show cumulative genome content (in Kb) for bins of coverage (Y axis) and GC content (X axis).

Source publication
Preprint
Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid ( Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence dat...

Contexts in source publication

Context 1
... assessed the quality of A_gly_v1 and a selection of published aphid genome assemblies by searching for conserved single copy genes using BUSCO (Simão et al. 2015; Waterhouse et al. 2018) with the Arthropoda gene set (n=1066). A_gly_v1 contains full length copies of 93.9% of arthropod BUSCOs (Figure 1a), indicating a high level of genome completeness. However, compared to other aphid genome assemblies, A_gly_v1 has more than twice as many duplicated BUSCO genes (10.6% vs. 2.3-4.7%). ...
Context 2
... doi: bioRxiv preprint first posted online Sep. 25, 2019; high GC content, indicating the presence of contamination (Supplementary Table 1; Figure 1b). Scaffolds in the primary "blob" account for the majority of A_gly_v1 sequence and are mostly assigned to Hemiptera as expected. ...
Context 3
... likely represents an underestimation of the hymenopteran content in A_gly_v1 as there are many scaffolds with unannotated taxonomy also clustering with the Hymenoptera scaffolds due to a lack of sequenced aphid parasitoid wasp genomes. Consistent with this, BLAST hit identities for Hymenoptera scaffolds are significantly lower than for Hemiptera scaffolds (Mann-Whitney U Test: p < 2.2 × 10^-16, U = 1552300000; Supplementary Figure 1). Nonetheless, inspection of the A_gly_v1 official gene set (v1.0) reveals that scaffolds assigned to Hymenoptera contain 806 genes previously thought to be derived from A. glycines. ...
Context 4
... instance as the parasitoid wasp and aphid have similar GC content (Figure 1b), making it difficult to distinguish between target species contigs and contamination. To identify libraries that contain high levels of contamination, I mapped the Illumina libraries derived from wild-caught biotype 4 aphids (n=13) to the Canu assembly and set aside libraries with low mapping efficiency (< 75% of reads mapped). ...
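The library-filtering step described in Context 4 can be illustrated with a minimal Python sketch. It assumes that per-library read counts (mapped reads and total reads) have already been obtained from the aligner's summary output. The function name, the dictionary layout and the library names are hypothetical and do not come from the source publication; only the 75% threshold is taken from the text.

```python
def split_by_mapping_rate(read_counts, min_rate=0.75):
    """Split libraries into (kept, set_aside) by fraction of reads mapped.

    read_counts maps a library name to a (mapped_reads, total_reads) tuple.
    Libraries with a mapping rate below min_rate are set aside as likely
    contaminated; the 0.75 default mirrors the "< 75% of reads mapped"
    cut-off quoted in the text. Everything else here is illustrative.
    """
    kept, set_aside = [], []
    for name, (mapped, total) in read_counts.items():
        # Treat an empty library as having a mapping rate of zero.
        rate = mapped / total if total else 0.0
        if rate < min_rate:
            set_aside.append(name)
        else:
            kept.append(name)
    return kept, set_aside


# Hypothetical counts for three libraries (not real data).
example_counts = {
    "lib_a": (900, 1000),
    "lib_b": (700, 1000),
    "lib_c": (750, 1000),
}
kept, set_aside = split_by_mapping_rate(example_counts)
```

With these made-up counts, lib_b falls below the threshold and is set aside, while lib_c sits exactly at 75% and is kept, because the cut-off in the text is a strict "less than".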
Context 5
... A_gly_v2 is contiguous, free from obvious contamination and is highly complete. Half of the genome is contained in only 40 scaffolds (941 scaffolds in total, longest scaffold = 7.28 Mb) and the scaffold N50 is increased by 1,342% (2.51 Mb vs 0.17 Mb) compared to A_gly_v1 (Table 1; Figure 2a). After exclusion of parasitoid wasp scaffolds from A_gly_v1, A_gly_v2 contains 4 Mb more sequence than A_gly_v1 (303 Mb vs. 299 Mb), and is close to the predicted genome size of 317 Mb based on flow cytometry (Wenger et al. 2017). ...
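The contiguity statistics quoted in Context 5 follow the standard definitions of N50 (the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the assembly) and L50 (the number of scaffolds needed to get there, 40 in the case of A_gly_v2). A minimal Python sketch of that calculation is given below. The function name and the toy lengths are illustrative assumptions, not values or code from the source publication.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken
    from the longest scaffold downwards, first reaches half of the total
    assembly length. L50 is how many scaffolds were needed to reach it.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example with made-up lengths (arbitrary units).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50, l50 = n50_l50(toy_lengths)
```

In the toy example the total length is 54, and the three longest scaffolds (10, 9 and 8) together reach 27, so the function returns an N50 of 8 and an L50 of 3.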

Citations

... Recently, considering that aphid genome assemblies could be greatly improved by combining NGS with additional tools, such as single-molecule optical mapping technologies (Giani et al., 2020), some aphid genomes have been revisited, and in particular the genome of A. glycines has been independently revised by two different consortia, making greatly improved genomes available (Giordano et al., 2020; Mathers, 2019). In particular, Mathers (2019) recently developed methods to reassemble the soybean aphid genome using the original sequence data, and version 2 of the A. glycines genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds. At the same time, Giordano et al. (2020) published a new version of the A. glycines genome, obtained from biotype 1 sampled in Illinois (USA), which results from the combined use of NGS and optical mapping technologies, together with the support of cytogenetic data (Giordano et al., 2020; Mandrioli et al., 2019a). ...
... Successive studies, based on manually selected sets of genes (Mandrioli et al., , 2019a) and on bioinformatic comparative genomics (Mandrioli et al., 2019b), enlarged the set of genes mapped in different aphid species and revealed numerous tracts of synteny between aphids and the fly Drosophila melanogaster, even though they diverged about 300 million years ago (Hedges et al., 2015). This suggestion has recently been supported by Mathers (2019), who showed synteny of the insect-specific gene cluster Osiris (Smith et al., 2018) between D. melanogaster and A. glycines. ...
Chapter
In the last decade the genomes of several aphid species have been sequenced, allowing a better understanding of their biology and evolution. Unfortunately, as frequently occurs with next generation sequencing technologies, several aphid genomes consist of fragmented assemblies that contain thousands of short genomic scaffolds. In order to improve the quality of the published genomic data, several research groups are currently resequencing aphid DNA, making it possible to take full advantage of genomics to address complex biological problems, such as aphid diversification. This review aims to discuss the current state of the art in aphid genomics, focusing in particular on the aspects that could improve our knowledge of their evolution.