Eric K. Wafula's research while affiliated with Pennsylvania State University and other places

Publications (86)

Article
Full-text available
Background: Plants have complex and dynamic immune systems that have evolved to resist pathogens. Humans have worked to enhance these defenses in crops through breeding. However, many crops harbor only a fraction of the genetic diversity present in wild relatives. Increased utilization of diverse germplasm to search for desirable traits, such as dis...
Article
Pediatric brain and spinal cancers are collectively the leading disease-related cause of death in children; thus, we urgently need curative therapeutic strategies for these tumors. To accelerate such discoveries, the Children's Brain Tumor Network (CBTN) and Pacific Pediatric Neuro-Oncology Consortium (PNOC) created a systematic process for tumor b...
Article
Full-text available
Plant genome-scale resources are being generated at an increasing rate as sequencing technologies continue to improve and raw data costs continue to fall; however, the cost of downstream analyses remains large. This has resulted in a considerable range of genome assembly and annotation qualities across plant genomes due to their varying sizes, comp...
Article
Full-text available
Mycoheterotrophy is an alternative nutritional strategy whereby plants obtain sugars and other nutrients from soil fungi. Mycoheterotrophy and associated loss of photosynthesis has evolved repeatedly in plants, particularly in monocots. Although reductive evolution of plastomes in mycoheterotrophs is well documented, the dynamics of nuclear genome...
Article
Full-text available
We assess relationships among 192 species in all 12 monocot orders and 72 of 77 families, using 602 conserved single-copy (CSC) genes and 1375 benchmarking single-copy ortholog (BUSCO) genes extracted from genomic and transcriptomic datasets. Phylogenomic inferences based on these data, using both coalescent-based and supermatrix analyses, are larg...
Preprint
Full-text available
Plant genome-scale resources are being generated at an increasing rate as sequencing technologies continue to improve and raw data costs continue to fall; however, the cost of downstream analyses remains large. This has resulted in a considerable range of genome assembly and annotation qualities across plant genomes due to their varying sizes, comp...
Article
Full-text available
The rapid development of sequencing technologies has led to a deeper understanding of plant genomes. However, direct experimental evidence connecting genes to important agronomic traits is still lacking in most non-model plants. For instance, the genetic mechanisms underlying plant architecture are poorly understood in pome fruit trees, creating a...
Preprint
Full-text available
The rapid development of sequencing technologies has led to a deeper understanding of horticultural plant genomes. However, experimental evidence connecting genes to important agronomic traits is still lacking in most non-model organisms. For instance, the genetic mechanisms underlying plant architecture are poorly understood in pome fruit trees, c...
Article
Significance: Genomic structural variants (SVs) are frequent contributors to adaptation and speciation, but our understanding of their overall fitness consequences is limited, with data and analyses primarily available for humans and short-lived domesticated species. Here, we use 31 high-quality genome assemblies to study the evolutionary impact of...
Article
Full-text available
Cowpea (Vigna unguiculata) cultivar B301 is resistant to races SG4 and SG3 of the root parasitic weed Striga gesnerioides, developing a hypersensitive response (HR) at the site of parasite attachment. By contrast, race SG4z overcomes B301 resistance and successfully parasitises the plant. Comparative transcriptomics and in silico analysis identifie...
Article
Full-text available
Green plants (Viridiplantae) include around 450,000–500,000 species [1,2] of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida),...
Article
Parasitic plants in the genus Striga, commonly known as witchweeds, cause major crop losses in sub-Saharan Africa and pose a threat to agriculture worldwide. An understanding of Striga parasite biology, which could lead to agricultural solutions, has been hampered by the lack of genome information. Here, we report the draft genome sequence of Strig...
Article
Full-text available
Horizontal gene transfer (HGT), the movement and genomic integration of DNA across species boundaries, is commonly associated with bacteria and other microorganisms, but functional HGT (fHGT) is increasingly being recognized in heterotrophic parasitic plants that obtain their nutrients and water from their host plants through direct haustorial feed...
Article
Full-text available
Background: Root parasitic weeds are a major constraint to crop production worldwide, causing significant yearly losses in yield and economic value. These parasites cause their destruction by attaching to their hosts with a unique organ, the haustorium, that allows them to obtain the nutrients (sugars, amino acids, etc.) needed to complete their life...
Article
Plastid genomes (plastomes) vary enormously in size and gene content among the many lineages of nonphotosynthetic plants, but key lineages remain unexplored. We therefore investigated plastome sequence and expression in the holoparasitic and morphologically bizarre Balanophoraceae. The two Balanophora plastomes examined are remarkable, exhibiting f...
Article
Significance: Horizontal gene transfer (HGT) is the nonsexual transfer and genomic integration of genetic materials between organisms. In eukaryotes, HGT appears rare, but parasitic plants may be exceptions, as haustorial feeding connections between parasites and their hosts provide intimate cellular contacts that could facilitate DNA transfer betwe...
Article
Full-text available
Parasitic plants in the Orobanchaceae cause serious agricultural problems worldwide. Parasitic plants develop a multi-cellular infectious organ called a haustorium after recognition of host-released signals. To understand the molecular events associated with host signal perception and haustorium development, we identified differentially regulated g...
Article
Full-text available
Plastid genomes of photosynthetic flowering plants are usually highly conserved in both structure and gene content. However, the plastomes of parasitic and mycoheterotrophic plants may be released from selective constraint due to the reduction or loss of photosynthetic ability. Here we present the greatly reduced and highly divergent, yet functiona...
Data
The coverage of Arabidopsis cDNAs shows a subtle gradation of assembly completeness. Unigenes were aligned to detected gene cDNAs to determine coverage, which was expressed as the percent of cDNA bases covered by assembled sequence. The darkest bar is 0% or “No Hit” and each progressively lighter bar is a bin containing genes covered in 10% increme...
Data
As minimum sequence length cutoffs are imposed, the assembly landscape becomes more even. The effect on the N50 of assembled sequence length and N50 of Mbp of assembled sequence resulting from sequence length cutoffs (imposed at 100–600 bp) for the post-processed assemblies of Illumina biological replicate 1. (TIFF)
Data
Gene pairs with higher Ks are less likely to be co-expressed. The frequency of pairs with increasing Ks values was plotted, revealing that pairs with lower Ks values were more likely to have expression sufficient (BS >0.1) for assembly of both pairs. Yet pairs with higher Ks values were more likely to have one mate with reads insufficient for assem...
Data
Alignment comparison of CLC and Trinity (Inchworm) unigenes representing the AT1G31330.1 transcript. The sequence order from the top in each alignment (A-D) is gDNA (with a single intron, colored gray), cDNA, CDS and then unigene(s). A) Alignment of AT1G31330 reference sequences and the Inchworm BR1 unigenes (x2) annotated as AT1G31330. B) Alignment...
Data
Closely related genes in the “detected gene set” are not efficiently recovered. Gene pairs were identified by a reciprocal best BLASTn hit. Gene number is on the y axis, Ks value of pairs is on the x axis. Equivalent best-fit model components are identified by similar color. “TAIR10” pairs were identified from the comprehensive Arabidopsis cDNA col...
Data
The quality of unigenes as a function of sequencing depth for O. sativa. Publicly available data were retrieved from NCBI’s Sequence Read Archive and assembled with the leading reference-based assembler Mosaik and 3 leading de novo assemblers. While the data were insufficient to reconstruct a majority of the rice young leaf transcriptome, the inflection point...
Data
MEGAN classification of unigenes that do not align to Arabidopsis TAIR10 cDNAs. The classification was determined for unigenes that aligned to sequences in NR with a bit score >125. (TIFF)
Data
Normalization does not preferentially remove closely related gene pairs. Scatter plot of read counts to the detected gene set of the Illumina biological replicates 1 and 2 and the normalized Illumina data set. The log2 read counts +1 (to avoid taking the log of zero) for each gene were calculated for the Illumina Normalized data set and the Combine...
Data
This archive contains the SCERNA protocol. Necessary scripts are included and URLs for components (or alternatives) are also included. Instructions and scripts for generating plots (e.g. Fig 4) are also included. (ZIP)
Data
This file contains a table that summarizes alignment statistics for putative "new" Arabidopsis genes. The 5 columns contain database-specific matches, though only the hits to nr were used for MEGAN analysis. Hits in other databases are described as "concordant" evidence in the manuscript text. (XLSX)
Data
This file contains statistics for primary and post-processed assemblies of BR1, BR2, BR12, and NORM datasets. The change in each category is indicated in a shaded field with a delta sign, e.g. Mosaik-S Δ. (XLSX)
Data
Sequencing and alignment statistics for the normalized and non-normalized libraries used in this study. *Percentage of raw reads aligned. (TIFF)
Data
List of suffixes, abbreviations and gene lists. (TIFF)
Data
The recovery of EGPs follows a pronounced hit/no hit pattern. Bit Score (BS) frequency histogram and assembly summary table for Expressed Gene Pairs (EGPs). 473 gene pairs present in the Ks plot of the “Detected gene sets” (see S3 Fig) were absent in the Ks plot of the Mosaik assembly of BR1. For each assembly of BR1 the BS of each mate (946 genes)...
Data
Summary diagram of assembly error types. Type I assembly reports cases of incomplete assemblies where a given transcript is not assembled into a single sequence (Case I = gap, Case II = Insufficient overlap). Type I error can also consist of failure to bring contigs together (Case III) with sufficient overlap, presumably due to conflict. Type II er...
Data
This archive contains the results of the follow-up Type II error analysis. Only Type II Case 1 errors reliably identified true chimeras (see Training_data_BR1_CLCscaf.xlsx and S2_File_illustrations.pptx). Our follow-up analysis confirms that adjacent loci are co-assembled accurately and are not chimeric unigenes. Typically only a fraction of Type I...
Data
Summary of the 12 candidates chosen for a follow-up analysis by qRT-PCR. The arrows on the plot show the candidates that were chosen for this analysis. The “Probe–cDNA position” column shows where on the reference cDNA the microarray probes hybridized. Generally, the poorly correlated candidates also had a poorer probe set, which may also have con...
Data
Type II error rates of Arabidopsis BR1 assemblies and the Rice young leaf transcriptome assemblies are similar. See S1 Fig for an error diagram. Unigenes from all assemblies are aligned to reference sequences with BLAST to allow for an unbiased estimation of Type II error. (TIFF)
Data
MEGAN classification of unigenes that do not align to Arabidopsis TAIR10 cDNAs. The classification was determined for unigenes that aligned to sequences in NR with a bit score ≥175. (TIFF)
Article
Full-text available
Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies, generated with 6 de novo assemblers including CLC, Trinity, SOAP, Oases, ABySS a...
Data
The poorly correlated candidates (disagreement between the microarray and RNA-Seq) were either cases of false positives on the array (2 of 3) or were erroneous signals from poorly annotated genes. This file details additional efforts to understand the reasons for disagreement between the two analyses. See also S10, S11 and S12 Figs. (DOCX)
Data
Gene expression correlations. Array BR1 v BR2: correlation of background corrected, normalized array intensities for biological replicates 1 and 2. Illumina BR1 v BR2: correlation of log2 read counts (reads +1) from the Illumina biological replicates 1 and 2 mapped to TAIR10 cDNAs. Array v Illumina: correlation of log2 read counts (reads +1) mapped a...
Data
For well correlated genes, all estimates of gene expression show excellent agreement (see S11 Fig). Fold difference in expression relative to AtActin (AT3G18780.1) was determined for candidates indicated. Those within the linear portion of the Array vs. RNA-Seq correlation with each method as appropriate (S11 Fig). Well correlated qRT-PCR candidates ar...
Data
A threshold alignment score of 125 is sufficient to exclude erroneous hits and classify plant genes. The increase of non-assignment from alignment scores of 125 to 175 is minimal, yet the instance of hits to plant genes is also decreased from alignment scores of 125 to 175. Depending on the desired outcome, alignment scores >125 can be used with co...
Data
The quality of assembled sequences as a function of sequencing depth for all Illumina assemblies. The units for “Assembly Quality” are Normalized Bit Score (BS, maximum of 2) and the units of “Sequence Depth” are Sequenced Fragments/bp (SFB). The number printed in the plot area is the number of assembled sequences with normalized Bit Score above 1....
Data
Full-text available
NCBI BLAST result (database: nr, BLASTp) of (A) P. aegyptiaca albumin1-1 (unigene 12653) and (B) P. aegyptiaca albumin1-2 (unigene 75797).
Data
Phylogeny of major lineages of plants, adapted from Soltis et al. [42]. Legumes belong to the rosid order Fabales (blue box), while the parasites Phelipanche and Cuscuta represent derived lineages within the asterid orders Lamiales (red box) and Solanales (green box), respectively.
Data
Full-text available
(A) Amino acid alignment of insect toxin albumin 1 protein (Medicago_truncatula_albumin1_Q7XZC5) and inferred protein sequences for the two homologs in P. aegyptiaca, and (B) structure of the M. truncatula toxic albumin 1 gene. (A) Inferred protein sequence alignments are 57.3–58.3% identical and 72.7–74.3% similar (= identity + conservative substitut...
Data
Developmental stages used for transcriptome sequencing in P. aegyptiaca [54] with characteristics of each stage and the expectation of host plant tissue contamination in library preparations.
Data
Alignments of the 3’ end of genomic and inferred CDS sequences of albumin 1 homologs from five Phelipanche species. Two genes are identified from P. aegyptiaca unigene 12653 (first five sequences, red bar) and unigene 75797 (yellow bar). Red box indicates putative stop codon.
Data
Full-text available
Maximum likelihood (ML) phylogeny of KNOTTIN homologs in broomrape species, Cuscuta pentagona and papilionoid legumes. ML and Bayesian Inference (BI) methods produced the same tree topology. Three Cuscuta pentagona sequences were obtained from the 1KP project and from additional independently prepared libraries. Other information as given (Figure 2...
Data
Evolutionary constraints in albumin 1 genes in Phelipanche and related legumes.
Data
HGT candidates BLAST database. Information that cannot be retrieved is marked as Not Applicable (NA). M: million; GB: Gigabase.
Data
Partial genomic DNA and cDNA alignments of M. truncatula albumin 1 (Medtr8g025950), P. aegyptiaca albumin1-1 (12653) and P. aegyptiaca albumin1-2 (75797). Intron start and end positions are illustrated by arrows.
Data
Expression values for albumin 1 genes in P. aegyptiaca at different developmental stages. Expression levels were measured by number of mapped Reads to this gene Per Kilobase of sequence length per Million (M) library reads (RPKM) in Illumina sequence (G) libraries (PPGP). Developmental stages described in Table S2.

Citations

... Single nucleotide polymorphism (SNP) data has also been gathered and used for 200 cacao genotypes, providing representation for ten genetic groups and a resource for population genetics research [18]. Most recently, 31 genome assemblies from wild cacao accessions across the four genetic groups were published, further increasing the wealth of cacao genomic resources [19,20]. An additional set of seven high quality de novo assembled genomes were recently developed and will be released to the public as well [19,20]. ...
... (The dataset was downloaded from the PedcBioPortal, https://pedcbioportal.kidsfirstdrc.org/study/summary?id=openpbta,pbta_all (accessed on 16 May 2023) and compiled using the Open Pediatric Brain Tumor Atlas (OpenPBTA) and Pediatric Brain Tumor Atlas (PBTA, provisional) consortiums [40] (the keywords for the search were "brainstem glioma, diffuse intrinsic pontine glioma, diffuse midline glioma grade 4, diffuse midline glioma H2K27M WHO grade 4, diffuse midline glioma WHO grade 4 H3K27M mutant, DMG H3 K27M mutant WHO grade 4, diffuse midline high-grade glioma, diffuse hemispheric glioma H3 G34 mutant, WHO grade 4, and infiltrating DIPG")). The PedcBioPortal enables the acquisition of CSV-formatted files for the compiled clinical metadata and expression values of the filtered patient subsets for further analyses [41][42][43][44]. ...
... The top 45 genes identified as being predictive of texture loss from the RF regression model were selected and classified into orthogroups pre-computed with the 26Gv2.0 scaffold using PlantTribes2 [51]. Orthogroup multiple sequence alignment, phylogenetic tree estimation, homology inference, and gene model evaluation were performed using genes from 16 Rosaceae genomes (the same 15 from [52] plus Malus baccata [53]) plus the scaffolding species following methods from [52]. ...
... non-nodulating species in angiosperms, for example, is explained by the convergent loss or loss-of-function of NIN genes (Griesmann et al., 2018). Within mycoheterotrophic plants, progressive decreases in gene expression precede highly convergent patterns of gene loss (Timilsena et al., 2023). The evolution of highly derived and reduced body plans is a feature of many aquatic plants. ...
... All phylogenomic analyses in this study showed strong support for two major clades in Asparagaceae, each consisting of at least three subfamilies (Aphyllanthoideae not sampled here): (i) an Agavoideae + Brodiaeoideae + Scilloideae clade and (ii) an Asparagoideae + Lomandroideae + Nolinoideae clade (Figure 3; Figures S8, S9 in Appendix S8). These results support previous findings based on plastome sequences (Steele et al., 2012; Chen et al., 2013; Lu et al., 2022; Ji et al., 2023; Bentz et al., 2024) and hundreds of nuclear genes (One Thousand Plant Transcriptomes Initiative, 2019; Timilsena et al., 2022). However, in those previous plastome trees, Lomandroideae is placed sister to a Nolinoideae + Asparagoideae clade with strong support, which agrees with our plastome phylogeny (Figure 3B) but disagrees with our ASTRAL trees inferred from 1726 nuclear genes (Figure 3A). ...
... We summarize quality parameters of currently available genomes in Table S13. Since 2021, three more genomes have been released, two P. pyrifolia assemblies for 'Nijisseiki' and 'Cuifuan', and P. communis 'Beurré d'Anjou' (Gao et al. 2021; Shirasawa et al. 2021; Zhang et al. 2022). Further, the P. communis cv. ...
... Polyploidy and asexuality. Modern genomics has revealed episodes of ancient whole-genome duplication that preceded key innovations in several eukaryotic lineages, especially in flowering plants, all of which share a polyploid common ancestor [13,28,29]. In recent polyploids, multiple gene copies allow for higher physiological and phenotypic flexibility in response to environmental conditions [30,31]. ...
... Some studies indicated that hosts have evolved the ability to detect parasitic plant-specific signals to initiate signal transduction cascades that lead to an HR and prevent the haustorium penetration process of parasitic plants [48]. For instance, a cowpea cultivar resistant to S. gesnerioides was found to trigger a downstream signalling cascade to activate the avirulence (Avr) proteins, which is a positive regulator of the HR [49]. A similar case was also reported for the interaction between sunflowers and O. cumana. ...
... Taxon sampling percentage values were calculated using recent checklists and classification literature (Söderström et al., 2016; Brinda and Atwood, 2023). The trees are rooted with the hornworts, assumed to be the sister group of the setaphytes (Renzaglia et al., 2018), i.e., liverworts and mosses (Leebens-Mack et al., 2019). ...