Digital Gene Expression Profiling of the Phytophthora sojae Transcriptome

1530 / Molecular Plant-Microbe Interactions
MPMI Vol. 24, No. 12, 2011, pp. 1530–1539. doi:10.1094/MPMI-05-11-0106. © 2011 The American Phytopathological Society
Wenwu Ye,1 Xiaoli Wang,1 Kai Tao,1 Yuping Lu,1 Tingting Dai,1 Suomeng Dong,1 Daolong Dou,1
Mark Gijzen,2 and Yuanchao Wang1
1Department of Plant Pathology, Nanjing Agricultural University, Nanjing 210095, China; 2Agriculture and Agri-Food Canada,
1391 Sandford Street, London, Ontario N5V 4T3, Canada
Submitted 13 May 2011. Accepted 11 August 2011.
The transcriptome of the oomycete plant pathogen Phytoph-
thora sojae was profiled at ten different developmental and
infection stages based on a 3′-tag digital gene-expression
protocol. More than 90 million clean sequence tags were
generated and compared with the P. sojae genome and its
19,027 predicted genes. A total of 14,969 genes were de-
tected, of which 10,044 were deemed reliable because they
mapped to unambiguous tags. A comparison of the whole-
library genes’ expression patterns suggested four groups: i)
mycelia and zoosporangia, ii) zoospores and cysts, iii) ger-
minating cysts, and iv) five infection site libraries (IF1.5 to
IF24h). The libraries from the different groups showed
major transitional shifts in gene expression. From the ten
libraries, 722 gene expression–pattern clusters were ob-
tained and the top 16 clusters, containing more than half of
the genes, comprised enriched genes with different functions
including protein localization, triphosphate metabolism,
signaling process, and noncoding RNA metabolism. An
evaluation of the average expression level of 30 pathogene-
sis-related gene families revealed that most were infection
induced but with diverse expression patterns and levels. A
web-based server named the Phytophthora Transcriptional
Database has been established.
Phytophthora sojae is an oomycete plant pathogen that
causes stem and root rot of soybean. The oomycetes are fun-
gus-like organisms that are evolutionarily related to algae and
are classified in the kingdom Stramenopila (Baldauf et al.
2000; Forster et al. 1990; Harper et al. 2005). The economic
impact of P. sojae is large, as it is responsible for $1 to 2 billion
in crop losses per year worldwide (Tyler 2007). Thus, this
organism has been the focus of molecular genetic and genomic
studies and is a model species for the study of oomycete plant
pathogens (Tyler 2007), along with the potato and tomato
pathogen Phytophthora infestans and the Arabidopsis thaliana
pathogen Hyaloperonospora arabidopsidis (Coates and
Beynon 2010).
P. sojae has a narrow host range and is restricted primarily
to soybean (Erwin and Ribeiro 1996). It is a homothallic or-
ganism that propagates clonally and through rare sexual out-
crossing (Tyler 2007). Asexual single-celled zoospores are bi-
flagellate, motile, and chemotactic to isoflavonoids secreted by
soybean roots (Morris et al. 1998; Hua et al. 2008; Tyler
2002). Zoospores encyst and germinate on the root or hypo-
cotyl surface, and the resulting germ tube may swell to form
an appressorium-like structure at the point of penetration into
host tissues (Moy et al. 2004; Tyler 2007). Soybean cultivars
that carry an effective resistance (Rps) gene to an attacking P.
sojae strain react rapidly with a hypersensitive response (HR),
which is activated within hours of zoospore attachment and
arrests further pathogen growth. This is characteristic of a re-
sistant or incompatible interaction. In contrast, no early HR
occurs in a susceptible or compatible interaction, and P. sojae,
which is hemibiotrophic, is able to colonize host cells in an
initial biotrophic phase of growth that lasts for approximately
12 h (Moy et al. 2004). At later stages of infection, the patho-
gen enters a necrotrophic growth mode, spreading quickly
throughout host tissues, causing large, water-soaked, necrotic
lesions and leaving dead host cells in its wake (Supplementary
Fig. S1).
Defining the P. sojae transcriptome is an important molecular
strategy to study gene function and to dissect the molecular
events that accompany pathogenesis. Gene transcripts can be
profiled by high throughput techniques such as serial analysis
of gene expression (Velculescu et al. 1995), microarray
(Lockhart et al. 1996; Schena et al. 1995), and sequencing of
clones from cDNA libraries (Adams et al. 1995; Asmann et al.
2009; Boguski et al. 1994). For the last decade, expressed se-
quence tag (EST) analysis and oligo-nucleotide microarrays
have been relied upon for transcriptional profiling of P. sojae
and its interaction with the soybean host plant. At least 31,314
P. sojae EST have been generated from a variety of tissues and
conditions, including free-swimming zoospores, germinating
cysts, mycelium, and P. sojae-infected soybean tissue (Qutob
et al. 2000; Torto-Alalibo et al. 2007). An amplified-cDNA microarray
containing 3,927 soybean and 969 P. sojae sequences
was constructed to examine plant and pathogen gene expres-
sion over a timecourse of infection (Moy et al. 2004; Tyler
2007). More recently, a commercial array containing 37,637
soybean and 15,820 P. sojae targets was used to profile transcription
during infection (Dong et al. 2009; Qutob et al. 2009;
Zhou et al. 2009).
Corresponding author: Yuanchao Wang; Telephone: +1 86-25-84399071;
Fax: +1 86-25-84395325; E-mail: wangyc@njau.edu.cn
*The e-Xtra logo stands for “electronic extra” and indicates that six supplementary
figures and eight supplementary tables are published online.
The published 95-Mbp P. sojae genome and its 19,027 predicted
gene models (Tyler et al. 2006) provide important reference
points for more comprehensive transcriptional studies
that can be done using next-generation sequencing (NGS)
technology (Mardis 2008). The NGS technologies offer an
opportunity to exhaustively sample transcripts and digitally
measure transcription levels in particular organs, tissues, or
cells, under different treatments or conditions (Asmann et al.
2009). For example, the 3′-tag digital gene expression (DGE)
protocol generates such extensive sequence data and depth-of-coverage
that even the rare transcripts can be detected and
quantified. This method uses oligo-dT to generate libraries
that are enriched in the 3′ untranslated regions of polyadenylated
mRNAs and produces 20- to 21-bp cDNA tags
(Eveland et al. 2010; Morrissy et al. 2009; ’t Hoen et al.
2008). The expression level of virtually all genes in the sam-
ple is measured by counting the number of the tags produced
from each gene (Xiang et al. 2010). Previously, the DGE pro-
filing of transcriptomes for organisms with completed ge-
nomes confirmed that the relatively short reads produced can
be effectively assembled and used for comparison of gene
expression profiles (Hegedus et al. 2009; Rosenkranz et al.
2008; Wang et al. 2010).
In this study, we report the massively parallel sequencing of the P.
sojae transcriptome by the DGE protocol. A total of ten RNA
samples from various life cycle stages of development and in-
fection were sequenced, and the results were analyzed. A web
server named the Phytophthora transcriptional database was
developed to provide access to this transcriptome data.
RESULTS
Ten stages of development and infection were sampled
for library sequencing.
To capture a variety of developmental and infection stages,
a total of ten samples were collected from P. sojae P6497.
The five axenically grown stages were mycelia (MY), zoospo-
rangia (SP), zoospores (ZO), cysts (CY), and germinating
cysts (GC) (Fig. 1). The five infection stages, 1.5, 3, 6, 12,
and 24 h after inoculation onto susceptible soybean leaf tis-
sues (IF1.5h to IF24h), are illustrated in Figure 1, and the
description of each infection stage is provided in Supplementary
Table S1. On the basis of the Illumina 3′-tag DGE protocol,
we generated between 3.9 and 14.8 million raw tags for
each of the ten samples. After removing low-quality reads,
the total number of clean tags per library ranged from 3.7 to
14.0 million and the number of tag entities with unique nu-
cleotide sequences (distinct tags) ranged from 84,181 to
666,004 (Table 1). From all libraries, 1,585,220 distinct tags
were obtained, with a total of 90.1 million clean tags (Sup-
plementary Fig. S3A).
Large proportions of P. sojae genes were detected
in the ten libraries.
The P. sojae reference genome assembly and the 19,027 predicted
genes were used as a reference sequence database (Joint
Genome Institute website, P. sojae v1.1) (Tyler et al. 2006).
The reference database contains 486,711 distinct tags. After
mapping the clean tags generated from the ten DGE libraries
to the reference database, at least 68.4% of the clean tags were
mapped to the reference database in seven of the libraries (MY,
SP, ZO, CY, GC, IF1.5h, and IF3h). The mapped percentages
were lower (between 35.6 and 40.2%) in the remaining three
libraries (IF6h, IF12h, and IF24h) because the samples also
contained soybean tissue. Next, the gene-expression levels
were determined by calculating the number of tags for each
gene and, then, normalizing this value to the number of tran-
scripts per million tags (TPM) (Asmann et al. 2009; Wang et
al. 2010). A large portion of the 19,027 predicted P. sojae
genes was detected ranging from 10,532 (55.4%) to 13,805
(72.6%) among the ten libraries (Table 1; Fig. 2A). When considering
all libraries, 14,969 (78.7%) P. sojae genes were detected
in at least one library (Fig. 2A).
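The normalization described above can be illustrated with a minimal sketch. It assumes that TPM is obtained by dividing a gene's tag count by the total number of clean tags in the library and multiplying by one million; the exact denominator is not stated in the text. The gene identifiers and counts are hypothetical, and only the library size (MY, Table 1) is taken from this study.

```python
def tags_per_million(gene_tag_counts, total_clean_tags):
    """Convert raw per-gene tag counts to transcripts per million tags (TPM).

    Assumption: the denominator is the total number of clean tags in the library.
    """
    if total_clean_tags <= 0:
        raise ValueError("total_clean_tags must be positive")
    scale = 1_000_000 / total_clean_tags
    return {gene: count * scale for gene, count in gene_tag_counts.items()}


# Hypothetical gene IDs and tag counts; the library size is the MY value in Table 1.
my_counts = {"gene_A": 741, "gene_B": 15, "gene_C": 0}
my_tpm = tags_per_million(my_counts, total_clean_tags=7_406_626)

# A gene counts as detected in a library when at least one tag maps to it.
detected = [gene for gene, value in my_tpm.items() if value > 0]
```

Expressing counts per million tags makes libraries of different depth, such as SP (3.7 million clean tags) and IF1.5h (14.0 million), directly comparable.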
Detected genes are assigned reliability categories based
on a multiple mapped tag (MMT) value.
In this study, clean tags that mapped to multiple loci of the
genome (multiple mapped tags, MMT) were used to calculate a reliability
measure, called the MMT value, based on the proportion of TPM derived from
MMT. Because the data from MMT cannot be assigned to individual
mapped genome loci, they only represent the collective
expression levels of all genome loci having the identical tag
sequence. Another type of tag, unique mapped tags (UMT),
unambiguously mapped to single genome loci, indicating that
the transcription data are reliable. According to this, from the
ten libraries, 10,044 (52.8%) were detected by perfect matches
to UMT and were marked as reliable (MMT value = 0%;
unless otherwise stated, this set of genes was used for all fur-
ther analyses in this research), 4,915 (25.8%) were detected
but the expressed tags potentially matched MMT and were,
therefore, marked as unreliable (MMT value > 0%) and 3,134
(16.5%) were not detected (Fig. 2A; Supplementary Table S2).
The remaining 934 genes (4.9%) lack a tag site in their sequences
and so cannot be mapped by DGE-generated tags, rendering them
undetectable by our sampling method. The full lists of the
18,093 genes having at least one tag are provided in Supplementary
Tables S4 and S5.
Fig. 1. Schematic illustration and microscopic observation of the ten sampled stages: mycelia (MY), zoosporangia (SP), zoospores (ZO), cysts (CY),
germinating cysts (GC), and IF1.5h to IF24h (samples from 1.5, 3, 6, 12, and 24 h after infection of soybean leaves). The sandwiched inoculation method is
shown in the center of the figure. The scale bars in CY, ZO, and GC are 20 µm; the others are 100 µm.
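The four reliability categories described above can be sketched as follows. This is an illustration rather than the pipeline used in this study: it assumes that each gene carries one TPM total from UMT and one from MMT, and the function and argument names are hypothetical.

```python
def mmt_value(umt_tpm, mmt_tpm):
    """Percentage of a gene's TPM that is derived from multiple mapped tags.

    Returns None when the gene has no mapped tags at all.
    """
    total = umt_tpm + mmt_tpm
    if total == 0:
        return None
    return 100.0 * mmt_tpm / total


def classify_gene(has_tag_site, umt_tpm, mmt_tpm):
    """Assign one of the four categories used in the text."""
    if not has_tag_site:
        return "no tag"                # gene cannot be sampled by the protocol
    value = mmt_value(umt_tpm, mmt_tpm)
    if value is None:
        return "undetected"            # tag site present, but no tags observed
    if value == 0:
        return "detected-reliable"     # all signal comes from unique mapped tags
    return "detected-unreliable"       # some signal is shared with other loci


# Hypothetical genes: (has_tag_site, UMT-derived TPM, MMT-derived TPM)
examples = {
    "gene_A": (True, 12.5, 0.0),
    "gene_B": (True, 3.0, 1.0),
    "gene_C": (True, 0.0, 0.0),
    "gene_D": (False, 0.0, 0.0),
}
categories = {gene: classify_gene(*args) for gene, args in examples.items()}
```

Keeping the MMT value as a percentage, rather than discarding MMT outright, preserves the option of using genes with low MMT values in later analyses.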
Comparison of DGE values
with quantitative reverse transcription-
polymerase chain reaction (qRT-PCR) analysis results.
Total sequence tags from all libraries that match to the
10,044 genes classified as detected and reliable were plotted as
integrated log2 values. As shown in the graph in Figure 2B, the
highest number of genes (1,737) distributes around 7 (≥7 and
<8; a value of 7 corresponds to 128 − 1 TPM) and 9.0% of the genes (899 of
10,044) are highly expressed at more than 10. To determine
whether gene expression levels estimated by TPM counts were
comparable to qRT-PCR results, 28 genes from different fami-
lies and with a range of expression values were selected for
further study. These genes and primers are listed in Supple-
mentary Table S3. Expression levels of the 28 genes were de-
termined by qRT-PCR from the ten different RNA samples
used for the DGE analysis, resulting in 280 datapoints (Fig.
2C). The Pearson correlation coefficient (R value) between the
cycle threshold (Ct) value of the qRT-PCR analysis and the
log2 TPM values of the DGE analysis was –0.75, meaning that
the genes’ expression levels from DGE analysis are positively
correlated with qRT-PCR (lower Ct value refers to higher ex-
pression level). This correlation between the two different plat-
forms (|R| = 0.75) is higher than many correlations of whole-
library gene-expression patterns between two DGE libraries
(but not replication) in our study but also lower than many oth-
ers, such as IF3h to IF6h (R = 0.97). A Pearson correlation co-
efficient matrix of all library pairs was generated and is pro-
vided in Supplementary Figure S4, with R values ranging from
0.48 (ZO-IF3h) to 0.97 (IF3h to IF6h).
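The comparison between platforms can be sketched as a Pearson correlation between Ct values and log2-transformed TPM values. The sketch assumes a pseudocount of 1 before the log2 transformation, which is consistent with a log2 value of 7 corresponding to 128 − 1 TPM but is not stated explicitly; the TPM and Ct values below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical datapoints: DGE expression (TPM) and qRT-PCR cycle thresholds.
tpm = [1.0, 15.0, 127.0, 1023.0]
log2_tpm = [math.log2(value + 1) for value in tpm]  # assumed pseudocount of 1
ct = [30.0, 27.5, 24.0, 21.0]

# A lower Ct means higher expression, so agreement shows up as a negative R.
r = pearson_r(ct, log2_tpm)
```

The sign convention matters when reading the result: a strongly negative R between Ct and log2 TPM indicates that the two platforms agree.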
Correlation among the ten libraries establishes
four different expression groups.
To study the relatedness of overall gene-expression patterns
among the ten libraries, a hierarchical clustering (HCL) tree
using the Pearson correlation method with average linkage was
constructed, using the DGE data from the 10,044 detected reli-
able genes (Fig. 3A). This shows that the DGE profiling of the
ZO library is closest to CY, GC is alone in a branch, MY is
close to SP, and the five infection site libraries (IF1.5h to
IF24h) exist in the same branch. Principal component analysis
(PCA) is a statistical method to reduce the dimensionality of
the dataset and allows a visual inspection of the samples based
on gene-expression profiles. Samples with a similar gene-expression
profile would cluster in the same direction (Elferink
et al. 2011). The PCA plot of principal component (PC) 1 and
PC 2 shown in Figure 3B reveals a situation consistent with
the HCL tree. In the PCA plot, the libraries in the four major
branches of the tree (ZO and CY, GC, MY and SP, and IF1.5h
to IF24h) locate in distinguishably different regions. The first
two PC together account for 74.2% of the total variance (PC 1, 56.7%
and PC 2, 17.5%), so the two-dimensional plot retains most of
the information in the dataset.
Fig. 2. Characteristics and validation of detected genes. A, The distribution of genes within the different detectable categories, based on the sequence tag
analysis as described in the text. B, The distribution of gene expression levels, based on the number of genes falling in each log2 gene expression category.
Data are from all ten samples. C, Validation of digital gene expression (DGE) data by quantitative reverse transcription-polymerase chain reaction (qRT-PCR).
Scatter plots indicate the cycle threshold (Ct) value of qRT-PCR analysis and the log2 transcripts per million tags value of DGE for 280 datapoints
from 28 genes in ten samples. The Pearson correlation coefficient (R) is also shown.
Table 1. Summary of the output data and mapping work
Tag or gene name(a)
  Category(b)            MY(c)          SP          ZO          CY          GC      IF1.5h        IF3h        IF6h       IF12h       IF24h
Raw tags
  Total              7,618,146   3,889,220   9,115,349   8,722,222  10,942,428  14,794,423  10,406,821  11,338,625  10,353,209  10,660,949
  Distinct             339,251     235,351     305,957     341,113   1,400,829     980,071   1,439,380   2,052,365   1,917,624   2,011,485
Clean tags
  Total              7,406,626   3,737,336   8,909,709   8,491,318   9,936,835  14,033,907   9,387,619   9,930,713   9,010,412   9,247,642
  % of raw tags          97.2%       96.1%       97.7%       97.4%       90.8%       94.9%       90.2%       87.6%       87.0%       86.7%
  Distinct             128,354      84,181     102,771     112,284     432,812     286,150     451,475     666,004     596,675     616,298
  % of raw tags          37.8%       35.8%       33.6%       32.9%       30.9%       29.2%       31.4%       32.5%       31.1%       30.6%
Clean tags mapping to genome or gene
  Total              6,763,028   3,397,523   8,221,308   7,861,599   6,933,838  12,980,742   6,421,067   3,987,978   3,514,638   3,287,787
  % of clean tags        91.3%       90.9%       92.3%       92.6%       69.8%       92.5%       68.4%       40.2%       39.0%       35.6%
  Distinct              95,904      65,520      74,605      79,620     186,826     212,225     191,088     155,151     145,068     141,085
  % of clean tags        74.7%       77.8%       72.6%       70.9%       43.2%       74.2%       42.3%       23.3%       24.3%       22.9%
All tag-mapped genes
  Genes                 12,323      11,978      10,532      11,450      13,179      13,805      12,743      12,519      12,394      12,196
  % of 19,027            64.8%       63.0%       55.4%       60.2%       69.3%       72.6%       67.0%       65.8%       65.1%       64.1%
(a) Raw tags, sequence data prior to trimming and processing; clean tags, trimmed and processed 21-bp sequences.
(b) Distinct tags are classified according to their sequence. The Phytophthora sojae reference genome (Joint Genome Institute P. sojae v1.1) describes 19,027
predicted genes, including 18,093 with at least one tag.
(c) MY = mycelia, SP = zoosporangia, ZO = zoospores, CY = cysts, GC = germinated cysts, and IF1.5h to IF24h = samples from 1.5, 3, 6, 12, and 24 h
after infection of soybean leaves.
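The grouping of libraries by hierarchical clustering with Pearson correlation and average linkage can be sketched as below. This is not the MultiExperiment Viewer implementation; it assumes a distance of 1 − R between libraries, and the four short expression profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges as (cluster, cluster, distance)."""
    names = list(profiles)
    # Assumed distance between two libraries: 1 - Pearson R of their profiles.
    dist = {
        frozenset((a, b)): 1.0 - pearson_r(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log2 expression profiles over five genes for four libraries.
profiles = {
    "MY": [8, 6, 1, 1, 3],
    "SP": [7, 6, 2, 1, 3],
    "ZO": [1, 2, 8, 7, 3],
    "CY": [1, 1, 7, 8, 3],
}
merges = average_linkage(profiles)
```

With these toy profiles, MY joins SP and ZO joins CY before the two pairs are merged, which mirrors the branch structure described for the real data without reproducing it.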
Differentially expressed gene analysis
reveals transcriptional shifts between libraries.
To study the differentially expressed genes between each library
pair, we performed filtering to identify twofold upregulated
and twofold downregulated genes with a P value cutoff of 0.01,
employing the chi-square test and Bonferroni correction. The results,
shown in Figure 4A, indicate that the greatest changes in
gene expression occur during cyst germination and host infec-
tion, when thousands of genes were detected as upregulated
(GC and IF1.5h to IF24h compared with MY, SP, ZO, and
CY). However, by contrast to the five infection libraries, GC
downregulated more genes compared with MY and SP and
upregulated fewer genes compared with ZO and CY. This also
confirmed that the GC was a distinct group. For ZO and CY,
many more genes were downregulated when compared with
the other libraries. In contrast, comparison among the infection
libraries (IF1.5h to IF24h) indicates that gene-expression patterns
changed steadily but not dramatically during the course
of infection. The full list of differentially expressed genes be-
tween each library pair is provided in Supplementary Table S6.
Three stage pairs (MY-ZO and ZO-IF1.5h, with the largest
number of genes down- or upregulated, and IF3h-IF6h, with
least genes changed) were selected to represent the distribution
of genes corresponding to different fold change categories
(Fig. 4B).
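The filtering step for one library pair can be sketched as follows. Several details are assumptions rather than the procedure used in this study: the chi-square test is applied to a 2 × 2 table of tags from the gene versus all other clean tags in each library, the Bonferroni factor is the number of genes tested, and the cutoff is treated as inclusive. The gene counts are hypothetical; the library sizes are the MY and ZO clean-tag totals from Table 1.

```python
import math


def chi2_p_value(count_a, total_a, count_b, total_b):
    """P value of a 2 x 2 chi-square test (1 degree of freedom, no continuity correction)."""
    a, b = count_a, total_a - count_a
    c, d = count_b, total_b - count_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def differential_genes(counts_a, counts_b, total_a, total_b, alpha=0.01, min_fold=2.0):
    """Label genes 'up' or 'down' in library B relative to library A."""
    genes = sorted(set(counts_a) | set(counts_b))
    n_tests = len(genes)  # assumed Bonferroni factor
    calls = {}
    for gene in genes:
        x, y = counts_a.get(gene, 0), counts_b.get(gene, 0)
        tpm_a, tpm_b = x * 1e6 / total_a, y * 1e6 / total_b
        p_adjusted = min(1.0, chi2_p_value(x, total_a, y, total_b) * n_tests)
        if p_adjusted > alpha:
            continue
        if tpm_b > 0 and tpm_b >= min_fold * tpm_a:
            calls[gene] = "up"
        elif tpm_a > 0 and tpm_a >= min_fold * tpm_b:
            calls[gene] = "down"
    return calls


# Hypothetical tag counts for three genes in the MY and ZO libraries.
my_counts = {"gene_A": 100, "gene_B": 500, "gene_C": 200}
zo_counts = {"gene_A": 1000, "gene_B": 40, "gene_C": 240}
calls = differential_genes(my_counts, zo_counts, total_a=7_406_626, total_b=8_909_709)
```

Combining a fold-change filter with a corrected P value keeps genes whose change is both large and unlikely to be a sampling effect.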
Clustering analysis shows
different gene-expression patterns.
The ten libraries provide a wide range of stages to understand
gene expression during the P. sojae life cycle. To elucidate
detailed gene-expression patterns, the clustering affinity
search technique (CAST) was used to generate clusters (Saeed
et al. 2003). Figure 5A shows a breakdown of 722 clusters
generated with members (genes) ranging from 1,675 to 1.
Most of the clusters (662 clusters) had no more than 20 genes.
However, the top 16 clusters, each of which had more than 90
members, contained >50% (51.4%) of all detected genes, illus-
trating the major gene-expression patterns (Fig. 5B). Among
the top 16 clusters, different sets of genes were upregulated at
varied stages: clusters a, b, and i, during infection; k and g,
early infection; o, middle infection; n and p, late infection; e,
during development; d, both MY and SP; c and m, both ZO
and CY; j, ZO, CY and GC; and f, h, and l were stage-specific
at GC, SP, and IF1.5h, respectively. The clusters with similar
patterns mentioned above also showed differences in detailed
stages. For example, the infection-related genes in cluster b
were up-regulated in stages from GC to IF24h but, in cluster i,
were from the later stages (IF1.5h to IF24h).
Fig. 4. Differentially expressed genes between each pair of libraries. A, The number of upregulated and downregulated genes between each pair of libraries.
Differentially expressed genes are identified by filtering of the twofold up- and downregulated genes with a P value cutoff of 0.01, employing both the
chi-square test and Bonferroni correction. B, The distribution of log2 fold change levels for three selected stage pairs (MY-ZO and ZO-IF1.5h, with the
largest number of genes down- and upregulated; IF3h-IF6h, the most stable stage pair).
Fig. 3. Correlation of the whole-library gene-expression patterns. A, Hierarchical clustering tree. The node height scale is shown below. B, Principal
component analysis plot of principal components 1 and 2, whose eigenvalues are 56.7 and 17.5%, respectively. The analyses for A and B were performed with
the MultiExperiment Viewer (version 4.6) software, using the expression data of the 10,044 genes.
Fig. 5. Clustering and gene ontology (GO) enrichment analysis of gene expression patterns. A, Heat map shows the 722 gene-expression clusters generated
by the clustering affinity search technique method. Each line refers to data of one gene. The order is from the cluster with the most members (1,675 genes) to
that with the fewest members (a single gene). The color bar represents the log2 of transcripts per million tags values, ranging from dark blue (0) to red (8.0). B,
Log2 average gene-expression levels of the top 16 clusters. The number of cluster members is marked at the bottom right of each plot. C, GO enrichment
analysis of genes from the top 16 clusters. The mapped GO terms referring to biological processes were compared with the whole genome (GO terms for all
of the 19,027 genes) background and were filtered with a P value cutoff of 0.5 by chi-square test and false discovery rate correction. The color bar represents
the fold increase of the GO term proportion over the genome background, ranging from dark blue (0) to red (8.5). The gray blocks mean no data or data that
were filtered by the above criteria. Cluster names are marked on the first block of each column.
Functional annotation of genes
in clusters with similar expression patterns.
To study the major gene functions of different expression
patterns, we performed gene ontology (GO) enrichment analy-
sis for genes from the top 16 clusters (Fig. 5B). The mapped
GO terms referring to biological process were compared with
the whole-genome background (GO terms for all of the 19,027
predicted P. sojae genes) and were filtered with a P value cutoff of 0.5
by chi-square test and false discovery rate (FDR) correction. Figure
5C shows that the number of enriched GO terms per cluster
ranged from 0 (clusters h, i, and o) to 26 (cluster d). Different
gene functions were overrepresented in certain clusters. For
example, the genes in cluster a were rich in function of protein
localization (transport). Cluster d also had this characteristic,
but additionally, it had genes related to processes such as the
triphosphate metabolic process and the signaling process. Genes
in clusters b and p mostly referred to regulation of metabolic
process and transcription. A GO term, noncoding RNA meta-
bolic process, was found in clusters b, g, j, and k, whose ex-
pression patterns were all infection related. Furthermore, for
clusters m and n, a single enriched GO term was matched to
response to stress and carbohydrate metabolic process, respec-
tively. The full list of enriched GO terms and corresponding
genes is provided in Supplementary Table S7.
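Two quantities used in the enrichment analysis, the fold increase of a GO term over the genome background and the FDR-adjusted P value, can be sketched as below. The Benjamini-Hochberg procedure is assumed here because the text does not name the FDR method, and the hit counts and P values are hypothetical; the cluster size (1,675) and genome size (19,027) are taken from the text.

```python
def fold_enrichment(cluster_hits, cluster_size, genome_hits, genome_size):
    """Proportion of genes with a GO term in a cluster relative to the genome background."""
    return (cluster_hits / cluster_size) / (genome_hits / genome_size)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical: 30 of the 1,675 genes in the largest cluster carry a GO term
# that annotates 100 of the 19,027 predicted genes.
fold = fold_enrichment(30, 1675, 100, 19027)

# Hypothetical raw P values for four GO terms tested in one cluster.
raw_p = [0.01, 0.04, 0.03, 0.5]
adjusted_p = benjamini_hochberg(raw_p)
```

Reporting the fold increase alongside the adjusted P value separates how strongly a term is overrepresented from how confident one can be that the overrepresentation is real.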
Average gene-expression patterns
of putative pathogenicity gene families.
Many gene families from P. sojae have been studied or suggested
to have important roles in pathogenesis. To further understand
the expression patterns, 30 different gene families or groups
were selected, with each group containing from three to 396
distinct genes (Torto-Alalibo et al. 2007) (Fig. 6). To deter-
mine an average expression pattern, the TPM values for each
stage were calculated group by group by pooling data from the
UMT and the nonredundant MMT. The MMT provide good
data for this purpose because this is an analysis of gene-family
expression, anticipating that many MMT were mapped to the
same group. The results illustrated diverse expression patterns
but most gene groups were up-regulated during the above-
mentioned major transcriptional shifts. For example, the PDR-
like ABC transporters, glutathione transferase, and glutare-
doxin were up-regulated at GC; the aspartyl proteinases were
highly expressed at GC and IF1.5h; the cutinases were up-
regulated from ZO and highly expressed in GC; the RxLR and
NLP effector families were highly induced at GC and with an-
other peak at late infection; another two effector families, CRN
and elicitin (or elicitin-like), although expressing similar two-
peak patterns, consistently showed higher average expression
levels than RxLR and NLP; however, elicitin was expressed
at a higher level at MY, ZO, and CY. The ubiquitin protease
genes were stably expressed at a high level.
Fig. 6. Average gene expression levels of the 30 putative pathogenicity gene families. The legend shown on the bottom of the figure identifies the gene families
plotted in each graph. The number of genes included in calculating average expression values is shown at the right side of each family name. The function
categories are marked at the top of each figure.
Community access
to the Phytophthora transcriptional database (PTD).
We established the PTD web server (v1.1) to allow the research
community easy access to our data (Supplementary Fig.
S5). Each gene has a detailed page describing its basic annotation
and the DGE transcriptional data. The transcriptional data
include expression TPM values together with MMT values and
sequences that correspond to the gene of interest. Graphical
outputs include histogram views that are intuitive for exploring
the data. The PTD provides access to the gene data via
searches by the gene ID, annotation, fold change between two
stages, and BLAST to search sequence-homologous genes. It
also provides links to the P. sojae databases and assemblies at
the Department of Energy Joint Genome Institute (Tyler et al.
2006) and the Virginia Bioinformatics Institute microbial data-
base (Tripathy et al. 2006) for quick access to extensive gene
or genome contextual information. To facilitate further analy-
sis of these data, the related tag data and analysis results are
also provided for downloading. A list of the current web re-
sources related to oomycete genomics research is also pro-
vided in PTD and Supplementary Table S8.
DISCUSSION
In this study, we used massively parallel sequencing technologies
coupled with computational DGE analysis to characterize
the transcriptome of P. sojae during development and infection.
Our results provide an extensive picture of transcription
in P. sojae and offer investigators a rich set of sequence
data for reference and interrogation.
To organize the voluminous data and to differentiate the
transcripts, we based our computational analysis on 21-bp
tags because this approach has been proven successful in
other species (Asmann et al. 2009; Saha et al. 2002). Never-
theless, the numerous gene paralogs sharing sequence tags
resulted in a high frequency of MMT. The detected genes
with MMT represented 25.9% of the total number of genes
detected in P. sojae. In some DGE studies MMT are filtered
during the mapping process (Asmann et al. 2009; Wang et al.
2010). Such an approach provides expression values for all
reliably tagged genes. However, the expression values for
genes with MMT may be underestimated or completely lack-
ing; these genes would correspond to genes detected-reliable
or undetected in our study (2,324/12.2% and 2,591/13.6%
genes fit these descriptions, respectively). To avoid this prob-
lem, the MMT data were counted for gene expression level,
and the proportion of value derived from MMT was used to
calculate an MMT-value as a reliability measure. This
method provides more complete information of gene expres-
sion level and allows for the classification of reference genes
into four categories: detected-reliable, detected-unreliable,
undetected, and no tag. Although we deemed genes with
MMT to be detected-unreliable and did not analyze these
genes in this study, their expression data and MMT-values
were also provided in PTD, which are still valuable for genes
with low MMT-values. Moreover, the MMT data remain use-
ful for determining collective expression patterns for the dif-
ferent genes sharing the same tags (Fig. 6).
Another limitation of DGE analysis lies in the annotation of
the reference genome. The 19,027 predicted P. sojae genes
represent an estimation based on gene modeling programs
(Tripathy et al. 2006; Tyler et al. 2006). Many gene models are
erroneous, and annotation may be completely lacking for cer-
tain genes. These would cause the bias or absence of gene ex-
pression data. However, the transcript tag data that we have de-
veloped may be used to improve gene models (Supplementary
Fig. S6). It is even possible that tags that do not map to the
genome or to a predicted gene represent the junction of two
exons in a misannotated gene. Genome annotation is an itera-
tive process that will inevitably improve with time. The se-
quence data generated in this study offer an opportunity to
improve the annotation of the P. sojae genome. Biologists have
long realized that problems with gene annotation exist for even
the best-characterized model species; thus the situation for P.
sojae is not unusual or extraordinary.
Besides the above mentioned limitations, there are still sev-
eral problems with DGE, including statistical modeling (par-
ticularly normalization) (Mak 2011); the depth of sequencing
required to effectively sample the transcriptome; the cost,
which may tempt some to avoid using biological replicates;
and the bioinformatics required to manage such a large amount
of data (Malone and Oliver 2011). However, every technology
has its advantages and inherent biases. For example, of the
established methods, microarrays remain useful and accurate
tools for measuring expression levels (Malone and Oliver
2011) with high throughput, but they have relatively low sensitivity
for the detection of rare transcripts and potentially can
miss many targets that may not be included on the array. EST
studies can obtain longer full-length transcripts; however, they
are incomplete in their coverage of transcripts and are expensive
to perform (Asmann et al. 2009). Although DGE transcriptome
profiling also has some drawbacks, the relatively short reads
it produces can be used effectively for comparison of
gene expression profiles (Hegedus et al. 2009; Rosenkranz et
al. 2008; Wang et al. 2010), and its extensive sequence data and
depth-of-coverage allow even rare transcripts to be
detected and quantified (Eveland et al. 2010; Morrissy et al.
2009; ’t Hoen et al. 2008). Thus, NGS-based transcription profiling
methods can complement and extend other technologies
(Malone and Oliver 2011), but it will take time to update and
improve the technology and protocol.
To evaluate the DGE data, 280 datapoints from two different platforms
(DGE and qRT-PCR) were compared, and the comparison indicated a
positive correlation between the data. Among the correlations between
DGE data from similar samples, the highest R value was 0.97
(IF3h-IF6h). In an overview of the transcriptional analysis of ten
stages of the P. sojae life cycle, four library groups were found
by comparing whole-library gene-expression patterns.
This was confirmed by the analysis of differentially expressed
genes, which showed major transitional shifts between the
libraries from the different groups. Besides the inherent technological
problems of DGE, other factors cannot be excluded when
interpreting the data, including the timing of the experiments,
the conditions, and the people who performed them. However, these
transcriptional shifts generally agreed with the biology: between
the libraries of the pathogen grown axenically
(MY, SP, ZO, CY, and GC) and those of the pathogen grown in
contact with the plant (IF1.5h to IF24h), the host-pathogen
interaction can reasonably be expected to modify the pathogen’s
expression profile. The distinction of ZO and CY from
the other libraries is probably due to their nonmycelial
state (e.g., they are either swimming in water or
attached to the host surface), and GC is a transitional state
between cysts and mycelium. In contrast to the seven libraries
collected from the pathogen only, the other three timepoint
samples (IF6h, IF12h, and IF24h) were collected from a mixture of
inoculated mycelium and host tissue, and we paid particular
attention to them. In these three libraries,
fewer clean tags were mapped to the reference sequences, al-
though there was no obvious difference in the number of de-
tected genes (Table 1; Fig. 2A) or in the distribution of gene
numbers assigned to different expression level (data not
shown). The whole-library expression patterns and the analysis
of differentially expressed genes for these three samples also
did not show distinction from the unmixed samples IF1.5h and
IF3h (Figs. 3A and B and 4A).
P. sojae is a widespread and destructive plant pathogen and
is one of the best-studied species among oomycete organisms.
Here, we have demonstrated that deep sequencing of the transcriptome
combined with computational tools such as DGE
can provide an unparalleled level of detail and coverage of
gene-expression patterns. Based on these DGE data and other
available resources for P. sojae, a number of questions remain
for further study. For example, what are the identities of the
genes, and how do their functions contribute to the transcriptional
shifts or specific expression patterns at certain stages? Is
there really a set of noncoding RNAs with important roles in
the interaction of P. sojae with its host? Are there novel gene
families playing important roles in the life of P. sojae that
could be found from the species-specific expansion of
gene copy numbers and from the expression patterns, e.g., the
cutinase family? Finally, pathogen effectors are one of the hot
spots in current research on pathogen-host interactions. Further
studies on the effector genes, including RxLR, CRN, NLP, and
even unknown genes, are under way. In brief, these data will
serve as a valuable public genomic resource and will help further
clarify the biology of Phytophthora plant pathogens.
MATERIALS AND METHODS
Preparation of biological material.
P. sojae P6497 (race 2) (Forster et al. 1994), from which the
reference genome was derived (Tyler et al. 2006), was used in
this research. MY were cultivated in 10% V8 liquid medium at
25°C in darkness for 48 h and were then blotted dry with ab-
sorbent paper and were preserved in liquid nitrogen for RNA
isolation. SP were induced by repeatedly washing 48- to 72-h-
old mycelial mats with sterile distilled water at 25°C in dark-
ness until sporangia formed abundantly. ZO were released by
placing the zoosporangial mycelial mat into 10 ml of sterile
distilled water at 5 to 10°C for 10 to 15 min, and then, at 25°C
for 10 to 30 min. The ZO were then concentrated by centrifu-
gation at 2,000 rpm at 0°C to a concentration of >150 ZO per
microliter and were then preserved in liquid nitrogen for RNA
isolation. The ZO were counted under a microscope based on
2 µl of concentrated ZO suspension sample with three repeats.
CY were produced by vortexing the ZO suspension at room
temperature for 30 s and were then collected by centrifugation
at 2,000 rpm at 0°C and were preserved in liquid nitrogen for
RNA isolation. GC were obtained by cultivating cysts with 5%
V8 liquid medium at 25°C, 150 rpm for 1 h and were then col-
lected by centrifugation at 2,000 rpm at 0°C. For mycelial
infection (IF1.5h to IF24h), the soybean cultivar Williams,
which is susceptible to P6497, was grown in a greenhouse at
22 to 28°C and was used at the second-leaf stage. Soybean
leaves were treated with 0.05% vol/vol solution of Tween 20 to
improve wetting. A mycelial mat was washed with sterile distilled
water and was then sandwiched (Fig. 1) between the upper
surfaces of two leaves at 25°C for 1.5, 3, 6, 12, or 24 h.
For the 1.5- and 3-h
timepoints, the mycelial mat was carefully peeled from the
leaves and preserved in liquid nitrogen. For the later timepoints,
the regions of the leaves in contact with the mycelia
were excised together with the mycelia and were preserved in
liquid nitrogen. Parallel samples were prepared simultaneously
and were used for microscopic analysis. In addition, for the
infection samples, the leaves were decolorized with absolute
ethanol before observation.
Library preparation and sequencing.
Tag library preparation for the ten samples was performed in
parallel, using the Illumina gene expression sample preparation
kit. For each sample, 6 µg of total RNA was extracted from the
above-mentioned material (Total RNA purification system; Invitrogen,
Carlsbad, CA, U.S.A.), and mRNA was purified by
oligo(dT) magnetic bead adsorption. mRNA bound to the
beads was then used as a template for first-strand cDNA synthesis
primed by oligo(dT), and the second-strand cDNA was
subsequently synthesized using random primers. The cDNA
was cleaved with NlaIII at CATG sites, and the cDNA
fragments with 3′ ends were purified with magnetic beads.
Illumina adapter A was then added to their 5′ ends, creating a
recognition site for MmeI at the junction. MmeI cleaves 17 bp
downstream of the CATG site, producing tag fragments that
include Illumina adapter A. After removing 3′ fragments by
magnetic bead precipitation, Illumina adapter B was introduced
at the 3′ ends of the tags, thus yielding tags with different
adapters at each end to form a tag library. After 15 cycles of
linear PCR amplification, 85-bp oligonucleotides were purified
by 6% Tris-borate-EDTA polyacrylamide gel electrophoresis.
These oligonucleotides were then digested, and the single-chain
molecules were fixed onto the flow cell (Illumina Sequencing
Chip) for sequencing. Raw reads were generated with a se-
quencing length of 35 bp (Supplementary Fig. S2).
Analysis and mapping of DGE tags.
Raw sequences contain 3′ adaptor fragments as well as a few
low-quality sequences and several types of impurities. Raw sequences
were transformed into clean 21-bp (CATG+17 bp)
tags by the following steps: i) the 3′ adaptor sequence was
trimmed, resulting in 21-bp tags from 35 bp of raw sequence,
ii) empty reads were removed (reads with only 3′ adaptor sequences
but no tags), iii) low-quality tags were removed (tags
with ambiguous base calls), iv) tags of unusual length were
removed, leaving only tags of 21 bp, and v) tags detected only once
were removed (each tag needs to be detected at least twice to
be considered reliable). These raw datasets are available at the
National Center for Biotechnology Information Gene Expression
Omnibus database with the accession number GSE29651.
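The five cleaning steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function name clean_tags, the adaptor argument (the actual adaptor sequence is not given here), and the choice to discard reads in which no adaptor is found are all hypothetical.

```python
from collections import Counter

TAG_LEN = 21  # CATG + 17 bp, as described in the text


def clean_tags(raw_reads, adaptor, min_count=2):
    """Return {tag: count} for tags that pass the five cleaning steps.

    `adaptor` is a placeholder for the 3' adaptor sequence; the real
    sequence is kit-specific and is an assumption of the caller.
    """
    counts = Counter()
    for read in raw_reads:
        read = read.upper()
        pos = read.find(adaptor)
        if pos == -1:
            # Assumption: a read with no recognizable adaptor is an impurity.
            continue
        tag = read[:pos]              # i) trim the 3' adaptor
        if not tag:                   # ii) empty read (adaptor only)
            continue
        if set(tag) - set("ACGT"):    # iii) ambiguous base calls
            continue
        if len(tag) != TAG_LEN:       # iv) unusual tag length
            continue
        counts[tag] += 1
    # v) keep only tags detected at least `min_count` times
    return {tag: n for tag, n in counts.items() if n >= min_count}
```

The minimum copy number is kept as a parameter so that the effect of the "detected at least twice" rule on the number of retained tags can be examined.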
A preprocessed database of all possible CATG+17 bp tag sequences
was created using the P. sojae genome and gene models
as a reference database (P. sojae v1.1) (Tyler et al. 2006).
All clean tags were mapped to this reference database, allowing
no more than 1 bp mismatch. The number of mapped clean
tags was calculated for each library and was then normalized
to TPM. Clean tags mapped to reference sequences from multiple
loci (MMT) were identified but not removed. The expression
value for a gene was derived from the sum of TPM for all
mapped tags. The proportion of TPM derived from MMT was
used to calculate an MMT value.
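The reference-tag database and the per-gene expression and MMT values described above might be computed along the following lines. This sketch rests on several assumptions that the text does not specify: only the sense strand of each gene model is scanned, tags are matched exactly (the 1-bp mismatch allowance is omitted), TPM is normalized to the total number of clean tags in the library, and a tag counts as MMT when it matches more than one gene. The function names are hypothetical.

```python
from collections import defaultdict


def reference_tags(genes, site="CATG", tail=17):
    """Index every CATG+17 tag found in each gene sequence.

    `genes` maps gene ID -> sequence; returns tag -> set of gene IDs.
    """
    index = defaultdict(set)
    size = len(site) + tail
    for gene_id, seq in genes.items():
        seq = seq.upper()
        start = seq.find(site)
        while start != -1:
            tag = seq[start:start + size]
            if len(tag) == size:      # skip sites too close to the 3' end
                index[tag].add(gene_id)
            start = seq.find(site, start + 1)
    return index


def gene_expression(tag_counts, index):
    """Return {gene: (TPM, MMT fraction)} from clean tag counts."""
    total = sum(tag_counts.values())  # assumed normalization denominator
    tpm = defaultdict(float)
    mmt = defaultdict(float)
    for tag, count in tag_counts.items():
        value = count * 1_000_000 / total
        hits = index.get(tag, ())
        for gene_id in hits:
            tpm[gene_id] += value     # gene value = sum of TPM of its tags
            if len(hits) > 1:         # tag shared by several genes
                mmt[gene_id] += value
    return {g: (tpm[g], mmt[g] / tpm[g]) for g in tpm}
```

Under these assumptions, an MMT fraction of 0 means that all of a gene's signal comes from unambiguous tags, whereas a fraction of 1 means that every contributing tag also matches another gene.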
SYBR green real-time RT-PCR assay.
A total of 28 genes were selected for SYBR green real-time
RT-PCR assay, each assayed with the same RNA used for DGE from
the ten samples, resulting in 280 datapoints. The Pearson correlation
coefficient was calculated between the Ct values of the qRT-PCR
analysis and the log2 TPM values from the DGE analysis. A
real-time RT-PCR reaction (20 μl) included 20 ng of DNA, 0.2
µM of each primer, 10 μl of SYBR Premix ExTaq (TaKaRa Bio
Inc., Shiga, Japan), and 6.8 μl of distilled H2O. Reactions were
performed on an ABI PRISM 7300 fast real-time PCR system
(Applied Biosystems, Foster City, CA, U.S.A.) under the fol-
lowing conditions: 95°C for 30 s, 40 cycles of 95°C for 5 s,
60°C for 31 s, to calculate Ct values, followed by 95°C for 15
s, 60°C for 1 min, and then, 95°C for 15 s, to obtain melt
curves. The 7300 system sequence detection software (v. 1.4)
was used for data analysis.
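The comparison between the two platforms can be illustrated with a short Python sketch of the Pearson coefficient between Ct values and log2 TPM values. The function names are hypothetical, and zero TPM values are assumed to have been handled by the caller, because the text does not state how they were treated before the log2 transformation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ct_vs_log2_tpm(ct_values, tpm_values):
    """Correlate raw Ct values with log2-transformed TPM values.

    Assumes every TPM value is positive.
    """
    return pearson(ct_values, [math.log2(v) for v in tpm_values])
```

Because a lower Ct corresponds to a more abundant template, concordant data give a negative coefficient when raw Ct is used directly; the sign convention (for example, correlating against negated Ct) is therefore worth stating whenever the coefficient is reported.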
Further analysis of gene-expression data.
The MultiExperiment Viewer (v. 4.6) software package was
used to draw the heat maps, construct the HCL tree (using the
Pearson correlation method with average linkage), perform the
PCA (using the ‘mean’ for centering mode), and obtain the
gene-expression pattern clusters using CAST (the distance
metric was the default Pearson correlation and the threshold
affinity value was 0.9) (Saeed et al. 2003). The differentially
expressed genes between each library pair were filtered by two
criteria: i) two-fold over- or underrepresentation of gene expression
level, and ii) P value < 0.01, employing the chi-square test
and Bonferroni correction in the IDEG6 web server (Romualdi
et al. 2003). The fold change for a library pair, e.g., MY-SP, was
calculated as (TPM_SP + 0.1)/(TPM_MY + 0.1). The annotated GO
terms were downloaded from P. sojae v1.1 at the Joint Genome
Institute database. For GO enrichment analysis, the
mapped GO terms referring to biological process were compared
with the whole-genome background (GO terms for all of
the 19,027 predicted P. sojae genes) and were filtered with P
value < 0.5 by chi-square test and FDR correction using the singular
enrichment analysis method found at the AgriGO web server
(Du et al. 2010).
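The two filtering criteria for differentially expressed genes can be sketched in Python as follows. The sketch takes the uncorrected P value as an input (in this study it came from the IDEG6 web server) and applies a plain Bonferroni adjustment; the function names and the n_tests argument are hypothetical, and the exact correction implemented by IDEG6 may differ.

```python
def fold_change(tpm_a, tpm_b, pseudocount=0.1):
    """Fold change of library B over library A, e.g. MY-SP with A=MY, B=SP.

    The pseudocount of 0.1 follows the formula in the text and avoids
    division by zero for undetected genes.
    """
    return (tpm_b + pseudocount) / (tpm_a + pseudocount)


def is_differential(tpm_a, tpm_b, p_value, n_tests,
                    min_fold=2.0, alpha=0.01):
    """Apply the fold-change and corrected-P-value criteria together."""
    fc = fold_change(tpm_a, tpm_b)
    adjusted_p = min(1.0, p_value * n_tests)   # Bonferroni adjustment
    changed = fc >= min_fold or fc <= 1.0 / min_fold
    return changed and adjusted_p < alpha
```

For example, a gene with 10 TPM in MY and 25 TPM in SP has a fold change of 25.1/10.1, about 2.5, and passes the first criterion; whether it is retained then depends on the corrected P value.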
ACKNOWLEDGMENTS
We thank B. Tyler for editing of the manuscript. This work was sup-
ported, in part, by grants to Y. Wang from NSFC (number 30671345), by
the Special Fund for Agro-scientific Research in the Public Interest (3-20),
and National Soybean Industrial Technology system from China; and the
Agriculture and Agri-Food Canada Crop Genomics program to M. Gijzen.
Y. Wang conceived the research. Y. Wang, D. Dou, M. Gijzen, S. Dong, and
W. Ye designed the research. W. Ye and D. Dou analyzed the data, X.
Wang prepared the RNA samples, K. Tao performed the microscopic ob-
servation, Y. Lu established the PTD database, and T. Dai provided the
qRT-PCR data. W. Ye, Y. Wang, and M. Gijzen wrote the paper.
LITERATURE CITED
Adams, M. D., Kerlavage, A. R., Fleischmann, R. D., Fuldner, R. A., Bult,
C. J., Lee, N. H., Kirkness, E. F., Weinstock, K. G., Gocayne, J. D.,
White, O., Sutton, G., Blake, J. A., Brandon, R. C., Chiu, M., Clayton,
R. A., Cline, R. T., Cotton, M. D., Hughes, J. E., Fine, L. D., Fitzgerald,
L. M., FitzHugh, W. M., Fritchman, J. L., Geoghagen, N. S. M.,
Glodek, A., Gnehm, C. L., Hanna, M. C., Hedblom, E., Hinkle, P. S., Jr.,
Kelley, J. M., Klimek, K. M., Kelley, J. C., Liu, L., Marmaros, S.
M., Merrick, J. M., Moreno-Palanques, R. F., McDonald, L. A.,
Nguyen, D. T., Pellegrino, S. M., Phillips, C. A., Ryder, S. E., Scott, J.
L., Saudek, D. M., Shirley, R., Small, K. V., Spriggs, T. A., Utterback,
T. R., Weidman, J. F., Li, Y., Barthlow, R., Bednarik, D. P., Cao, L.,
Cepeda, M. A., Coleman, T. A., Collins, E., Dimke, D., Feng, P., Ferrie,
A., Fischer, C., Hastings, G. A., He, W., Hu, J., Huddleston, K. A.,
Greene, J. M., Gruber, J., Hudson, P., Kim, A., Kozak, D. L., Kunsch,
C., Ji, H., Li, H., Meissner, P. S., Olsen, H., Raymond, L., Wei, Y.,
Wing, J., Xu, C., Yu, G., Ruben, S. M., Dillon, P. J., Fannon, M. R.,
Rosen, C. A., Haseltine, W. A., Fields, C., Fraser, C. M., and Venter, J.
C. 1995. Initial assessment of human gene diversity and expression
patterns based upon 83 million nucleotides of cDNA sequence. Nature
377:3-174.
Asmann, Y. W., Klee, E. W., Thompson, E. A., Perez, E. A., Middha, S.,
Oberg, A. L., Therneau, T. M., Smith, D. I., Poland, G. A., Wieben, E.
D., and Kocher, J. P. 2009. 3′ tag digital gene expression profiling of
human brain and universal reference RNA using Illumina Genome Ana-
lyzer. BMC Genomics 10:531.
Baldauf, S. L., Roger, A. J., Wenk-Siefert, I., and Doolittle, W. F. 2000. A
kingdom-level phylogeny of eukaryotes based on combined protein
data. Science 290:972-977.
Boguski, M. S., Tolstoshev, C. M., and Bassett, D. E., Jr. 1994. Gene dis-
covery in dbEST. Science 265:1993-1994.
Coates, M. E., and Beynon, J. L. 2010. Hyaloperonospora arabidopsidis
as a pathogen model. Annu. Rev. Phytopathol. 48:329-345.
Dong, S., Qutob, D., Tedman-Jones, J., Kuflu, K., Wang, Y., Tyler, B. M.,
and Gijzen, M. 2009. The Phytophthora sojae avirulence locus Avr3c
encodes a multi-copy RXLR effector with sequence polymorphisms
among pathogen strains. PLoS One 4:e5556. Published online.
Du, Z., Zhou, X., Ling, Y., Zhang, Z., and Su, Z. 2010. agriGO: A GO
analysis toolkit for the agricultural community. Nucleic Acids Res.
38:W64-70.
Elferink, M. G., Olinga, P., van Leeuwen, E. M., Bauerschmidt, S., Polman,
J., Schoonen, W. G., Heisterkamp, S. H., and Groothuis, G. M. 2011.
Gene expression analysis of precision-cut human liver slices indicates
stable expression of ADME-Tox related genes. Toxicol. Appl. Pharmacol.
253:57-69.
Erwin, D. C., and Ribeiro, O. K. 1996. Phytophthora Diseases Worldwide.
The American Phytopathological Society, St. Paul, MN, U.S.A.
Eveland, A. L., Satoh-Nagasawa, N., Goldshmidt, A., Meyer, S., Beatty,
M., Sakai, H., Ware, D., and Jackson, D. 2010. Digital gene expression
signatures for maize development. Plant Physiol. 154:1024-1039.
Forster, H., Tyler, B. M., and Coffey, M. D. 1994. Phytophthora sojae
races have arisen by clonal evolution and by rare outcrosses. Mol.
Plant-Microbe Interact. 7:780-791.
Forster, H., Coffey, M. D., Elwood, H., and Sogin, M. L. 1990. Sequence-
analysis of the small subunit ribosomal-RNAs of 3 zoosporic fungi and
implications for fungal evolution. Mycologia 82:306-312.
Harper, J. T., Waanders, E., and Keeling, P. J. 2005. On the monophyly of
chromalveolates using a six-protein phylogeny of eukaryotes. Int. J.
Syst. Evol. Microbiol. 55:487-496.
Hegedus, Z., Zakrzewska, A., Agoston, V. C., Ordas, A., Racz, P., Mink,
M., Spaink, H. P., and Meijer, A. H. 2009. Deep sequencing of the ze-
brafish transcriptome response to mycobacterium infection. Mol. Im-
munol. 46:2918-2930.
Hua, C., Wang, Y., Zheng, X., Dou, D., Zhang, Z., and Govers, F. 2008. A
Phytophthora sojae G-protein alpha subunit is involved in chemotaxis
to soybean isoflavones. Eukaryot. Cell 7:2133-2140.
Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V.,
Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and
Brown, E. L. 1996. Expression monitoring by hybridization to high-
density oligonucleotide arrays. Nat. Biotechnol. 14:1675-1680.
Mak, H. C. 2011. John Storey. Nat. Biotech. 29:331-333.
Malone, J. H., and Oliver, B. 2011. Microarrays, deep sequencing and the
true measure of the transcriptome. BMC Biol. 9:34.
Mardis, E. R. 2008. The impact of next-generation sequencing technology
on genetics. Trends Genet. 24:133-141.
Morris, P. F., Bone, E., and Tyler, B. M. 1998. Chemotropic and contact
responses of Phytophthora sojae hyphae to soybean isoflavonoids and
artificial substrates. Plant Physiol. 117:1171-1178.
Morrissy, A. S., Morin, R. D., Delaney, A., Zeng, T., McDonald, H., Jones,
S., Zhao, Y., Hirst, M., and Marra, M. A. 2009. Next-generation tag se-
quencing for cancer gene expression profiling. Genome Res. 19:1825-
1835.
Moy, P., Qutob, D., Chapman, B. P., Atkinson, I., and Gijzen, M. 2004.
Patterns of gene expression upon infection of soybean plants by Phy-
tophthora sojae. Mol. Plant-Microbe Interact. 17:1051-1062.
Qutob, D., Hraber, P. T., Sobral, B. W., and Gijzen, M. 2000. Comparative
analysis of expressed sequences in Phytophthora sojae. Plant Physiol.
123:243-254.
Qutob, D., Tedman-Jones, J., Dong, S., Kuflu, K., Pham, H., Wang, Y.,
Dou, D., Kale, S. D., Arredondo, F. D., Tyler, B. M., and Gijzen, M.
2009. Copy number variation and transcriptional polymorphisms of
Phytophthora sojae RXLR effector genes Avr1a and Avr3a. PLoS One
4:e5066. Published online.
Romualdi, C., Bortoluzzi, S., D’Alessi, F., and Danieli, G. A. 2003.
IDEG6: A web tool for detection of differentially expressed genes in
multiple tag sampling experiments. Physiol Genomics 12:159-162.
Rosenkranz, R., Borodina, T., Lehrach, H., and Himmelbauer, H. 2008.
Characterizing the mouse ES cell transcriptome with Illumina sequenc-
ing. Genomics 92:187-194.
Saeed, A. I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted,
J., Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M.,
Rezantsev, A., Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I.,
Liu, Z., Vinsavich, A., Trush, V., and Quackenbush, J. 2003. TM4: A
free, open-source system for microarray data management and analysis.
Biotechniques 34:374-378.
Saha, S., Sparks, A. B., Rago, C., Akmaev, V., Wang, C. J., Vogelstein, B.,
Kinzler, K. W., and Velculescu, V. E. 2002. Using the transcriptome to
annotate the genome. Nat Biotechnol 20:508-512.
Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. 1995. Quantitative
monitoring of gene-expression patterns with a complementary DNA mi-
croarray. Science 270:467-470.
't Hoen, P. A., Ariyurek, Y., Thygesen, H. H., Vreugdenhil, E., Vossen, R.
H., de Menezes, R. X., Boer, J. M., van Ommen, G. J., and den Dunnen,
J. T. 2008. Deep sequencing-based expression analysis shows major
advances in robustness, resolution and inter-lab portability over five
microarray platforms. Nucleic Acids Res 36:e141.
Torto-Alalibo, T. A., Tripathy, S., Smith, B. M., Arredondo, F. D., Zhou,
L., Li, H., Chibucos, M. C., Qutob, D., Gijzen, M., Mao, C., Sobral, B.
W., Waugh, M. E., Mitchell, T. K., Dean, R. A., and Tyler, B. M. 2007.
Expressed sequence tags from Phytophthora sojae reveal genes specific
to development and infection. Mol. Plant-Microbe Interact. 20:781-793.
Tripathy, S., Pandey, V. N., Fang, B., Salas, F., and Tyler, B. M. 2006.
VMD: A community annotation database for oomycetes and microbial
genomes. Nucleic Acids Res. 34:D379-381.
Tyler, B. M. 2002. Molecular basis of recognition between Phytophthora
pathogens and their hosts. Annu. Rev. Phytopathol. 40:137-167.
Tyler, B. M. 2007. Phytophthora sojae: Root rot pathogen of soybean and
model oomycete. Mol. Plant Pathol. 8:1-8.
Tyler, B. M., Tripathy, S., Zhang, X., Dehal, P., Jiang, R. H., Aerts, A.,
Arredondo, F. D., Baxter, L., Bensasson, D., Beynon, J. L., Chapman,
J., Damasceno, C. M., Dorrance, A. E., Dou, D., Dickerman, A. W.,
Dubchak, I. L., Garbelotto, M., Gijzen, M., Gordon, S. G., Govers, F.,
Grunwald, N. J., Huang, W., Ivors, K. L., Jones, R. W., Kamoun, S.,
Krampis, K., Lamour, K. H., Lee, M. K., McDonald, W. H., Medina,
M., Meijer, H. J., Nordberg, E. K., Maclean, D. J., Ospina-Giraldo, M.
D., Morris, P. F., Phuntumart, V., Putnam, N. H., Rash, S., Rose, J. K.,
Sakihama, Y., Salamov, A. A., Savidor, A., Scheuring, C. F., Smith, B.
M., Sobral, B. W., Terry, A., Torto-Alalibo, T. A., Win, J., Xu, Z.,
Zhang, H., Grigoriev, I. V., Rokhsar, D. S., and Boore, J. L. 2006. Phy-
tophthora genome sequences uncover evolutionary origins and mecha-
nisms of pathogenesis. Science 313:1261-1266.
Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. 1995. Se-
rial analysis of gene expression. Science 270:484-487.
Wang, X. W., Luan, J. B., Li, J. M., Bao, Y. Y., Zhang, C. X., and Liu, S. S.
2010. De novo characterization of a whitefly transcriptome and analysis
of its gene expression during development. BMC Genomics 11:400.
Xiang, L. X., He, D., Dong, W. R., Zhang, Y. W., and Shao, J. Z. 2010.
Deep sequencing-based transcriptome profiling analysis of bacteria-
challenged Lateolabrax japonicus reveals insight into the immune-rele-
vant genes in marine fish. BMC Genomics 11:472.
Zhou, L., Mideros, S. X., Bao, L., Hanlon, R., Arredondo, F. D., Tripathy,
S., Krampis, K., Jerauld, A., Evans, C., St Martin, S. K., Maroof, M. A.,
Hoeschele, I., Dorrance, A. E., and Tyler, B. M. 2009. Infection and
genotype remodel the entire soybean transcriptome. BMC Genomics
10:49.
AUTHOR-RECOMMENDED INTERNET RESOURCES
Joint Genome Institute website: www.jgi.doe.gov
Phytophthora transcriptional database (PTD): phy.njau.edu.cn/ptd
... Oomycetes produce hundreds of effectors, such as cell walldegrading enzymes, elicitins, RxLRs, and CRNs (McGowan & Fitzpatrick, 2017). The transcriptome of Phytophthora sojae, the causal agent of stem and root rot in soybean, revealed distinct expression patterns and levels for these effector genes across 10 different developmental and infection stages (Ye et al., 2011). Currently, a few reports focus on the mechanisms of effector gene silencing. ...
... Phytophthora pathogens possess hundreds of effector genes with diverse expression patterns (McGowan & Fitzpatrick, 2017;Ye et al., 2011), and PsMyb37 regulates a subset of them, which implies that other TFs associated with effector gene expression remain to be explored. In the case of fungal pathogens, more than a dozen TFs from four different families have been found to regulate effector gene expression (John et al., 2021;Tan & Oliver, 2017). ...
... For RNA-seq, samples of vegetative mycelia (MY), germinated cysts (GC), as well as infection stages (0.5 and 1 hpi) were collected as described previously (Ye et al., 2011). Three biological replicates were included per treatment. ...
Article
Full-text available
Phytophthora pathogens possess hundreds of effector genes that exhibit diverse expression patterns during infection, yet how the expression of effector genes is precisely regulated remains largely elusive. Previous studies have identified a few potential conserved transcription factor binding sites (TFBSs) in the promoters of Phytophthora effector genes. Here, we report a MYB‐related protein, PsMyb37, in Phytophthora sojae, the major causal agent of root and stem rot in soybean. Yeast one‐hybrid and electrophoretic mobility shift assays showed that PsMyb37 binds to the TACATGTA motif, the most prevalent TFBS in effector gene promoters. The knockout mutant of PsMyb37 exhibited significantly reduced virulence on soybean and was more sensitive to oxidative stress. Consistently, transcriptome analysis showed that numerous effector genes associated with suppressing plant immunity or scavenging reactive oxygen species were down‐regulated in the PsMyb37 knockout mutant during infection compared to the wild‐type P. sojae. Several promoters of effector genes were confirmed to drive the expression of luciferase in a reporter assay. These results demonstrate that a MYB‐related transcription factor contributes to the expression of effector genes in P. sojae.
... It is a good model for studying the molecular features of the oomycete life cycle because it can be easily cultured in vitro, and all stages can be induced without a plant host [11]. Moreover, the genome and transcriptome data of P. sojae are available [12,13]. The recently established clustered regularly interspaced short palindromic repeats (CRISPR)-mediated gene knockout system has also strengthened functional genomic research in P. sojae [14,15]. ...
... A gene expression matrix was constructed using the transcriptional data (normalized gene expression levels) of P. sojae across asexual development and during host infection obtained by 3'-tag sequencing [13]. The five representative asexual stages (mycelia, sporangia, zoospore, cyst, and germinating cyst, hereafter referred to as MY, SP, ZO, CY and GC respectively) and five infectious stages with 1.5, 3, 6, 12 and 24 hours after P. sojae inoculation on soybean leaf (IF1.5h, ...
... Phase-specific transcriptional patterns of Phytophthora sojae developmental and host infection [13], an expression matrix including 10,953 transcripts was obtained. We found that the asexual stages exhibited greater diversity and specificity in mRNA levels than during the process of infection, reflecting a striking degree of transcriptome remodeling during asexual development, which is related to the dramatic structural and physiological changes [7,8]. ...
Article
Full-text available
Oomycetes are filamentous microorganisms easily mistaken as fungi but vastly differ in physiology, biochemistry, and genetics. This commonly-held misconception lead to a reduced effectiveness by using conventional fungicides to control oomycetes, thus it demands the identification of novel functional genes as target for precisely design oomycetes-specific microbicide. The present study initially analyzed the available transcriptome data of the model oomycete pathogen, Phytophthora sojae, and constructed an expression matrix of 10,953 genes across the stages of asexual development and host infection. Hierarchical clustering, specificity, and diversity analyses revealed a more pronounced transcriptional plasticity during the stages of asexual development than that in host infection, which drew our attention by particularly focusing on transcripts in asexual development stage to eventually clustered them into 6 phase-specific expression modules. Three of which respectively possessing a serine/threonine phosphatase (PP2C) expressed during the mycelial and sporangium stages, a histidine kinase (HK) expressed during the zoospore and cyst stages, and a bZIP transcription factor (bZIP32) exclusive to the cyst germination stage were selected for down-stream functional validation. In this way, we demonstrated that PP2C, HK, and bZIP32 play significant roles in P. sojae asexual development and virulence. Thus, these findings provide a foundation for further gene functional annotation in oomycetes and crop disease management.
... Forty-one RXLRs (36 up-regulated and ve down-regulated) were reported in Phytophthora cinnamomi-Persea americana interactions [130].Though the overall transcriptome revealed the expression of 46 NPP/NLP effectors, NLPs are highly expressed at 36 hpi. However, P. sojae: soybean and P. capsici: N. benthamiana interactions induce NLPs at the germinating cyst stage [131], [132], [122]. also reported the up-regulation of four NLP genes during the early stage of infection (1.5 hpi) and three NLP genes during the later stage of infection (24-72 hpi) [134]. ...
Preprint
Full-text available
Background The bud rot pathogen Phytophthora palmivora poses a significant threat to coconut production worldwide. Effective management strategies against this devastating pathogen are lacking due to the absence of resistant cultivars and limited knowledge about its pathogenicity mechanisms. To address this, we conducted dual RNA-seq analyses at three time points (12, 24, and 36 hours post-infection) during the initial progression of the disease, using a standardized in vitro assay. This study aimed to identify transcriptional regulation following infection and decipher the system-level host response to P. palmivora. Results Differential gene expression (DGE) analysis between control and infected samples revealed extensive modulation of stress-responsive genes in coconut. In contrast, P. palmivora showed differential expression of genes encoding effector and carbohydrate-active enzymes (CAZy). Pathway enrichment analysis highlighted the up-regulation of genes associated with plant-pathogen interaction pathway and plant hormone signal transduction in coconut. To validate our findings, we selected ten candidate differentially expressed genes (DEGs) from both coconut and P. palmivora for quantification using qRT-PCR at the three time points. The expression trends observed in qRT-PCR confirmed the reliability of the dual RNA-seq data, further supporting the comprehensive outlook on the global response of coconut to P. palmivora infection. Conclusions This study highlights the significant modulation of stress-responsive genes in coconut and differential expression of effector and carbohydrate-active enzyme genes in P. palmivora during bud rot infection. The findings provide valuable insights into the molecular interactions and transcriptional regulation underlying the coconut-P. palmivora pathosystem, aiding in the development of effective management strategies against this devastating pathogen.
... However, we previously reported that the PsRLK6 knockout mutants have no obvious phenotypes on growth, zoospore development, and virulence 16 . A transcriptome data showed that PsRLK6 was up-regulated during infection stages 25 , suggesting its potential role during interaction. Therefore, we over-expressed PsRLK6-GFP by PEG-mediated transformation 26 . ...
Article
Full-text available
Plant cell-surface leucine-rich repeat receptor-like kinases (LRR-RLKs) and receptor-like proteins (LRR-RLPs) form dynamic complexes to receive a variety of extracellular signals. LRR-RLKs are also widespread in oomycete pathogens, whereas it remains enigmatic whether plant and oomycete LRR-RLKs could mediate cell-to-cell communications between pathogen and host. Here, we report that an LRR-RLK from the soybean root and stem rot pathogen Phytophthora sojae, PsRLK6, can activate typical pattern-triggered immunity in host soybean and nonhost tomato and Nicotiana benthamiana plants. PsRLK6 homologs are conserved in oomycetes and also exhibit immunity-inducing activity. A small region (LRR5-6) in the extracellular domain of PsRLK6 is sufficient to activate BAK1- and SOBIR1-dependent immune responses, suggesting that PsRLK6 is likely recognized by a plant LRR-RLP. Moreover, PsRLK6 is shown to be up-regulated during oospore maturation and essential for the oospore development of P. sojae. Our data provide a novel type of microbe-associated molecular pattern that functions in the sexual reproduction of oomycete, and a scenario in which a pathogen LRR-RLK could be sensed by a plant LRR-RLP to mount plant immunity.
... The economic impacts of root and stem rots caused by Phytophthora sojae have resulted in it being ranked among the 10 most destructive and impactful oomycete pathogens [2,3]. P. sojae was the first oomycete to have its genome sequenced, and its transcriptome data are also available [4,5]. Because of its ease of culture and simple genetic operability, this species has become a model for studying plant and pathogen interactions and functional genomics. ...
Proteins containing both FYVE and serine/threonine kinase catalytic (STKc) domains are exclusive to protists. However, the biological function of these proteins in oomycetes has rarely been reported. In the Phytophthora sojae genome database, we identified five proteins containing FYVE and STKc domains, which we named PsZFPK1, PsZFPK2, PsZFPK3, PsZFPK4, and PsZFPK5. In this study, we characterized the biological function of PsZFPK1 using a CRISPR/Cas9-mediated gene replacement system. Compared with the wild-type strain, P6497, the PsZFPK1-knockout mutants exhibited significantly reduced growth on a nutrient-rich V8 medium, while a more pronounced defect was observed on a nutrient-poor Plich medium. The PsZFPK1-knockout mutants also showed a significant increase in sporangium production. Furthermore, PsZFPK1 was found to be essential for oospore production and complete virulence but dispensable for the stress response in P. sojae. The N-terminal region, FYVE and STKc domains, and T602 phosphorylation site were found to be vital for the function of PsZFPK1. Conversely, these domains were not required for the localization of PsZFPK1 protein in the cytoplasm. Our results demonstrate that PsZFPK1 plays a critical role in vegetative growth, sporangium formation, oospore production, and virulence in P. sojae.
... Biological materials of P. capsici at the different developmental stages were collected as previously described, including mycelia (MY) from V8 agar, mycelia with sporangia (SP), zoospores (ZO), germinated cysts (CY), and infection stages (0, 1.5, 3, 6, 12, 24 and 48 h after inoculation on pepper leaves) [61]. Total RNA was extracted from the above biological materials of P. capsici using the SV Total RNA Isolation Kit (Promega, Beijing, China). ...
Asparagine (Asn, N)-linked glycosylation is a conserved process and an essential post-translational modification that occurs on the NXT/S motif of nascent polypeptides in the endoplasmic reticulum (ER). The mechanism of N-glycosylation and the biological functions of the key catalytic enzymes involved in this process are rarely documented for oomycetes. In this study, the N-glycosylation inhibitor tunicamycin (TM) hampered the mycelial growth, sporangial release, and zoospore production of Phytophthora capsici, indicating that N-glycosylation is crucial for oomycete growth and development. Among the key catalytic enzymes involved in N-glycosylation, the PcSTT3B gene was functionally characterized in P. capsici. As a core subunit of the oligosaccharyltransferase (OST) complex, the staurosporine and temperature sensitive 3B (STT3B) subunit is critical for the catalytic activity of OST. PcSTT3B has catalytic activity and is highly conserved in P. capsici. When the PcSTT3B gene was deleted using a CRISPR/Cas9-mediated gene replacement system, the transformants showed impaired mycelial growth, sporangial release, zoospore production, and virulence. The PcSTT3B-deleted transformants were more sensitive to the ER stress inducer TM and displayed low glycoprotein content in the mycelia, suggesting that PcSTT3B is associated with ER stress responses and N-glycosylation. Therefore, PcSTT3B is involved in the development, pathogenicity, and N-glycosylation of P. capsici.
Phosphatases are important regulators of protein phosphorylation and various cellular processes, and they serve as counterparts to kinases. In this study, our comprehensive analysis of oomycete complete proteomes unveiled the presence of approximately 3833 phosphatases, with most species estimated to have between 100 and 300 putative phosphatases. Further investigation of these phosphatases revealed a significant increase in protein serine/threonine phosphatases (PSP) within oomycetes. In particular, we extensively studied the metallo-dependent protein phosphatase (PPM) within the PSP family in the model oomycete Phytophthora sojae. Our results showed notable differences in the expression patterns of PPMs throughout 10 life stages of P. sojae, indicating their vital roles in various stages of oomycete pathogens. Moreover, we identified 29 PPMs in P. sojae, and eight of them possessed accessory domains in addition to phosphatase domains. We investigated the biological function of one PPM protein with an extra PH domain (PPM1); this protein exhibited high expression levels in both asexual developmental and infectious stages. Our analysis confirmed that PPM1 is indeed an active protein phosphatase, and its accessory domain does not affect its phosphatase activity. To delve further into its function, we generated knockout mutants of PPM1 and validated its essential roles in mycelial growth, sporangia and oospore production, as well as infectious stages. To the best of our knowledge, this study provides the first comprehensive inventory of phosphatases in oomycetes and identifies an important phosphatase within the expanded serine/threonine phosphatase group in oomycetes.
Plant pathogens secrete effector proteins to overcome host immunity and promote colonization. In oomycete plant pathogens, the expression of many effector genes is altered upon infection; however, the regulatory mechanisms are unclear. In this study, we identified a su(var)3-9, enhancer of zeste, and trithorax (SET) domain protein-encoding gene, PsKMT3, that was highly induced at early infection stages in Phytophthora sojae. Deletion of PsKMT3 led to asexual development and pathogenicity defects. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) and western blot analyses demonstrated that histone H3K36 trimethylation (H3K36me3) was significantly reduced genome-wide in mutants. RNA-seq analysis identified 374 genes encoding secreted proteins that were differentially expressed in pskmt3 at the mycelium stage. The significantly altered genes encompassed the RxLR (Arg-x-Leu-Arg) effector gene family, including the essential effector genes Avh23, Avh181, Avh240, and Avh241. Transcriptome analysis at early infection stages showed misregulation of effector gene expression waves in pskmt3. H3K36me3 was directly and indirectly associated with RxLR effector gene activation. Our results reveal a role of a SET domain protein in regulating effector gene expression and modulating histone methylation in P. sojae.
Phytophthora sojae (Kaufmann and Gerdemann) is an oomycete that causes stem and root rot on soybean (Glycine max L. Merr) plants. We have constructed three cDNA libraries using mRNA isolated from axenically grown mycelium and zoospores and from tissue isolated from plant hypocotyls 48 h after inoculation with zoospores. A total of 3,035 expressed sequence tags (ESTs) were generated from the three cDNA libraries, representing an estimated 2,189 cDNA transcripts. The ESTs were classified according to putative function based on similarity to known proteins, and were analyzed for redundancy within and among the three source libraries. Distinct expression patterns were observed for each library. By analysis of the percentage G+C content of the ESTs, we estimate that two-thirds of the ESTs from the infected plant library are derived from P. sojae cDNA transcripts. The ESTs originating from this study were also compared with a collection of Phytophthora infestans ESTs and with all other non-human ESTs to assess the similarity of the P. sojae sequences to existing EST data. This collection of cDNA libraries, ESTs, and accompanying annotation will provide a new resource for studies on oomycetes and on soybean responses to pathogen challenge.
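The G+C-based estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes that pathogen-derived ESTs tend to have a higher G+C fraction than host-derived ones and counts the share of sequences above a cut-off. The function names and the 0.5 threshold are hypothetical; the abstract does not state which cut-off or procedure the authors used.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def fraction_above(ests, threshold=0.5):
    """Share of ESTs whose G+C fraction exceeds `threshold`.

    The default threshold is a placeholder for illustration only,
    not a value taken from the study.
    """
    if not ests:
        return 0.0
    high = sum(1 for seq in ests if gc_fraction(seq) > threshold)
    return high / len(ests)


if __name__ == "__main__":
    toy_library = ["GGGCCC", "GCGCAT", "ATATAT"]  # made-up sequences
    print(fraction_above(toy_library))
```

In practice the cut-off would have to be calibrated on libraries of known origin, such as the axenic mycelium and zoospore libraries described above.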
Microarrays first made the analysis of the transcriptome possible, and have produced much important information. Today, however, researchers are increasingly turning to direct high-throughput sequencing -- RNA-Seq -- which has considerable advantages for examining transcriptome fine structure -- for example in the detection of allele-specific expression and splice junctions. In this article, we discuss the relative merits of the two techniques, the inherent biases in each, and whether all of the vast body of array work needs to be revisited using the newer technology. We conclude that microarrays remain useful and accurate tools for measuring expression levels, and RNA-Seq complements and extends microarray measurements.
Genome-wide expression signatures detect specific perturbations in developmental programs and contribute to functional resolution of key regulatory networks. In maize (Zea mays) inflorescences, mutations in the RAMOSA (RA) genes affect the determinacy of axillary meristems and thus alter branching patterns, an important agronomic trait. In this work, we developed and tested a framework for analysis of tag-based, digital gene expression profiles using Illumina's high-throughput sequencing technology and the newly assembled B73 maize reference genome. We also used a mutation in the RA3 gene to identify putative expression signatures specific to stem cell fate in axillary meristem determinacy. The RA3 gene encodes a trehalose-6-phosphate phosphatase and may act at the interface between developmental and metabolic processes. Deep sequencing of digital gene expression libraries, representing three biological replicate ear samples from wild-type and ra3 plants, generated 27 million 20- to 21-nucleotide reads with frequencies spanning 4 orders of magnitude. Unique sequence tags were anchored to 3'-ends of individual transcripts by DpnII and NlaIII digests, which were multiplexed during sequencing. We mapped 86% of nonredundant signature tags to the maize genome, which associated with 37,117 gene models and unannotated regions of expression. In total, 66% of genes were detected by at least nine reads in immature maize ears. We used comparative genomics to leverage existing information from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) in functional analyses of differentially expressed maize genes. Results from this study provide a basis for the analysis of short-read expression data in maize and resolved specific expression signatures that will help define mechanisms of action for the RA3 gene.
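The idea of anchoring tags to the 3' ends of transcripts at restriction sites can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the tag is taken as the 3'-most recognition site (CATG for NlaIII, GATC for DpnII) plus a fixed number of downstream bases, and the downstream length of 17 nt is an assumed value chosen to give tags of about 21 nt. All function names are hypothetical.

```python
from collections import Counter

# Recognition sequences of the two anchoring enzymes named in the abstract.
SITES = {"NlaIII": "CATG", "DpnII": "GATC"}


def three_prime_tag(transcript, site="CATG", downstream=17):
    """Return the tag anchored at the 3'-most `site`, or None.

    The tag is the site itself plus `downstream` bases after it.
    `downstream=17` is an assumption for illustration.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the site
    if pos == -1:
        return None  # transcript has no anchoring site
    tag = seq[pos:pos + len(site) + downstream]
    if len(tag) < len(site) + downstream:
        return None  # site lies too close to the 3' end for a full tag
    return tag


def count_tags(reads):
    """Tally identical tag sequences; the counts act as digital expression values."""
    return Counter(read.upper() for read in reads)


if __name__ == "__main__":
    toy = "AAACATGTTTTTTTTTTTTTTTTTCATG" + "ACGTACGTACGTACGTA" + "GG"
    print(three_prime_tag(toy, SITES["NlaIII"]))
```

Using two enzymes, as in the study, lets transcripts that lack one recognition site still be represented by a tag from the other.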
Article
This chapter describes a protocol for performing digital gene expression profiling on the Illumina Genome Analyzer sequencing platform. Tag sequencing (Tag-seq) is an implementation of the LongSAGE protocol on the Illumina sequencing platform that increases utility while reducing both the cost and time required to generate gene expression profiles. The ultra-high-throughput sequencing capability of the Illumina GA platform allows the cost-effective generation of libraries containing an average of 20 million tags - a 200-fold improvement over classical LongSAGE. Tag-seq has less sequence composition bias, leading to a better representation of AT-rich tag sequences, and allows accurate profiling of a subset of the transcriptome characterized by AT-rich genes expressed at levels below the threshold of detection of LongSAGE.
Article
The small-subunit ribosomal RNA gene sequences of the chytridiomycete Blastocladiella emersonii and the oomycetes Lagenidium giganteum and Phytophthora megasperma f. sp. glycinea were determined and compared to published fungal sequences of Achlya bisexualis, Saccharomyces cerevisiae, and Neurospora crassa and those of other eukaryotic organisms. The gene phylogeny that was constructed showed two distinct fungal evolutionary lineages. Oomycetes together with chrysophytes and diatoms formed one lineage. Oomycetes appeared to be monophyletic and derived from heterokont photosynthetic algae. On a different phylogenetic branch, chytridiomycetes and ascomycetes were found. "Higher" fungi and chytridiomycetes appeared to share a relatively recent common ancestor. These two fungal evolutionary lines were unrelated to the higher plant lineage. It is evident that the fungi do not represent a natural taxonomic group of eukaryotic organisms.
Article
If one accepts that the fundamental pursuit of genetics is to determine the genotypes that explain phenotypes, the meteoric increase of DNA sequence information applied toward that pursuit has nowhere to go but up. The recent introduction of instruments capable of producing millions of DNA sequence reads in a single run is rapidly changing the landscape of genetics, providing the ability to answer questions with heretofore unimaginable speed. These technologies will provide an inexpensive, genome-wide sequence readout as an endpoint to applications ranging from chromatin immunoprecipitation, mutation mapping and polymorphism discovery to noncoding RNA discovery. Here I survey next-generation sequencing technologies and consider how they can provide a more complete picture of how the genome shapes the organism.
Article
John Storey provides his take on the importance of new statistical methods for high-throughput sequencing.
Article
In drug development, it is highly important to test the safety of new drugs with methods that are predictive of human toxicity. A promising approach to toxicity testing is based on shifts in gene expression profiles of the liver. Toxicity screening based on animal liver cells cannot be directly extrapolated to humans because of species differences. The aim of this study was to evaluate precision-cut human liver slices as an in vitro method for the prediction of human-specific toxicity by toxicogenomics. The liver slices contain all cell types of the liver in their natural architecture. This is important because drug-induced toxicity is often a multi-cellular process. Previously we showed that toxicogenomic analysis of rat liver slices is highly predictive of rat in vivo toxicity. In this study we investigated the levels of gene expression during incubation for up to 24 h with Affymetrix microarray technology. The analysis focused on a broad spectrum of genes related to stress and toxicity, and on genes encoding phase-I, -II and -III metabolizing enzymes and transporters. Observed changes in gene expression were associated with cytoskeleton remodeling, extracellular matrix and cell adhesion, but for the ADME-Tox related genes only minor changes were observed. Principal component analysis (PCA) showed that changes in gene expression were not associated with age, sex or source of the human livers. Slices treated with acetaminophen showed patterns of gene expression related to its toxicity. These results indicate that precision-cut human liver slices are relatively stable during 24 h of incubation and represent a valuable model for human in vitro hepatotoxicity testing despite human inter-individual variability.