
A Large Scale Analysis of cDNA in Arabidopsis thaliana: Generation of 12,028 Non-redundant Expressed Sequence Tags from Normalized and Size-selected cDNA Libraries

DNA RESEARCH 7, 175-180 (2000)
Erika ASAMIZU, Yasukazu NAKAMURA, Shusei SATO, and Satoshi TABATA*
Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
(Received 19 May 2000)
Abstract

For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery. The EST sequence data of individual cDNA clones are available at the web site: http://www.kazusa.or.jp/en/plant/arabi/EST/.
Key words: Arabidopsis thaliana; cDNA; EST
1. Introduction

Arabidopsis thaliana has been adopted as a model organism in the study of plant biology since it has the advantages of small size, short generation time, and ease of transformation.¹ Because the A. thaliana genome is the smallest genome among known higher plant species (130-140 Mb),²,³ the genome sequencing project of this plant is underway as a joint project of Japan, Europe, and the United States.⁴ To date, two of five chromosomes (chromosomes 2 and 4) have been sequenced except for the nucleolar organizer regions and centromeres,²,³ and sequencing of the remaining three chromosomes is near completion.

Under these circumstances, the accurate assignment of protein coding regions on the genomic sequence gains importance as the logical next step. In this respect, information on cDNA structure is essential. Also, comprehensive analysis of cDNA sequences is an effective way to catalogue genes expressed in an organism with a large genome. A large number of EST (expressed sequence tag) sequences of several crop plants have been deposited in the public database, dbEST (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html). In addition, EST accumulation for several model plants has been initiated.⁵⁻⁷

In A. thaliana, more than 45,000 EST sequences have been deposited in dbEST, including sequences from large-scale EST projects promoted by two consortia of laboratories, one in France and the other in the United States.⁸⁻¹⁰ The French program generated sequence data from ten kinds of cDNA libraries prepared from different tissues, organs and developmental stages. They deposited the 5'- and 3'-end sequences of approximately 6,000 non-redundant clones in dbEST.⁸,⁹ The U.S. group produced 31,000 ESTs mainly from a single library made from a mixture of mRNAs from four different tissues.¹⁰ These EST clones altogether cover approximately 34% of the predicted genes on chromosome 4.³

To complement the EST data currently available in the public database and facilitate new gene discovery, we constructed normalized and size-selected cDNA libraries from five different tissues of A. thaliana, and accumulated 5'-end and 3'-end ESTs.

Communicated by Mituru Takanami
* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934, E-mail: tabata@kazusa.or.jp
[Figure 1: bar chart. Horizontal axis: insert size (kb), in 0.5-kb bins from 0.5 to >5.0. Vertical axis: ticks from 500 to 2500.]
Figure 1. Size distribution of the inserts in analyzed cDNA clones from the normalized (solid bars) and size-selected libraries (gray bars) of aboveground organs.
2. Materials and Methods

2.1. Preparation of tissues

Arabidopsis thaliana Columbia accession was used for analysis and was grown in soil under a 16 hr photoperiod at 22°C. Aboveground organs were harvested from 2- to 6-week-old plants. Flower buds and green siliques were also collected from the soil-grown plants. For liquid culture, sterile seeds were sown in medium [1/2 B5 medium,¹¹ 1/1000 HYPONeX (Hyponex Japan), 1% sucrose, pH adjusted to 5.7] and grown under continuous light at 22°C with rotation for 2 weeks. Seedlings and roots were collected from the liquid-cultured plants.
2.2. Isolation of poly(A)+ RNA and construction of cDNA libraries

Total RNA was extracted from aboveground organs, flower buds, roots, and liquid-cultured seedlings by the guanidinium thiocyanate/CsCl ultracentrifugation method, and from green siliques by the SDS/phenol method as described previously.⁵,⁷ Purification of poly(A)+ RNA, conversion to cDNA, and size-selection of cDNA were performed as described.⁵ Normalization was performed for the library containing 0.5- to 3-kb fragments as described.⁵,¹² The names of cDNA libraries refer to the tissue used for construction: AP, aboveground organs; FB, flower buds; RZ, roots; SQ, green siliques; pAZNII, liquid-cultured seedlings.
2.3. Template preparation and sequencing

For generation of all the 5'-end sequences as well as some of the 3'-end sequences from the AP and RZ libraries, PCR-amplified fragments were used as templates. Vector-derived sequences were used as primers (5'-TGTGCTGCAAGGCGATTAAGTTGGG-3' and 5'-TCATTAGGCACCCCAGGCTTTACAC-3'), and PCR was performed with Taq DNA polymerase (TaKaRa, Japan) using a Perkin-Elmer 9600 Thermal Cycler: 30 cycles of 10 sec at 98°C and 6 min at 68°C, and a final extension for 10 min at 72°C. The amplified products were precipitated by adding 1/3 to the final volume of 20% PEG6000 in 2.5 M NaCl. Plasmid DNA was used as a template for generation of the rest of the 5'- and 3'-end sequences. Plasmid DNA preparation and insert size determination of each clone were performed as described.⁵ Sequencing reactions were performed with the Dye Terminator, dRhodamine Terminator, and BigDye Terminator Cycle Sequencing Ready Reaction Kits (PE Applied Biosystems, USA), and the products were electrophoresed on automated DNA sequencers (ABI PRISM 373 and 377XL, PE Applied Biosystems, USA).
2.4. Sequence data analysis

Only the 3'-end sequences were subjected to the data analysis process. The vector-derived sequences and ambiguous sequences were removed from the collected EST sequences prior to the computer-aided analyses. Each sequence was translated into amino acid sequences in six frames and subjected to similarity search against the non-redundant protein database provided by NCBI using the BLAST algorithm.¹³ Similarity between a deduced amino acid sequence and a known sequence was judged to be significant when the P-value was less than 1.0 × 10⁻¹⁴. To identify the number of independent EST species, clustering of the EST sequences was performed. The 3'-end sequences were compared against the same set of 3'-end sequences using the BLASTN program, and clones that showed over 95% identity for more than 50 bp were included in the same group.

Table 1. Number of 5'-end and 3'-end ESTs generated from cDNA libraries of five different tissues.

Tissue                       Library type     Number of 5'-end ESTs    Number of 3'-end ESTs
Aboveground organs           Normalized       4996                     6863
                             Size-selected    2172                     1753
Flower buds                  Normalized       ND                       5827
Roots                        Normalized       5798                     8505
                             Size-selected    245                      3161
Green siliques               Normalized       ND                       11843
                             Size-selected    ND                       909
Liquid-cultured seedlings    Normalized       815                      346
Total                                         14026                    39207

ND: Not determined

Table 2. Classification of 3'-end ESTs by similarity search against the non-redundant protein database.
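The six-frame translation step described in Section 2.4 can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes the standard genetic code, the function names are hypothetical, and any codon containing an ambiguous base is written as X. The resulting protein strings would then be passed to a protein similarity search, which is not shown here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the paper does not state which translation table was used).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Translate a sequence in three forward and three reverse frames."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


example_frames = six_frame_translations("ATGGCCTAA")
```

Each of the six strings is searched independently, so a reading-frame shift or an EST read from the antisense strand does not prevent a match.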
3. Results and Discussion

3.1. Quality of cDNA libraries

The size distribution of the inserts in cDNA was analyzed for the clones from the cDNA libraries of aboveground organs. As shown in Fig. 1, 72.3% of the clones from the normalized library contained inserts of 0.5 to 1.5 kb, while 96.0% of the clones from the size-selected library had inserts longer than 2.5 kb. The average insert length of the clones in the normalized library was 1.28 kb, whereas that of the size-selected library was 3.17 kb. It is therefore evident that the size selection procedure is effective for generation of long cDNA species.

The quality of the libraries with respect to the intactness of cDNA was assessed by comparison of the 5'-end sequences to known protein sequences. Among 116 clones randomly chosen from the normalized library and 122 clones from the size-selected library, 74 (63.8%) and 85 (69.7%) were found to contain a translation initiation codon, respectively, indicating that roughly two-thirds of the cDNA clones are full-length in both libraries. This result shows that the two libraries contain an abundance of intact cDNA species with shorter and longer sizes. However, we only assessed the quality of libraries using those prepared from aboveground organs. The quality may be different among libraries from different tissues.
3.2. Generation of ESTs

cDNA clones were randomly chosen from the cDNA libraries constructed, and a total of 14,026 clones were sequenced from the 5'-ends and 39,207 clones were sequenced from the 3'-ends. The numbers of ESTs generated from the respective libraries are summarized in Table 1. The GC content of 659 randomly selected ESTs (279,604 bases) was estimated to be 43.4%.

Table 2.

Similarity                       Number of clones    Number of non-redundant ESTs
Genes of known function a)       24892               4816
Hypothetical genes b)            5071                1864
No similarity c)                 9244                5348
Total                            39207               12028

a) showed similarity to genes of known function, b) showed similarity to hypothetical genes that have no definition of function, c) showed no similarity
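The GC content quoted in Section 3.2 is a pooled base count over the sampled ESTs. A minimal sketch of that calculation follows; it assumes that ambiguous bases such as N are excluded from the denominator, which the paper does not specify, and the function name is hypothetical.

```python
def gc_content(sequences) -> float:
    """Pooled GC percentage over all unambiguous bases in the sequences."""
    gc = total = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        # Assumption: only A, C, G and T count toward the total.
        total += sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous bases to count")
    return 100.0 * gc / total


example_gc = gc_content(["ATGC", "GGCC", "NNNN"])
```

Pooling the counts before dividing weights each EST by its length, which matches reporting a single figure for 279,604 bases rather than an average of per-clone percentages.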
To identify the number of independent EST species, clustering of the 3'-EST sequences was performed. As a result, the 39,207 3'-EST sequences were clustered into 12,028 independent groups. This number is expected to be close to the actual number of gene species represented by ESTs. However, a more accurate number of independent gene species should be obtained by allocating the EST sequences on the genome, because the stringency used for clustering was not strict (95% identity for 50 bp).
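The grouping rule (over 95% identity for more than 50 bp) can be sketched as follows. This is an illustration under a stated assumption: it uses single-linkage grouping with a union-find structure, which the paper does not describe explicitly. The hit tuples stand in for parsed BLASTN output, and all names are hypothetical.

```python
def cluster_ests(est_ids, hits, min_identity=95.0, min_length=50):
    """Group ESTs linked by any hit above both thresholds (single linkage).

    hits: iterable of (query_id, subject_id, percent_identity, align_length);
    every ID in hits is assumed to appear in est_ids.
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, length in hits:
        if identity > min_identity and length > min_length:
            parent[find(query)] = find(subject)

    groups = {}
    for est in est_ids:
        groups.setdefault(find(est), set()).add(est)
    return list(groups.values())


example_hits = [
    ("a", "b", 98.0, 120),   # joins a and b
    ("b", "c", 96.5, 60),    # joins c to the same group
    ("c", "d", 99.0, 40),    # alignment too short, ignored
    ("a", "d", 90.0, 300),   # identity too low, ignored
]
example_groups = cluster_ests(["a", "b", "c", "d"], example_hits)
```

Under single linkage, one short shared region is enough to merge two groups, so members of a gene family or clones sharing a repeat may fall into the same group. That is consistent with the caution above that genome mapping should give a more accurate count.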
3.3. Sequence similarity of ESTs

When the non-redundant EST groups deduced from the 3'-end ESTs were searched for similarity using the non-redundant protein database, 6680 groups had significant similarity to the registered sequences and the remaining 5348 groups were novel. Among the 6680 EST groups with significant similarity, 4816 showed similarity to genes with known function and the remaining 1864 to hypothetical genes with no functional definition, largely predicted from the A. thaliana genome sequences (Table 2). Genes whose functions could be predicted from a similarity search were classified according to the biological roles or biochemical functions² as shown in Table 3. The search results for the individual clones are available at the web site, http://www.kazusa.or.jp/en/plant/arabi/EST/.
3.4. Estimation of gene coverage by ESTs

Gene coverage of the non-redundant EST groups was investigated using the annotated genomic sequences, 10,009,832 bp in length, on chromosomes 3¹⁴ and 5¹⁵ (http://www.kazusa.or.jp/kaos/). The sequences taken were of 1 P1 clone on chromosome 3, and 106 P1 and 30 TAC clones on chromosome 5. Along the genomic sequences, 2324 regions have been assigned as potential protein-coding genes. Analysis indicated that 788 were hit by at least one EST group. In addition, 135 EST groups could be located at regions where no gene assignment has been done. Gene coverage by ESTs deposited in the database was examined, and only 499 of the 923 regions hit by our EST groups were also hit by those ESTs.

[Figure 2: pie chart with the following categories.]

Category                      Proportion
Aboveground organs            15.3%
Flower buds                   8.1%
Roots                         21.2%
Green siliques                22.8%
Liquid-cultured seedlings     0.7%
2 tissues                     19.4%
3 tissues                     8.4%
4 tissues                     3.8%
5 tissues                     0.3%

Figure 2. The proportions of EST groups identified only in one of the five tissues and those identified in two to five tissues. The proportion of each category is given as a percentage and indicated by color codes.

Table 3. Classification of the non-redundant EST groups with similarity to known protein genes by their functional categories.

Functional categories                                          Number of non-redundant groups
Energy metabolism                                              538
Regulatory functions                                           483
Cellular structure, organization and biogenesis                416
Protein fate                                                   369
Signal transduction                                            360
Protein synthesis                                              328
Transport and binding proteins                                 319
Cellular processes                                             218
Secondary metabolism                                           209
Growth and development                                         184
Fatty acid and phospholipid metabolism                         132
Amino-acid biosynthesis                                        125
Environmental response                                         113
Pathogen responses                                             99
DNA metabolism                                                 89
General transcription                                          76
Purines, pyrimidines, nucleosides, and nucleotides             68
Central intermediary metabolism                                53
Biosynthesis of cofactors, prosthetic groups, and carriers     52
Other categories                                               23
Unclassified                                                   562
Total                                                          4816

We also analyzed gene coverage by mapping the EST groups on the completed sequence of A. thaliana chromosome 2, on which 4037 genes have been assigned.² As a result, 1775 EST groups were allocated on the genomic sequence, of which 626 groups were found to have similar sequences in the registered ESTs. Although the gene coverage data observed for different chromosomal sequences cannot be compared directly, the data clearly indicate that the non-redundant ESTs generated in this project contain many new cDNA species.
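The coverage comparison in Section 3.4 amounts to asking which genomic regions overlap at least one EST alignment, then taking the difference between two EST sources. A minimal sketch follows, assuming that alignments are available as inclusive coordinate intervals on one sequence; the coordinates and names are invented for illustration and are not data from the paper.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def regions_hit(regions, est_alignments) -> set:
    """Names of regions overlapped by at least one EST alignment."""
    return {r.name for r in regions
            if any(overlaps(r, e) for e in est_alignments)}


# Hypothetical toy data on a single genomic sequence.
genes = [Interval("gene1", 100, 900),
         Interval("gene2", 1500, 2400),
         Interval("gene3", 3000, 3800)]
project_ests = [Interval("estA", 850, 1200),
                Interval("estB", 1600, 1900),
                Interval("estC", 3700, 3900)]
public_ests = [Interval("pub1", 1550, 1700)]

hit_by_project = regions_hit(genes, project_ests)
hit_by_public = regions_hit(genes, public_ests)
newly_covered = hit_by_project - hit_by_public
```

Applied to the counts reported for chromosomes 3 and 5, the same set difference gives 923 - 499 = 424 regions (about 46%) that were hit by ESTs from this project but not by ESTs already in the public database.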
3.5. Classification of ESTs with respect to tissue-specific expression

To gain information on the expression specificity of genes identified by EST analysis, the occurrence of non-redundant ESTs in the population of 3'-end ESTs generated from each tissue was counted. The 12,028 non-redundant EST groups were classified into nine categories: the groups identified only in one of the five tissues (five categories), and those identified in two to five tissues (four categories). The percentages of the EST groups classified in each category relative to the total non-redundant EST groups are shown in the pie chart in Fig. 2. Although the population of the 3'-end ESTs in each tissue is not large enough to draw firm conclusions, the proportion of EST groups identified only in one of the five tissues was surprisingly high (68.1%) compared with those identified in multiple tissues. The result implies that the classified EST groups are good sources for finding genes with tissue-specific expression.
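The nine-category classification can be sketched as a simple count over the tissues in which each EST group was observed. The mapping from group to tissues is assumed to come from the clustering step; the group names and toy data below are hypothetical.

```python
from collections import Counter


def categorize(tissues_by_group) -> Counter:
    """Count groups seen in exactly one tissue (by tissue name)
    or in several tissues (by number of tissues)."""
    counts = Counter()
    for group, tissues in tissues_by_group.items():
        if not tissues:
            raise ValueError(f"group {group!r} has no source tissue")
        if len(tissues) == 1:
            (only,) = tissues
            counts[only] += 1
        else:
            counts[f"{len(tissues)} tissues"] += 1
    return counts


# Hypothetical toy input: EST group -> set of tissues it was found in.
toy_groups = {
    "g1": {"roots"},
    "g2": {"roots"},
    "g3": {"flower buds"},
    "g4": {"roots", "green siliques"},
    "g5": {"aboveground organs", "roots", "green siliques"},
}
toy_counts = categorize(toy_groups)
single_tissue_share = 100.0 * sum(
    n for key, n in toy_counts.items() if not key.endswith("tissues")
) / len(toy_groups)
```

A group seen in only one tissue may simply be a rare transcript sampled once, so a high single-tissue share is a starting point for finding tissue-specific genes rather than proof of specificity.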
In Table 4, the identities of genes that are abundantly represented by ESTs in four different tissues (aboveground organs, flower buds, roots and green siliques) are listed. Some genes on this list have been reported to show tissue-specific expression.¹⁶⁻¹⁸ However, five kinds of ribosomal protein genes are seen in the flower bud groups, indicating the necessity of further analysis with larger EST populations.

The EST sequences reported in this paper appear in the GenBank/EMBL/DDBJ databanks with accession numbers AB038710-AB038726, AV439465-AV442830 and AV517879-AV567728.
Acknowledgments: We thank A. Watanabe, T. Wada, N. Nakazaki, K. Naruo, M. Ishikawa, and M. Yamada for excellent technical assistance. This work was supported by the Kazusa DNA Research Institute Foundation.
Table 4. The identity of genes abundantly represented by 3'-end ESTs in aboveground organs, flower buds, roots and green siliques.

Number of clones    Definition of the most similar sequence

Aboveground organs
8     COL2 [Arabidopsis thaliana]
8     strong similarity to Arabidopsis 2A6 (gb|X83096). [Arabidopsis thaliana]
6     ethylene-forming enzyme [Arabidopsis thaliana]
5     unknown
5     involved in starch metabolism [Solanum tuberosum]
5     unknown
5     nitrate reductase NR1 (393 AA) [Arabidopsis thaliana]
5     unknown
5     unknown
4     beta-1,3-glucanase 2 [Arabidopsis thaliana]

Flower buds
19    PsCL18 ribosomal preprotein (AA -49 to 96) [Pisum sativum]
12    unknown
11    similar to Prunus pectinesterase (gb|X95991). [Arabidopsis thaliana]
11    ribosomal protein L30 [Lupinus luteus]
10    acidic ribosomal protein P3a [Zea mays]
8     anther-specific gene product; putative [Brassica campestris]¹⁷
8     putative ribosomal protein [Arabidopsis thaliana]
8     similar to lipid transfer protein [Brassica rapa]¹⁷
7     putative ribosomal protein S16 [Arabidopsis thaliana]
7     NAP16kDa protein [Arabidopsis thaliana]

Roots
28    jasmonate inducible protein isolog [Arabidopsis thaliana]
18    peroxidase ATP11a [Arabidopsis thaliana]
16    cucumisin [Arabidopsis thaliana]
15    cytochrome P450 monooxygenase [Arabidopsis thaliana]
14    peroxidase ATP8a [Arabidopsis thaliana]
13    putative plasma membrane-cell wall linker proteins [Arabidopsis thaliana]¹⁶
12    flavonol synthase [Arabidopsis thaliana]
11    beta-glucosidase [Arabidopsis thaliana]
9     Dr4 [Arabidopsis thaliana]
8     ABC transporter (PDR5-like) isolog [Arabidopsis thaliana]

Green siliques
29    APG protein isolog [Arabidopsis thaliana]
26    12S cruciferin seed storage protein [Arabidopsis thaliana]¹⁸
26    gamma-VPE [Arabidopsis thaliana]
24    thioesterase homolog [Arabidopsis thaliana]
24    dihydroflavonol 4-reductase [Arabidopsis thaliana]
17    putative pectinesterase [Arabidopsis thaliana]
14    germin-like protein [Arabidopsis thaliana]
13    12S storage protein CRB [Arabidopsis thaliana]¹⁸
11    unknown
10    putative protein [Arabidopsis thaliana]
References

1. Meinke, D. W., Cherry, J. M., Dean, C. D., Rounsley, S., and Koornneef, M. 1998, Arabidopsis thaliana: a model plant for genome analysis, Science, 282, 662-682.
2. Lin, X., Kaul, S., Rounsley, S. et al. 1999, Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana, Nature, 402, 761-768.
3. Mayer, K., Schüller, C., Wambutt, R. et al. 1999, Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana, Nature, 402, 769-777.
4. Bevan, M. 1997, Objective: the complete sequence of a plant genome, Plant Cell, 9, 476-478.
5. Asamizu, E., Nakamura, Y., Sato, S., Fukuzawa, H., and Tabata, S. 1999, A large scale structural analysis of cDNAs in a unicellular green alga, Chlamydomonas reinhardtii. I. Generation of 3433 non-redundant expressed sequence tags, DNA Res., 6, 369-373.
6. Asamizu, E., Nakamura, Y., Sato, S., and Tabata, S. 2000, Generation of 7,137 Non-redundant Expressed Sequence Tags from a Legume, Lotus japonicus, DNA Res., 7, 127-130.
7. Nikaido, I., Asamizu, E., Nakajima, M., Nakamura, Y., Saga, N., and Tabata, S. 2000, Construction of a gene catalogue of a marine red alga, Porphyra yezoensis. I. Generation of 10,154 expressed sequence tags, DNA Res., this issue.
8. Hofte, H., Desprez, T., Amselem, J. et al. 1993, An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana, Plant J., 4, 1051-1061.
9. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.
10. Newman, T., de Bruijn, F. J., Green, P. et al. 1994, Genes galore: a summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.
11. Horsch, R. B., Fry, J., Hoffman, N., Neidermeyer, J., Rogers, S. G., and Fraley, R. T. 1988, Plant Molecular Biology Manual, Kluwer Academic Publishers, A5: 1-9.
12. Bonaldo, M. F., Lennon, G., and Soares, M. B. 1996, Normalization and subtraction: two approaches to facilitate gene discovery, Genome Res., 6, 791-806.
13. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.
14. Sato, S., Nakamura, Y., Kaneko, T., Katoh, T., Asamizu, E., and Tabata, S. 2000, Structural Analysis of Arabidopsis thaliana Chromosome 3. I. Sequence Features of the Regions of 4,504,864 bp Covered by Sixty P1 and TAC Clones, DNA Res., 7, 131-135.
15. Sato, S., Nakamura, Y., Kaneko, T. et al. 2000, Structural Analysis of Arabidopsis thaliana Chromosome 5. X. Sequence Features of the Regions of 3,076,755 bp Covered by Sixty P1 and TAC Clones, DNA Res., 7, 31-63.
16. Neuteboom, L. W., Ng, J. M., Kuyper, M., Clijdesdale, O. R., Hooykaas, P. J., and van der Zaal, B. J. 1999, Isolation and characterization of cDNA clones corresponding with mRNAs that accumulate during auxin-induced lateral root formation, Plant Mol. Biol., 39, 273-287.
17. Kim, H. U. and Chung, T. Y. 1997, Characterization of three anther-specific genes isolated from Chinese cabbage, Plant Mol. Biol., 33, 193-198.
18. Parcy, F., Valon, C., Kohara, A., Misera, S., and Giraudat, J. 1997, The ABSCISIC ACID-INSENSITIVE3, FUSCA3, and LEAFY COTYLEDON1 loci act in concert to control multiple aspects of Arabidopsis seed development, Plant Cell, 9, 1265-1277.
... These genome sequencing programs have been complemented by rapid gene discovery from large-scale EST sequencing in Arabidopsis (White et al., 2000;Asamizu et al. 2000b;Seki et al., 2001b;Seki et al., 2002) and rice (Goff, 1999;Ewing et al., 1999). To improve the accuracy of genomic sequence annotation of predicted transcription units and gene products, current efforts are focused on the characterization of full-length complementary DNAs (cDNAs) in Arabidopsis (Seki et al., 2002). ...
... This level of defense-related gene expression is quite high. Indeed other EST studies have shown that a more normal level of defenserelated genes is about 6% of ESTs (Asamizu et al. 2000, Kim et al. 2001. ...
Article
Full-text available
We have characterized the major nectar protein (Nectarin I) from ornamental tobacco as a superoxide dismutase that functions to generate high levels of hydrogen peroxide in nectar. Other nectar functions include an anti-polygalacturonase activity that may be due to a polygalacturonase inhibiting protein (PGIP). We also examined the expression of defense related genes in the nectary gland by two independent methods. We isolated a sample of nectary-expressed cDNAs and found that 21% of these cDNAs were defense related clones. Finally, we examined the expression of a number of specific defense-related genes by hybridization to specific cDNAs. These results demonstrated that a number of specific defense genes were more strongly expressed in the floral nectary than in the foliage. Taken together these results indicate that the floral nectary gland can have specific functions in plant defense.
Article
The xyloglucan Endotransglucosylase/hydrolase (XTH) genes are proposed to encode enzymes responsible for cleaving and reattaching xyloglucan polymers. Despite prior identification of the XTH gene family in Arabidopsis and rice, the XTH family in upland cotton, a tetraploid plant whose fiber cell is an excellent model for the study of plant cell elongation, is yet uncharacterized. In this study, iron tetroxide based magnetic nanobead (Fe 3 O 4 NPs) was successfully prepared and applied to extract xyloglucan endoglucosidase/hydrolase genes. Analysis of the genes can provide insight into the evolutionary significance and function of the XTH gene family. A total of 41 XTH genes found by searching the phytozomev 10 database were classified into three groups based on their phylogeny and the motifs of individual genes. The 25 and 5 GhXTH genes occurred as clusters resulting from the segmental and tandem duplication. More frequent duplication events in cotton contributed to the expansion of the family. Global microarray analysis of GhXTH gene expression in cotton fibers showed that 18 GhXTH genes could be divided into two clusters and four subclusters based on their expression patterns. Accumulated expression levels were relatively high at the elongation stages of the cotton fibers, suggesting that cotton fiber elongation requires high amounts of the GhXTH protein. The expression profiles of GhXTH3 and GhXTH4 showed by quantitative realtime PCR were similar to those determined by microarray. Additionally, the expression levels of GhXTH3 and GhXTH4 in Gossypium barbadense were higher than those in Gossypium hirsutum at developmental stages, indicating that expression levels of GhXTH3 and GhXTH4 in fibers varied among cultivars differing in fiber length.
Article
Full-text available
Mulberry ( Morus alba L.) represents one of the most commonly utilized plants in traditional medicine and as a nutritional plant used worldwide. The polyhydroxylated alkaloid 1-deoxynojirimycin (DNJ) is the major bioactive compounds of mulberry in treating diabetes. However, the DNJ content in mulberry is very low. Therefore, identification of key genes involved in DNJ alkaloid biosynthesis will provide a basis for the further analysis of its biosynthetic pathway and ultimately for the realization of synthetic biological production. Here, two cDNA libraries of mulberry leaf samples with different DNJ contents were constructed. Approximately 16 Gb raw RNA-Seq data was generated and de novo assembled into 112,481 transcripts, with an average length of 766 bp and an N50 value of 1,392. Subsequently, all unigenes were annotated based on nine public databases; 11,318 transcripts were found to be significantly differentially regulated. A total of 38 unique candidate genes were identified as being involved in DNJ alkaloid biosynthesis in mulberry, and nine unique genes had significantly different expression. Three key transcripts of DNJ biosynthesis were identified and further characterized using RT-PCR; they were assigned to lysine decarboxylase and primary-amine oxidase genes. Five CYP450 transcripts and two methyltransferase transcripts were significantly associated with DNJ content. Overall, the biosynthetic pathway of DNJ alkaloid was preliminarily speculated.
Patent
Full-text available
Abstract: Disclosed are constructs comprising sequences encoding 3-hydroxy-3methylglutaryl-Coenzyme A reductase and at least one other sterol synthesis pathway enzyme. Also disclosed are methods for using such constructs to alter sterol production and content in cells, plants, seeds and storage organs of plants. Also provided are oils and compositions containing altered sterol levels produced by use of the disclosed constructs. Novel nucleotide sequences useful in the alteration of sterol production are also provided. Also provided are cells, plants, seeds and storage organs of plants comprising sequences encoding 3-hydroxy-3methylglutaryl-Coenzyme A reductase, at least one other sterol synthesis pathway enzyme and at least one tocopherol synthesis enzyme.
Chapter
Advances in DNA cloning and sequencing technologies have allowed the performance of comprehensive analysis of genetic information in various flowering plants of biological and agronomical importance. Among them, Arabidopsis thaliana, a member of the Brassica family, was chosen as a plant most suitable for genomic sequencing (Goodman et al. 1995; Meinke et al. 1998), because the estimated genome size of 125–130Mb is the smallest among known higher plants and the content of repetitive sequences was assumed to be low. Its short life cycle (average 60 days) and prodigious seed production are the characteristics which make this small plant an ideal model organism in which to analyze metabolism, development, stress responses, and disease resistance in all the flowering plants
Chapter
One of the most epoch-making accomplishments in plant genetics in the 20th century was the completion of genome sequencing of Arabidopsis thaliana (The Arabidopsis-Genome Initiative 2000). As a consequence, an enormous amount of information on gene structures and their functions have been and are still being accumulated in this organism. Nevertheless, other plant species have their own characteristics and advantages for the study of individual biological phenomena. Further, comparison of knowledge from A. thaliana and that from other plant species is a promising approach for obtaining universal knowledge on the genetic systems in all plants.
Chapter
Jasmonates are known as growth regulators, which have cyclopentanone or cyclopentenone ring, and synthesized through lipoxygenase pathway. Jasmonates are widely distributed in the plant kingdom and modulates wounding responses, disease responses, and anther development. Some kinds of jasmonates were shown to have specific effects on plants. 12-oxo-phytodienoic acid (OPDA), a precursor for jasmonic acid (JA) biosynthesis, promotes tendril coiling, and this effect is stronger than MeJA (Falkenstein E., et al., 1991). However, except for the fact, little is known about the OPDA specific function in various physiological events.
Article
Heterologous hybridization was carried out with an Arabidopsis macroarray and cDNAs synthesized from total RNAs of the liverwort Marchantia polymorpha and the moss Physcomitrella patens. Total RNA isolated from A. thaliana plants was also used. The macroarray contained 5,760 Arabidopsis ESTs, corresponding to 4,372 genes. Intra- and inter-filter variations showed less than 2-fold range for almost all of the spots. Genes numbering 1,647 (37.7%), 1,427 (32.6%), and 1,217 (27.8%) had hybridization signals with intensities three-fold greater than that of A DNA (negative control), and were thus defined as expressed in A. thaliana plants, M. polymorpha thalli, and P. patens protonemata, respectively. Seventy-nine percent of the genes expressed in M. polymorpha were also expressed in P. patens. Overall, the three species had 763 expressed genes in common. Twenty-five co-expressed genes were chosen, based on their high expression levels in M. polymorpha, and 17 in P. patens EST clones related to these genes were identified, each showing more than 60% identity with the corresponding A. thaliana gene at the nucleotide level. Three hundred and sixty three genes were detected in bryophytes but not in A. thaliana. Of the 25 highly expressed bryophyte-specific genes, 14 had the P. patens EST homologs with greater than 60% identity to an A. thaliana gene. These results suggest that hybridization of Arabidopsis macroarrays with heterologous cDNA is a useful tool for gene expression profiling of distantly related plant species such as bryophytes.
To isolate and analyze salt-stress inducible genes in a halophyte, sea aster (Aster tripolium L.), we screened 5760 Arabidopsis cDNA clones by a macroarray procedure using 33P-labeled cDNA targets synthesized from mRNAs isolated from NaCl-treated and untreated sea aster seedlings. Seventeen Arabidopsis cDNAs hetero-hybridized to NaCl-inducible sea aster genes. These cDNAs were used as probes to isolate cDNA homologs from a sea aster cDNA library. One of the obtained cDNAs shared 71% amino acid identity with Arabidopsis cysteine protease (AtCysP) and was named SaCysP (sea aster CysP). Northern blot analysis revealed that mRNAs corresponding to both SaCysP and AtCysP were induced by salt and osmotic stress in leaves. On the other hand, SASR21 mRNA, encoding another CysP in sea aster, was unresponsive to these stresses in leaves but responded in roots. SaCysP and SASR21 genes may have tissue-specific functions in the stress response through modulation of their expression levels.
Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130–140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.
To understand genetic information carried in a unicellular green alga, Chlamydomonas reinhardtii, normalized and size-selected cDNA libraries were constructed from cells at photoautotrophic growth, and a total of 11,571 5′-end sequence tags were established. These sequences were grouped into 3433 independent EST species. Similarity search against the public non-redundant protein database indicated that 817 groups showed significant similarity to registered sequences, of which 140 were of previously identified C. reinhardtii genes, but the remaining 2616 species were novel sequences. The coverage of full-length protein coding regions was estimated to be over 60%. These cDNA clones and EST sequence information will provide a powerful source for the future genome-wide functional analysis of uncharacterized genes.
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
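The maximal segment pair (MSP) score mentioned in this abstract is the highest score over all pairs of equal-length, ungapped segments from two sequences. As a minimal sketch, the Python function below computes that score exhaustively by scanning every diagonal with a running maximum. This is not the BLAST heuristic itself, which approximates the result far faster; the function name and the match and mismatch scores of +5 and -4 are assumptions chosen for illustration.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Exhaustively compute the maximal segment pair score of sequences a and b.

    Each diagonal (fixed offset between positions in a and b) is scanned once,
    keeping the best-scoring contiguous ungapped run found on any diagonal.
    """
    best = 0
    # offset = (index in a) - (index in b); one pass per diagonal.
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)
        j = i - offset
        running = 0
        while i < len(a) and j < len(b):
            score = match if a[i] == b[j] else mismatch
            # Reset to zero when the running segment score turns negative.
            running = max(0, running + score)
            best = max(best, running)
            i += 1
            j += 1
    return best


# Two short made-up sequences that share the ungapped segment "ACGT".
example = msp_score("GGACGTGG", "TTACGTTT")
```

The exhaustive scan takes time proportional to the product of the two sequence lengths, which is the cost a faster approximation is meant to avoid in database-scale searches.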
High-throughput automated partial sequencing of anonymous cDNA clones provides a method to survey the repertoire of expressed genes from an organism. Comparison of the coding capacity of these expressed sequence tags (ESTs) with the sequences in the public databases results in assignment of putative function to a significant proportion of the ESTs. Thus, the more than 13,400 plant ESTs that are currently available provide a new resource that will facilitate progress in many areas of plant biology. These opportunities are illustrated by a description of the results obtained from analysis of 1500 Arabidopsis ESTs from a cDNA library prepared from equal portions of poly(A+) mRNA from etiolated seedlings, roots, leaves, and flowering inflorescences. More than 900 different sequences were represented, 32% of which showed significant nucleotide or deduced amino acid sequence similarity to previously characterized genes or proteins from a wide range of organisms. At least 165 of the clones had significant deduced amino acid sequence homology to proteins or gene products that have not been previously characterized from higher plants. A summary of methods for accessing the information and materials generated by the Arabidopsis cDNA sequencing project is provided.
Large-scale sequencing of cDNAs randomly picked from libraries has proven to be a very powerful approach to discover (putatively) expressed sequences that, in turn, once mapped, may greatly expedite the process involved in the identification and cloning of human disease genes. However, the integrity of the data and the pace at which novel sequences can be identified depend to a great extent on the cDNA libraries that are used. In a typical cell, the mRNAs of the prevalent and intermediate frequency classes together comprise as much as 50-65% of the total mRNA mass but represent no more than 1000-2000 different mRNAs. Redundant identification of mRNAs of these two frequency classes is therefore destined to become overwhelming relatively early in any such random gene discovery program, seriously compromising its cost-effectiveness. With the goal of facilitating such efforts, we previously developed a method to construct directionally cloned normalized cDNA libraries and applied it to generate infant brain (INIB) and fetal liver/spleen (INFLS) libraries, from which a total of 45,192 and 86,088 expressed sequence tags, respectively, have been derived. While improving the representation of the longest cDNAs in our libraries, we developed three additional methods to normalize cDNA libraries and generated over 35 libraries, most of which have been contributed to our Integrated Molecular Analysis of Genomes and Their Expression (IMAGE) Consortium and thus distributed widely and used for sequencing and mapping. In an attempt to facilitate the process of gene discovery further, we have also developed a subtractive hybridization approach designed specifically to eliminate (or reduce significantly the representation of) large pools of arrayed and (mostly) sequenced clones from normalized libraries yet to be (or just partly) surveyed.
Here we present a detailed description and a comparative analysis of four methods that we developed and used to generate normalized cDNA libraries from human (15), mouse (3), rat (2), as well as the parasite Schistosoma mansoni (1). In addition, we describe the construction and preliminary characterization of a subtracted liver/spleen library (INFLS-SI) that resulted from the elimination (or reduction of representation) of ~5000 INFLS-IMAGE clones from the INFLS library.
As part of the goal to generate a detailed transcript map for Arabidopsis thaliana, 1152 single run sequences (expressed sequence tags or ESTs) have been determined from cDNA clones taken at random in libraries prepared from different sources of plant material: developing siliques, etiolated seedlings, flower buds, and cultured cells. Eight hundred and ninety-five different genes could be identified, 32% of which showed significant similarity to existing sequences in Arabidopsis and an array of other organisms. These sequences in combination with their positioning on the Arabidopsis genetic map will not only constitute a new set of molecular markers for genome analysis in Arabidopsis but also provide a direct route for the in vivo analysis of their gene products. The sequences have been made available to the public databases.
Nearly 7000 Arabidopsis thaliana expressed sequence tags (ESTs) from 10 cDNA libraries have been sequenced, of which almost 5000 non-redundant tags have been submitted to the EMBL data bank. The quality of the cDNA libraries used is analysed. Similarity searches in international protein data banks have allowed the detection of significant similarities to a wide range of proteins from many organisms. Alignment with ESTs from the rice systematic sequencing project has allowed the detection of amino acid motifs that are conserved between the two organisms, thus identifying tags to genes encoding highly conserved proteins. These genes are candidates for a common framework in genome mapping projects in different plants.
Two cDNA libraries were constructed from poly(A)+ RNAs isolated from immature flowers (buds less than 2.0 mm long) and anthers (buds 2.0-5.0 mm long) of Chinese cabbage (Brassica campestris L. ssp. pekinensis). Using dot-differential hybridization, three cDNA clones, designated BIF38, BAN54, and BAN237, were isolated from the constructed cDNA libraries and sequenced completely in both directions. Northern blot analyses indicate that all three cDNA clones are abundantly expressed in anther, but not in leaf or other floral organs. The deduced amino acid sequences of BIF38, BAN54, and BAN237 showed high identity with those of known anther-specific genes. In particular, the deduced amino acid sequence of BIF38 has 98% identity with that of a phospholipid protein gene (E2) from Brassica napus. The deduced amino acid sequences of BAN54 and BAN237 are similar to the sequences of microspore-specific genes (Bp4A and Bp4C) and pollen oleosins (13, pol3 and C98), respectively. Southern blot analyses revealed that all three genes belong to multiple gene families in the Chinese cabbage genome.