
A Large Scale Analysis of cDNA in Arabidopsis thaliana: Generation of 12,028 Non-redundant Expressed Sequence Tags from Normalized and Size-selected cDNA Libraries

July 2000
DNA Research 7(3)


DOI:10.1093/dnares/7.3.175

Authors:

Erika Asamizu

Ryukoku University

Yasukazu Nakamura

National Institute of Genetics

Shusei Sato

Tohoku University

For comprehensive analysis of genes expressed in the model dicotyledonous plant Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, and a total of 14,026 5′-end ESTs and 39,207 3′-end ESTs were obtained. The 3′-end ESTs could be clustered into 12,028 non-redundant groups. A similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using approximately 10 Mb of annotated genomic sequence from chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, of which only 499 were hit by ESTs deposited in the public database. The result indicates that the ESTs generated in this project complement the EST data in the public database and facilitate new gene discovery. The EST sequence data of individual cDNA clones are available at the web site: http://www.kazusa.or.jp/en/plant/arabi/EST/.


DNA RESEARCH 7, 175-180 (2000)


Erika ASAMIZU, Yasukazu NAKAMURA, Shusei SATO, and Satoshi TABATA*

Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan

(Received 19 May 2000)


Key words: Arabidopsis thaliana; cDNA; EST

1. Introduction

Arabidopsis thaliana has been adopted as a model organism in the study of plant biology since it has the advantages of small size, short generation time, and ease of transformation.1 Because the A. thaliana genome is the smallest among known higher plant species (130-140 Mb),2,3 a genome sequencing project for this plant is underway as a joint project of Japan, Europe, and the United States.4 To date, two of the five chromosomes (chromosomes 2 and 4) have been sequenced except for the nucleolar organizer regions and centromeres,2,3 and sequencing of the remaining three chromosomes is near completion.

Under these circumstances, accurate assignment of protein-coding regions on the genomic sequence gains importance as the logical next step. In this respect, information on cDNA structure is essential. Also, comprehensive analysis of cDNA sequences is an effective way to catalogue the genes expressed in an organism with a large genome. A large number of EST (expressed sequence tag) sequences from several crop plants have been deposited in the public database, dbEST (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html). In addition, EST accumulation for several model plants has been initiated.5-7 For A. thaliana, more than 45,000 EST sequences have been deposited in dbEST, including sequences from large-scale EST projects promoted by two consortia of laboratories, one in France and the other in the United States.8-10 The French program generated sequence data from ten kinds of cDNA libraries prepared from different tissues, organs and developmental stages, and deposited the 5'- and 3'-end sequences of approximately 6,000 non-redundant clones in dbEST.8,9 The U.S. group produced 31,000 ESTs, mainly from a single library made from a mixture of mRNAs from four different tissues.10 Together, these EST clones cover approximately 34% of the predicted genes on chromosome 4.3

To complement the EST data currently available in the public database and to facilitate new gene discovery, we constructed normalized and size-selected cDNA libraries from five different tissues of A. thaliana and accumulated 5'-end and 3'-end ESTs.

Communicated by Mituru Takanami
* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934, E-mail: tabata@kazusa.or.jp


Figure 1. Size distribution of the inserts in analyzed cDNA clones from the normalized (solid bars) and size-selected libraries (gray bars) of aboveground organs. [Bar chart: x axis, insert size (kb) in 0.5-kb bins from 0.5-<1.0 to 4.5-<5.0, plus >5.0; y axis scale 0-2500.]
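Figure 1 groups insert sizes into 0.5-kb bins, and Section 3.1 summarizes it as the share of clones within a size range and the mean insert length. The Python sketch below shows one way to compute such a summary from a list of insert sizes. The helper names, the half-open bins (so that "0.5 to 1.5 kb" is read as 0.5 <= size < 1.5, the union of the first two bins) and the handling of inserts below 0.5 kb are assumptions made for illustration, not details given in the paper.

```python
import math
from collections import Counter
from statistics import mean


def bin_label(size_kb):
    """Return a Figure 1 style bin label for an insert size in kb.

    Bins are assumed to be half-open 0.5-kb intervals (e.g. 1.0 <= size < 1.5),
    with everything of 5.0 kb or more pooled into a single ">5.0" bin.
    """
    if size_kb < 0.5:
        return "<0.5"  # hypothetical bin: Figure 1 starts at 0.5 kb
    if size_kb >= 5.0:
        return ">5.0"
    lower = math.floor(size_kb * 2) / 2  # round down to a 0.5-kb boundary
    return f"{lower:.1f}-<{lower + 0.5:.1f}"


def size_histogram(sizes_kb):
    """Count inserts per bin."""
    return Counter(bin_label(size) for size in sizes_kb)


def percent_where(sizes_kb, predicate):
    """Percentage of inserts for which predicate(size_kb) is true."""
    sizes = list(sizes_kb)
    if not sizes:
        raise ValueError("no insert sizes given")
    return 100.0 * sum(1 for size in sizes if predicate(size)) / len(sizes)


# Toy data (not from the paper): eight insert sizes in kb.
sizes = [0.7, 1.1, 1.3, 2.6, 3.2, 0.9, 5.4, 1.6]
print(size_histogram(sizes))
print(percent_where(sizes, lambda s: 0.5 <= s < 1.5))  # 4 of 8 -> 50.0
print(percent_where(sizes, lambda s: s > 2.5))         # "longer than 2.5 kb"
print(round(mean(sizes), 2))                           # average insert length
```

Applied to the clones of each library, the same two calls would give the kind of figures quoted in Section 3.1 (the share of inserts in a size range and the average insert length).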

2. Materials and Methods

2.1. Preparation of tissues

The Arabidopsis thaliana Columbia accession was used for the analysis and was grown in soil under a 16-hr photoperiod at 22°C. Aboveground organs were harvested from [...] to 6-week-old plants. Flower buds and green siliques were also collected from the soil-grown plants. For liquid culture, sterile seeds were sown in medium [1/2 B5 medium,11 1/1000 HYPONeX (Hyponex Japan), 1% sucrose, pH adjusted to 5.7] and grown under continuous light at 22°C with rotation for 2 weeks. Seedlings and roots were collected from the liquid-cultured plants.

2.2. Isolation of poly(A)+ RNA and construction of cDNA libraries

Total RNA was extracted from aboveground organs, flower buds, roots, and liquid-cultured seedlings by the guanidinium thiocyanate/CsCl ultracentrifugation method, and from green siliques by the SDS/phenol method as described previously.5,7 Purification of poly(A)+ RNA, conversion to cDNA, and size selection of cDNA were performed as described.5 Normalization was performed for the library containing 0.5- to 3-kb fragments as described.5,12 The names of the cDNA libraries refer to the tissue used for construction: AP, aboveground organs; FB, flower buds; RZ, roots; SQ, green siliques; pAZNII, liquid-cultured seedlings.

2.3. Template preparation and sequencing

For all of the 5'-end sequences, as well as some of the 3'-end sequences, from the AP and RZ libraries, PCR-amplified fragments were used as templates. Vector-derived sequences were used as primers (5'-TGTGCTGCAAGGCGATTAAGTTGGG-3' and 5'-TCATTAGGCACCCCAGGCTTTACAC-3'), and PCR was performed with Taq DNA polymerase (TaKaRa, Japan) on a Perkin-Elmer 9600 Thermal Cycler: 30 cycles of 10 sec at 98°C and 6 min at 68°C, followed by a final extension for 10 min at 72°C. The amplified products were precipitated by adding 20% PEG6000 in 2.5 M NaCl to 1/3 of the final volume. Plasmid DNA was used as the template for the rest of the 5'- and 3'-end sequences. Plasmid DNA preparation and insert size determination of each clone were performed as described.5 Sequencing reactions were performed with the Dye Terminator, dRhodamine Terminator, and BigDye Terminator Cycle Sequencing Ready Reaction Kits (PE Applied Biosystems, USA), and the products were electrophoresed on automated DNA sequencers (ABI PRISM 373 and 377XL, PE Applied Biosystems, USA).

2.4. Sequence data analysis

Only the 3'-end sequences were subjected to data analysis. Vector-derived and ambiguous sequences were removed from the collected EST sequences prior to the computer-aided analyses. Each sequence was translated into amino acid sequences in six frames and subjected to a similarity search against the non-redundant protein database provided by NCBI using the BLAST algorithm.13 Similarity between a deduced amino acid sequence and a known sequence was judged to be significant when the P-value was less than 1.0e-14. To identify the number of independent EST species, the EST sequences were clustered: the 3'-end sequences were compared against themselves using the BLASTN program, and clones that showed over 95% identity over more than 50 bp were included in the same group.

Table 1. Number of 5'-end and 3'-end ESTs generated from cDNA libraries of five different tissues.

Tissue: Aboveground organs; Flower buds; Roots; Green siliques; Liquid-cultured seedlings
Library type: Normalized; Size-selected; Normalized; Size-selected; Normalized; Size-selected; Normalized
Number of 5'-end ESTs: 4996; 2172; 5798; 245; 815; Total 14026
Number of 3'-end ESTs: 6863; 1753; 5827; 8505; 3161; 11843; 909; 346; Total 39207
ND: Not determined.

Table 2. Classification of 3'-end ESTs by similarity search against the non-redundant protein database.

Similarity                    Number of clones   Number of non-redundant ESTs
Genes of known function(a)    24892              4816
Hypothetical genes(b)         5071               1864
No similarity(c)              9244               5348
Total                         39207              12028

a) showed similarity to genes of known function; b) showed similarity to hypothetical genes that have no definition of function; c) showed no similarity.

3. Results and Discussion

3.1. Quality of cDNA libraries

The size distribution of the cDNA inserts was analyzed for clones from the cDNA libraries of aboveground organs. As shown in Fig. 1, 72.3% of the clones from the normalized library contained inserts of 0.5 to 1.5 kb, while 96.0% of the clones from the size-selected library had inserts longer than 2.5 kb. The average insert length of the clones in the normalized library was 1.28 kb, whereas that of the size-selected library was 3.17 kb. The size selection procedure was therefore effective for generating long cDNA species.

The quality of the libraries with respect to the intactness of the cDNA was assessed by comparing the 5'-end sequences with known protein sequences. Of 116 clones randomly chosen from the normalized library and 122 clones from the size-selected library, 74 (63.8%) and 85 (69.7%), respectively, were found to contain the translation initiation codon, indicating that roughly two-thirds of the cDNA clones in both libraries are full-length. The two libraries therefore contain an abundance of intact cDNA species of shorter and longer sizes, respectively. However, we assessed quality only for the libraries prepared from aboveground organs, and the quality may differ among libraries from different tissues.

3.2. Generation of ESTs

cDNA clones were randomly chosen from the constructed cDNA libraries, and a total of 14,026 clones were sequenced from the 5'-ends and 39,207 clones from the 3'-ends. The numbers of ESTs generated from the respective libraries are summarized in Table 1. The GC content of 659 randomly selected ESTs (279,604 bases) was estimated to be 43.4%.
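The GC content reported above is the proportion of G and C among the sequenced bases. A minimal Python sketch of that calculation follows; the helper name `gc_content` and the choice to leave ambiguous bases such as N out of both counts are assumptions, since the paper does not say how such bases were handled.

```python
def gc_content(sequences):
    """Return the percentage of G+C among unambiguous A/C/G/T bases,
    pooled over all sequences (hypothetical helper for illustration)."""
    gc = 0
    acgt = 0
    for seq in sequences:
        for base in seq.upper():
            if base in "GC":
                gc += 1
                acgt += 1
            elif base in "AT":
                acgt += 1
            # Any other symbol (e.g. N) is ignored in both counts.
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / acgt


# Toy usage: 4 G/C bases out of 10 unambiguous bases.
print(gc_content(["ATGCAT", "GCTA", "NN"]))  # 40.0
```

For scale, 43.4% of 279,604 bases corresponds to roughly 121,000 G or C bases.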

To identify the number of independent EST species, the 3'-EST sequences were clustered. As a result, the 39,207 3'-EST sequences were grouped into 12,028 independent groups. This number is expected to be close to the actual number of gene species represented by the ESTs. However, a more accurate number of independent gene species should be obtained by allocating the EST sequences on the genome, because the criterion used for clustering was not very stringent (95% identity over 50 bp).
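The clustering rule described in Sections 2.4 and 3.2 (over 95% identity over more than 50 bp) can be read as a graph problem: ESTs are nodes, qualifying BLASTN hits are edges, and groups are connected components. The sketch below assumes transitive (single-linkage) grouping implemented with a union-find structure, which the paper does not state explicitly, and takes hits as (query, subject, percent identity, alignment length) tuples such as the first four columns of BLAST tabular output. Function and variable names are illustrative.

```python
from collections import defaultdict


def cluster_ests(est_ids, hits, min_identity=95.0, min_length=50):
    """Group ESTs by single linkage over qualifying pairwise hits.

    est_ids: iterable of EST identifiers (every id used in hits must appear here).
    hits: iterable of (query_id, subject_id, percent_identity, aligned_length)
          tuples, e.g. from an all-against-all BLASTN comparison.
    Two ESTs are linked when identity > min_identity and
    aligned_length > min_length; linked ESTs share a group, transitively.
    Returns a sorted list of sorted member lists.
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, subject, identity, length in hits:
        if query == subject:
            continue  # self-hits carry no grouping information
        if identity > min_identity and length > min_length:
            union(query, subject)

    groups = defaultdict(list)
    for est in parent:
        groups[find(est)].append(est)
    return sorted(sorted(members) for members in groups.values())


ests = ["e1", "e2", "e3", "e4", "e5"]
hits = [
    ("e1", "e2", 98.5, 320),  # passes both thresholds: e1-e2 linked
    ("e2", "e3", 96.0, 75),   # passes: e3 joins the e1/e2 group
    ("e4", "e5", 99.0, 40),   # alignment too short: no link
    ("e3", "e4", 90.0, 400),  # identity too low: no link
]
print(cluster_ests(ests, hits))  # [['e1', 'e2', 'e3'], ['e4'], ['e5']]
```

Under this single-linkage assumption, a chain of pairwise matches can merge ESTs that never align directly with each other, which is one way a permissive criterion can pull sequences from different genes into one group.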

3.3. Sequence similarity of ESTs

When the non-redundant EST groups deduced from the 3'-end ESTs were searched for similarity against the non-redundant protein database, 6680 groups had significant similarity to registered sequences and the remaining 5348 groups were novel. Of the 6680 EST groups with significant similarity, 4816 showed similarity to genes with known function and the remaining 1864 to hypothetical genes, which have no functional definition and were largely predicted from the A. thaliana genome sequences (Table 2). Genes whose functions could be predicted from the similarity search were classified according to their biological roles or biochemical functions,2 as shown in Table 3. The search results for each individual clone are available at the web site, http://www.kazusa.or.jp/en/plant/arabi/EST/.
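The three-way split in Table 2 can be expressed as a decision rule over each EST group's best database hit. The sketch below assumes the six-frame BLAST search has already been run and that each group has been reduced to its top hit as a (P-value, definition line) pair, or None when nothing was returned. The keyword test used to recognise hypothetical genes is a hypothetical stand-in: the paper only says that such genes have no definition of function.

```python
from collections import Counter

P_VALUE_CUTOFF = 1.0e-14  # a hit counts only if P < 1.0e-14 (Section 2.4)

# Hypothetical keywords for "no definition of function"; the paper does not
# say how hypothetical genes were recognised, so this list is illustrative.
NO_FUNCTION_KEYWORDS = ("hypothetical", "unknown")


def classify_group(best_hit):
    """Assign one EST group to a Table 2 category.

    best_hit: None if the search returned nothing, otherwise a
    (p_value, definition_line) tuple for the group's top protein hit.
    """
    if best_hit is None:
        return "no similarity"
    p_value, definition = best_hit
    if not p_value < P_VALUE_CUTOFF:
        return "no similarity"  # a hit exists but is not significant
    text = definition.lower()
    if any(word in text for word in NO_FUNCTION_KEYWORDS):
        return "hypothetical"
    return "known function"


def summarize(best_hits):
    """Count EST groups per category."""
    return Counter(classify_group(hit) for hit in best_hits)


example = [
    (1e-40, "nitrate reductase NR1 [Arabidopsis thaliana]"),
    (3e-20, "hypothetical protein [Arabidopsis thaliana]"),
    (1e-5, "peroxidase [Arabidopsis thaliana]"),  # not significant
    None,                                          # no hit at all
]
print(summarize(example))
```

On the real data the three counts add up to the number of groups searched; for Table 2, 4816 + 1864 + 5348 = 12,028.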

3.4. Estimation of gene coverage by ESTs

Gene coverage by the non-redundant EST groups was investigated using the annotated genomic sequences of chromosome 3 (ref. 14) and chromosome 5 (ref. 15), 10,009,832 bp in total length (http://www.kazusa.or.jp/kaos/). The sequences analyzed were those of 1 P1 clone on chromosome 3 and 106 P1 and TAC clones on chromosome 5. Along these genomic sequences, 2324 regions have been assigned as potential protein-coding genes. The analysis indicated that 788 of these regions were hit by at least one EST group. In addition, 135 EST groups could be located in regions where no gene had been assigned. When gene coverage by the ESTs deposited in the public database was examined, only 499 of the 923 regions hit by our EST groups (788 gene regions plus 135 unassigned loci) were found to be hit.

We also analyzed gene coverage by mapping the EST groups on the completed sequence of A. thaliana chromosome 2, on which 4037 genes have been assigned.2 As a result, 1775 EST groups were allocated on the genomic sequence, of which 626 groups were found to have similar sequences among the registered ESTs. Although gene coverage data for different chromosomal sequences cannot be compared directly, the data clearly indicate that the non-redundant ESTs generated in this project contain many new cDNA species.

Figure 2. The proportions of EST groups identified only in one of the five tissues and those identified in two to five tissues. The proportion of each category is given as a percentage and indicated by color codes. [Pie chart legend: Aboveground organs 15.3%; Flower buds 8.1%; Roots 21.2%; Green siliques 22.8%; Liquid-cultured seedlings 0.7%; 2 tissues 19.4%; 3 tissues 8.4%; 4 tissues 3.8%; 5 tissues 0.3%.]

Table 3. Classification of the non-redundant EST groups with similarity to known protein genes by their functional categories.

Functional categories                                          Number of non-redundant groups
Energy metabolism                                              538
Regulatory functions                                           483
Cellular structure, organization and biogenesis                416
Protein fate                                                   369
Signal transduction                                            360
Protein synthesis                                              328
Transport and binding proteins                                 319
Cellular processes                                             218
Secondary metabolism                                           209
Growth and development                                         184
Fatty acid and phospholipid metabolism                         132
Amino-acid biosynthesis                                        125
Environmental response                                         113
Pathogen responses
DNA metabolism
General transcription
Purines, pyrimidines, nucleosides, and nucleotides
Central intermediary metabolism
Biosynthesis of cofactors, prosthetic groups, and carriers
Other categories
Unclassified                                                   562
Total                                                          4816
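The coverage figures in Section 3.4 combine two counts: annotated gene regions overlapped by at least one EST group (788) and EST groups that fall outside any annotated gene (135), giving 923 regions in total. The sketch below reproduces that bookkeeping on toy coordinates. It assumes each EST group has already been mapped to a closed (start, end) interval on the same genomic sequence as the genes, and it ignores strand; the paper does not describe the mapping method. Each unassigned EST group is counted as its own region, which matches 788 + 135 = 923. All function names are hypothetical.

```python
def overlaps(a, b):
    """True when closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def regions_hit(genes, ests):
    """Return (hit_gene_indices, orphan_ests).

    hit_gene_indices: set of indices of gene regions overlapped by >= 1 EST.
    orphan_ests: EST intervals that overlap no annotated gene region.
    A simple quadratic scan; fine for illustration, slow for whole genomes.
    """
    hit = set()
    orphans = []
    for est in ests:
        found = False
        for i, gene in enumerate(genes):
            if overlaps(gene, est):
                hit.add(i)
                found = True
        if not found:
            orphans.append(est)
    return hit, orphans


def public_overlap(genes, our_ests, public_ests):
    """Of the regions hit by our ESTs (gene regions plus orphan loci),
    count how many are also hit by at least one public EST.
    Returns (shared, total)."""
    our_genes, our_orphans = regions_hit(genes, our_ests)
    public_genes, _ = regions_hit(genes, public_ests)
    shared = len(our_genes & public_genes)
    shared += sum(1 for orphan in our_orphans
                  if any(overlaps(orphan, p) for p in public_ests))
    total = len(our_genes) + len(our_orphans)
    return shared, total


genes = [(100, 900), (1500, 2600), (4000, 5200)]
ours = [(850, 1200), (2000, 2400), (3000, 3300)]  # last one hits no gene
public = [(150, 400), (3100, 3150)]
print(public_overlap(genes, ours, public))  # (2, 3)
```

With the paper's data, the returned pair would correspond to 499 shared regions out of 923.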

3.5. Classification of ESTs with respect to tissue-specific expression

To gain information on the expression specificity of the genes identified by EST analysis, the occurrence of each non-redundant EST group in the population of 3'-end ESTs generated from each tissue was counted. The 12,028 non-redundant EST groups were classified into nine categories: groups identified in only one of the five tissues (five categories) and those identified in two, three, four or five tissues (four categories). The percentage of the total non-redundant EST groups falling into each category is shown in the pie chart in Fig. 2. Although the population of 3'-end ESTs from each tissue is not large enough to draw firm conclusions, the proportion of EST groups identified in only one of the five tissues was surprisingly high (68.1%) compared with those identified in multiple tissues. This result implies that these EST groups are good sources for finding genes with tissue-specific expression.

Table 4 lists the identities of genes that are abundantly represented by ESTs in four different tissues (aboveground organs, flower buds, roots and green siliques). Some genes in this list have been reported to show tissue-specific expression.16-18 However, five kinds of ribosomal protein genes are seen in the flower bud group, indicating the need for further analysis with larger EST populations.
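The nine categories of Fig. 2 depend only on the set of tissues in which each non-redundant group has at least one 3'-end EST. The sketch below assumes that set has already been computed for every group (for instance from each EST's library of origin); the function names and data layout are illustrative, not taken from the paper.

```python
from collections import Counter

TISSUES = ("aboveground organs", "flower buds", "roots",
           "green siliques", "liquid-cultured seedlings")


def tissue_category(tissues_seen):
    """Map the set of tissues where a group was observed to a Fig. 2 category:
    the tissue name if it was seen in one tissue only, otherwise "<n> tissues"."""
    unknown = set(tissues_seen) - set(TISSUES)
    if unknown:
        raise ValueError(f"unexpected tissue names: {sorted(unknown)}")
    n = len(tissues_seen)
    if n == 0:
        raise ValueError("group not observed in any tissue")
    if n == 1:
        (only,) = tissues_seen
        return only
    return f"{n} tissues"


def category_percentages(groups):
    """groups: dict mapping group id -> set of tissue names.
    Returns a dict mapping category -> percentage of all groups."""
    counts = Counter(tissue_category(seen) for seen in groups.values())
    total = len(groups)
    return {category: 100.0 * n / total for category, n in counts.items()}


groups = {
    "g1": {"roots"},
    "g2": {"roots"},
    "g3": {"flower buds", "roots"},
    "g4": {"green siliques"},
}
pct = category_percentages(groups)
single_tissue = sum(v for k, v in pct.items() if k in TISSUES)
print(pct)
print(single_tissue)  # 75.0
```

For the real data, the five single-tissue slices in Fig. 2 sum to 15.3 + 8.1 + 21.2 + 22.8 + 0.7 = 68.1%, the figure quoted in the text, and all nine slices sum to 100.0%.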

The EST sequences reported in this paper appear in the GenBank/EMBL/DDBJ databanks with accession numbers AB038710-AB038726, AV439465-AV442830 and AV517879-AV567728.

Acknowledgments: We thank Watanabe, Wada, Nakazaki, Naruo, Ishikawa, and M. Yamada for excellent technical assistance. This work was supported by the Kazusa DNA Research Institute Foundation.


Table 4. The identity of genes abundantly represented by 3'-end ESTs in aboveground organs, flower buds, roots and green siliques.

Number of clones   Definition of the most similar sequence

Aboveground organs
8    COL2 [Arabidopsis thaliana]
8    strong similarity to Arabidopsis 2A6 (gb|X83096). [Arabidopsis thaliana]
6    ethylene-forming enzyme [Arabidopsis thaliana]
5    unknown
5    involved in starch metabolism [Solanum tuberosum]
5    unknown
5    nitrate reductase NR1 (393 AA) [Arabidopsis thaliana]
5    unknown
4    beta-1,3-glucanase 2 [Arabidopsis thaliana]

Flower buds
19   PsCL1 ribosomal preprotein (AA -49 to 96) [Pisum sativum]
12   unknown
11   similar to Prunus pectinesterase (gb|X95991). [Arabidopsis thaliana]
11   ribosomal protein L30 [Lupinus luteus]
10   acidic ribosomal protein P3a [Zea mays]
8    anther-specific gene product; putative [Brassica campestris]17
8    putative ribosomal protein [Arabidopsis thaliana]
8    similar to lipid transfer protein [Brassica rapa]17
7    putative ribosomal protein S16 [Arabidopsis thaliana]
7    NAP16kDa protein [Arabidopsis thaliana]

Roots
28   jasmonate inducible protein isolog [Arabidopsis thaliana]
18   peroxidase ATP11a [Arabidopsis thaliana]
16   cucumisin [Arabidopsis thaliana]
15   cytochrome P450 monooxygenase [Arabidopsis thaliana]
14   peroxidase ATP8a [Arabidopsis thaliana]
13   putative plasma membrane-cell wall linker proteins [Arabidopsis thaliana]16
12   flavonol synthase [Arabidopsis thaliana]
11   beta-glucosidase [Arabidopsis thaliana]
9    Dr4 [Arabidopsis thaliana]
8    ABC transporter (PDR5-like) isolog [Arabidopsis thaliana]

Green siliques
29   APG protein isolog [Arabidopsis thaliana]
26   12S cruciferin seed storage protein [Arabidopsis thaliana]18
26   gamma-VPE [Arabidopsis thaliana]
24   thioesterase homolog [Arabidopsis thaliana]
24   dihydroflavonol 4-reductase [Arabidopsis thaliana]
17   putative pectinesterase [Arabidopsis thaliana]
14   germin-like protein [Arabidopsis thaliana]
13   12S storage protein CRB [Arabidopsis thaliana]18
11   unknown
10   putative protein [Arabidopsis thaliana]

References

1. Meinke, D. W., Cherry, J. M., Dean, C., Rounsley, S. D., and Koornneef, M. 1998, Arabidopsis thaliana: a model plant for genome analysis, Science, 282, 662-682.
2. Lin, X., Kaul, S., Rounsley, S. et al. 1999, Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana, Nature, 402, 761-768.
3. Mayer, K., Schüller, C., Wambutt, R. et al. 1999, Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana, Nature, 402, 769-777.
4. Bevan, M. 1997, Objective: the complete sequence of a plant genome, Plant Cell, 9, 476-478.
5. Asamizu, E., Nakamura, Y., Sato, S., Fukuzawa, H., and Tabata, S. 1999, A large scale structural analysis of cDNAs in a unicellular green alga, Chlamydomonas reinhardtii. Generation of 3433 non-redundant expressed sequence tags, DNA Res., 6, 369-373.
6. Asamizu, E., Nakamura, Y., Sato, S., and Tabata, S. 2000, Generation of 7,137 non-redundant expressed sequence tags from a legume, Lotus japonicus, DNA Res., 7, 127-130.
7. Nikaido, I., Asamizu, E., Nakajima, M., Nakamura, Y., Saga, N., and Tabata, S. 2000, Construction of a gene catalogue of a marine red alga, Porphyra yezoensis. Generation of 10,154 expressed sequence tags, DNA Res., this issue.
8. Hofte, H., Desprez, T., Amselem, J. et al. 1993, An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana, Plant J., 4, 1051-1061.
9. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.
10. Newman, T., de Bruijn, F. J., Green, P. et al. 1994, Genes galore: a summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.
11. Horsch, R. B., Fry, J., Hoffman, N., Neidermeyer, J., Rogers, S. G., and Fraley, R. T. 1988, Plant Molecular Biology Manual, Kluwer Academic Publishers, A5: 1-9.
12. Bonaldo, M. F., Lennon, G., and Soares, M. B. 1996, Normalization and subtraction: two approaches to facilitate gene discovery, Genome Res., 6, 791-806.
13. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.
14. Sato, S., Nakamura, Y., Kaneko, T., Katoh, T., Asamizu, E., and Tabata, S. 2000, Structural analysis of Arabidopsis thaliana chromosome 3. I. Sequence features of the regions of 4,504,864 bp covered by sixty P1 and TAC clones, DNA Res., 7, 131-135.
15. Sato, S., Nakamura, Y., Kaneko, T. et al. 2000, Structural analysis of Arabidopsis thaliana chromosome 5. X. Sequence features of the regions of 3,076,755 bp covered by sixty P1 and TAC clones, DNA Res., 7, 31-63.
16. Neuteboom, L. W., Ng, J. M., Kuyper, M., Clijdesdale, O. R., Hooykaas, P. J., and van der Zaal, B. J. 1999, Isolation and characterization of cDNA clones corresponding with mRNAs that accumulate during auxin-induced lateral root formation, Plant Mol. Biol., 39, 273-287.
17. Kim, H. U. and Chung, T. Y. 1997, Characterization of three anther-specific genes isolated from Chinese cabbage, Plant Mol. Biol., 33, 193-198.
18. Parcy, F., Valon, C., Kohara, A., Misera, S., and Giraudat, J. 1997, The ABSCISIC ACID-INSENSITIVE3, FUSCA3, and LEAFY COTYLEDON1 loci act in concert to control multiple aspects of Arabidopsis seed development, Plant Cell, 9, 1265-1277.

by guest on June 1, 2013http://dnaresearch.oxfordjournals.org/Downloaded from

Arabidopsis thaliana mRNA for centromere protein C homolog, partial cds, clone:SQ114h08F

Nucleotide Sequence

December 2003

Y. Ogura

Functional Genomics of Plant Abiotic Stress Tolerance

Chapter

Apr 2003

John Cushman

A major function of the tobacco floral nectary is defense against microbial attack

Article

Full-text available

May 2003

We have characterized the major nectar protein (Nectarin I) from ornamental tobacco as a superoxide dismutase that functions to generate high levels of hydrogen peroxide in nectar. Other nectar functions include an anti-polygalacturonase activity that may be due to a polygalacturonase inhibiting protein (PGIP). We also examined the expression of defense related genes in the nectary gland by two independent methods. We isolated a sample of nectary-expressed cDNAs and found that 21% of these cDNAs were defense related clones. Finally, we examined the expression of a number of specific defense-related genes by hybridization to specific cDNAs. These results demonstrated that a number of specific defense genes were more strongly expressed in the floral nectary than in the foliage. Taken together these results indicate that the floral nectary gland can have specific functions in plant defense.

Magnetic Bead Adsorption Extraction of Xyloglucan Endoglucosidase/Hydrolase Gene and Its Expression Analysis in Land Cotton

Article

Aug 2021

The xyloglucan Endotransglucosylase/hydrolase (XTH) genes are proposed to encode enzymes responsible for cleaving and reattaching xyloglucan polymers. Despite prior identification of the XTH gene family in Arabidopsis and rice, the XTH family in upland cotton, a tetraploid plant whose fiber cell is an excellent model for the study of plant cell elongation, is yet uncharacterized. In this study, iron tetroxide based magnetic nanobead (Fe 3 O 4 NPs) was successfully prepared and applied to extract xyloglucan endoglucosidase/hydrolase genes. Analysis of the genes can provide insight into the evolutionary significance and function of the XTH gene family. A total of 41 XTH genes found by searching the phytozomev 10 database were classified into three groups based on their phylogeny and the motifs of individual genes. The 25 and 5 GhXTH genes occurred as clusters resulting from the segmental and tandem duplication. More frequent duplication events in cotton contributed to the expansion of the family. Global microarray analysis of GhXTH gene expression in cotton fibers showed that 18 GhXTH genes could be divided into two clusters and four subclusters based on their expression patterns. Accumulated expression levels were relatively high at the elongation stages of the cotton fibers, suggesting that cotton fiber elongation requires high amounts of the GhXTH protein. The expression profiles of GhXTH3 and GhXTH4 showed by quantitative realtime PCR were similar to those determined by microarray. Additionally, the expression levels of GhXTH3 and GhXTH4 in Gossypium barbadense were higher than those in Gossypium hirsutum at developmental stages, indicating that expression levels of GhXTH3 and GhXTH4 in fibers varied among cultivars differing in fiber length.

Transcriptome analysis and identification of key genes involved in 1-deoxynojirimycin biosynthesis of mulberry ( Morus alba L.)

Article

Full-text available

Aug 2018

Mulberry ( Morus alba L.) represents one of the most commonly utilized plants in traditional medicine and as a nutritional plant used worldwide. The polyhydroxylated alkaloid 1-deoxynojirimycin (DNJ) is the major bioactive compounds of mulberry in treating diabetes. However, the DNJ content in mulberry is very low. Therefore, identification of key genes involved in DNJ alkaloid biosynthesis will provide a basis for the further analysis of its biosynthetic pathway and ultimately for the realization of synthetic biological production. Here, two cDNA libraries of mulberry leaf samples with different DNJ contents were constructed. Approximately 16 Gb raw RNA-Seq data was generated and de novo assembled into 112,481 transcripts, with an average length of 766 bp and an N50 value of 1,392. Subsequently, all unigenes were annotated based on nine public databases; 11,318 transcripts were found to be significantly differentially regulated. A total of 38 unique candidate genes were identified as being involved in DNJ alkaloid biosynthesis in mulberry, and nine unique genes had significantly different expression. Three key transcripts of DNJ biosynthesis were identified and further characterized using RT-PCR; they were assigned to lysine decarboxylase and primary-amine oxidase genes. Five CYP450 transcripts and two methyltransferase transcripts were significantly associated with DNJ content. Overall, the biosynthetic pathway of DNJ alkaloid was preliminarily speculated.

Transgenic plants containing altered levels of steroid compounds

Patent

Full-text available

Mar 2011

Abstract: Disclosed are constructs comprising sequences encoding 3-hydroxy-3methylglutaryl-Coenzyme A reductase and at least one other sterol synthesis pathway enzyme. Also disclosed are methods for using such constructs to alter sterol production and content in cells, plants, seeds and storage organs of plants. Also provided are oils and compositions containing altered sterol levels produced by use of the disclosed constructs. Novel nucleotide sequences useful in the alteration of sterol production are also provided. Also provided are cells, plants, seeds and storage organs of plants comprising sequences encoding 3-hydroxy-3methylglutaryl-Coenzyme A reductase, at least one other sterol synthesis pathway enzyme and at least one tocopherol synthesis enzyme.

Genome Analysis of a Flowering Plant, Arabidopsis thaliana

Chapter

Jan 2003

Advances in DNA cloning and sequencing technologies have allowed the performance of comprehensive analysis of genetic information in various flowering plants of biological and agronomical importance. Among them, Arabidopsis thaliana, a member of the Brassica family, was chosen as a plant most suitable for genomic sequencing (Goodman et al. 1995; Meinke et al. 1998), because the estimated genome size of 125–130Mb is the smallest among known higher plants and the content of repetitive sequences was assumed to be low. Its short life cycle (average 60 days) and prodigious seed production are the characteristics which make this small plant an ideal model organism in which to analyze metabolism, development, stress responses, and disease resistance in all the flowering plants

Sequence Analysis of the Lotus japonicus Genome

Chapter

Jan 2003

One of the most epoch-making accomplishments in plant genetics in the 20th century was the completion of genome sequencing of Arabidopsis thaliana (The Arabidopsis-Genome Initiative 2000). As a consequence, an enormous amount of information on gene structures and their functions have been and are still being accumulated in this organism. Nevertheless, other plant species have their own characteristics and advantages for the study of individual biological phenomena. Further, comparison of knowledge from A. thaliana and that from other plant species is a promising approach for obtaining universal knowledge on the genetic systems in all plants.

Monitoring of 12-oxo-Phytodienoic Acid (OPDA)-Induced Expression Changes in Arabidopsis by cDNA Macroarray

Chapter

Jan 2003

Jasmonates are known as growth regulators, which have cyclopentanone or cyclopentenone ring, and synthesized through lipoxygenase pathway. Jasmonates are widely distributed in the plant kingdom and modulates wounding responses, disease responses, and anther development. Some kinds of jasmonates were shown to have specific effects on plants. 12-oxo-phytodienoic acid (OPDA), a precursor for jasmonic acid (JA) biosynthesis, promotes tendril coiling, and this effect is stronger than MeJA (Falkenstein E., et al., 1991). However, except for the fact, little is known about the OPDA specific function in various physiological events.

Profiling of bryophyte gene expression by hybridization of an Arabidopsis cDNA array with bryophyte cDNA

Article

Jan 2006
J HATTORI BOT LAB

Heterologous hybridization was carried out with an Arabidopsis macroarray and cDNAs synthesized from total RNA of the liverwort Marchantia polymorpha and the moss Physcomitrella patens. Total RNA isolated from A. thaliana plants was also used. The macroarray contained 5,760 Arabidopsis ESTs, corresponding to 4,372 genes. Intra- and inter-filter variation fell within a 2-fold range for almost all spots. A total of 1,647 (37.7%), 1,427 (32.6%), and 1,217 (27.8%) genes gave hybridization signals more than three-fold greater than that of λ DNA (negative control), and were thus defined as expressed in A. thaliana plants, M. polymorpha thalli, and P. patens protonemata, respectively. Seventy-nine percent of the genes expressed in M. polymorpha were also expressed in P. patens. Overall, the three species had 763 expressed genes in common. Twenty-five co-expressed genes were chosen based on their high expression levels in M. polymorpha, and P. patens EST clones related to 17 of these genes were identified, each showing more than 60% nucleotide identity with the corresponding A. thaliana gene. Three hundred and sixty-three genes were detected in bryophytes but not in A. thaliana. Of the 25 highly expressed bryophyte-specific genes, 14 had P. patens EST homologs with greater than 60% identity to an A. thaliana gene. These results suggest that hybridizing Arabidopsis macroarrays with heterologous cDNA is a useful tool for gene expression profiling of distantly related plant species such as bryophytes.
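The presence call described in this abstract (a spot counts as expressed when its signal exceeds three times the negative control) and the cross-species overlaps can be sketched as set operations. This is a minimal illustration, not the authors' pipeline: the function names, the four gene IDs, and all intensities are hypothetical, and it assumes one control value per filter.

```python
from typing import Dict, Set


# Hypothetical sketch: a spot counts as "expressed" when its signal is more
# than `fold` times the negative-control (lambda DNA) signal on the same filter.
def expressed_genes(signals: Dict[str, float], control: float, fold: float = 3.0) -> Set[str]:
    cutoff = fold * control
    return {gene for gene, value in signals.items() if value > cutoff}


def percent(part: int, whole: int) -> float:
    """Share of arrayed genes, rounded to one decimal place as in the abstract."""
    return round(100.0 * part / whole, 1)


# Made-up intensities for four genes on filters probed with each species' cDNA.
control = 100.0
arabidopsis = {"g1": 950.0, "g2": 120.0, "g3": 410.0, "g4": 305.0}
marchantia = {"g1": 620.0, "g2": 90.0, "g3": 280.0, "g4": 330.0}
physcomitrella = {"g1": 700.0, "g2": 310.0, "g3": 150.0, "g4": 360.0}

at = expressed_genes(arabidopsis, control)
mp = expressed_genes(marchantia, control)
pp = expressed_genes(physcomitrella, control)

common = at & mp & pp            # expressed in all three species
bryophyte_only = (mp | pp) - at  # detected in a bryophyte but not in A. thaliana
print(sorted(common), sorted(bryophyte_only))                     # ['g1', 'g4'] ['g2']
print(percent(1647, 4372), percent(1427, 4372), percent(1217, 4372))  # 37.7 32.6 27.8
```

The second print line recomputes the abstract's percentages from its own counts (out of 4,372 arrayed genes), which is a quick consistency check on the reported figures.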

Isolation and Characterization of Sea Aster Salt-Stress Responsive Cysteine Protease Gene Obtained by a Hetero-probed Macroarray

Article

Jun 2004

To isolate and analyze salt-stress-inducible genes in a halophyte, sea aster (Aster tripolium L.), we screened 5760 Arabidopsis cDNA clones by a macroarray procedure using ³³P-labeled cDNA targets synthesized from mRNAs isolated from NaCl-treated and untreated sea aster seedlings. Seventeen Arabidopsis cDNAs hetero-hybridized to NaCl-inducible sea aster genes. These cDNAs were used as probes to isolate cDNA homologs from a sea aster cDNA library. One of the obtained cDNAs shared 71% amino acid identity with an Arabidopsis cysteine protease (AtCysP) and was named SaCysP (sea aster CysP). Northern blot analysis revealed that mRNAs corresponding to both SaCysP and AtCysP were induced by salt and osmotic stress in leaves. In contrast, SASR21 mRNA, which encodes another CysP in sea aster, did not respond to these stresses in leaves but did respond in roots. The SaCysP and SASR21 genes may have tissue-specific functions in the stress response through modulation of their expression levels.
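A differential screen like the one above compares each arrayed clone's signal between the NaCl-treated and untreated hybridizations. The abstract does not state the normalization or cut-off used, so the sketch below assumes two hypothetical choices: each filter is scaled by its total signal, and a clone is called induced at a treated/untreated ratio of at least 2. Clone names and intensities are made up.

```python
from typing import Dict, List


def normalize(signals: Dict[str, float]) -> Dict[str, float]:
    """Express each spot as a fraction of the filter's total signal (hypothetical choice)."""
    total = sum(signals.values())
    return {clone: value / total for clone, value in signals.items()}


def induced_clones(treated: Dict[str, float], untreated: Dict[str, float],
                   min_ratio: float = 2.0, floor: float = 1e-6) -> List[str]:
    """Clones whose normalized NaCl-treated signal is at least min_ratio times the untreated one.

    `floor` keeps the ratio finite when a spot is blank on the untreated filter.
    """
    t = normalize(treated)
    u = normalize(untreated)
    hits = []
    for clone in sorted(t):
        ratio = t[clone] / max(u.get(clone, 0.0), floor)
        if ratio >= min_ratio:
            hits.append(clone)
    return hits


treated = {"c1": 400.0, "c2": 100.0, "c3": 300.0, "c4": 200.0}
untreated = {"c1": 100.0, "c2": 100.0, "c3": 400.0, "c4": 400.0}
print(induced_clones(treated, untreated))  # ['c1']
```

Normalizing by total signal is only one option; scaling to control spots or housekeeping genes would fit the same structure.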

Basic Local Alignment Search Tool

Article

Oct 1990

Stephen F Altschul

Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana

Article

Dec 1999

Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130–140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.

A Large Scale Structural Analysis of cDNAs in a Unicellular Green Alga, Chlamydomonas reinhardtii. I. Generation of 3433 Non-redundant Expressed Sequence Tags

Article

Jan 2000

To understand the genetic information carried by a unicellular green alga, Chlamydomonas reinhardtii, normalized and size-selected cDNA libraries were constructed from cells under photoautotrophic growth, and a total of 11,571 5′-end sequence tags were established. These sequences were grouped into 3433 independent EST species. Similarity search against the public non-redundant protein database indicated that 817 groups showed significant similarity to registered sequences, 140 of which corresponded to previously identified C. reinhardtii genes, while the remaining 2616 species were novel sequences. The coverage of full-length protein coding regions was estimated to be over 60%. These cDNA clones and the EST sequence information will provide a powerful resource for future genome-wide functional analysis of uncharacterized genes.
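Grouping thousands of tags into non-redundant EST species is a clustering step. The abstract does not say how it was done, so the following is a hypothetical sketch of one common approach: single-linkage clustering with a union-find structure, where two ESTs are linked if they share at least one exact k-mer (k = 8 here). Real pipelines usually link reads by sequence alignment and also consider reverse complements, which this sketch ignores. The sequences are made up.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_ests(ests: Dict[str, str], k: int = 8) -> List[List[str]]:
    """Single-linkage groups: ESTs sharing any k-mer end up in the same group (transitively)."""
    names = sorted(ests)
    kmer_sets = [kmers(ests[name], k) for name in names]
    uf = UnionFind(len(names))
    for i, j in combinations(range(len(names)), 2):  # O(n^2) pairs; fine for a toy example
        if kmer_sets[i] & kmer_sets[j]:
            uf.union(i, j)
    groups: Dict[int, List[str]] = {}
    for idx, name in enumerate(names):
        groups.setdefault(uf.find(idx), []).append(name)
    return sorted(groups.values())


ests = {
    "e1": "ACGTACGTGGCCTTAA",
    "e2": "GGCCTTAACCGGTTAA",  # overlaps the end of e1
    "e3": "TTTTGGGGCCCCAAAA",  # unrelated
    "e4": "CCGGTTAATTTCCCGG",  # overlaps the end of e2
}
print(cluster_ests(ests))  # [['e1', 'e2', 'e4'], ['e3']]
```

Note that e1 and e4 share no k-mer directly but still land in one group through e2. This transitive merging is the behavior (and the main risk) of single-linkage EST clustering.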

Basic Local Alignment Search Tool

Article

Oct 1990

A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
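The maximal segment pair (MSP) score that BLAST approximates is the best score of any ungapped local alignment between two sequences. The sketch below computes it exactly and slowly: it walks every diagonal and applies Kadane's maximum-subarray scan to the per-position scores. BLAST avoids this exhaustive scan by seeding from short word hits and extending only around them. The +5/−4 match/mismatch scores are an assumed nucleotide scoring scheme chosen for illustration, and the function name is hypothetical.

```python
def msp_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Exact maximal segment pair score (best ungapped local alignment) of a and b.

    Each diagonal is identified by offset = i - j; along it, a running sum of
    position scores is reset to zero whenever it goes negative (Kadane's scan).
    """
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        running = 0
        i = max(offset, 0)
        j = i - offset
        while i < len(a) and j < len(b):
            running += match if a[i] == b[j] else mismatch
            if running < 0:
                running = 0
            best = max(best, running)
            i += 1
            j += 1
    return best


print(msp_score("ACGTTT", "GGACGT"))    # 20: the shared ACGT on a shifted diagonal
print(msp_score("ACGAACG", "ACGTACG"))  # 26: the segment extends through one mismatch
```

The second call shows why MSP scores are more informative than exact word matches: a single mismatch costs 4 but allows the segment to keep accumulating matches on both sides.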

Genes Galore: A Summary of Methods for Accessing Results from Large-Scale Partial Sequencing of Anonymous Arabidopsis cDNA Clones

Article

Jan 1995
PLANT PHYSIOL

High-throughput automated partial sequencing of anonymous cDNA clones provides a method to survey the repertoire of expressed genes from an organism. Comparison of the coding capacity of these expressed sequence tags (ESTs) with the sequences in the public data bases results in assignment of putative function to a significant proportion of the ESTs. Thus, the more than 13,400 plant ESTs that are currently available provide a new resource that will facilitate progress in many areas of plant biology. These opportunities are illustrated by a description of the results obtained from analysis of 1500 Arabidopsis ESTs from a cDNA library prepared from equal portions of poly(A+) mRNA from etiolated seedlings, roots, leaves, and flowering inflorescences. More than 900 different sequences were represented, 32% of which showed significant nucleotide or deduced amino acid sequence similarity to previously characterized genes or proteins from a wide range of organisms. At least 165 of the clones had significant deduced amino acid sequence homology to proteins or gene products that have not been previously characterized from higher plants. A summary of methods for accessing the information and materials generated by the Arabidopsis cDNA sequencing project is provided.

Normalization and Subtraction: Two Approaches to Facilitate Gene Discovery

Article

Oct 1996
GENOME RES

Large-scale sequencing of cDNAs randomly picked from libraries has proven to be a very powerful approach to discovering (putatively) expressed sequences that, once mapped, may greatly expedite the identification and cloning of human disease genes. However, the integrity of the data and the pace at which novel sequences can be identified depend to a great extent on the cDNA libraries used. In a typical cell, the mRNAs of the prevalent and intermediate frequency classes together comprise as much as 50–65% of the total mRNA mass but represent no more than 1000–2000 different mRNAs; redundant identification of mRNAs from these two frequency classes is therefore destined to become overwhelming relatively early in any random gene discovery program, seriously compromising its cost-effectiveness. With the goal of facilitating such efforts, we previously developed a method to construct directionally cloned normalized cDNA libraries and applied it to generate infant brain (INIB) and fetal liver/spleen (INFLS) libraries, from which a total of 45,192 and 86,088 expressed sequence tags, respectively, have been derived. While improving the representation of the longest cDNAs in our libraries, we developed three additional methods to normalize cDNA libraries and generated over 35 libraries, most of which have been contributed to our Integrated Molecular Analysis of Genomes and their Expression (IMAGE) Consortium and thus distributed widely and used for sequencing and mapping. To facilitate gene discovery further, we have also developed a subtractive hybridization approach designed specifically to eliminate (or significantly reduce the representation of) large pools of arrayed and (mostly) sequenced clones from normalized libraries yet to be (or only partly) surveyed.
Here we present a detailed description and a comparative analysis of four methods that we developed and used to generate normalized cDNA libraries from human (15), mouse (3), and rat (2), as well as the parasite Schistosoma mansoni (1). In addition, we describe the construction and preliminary characterization of a subtracted liver/spleen library (INFLS-SI) that resulted from the elimination (or reduction of representation) of ~5000 INFLS-IMAGE clones from the INFLS library.
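The cost argument for normalization can be made concrete with the expected number of distinct mRNA species recovered after n random clone picks, which is the sum over species of 1 − (1 − p_i)^n, where p_i is the chance that a single pick hits species i. The sketch below compares a skewed library with an idealized normalized one in which every species is equally frequent. The class sizes (1,500 abundant species carrying 60% of the mass, 13,500 rare species) and the 10,000 picks are hypothetical values chosen to fall inside the ranges quoted in the abstract, not measurements.

```python
from typing import List


def expected_distinct(probs: List[float], n: int) -> float:
    """Expected number of distinct species seen after n independent picks.

    probs[i] is the probability that one pick yields species i (sum(probs) == 1).
    """
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def two_class_library(n_abundant: int, abundant_mass: float, n_rare: int) -> List[float]:
    """Hypothetical library: n_abundant species share abundant_mass of the mRNA, n_rare share the rest."""
    p_abundant = abundant_mass / n_abundant
    p_rare = (1.0 - abundant_mass) / n_rare
    return [p_abundant] * n_abundant + [p_rare] * n_rare


skewed = two_class_library(1500, 0.60, 13500)
normalized = [1.0 / 15000] * 15000  # idealized normalization: all species equally frequent

picks = 10000
print(round(expected_distinct(skewed, picks)))
print(round(expected_distinct(normalized, picks)))
```

Because 1 − (1 − p)^n is concave in p, the expected count is largest when all species are equally frequent, so for any fixed number of picks an idealized normalized library recovers at least as many distinct species as a skewed one. This is the reason normalization improves the cost-effectiveness of random EST sequencing.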

Objective: The Complete Sequence of a Plant Genome

Article

Apr 1997

An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana

Article

Dec 1993
PLANT J

As part of the goal to generate a detailed transcript map for Arabidopsis thaliana, 1152 single run sequences (expressed sequence tags or ESTs) have been determined from cDNA clones taken at random in libraries prepared from different sources of plant material: developing siliques, etiolated seedlings, flower buds, and cultured cells. Eight hundred and ninety-five different genes could be identified, 32% of which showed significant similarity to existing sequences in Arabidopsis and an array of other organisms. These sequences in combination with their positioning on the Arabidopsis genetic map will not only constitute a new set of molecular markers for genome analysis in Arabidopsis but also provide a direct route for the in vivo analysis of their gene products. The sequences have been made available to the public databases.

Further progress towards a catalogue of all Arabidopsis genes: Analysis of a set of 5000 non-redundant ESTs

Article

Jan 1996

Nearly 7000 Arabidopsis thaliana expressed sequence tags (ESTs) from 10 cDNA libraries have been sequenced, of which almost 5000 non-redundant tags have been submitted to the EMBL data bank. The quality of the cDNA libraries used is analysed. Similarity searches in international protein data banks have allowed the detection of significant similarities to a wide range of proteins from many organisms. Alignment with ESTs from the rice systematic sequencing project has allowed the detection of amino acid motifs that are conserved between the two organisms, thus identifying tags for genes encoding highly conserved proteins. These genes are candidates for a common framework in genome mapping projects in different plants.

Characterization of three anther-specific genes isolated from Chinese cabbage

Article

Feb 1997

Two cDNA libraries were constructed from poly(A)+ RNA isolated from immature flowers (buds less than 2.0 mm long) and from anthers (of buds 2.0–5.0 mm long), respectively, of Chinese cabbage (Brassica campestris L. ssp. pekinensis). Using dot-differential hybridization, three cDNA clones, designated BIF38, BAN54, and BAN237, were isolated from these cDNA libraries and sequenced completely in both directions. Northern blot analyses indicate that all three cDNA clones are abundantly expressed in anthers, but not in leaves or other floral organs. The deduced amino acid sequences of BIF38, BAN54, and BAN237 showed high identity with those of known anther-specific genes. In particular, the deduced amino acid sequence of BIF38 has 98% identity with that of a phospholipid protein gene (E2) from Brassica napus. Also, the deduced amino acid sequences of BAN54 and BAN237 are similar to the sequences of microspore-specific genes (Bp4A and Bp4C) and pollen oleosins (13, pol3 and C98), respectively. Southern blot analyses revealed that all three genes belong to multigene families in the Chinese cabbage genome.
