
An expression atlas of rice mRNAs and small RNAs

May 2007
Nature Biotechnology 25(4):473-7


DOI:10.1038/nbt1291






Kan Nobuta¹,², R C Venu, Cheng Lu¹,², André Beló, Kalyan Vemaraju, Karthik Kulkarni, Wenzhong Wang, Manoj Pillay, Pamela J Green¹⁻³, Guo-liang Wang & Blake C Meyers¹,²

Identification of all expressed transcripts in a sequenced genome is essential both for genome analysis and for realization of the goals of systems biology. We used the transcriptional profiling technology called ‘massively parallel signature sequencing’ to develop a comprehensive expression atlas of rice (Oryza sativa cv Nipponbare). We sequenced 46,971,553 mRNA transcripts from 22 libraries, and 2,953,855 small RNAs from 3 libraries. The data demonstrate widespread transcription throughout the genome, including sense expression of at least 25,500 annotated genes and antisense expression of nearly 9,000 annotated genes. An additional set of ~15,000 mRNA signatures mapped to unannotated genomic regions. The majority of the small RNA data represented lower abundance short interfering RNAs that match repetitive sequences, intergenic regions and genes. Among these, numerous clusters of highly regulated small RNAs were readily observed. We developed a genome browser (http://mpss.udel.edu/rice) for public access to the transcriptional profiling data for this important crop.

Because of its scientific, economic and cultural importance, the sequencing of the rice (Oryza sativa ssp. japonica cv Nipponbare) genome represents a milestone in plant biology. The recent annotation of the rice genome (The Institute for Genomic Research, TIGR version 4.0) includes 55,890 features that represent 42,653 predicted protein-coding genes and 13,237 transposable elements. Experimental evidence from full-length cDNA and expressed sequence tags (ESTs) is critical for genome analysis, yet these data are incomplete: they are subsaturating and miss nonpolyadenylated transcripts such as small RNAs. Small RNAs of 21–24 nucleotides are well characterized in Arabidopsis thaliana, but little is known about the diversity of these molecules in other plant species. Several categories are known, including short interfering RNAs (siRNAs) and microRNAs (miRNAs), both of which silence genes by targeting complementary mRNAs for degradation. siRNAs can also trigger transcriptional silencing by guiding nuclear complexes that mediate histone modification, DNA methylation or both. Small RNAs are best discovered and measured by deep sequencing approaches that have high sensitivity and specificity.

One advantage of using a signature-based expression profiling method such as massively parallel signature sequencing (MPSS) to improve genome annotation is its potential to characterize previously unknown transcripts. Transcriptional analyses of the A. thaliana genome with MPSS have identified extensive alternative polyadenylation, as well as large numbers of natural antisense transcripts, novel transcripts from unannotated genomic regions, noncoding RNAs and small RNAs⁶,⁷. Sequence-based data have very high specificity and accuracy for assessing gene activity, because they are not subject to cross-hybridization. In addition to enhancing genome annotation, MPSS data provide quantitative expression information.
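MPSS abundance values in this letter are reported in transcripts per million (TPM). In TPM, each signature's raw count is scaled by the total number of signatures sequenced in its library, so libraries of different depth become comparable. The following is a minimal sketch, assuming raw counts for one library are held in a plain dict keyed by signature. The function name `to_tpm` and the toy counts are hypothetical and do not come from the authors' pipeline.

```python
from typing import Dict


def to_tpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw signature counts from one library to transcripts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no signatures")
    # TPM = raw count / total signatures in the library * 1,000,000
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}


# Hypothetical raw counts for three signatures in a 1,000-signature library.
tpm = to_tpm({"sigA": 30, "sigB": 50, "sigC": 920})
print(tpm)  # {'sigA': 30000.0, 'sigB': 50000.0, 'sigC': 920000.0}
```

In a real library of a few million signatures, 1 TPM corresponds to only a handful of reads. This is why the 1–100 TPM range noted below covers most signatures.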

To clarify the complexity of polyadenylated transcripts in rice, we sequenced 22 mRNA libraries using MPSS (Table 1, Supplementary Table 1 and Supplementary Data online). These libraries, from 12 diverse untreated tissues (with some replicates) and six abiotic stress treatments, included 46,971,553 transcripts. The nonredundant set comprised 249,990 distinct sequences. The expression values of these signatures ranged over four orders of magnitude, with the majority found in the range of 1 to 100 transcripts per million (TPM) (Supplementary Table 1 online). Filtering to capture the most ‘reliable’ signatures removed most erroneous sequences, leaving a set of 46,251,966 signatures (Table 1 and Supplementary Fig. 1 online); these comprised 121,581 distinct signatures. This represents the deepest reported set of plant transcriptional data. This sampling depth is high enough that new transcripts are infrequently discovered (Fig. 1a).
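The Fig. 1a discovery curve comes from drawing signatures with replacement, weighted by abundance, and tracking how many distinct signatures have been seen. The endpoint slope then gives the rate at which new transcripts still appear. The following is a sketch of a single resampling run on toy counts. The paper averaged 1,000 resamplings, and that averaging step is left out here. The names `discovery_curve` and `endpoint_slope` are hypothetical, and the code assumes `total_draws` is a multiple of `step`.

```python
import random
from typing import Dict, List, Tuple


def discovery_curve(counts: Dict[str, int], total_draws: int, step: int,
                    seed: int = 0) -> List[Tuple[int, int]]:
    """One abundance-weighted resampling run with replacement.

    Returns (signatures drawn so far, distinct signatures seen) after every
    `step` draws; assumes total_draws is a multiple of step.
    """
    rng = random.Random(seed)
    signatures = list(counts)
    weights = [counts[s] for s in signatures]
    seen = set()
    curve = []
    for drawn in range(step, total_draws + 1, step):
        seen.update(rng.choices(signatures, weights=weights, k=step))
        curve.append((drawn, len(seen)))
    return curve


def endpoint_slope(curve: List[Tuple[int, int]]) -> float:
    """New distinct signatures per signature drawn, over the final interval."""
    (x0, y0), (x1, y1) = curve[-2], curve[-1]
    return (y1 - y0) / (x1 - x0)


# Toy library: 10 abundant signatures plus 500 singletons (hypothetical).
toy = {**{f"abundant{i}": 1_000 for i in range(10)},
       **{f"rare{i}": 1 for i in range(500)}}
curve = discovery_curve(toy, total_draws=20_000, step=2_000)
print(curve[-1], f"slope = {endpoint_slope(curve) * 1_000_000:.0f} / 1,000,000")
```

Expressing the slope "per 1,000,000" mirrors the way the endpoint rates are labeled in Fig. 1a.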

Table 1 Summary statistics for rice MPSS libraries

Category                                   mRNA total    Small RNA total
Libraries                                  22            3
Signatures sequenced                       46,971,553    2,953,855
Distinct signatures                        249,990       284,301
Distinct genome-matched signatures         81,961        221,592
Abundance of genome-matched signatures     46,251,966    1,948,368

All sequencing reactions combined for each type of library.

The nonredundant set of either the complete set of MPSS signatures or those that match to the genome.

For the genome-matched mRNA signatures, only those that passed the reliability filter are included.

The sum of the observed frequency for all distinct, genome-matched signatures.

Received 26 October 2006; accepted 25 January 2007; published online 11 March 2007; doi:10.1038/nbt1291

Delaware Biotechnology Institute, Department of Plant and Soil Sciences, College of Marine and Earth Studies, University of Delaware, Newark, Delaware 19711.

Department of Plant Pathology, The Ohio State University, Columbus, OH 43210. Correspondence should be addressed to B.C.M. (meyers@dbi.udel.edu).

NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION

LETTERS

Signatures matching only once in the genome (hits = 1) provide a conservative assessment of transcript diversity and active genes, whereas including duplicated signatures (hits > 0) provides an upper boundary (Supplementary Table 2 online). The majority of the signatures (~75%) in the libraries matched sense-strand transcripts of nearly half of the annotated rice genes (20,821 of 42,653; Fig. 1b and Supplementary Table 2a online). This number is comparable to the number of gene annotations supported by EST or full-length cDNA data (21,328) (Fig. 1b). Of genes with MPSS support, 84% are supported by EST or full-length cDNA data, and only 18% of those lacking MPSS support have ESTs or full-length cDNAs (Fig. 1b). Because the ESTs or full-length cDNAs are derived from >60 libraries (versus our 22 libraries), we concluded that substantial sampling depth may substitute for tissue or treatment diversity. Among the 20,821 genes with MPSS support, 87% had high homology to A. thaliana genes, whereas only 39% of the genes without MPSS support fall into this category (Fig. 1b). Because numerous MPSS-supported low-homology genes (1,873) match ESTs or full-length cDNAs, many validated rice genes apparently lack orthologs in A. thaliana. In A. thaliana, 17 MPSS libraries confirmed expression of 21,193 genes. By comparison with our rice data, these data suggest that the two plant genomes have similar transcriptional complexities, although the genome size and gene number of rice are 3 times and 1.5 times those of A. thaliana, respectively.
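The gene counts shown in the Fig. 1b panel reproduce several figures quoted in the paragraph above. These include the total of 42,653 genes, the 21,328 genes with EST or full-length cDNA support, "nearly half" of genes with MPSS support, and the 18% of unsupported genes that nonetheless have cDNA evidence. The following is a plain worked check. The counts were transcribed from the figure panel, and the variable names are illustrative.

```python
# Fig. 1b gene counts (transposons and genes < 50 amino acids excluded).
present_cdna, present_other = 17_382, 3_439   # detected by MPSS
absent_cdna, absent_other = 3_946, 17_886     # not detected by MPSS

present = present_cdna + present_other        # genes with MPSS support
absent = absent_cdna + absent_other           # genes without MPSS support
annotated = present + absent                  # all genes in the comparison
cdna_supported = present_cdna + absent_cdna   # genes with EST/FL-cDNA support

print(present, absent, annotated)             # 20821 21832 42653
print(cdna_supported)                         # 21328
print(f"{100 * present / annotated:.1f}%")    # 48.8%
print(f"{100 * absent_cdna / absent:.1f}%")   # 18.1%
```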

We expected that similar tissues and treatments would share sets of expressed genes. Hierarchical clustering of the libraries was performed using the 20,821 genes with MPSS support, and the pollen library formed an outgroup to all other libraries (Supplementary Fig. 2 online). Only one-third of the pollen-specific genes had A. thaliana orthologs. The unique transcriptional pattern of the pollen library may result from its haploid nature; the number of library-specific genes was highest in pollen (392), including an unusually large number of transposable elements (Supplementary Fig. 3 online).
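The text does not state the distance metric or linkage used for the library clustering. This sketch therefore assumes average linkage (UPGMA) on 1 − Pearson correlation of log2(TPM + 1) values. It reports the outgroup as the smaller cluster joined at the final merge. It also counts library-specific genes using a hypothetical detection threshold of 1 TPM. All function names and TPM values are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.sqrt(sum((a - mx) ** 2 for a in x))
    syy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sxx * syy)


def upgma(profiles: Dict[str, List[float]]) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Average-linkage clustering on 1 - Pearson r of log2(TPM + 1).

    Returns the pair of clusters joined at each step; the last pair
    separates the outgroup from everything else.
    """
    logged = {k: [math.log2(v + 1) for v in vals] for k, vals in profiles.items()}
    dist = {frozenset(p): 1 - pearson(logged[p[0]], logged[p[1]])
            for p in combinations(logged, 2)}

    def linkage(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Mean of all pairwise library distances between the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([k]) for k in logged]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2))
    return merges


def library_specific_counts(profiles: Dict[str, List[float]],
                            min_tpm: float = 1.0) -> Dict[str, int]:
    """Number of genes detected (>= min_tpm, a hypothetical cutoff) in exactly one library."""
    n_genes = len(next(iter(profiles.values())))
    counts = {lib: 0 for lib in profiles}
    for i in range(n_genes):
        detected = [lib for lib, vals in profiles.items() if vals[i] >= min_tpm]
        if len(detected) == 1:
            counts[detected[0]] += 1
    return counts


# Hypothetical TPM values for six genes in four libraries.
profiles = {
    "leaf":   [100, 50, 20, 0, 10, 0],
    "stem":   [90, 60, 25, 0, 12, 0],
    "root":   [80, 40, 30, 0, 8, 2],
    "pollen": [0, 2, 0, 300, 0, 150],
}
outgroup = min(upgma(profiles)[-1], key=len)
print(sorted(outgroup))  # ['pollen']
print(library_specific_counts(profiles))
```

A genome-scale run over 22 libraries would more likely use an established implementation such as SciPy's hierarchical clustering. The pure-Python version above is meant only to make the logic visible.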

To examine the association between chromosomal architecture and transcriptional activity, we compared gene density and expression levels for each chromosome (Supplementary Fig. 4 online). Expression activity was observed around the gene-dense regions, and the MPSS data are consistent with studies of centromeres 3, 4 and 8 indicating that genes in these regions are active⁸⁻¹⁰. Some rice centromeres, like Cen3, may be evolving from genic regions to repeat-based mature centromeres.

Genic positions of MPSS signatures identify transcripts resulting from alternative polyadenylation and 3′-splicing, as well as antisense transcripts. Among the 20,821 expressed rice genes, more than half (11,941) had multiple sense signatures that represent alternative transcripts. These transcripts can show marked differences in their expression levels (Fig. 1c). This type of analysis provides a novel view of complex gene expression events that may be important for better understanding gene function. We also identified 11,001 antisense signatures corresponding to 8,023 annotated genes (Supplementary Table 2a online). Some natural antisense transcripts are coexpressed and induce alternative splicing, some induce dsRNA cleavage and show reciprocal expression patterns, and some even generate regulatory small RNAs. MPSS analysis identified natural antisense transcripts for many rice genes, often with highly specific expression patterns. A comparison of sense and antisense expression levels demonstrates the difficulty of generalizing the function of natural antisense transcripts (Supplementary Fig. 5 online), which may require gene-by-gene analyses to discern their complex regulatory mechanisms.
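Sense, antisense and intergenic labels follow from comparing where and on which strand a signature maps with the annotated gene models. Genes carrying several distinct sense signatures are candidates for alternative 3′ ends. The following is a minimal sketch, assuming each distinct signature has been reduced to one genomic coordinate and strand. It takes the first overlapping gene when genes overlap. The `Gene` record, function names and coordinates are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify(chrom: str, pos: int, strand: str,
             genes: List[Gene]) -> Tuple[str, Optional[str]]:
    """Label a mapped signature as sense, antisense or intergenic.

    Linear scan returning the first overlapping gene; a genome-scale
    version would use an interval index instead.
    """
    for g in genes:
        if g.chrom == chrom and g.start <= pos <= g.end:
            return ("sense" if strand == g.strand else "antisense"), g.name
    return "intergenic", None


def genes_with_multiple_sense(hits: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Genes with two or more distinct sense signatures (alternative 3' ends)."""
    per_gene = Counter(name for label, name in hits if label == "sense")
    return sorted(name for name, n in per_gene.items() if n >= 2)


genes = [Gene("geneA", "chr1", 100, 900, "+"), Gene("geneB", "chr1", 2000, 2600, "-")]
signatures = [("chr1", 850, "+"), ("chr1", 700, "+"), ("chr1", 500, "-"),
              ("chr1", 2100, "-"), ("chr1", 1500, "+")]
hits = [classify(c, p, s, genes) for c, p, s in signatures]
print(hits)
print(genes_with_multiple_sense(hits))  # ['geneA']
```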

Figure 1 Genome-wide transcriptional analysis by mRNA MPSS. (a) Discovery rates for new rice transcripts based on the 46,971,553 mRNA MPSS signatures, resampled 1,000 times with replacement. Each signature was weighted by abundance. The gold line indicates 249,990 distinct, unfiltered signatures, whereas the green line indicates 121,581 reliable signatures. Each slope was calculated at the endpoint, indicating the final discovery rate for new transcripts after sampling 50 million transcripts. Sampling of additional tissues or treatments may increase this rate. (b) mRNA MPSS signatures representing sense-strand transcripts (classes 1, 2, 5 and 7), compared to annotated rice genes and excluding transposons and small genes (<50 amino acids). Most genes with support by MPSS have additional support from full-length cDNA data and high similarity to A. thaliana genes. Total gene numbers are indicated in the center (white sections), and genes without cDNA support are indicated by the blue sections labeled “others.” Numbers in parentheses are potential pack-MULEs found in each category. HH and LH, high and low homology to A. thaliana, respectively. Using signatures with hits > 1 slightly increases the number of MPSS-supported genes (Supplementary Fig. 14 online). (c) Each row in the heatmap represents a different gene preselected for evidence of a substantial number of alternative transcripts. Yellow and blue are upregulated and downregulated transcripts, respectively, in salt-treated (NSL) versus untreated young leaf (NYL). Green represents no difference in expression. The “All” column is the NSL/NYL ratio (the expression ratio of all salt-treated to all young leaf transcripts), without considering alternatively terminated transcripts. Column 1 is the expression level of the 3′-most MPSS signature (the longest transcript), whereas column 5 is that of the 5′-most MPSS signature (the shortest transcript). Example genes are shown to the right, with the different transcripts and their expression in the two libraries indicated. Red and pink boxes represent coding regions and untranslated regions, respectively, and the black triangles indicate the five MPSS signatures that correspond to the genes.

Figure 1 graphic (not reproduced). (a) Number of distinct signatures versus total number of signatures sampled (in millions), for all signatures (endpoint slope 487 per 1,000,000) and reliable signatures (endpoint slope 34 per 1,000,000). (b) Genes absent by MPSS: 21,832 (1,358), comprising 3,946 (0) with FL-cDNA or EST support and 17,886 (1,358) others; HH 39%, LH 61%. Genes present by MPSS: 20,821 (78), comprising 17,382 (19) with FL-cDNA or EST support and 3,439 (59) others; HH 87%, LH 13%. (c) Heatmap columns All and 5 to 1; example genes Os04g32650, Os09g35990, Os10g37190 and Os03g18910, with NYL and NSL expression in TPM.

The 22 libraries included 13,461 intergenic signatures (Supplementary Table 2a online) that could be (i) misannotated 3′-untranslated regions of known genes, (ii) novel genes or (iii) noncoding RNAs, including miRNAs. Sixty-three intergenic mRNA signatures matched 30 known miRNA genes (precursors) and 68 noncoding RNA genes from the registry (Supplementary Table 3 online). Consistent with previous reports, many noncoding RNAs were expressed in the stigma-ovary library (Supplementary Table 3b online). Like ‘housekeeping’ genes, many intergenic signatures were expressed in all 22 libraries and were supported by EST or full-length cDNA data. Clearly, there is a substantial number of unidentified genes in rice. Some of these are expressed across a range of tissues and developmental stages, whereas others may encode small peptides or generate uncharacterized regulatory small RNAs such as miRNAs or siRNAs. However, intergenic transcripts were less broadly expressed than genic transcripts, with each intergenic transcript expressed in an average of 6.65 libraries versus 14.42 for gene-associated signatures, and they were much more weakly expressed (Supplementary Fig. 4 online). This low expression contrasts with the prevailing abundance of sense signatures and suggests that these intergenic transcripts may have distinct characteristics unlike those of typical protein-coding genes.
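The breadth comparison above (an average of 6.65 libraries per intergenic signature versus 14.42 per gene-associated signature) is a mean, over signatures, of the number of libraries in which each one is detected. The following is a sketch of that calculation. It assumes a hypothetical detection cutoff of 1 TPM, because the criterion for calling a signature expressed in a library is not stated here. Library names and values are illustrative.

```python
from statistics import mean
from typing import Dict

Profile = Dict[str, float]  # library name -> TPM


def mean_breadth(signatures: Dict[str, Profile], min_tpm: float = 1.0) -> float:
    """Average number of libraries in which a signature reaches min_tpm."""
    return mean(sum(1 for tpm in libs.values() if tpm >= min_tpm)
                for libs in signatures.values())


intergenic = {"ig1": {"L1": 3, "L2": 0, "L3": 0},
              "ig2": {"L1": 5, "L2": 2, "L3": 0}}
genic = {"g1": {"L1": 40, "L2": 35, "L3": 50},
         "g2": {"L1": 10, "L2": 0, "L3": 12}}
print(mean_breadth(intergenic), mean_breadth(genic))  # 1.5 2.5
```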

To investigate the complexity of small RNAs in rice, we next generated MPSS small RNA libraries of rice inflorescence, stem and seedling leaf. Of the 284,301 distinct small RNAs, 78% matched the rice genome, representing 1,948,368 of 2,953,855 total sequences (Table 1). As in A. thaliana, the inflorescence library was proportionally more complex, perhaps reflecting stronger germline silencing (Supplementary Table 4 online). More than half of the distinct signatures matched unique sites in the genome, but there were many more highly duplicated signatures in the rice genome than were observed in A. thaliana (Fig. 2a). Unlike small RNAs from A. thaliana, those from rice are not as concentrated across the pericentromeric regions and are instead more widely distributed on the chromosomes (Supplementary Fig. 6 online). The two completely sequenced rice centromeres are characterized by small RNAs corresponding to the centromeric repeats (Supplementary Fig. 7 online).
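The 78% figure is a fraction of distinct signatures. Weighting by abundance (genome-matched reads over all reads in Table 1) gives a lower value. The Fig. 2a comparison bins each distinct signature by its number of genomic matches. The following is a small sketch of both steps. The bin edges follow the Fig. 2a classes, with the single-hit class included because the text reports unique matches. The function names and toy hit counts are hypothetical.

```python
from collections import Counter
from typing import Dict

# Small RNA totals from Table 1.
distinct_all, distinct_matched = 284_301, 221_592
reads_all, reads_matched = 2_953_855, 1_948_368
print(f"distinct: {100 * distinct_matched / distinct_all:.1f}%")        # distinct: 77.9%
print(f"abundance-weighted: {100 * reads_matched / reads_all:.1f}%")   # abundance-weighted: 66.0%


def hit_class(hits: int) -> str:
    """Bin the number of genomic matches for one signature."""
    if hits < 1:
        raise ValueError("signature does not match the genome")
    if hits == 1:
        return "1"
    if hits <= 10:
        return "2-10"
    if hits <= 100:
        return "11-100"
    if hits <= 1_000:
        return "101-1,000"
    return ">1,000"


def hit_distribution(hits_per_signature: Dict[str, int]) -> Counter:
    """Number of distinct signatures in each hit class."""
    return Counter(hit_class(h) for h in hits_per_signature.values())


dist = hit_distribution({"s1": 1, "s2": 1, "s3": 7, "s4": 250, "s5": 4_000})
print(dist)
```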

Because the distributions of A. thaliana small RNAs are strongly correlated with those of repetitive sequences, we examined the relationship between rice small RNAs, genes and repeats. Rice small RNAs were strongly associated with intergenic regions (Supplementary Fig. 8 online), and some chromosomes demonstrated a pericentromeric concentration when we examined the small RNA abundance in addition to the distribution (Fig. 2 and Supplementary Fig. 9 online). Most chromosomal regions in rice show a range of patterns of small RNAs, consistent with a dispersed arrangement of genes, transposons and miniature inverted repeat transposable elements (Supplementary Fig. 10 online). Notably, abrupt transitions between concentrated siRNA clusters and active genes were often observed (Supplementary Fig. 11 online), suggesting a localized shift from silenced to active chromatin within a very short physical distance. This type of abrupt transition is supported by methylation profiling studies that can identify heterochromatin via biochemical (rather than cytological) means¹⁴,¹⁵. These studies have indicated a strong colocalization of siRNA clusters and DNA methylation, even within cytologically defined euchromatin. Many rice small RNAs were derived from transposons or retrotransposons, but many also matched unannotated intergenic regions (Supplementary Table 5 online). At least 12,766 of the 13,237 annotated retrotransposon or transposon-related sequences in the rice genome had matches to small RNAs.
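Counting how many small RNA matches fall inside each annotated feature underlies both the per-feature counts in Fig. 2b and the statement that most transposon-related sequences have small RNA matches. The following is a sketch using sorted match positions and binary search. It assumes each small RNA match is reduced to a single genomic position, and that features use 1-based inclusive coordinates. The element names and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Feature = Tuple[str, str, int, int]  # (name, chromosome, start, end), inclusive


def small_rnas_per_feature(features: List[Feature],
                           srna_positions: List[Tuple[str, int]]) -> Dict[str, int]:
    """Count small RNA genomic matches falling inside each feature."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in srna_positions:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    counts = {}
    for name, chrom, start, end in features:
        positions = by_chrom.get(chrom, [])
        counts[name] = bisect_right(positions, end) - bisect_left(positions, start)
    return counts


transposons = [("te1", "chr1", 100, 200), ("te2", "chr1", 500, 600), ("te3", "chr2", 50, 80)]
srnas = [("chr1", 120), ("chr1", 150), ("chr1", 199), ("chr1", 700), ("chr2", 60)]
counts = small_rnas_per_feature(transposons, srnas)
matched = sum(1 for n in counts.values() if n > 0)
print(counts, f"{matched} of {len(transposons)} elements matched")
```

Applied genome-wide, the same `n > 0` test would yield a figure such as 12,766 of 13,237 elements (about 96%).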

Figure 2 graphic (not reproduced). (a) Number of distinct signatures (log scale) for rice and Arabidopsis by number of hits to genome (classes 2–10, 11–100, 101–1,000 and >1,000). (b) Count of genes, TEs or IGRs versus number of distinct signatures, for protein-coding genes, transposable elements and intergenic regions. (c) Chromosome 8 (10, 20 and 30 Mb marked): small RNA (TPQ), mRNA (TPM), and genes and repeats (bp). (d) Number of signatures matching the annotated hairpin of the pre-miRNA in the FLR, SNU and STM libraries, by abundance class (1–25, 26–100 and >100 TPQ). (e) TIR positions around Os01g20930, Os02g49580 and Os10g37080.

Figure 2 Deep sequencing of rice small RNAs by MPSS. (a) Match frequencies of small RNAs in rice versus those of A. thaliana. We determined the number of genomic matches (‘hits’) to their respective genomes for 149,978 and 56,920 distinct small RNA MPSS signatures from rice and A. thaliana inflorescence libraries. The A. thaliana data have been described elsewhere⁷. (b) The number of small RNAs matching individual protein-coding genes, transposable elements and intergenic regions. (c) Distributions of small RNAs, mRNA expression and genes or repeats across rice chromosome 8, plotted as moving averages of five adjacent bins of 100 kb. The light-green, pink and light-blue lines (top) are small RNAs in inflorescence, stem and young leaf, respectively. mRNA levels in the same tissues are shown in green, red and blue (middle). Black and gray lines (bottom) are densities of genes and repeats. Blue vertical shading indicates the approximate position of the centromere. (d) Abundance values for small RNA signatures matching the hairpin of the 171 genome-mapped, known rice miRNAs. The y-axis indicates the number of distinct signatures in each abundance class. (e) Small RNAs map to the terminal inverted repeats (TIRs, black arrowheads) of pack-MULE elements described elsewhere¹⁴,¹⁵. Yellow shading indicates transposon-like sequences; orange indicates inverted repeats; black triangles are small RNAs; red and blue boxes are annotated exons. Additional features are as described in Supplementary Figure 7 online.
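The chromosome 8 profiles in Fig. 2c are moving averages over five adjacent 100-kb bins. The following is a sketch of the binning and smoothing steps. The legend does not say whether the window is centered or trailing, or how chromosome ends are handled. This code assumes a centered window that is truncated at the ends, and 0-based positions. The function names and toy values are hypothetical.

```python
from typing import List, Tuple

BIN = 100_000  # 100-kb bins, as in Fig. 2c
WINDOW = 5     # moving average over five adjacent bins


def bin_abundance(hits: List[Tuple[int, float]], chrom_length: int) -> List[float]:
    """Sum abundance (e.g. TPQ or TPM) of signatures falling in each 100-kb bin."""
    n_bins = (chrom_length + BIN - 1) // BIN
    bins = [0.0] * n_bins
    for pos, abundance in hits:  # pos is 0-based and < chrom_length
        bins[pos // BIN] += abundance
    return bins


def moving_average(values: List[float], window: int = WINDOW) -> List[float]:
    """Centered moving average; windows are truncated at chromosome ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


hits = [(50_000, 10), (150_000, 20), (250_000, 30), (350_000, 40), (450_000, 50), (650_000, 5)]
bins = bin_abundance(hits, chrom_length=700_000)
smoothed = moving_average(bins)
print(bins)
print([round(v, 2) for v in smoothed])
```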


As we have demonstrated elsewhere⁶,⁷, repetitive sources of siRNAs produce numerous distributed small RNAs, whereas miRNAs produce small, focused clusters of specific sequences. Rice genes matched distinct small RNAs at a rate at least as high as repeats and intergenic regions (Fig. 2b and Supplementary Fig. 8 online), often resulting from miniature inverted repeat transposable elements or other small repeats embedded within an intron. As in A. thaliana, tandem repeats and inverted repeats are rich sources of small RNAs (Supplementary Fig. 12 online). In all three libraries, there were many clusters of small RNAs, particularly in intergenic, unannotated regions of the rice genome (Supplementary Table 6 online). This indicates that the rice genome is much richer in silenced sequences than that of A. thaliana, consistent with its higher proportion of repetitive DNA. Because small RNAs may also interact with imperfectly matched targets, their biological effect may be far more substantial than we have indicated.
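The letter does not define how small RNA clusters were called. The following is therefore a hypothetical sketch of one common approach, not the authors' procedure. It sorts match positions along a chromosome, joins neighbors that lie within a maximum gap, and keeps groups with enough distinct signatures. Both parameters (`max_gap` of 500 bp and `min_signatures` of 3) are illustrative assumptions.

```python
from typing import List, Tuple


def find_clusters(hits: List[Tuple[int, float]], max_gap: int = 500,
                  min_signatures: int = 3) -> List[Tuple[int, int, int, float]]:
    """Group small RNA hits on one chromosome into clusters.

    hits: (position, abundance) pairs. Consecutive hits no more than
    max_gap bp apart join the same cluster. Returns (start, end, number
    of hits, total abundance) for clusters with >= min_signatures hits.
    """
    groups = []
    current: List[Tuple[int, float]] = []
    for pos, abundance in sorted(hits):
        if current and pos - current[-1][0] > max_gap:
            groups.append(current)
            current = []
        current.append((pos, abundance))
    if current:
        groups.append(current)
    return [(g[0][0], g[-1][0], len(g), sum(a for _, a in g))
            for g in groups if len(g) >= min_signatures]


hits = [(100, 5), (300, 2), (650, 1), (5_000, 10), (20_000, 1), (20_100, 1), (20_400, 3)]
print(find_clusters(hits))  # [(100, 650, 3, 8), (20000, 20400, 3, 5)]
```

The isolated hit at 5,000 bp is discarded despite its high abundance. Whether single abundant loci should count is exactly the kind of choice the unstated cluster definition would settle.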

The miRNA registry includes 182 rice miRNAs, of which 171 were mapped in the genome and 130 were expressed, accounting for 8.8%, 36.6% and 12.6% of the inflorescence, seedling and stem small RNAs, respectively. These percentages are much lower than those for A. thaliana, which is consistent with a substantial abundance of repeat-associated siRNAs in the more complex rice genome⁶,⁷. The lack of small RNA biogenesis mutants in rice hinders our ability to distinguish siRNAs from miRNAs. However, miRNAs are abundant, consistently expressed and conserved, and we identified numerous conserved and consistently expressed small RNAs (Supplementary Fig. 13 online), suggesting that these data include many novel miRNAs.
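The library shares above can be read as miRNA-derived reads divided by all small RNA reads in a library. That reading is an assumption, as is the treatment of TPQ. Fig. 2d bins each miRNA signature into 1–25, 26–100 and >100 TPQ. This sketch takes TPQ to mean transcripts per quarter million, as in related MPSS small RNA work, because the unit is not defined in this text. The signature names and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def tpq(count: int, library_total: int) -> float:
    """Transcripts per quarter million (assumed meaning of TPQ)."""
    return count * 250_000 / library_total


def abundance_class(value: float) -> str:
    """Fig. 2d abundance classes; values below 1 TPQ fall into the lowest class."""
    if value <= 25:
        return "1-25 TPQ"
    if value <= 100:
        return "26-100 TPQ"
    return ">100 TPQ"


def mirna_share(counts: Dict[str, int], mirna_signatures: Set[str]) -> float:
    """Fraction of all small RNA reads in a library that come from miRNAs."""
    total = sum(counts.values())
    return sum(n for sig, n in counts.items() if sig in mirna_signatures) / total


# Hypothetical library of 1,000,000 small RNA reads.
counts = {"miR_a": 500, "miR_b": 40, "miR_c": 300, "siR_1": 600_000, "siR_2": 399_160}
mirnas = {"miR_a", "miR_b", "miR_c"}
total = sum(counts.values())
classes = Counter(abundance_class(tpq(counts[m], total)) for m in mirnas)
print(classes)
print(f"{100 * mirna_share(counts, mirnas):.3f}% of reads")  # 0.084% of reads
```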

With the three libraries, we examined the developmental regulation of rice small RNAs. The chromosomal distributions of small RNAs suggested a high degree of similarity between young leaf and stem small RNAs (Fig. 2c and Supplementary Fig. 9 online), both of which differed from the inflorescence data. However, the stem library produced the greatest number of small RNA clusters or genes that were substantially more abundant in only one of the three libraries (Fig. 2d and Supplementary Table 7 online), suggesting an unusual degree of small RNA regulation in the stem. This is the first report of small RNAs from plant stems, and it is possible that some of these small RNAs are involved in signal transmission between leaves and inflorescences. In contrast to A. thaliana⁶,⁷, many small RNA clusters were substantially reduced in the inflorescence as compared with amounts in seedlings and the stem (Supplementary Table 7 online).

The rice genome contains many gene fragments (‘pack-MULEs’) generated by transposons. Our mRNA MPSS data identified 17,886 genes without expression data (Fig. 1b), >90% of which have no known function, suggesting that many of these are inactive or are pseudogenes. We compared the mRNA and small RNA MPSS data to predicted rice pack-MULEs, using 8,271 previously identified MULEs. This identified 1,358 potential pack-MULEs among the 17,886 unexpressed genes. These elements represent just one of several classes of gene-shuffling transposable elements¹⁹⁻²¹, so other inactive genes are likely to be transposed fragments. Small RNAs matched many of the terminal inverted repeats of these pack-MULEs but infrequently matched the internal gene fragments (Fig. 2e). The combined mRNA and small RNA datasets may offer an experimentally based system for pack-MULE identification in rice and other plant genomes.

Taking into account annotated, expressed genes as well as small RNAs, a very high proportion of the rice genome is actively transcribed. Although thousands of annotated genes lack both mRNA and small RNA expression data, detecting their expression may require sampling of highly specialized tissues, cell types or treatments. The rice small RNA data indicate an extensive and complex repertoire of such molecules. This repertoire vastly exceeds that of A. thaliana, consistent with small RNA complexity increasing with genome size. This suggests that larger plant genomes, such as those of most crops, will require deeper sequencing. Additional complexity may be found in analyses of nonpolyadenylated transcripts. A comprehensive understanding of the network of gene expression events in rice or other crops will require concerted efforts such as ours to characterize the activities and functions of a comprehensive catalog of genomic components.

METHODS

High- and low-homology rice genes. The high- and low-homology rice genes were identified according to their similarity to A. thaliana genes using a threshold of a BLASTP e-value < 1.0e-7.
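The homology split can be sketched as "best BLASTP hit against A. thaliana below 1.0e-7 means high homology." This assumes the searches were saved as BLAST+ tabular output (`-outfmt 6`), whose default 12 columns place the e-value 11th. It also assumes that genes with no reported hit count as low homology. The exact BLAST settings are not given in the text, and the gene identifiers below are hypothetical.

```python
import csv
from typing import Dict, Iterable, Set, Tuple

EVALUE_CUTOFF = 1.0e-7  # BLASTP threshold given in Methods


def best_evalues(tabular_lines: Iterable[str]) -> Dict[str, float]:
    """Lowest e-value per query from BLAST+ tabular output (-outfmt 6).

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (e-value is the 11th).
    """
    best: Dict[str, float] = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, evalue = row[0], float(row[10])
        best[query] = min(evalue, best.get(query, float("inf")))
    return best


def split_by_homology(genes: Set[str], best: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """High homology: best hit below the cutoff; everything else is low homology."""
    high = {g for g in genes if best.get(g, float("inf")) < EVALUE_CUTOFF}
    return high, genes - high


blast_out = [
    "riceA\tat1\t62.0\t310\t110\t4\t1\t310\t1\t305\t1e-30\t250",
    "riceA\tat2\t40.1\t290\t160\t9\t5\t290\t3\t280\t1e-10\t90.1",
    "riceB\tat3\t28.4\t120\t80\t6\t10\t130\t15\t130\t0.001\t35.0",
]
high, low = split_by_homology({"riceA", "riceB", "riceC"}, best_evalues(blast_out))
print(sorted(high), sorted(low))  # ['riceA'] ['riceB', 'riceC']
```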

MULE analysis. The rice MULE data have been previously described and were downloaded from http://www.genome.org/. Because the International Rice Genome Sequencing Project annotation version 2.0 was used for their analyses, we remapped all the MULEs onto TIGR4.0. After mapping these sequences, potential pack-MULEs (genes flanked on both sides by MULEs) with or without MPSS signatures were identified. We also used these sequences to identify pack-MULEs associated with the intergenic MPSS signatures, because these could be pack-MULEs that correspond to the unannotated transcripts.
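The "genes flanked on both sides by MULEs" criterion can be sketched as an interval test. It checks for a MULE ending shortly upstream of a gene and another starting shortly downstream on the same chromosome, and then keeps the candidates that lack MPSS signatures. The Methods do not say how close the flanking MULEs must be, so `max_distance` (2 kb) is a hypothetical parameter. The coordinates, gene names and expressed set are also invented.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def flanked_genes(genes: Dict[str, Tuple[str, int, int]],
                  mules: Dict[str, List[Interval]],
                  max_distance: int = 2_000) -> List[str]:
    """Genes with a MULE ending within max_distance bp upstream and another
    starting within max_distance bp downstream on the same chromosome."""
    flanked = []
    for name, (chrom, start, end) in genes.items():
        elements = mules.get(chrom, [])
        left = any(0 <= start - m_end <= max_distance for _, m_end in elements)
        right = any(0 <= m_start - end <= max_distance for m_start, _ in elements)
        if left and right:
            flanked.append(name)
    return sorted(flanked)


genes = {"gA": ("chr1", 200, 850), "gB": ("chr1", 6_000, 7_000),
         "gC": ("chr2", 300, 900), "gD": ("chr1", 1_000, 4_000)}
mules = {"chr1": [(100, 180), (900, 980), (5_000, 5_100)]}
expressed: Set[str] = {"gD"}  # genes with MPSS signatures (hypothetical)

candidates = flanked_genes(genes, mules)
unexpressed_candidates = [g for g in candidates if g not in expressed]
print(candidates, unexpressed_candidates)  # ['gA', 'gD'] ['gA']
```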

Analysis of alternative termination. As an example of the differential expression of alternative transcripts, we focused on two libraries (NYL and NSL) to examine the effect of salt stress. Genes with alternative termination sites were identified from the MPSS data (multiple sense-strand MPSS signatures associated with a single gene), and from this set we selected genes that had at least two sense signatures showing tenfold higher expression in one library than in the other. For each gene and each library, the expression levels and the sum of the expression levels were recorded for five MPSS signatures located at the 3′ end of each gene. The expression level of each NSL signature was divided by that of the corresponding NYL signature, and the resulting values were log-transformed and loaded into R to generate a heatmap.
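The selection and ratio steps above can be sketched in Python, although the paper drew the final heatmap in R. Several details are not given in the Methods and are assumed here. A pseudocount of 1 TPM keeps ratios finite when a signature is absent, the log base is 2, and the "All" column is taken as the ratio of summed expression. The function names and TPM values are hypothetical.

```python
import math
from typing import List

PSEUDOCOUNT = 1.0  # hypothetical; keeps ratios finite when a signature is absent (0 TPM)


def passes_tenfold(nyl: List[float], nsl: List[float], min_signatures: int = 2) -> bool:
    """True if at least `min_signatures` sense signatures differ >= tenfold."""
    differing = 0
    for a, b in zip(nyl, nsl):
        a, b = a + PSEUDOCOUNT, b + PSEUDOCOUNT
        if max(a, b) / min(a, b) >= 10:
            differing += 1
    return differing >= min_signatures


def heatmap_row(nyl: List[float], nsl: List[float], n_signatures: int = 5) -> List[float]:
    """log2(NSL/NYL) for the 'All' column, then one value per signature.

    Inputs are TPM values ordered from the 3'-most signature (column 1)
    toward the 5' end; only the first n_signatures are used.
    """
    nyl, nsl = nyl[:n_signatures], nsl[:n_signatures]
    row = [math.log2((sum(nsl) + PSEUDOCOUNT) / (sum(nyl) + PSEUDOCOUNT))]
    row += [math.log2((b + PSEUDOCOUNT) / (a + PSEUDOCOUNT)) for a, b in zip(nyl, nsl)]
    return row


# Hypothetical gene with three sense signatures (TPM), 3'-most first.
nyl = [100.0, 3.0, 0.0]
nsl = [7.0, 63.0, 0.0]
if passes_tenfold(nyl, nsl):
    print([round(v, 2) for v in heatmap_row(nyl, nsl)])
```

In this toy gene, the "All" value is mildly negative even though the second transcript rises sharply under salt. This is the kind of opposing change between alternative transcripts that the per-signature columns in Fig. 1c are meant to expose.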

Additional methods. Detailed methods are available in Supplementary Methods online.

Accession codes. Gene Expression Omnibus (GEO): series identifier GSE7107; platform identifiers GPL3777 and GPL3776 for mRNA and small RNA samples, respectively; sample identifiers GSM169562, GSM169564, GSM169566, GSM169567, GSM169568, GSM169569, GSM169570, GSM170900, GSM170901, GSM170902, GSM170903, GSM170904, GSM170905, GSM170906, GSM170907, GSM170909, GSM170912, GSM170912, GSM170914, GSM170917, GSM170919 and GSM170921. The raw and normalized MPSS data are also available at http://mpss.udel.edu/rice, where users can query the data by physical location, gene identifier or sequence.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS

We are grateful to C. Haudenschild, TIGR’s rice annotation project, S. Singh Tej, M. Nakano, R. German, A. Hetawal, R. Gupta and S. Kaushik. This work was supported by US National Science Foundation awards 0321437 (B.C.M. and G.-l.W.) and 0439186 (P.J.G. and B.C.M.), and US Department of Agriculture 2005-35064-15326 (B.C.M. and P.J.G.).

AUTHOR CONTRIBUTIONS

K.N. performed research, analyzed data and wrote the manuscript; R.C.V. and C.L. performed laboratory research and provided useful discussions; A.B., K.V., K.K., W.W. and M.P. performed computational research; P.J.G. and G.-l.W. designed research and wrote the manuscript; B.C.M. designed research, analyzed data, and coordinated and wrote the manuscript.


COMPETING INTERESTS STATEMENT

The authors declare no competing ﬁnancial interests.

Published online at http://www.nature.com/naturebiotechnology

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions

1. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).

2. Yuan, Q. et al. The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol. 138, 18–26 (2005).

3. Kikuchi, S. et al. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301, 376–379 (2003).

4. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297 (2004).

5. Verdel, A. et al. RNAi-mediated targeting of heterochromatin by the RITS complex. Science 303, 672–676 (2004).

6. Meyers, B.C. et al. Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing. Nat. Biotechnol. 22, 1006–1011 (2004).

7. Lu, C. et al. Elucidation of the small RNA component of the transcriptome. Science 309, 1567–1569 (2005).

8. Cheng, Z. et al. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14, 1691–1704 (2002).

9. Yan, H. et al. Genomic and genetic characterization of rice cen3 reveals extensive transcription and evolutionary implications of complex centromere. Plant Cell 18, 3227–3238 (2006).

10. Yan, H. et al. Transcription and histone modifications in the recombination-free region spanning a rice centromere. Plant Cell 17, 3227–3238 (2005).

11. Jen, C.H., Michalopoulos, I., Westhead, D. & Meyer, P. Natural antisense transcripts with coding capacity in Arabidopsis may have a regulatory role that is not linked to double-stranded RNA degradation. Genome Biol. 6, R51 (2005).

12. Borsani, O., Zhu, J., Verslues, P.E., Sunkar, R. & Zhu, J.K. Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123, 1279–1291 (2005).

13. Griffiths-Jones, S. The microRNA Registry. Nucleic Acids Res. 32, D109–D111 (2004).

14. Lippman, Z. et al. Role of transposable elements in heterochromatin and epigenetic control. Nature 430, 471–476 (2004).

15. Zhang, X. et al. Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126, 1189–1201 (2006).

16. Yoo, B.-C. et al. A systemic small RNA signaling system in plants. Plant Cell 16, 1979–2000 (2004).

17. Jiang, N., Bao, Z., Zhang, X., Eddy, S.R. & Wessler, S.R. Pack-MULE transposable elements mediate gene evolution in plants. Nature 431, 569–573 (2004).

18. Juretic, N., Hoen, D.R., Huynh, M.L., Harrison, P.M. & Bureau, T.E. The evolutionary fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 15, 1292–1297 (2005).

19. Morgante, M. et al. Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat. Genet. 37, 997–1002 (2005).

20. Britten, R. Transposable elements have contributed to thousands of human proteins. Proc. Natl. Acad. Sci. USA 103, 1798–1803 (2006).

21. Lipatov, M., Lenkov, K., Petrov, D. & Bergman, C. Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome. BMC Biol. 3, 24 (2005).

22. Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154 (2005).

NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION

Systems seed biology to understand and manipulate rice grain quality and nutrition

Article

Jun 2022

Rice is one of the most essential crops, since it meets the calorific needs of 3 billion people around the world. Rice seed development begins upon fertilization, leading to the establishment of two distinct filial tissues, the endosperm and the embryo, which accumulate distinct seed storage products such as starch, storage proteins, and lipids. This review covers a range of systems biology tools deployed to dissect the spatiotemporal dynamics of transcriptome data, methylation, and small RNA-based regulation operating during seed development that influence the accumulation of storage products. Studies of other model systems are also considered because of the limited information on the rice transcriptome. The review highlights key genes, identified through a holistic systems biology view, that can be targeted to modify biochemical composition and influence rice grain quality and nutritional value, with the aim of improving rice as a functional food.

Transcriptome atlas of Phalaenopsis equestris

Article

Dec 2021

The vast diversity of Orchidaceae together with sophisticated adaptations to pollinators and other unique features make this family an attractive model for evolutionary and functional studies. The sequenced genome of Phalaenopsis equestris facilitates Orchidaceae research. Here, we present an RNA-seq-based transcriptome map of P. equestris that covers 19 organs of the plant, including leaves, roots, floral organs and the shoot apical meristem. We demonstrated the high quality of the data and showed the similarity of the P. equestris transcriptome map with the gene expression atlases of other plants. The transcriptome map can be easily accessed through our database Transcriptome Variation Analysis (TraVA) for visualizing gene expression profiles. As an example of the application, we analyzed the expression of Phalaenopsis “orphan” genes–those that do not have recognizable similarity with the genes of other plants. We found that approximately half of these genes were not expressed; the ones that were expressed were predominantly expressed in reproductive structures.

Comprehensive Mechanism of Gene Silencing and Its Role in Plant Growth and Development

Article

Sep 2021

Gene silencing is a negative feedback mechanism that regulates gene expression to define cell fate, and it also regulates metabolism and gene expression throughout the life of an organism. In plants, gene silencing occurs via transcriptional gene silencing (TGS) and post-transcriptional gene silencing (PTGS). TGS obscures transcription via methylation of the 5′ untranslated region (5′UTR), whereas PTGS causes methylation of a coding region, resulting in transcript degradation. In this review, we summarize the history and molecular mechanisms of gene silencing and underline its specific role in plant growth and crop production.

Genomics and breeding innovations for enhancing genetic gain for climate resilience and nutrition traits

Article

Jun 2021
THEOR APPL GENET

Key message: Integrating genomics technologies and breeding methods to tweak core parameters of the breeder’s equation could accelerate delivery of climate-resilient and nutrient-rich crops for future food security.

Abstract: Accelerating genetic gain in crop improvement programs with respect to climate resilience and nutrition traits, and realizing the improved gain in farmers’ fields, requires integration of several approaches. This article focuses on innovative approaches to address core components of the breeder’s equation. A prerequisite to enhancing genetic variance (σ²g) is the identification or creation of favorable alleles/haplotypes and their deployment for improving key traits. Novel alleles for new and existing target traits need to be accessed and added to the breeding population while maintaining genetic diversity. Selection intensity (i) in the breeding program can be improved by testing a larger population, enabled by statistical designs with minimal replications and by high-throughput phenotyping. Selection priorities and the criteria for selecting an appropriate portion of the population also play an important role. The most important component of the breeder’s equation is heritability (h²). Heritability estimates depend on several factors, including the size and type of population and the statistical methods used. The present article starts with a brief discussion of potential ways to enhance σ²g in the population. We highlight statistical methods and experimental designs that could improve trait heritability estimation. We also offer a perspective on reducing the breeding cycle time (t), which could be achieved through the selection of appropriate parents, optimizing the breeding scheme, rapid fixation of target alleles, and combining speed breeding with breeding programs to optimize trials for release. Finally, we summarize knowledge from multiple disciplines for enhancing genetic gains for climate resilience and nutritional traits.
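The abstract names the four levers of the breeder’s equation (σ²g, i, h², t) without writing the equation out. One common textbook form, assumed here (the article may parameterize it differently), is ΔG = i·h·σg / t. In this form, h = √h² is the accuracy of phenotypic selection and σg = √σ²g is the genetic standard deviation. Because h = σg/σp, this is equivalent to i·h²·σp / t. The sketch below evaluates that form so the effect of each lever is easy to see. The function and parameter names are hypothetical.

```python
import math


def genetic_gain_per_cycle_time(i: float, h2: float, sigma2_g: float, t: float) -> float:
    """Expected genetic gain per unit time, Delta G = i * h * sigma_g / t.

    Hypothetical helper. Assumes the textbook breeder's equation in which
    h = sqrt(h2) is the accuracy of phenotypic selection and
    sigma_g = sqrt(sigma2_g) is the genetic standard deviation.
    """
    if t <= 0:
        raise ValueError("cycle time t must be positive")
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability h2 must lie in [0, 1]")
    if sigma2_g < 0:
        raise ValueError("genetic variance sigma2_g cannot be negative")
    return i * math.sqrt(h2) * math.sqrt(sigma2_g) / t


if __name__ == "__main__":
    # i = 1.755 corresponds roughly to keeping the top 10% of candidates.
    print(genetic_gain_per_cycle_time(i=1.755, h2=0.25, sigma2_g=16.0, t=4.0))
    # Halving the cycle time doubles the gain per unit time.
    print(genetic_gain_per_cycle_time(i=1.755, h2=0.25, sigma2_g=16.0, t=2.0))
```

With i = 1.755, h² = 0.25, σ²g = 16 and t = 4, the sketch gives about 0.88 trait units per unit time. Halving t to 2 doubles that figure, which illustrates why the abstract treats cycle-time reduction as a lever on par with the other three.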

Updates on Genomic Resources for Crop Improvement

Chapter

Jun 2022

An increasing number of crop genomic resources, with novel technical achievements in genome analytics have led to dramatic changes in the landscape of agricultural research. This has improved our capacity to meet global challenges around food production and must be understood to better serve the needs of the human population. In this chapter, we provide a comprehensive review of historical changes in technologies which allow for improved plant genotyping, molecular marker discovery, and decoding of the plant genome. Further, we explore resources and databases available for multi-omics analysis and finally conclude with a discussion of translational genomics considerations. Ultimately, this chapter will serve as a tool for bioinformaticians and researchers to explore the deeply significant field of crop genomics.

Bioinformatics in Plant Genomics for Next-Generation Plant Breeding

Chapter

Jun 2022

Pratibha Parihar

Breeding, which began with the domestication of plant and animal species an estimated 10,000–15,000 years ago, has played a significant role in the evolution of human civilizations. It sustains a world population of more than 6 billion people. Over the past 100 years, the landscape of plant breeding has changed drastically because of uncontrolled population growth, loss of agricultural land, and changing environmental conditions, which pose a tremendous challenge to researchers seeking to improve crop production and productivity. The advent of novel genomics methods, including NGS (Next-Generation Sequencing), and of new breeding tools has transformed traditional breeding into next-generation breeding. Genome editing is a promising technique for altering specific genes to improve trait expression. Integrating computational tools with next-generation breeding technologies can speed up the breeding process and increase genetic gains under different production systems. This chapter emphasizes the significance of next-generation sequencing-derived information (big data) and its analysis by omics tools to revolutionize crop improvement.

Omics: a tool for resilient rice genetic improvement strategies

Article

Jun 2022
MOL BIOL REP

Rice is a staple food for about half of the world's population. With its small genome and its status as the world's foremost food crop, rice is regarded as an ideal cereal for genome research. A falling water table and soil fatigue are currently major challenges, intensified by climate change. The sequenced rice genome is 389 Mb, 95% of which is covered by a high-quality map-based sequence. Because it provides whole-genome sequences of the indica and japonica subspecies, the sequenced genome supports molecular biology and transcriptomics research across the cereals. Rice genome sequencing and functional genomics have identified QTLs or genes, genetic variability and haplotype blocks for agronomic characters, which have proved very useful in molecular breeding and direct selection. Different numbers of genes or QTLs have been identified for yield-related traits: 6 for plant architecture, 6 for panicle characteristics, 4 for grain number, and 1 each for tillering, HGW, grain filling and shattering. The numbers of QTLs/genes for grain quality, biotic stresses and abiotic stresses are 7, 23 and 13, respectively. The low yield, inferior quality and susceptibility to biotic and abiotic stresses of newly developed rice varieties are due to their narrow genetic background. Wild rice provides genetic resources for improving these characters. Molecular and genomic tools applied at different stages can help overcome these stresses and improve rice yield and quality.

Omics to Understand Drought Tolerance in Plants: An Update

Chapter

Aug 2021

Drought is a major threat to many plants, especially rice, which is a staple food for more than 3.5 billion people worldwide. Rice productivity must increase to meet ever-increasing demand. However, drought during the flowering stage reduces crop yield significantly. Advances in omics technologies such as transcriptomics, genomics, proteomics, and metabolomics have made it possible to study drought-responsive genes and their functional products at the genome-wide level. In recent years, these state-of-the-art techniques have significantly improved our understanding of the complex mechanisms of drought tolerance. However, developing a drought-resistant rice variety with good yield remains a challenge. In this chapter, we discuss the application of different omics technologies to developing drought-resistant varieties, with special reference to rice.

Functional genomics approaches for combating the effect of abiotic stresses

Chapter

Jan 2021

Abiotic stresses such as drought, salinity, low or high temperature, and heavy-metal toxicity are major factors limiting crop productivity and sustainability. Over the past few decades, several traditional and modern breeding methods have been used to develop stress-tolerant plants. However, abiotic stress tolerance is a complex trait, because plants respond to stresses by activating complex molecular and biochemical networks. Improving plants’ tolerance to abiotic stresses requires good knowledge of the diverse mechanisms and pathways involved in the stress response. With advances in modern technology, functional genomics approaches can contribute enormously to understanding the gene-regulatory networks that operate under diverse stresses. This chapter reviews recent progress in functional genomics. It presents approaches such as next-generation sequencing, functional mapping of quantitative trait loci, genome-wide hybridization, transgenesis and genome editing, together with the identification and validation of candidate genes for crop improvement. Technologies related to gene expression, mutagenesis, map-based cloning, and various genomics-assisted strategies are also assessed and discussed in light of integrating the information acquired through functional genomics.

Transcriptome atlas of Phalaenopsis equestris

Preprint

Nov 2020

The vast diversity of Orchidaceae together with sophisticated adaptations to pollinators and other unique features make this family an attractive model for evolutionary and functional studies. The sequenced genome of Phalaenopsis equestris facilitates Orchidaceae research. Here we present an RNA-seq based transcriptome map of P. equestris which covers 19 organs of the plant including leaves, roots, floral organs and shoot apical meristem. We demonstrated the high quality of the data and showed the similarity of P. equestris transcriptome map with gene expression atlases of other plants. The transcriptome map can be easily accessed through our database Transcriptome Variation Analysis (TraVA) visualizing gene expression profiles. As an example of the application we analyzed the expression of Phalaenopsis “orphan” genes – the ones that do not have recognizable similarity with genes of other plants. We found that about a half of them are not expressed; the ones that are expressed have a predominant expression pattern in reproductive structures.

The map-based sequence of the rice genome (International Rice Genome Sequencing Project, Nature 436, 793–800, 2005)

Article

Aug 2005
NATURE

Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.

Rice (Oryza sativa L.) is the most important food crop in the world and feeds over half of the global population. As the first step in a systematic and complete functional characterization of the rice genome, the International Rice Genome Sequencing Project (IRGSP) has generated and analysed a highly accurate finished sequence of the rice genome that is anchored to the genetic map. Our analysis has revealed several salient features of the rice genome:

- We provide evidence for a genome size of 389 Mb. This size estimation is ~260 Mb larger than the fully sequenced dicot plant model Arabidopsis thaliana. We generated 370 Mb of finished sequence, representing 95% coverage of the genome and virtually all of the euchromatic regions.
- A total of 37,544 non-transposable-element-related protein-coding sequences were detected, compared with ~28,000–29,000 in Arabidopsis, with a lower gene density of one gene per 9.9 kb in rice. A total of 2,859 genes seem to be unique to rice and the other cereals, some of which might differentiate monocot and dicot lineages.
- Gene knockouts are useful tools for determining gene function and relating genes to phenotypes. We identified 11,487 Tos17 retrotransposon insertion sites, of which 3,243 are in genes.
- Between 0.38 and 0.43% of the nuclear genome contains organellar DNA fragments, representing repeated and ongoing transfer of organellar DNA to the nuclear genome.
- The transposon content of rice is at least 35% and is populated by representatives from all known transposon superfamilies.
- We have identified 80,127 polymorphic sites that distinguish between two cultivated rice subspecies, japonica and indica, resulting in a high-resolution genetic map for rice. Single-nucleotide polymorphism (SNP) frequency varies from 0.53 to 0.78%, which is 20 times the frequency observed between the Columbia and Landsberg erecta ecotypes of Arabidopsis.
- A comparison between the IRGSP genome sequence and the 6.3× indica and 6× japonica whole-genome shotgun sequence assemblies revealed that the draft sequences provided coverage of 69% by indica and 78% by japonica relative to the map-based sequence.

Rice has played a central role in human nutrition and culture for the past 10,000 years. It has been estimated that world rice production must increase by 30% over the next 20 years to meet projected demands from population increase and economic development1. Rice grown on the most productive irrigated land has achieved nearly maximum production with current strains1.

Environmental degradation, including pollution, increase in night-time temperature due to global warming2, reductions in suitable arable land, water, labour and energy-dependent fertilizer provide additional constraints. These factors make steps to maximize rice productivity particularly important. Increasing yield potential and yield stability will come from a combination of biotechnology and improved conventional breeding. Both will be dependent on a high-quality rice genome sequence. Rice benefits from having the smallest genome of the major cereals, dense genetic maps and relative ease of genetic transformation3. The discovery of extensive genome colinearity among the Poaceae4 has established rice as the model organism for the cereal grasses. These properties, along with the finished sequence and other tools under development, set the stage for a complete functional characterization of the rice genome.
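Several headline figures in the IRGSP summary can be cross-checked with simple arithmetic. The sketch below recomputes coverage, gene density and the genic fraction of Tos17 insertions from the quoted counts. The constant and function names are illustrative, not from the paper. It assumes that the 9.9 kb gene density was computed over the 370 Mb of finished sequence; dividing the full 389 Mb estimate by 37,544 genes would instead give about 10.4 kb per gene.

```python
# Figures quoted in the IRGSP abstract above (constant names are illustrative).
GENOME_SIZE_MB = 389      # estimated genome size
FINISHED_MB = 370         # finished map-based sequence
GENES = 37_544            # non-TE-related protein-coding genes
TOS17_SITES = 11_487      # Tos17 insertion sites identified
TOS17_IN_GENES = 3_243    # of which fall within genes


def summarize() -> dict[str, float]:
    """Recompute derived ratios from the quoted counts."""
    return {
        "coverage_fraction": FINISHED_MB / GENOME_SIZE_MB,
        # Assumption: density is taken over the finished sequence only.
        "kb_per_gene_finished": FINISHED_MB * 1_000 / GENES,
        "kb_per_gene_full_estimate": GENOME_SIZE_MB * 1_000 / GENES,
        "tos17_genic_fraction": TOS17_IN_GENES / TOS17_SITES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

Running the sketch gives a coverage of about 0.951, consistent with the quoted 95%. It also gives about 9.86 kb per gene over the finished sequence, which rounds to the quoted 9.9 kb, and shows that roughly 28% of the Tos17 insertion sites fall within genes.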

Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome

Article

Nov 2005
BMC BIOL

Background: Recent analysis of the human and mouse genomes has shown that a substantial proportion of protein coding genes and cis-regulatory elements contain transposable element (TE) sequences, implicating TE domestication as a mechanism for the origin of genetic novelty. To understand the general role of TE domestication in eukaryotic genome evolution, it is important to assess the acquisition of functional TE sequences by host genomes in a variety of different species, and to understand in greater depth the population dynamics of these mutational events.

Results: Using an in silico screen for host genes that contain TE sequences, we identified a set of 63 mature "chimeric" transcripts supported by expressed sequence tag (EST) evidence in the Drosophila melanogaster genome. We found a paucity of chimeric TEs relative to expectations derived from non-chimeric TEs, indicating that the majority (~80%) of TEs that generate chimeric transcripts are deleterious and are not observed in the genome sequence. Using a pooled-PCR strategy to assay the presence of gene-TE chimeras in wild strains, we found that over half of the observed chimeric TE insertions are restricted to the sequenced strain, and ~15% are found at high frequencies in North American D. melanogaster populations. Estimated population frequencies of chimeric TEs did not differ significantly from non-chimeric TEs, suggesting that the distribution of fitness effects for the observed subset of chimeric TEs is indistinguishable from the general set of TEs in the genome sequence.

Conclusion: In contrast to mammalian genomes, we found that fewer than 1% of Drosophila genes produce mRNAs that include bona fide TE sequences. This observation can be explained by the results of our population genomic analysis, which indicates that most potential chimeric TEs in D. melanogaster are deleterious but that a small proportion may contribute to the evolution of novel gene sequences such as nested or intercalated gene structures. Our results highlight the need to establish the fixity of putative cases of TE domestication identified using genome sequences in order to demonstrate their functional importance, and reveal that the contribution of TE domestication to genome evolution may vary drastically among animal taxa.

Functional Rice Centromeres Are Marked by a Satellite Repeat and a Centromere-Specific Retrotransposon

Article

Sep 2002

The centromere of eukaryotic chromosomes is essential for the faithful segregation and inheritance of genetic information. In the majority of eukaryotic species, centromeres are associated with highly repetitive DNA, and as a consequence, the boundary for a functional centromere is difficult to define. In this study, we demonstrate that the centers of rice centromeres are occupied by a 155-bp satellite repeat, CentO, and a centromere-specific retrotransposon, CRR. The CentO satellite is located within the chromosomal regions to which the spindle fibers attach. CentO is quantitatively variable among the 12 rice centromeres, ranging from 65 kb to 2 Mb, and is interrupted irregularly by CRR elements. The break points of 14 rice centromere misdivision events were mapped to the middle of the CentO arrays, suggesting that the CentO satellite is located within the functional domain of rice centromeres. Our results demonstrate that the CentO satellite may be a key DNA element for rice centromere function.
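The CentO array sizes quoted above can be converted into monomer counts by integer division by the 155-bp repeat length. The sketch below gives only an upper bound, because it assumes an uninterrupted tandem array and 1 kb = 1,000 bp. The CRR insertions that interrupt real arrays would lower the count. The helper name is hypothetical.

```python
CENTO_MONOMER_BP = 155  # CentO satellite repeat length from the abstract


def max_cento_copies(array_bp: int, monomer_bp: int = CENTO_MONOMER_BP) -> int:
    """Upper bound on whole CentO monomers that fit in an array of array_bp bases.

    Assumes an uninterrupted tandem array; CRR insertions would lower the count.
    """
    if monomer_bp <= 0:
        raise ValueError("monomer length must be positive")
    if array_bp < 0:
        raise ValueError("array length cannot be negative")
    return array_bp // monomer_bp


if __name__ == "__main__":
    for label, length_bp in [("smallest array (65 kb)", 65_000), ("largest array (2 Mb)", 2_000_000)]:
        print(f"{label}: at most {max_cento_copies(length_bp):,} copies")
```

On these assumptions, the quoted range of 65 kb to 2 Mb corresponds to roughly 400 to about 13,000 CentO copies per centromere.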

Collection, Mapping, and Annotation of Over 28,000 cDNA Clones from japonica Rice

Article

Aug 2003
SCIENCE

We collected and completely sequenced 28,469 full-length complementary DNA clones from Oryza sativa L. ssp. japonica cv. Nipponbare. Through homology searches of publicly available sequence data, we assigned tentative protein functions to 21,596 clones (75.86%). Mapping of the cDNA clones to genomic DNA revealed that there are 19,000 to 20,500 transcription units in the rice genome. Protein informatics analysis against the InterPro database revealed the existence of proteins present in rice but not in Arabidopsis. Sixty-four percent of our cDNAs are homologous to Arabidopsis proteins.

RNAi-mediated Targeting of Heterochromatin by the RITS Complex

Article

Feb 2004
SCIENCE

RNA interference (RNAi) is a widespread silencing mechanism that acts at both the posttranscriptional and transcriptional levels. Here, we describe the purification of an RNAi effector complex termed RITS (RNA-induced initiation of transcriptional gene silencing) that is required for heterochromatin assembly in fission yeast. The RITS complex contains Ago1 (the fission yeast Argonaute homolog), Chp1 (a heterochromatin-associated chromodomain protein), and Tas3 (a novel protein). In addition, the complex contains small RNAs that require the Dicer ribonuclease for their production. These small RNAs are homologous to centromeric repeats and are required for the localization of RITS to heterochromatic domains. The results suggest a mechanism for the role of the RNAi machinery and small RNAs in targeting of heterochromatin complexes and epigenetic gene silencing at specific chromosomal loci.

Natural antisense transcripts with coding capacity in Arabidopsis may have a regulatory role that is not linked to double-stranded RNA degradation

Article

Jun 2005

Overlapping transcripts in antisense orientation have the potential to form double-stranded RNA (dsRNA), a substrate for a number of different RNA-modification pathways. One prominent route for dsRNA is its breakdown by Dicer enzyme complexes into small RNAs, a pathway that is widely exploited by RNA interference technology to inactivate defined genes in transgenic lines. The significance of this pathway for endogenous gene regulation remains unclear. RESULTS: We have examined transcription data for overlapping gene pairs in Arabidopsis thaliana. On the basis of an analysis of transcripts with coding regions, we find the majority of overlapping gene pairs to be convergently overlapping pairs (COPs), with the potential for dsRNA formation. In all tissues, COP transcripts are present at a higher frequency compared to the overall gene pool. The probability that both the sense and antisense copy of a COP are co-transcribed matches the theoretical value for coexpression under the assumption that the expression of one partner does not affect the expression of the other. Among COPs, we observe an over-representation of spliced (intron-containing) genes (90%) and of genes with alternatively spliced transcripts. For loci where antisense transcripts overlap with sense transcript introns, we also find a significant bias in favor of alternative splicing and variation of polyadenylation. CONCLUSION: The results argue against a predominant RNA degradation effect induced by dsRNA formation. Instead, our data support alternative roles for dsRNAs. They suggest that at least for a subgroup of COPs, antisense expression may induce alternative splicing or polyadenylation.
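The co-transcription test described above compares how often both members of a pair are expressed with the rate expected if the two were independent, which is the product of their individual expression rates. The following is a minimal sketch under stated assumptions. It assumes boolean expressed/not-expressed calls have already been made, with one position per COP or per sample. The function name and the example data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def coexpression_vs_independence(sense: Sequence[bool], antisense: Sequence[bool]) -> tuple[float, float]:
    """Return (observed, expected) fraction of positions where both partners are expressed.

    The expected value assumes independence: P(both) = P(sense) * P(antisense).
    """
    if len(sense) != len(antisense) or len(sense) == 0:
        raise ValueError("need two equal-length, non-empty call vectors")
    n = len(sense)
    p_sense = sum(sense) / n
    p_antisense = sum(antisense) / n
    observed = sum(1 for s, a in zip(sense, antisense) if s and a) / n
    return observed, p_sense * p_antisense


if __name__ == "__main__":
    # Toy expression calls, for illustration only.
    sense = [True, True, True, False, True, False, True, False]
    antisense = [True, False, True, True, False, False, True, False]
    print(coexpression_vs_independence(sense, antisense))
```

In the toy data, observed co-expression (3/8) is slightly above the independence expectation (5/8 × 4/8). The abstract reports that, for real COPs, the two values match. A formal comparison would also need a statistical test, which this sketch omits.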

SP14 Genome-Wide High-Resolution Mapping and Functional Analysis of DNA Methylation

Article

Feb 2007

R. Lister

Methylation of cytosines in DNA sequences is a major part of epigenetic regulation, resulting in proximal transcriptional silencing and enabling the stable inheritance of a pattern of transcriptional activity. DNA methylation in higher eukaryotes is involved in transposon silencing and regulation of gene expression; however, the full extent to which this mechanism regulates the genome has remained unknown. Tiling arrays representing the entire genome of the flowering plant Arabidopsis thaliana, tiled at 35-bp resolution, provide a platform upon which to analyze the methylated component of the Arabidopsis genome. Hybridization of methylated genomic DNA isolated by 5-methyl-cytosine immunoprecipitation to the whole-genome tiling arrays produced the first comprehensive DNA methylation map of an entire genome, identifying heavy DNA methylation at pericentromeric heterochromatin, repetitive sequences, and regions producing small interfering RNAs. Over one-third of expressed genes contain methylation within transcribed regions, whereas only ~5% of genes show methylation within promoter regions. Genes methylated in transcribed regions are highly expressed and constitutively active, whereas promoter-methylated genes show a greater degree of tissue-specific expression. Whole-genome tiling-array transcriptional profiling of DNA methyltransferase null mutants identified hundreds of genes and intergenic noncoding RNAs with altered expression levels, many of which may be epigenetically controlled by DNA methylation. The approaches developed should assist in the study of DNA methylation in larger and more complex genomes, for which whole-genome tiling arrays are now available.
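The distinction drawn above between methylation within transcribed regions and methylation within promoters comes down to an interval-overlap test. The sketch below classifies genes against a set of methylated intervals. The 1 kb upstream promoter window, the single-chromosome coordinates and all names are assumptions made for illustration; they are not the definitions used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def promoter_window(gene: Gene, upstream: int = 1000) -> tuple[int, int]:
    """Assumed promoter: the region immediately upstream of the transcription start, strand-aware."""
    if gene.strand == "+":
        return max(1, gene.start - upstream), gene.start - 1
    return gene.end + 1, gene.end + upstream


def classify(gene: Gene, methylated: list[tuple[int, int]], upstream: int = 1000) -> str:
    """Label a gene as 'body', 'promoter', 'both' or 'unmethylated'."""
    p_start, p_end = promoter_window(gene, upstream)
    in_body = any(overlaps(gene.start, gene.end, s, e) for s, e in methylated)
    in_promoter = p_start <= p_end and any(overlaps(p_start, p_end, s, e) for s, e in methylated)
    if in_body and in_promoter:
        return "both"
    if in_body:
        return "body"
    if in_promoter:
        return "promoter"
    return "unmethylated"


if __name__ == "__main__":
    methylated = [(1500, 1800), (5200, 5400)]  # toy methylated intervals
    genes = [
        Gene("geneA", 2000, 4000, "+"),
        Gene("geneB", 4500, 5000, "-"),
        Gene("geneC", 5100, 6000, "+"),
        Gene("geneD", 8000, 9000, "+"),
        Gene("geneE", 1700, 3000, "+"),
    ]
    for g in genes:
        print(g.name, classify(g, methylated))
```

Tallying these labels across all expressed genes would give proportions comparable to the "over one-third" and "~5%" figures above, but only under whatever gene-body and promoter definitions the analysis actually used.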

The microRNA registry

Article

Feb 2004
NUCLEIC ACIDS RES

Sam Griffiths-Jones

The miRNA Registry provides a service for the assignment of miRNA gene names prior to publication. A comprehensive and searchable database of published miRNA sequences is accessible via a web interface (http://www.sanger.ac.uk/Software/Rfam/mirna/), and all sequence and annotation data are freely available for download. Release 2.0 of the database contains 506 miRNA entries from six organisms.
