An expression atlas of rice mRNAs and small RNAs
Kan Nobuta (1,2), R C Venu (4), Cheng Lu (1,2), André Bélo (2), Kalyan Vemaraju (1), Karthik Kulkarni (1), Wenzhong Wang (1), Manoj Pillay (1), Pamela J Green (1–3), Guo-liang Wang (4) & Blake C Meyers (1,2)
Identification of all expressed transcripts in a sequenced
genome is essential both for genome analysis and for
realization of the goals of systems biology. We used the
transcriptional profiling technology called ‘massively parallel
signature sequencing’ to develop a comprehensive expression
atlas of rice (Oryza sativa cv Nipponbare). We sequenced
46,971,553 mRNA transcripts from 22 libraries, and
2,953,855 small RNAs from 3 libraries. The data demonstrate
widespread transcription throughout the genome, including
sense expression of at least 25,500 annotated genes and
antisense expression of nearly 9,000 annotated genes. An
additional set of ~15,000 mRNA signatures mapped to
unannotated genomic regions. The majority of the small RNA
data represented lower abundance short interfering RNAs that
match repetitive sequences, intergenic regions and genes.
Among these, numerous clusters of highly regulated small
RNAs were readily observed. We developed a genome
browser (http://mpss.udel.edu/rice) for public access to
the transcriptional profiling data for this important crop.
Because of its scientific, economic and cultural importance, the sequencing of the rice (Oryza sativa ssp. japonica cv Nipponbare) genome [1] represents a milestone in plant biology. The recent annotation of the rice genome (The Institute for Genomic Research, TIGR version 4.0) includes 55,890 features that represent 42,653 predicted protein-coding genes and 13,237 transposable elements [2]. Experimental evidence from full-length cDNA and expressed sequence tags (ESTs) is critical for genome analysis [3], yet these data are incomplete: they are subsaturating and miss nonpolyadenylated transcripts such as small RNAs. Small RNAs of 21–24 nucleotides are well characterized in Arabidopsis thaliana, but little is known about the diversity of these molecules in other plant species. Several categories are known, including short interfering RNAs (siRNAs) and microRNAs (miRNAs), both of which silence genes by targeting complementary mRNAs for degradation [4]. siRNAs can also trigger transcriptional silencing by guiding nuclear complexes that target histone modifications, DNA methylation or both [5]. Small RNAs are best discovered and measured by deep sequencing approaches that have high sensitivity and specificity.
One advantage of using a signature-based expression profiling method such as massively parallel signature sequencing (MPSS) to improve genome annotation is its potential to characterize previously unknown transcripts. Transcriptional analyses of the A. thaliana genome with MPSS have identified extensive alternative polyadenylation, as well as large numbers of natural antisense transcripts, novel transcripts from unannotated genomic regions, noncoding RNAs and small RNAs [6,7]. Sequence-based data have very high specificity and accuracy for assessing gene activity because they are not subject to cross-hybridization. In addition to enhancing genome annotation, MPSS data provide quantitative expression information [6].
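Because MPSS counts each sequenced signature, quantitative comparisons across libraries depend on normalizing for library size. A minimal sketch, assuming the transcripts-per-million (TPM) scaling used for expression values in this study (raw count divided by the library total, times one million); the function name `to_tpm` and the signature sequences are hypothetical.

```python
from typing import Dict


def to_tpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale one library's raw signature counts to transcripts per million (TPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no signatures")
    # Each signature's share of the library, expressed per million sequenced signatures.
    return {sig: n / total * 1_000_000 for sig, n in counts.items()}


# Hypothetical signatures and raw counts from a single library (1,000 signatures in total).
library = {"GATCAAGCTTCGGATTC": 3, "GATCTTTGGAACCTGAA": 1, "GATCCCATGGTAAGCAT": 996}
tpm = to_tpm(library)
```

In this toy library, a signature seen 3 times among 1,000 sequenced signatures corresponds to 3,000 TPM.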
To clarify the complexity of polyadenylated transcripts in rice, we sequenced 22 mRNA libraries using MPSS (Table 1, Supplementary Table 1 and Supplementary Data online). These libraries, from 12 diverse untreated tissues (with some replicates) and six abiotic stress treatments, included 46,971,553 transcripts. The nonredundant set comprised 249,990 distinct sequences. The expression values of these signatures ranged over four orders of magnitude, with the majority in the range of 1 to 100 transcripts per million (TPM) (Supplementary Table 1 online). Filtering to capture the most 'reliable' signatures removed most erroneous sequences [6], leaving a set of 46,251,966 signatures (Table 1 and Supplementary Fig. 1 online); these comprised 121,581 distinct signatures. This represents the deepest reported set of plant transcriptional data. The sampling depth is high enough that new transcripts are infrequently discovered (Fig. 1a).
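The saturation analysis in Fig. 1a can be sketched as a rarefaction curve: draw transcripts with replacement, weighted by signature abundance, and track how many distinct signatures have appeared. This is a simplified single-replicate illustration (the figure used 1,000 resamplings of the full 46,971,553-signature dataset); the function names, checkpoint spacing and toy abundances are assumptions.

```python
import random
from typing import Dict, List, Tuple


def discovery_curve(abundance: Dict[str, int], n_draws: int, step: int,
                    seed: int = 0) -> List[Tuple[int, int]]:
    """One rarefaction replicate: draw transcripts with replacement, weighted by
    signature abundance, recording (draws so far, distinct signatures seen)
    after every `step` draws."""
    rng = random.Random(seed)
    sigs = list(abundance)
    weights = [abundance[s] for s in sigs]
    seen = set()
    curve = []
    for i, sig in enumerate(rng.choices(sigs, weights=weights, k=n_draws), start=1):
        seen.add(sig)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


def endpoint_slope(curve: List[Tuple[int, int]]) -> float:
    """New distinct signatures per draw over the last interval of the curve."""
    (x0, y0), (x1, y1) = curve[-2], curve[-1]
    return (y1 - y0) / (x1 - x0)


# Toy data: ten abundant signatures plus 190 rare ones.
toy = {f"sig{i}": (1000 if i < 10 else 1) for i in range(200)}
curve = discovery_curve(toy, n_draws=20_000, step=1_000)
rate = endpoint_slope(curve)  # multiply by 1,000,000 for "per million" units
```

A flattening curve (small endpoint slope) is what indicates that additional sequencing at the same depth mostly resamples known signatures.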
Signatures matching only once in the genome (hits = 1) provide a conservative assessment of transcript diversity and active genes, whereas including duplicated signatures (hits > 0) provides an upper boundary (Supplementary Table 2 online). The majority of the signatures (~75%) in the libraries matched sense-strand transcripts of nearly half of the annotated rice genes (20,821 of 42,653; Fig. 1b and Supplementary Table 2a online). This number is comparable to the number of gene annotations supported by EST or full-length cDNA data (21,328) (Fig. 1b). Of genes with MPSS support, 84% are also supported by EST or full-length cDNA data, whereas only 18% of those lacking MPSS support have ESTs or full-length cDNAs (Fig. 1b). Because the ESTs and full-length cDNAs are derived from >60 libraries (versus our 22 libraries), we conclude that substantial sampling depth may substitute for tissue or treatment diversity. Among the 20,821 genes with MPSS support, 87% had high homology to A. thaliana genes, whereas only 39% of the genes without MPSS support fall into this category (Fig. 1b). Because numerous MPSS-supported low-homology genes (1,873) match ESTs or full-length cDNAs, many validated rice genes apparently lack orthologs in A. thaliana. In A. thaliana, 17 MPSS libraries confirmed expression of 21,193 genes. Compared with our rice data, this suggests that the two plant genomes have similar transcriptional complexities, even though the genome size and gene number of rice are roughly 3 times and 1.5 times those of A. thaliana, respectively.

Table 1 Summary statistics for rice MPSS libraries

Category                                        mRNA total    Small RNA total
Libraries                                       22            3
Signatures sequenced (a)                        46,971,553    2,953,855
Distinct signatures (b)                         249,990       284,301
Distinct genome-matched signatures (c)          81,961        221,592
Abundance of genome-matched signatures (d)      46,251,966    1,948,368

(a) All sequencing reactions combined for each type of library.
(b) The nonredundant set of either the complete set of MPSS signatures or those that match to the genome.
(c) For the genome-matched mRNA signatures, only those that passed the reliability filter are included.
(d) The sum of the observed frequency for all distinct, genome-matched signatures.

Received 26 October 2006; accepted 25 January 2007; published online 11 March 2007; doi:10.1038/nbt1291

(1) Delaware Biotechnology Institute, (2) Department of Plant and Soil Sciences, (3) College of Marine and Earth Studies, University of Delaware, Newark, Delaware 19711. (4) Department of Plant Pathology, The Ohio State University, Columbus, OH 43210. Correspondence should be addressed to B.C.M. (meyers@dbi.udel.edu).

© 2007 Nature Publishing Group http://www.nature.com/naturebiotechnology
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION
LETTERS
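The two bounds described above differ only in which genome-hit counts are admitted. A small sketch with a hypothetical mapping from signature to number of genomic matches; the percentage helper just reproduces arithmetic from the Fig. 1b partition (17,382 of 20,821 MPSS-supported genes also have EST or full-length cDNA support, and 3,946 of 21,832 unsupported genes do).

```python
from typing import Dict, Tuple


def hit_bounds(genome_hits: Dict[str, int]) -> Tuple[int, int]:
    """Count distinct signatures under the two criteria:
    conservative = matches exactly one genomic site (hits == 1);
    upper bound  = matches at least one site (hits > 0)."""
    conservative = sum(1 for h in genome_hits.values() if h == 1)
    upper = sum(1 for h in genome_hits.values() if h > 0)
    return conservative, upper


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


print(hit_bounds({"sigA": 1, "sigB": 4, "sigC": 1, "sigD": 0}))  # (2, 3)
print(round(percent(17_382, 20_821), 1))  # 83.5, reported as 84%
print(17_382 + 3_946)  # 21328 genes with EST or full-length cDNA support
```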
We expected that similar tissues and treatments would share sets of
expressed genes. Hierarchical clustering of the libraries was performed
using the 20,821 genes with MPSS support, and the pollen library was
an outgroup compared to all other libraries (Supplementary Fig. 2
online). Only one-third of the pollen-specific genes had A. thaliana
orthologs. The unique transcriptional pattern of the pollen library
may result from its haploid nature; the number of library-specific
genes was the highest in pollen (392), including an unusually large
number of transposable elements (Supplementary Fig. 3 online).
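The linkage method and distance used for the library clustering are not given in this text. The sketch below assumes average linkage on a 1 − Pearson correlation distance between library expression profiles over a shared gene list, written in plain Python for clarity; the library names and values are toy data, and an analysis over 20,821 genes would more likely use an established routine such as `scipy.cluster.hierarchy.linkage`.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: Dict[str, List[float]]) -> List[Tuple[Cluster, Cluster]]:
    """Agglomerative clustering of libraries; returns merges in the order made."""
    dist = {frozenset(p): 1 - pearson(profiles[p[0]], profiles[p[1]])
            for p in combinations(profiles, 2)}

    def cluster_dist(c1: Cluster, c2: Cluster) -> float:
        # Average linkage: mean distance over all cross-cluster library pairs.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters: List[Cluster] = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2))
    return merges


profiles = {  # toy expression values (TPM) for five genes in four libraries
    "leaf":   [100, 50, 10, 0, 5],
    "stem":   [90, 60, 12, 1, 4],
    "root":   [80, 40, 20, 2, 6],
    "pollen": [0, 2, 5, 300, 150],
}
outgroup = average_linkage(profiles)[-1]  # final merge joins the outgroup
```

In this toy example the pollen-like profile is anticorrelated with the others, so it is joined only in the last merge, which is what "outgroup" means in a dendrogram.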
To examine the association between chromosomal architecture and transcriptional activity, we compared gene density and expression levels for each chromosome (Supplementary Fig. 4 online). Expression activity was observed around the gene-dense regions, and the MPSS data are consistent with studies of centromeres 3, 4 and 8 indicating that genes in these regions are active [8–10]. Some rice centromeres, like Cen3, may be evolving from genic regions to repeat-based mature centromeres [9].
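Fig. 2c summarizes chromosome-scale patterns as moving averages of five adjacent 100-kb bins. A sketch of that binning, assuming the same scheme could be applied to the per-chromosome comparison of gene density and expression described above (the binning used for Supplementary Fig. 4 is not stated here); 0-based coordinates, windows that shrink at chromosome ends and the function names are all assumptions.

```python
from typing import List


def bin_values(positions: List[int], values: List[float], chrom_len: int,
               bin_size: int = 100_000) -> List[float]:
    """Sum values (e.g. signature TPM, or 1 per gene) into fixed-size bins.
    Positions are 0-based and must be smaller than chrom_len."""
    bins = [0.0] * ((chrom_len + bin_size - 1) // bin_size)
    for pos, val in zip(positions, values):
        bins[pos // bin_size] += val
    return bins


def moving_average(bins: List[float], window: int = 5) -> List[float]:
    """Centered moving average over `window` adjacent bins; windows are
    truncated (and averaged over fewer bins) at the chromosome ends."""
    half = window // 2
    out = []
    for i in range(len(bins)):
        lo, hi = max(0, i - half), min(len(bins), i + half + 1)
        out.append(sum(bins[lo:hi]) / (hi - lo))
    return out


bins = bin_values([0, 150_000, 150_001, 450_000], [1, 2, 3, 4], chrom_len=500_000)
smoothed = moving_average(bins)
```

Gene density and expression tracks binned this way share coordinates, so they can be plotted or correlated bin by bin.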
Genic positions of MPSS signatures identify transcripts resulting from alternative polyadenylation and 3′-splicing, as well as antisense transcripts. Among the 20,821 expressed rice genes, more than half (11,941) had multiple sense signatures that represent alternative transcripts. These transcripts can show marked differences in their expression levels (Fig. 1c). This type of analysis provides a novel view of complex gene expression events that may be important for understanding gene function. We also identified 11,001 antisense signatures corresponding to 8,023 annotated genes (Supplementary Table 2a online). Some natural antisense transcripts are coexpressed and induce alternative splicing, some induce dsRNA cleavage and show reciprocal expression patterns [11], and some even generate regulatory small RNAs [12]. MPSS analysis identified natural antisense transcripts for many rice genes, often with highly specific expression patterns. A comparison of sense and antisense expression levels demonstrates the difficulty of generalizing the function of natural antisense transcripts (Supplementary Fig. 5 online), which may require gene-by-gene analyses to discern their complex regulatory mechanisms.
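Assigning signatures to genes by position and strand is what separates sense, antisense and intergenic classes, and genes with more than one distinct sense signature are the candidates for alternative polyadenylation or 3′-splicing. A minimal sketch under simple assumptions: gene coordinates are 1-based and inclusive, a signature belongs to any gene whose span contains its position, and no extra window is added for unannotated 3′ UTRs (the study's full signature classes are defined in its Supplementary Methods and are not reproduced here). Gene and signature names are hypothetical, and a genome-scale run would replace the linear scan with an interval index.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass
class Signature:
    seq: str
    pos: int
    strand: str


def classify(genes: List[Gene], sigs: List[Signature]
             ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]], List[str]]:
    """Split signatures into sense and antisense sets per gene, plus intergenic ones."""
    sense: Dict[str, Set[str]] = defaultdict(set)
    antisense: Dict[str, Set[str]] = defaultdict(set)
    intergenic: List[str] = []
    for s in sigs:
        overlapping = [g for g in genes if g.start <= s.pos <= g.end]
        if not overlapping:
            intergenic.append(s.seq)
        for g in overlapping:
            # Same strand as the gene = sense; opposite strand = antisense.
            (sense if s.strand == g.strand else antisense)[g.name].add(s.seq)
    return sense, antisense, intergenic


def genes_with_alternative_ends(sense: Dict[str, Set[str]]) -> List[str]:
    """Genes with more than one distinct sense signature."""
    return sorted(g for g, seqs in sense.items() if len(seqs) > 1)


genes = [Gene("geneA", 100, 500, "+"), Gene("geneB", 800, 1200, "-")]
sigs = [Signature("s1", 450, "+"), Signature("s2", 480, "+"), Signature("s3", 300, "-"),
        Signature("s4", 650, "+"), Signature("s5", 900, "-")]
sense, antisense, intergenic = classify(genes, sigs)
```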
The 22 libraries included 13,461 intergenic signatures (Supplementary Table 2a online) that could be (i) misannotated 3′-untranslated regions of known genes, (ii) novel genes or (iii) noncoding RNAs, including miRNAs. Sixty-three intergenic mRNA signatures matched 30 known miRNA genes (precursors) and 68 noncoding RNA genes from the registry [13] (Supplementary Table 3 online). Consistent with previous reports [7], many noncoding RNAs were expressed in the stigma-ovary library (Supplementary Table 3b online). Like 'housekeeping' genes, many intergenic signatures were expressed in all 22 libraries and were supported by EST or full-length cDNA data. Clearly, there are a substantial number of unidentified genes in rice. Some of these are expressed across a range of tissues and developmental stages, whereas others may encode small peptides or generate uncharacterized regulatory small RNAs such as miRNAs or siRNAs. Overall, however, intergenic transcripts were less broadly expressed than genic transcripts, with each intergenic signature detected in an average of 6.65 libraries versus 14.42 for gene-associated signatures, and they were much more weakly expressed (Supplementary Fig. 4 online). This low expression contrasts with the high abundance of sense signatures and suggests that these intergenic transcripts may have characteristics distinct from those of typical protein-coding genes.

Figure 1 Genome-wide transcriptional analysis by mRNA MPSS. (a) Discovery rates for new rice transcripts based on the 46,971,553 mRNA MPSS signatures, resampled 1,000 times with replacement. Each signature was weighted by abundance. The gold line indicates 249,990 distinct, unfiltered signatures, whereas the green line indicates 121,581 reliable signatures. Each slope was calculated at the endpoint, indicating the final discovery rate for new transcripts after sampling 50 million transcripts. Sampling of additional tissues or treatments may increase this rate. (b) mRNA MPSS signatures representing sense-strand transcripts (classes 1, 2, 5 and 7), compared to annotated rice genes and excluding transposons and small genes (<50 amino acids). Most genes with support by MPSS have additional support from full-length cDNA data and high similarity to A. thaliana genes. Total gene numbers are indicated in the center (white sections), and genes without cDNA support are indicated by the blue sections labeled "others." Numbers in parentheses are potential pack-MULEs found in each category. HH and LH, high and low homology to A. thaliana, respectively. Using signatures with hits > 1 slightly increases the number of MPSS-supported genes (Supplementary Fig. 14 online). (c) Each row in the heatmap represents a different gene preselected for evidence of a substantial number of alternative transcripts. Yellow and blue are upregulated and downregulated transcripts, respectively, in salt-treated (NSL) versus untreated young leaf (NYL). Green represents no difference in expression. The "All" column is the NSL/NYL ratio (the expression ratio of all salt-treated to all young leaf transcripts), without considering alternatively terminated transcripts. Column 1 is the expression level of the 3′-most MPSS signature (the longest transcript), whereas column 5 is the 5′-most MPSS signature (the shortest transcript). Example genes are shown to the right, with the different transcripts and expression in the two libraries indicated. Red and pink boxes represent coding regions and untranslated regions, respectively, and the black triangles indicate the five MPSS signatures that correspond to the genes.

[Figure 1 image. (a) Number of distinct signatures versus total number of signatures sampled (in millions), for all signatures and reliable signatures; endpoint slopes annotated as 487/1,000,000 and 34/1,000,000. (b) Absent by MPSS: 21,832 genes (1,358), of which 3,946 (0) have FL-cDNA/EST support and 17,886 (1,358) are others; HH 39%, LH 61%. Present by MPSS: 20,821 genes (78), of which 17,382 (19) have FL-cDNA/EST support and 3,439 (59) are others; HH 87%, LH 13%. (c) Heatmap columns All and 1–5; example genes Os04g32650, Os09g35990, Os10g37190 and Os03g18910 with TPM values in NYL and NSL.]
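The breadth comparison above (6.65 versus 14.42 libraries on average) amounts to counting, for each signature, the libraries in which it is detected. A sketch assuming detection means a normalized abundance above a threshold (`min_tpm`, set to 0 here, since the study's detection criterion is not stated in this passage); library and signature names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def library_breadth(tpm_by_library: Dict[str, Dict[str, float]],
                    min_tpm: float = 0.0) -> Dict[str, int]:
    """Number of libraries in which each signature exceeds `min_tpm`."""
    breadth: Dict[str, int] = {}
    for library in tpm_by_library.values():
        for sig, tpm in library.items():
            if tpm > min_tpm:
                breadth[sig] = breadth.get(sig, 0) + 1
    return breadth


def mean_breadth(breadth: Dict[str, int], sigs: List[str]) -> float:
    """Average breadth over a group of signatures (e.g. genic or intergenic)."""
    return mean(breadth.get(s, 0) for s in sigs)


libraries = {  # hypothetical TPM values
    "lib1": {"genic1": 10.0, "inter1": 2.0},
    "lib2": {"genic1": 5.0, "genic2": 3.0},
    "lib3": {"genic1": 1.0, "genic2": 4.0, "inter1": 0.0},
}
breadth = library_breadth(libraries)
```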
To investigate the complexity of small RNAs in rice, we next generated MPSS small RNA libraries from rice inflorescence, stem and seedling leaf. Of the 284,301 distinct small RNAs, 78% matched the rice genome, representing 1,948,368 of 2,953,855 total sequences (Table 1). As in A. thaliana [7], the inflorescence library was proportionally more complex, perhaps reflecting stronger germline silencing (Supplementary Table 4 online). More than half of the distinct signatures matched unique sites in the genome, but there were many more highly duplicated signatures in the rice genome than were observed in A. thaliana (Fig. 2a). Unlike small RNAs from A. thaliana [7], those from rice are less concentrated in the pericentromeric regions and are instead more widely distributed along the chromosomes (Supplementary Fig. 6 online). The two completely sequenced rice centromeres are characterized by small RNAs corresponding to the centromeric repeats (Supplementary Fig. 7 online). Because the distributions of A. thaliana small RNAs are strongly correlated with those of repetitive sequences, we examined the relationship between rice small RNAs, genes and repeats. Rice small RNAs were strongly associated with intergenic regions (Supplementary Fig. 8 online), and some chromosomes showed a pericentromeric concentration when we examined small RNA abundance in addition to distribution (Fig. 2 and Supplementary Fig. 9 online). Most chromosomal regions in rice show a range of small RNA patterns, consistent with a dispersed arrangement of genes, transposons and miniature inverted repeat transposable elements (Supplementary Fig. 10 online). Notably, abrupt transitions between concentrated siRNA clusters and active genes were often observed (Supplementary Fig. 11 online), suggesting a localized shift from silenced to active chromatin within a very short physical distance. This type of abrupt transition is supported by methylation profiling studies that identify heterochromatin by biochemical (rather than cytological) means [14,15]. These studies have indicated a strong colocalization of siRNA clusters and DNA methylation, even within cytologically defined euchromatin. Many rice small RNAs were derived from transposons or retrotransposons, but many also matched unannotated intergenic regions (Supplementary Table 5 online). At least 12,766 of the 13,237 annotated retrotransposon or transposon-related sequences in the rice genome had matches to small RNAs.
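Fig. 2a bins distinct small RNA signatures by their number of genomic matches. A sketch of that binning, using the hit classes shown in the figure (1, 2–10, 11–100, 101–1,000 and >1,000); the input mapping of signature to hit count is hypothetical and would come from an alignment step not shown here.

```python
from collections import Counter
from typing import Dict

HIT_CLASSES = ["1", "2-10", "11-100", "101-1,000", ">1,000"]


def hit_class(hits: int) -> str:
    """Fig. 2a-style class for a signature with `hits` genomic matches."""
    if hits < 1:
        raise ValueError("signature does not match the genome")
    if hits == 1:
        return "1"
    if hits <= 10:
        return "2-10"
    if hits <= 100:
        return "11-100"
    if hits <= 1000:
        return "101-1,000"
    return ">1,000"


def hit_distribution(genome_hits: Dict[str, int]) -> Dict[str, int]:
    """Number of distinct signatures in each hit class (zero-filled)."""
    counts = Counter(hit_class(h) for h in genome_hits.values())
    return {c: counts.get(c, 0) for c in HIT_CLASSES}
```

Running the same function on rice and A. thaliana signature sets gives the side-by-side comparison plotted in the figure.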
[Figure 2 image. (a) Number of distinct signatures (log scale) by number of hits to the genome (1, 2–10, 11–100, 101–1,000, >1,000) for rice and Arabidopsis. (b) Count of genes, TEs or IGRs versus number of distinct signatures, for protein-coding genes, transposable elements and intergenic regions. (c) Small RNA (TPQ), mRNA (TPM) and genes and repeats (bp) along chromosome 8 (axis marks at 10, 20 and 30 Mb). (d) Number of signatures matching the annotated pre-miRNA hairpins in the FLR, SNU and STM libraries, binned as 1–25, 26–100 and >100 TPQ. (e) Pack-MULE examples Os01g20930, Os02g49580 and Os10g37080, with TIRs marked.]
Figure 2 Deep sequencing of rice small RNAs by MPSS. (a) Match frequencies of small RNAs in rice versus those of A. thaliana. We determined the number of genomic matches ('hits') to their respective genomes for 149,978 and 56,920 distinct small RNA MPSS signatures from rice and A. thaliana inflorescence libraries. The A. thaliana data have been described elsewhere [7]. (b) The number of small RNAs matching individual protein-coding genes, transposable elements and intergenic regions. (c) Distributions of small RNAs, mRNA expression and genes or repeats across rice chromosome 8, plotted as moving averages of five adjacent 100-kb bins. The light-green, pink and light-blue lines (top) are small RNAs in inflorescence, stem and young leaf, respectively. mRNA levels in the same tissues are shown in green, red and blue (middle). Black and gray lines (bottom) are densities of genes and repeats. Blue vertical shading indicates the approximate position of the centromere. (d) Abundance values for small RNA signatures matching the hairpins of the 171 genome-mapped, known rice miRNAs. The y-axis indicates the number of distinct signatures in each abundance class. (e) Small RNAs map to the terminal inverted repeats (TIRs, black arrowheads) of pack-MULE elements described elsewhere [17,18]. Yellow shading indicates transposon-like sequences; orange indicates inverted repeats; black triangles are small RNAs; red and blue boxes are annotated exons. Additional features are as described in Supplementary Figure 7 online.
As we have demonstrated elsewhere [6,7], repetitive sources of siRNAs produce numerous distributed small RNAs, whereas miRNAs produce small, focused clusters of specific sequences. Rice genes matched distinct small RNAs at a rate at least as high as repeats and intergenic regions (Fig. 2b and Supplementary Fig. 8 online), often because miniature inverted repeat transposable elements or other small repeats are embedded within an intron. As in A. thaliana, tandem repeats and inverted repeats are rich sources of small RNAs (Supplementary Fig. 12 online). In all three libraries, there were many clusters of small RNAs, particularly in intergenic, unannotated regions of the rice genome (Supplementary Table 6 online). This indicates that the rice genome is much richer in silenced sequences than that of A. thaliana, consistent with its higher proportion of repetitive DNA. Because small RNAs may also interact with imperfectly matched targets, their biological effect may be far more substantial than we have indicated.
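Cluster calls of the kind summarized in Supplementary Table 6 depend on how nearby small RNA matches are grouped. A minimal sketch of one common approach: sort match positions on a single chromosome (ignoring strand) and split them wherever consecutive matches lie more than a maximum gap apart. The 500-bp gap and minimum of five matches are hypothetical defaults, not parameters given in this text.

```python
from typing import List, Tuple


def find_clusters(positions: List[int], max_gap: int = 500,
                  min_signatures: int = 5) -> List[Tuple[int, int, int]]:
    """Group small RNA match positions on one chromosome into clusters whenever
    consecutive matches are at most `max_gap` bp apart; keep clusters with at
    least `min_signatures` matches. Returns (start, end, n_matches) tuples."""
    clusters: List[Tuple[int, int, int]] = []
    if not positions:
        return clusters
    pos = sorted(positions)
    start = prev = pos[0]
    n = 1
    for p in pos[1:]:
        if p - prev <= max_gap:
            n += 1
        else:
            # Gap too large: close the current cluster and start a new one.
            if n >= min_signatures:
                clusters.append((start, prev, n))
            start, n = p, 1
        prev = p
    if n >= min_signatures:
        clusters.append((start, prev, n))
    return clusters
```

Repeat-derived siRNA loci tend to produce long, dense clusters under such a rule, whereas miRNA loci produce short clusters dominated by one or a few sequences.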
The miRNA registry includes 182 rice miRNAs [13], of which 171 were mapped in the genome and 130 were expressed, accounting for 8.8%, 36.6% and 12.6% of the inflorescence, seedling and stem small RNAs, respectively. These percentages are much lower than those for A. thaliana, which is consistent with a substantial abundance of repeat-associated siRNAs in the more complex rice genome [6,7]. The lack of small RNA biogenesis mutants in rice hinders our ability to distinguish siRNAs from miRNAs. However, miRNAs are abundant, consistently expressed and conserved, and we identified numerous conserved and consistently expressed small RNAs (Supplementary Fig. 13 online), suggesting that these data include many novel miRNAs.
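The miRNA shares quoted above (8.8%, 36.6% and 12.6%) are fractions of each library's total small RNA abundance. A sketch, assuming the subset is the set of signatures matching expressed, genome-mapped registry miRNAs; how those matches were defined (mature sequence only, or anywhere on the hairpin) is not specified in this passage, and the names and counts are toy values.

```python
from typing import Dict, Set


def percent_of_library(abundance: Dict[str, float], subset: Set[str]) -> float:
    """Percentage of a library's total small RNA abundance contributed by `subset`."""
    total = sum(abundance.values())
    in_subset = sum(v for sig, v in abundance.items() if sig in subset)
    return 100.0 * in_subset / total


library = {"mirA": 30.0, "mirB": 10.0, "sirnaX": 50.0, "sirnaY": 10.0}
share = percent_of_library(library, {"mirA", "mirB"})
```

A library dominated by repeat-associated siRNAs yields a small share even when miRNAs are individually abundant, which is the pattern the text attributes to the rice inflorescence.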
With the three libraries, we examined the developmental regulation of rice small RNAs. The chromosomal distributions of small RNAs suggested a high degree of similarity between the young leaf and stem small RNAs (Fig. 2c and Supplementary Fig. 9 online), both of which differed from the inflorescence data. However, the stem library produced the greatest number of small RNA clusters or genes that were substantially more abundant in only one of the three libraries (Fig. 2d and Supplementary Table 7 online), suggesting an unusual degree of small RNA regulation in the stem. This is the first report of small RNAs from plant stems, and it is possible that some of these small RNAs are involved in signal transmission between leaves and inflorescences [16]. In contrast to A. thaliana [6,7], many small RNA clusters were substantially reduced in the inflorescence compared with seedlings and the stem (Supplementary Table 7 online).
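Calling a cluster "substantially more abundant in only one library" requires a fold-change rule, which this passage does not specify. A sketch assuming a tenfold threshold and a pseudocount of 1 for clusters absent from other libraries, both hypothetical; FLR, SNU and STM are the library labels shown in Fig. 2d.

```python
from typing import Dict, Optional


def dominant_library(abundance: Dict[str, float], fold: float = 10.0,
                     pseudocount: float = 1.0) -> Optional[str]:
    """Return the library in which a cluster is at least `fold` times more
    abundant than in every other library, or None if no library dominates.
    The pseudocount avoids division by zero for clusters absent elsewhere."""
    top = max(abundance, key=abundance.get)
    for lib, value in abundance.items():
        if lib != top and abundance[top] + pseudocount < fold * (value + pseudocount):
            return None
    return top
```

Tallying `dominant_library` over all clusters gives a per-library count of library-specific clusters, the quantity on which the stem library stood out.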
The rice genome contains many gene fragments ('pack-MULEs') generated by transposons [17]. Our mRNA MPSS data identified 17,886 genes without expression data (Fig. 1b), >90% of which have no known function, suggesting that many of these are inactive or are pseudogenes. We compared the mRNA and small RNA MPSS data to predicted rice pack-MULEs [17], using 8,271 previously identified MULEs [18]. This identified 1,358 potential pack-MULEs among the 17,886 unexpressed genes. These elements represent just one of several classes of gene-shuffling transposable elements [19–21], so other inactive genes are likely to be transposed fragments. Small RNAs matched many of the terminal inverted repeats of these pack-MULEs but rarely matched the internal gene fragments (Fig. 2e). The combined mRNA and small RNA datasets may offer an experimentally based system for pack-MULE identification in rice and other plant genomes.
Taking into account annotated, expressed genes as well as small RNAs, a very high proportion of the rice genome is actively transcribed. Although thousands of annotated genes lack both mRNA and small RNA expression data, detecting their expression may require sampling of highly specialized tissues, cell types or treatments. The rice small RNA data indicate an extensive and complex repertoire of such molecules, one that vastly exceeds that of A. thaliana, consistent with larger genomes having more complex small RNA populations. This suggests that larger plant genomes, such as those of most crops, will require deeper sequencing. Additional complexity may be found in analyses of nonpolyadenylated transcripts [22]. A comprehensive understanding of the network of gene expression events in rice or other crops will require concerted efforts such as ours to characterize the activities and functions of a comprehensive catalog of genomic components.
METHODS
High- and low-homology rice genes. The high- and low-homology rice genes were identified according to their similarity to A. thaliana genes using a BLASTP e-value threshold of <1.0e-7.
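The homology split can be reproduced from tabular BLASTP output. A sketch assuming the standard 12-column tabular format (for example BLAST+ `-outfmt 6`, in which the e-value is the 11th column), that a gene's best hit decides its class, and that genes with no hit are low homology; these rules and the query and subject identifiers in the test are assumptions.

```python
from typing import Dict, Iterable

E_VALUE_CUTOFF = 1.0e-7


def homology_classes(blast_lines: Iterable[str],
                     rice_genes: Iterable[str]) -> Dict[str, str]:
    """Label each rice gene HH if its best BLASTP hit to A. thaliana has an
    e-value below E_VALUE_CUTOFF, otherwise LH (including genes with no hit)."""
    best: Dict[str, float] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, evalue = fields[0], float(fields[10])
        best[query] = min(evalue, best.get(query, float("inf")))
    return {g: "HH" if best.get(g, float("inf")) < E_VALUE_CUTOFF else "LH"
            for g in rice_genes}
```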
MULE analysis. The rice MULE data have been previously described [18] and were downloaded from http://www.genome.org/. Because the International Rice Genome Sequencing Project annotation version 2.0 was used for their analyses, we remapped all the MULEs onto TIGR4.0. After mapping these sequences, potential pack-MULEs (genes flanked on both sides by MULEs) with or without MPSS signatures were identified. In addition, we used these sequences to identify pack-MULEs associated with the intergenic MPSS signatures, because such pack-MULEs could correspond to the unannotated transcripts.
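The screen for genes flanked on both sides by MULEs can be sketched with sorted coordinate lookups. The maximum flank distance (5 kb here) and the requirement that each flanking MULE lie entirely outside the gene are hypothetical choices, since the criterion is not quantified in this text; gene names and coordinates are toy values on a single chromosome.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def flanked_genes(genes: List[Tuple[str, int, int]],
                  mules: List[Tuple[int, int]],
                  max_distance: int = 5_000) -> List[str]:
    """Genes (name, start, end) with a MULE ending within `max_distance` bp
    upstream and a MULE starting within `max_distance` bp downstream.
    Coordinates are 1-based and inclusive, all on one chromosome."""
    ends = sorted(end for _, end in mules)
    starts = sorted(start for start, _ in mules)
    result = []
    for name, g_start, g_end in genes:
        # Any MULE end in [g_start - max_distance, g_start - 1]?
        left = bisect_right(ends, g_start - 1) - bisect_left(ends, g_start - max_distance) > 0
        # Any MULE start in [g_end + 1, g_end + max_distance]?
        right = bisect_right(starts, g_end + max_distance) - bisect_left(starts, g_end + 1) > 0
        if left and right:
            result.append(name)
    return result
```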
Analysis of alternative termination. As an example of the differential expression of alternative transcripts, we focused on two libraries (NYL and NSL) to examine the effect of salt stress. Genes with alternative termination sites were identified from the MPSS data (multiple sense-strand MPSS signatures associated with a single gene), and from this set we selected genes that had at least two sense signatures with tenfold higher expression in one library than in the other. For each gene and each library, the expression levels and the sum of the expression levels were recorded for the five MPSS signatures located at the 3′ end of the gene. The expression level of each NSL signature was divided by that of the corresponding NYL signature, and the resulting values were log-transformed and loaded into R to generate a heatmap.
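The selection and heatmap values described above can be sketched per gene. Several details here are assumptions rather than the stated procedure: a pseudocount of 1 TPM to avoid division by zero, log base 2 (the text says only "log-transformed"), the tenfold test applied to each signature independently in either direction, and Python in place of the R step. The function name and example values are hypothetical.

```python
import math
from typing import List, Optional


def log_ratios(nyl: List[float], nsl: List[float], pseudocount: float = 1.0,
               fold: float = 10.0, n_signatures: int = 5) -> Optional[List[float]]:
    """For one gene's sense signatures ordered 3'-most first, return
    [All, col1, col2, ...] as log2(NSL/NYL), or None if fewer than two
    signatures differ at least `fold`-fold between the libraries."""
    pairs = list(zip(nyl, nsl))[:n_signatures]
    differing = sum(1 for a, b in pairs
                    if max(a, b) + pseudocount >= fold * (min(a, b) + pseudocount))
    if differing < 2:
        return None
    # "All" column: ratio of summed expression, ignoring alternative ends.
    all_ratio = math.log2((sum(b for _, b in pairs) + pseudocount) /
                          (sum(a for a, _ in pairs) + pseudocount))
    return [all_ratio] + [math.log2((b + pseudocount) / (a + pseudocount))
                          for a, b in pairs]
```

Rows returned for each selected gene can be stacked into a matrix for plotting; genes returning None fail the selection criterion.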
Additional methods. Detailed methods are available in Supplementary
Methods online.
Accession codes. Gene Expression Omnibus (GEO): series identifier GSE7107,
platform identifiers GPL3777 and GPL3776 for mRNA and small RNA samples,
respectively; sample identifiers, GSM169562, GSM169564, GSM169566,
GSM169567, GSM169568, GSM169569, GSM169570, GSM170900,
GSM170901, GSM170902, GSM170903, GSM170904, GSM170905,
GSM170906, GSM170907, GSM170909, GSM170912, GSM170912,
GSM170914, GSM170917, GSM170919 and GSM170921. The raw and normal-
ized MPSS data are also available at http://mpss.udel.edu/rice and this website
allows users to query these data based on physical location, gene identifiers or
by sequence.
Note: Supplementary information is available on the Nature Biotechnology website.
ACKNOWLEDGMENTS
We are grateful to C. Haudenschild, TIGR’s rice annotation project, S. Singh Tej,
M. Nakano, R. German, A. Hetawal, R. Gupta and S. Kaushik. This work was
supported by US National Science Foundation awards 0321437 (B.C.M. and
G.-l.W.) and 0439186 (P.J.G. and B.C.M.), and US Department of Agriculture
2005-35064-15326 (B.C.M. and P.J.G.).
AUTHOR CONTRIBUTIONS
K.N. performed research, analyzed data and wrote the manuscript; R.C.V. and C.L. performed laboratory research and provided useful discussions; A.B., K.V., K.K., W.W. and M.P. performed computational research; P.J.G. and G.-l.W. designed research and wrote the manuscript; B.C.M. designed research, analyzed data, and coordinated and wrote the manuscript.
COMPETING INTERESTS STATEMENT
The authors declare no competing financial interests.
Published online at http://www.nature.com/naturebiotechnology
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions
1. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).
2. Yuan, Q. et al. The Institute for Genomic Research Osa1 rice genome annotation
database. Plant Physiol. 138, 18–26 (2005).
3. Kikuchi, S. et al. Collection, mapping, and annotation of over 28,000 cDNA clones
from japonica rice. Science 301, 376–379 (2003).
4. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116,
281–297 (2004).
5. Verdel, A. et al. RNAi-mediated targeting of heterochromatin by the RITS complex.
Science 303, 672–676 (2004).
6. Meyers, B.C. et al. Analysis of the transcriptional complexity of Arabidopsis thaliana by
massively parallel signature sequencing. Nat. Biotechnol. 22, 1006–1011 (2004).
7. Lu, C. et al. Elucidation of the small RNA component of the transcriptome. Science
309, 1567–1569 (2005).
8. Cheng, Z. et al. Functional rice centromeres are marked by a satellite repeat and a
centromere-specific retrotransposon. Plant Cell 14, 1691–1704 (2002).
9. Yan, H. et al. Genomic and genetic characterization of rice cen3 reveals extensive
transcription and evolutionary implications of complex centromere. Plant Cell 18,
3227–3238 (2006).
10. Yan, H. et al. Transcription and histone modifications in the recombination-free region
spanning a rice centromere. Plant Cell 17, 3227–3238 (2005).
11. Jen, C.H., Michalopoulos, I., Westhead, D. & Meyer, P. Natural antisense transcripts
with coding capacity in Arabidopsis may have a regulatory role that is not linked to
double-stranded RNA degradation. Genome Biol. 6, R51 (2005).
12. Borsani, O., Zhu, J., Verslues, P.E., Sunkar, R. & Zhu, J.K. Endogenous siRNAs derived
from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis.
Cell 123, 1279–1291 (2005).
13. Griffiths-Jones, S. The microRNA Registry. Nucleic Acids Res. 32, D109–D111
(2004).
14. Lippman, Z. et al. Role of transposable elements in heterochromatin and epigenetic
control. Nature 430, 471–476 (2004).
15. Zhang, X. et al. Genome-wide high-resolution mapping and functional analysis of DNA
methylation in Arabidopsis. Cell 126, 1189–1201 (2006).
16. Yoo, B.-C. et al. A systemic small RNA signaling system in plants. Plant Cell 16,
1979–2000 (2004).
17. Jiang, N., Bao, Z., Zhang, X., Eddy, S.R. & Wessler, S.R. Pack-MULE transposable
elements mediate gene evolution in plants. Nature 431, 569–573 (2004).
18. Juretic, N., Hoen, D.R., Huynh, M.L., Harrison, P.M. & Bureau, T.E. The evolutionary
fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 15,
1292–1297 (2005).
19. Morgante, M. et al. Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat. Genet. 37, 997–1002 (2005).
20. Britten, R. Transposable elements have contributed to thousands of human proteins.
Proc. Natl. Acad. Sci. USA 103, 1798–1803 (2006).
21. Lipatov, M., Lenkov, K., Petrov, D. & Bergman, C. Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome. BMC Biol. 3, 24 (2005).
22. Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide
resolution. Science 308, 1149–1154 (2005).