An expression atlas of rice mRNAs and small RNAs
Kan Nobuta (1,2), R C Venu (4), Cheng Lu (1,2), André Bélo (2), Kalyan Vemaraju (1), Karthik Kulkarni (1), Wenzhong Wang (1), Manoj Pillay (1), Pamela J Green (1–3), Guo-liang Wang (4) & Blake C Meyers (1,2)
Identification of all expressed transcripts in a sequenced
genome is essential both for genome analysis and for
realization of the goals of systems biology. We used the
transcriptional profiling technology called ‘massively parallel
signature sequencing’ to develop a comprehensive expression
atlas of rice (Oryza sativa cv Nipponbare). We sequenced
46,971,553 mRNA transcripts from 22 libraries, and
2,953,855 small RNAs from 3 libraries. The data demonstrate
widespread transcription throughout the genome, including
sense expression of at least 25,500 annotated genes and
antisense expression of nearly 9,000 annotated genes. An
additional set of approximately 15,000 mRNA signatures mapped to
unannotated genomic regions. The majority of the small RNA
data represented lower abundance short interfering RNAs that
match repetitive sequences, intergenic regions and genes.
Among these, numerous clusters of highly regulated small
RNAs were readily observed. We developed a genome
browser (http://mpss.udel.edu/rice) for public access to
the transcriptional profiling data for this important crop.
Because of its scientific, economic and cultural importance, the sequencing of the rice (Oryza sativa ssp. japonica cv Nipponbare) genome[1] represents a milestone in plant biology. The recent annotation of the rice genome (The Institute for Genomic Research, TIGR version 4.0) includes 55,890 features that represent 42,653 predicted protein-coding genes and 13,237 transposable elements[2]. Experimental evidence from full-length cDNA and expressed sequence tags (ESTs) is critical for genome analysis[3], yet these data are incomplete: they are subsaturating and miss nonpolyadenylated transcripts such as small RNAs. Small RNAs of 21–24 nucleotides are well characterized in Arabidopsis thaliana, but little is known about the diversity of these molecules in other plant species. Several categories are known, including short interfering RNAs (siRNAs) and microRNAs (miRNAs), both of which silence genes by targeting complementary mRNAs for degradation[4]. siRNAs can also trigger transcriptional silencing by guiding nuclear complexes that direct histone modification, DNA methylation or both[5]. Small RNAs are best discovered and measured by deep sequencing approaches that have high sensitivity and specificity.
One advantage of using a signature-based expression profiling method such as massively parallel signature sequencing (MPSS) to improve genome annotation is its potential to characterize previously unknown transcripts. Transcriptional analyses of the A. thaliana genome with MPSS have identified extensive alternative polyadenylation, as well as large numbers of natural antisense transcripts, novel transcripts from unannotated genomic regions, noncoding RNAs and small RNAs[6,7]. Sequence-based data have very high specificity and accuracy for assessing gene activity, because they are not subject to cross-hybridization. In addition to enhancing genome annotation, MPSS data provide quantitative expression information[6].
To clarify the complexity of polyadenylated transcripts in rice, we sequenced 22 mRNA libraries using MPSS (Table 1, Supplementary Table 1 and Supplementary Data online). These libraries, from 12 diverse untreated tissues (with some replicates) and six abiotic stress treatments, included 46,971,553 transcripts. The nonredundant set comprised 249,990 distinct sequences. The expression values of these signatures ranged over four orders of magnitude, with the majority in the range of 1 to 100 transcripts per million (TPM) (Supplementary Table 1 online). Filtering to retain only the most 'reliable' signatures removed most erroneous sequences[6], leaving a set of 46,251,966 signatures (Table 1 and Supplementary Fig. 1 online) that comprised 121,581 distinct signatures. This is the deepest set of plant transcriptional data reported to date, and the sampling depth is high enough that new transcripts are discovered only infrequently (Fig. 1a).
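The Fig. 1a discovery curve (resampling with replacement, each signature weighted by its abundance, slope measured at the endpoint) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the input is a hypothetical mapping from signature to observed count, the default number of replicates is far smaller than the 1,000 used in the study, and the slope is taken between the last two checkpoints.

```python
import random
from typing import Dict, List, Tuple


def discovery_curve(
    abundance: Dict[str, int],
    checkpoints: List[int],
    replicates: int = 10,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Mean number of distinct signatures seen after k draws, for each k in checkpoints.

    Draws are made with replacement, each signature weighted by its abundance.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    sigs = list(abundance)
    weights = [abundance[s] for s in sigs]
    points = sorted(set(checkpoints))
    totals = {p: 0 for p in points}
    for _ in range(replicates):
        draws = rng.choices(sigs, weights=weights, k=points[-1])
        seen = set()
        for i, sig in enumerate(draws, start=1):
            seen.add(sig)
            if i in totals:  # record distinct count at each checkpoint
                totals[i] += len(seen)
    return [(p, totals[p] / replicates) for p in points]


def endpoint_slope(curve: List[Tuple[int, float]]) -> float:
    """New distinct signatures per sampled tag between the last two checkpoints."""
    (x0, y0), (x1, y1) = curve[-2], curve[-1]
    return (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    toy = {"sigA": 500, "sigB": 50, "sigC": 1}  # hypothetical counts
    curve = discovery_curve(toy, [10, 100, 1_000])
    print(curve, endpoint_slope(curve) * 1_000_000)
```

Multiplying the slope by 1,000,000 expresses it in the units used in the figure (new distinct signatures per million sampled). For tens of millions of draws, a vectorized implementation (for example, with NumPy) would be far faster than this pure-Python loop.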
Table 1 Summary statistics for rice MPSS libraries

Category                                     mRNA total    Small RNA total
Libraries                                            22                  3
Signatures sequenced (a)                     46,971,553          2,953,855
Distinct signatures (b)                         249,990            284,301
Distinct genome-matched signatures (c)           81,961            221,592
Abundance of genome-matched signatures (d)   46,251,966          1,948,368

(a) All sequencing reactions combined for each type of library.
(b) The nonredundant set of either the complete set of MPSS signatures or those that match to the genome.
(c) For the genome-matched mRNA signatures, only those that passed the reliability filter are included.
(d) The sum of the observed frequency for all distinct, genome-matched signatures.

Received 26 October 2006; accepted 25 January 2007; published online 11 March 2007; doi:10.1038/nbt1291

(1) Delaware Biotechnology Institute, (2) Department of Plant and Soil Sciences, and (3) College of Marine and Earth Studies, University of Delaware, Newark, Delaware 19711. (4) Department of Plant Pathology, The Ohio State University, Columbus, OH 43210. Correspondence should be addressed to B.C.M. (meyers@dbi.udel.edu).

© 2007 Nature Publishing Group http://www.nature.com/naturebiotechnology
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION
LETTERS

Signatures matching only once in the genome (hits = 1) provide a conservative assessment of transcript diversity and active genes, whereas including duplicated signatures (hits > 0) provides an upper bound (Supplementary Table 2 online). The majority of the signatures (~75%) in the libraries matched sense-strand transcripts of nearly half of the annotated rice genes (20,821 of 42,653; Fig. 1b and Supplementary Table 2a online). This number is comparable to the number of gene annotations supported by EST or full-length cDNA data (21,328) (Fig. 1b). Of genes with MPSS support, 84% are supported by EST or full-length cDNA data, whereas only 18% of those lacking MPSS support have ESTs or full-length cDNAs (Fig. 1b). Because the ESTs and full-length cDNAs are derived from >60 libraries (versus our 22 libraries), we concluded that substantial sampling depth may substitute for tissue or treatment diversity. Among the 20,821 genes with MPSS support, 87% had high homology to A. thaliana genes, whereas only 39% of the genes without MPSS support fall into this category (Fig. 1b). Because numerous MPSS-supported low-homology genes (1,873) match ESTs or full-length cDNAs, many validated rice genes apparently lack orthologs in A. thaliana. In A. thaliana, 17 MPSS libraries confirmed expression of 21,193 genes. Comparison with our rice data suggests that the two plant genomes have similar transcriptional complexities, although the genome size and gene number of rice are 3 times and 1.5 times those of A. thaliana, respectively.
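The distinction between a conservative (hits = 1) and an upper-bound (hits > 0) estimate of MPSS-supported genes can be sketched as follows. The record layout, field names and toy gene identifiers are hypothetical; the sketch only shows how the two gene sets differ, not how signatures were mapped to the genome. The Fig. 1b proportions quoted above follow from the same kind of set arithmetic: 17,382 of 20,821 MPSS-supported genes (about 84%) and 3,946 of 21,832 unsupported genes (about 18%) have EST or full-length cDNA support.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Signature:
    """A genome-matched MPSS signature (hypothetical record layout)."""
    seq: str
    hits: int                     # number of genomic matches
    sense_genes: Tuple[str, ...]  # genes whose sense strand contains a match


def supported_genes(signatures: Iterable[Signature]) -> Tuple[Set[str], Set[str]]:
    """Return (conservative, upper-bound) sets of genes with MPSS support.

    Conservative: only signatures matching a single genomic site (hits == 1).
    Upper bound: any genome-matched signature (hits > 0), including duplicated ones.
    """
    conservative: Set[str] = set()
    upper: Set[str] = set()
    for sig in signatures:
        if sig.hits > 0:
            upper.update(sig.sense_genes)
            if sig.hits == 1:
                conservative.update(sig.sense_genes)
    return conservative, upper


if __name__ == "__main__":
    sigs = [
        Signature("sigA", 1, ("g1",)),
        Signature("sigB", 3, ("g2", "g3")),  # duplicated signature
        Signature("sigC", 0, ()),            # no genome match
    ]
    cons, upper = supported_genes(sigs)
    print(len(cons), len(upper))
```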
We expected that similar tissues and treatments would share sets of expressed genes. In hierarchical clustering of the libraries based on the 20,821 genes with MPSS support, the pollen library formed an outgroup to all other libraries (Supplementary Fig. 2 online). Only one-third of the pollen-specific genes had A. thaliana orthologs. The unique transcriptional pattern of the pollen library may result from its haploid nature; pollen also had the largest number of library-specific genes (392), including an unusually large number of transposable elements (Supplementary Fig. 3 online).
To examine the association between chromosomal architecture and transcriptional activity, we compared gene density and expression levels for each chromosome (Supplementary Fig. 4 online). Expression activity coincided with gene-dense regions, and the MPSS data are consistent with studies of centromeres 3, 4 and 8 indicating that genes in these regions are active[8–10]. Some rice centromeres, like Cen3, may be evolving from genic regions to repeat-based mature centromeres[9].
The genic positions of MPSS signatures identify transcripts resulting from alternative polyadenylation and 3′-splicing, as well as antisense transcripts. Among the 20,821 expressed rice genes, more than half (11,941) had multiple sense signatures that represent alternative transcripts. These transcripts can show marked differences in their expression levels (Fig. 1c). This type of analysis provides a new view of complex gene expression events that may be important for understanding gene function. We also identified 11,001 antisense signatures corresponding to 8,023 annotated genes (Supplementary Table 2a online). Some natural antisense transcripts are coexpressed and induce alternative splicing, some induce dsRNA cleavage and show reciprocal expression patterns[11], and some even generate regulatory small RNAs[12]. MPSS analysis identified natural antisense transcripts for many rice genes, often with highly specific expression patterns. A comparison of sense and antisense expression levels demonstrates the difficulty of generalizing the function of natural antisense transcripts (Supplementary Fig. 5 online); their complex regulatory mechanisms may need to be resolved gene by gene.
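Counting genes with multiple sense signatures (candidate alternative polyadenylation or 3′-splicing products) and genes with antisense signatures reduces to grouping signature-to-gene matches by orientation. A minimal sketch, assuming a hypothetical list of (signature, gene, orientation) tuples already derived from genome coordinates; it counts each distinct antisense signature once even if it matches several genes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarize_gene_signatures(
    matches: Iterable[Tuple[str, str, str]]
) -> Tuple[Set[str], Set[str], int]:
    """Summarize (signature, gene, orientation) matches; orientation is "sense" or "antisense".

    Returns genes with >= 2 distinct sense signatures (candidate alternative
    transcripts), genes with any antisense signature, and the number of distinct
    antisense signatures.
    """
    sense: Dict[str, Set[str]] = defaultdict(set)
    antisense_genes: Set[str] = set()
    antisense_sigs: Set[str] = set()
    for sig, gene, orientation in matches:
        if orientation == "sense":
            sense[gene].add(sig)
        elif orientation == "antisense":
            antisense_genes.add(gene)
            antisense_sigs.add(sig)
        else:
            raise ValueError(f"unknown orientation: {orientation!r}")
    alternative = {g for g, sigs in sense.items() if len(sigs) >= 2}
    return alternative, antisense_genes, len(antisense_sigs)
```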
Figure 1 Genome-wide transcriptional analysis by mRNA MPSS. (a) Discovery rates for new rice transcripts based on the 46,971,553 mRNA MPSS signatures, resampled 1,000 times with replacement. Each signature was weighted by abundance. The gold line indicates 249,990 distinct, unfiltered signatures, whereas the green line indicates 121,581 reliable signatures. Each slope was calculated at the endpoint, indicating the final discovery rate for new transcripts after sampling 50 million transcripts. Sampling of additional tissues or treatments may increase this rate. (b) mRNA MPSS signatures representing sense-strand transcripts (classes 1, 2, 5 and 7), compared to annotated rice genes and excluding transposons and small genes (<50 amino acids). Most genes with support by MPSS have additional support from full-length cDNA data and high similarity to A. thaliana genes. Total gene numbers are indicated in the center (white sections), and genes without cDNA support are indicated by the blue sections labeled "others." Numbers in parentheses are potential pack-MULEs found in each category. HH and LH, high and low homology to A. thaliana, respectively. Using signatures with hits > 1 slightly increases the number of MPSS-supported genes (Supplementary Fig. 14 online). (c) Each row in the heatmap represents a different gene preselected for evidence of a substantial number of alternative transcripts. Yellow and blue are upregulated and downregulated transcripts in salt-treated (NSL) versus untreated young leaf (NYL), respectively. Green represents no difference in expression. The "All" column is the NSL/NYL ratio (the expression ratio of all salt-treated to all young leaf transcripts), without considering alternatively terminated transcripts. Column 1 is the expression level of the 3′-most MPSS signature (the longest transcript), whereas column 5 is that of the 5′-most MPSS signature (the shortest transcript). Example genes are shown to the right, with the different transcripts and expression in the two libraries indicated. Red and pink boxes represent coding regions and untranslated regions, respectively, and the black triangles indicate the five MPSS signatures that correspond to the genes.

[Figure 1 image. (a) Number of distinct signatures versus total number of signatures sampled (0–50 million) for all and reliable signatures; endpoint slopes labeled 487/1,000,000 and 34/1,000,000. (b) Absent by MPSS: 21,832 genes (1,358), of which FL-cDNA/EST 3,946 (0) and others 17,886 (1,358); HH 39%, LH 61%. Present by MPSS: 20,821 genes (78), of which FL-cDNA/EST 17,382 (19) and others 3,439 (59); HH 87%, LH 13%. (c) Heatmap columns All and 1–5; example genes Os04g32650, Os09g35990, Os10g37190 and Os03g18910 with expression (TPM) in NYL and NSL.]

The 22 libraries included 13,461 intergenic signatures (Supplementary Table 2a online) that could be (i) misannotated 3′-untranslated regions of known genes, (ii) novel genes or (iii) noncoding RNAs including miRNAs. Sixty-three intergenic mRNA signatures matched 30 known miRNA genes (precursors) and 68 noncoding RNA genes from the registry[13] (Supplementary Table 3 online). Consistent with previous reports[7], many noncoding RNAs were expressed in the stigma-ovary library (Supplementary Table 3b online). Like 'housekeeping' genes, many intergenic signatures were expressed in all 22 libraries and were supported by EST or full-length cDNA data. Clearly, rice contains a substantial number of unidentified genes. Some of these are expressed across a range of tissues and developmental stages, whereas others may encode small peptides or generate uncharacterized, regulatory small RNAs such as miRNAs or siRNAs. However, intergenic transcripts were less broadly expressed, appearing in an average of 6.65 libraries per intergenic signature versus 14.42 for gene-associated signatures, and were much more weakly expressed than genic transcripts (Supplementary Fig. 4 online). This low expression contrasts with the prevailing abundance of sense signatures and suggests that these intergenic transcripts may have characteristics distinct from those of typical protein-coding genes.
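The breadth comparison above (an average of 6.65 libraries per intergenic signature versus 14.42 for gene-associated signatures) is a per-category mean of the number of libraries in which each signature is detected. A sketch, assuming hypothetical input mappings and treating any TPM above a chosen threshold (zero by default, an assumption) as detection:

```python
from statistics import mean
from typing import Dict, List, Mapping


def expression_breadth(
    expression: Mapping[str, Mapping[str, float]],
    category: Mapping[str, str],
    min_tpm: float = 0.0,
) -> Dict[str, float]:
    """Mean number of libraries in which signatures of each category are expressed.

    expression: signature -> {library: TPM}; category: signature -> label such as
    "intergenic" or "genic". A library counts if TPM > min_tpm (hypothetical threshold).
    """
    per_category: Dict[str, List[int]] = {}
    for sig, libs in expression.items():
        n_libs = sum(1 for tpm in libs.values() if tpm > min_tpm)
        per_category.setdefault(category[sig], []).append(n_libs)
    return {label: mean(counts) for label, counts in per_category.items()}
```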
To investigate the complexity of small RNAs in rice, we next generated MPSS small RNA libraries from rice inflorescence, stem and seedling leaf. Of the 284,301 distinct small RNAs, 78% matched the rice genome, representing 1,948,368 of 2,953,855 total sequences (Table 1). As in A. thaliana[7], the inflorescence library was proportionally more complex, perhaps reflecting stronger germline silencing (Supplementary Table 4 online). More than half of the distinct signatures matched unique sites in the genome, but highly duplicated signatures were much more common in rice than in A. thaliana (Fig. 2a). Unlike small RNAs from A. thaliana[7], those from rice are less concentrated in the pericentromeric regions and are instead more widely distributed along the chromosomes (Supplementary Fig. 6 online). The two completely sequenced rice centromeres are characterized by small RNAs corresponding to the centromeric repeats (Supplementary Fig. 7 online).
Because the distributions of A. thaliana small RNAs are strongly correlated with those of repetitive sequences, we examined the relationship between rice small RNAs, genes and repeats. Rice small RNAs were strongly associated with intergenic regions (Supplementary Fig. 8 online), and some chromosomes showed a pericentromeric concentration when we examined small RNA abundance in addition to distribution (Fig. 2 and Supplementary Fig. 9 online). Most chromosomal regions in rice show a range of small RNA patterns, consistent with a dispersed arrangement of genes, transposons and miniature inverted repeat transposable elements (Supplementary Fig. 10 online). Notably, abrupt transitions between concentrated siRNA clusters and active genes were often observed (Supplementary Fig. 11 online), suggesting a localized shift from silenced to active chromatin within a very short physical distance. Such abrupt transitions are supported by methylation profiling studies that identify heterochromatin by biochemical (rather than cytological) means[14,15]. These studies have indicated a strong colocalization of siRNA clusters and DNA methylation, even within cytologically defined euchromatin. Many rice small RNAs were derived from transposons or retrotransposons, but many also matched unannotated intergenic regions (Supplementary Table 5 online). At least 12,766 of the 13,237 annotated retrotransposon or transposon-related sequences in the rice genome had matches to small RNAs.
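The chromosome-scale profiles described here (and in the Figure 2 legend: moving averages of five adjacent 100-kb bins) can be approximated by binning mapped small RNA abundance and smoothing it. This sketch assumes 0-based positions, a centered window that is truncated at the chromosome ends, and hypothetical function names; the study's exact edge handling is not stated.

```python
from typing import Iterable, List, Tuple


def bin_abundance(
    hits: Iterable[Tuple[int, float]], chrom_length: int, bin_size: int = 100_000
) -> List[float]:
    """Sum abundance of mapped small RNAs into fixed-size bins (0-based positions)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    bins = [0.0] * n_bins
    for pos, abundance in hits:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} outside chromosome")
        bins[pos // bin_size] += abundance
    return bins


def moving_average(values: List[float], window: int = 5) -> List[float]:
    """Centered moving average; windows are truncated at the chromosome ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```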
[Figure 2 image. (a) Number of distinct signatures versus number of hits to the genome (1, 2–10, 11–100, 101–1,000, >1,000) for rice and Arabidopsis. (b) Count of protein-coding genes, transposable elements or intergenic regions versus number of distinct matching signatures. (c) Chromosome 8 (10, 20 and 30 Mb marked): small RNA (TPQ), mRNA (TPM), and genes and repeats (bp). (d) Number of signatures matching the annotated pre-miRNA hairpin in libraries FLR, SNU and STM, by abundance class (1–25, 26–100 and >100 TPQ). (e) TIR-flanked loci at Os01g20930, Os02g49580 and Os10g37080.]
Figure 2 Deep sequencing of rice small RNAs by MPSS. (a) Match frequencies of small RNAs in rice versus those of A. thaliana. We determined the number of genomic matches ('hits') to their respective genomes for 149,978 and 56,920 distinct small RNA MPSS signatures from rice and A. thaliana inflorescence libraries. The A. thaliana data have been described elsewhere[7]. (b) The number of small RNAs matching individual protein-coding genes, transposable elements and intergenic regions. (c) Distributions of small RNAs, mRNA expression and genes or repeats across rice chromosome 8, plotted as moving averages of five adjacent bins of 100 kb. The light-green, pink and light-blue lines (top) are small RNAs in inflorescence, stem and young leaf, respectively. mRNA levels in the same tissues are shown in green, red and blue (middle). Black and gray lines (bottom) are densities of genes and repeats. Blue vertical shading indicates the approximate position of the centromere. (d) Abundance values for small RNA signatures matching to the hairpin of the 171 genome-mapped, known rice miRNAs. The y-axis indicates the number of distinct signatures in each abundance class. (e) Small RNAs map to the terminal inverted repeats (TIRs, black arrowheads) of pack-MULE elements described elsewhere[14,15]. Yellow shading indicates transposon-like sequences; orange indicates inverted repeats; black triangles are small RNAs; red and blue boxes are annotated exons. Additional features are as described in Supplementary Figure 7 online.
As we have demonstrated elsewhere[6,7], repetitive sources of siRNAs produce numerous distributed small RNAs, whereas miRNAs produce small, focused clusters of specific sequences. Rice genes matched distinct small RNAs at a rate at least as high as repeats and intergenic regions (Fig. 2b and Supplementary Fig. 8 online), often because of miniature inverted repeat transposable elements or other small repeats embedded within an intron. As in A. thaliana, tandem repeats and inverted repeats are rich sources of small RNAs (Supplementary Fig. 12 online). All three libraries contained many clusters of small RNAs, particularly in intergenic, unannotated regions of the rice genome (Supplementary Table 6 online). This indicates that the rice genome is much richer in silenced sequences than that of A. thaliana, consistent with its higher content of repetitive DNA. Because small RNAs may also interact with imperfectly matched targets, their biological effect may be far more substantial than we have indicated.
The miRNA registry includes 182 rice miRNAs[13], of which 171 were mapped in the genome and 130 were expressed, accounting for 8.8%, 36.6% and 12.6% of the inflorescence, seedling and stem small RNAs, respectively. These percentages are much lower than those for A. thaliana, which is consistent with a substantial abundance of repeat-associated siRNAs in the more complex rice genome[6,7]. The lack of small RNA biogenesis mutants in rice hinders our ability to distinguish siRNAs from miRNAs. However, miRNAs are abundant, consistently expressed and conserved, and we identified numerous conserved and consistently expressed small RNAs (Supplementary Fig. 13 online), suggesting that these data include many novel miRNAs.
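The miRNA percentages above are ratios of summed abundance. A minimal sketch, assuming a hypothetical mapping of signature to abundance for one library and a set of signatures that match known miRNA hairpins; which denominator the study used (all sequenced tags or only genome-matched tags) is not stated here, so the caller decides by choosing which signatures to pass in.

```python
from typing import Mapping, Set


def mirna_share(library: Mapping[str, float], mirna_signatures: Set[str]) -> float:
    """Percentage of a library's small RNA abundance contributed by miRNA-matching signatures.

    library: signature -> abundance (hypothetical input). The denominator is the
    total abundance of whatever signatures are supplied.
    """
    total = sum(library.values())
    if total == 0:
        return 0.0
    mirna = sum(count for sig, count in library.items() if sig in mirna_signatures)
    return 100.0 * mirna / total
```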
Using the three libraries, we examined the developmental regulation of rice small RNAs. The chromosomal distributions of small RNAs suggested a high degree of similarity between young leaf and stem small RNAs (Fig. 2c and Supplementary Fig. 9 online), both of which differed from the inflorescence data. However, the stem library produced the greatest number of small RNA clusters or genes that were substantially more abundant in only one of the three libraries (Fig. 2d and Supplementary Table 7 online), suggesting an unusual degree of small RNA regulation in the stem. This is the first report of small RNAs from plant stems, and it is possible that some of these small RNAs are involved in signal transmission between leaves and inflorescences[16]. In contrast to A. thaliana[6,7], many small RNA clusters were substantially reduced in the inflorescence compared with seedlings and the stem (Supplementary Table 7 online).
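One way to flag clusters that are much more abundant in a single library is to compare the top library with the runner-up. The 10-fold threshold, the pseudocount and the library labels below are hypothetical placeholders; the criterion actually used for Supplementary Table 7 is not given in this text.

```python
from typing import Dict, Mapping


def library_specific_clusters(
    abundance: Mapping[str, Mapping[str, float]],
    fold: float = 10.0,
    pseudocount: float = 1.0,
) -> Dict[str, str]:
    """Map cluster -> library for clusters much more abundant in one library than any other.

    abundance: cluster -> {library: normalized abundance}. Hypothetical criterion:
    top library >= fold x runner-up, after adding a pseudocount to both.
    """
    specific = {}
    for cluster, levels in abundance.items():
        ranked = sorted(levels.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue
        (top_lib, top), (_, runner_up) = ranked[0], ranked[1]
        if top + pseudocount >= fold * (runner_up + pseudocount):
            specific[cluster] = top_lib
    return specific
```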
The rice genome contains many gene fragments ('pack-MULEs') generated by transposons[17]. Our mRNA MPSS data identified 17,886 genes without expression evidence (Fig. 1b), >90% of which have no known function, suggesting that many of these are inactive or are pseudogenes. We compared the mRNA and small RNA MPSS data with predicted rice pack-MULEs[17], using 8,271 previously identified MULEs[18]. This identified 1,358 potential pack-MULEs among the 17,886 unexpressed genes. These elements represent just one of several classes of gene-shuffling transposable elements[19–21], so other inactive genes are likely to be transposed fragments as well. Small RNAs matched many of the terminal inverted repeats of these pack-MULEs but only infrequently matched the internal gene fragments (Fig. 2e). The combined mRNA and small RNA datasets may offer an experimentally based system for pack-MULE identification in rice and other plant genomes.
Taking into account annotated, expressed genes as well as small RNAs, a very high proportion of the rice genome is actively transcribed. Although thousands of annotated genes lack both mRNA and small RNA expression data, detecting their expression may require sampling of highly specialized tissues, cell types or treatments. The rice small RNA data indicate an extensive and complex repertoire of these molecules that vastly exceeds that of A. thaliana, consistent with larger genomes producing more complex small RNA populations. This suggests that larger plant genomes, such as those of most crops, will require deeper sequencing. Additional complexity may be found in analyses of nonpolyadenylated transcripts[22]. A comprehensive understanding of the network of gene expression events in rice or other crops will require concerted efforts such as ours to characterize the activities and functions of a comprehensive catalog of genomic components.
METHODS
High- and low-homology rice genes. The high- and low-homology rice genes were identified according to their similarity to A. thaliana genes using a threshold of BLASTP e-value < 1.0e-7.
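The homology split can be sketched as a filter on the best BLASTP e-value per rice protein. This assumes tabular BLAST output (the standard 12-column format, in which the e-value is the 11th column) and treats genes with no hit as low homology; both are assumptions about how the search results were stored, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def best_evalues(blast_tabular_lines: Iterable[str]) -> Dict[str, float]:
    """Lowest e-value per query from 12-column tabular BLAST output (e-value in column 11)."""
    best: Dict[str, float] = {}
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, evalue = cols[0], float(cols[10])
        if evalue < best.get(query, float("inf")):
            best[query] = evalue
    return best


def classify_homology(
    genes: Iterable[str], best: Mapping[str, float], threshold: float = 1.0e-7
) -> Dict[str, str]:
    """'HH' if the best hit to A. thaliana has e-value < threshold, else 'LH' (including no hit)."""
    return {g: "HH" if best.get(g, float("inf")) < threshold else "LH" for g in genes}
```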
MULE analysis. The rice MULE data have been previously described[18] and were downloaded from http://www.genome.org/. Because the International Rice Genome Sequencing Project annotation version 2.0 was used for those analyses, we remapped all the MULEs onto TIGR 4.0. After mapping these sequences, we identified potential pack-MULEs (genes flanked on both sides by MULEs) with or without MPSS signatures. We also used these sequences to identify pack-MULEs associated with the intergenic MPSS signatures, because some of these unannotated transcripts could correspond to pack-MULEs.
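Identifying genes "flanked on both sides by MULEs" is an interval query. A minimal sketch, assuming gene and MULE coordinates on one chromosome after remapping to TIGR 4.0 and a hypothetical maximum gap (`max_gap`) between each flanking MULE and the gene; the remapping step itself is not shown, and the adjacency rule used in the study may differ.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, start <= end


def flanked_genes(
    genes: Dict[str, Interval], mules: Sequence[Interval], max_gap: int = 5_000
) -> List[str]:
    """Genes with a MULE ending within max_gap upstream AND one starting within max_gap downstream.

    max_gap is a hypothetical tolerance; all coordinates are assumed to share
    one chromosome and one coordinate system.
    """
    mule_ends = sorted(end for _, end in mules)
    mule_starts = sorted(start for start, _ in mules)
    hits = []
    for gene_id, (g_start, g_end) in genes.items():
        i = bisect_right(mule_ends, g_start)  # mule_ends[i - 1] is the last end <= g_start
        left = i > 0 and g_start - mule_ends[i - 1] <= max_gap
        j = bisect_left(mule_starts, g_end)  # mule_starts[j] is the first start >= g_end
        right = j < len(mule_starts) and mule_starts[j] - g_end <= max_gap
        if left and right:
            hits.append(gene_id)
    return sorted(hits)
```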
Analysis of alternative termination. As an example of the differential expression of alternative transcripts, we focused on two libraries (NYL and NSL) to examine the effect of salt stress. Genes with alternative termination sites were identified from the MPSS data (multiple sense-strand MPSS signatures associated with a single gene), and from this set we selected genes that had at least two sense signatures with tenfold higher expression in one library than in the other. For each gene and each library, we recorded the expression levels of the five MPSS signatures located at the 3′ end of the gene and the sum of these levels. The expression level of each NSL signature was divided by that of the corresponding NYL signature, and the resulting values were log-transformed and loaded into R to generate a heatmap.
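The selection and heatmap steps above can be sketched as below. The tenfold criterion, the five 3′-most signatures and the summed "All" column follow the description; the pseudocount (needed because many TPM values are zero), the use of log2 and the function names are assumptions, since the text says only that values were log-transformed. The resulting rows could then be passed to R (for example, its `heatmap` function) or any plotting library.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (NYL TPM, NSL TPM) for one signature, ordered 3'-most first


def select_alternative_genes(
    genes: Dict[str, Sequence[Pair]], fold: float = 10.0, pseudocount: float = 1.0
) -> List[str]:
    """Genes with >= 2 sense signatures each showing >= `fold` difference between libraries."""
    selected = []
    for gene, pairs in genes.items():
        n_diff = sum(
            1
            for nyl, nsl in pairs
            if max(nyl, nsl) + pseudocount >= fold * (min(nyl, nsl) + pseudocount)
        )
        if n_diff >= 2:  # genes with fewer than two signatures can never qualify
            selected.append(gene)
    return sorted(selected)


def heatmap_row(pairs: Sequence[Pair], n: int = 5, pseudocount: float = 1.0) -> List[float]:
    """log2(NSL/NYL) for the summed "All" column, then for each of the n 3'-most signatures."""
    top = pairs[:n]
    total_nyl = sum(nyl for nyl, _ in top)
    total_nsl = sum(nsl for _, nsl in top)
    row = [math.log2((total_nsl + pseudocount) / (total_nyl + pseudocount))]
    row.extend(math.log2((nsl + pseudocount) / (nyl + pseudocount)) for nyl, nsl in top)
    return row
```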
Additional methods. Detailed methods are available in Supplementary
Methods online.
Accession codes. Gene Expression Omnibus (GEO): series identifier GSE7107, platform identifiers GPL3777 and GPL3776 for mRNA and small RNA samples, respectively; sample identifiers GSM169562, GSM169564, GSM169566, GSM169567, GSM169568, GSM169569, GSM169570, GSM170900, GSM170901, GSM170902, GSM170903, GSM170904, GSM170905, GSM170906, GSM170907, GSM170909, GSM170912, GSM170912, GSM170914, GSM170917, GSM170919 and GSM170921. The raw and normalized MPSS data are also available at http://mpss.udel.edu/rice, where users can query the data by physical location, gene identifier or sequence.
Note: Supplementary information is available on the Nature Biotechnology website.
ACKNOWLEDGMENTS
We are grateful to C. Haudenschild, TIGR’s rice annotation project, S. Singh Tej,
M. Nakano, R. German, A. Hetawal, R. Gupta and S. Kaushik. This work was
supported by US National Science Foundation awards 0321437 (B.C.M. and
G.-l.W.) and 0439186 (P.J.G. and B.C.M.), and US Department of Agriculture
2005-35064-15326 (B.C.M. and P.J.G.).
AUTHOR CONTRIBUTIONS
K.N. performed research, analyzed data and wrote the manuscript; R.C.V. and C.L. performed laboratory research and provided useful discussions; A.B., K.V., K.K., W.W. and M.P. performed computational research; P.J.G. and G.-l.W. designed research and wrote the manuscript; B.C.M. designed research, analyzed data, and coordinated and wrote the manuscript.
COMPETING INTERESTS STATEMENT
The authors declare no competing financial interests.
Published online at http://www.nature.com/naturebiotechnology
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions
1. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).
2. Yuan, Q. et al. The Institute for Genomic Research Osa1 rice genome annotation
database. Plant Physiol. 138, 18–26 (2005).
3. Kikuchi, S. et al. Collection, mapping, and annotation of over 28,000 cDNA clones
from japonica rice. Science 301, 376–379 (2003).
4. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116,
281–297 (2004).
5. Verdel, A. et al. RNAi-mediated targeting of heterochromatin by the RITS complex.
Science 303, 672–676 (2004).
6. Meyers, B.C. et al. Analysis of the transcriptional complexity of Arabidopsis thaliana by
massively parallel signature sequencing. Nat. Biotechnol. 22, 1006–1011 (2004).
7. Lu, C. et al. Elucidation of the small RNA component of the transcriptome. Science
309, 1567–1569 (2005).
8. Cheng, Z. et al. Functional rice centromeres are marked by a satellite repeat and a
centromere-specific retrotransposon. Plant Cell 14, 1691–1704 (2002).
9. Yan, H. et al. Genomic and genetic characterization of rice cen3 reveals extensive
transcription and evolutionary implications of complex centromere. Plant Cell 18,
3227–3238 (2006).
10. Yan, H. et al. Transcription and histone modifications in the recombination-free region
spanning a rice centromere. Plant Cell 17, 3227–3238 (2005).
11. Jen, C.H., Michalopoulos, I., Westhead, D. & Meyer, P. Natural antisense transcripts
with coding capacity in Arabidopsis may have a regulatory role that is not linked to
double-stranded RNA degradation. Genome Biol. 6, R51 (2005).
12. Borsani, O., Zhu, J., Verslues, P.E., Sunkar, R. & Zhu, J.K. Endogenous siRNAs derived
from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis.
Cell 123, 1279–1291 (2005).
13. Griffiths-Jones, S. The microRNA Registry. Nucleic Acids Res. 32, D109–D111
(2004).
14. Lippman, Z. et al. Role of transposable elements in heterochromatin and epigenetic
control. Nature 430, 471–476 (2004).
15. Zhang, X. et al. Genome-wide high-resolution mapping and functional analysis of DNA
methylation in Arabidopsis. Cell 126, 1189–1201 (2006).
16. Yoo, B.-C. et al. A systemic small RNA signaling system in plants. Plant Cell 16,
1979–2000 (2004).
17. Jiang, N., Bao, Z., Zhang, X., Eddy, S.R. & Wessler, S.R. Pack-MULE transposable
elements mediate gene evolution in plants. Nature 431, 569–573 (2004).
18. Juretic, N., Hoen, D.R., Huynh, M.L., Harrison, P.M. & Bureau, T.E. The evolutionary
fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 15,
1292–1297 (2005).
19. Morgante, M. et al. Gene duplication and exon shuffling by helitron-like transposons
generate intraspecies diversity in maize. Nat. Genet. 37, 997–1002 (2005).
20. Britten, R. Transposable elements have contributed to thousands of human proteins.
Proc. Natl. Acad. Sci. USA 103, 1798–1803 (2006).
21. Lipatov, M., Lenkov, K., Petrov, D. & Bergman, C. Paucity of chimeric gene-transposable
element transcripts in the Drosophila melanogaster genome. BMC Biol. 3, 24 (2005).
22. Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide
resolution. Science 308, 1149–1154 (2005).
© 2007 Nature Publishing Group http://www.nature.com/naturebiotechnology
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION
LETTERS
... Characterizing dynamic changes in the transcriptome during seed development has helped to identify the underlying gene regulation. The extensive genomic resources made available in recent years have helped to define genome diversity in rice. These include the release of the gold-standard genome sequence [8], reference genomes of several subspecies, and large-scale sequencing resources [9], including the resequencing of more than 3000 genomes [10]. Recent advances in high-throughput tools have helped to uncover multiple regulatory targets across the entire seed development process. These tools include array-based gene profiling technologies [11][12][13][14], RNA sequencing, and DNA bisulfite sequencing [15,16]. ...
Article
Rice is one of the most important crops, meeting the caloric needs of 3 billion people around the world. Rice seed development begins at fertilization. This leads to the establishment of two distinct filial tissues, the endosperm and the embryo, which accumulate distinct seed storage products such as starch, storage proteins, and lipids. This review covers a range of systems biology tools used to dissect the spatiotemporal dynamics of the transcriptome, methylation, and small RNA-based regulation during seed development that influence the accumulation of storage products. Studies of other model systems are also considered because information on the rice transcriptome is limited. The review highlights key genes, identified through a holistic systems biology view, that are targets for modifying biochemical composition and improving rice grain quality and nutritional value, with the aim of developing rice as a functional food.
... Hierarchical clustering of P. equestris samples showed the same pattern (Fig. 1B). Samples clustered into groups of floral organs, leaf parts, meristems and young leaves, inflorescence internode, and root. Young and mature anthers were the most distant from the other samples, as in A. thaliana, rice, and maize (Nobuta et al., 2007; Wang et al., 2010; Stelpflug et al., 2016; Klepikova et al., 2016). Distances between samples on the clustering tree were shorter than in the other species we examined (Klepikova et al., 2016; Penin et al., 2019). This can be explained by the lack of older tissues in the P. equestris transcriptome atlas. ...
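The sample-level clustering described in this excerpt can be illustrated with a minimal average-linkage sketch. It uses 1 − Pearson r as the distance between expression profiles. The distance, the linkage rule, and the toy sample names and values are assumptions for illustration. They are not the method or data of the cited atlases.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Assumes neither vector is constant (sxx and syy > 0).
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of samples with average linkage.

    profiles maps sample name -> list of expression values (same gene order).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    # Distance between samples: 1 - r (0 = same shape, 2 = perfectly anti-correlated).
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    # Each cluster label maps to the tuple of sample names it contains.
    clusters = {name: (name,) for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            ma, mb = clusters[a], clusters[b]
            # Average linkage: mean distance over all cross-cluster sample pairs.
            d = sum(dist[frozenset((x, y))] for x in ma for y in mb) / (len(ma) * len(mb))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


# Hypothetical four-gene profiles for two leaf and two anther samples.
profiles = {
    "leaf1": [10.0, 8.0, 1.0, 0.5],
    "leaf2": [9.0, 7.5, 1.2, 0.4],
    "anther1": [0.5, 1.0, 9.0, 10.0],
    "anther2": [0.6, 1.1, 8.0, 11.0],
}
for a, b, d in average_linkage(profiles):
    print(f"merge {a} + {b} at distance {d:.3f}")
```

Like tissues merge first, and the final merge joins the leaf and anther groups at a large distance. In practice a library routine would normally be used. This pure-Python loop is cubic in the number of samples and only makes the merge order explicit.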
Article
The vast diversity of Orchidaceae, together with sophisticated adaptations to pollinators and other unique features, makes this family an attractive model for evolutionary and functional studies. The sequenced genome of Phalaenopsis equestris facilitates Orchidaceae research. Here, we present an RNA-seq-based transcriptome map of P. equestris that covers 19 organs of the plant, including leaves, roots, floral organs, and the shoot apical meristem. We demonstrated the high quality of the data and showed the similarity of the P. equestris transcriptome map to the gene expression atlases of other plants. The transcriptome map can be easily accessed through our database Transcriptome Variation Analysis (TraVA) for visualizing gene expression profiles. As an example of its application, we analyzed the expression of Phalaenopsis "orphan" genes, those with no recognizable similarity to the genes of other plants. We found that approximately half of these genes were not expressed. Those that were expressed were found predominantly in reproductive structures.
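The orphan-gene analysis separates genes that are never expressed from genes whose strongest expression is in reproductive organs. A minimal sketch of that split follows. The expression cutoff, the rule that the "predominant" organ is the one with the highest value, and the organ and gene names are all hypothetical. They are not the criteria used in the study.

```python
def classify_orphans(expr, reproductive, threshold=1.0):
    """Split genes into (not expressed, reproductive-biased, other).

    expr maps gene -> {organ: expression value}.
    threshold is a hypothetical expression cutoff, not the study's.
    """
    not_expressed, reproductive_biased, other = [], [], []
    for gene, by_organ in expr.items():
        # Expressed = reaches the cutoff in at least one organ.
        if max(by_organ.values()) < threshold:
            not_expressed.append(gene)
            continue
        # Predominant organ = organ with the highest expression value.
        top_organ = max(by_organ, key=by_organ.get)
        if top_organ in reproductive:
            reproductive_biased.append(gene)
        else:
            other.append(gene)
    return not_expressed, reproductive_biased, other


expr = {
    "orphan_1": {"leaf": 0.1, "root": 0.0, "anther": 0.3},
    "orphan_2": {"leaf": 0.5, "root": 0.2, "anther": 12.0},
    "orphan_3": {"leaf": 5.0, "root": 1.0, "anther": 1.0},
}
print(classify_orphans(expr, reproductive={"anther", "petal"}))
# -> (['orphan_1'], ['orphan_2'], ['orphan_3'])
```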
... TE silencing is initiated via two pathways: homology-dependent (identity-based) and homology-independent (expression-based) initiation of silencing (Figure 5; Fultz and Slotkin, 2017). In the upstream phase of the homology-dependent pathway, RNA polymerase IV, RDR2, and DCL3 form a complex that produces 24-nt siRNAs from TEs associated with H3K9me (Nobuta et al., 2007; Huang et al., 2013; Law et al., 2013). In the downstream phase, a 24-nt siRNA bound to AGO4 or AGO6 interacts with a polymerase V scaffold transcript. This results in transcriptional silencing of homologous TEs through DNA methylation and H3K9 methylation (Teixeira et al., 2009; Ito et al., 2011; Wierzbicki, 2012). ...
Article
Gene silencing is a negative feedback mechanism that regulates gene expression to define cell fate. It also regulates metabolism and gene expression throughout the life of an organism. In plants, gene silencing occurs via transcriptional gene silencing (TGS) and post-transcriptional gene silencing (PTGS). TGS blocks transcription via methylation of the 5′ untranslated region (5′UTR), whereas PTGS involves methylation of the coding region, resulting in transcript degradation. In this review, we summarize the history and molecular mechanisms of gene silencing and highlight its specific roles in plant growth and crop production.
... The goal of systems biology is to discover emergent properties in order to better understand the entirety of processes that occur in a biological system. To enhance systems biology and better inform breeding decisions, many crops now have gene expression atlases (Nobuta et al. 2007; Pazhamala et al. 2017; Kudapa et al. 2018; Shinozaki et al. 2018; Hoopes et al. 2019; Sinha et al. 2020b). They also have maps based on the epigenome (Junaid et al. 2018; Peng et al. 2019; Sinha et al. 2020c), proteome (Barua et al. 2019; Duncan et al. 2017; Jiang et al. 2019), and metabolome (Okazaki and Saito 2016; Chen et al. 2018), in addition to the existing saturated genome maps. The systems biology approach should be specifically targeted to understanding the molecular mechanisms of complex traits related to climate resilience, such as drought tolerance (Miao et al. 2017). Improving these traits will require deep knowledge at the systems level (Pazhamala et al. 2021). ...
Article
Key message: Integrating genomics technologies and breeding methods to tweak the core parameters of the breeder's equation could accelerate the delivery of climate-resilient and nutrient-rich crops for future food security. Abstract: Accelerating genetic gain in crop improvement programs for climate resilience and nutrition traits requires integrating several approaches. So does realizing the improved gain in farmers' fields. This article focuses on innovative approaches that address the core components of the breeder's equation. A prerequisite for enhancing genetic variance (σ²g) is the identification or creation of favorable alleles/haplotypes and their deployment to improve key traits. Novel alleles for new and existing target traits need to be accessed and added to the breeding population while maintaining genetic diversity. Selection intensity (i) in the breeding program can be increased by testing larger populations. This is enabled by statistical designs with minimal replication and by high-throughput phenotyping. Selection priorities and the criteria for selecting an appropriate portion of the population also play an important role. The most important component of the breeder's equation is heritability (h²). Heritability estimates depend on several factors, including the size and type of the population and the statistical methods used. The article starts with a brief discussion of potential ways to enhance σ²g in the population. We highlight statistical methods and experimental designs that could improve trait heritability estimation. We also offer a perspective on reducing the breeding cycle time (t). This could be achieved by selecting appropriate parents, optimizing the breeding scheme, rapidly fixing target alleles, and combining speed breeding with breeding programs to optimize trials for release. Finally, we summarize knowledge from multiple disciplines for enhancing genetic gains in climate resilience and nutritional traits.
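The abstract names the components of the breeder's equation (σ²g, i, h², t) but does not write it out. A common textbook form is ΔG = i · h · σg / t. Because h² = σ²g / σ²p, this equals i · h² · σp / t.

The sketch below assumes two things: that σ²g is the additive genetic variance, and that h² is the matching narrow-sense heritability. The parameter values are hypothetical. An intensity of about 1.755 corresponds to truncation selection of the top 10% on a normally distributed trait.

```python
from math import sqrt


def genetic_gain_per_time(i, h2, var_g, t):
    """Breeder's equation in the form dG = i * h * sigma_g / t.

    i:     selection intensity (standardized selection differential)
    h2:    heritability, assumed narrow-sense, in (0, 1]
    var_g: genetic variance, assumed additive
    t:     breeding cycle time (e.g. years per cycle)
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    if var_g < 0 or t <= 0:
        raise ValueError("var_g must be >= 0 and t must be > 0")
    return i * sqrt(h2) * sqrt(var_g) / t


def genetic_gain_phenotypic_form(i, h2, var_g, t):
    """Equivalent form dG = i * h2 * sigma_p / t, using var_p = var_g / h2."""
    sigma_p = sqrt(var_g / h2)
    return i * h2 * sigma_p / t


# Hypothetical values: top 10% selected, h2 = 0.49, var_g = 4, two-year cycle.
print(round(genetic_gain_per_time(1.755, 0.49, 4.0, 2.0), 4))  # 1.2285
```

Written this way, each lever discussed in the abstract appears as a separate factor. Raising i, h², or σ²g increases gain proportionally (σ²g and h² under a square root). Halving t doubles the gain per unit time.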
Chapter
An increasing number of crop genomic resources, together with novel technical achievements in genome analytics, have dramatically changed the landscape of agricultural research. These changes have improved our capacity to meet global challenges around food production. They must be understood to better serve the needs of the human population. In this chapter, we provide a comprehensive review of historical changes in the technologies that allow for improved plant genotyping, molecular marker discovery, and decoding of the plant genome. We further explore resources and databases available for multi-omics analysis and conclude with a discussion of translational genomics considerations. Ultimately, this chapter is intended as a tool for bioinformaticians and researchers exploring the field of crop genomics.
Chapter
Breeding has played a significant role in the evolution of human civilizations, beginning with the domestication of plant and animal species an estimated 10,000–15,000 years ago. It sustains a world population of more than 6 billion. Over the past 100 years, the landscape of plant breeding has changed drastically because of uncontrolled population growth, loss of agricultural land, and changing environmental conditions. This poses a tremendous challenge for researchers seeking to improve crop production and productivity. The advent of novel genomics methods, including NGS (next-generation sequencing), and of new breeding tools has transformed traditional breeding into next-generation breeding. Genome editing is a promising technique for altering specific genes to improve trait expression. Integrating computational tools with next-generation breeding technologies can speed up the breeding process and increase genetic gains under different production systems. This chapter emphasizes the significance of next-generation sequencing-derived information (big data), and of its analysis with omics tools, for revolutionizing crop improvement.
Article
Rice is a staple food for about half of the world's population. With its small genome and its status as the world's foremost food crop, rice is regarded as an ideal cereal for genome research. Falling water tables and soil fatigue are currently major challenges, and a changing climate intensifies their consequences. The sequenced rice genome is 389 Mb, 95% of which is covered by a high-quality, map-based sequence. Because whole-genome sequences are available for both the indica and japonica subspecies, the rice genome supports molecular biology and transcriptomics research across the cereals. Rice genome sequencing and functional genomics have identified QTLs or genes, genetic variability, and haplotype blocks for agronomic characters. These have proved very useful in molecular breeding and direct selection. Different numbers of genes or QTLs have been identified for yield-related traits: 6 QTLs/genes for plant architecture, 6 for panicle characteristics, 4 for grain number, and 1 gene/QTL for tiller, HGW, grain filling, and shattering. The numbers of QTLs/genes for grain quality, biotic stresses, and abiotic stresses are 7, 23, and 13, respectively. Low yield, inferior quality, and susceptibility to biotic and abiotic stresses are attributed to the narrow genetic background of newly developed rice varieties. Wild rice provides genetic resources for improving these characters. Molecular and genomic tools applied at different stages can help overcome these stresses and improve the yield and quality of the rice crop.
Chapter
Drought is a major threat to many plants, especially rice. Rice is a staple food for more than 3.5 billion people worldwide, and its productivity must increase to meet ever-growing demand. However, drought during the flowering stage significantly reduces crop yield. Advances in omics technologies such as transcriptomics, genomics, proteomics, and metabolomics have made it possible to study drought-responsive genes and their functional products at the genome-wide level. In recent years, these state-of-the-art techniques have significantly improved our understanding of the complex mechanisms of drought tolerance. However, developing a drought-resistant rice variety with good yield remains challenging. In this chapter, we discuss the application of different omics technologies to improving drought-resistant varieties, with special reference to rice.
Chapter
Abiotic stresses, such as drought, salinity, low or high temperature, and heavy-metal toxicity, are major factors limiting crop productivity and sustainability. Over the past few decades, several traditional and modern breeding methods have been used to develop stress-tolerant plants. However, abiotic stress tolerance is a complex trait, because plants respond to stresses by activating complex molecular and biochemical networks. Improving plant tolerance to abiotic stresses requires a good knowledge of the diverse mechanisms and pathways involved in the stress response. With modern technological advances, functional genomics approaches can contribute enormously to understanding the gene-regulatory networks operating under diverse stresses. This chapter reviews recent progress in functional genomics, including next-generation sequencing, functional mapping of quantitative trait loci, genome-wide hybridization, transgenesis, and genome editing. It also covers the identification and validation of candidate genes for crop improvement. In addition, it assesses technologies related to gene expression, mutagenesis, map-based cloning, and various genomics-assisted strategies. These are discussed in light of how the information acquired through functional genomics can be integrated.
Preprint
The vast diversity of Orchidaceae, together with sophisticated adaptations to pollinators and other unique features, makes this family an attractive model for evolutionary and functional studies. The sequenced genome of Phalaenopsis equestris facilitates Orchidaceae research. Here we present an RNA-seq-based transcriptome map of P. equestris that covers 19 organs of the plant, including leaves, roots, floral organs, and the shoot apical meristem. We demonstrated the high quality of the data and showed the similarity of the P. equestris transcriptome map to the gene expression atlases of other plants. The transcriptome map can be easily accessed through our database Transcriptome Variation Analysis (TraVA) for visualizing gene expression profiles. As an example of its application, we analyzed the expression of Phalaenopsis "orphan" genes, those with no recognizable similarity to genes of other plants. We found that about half of them are not expressed. Those that are expressed show predominant expression in reproductive structures.
Article
Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.
Rice (Oryza sativa L.) is the most important food crop in the world and feeds over half of the global population. As the first step in a systematic and complete functional characterization of the rice genome, the International Rice Genome Sequencing Project (IRGSP) has generated and analysed a highly accurate finished sequence of the rice genome that is anchored to the genetic map. Our analysis has revealed several salient features of the rice genome:
• We provide evidence for a genome size of 389 Mb. This size estimation is ~260 Mb larger than the fully sequenced dicot plant model Arabidopsis thaliana. We generated 370 Mb of finished sequence, representing 95% coverage of the genome and virtually all of the euchromatic regions.
• A total of 37,544 non-transposable-element-related protein-coding sequences were detected, compared with ~28,000–29,000 in Arabidopsis, with a lower gene density of one gene per 9.9 kb in rice. A total of 2,859 genes seem to be unique to rice and the other cereals, some of which might differentiate monocot and dicot lineages.
• Gene knockouts are useful tools for determining gene function and relating genes to phenotypes. We identified 11,487 Tos17 retrotransposon insertion sites, of which 3,243 are in genes.
• Between 0.38 and 0.43% of the nuclear genome contains organellar DNA fragments, representing repeated and ongoing transfer of organellar DNA to the nuclear genome.
• The transposon content of rice is at least 35% and is populated by representatives from all known transposon superfamilies.
• We have identified 80,127 polymorphic sites that distinguish between two cultivated rice subspecies, japonica and indica, resulting in a high-resolution genetic map for rice. Single-nucleotide polymorphism (SNP) frequency varies from 0.53 to 0.78%, which is 20 times the frequency observed between the Columbia and Landsberg erecta ecotypes of Arabidopsis.
• A comparison between the IRGSP genome sequence and the 6.3× indica and 6× japonica whole-genome shotgun sequence assemblies revealed that the draft sequences provided coverage of 69% by indica and 78% by japonica relative to the map-based sequence.
Rice has played a central role in human nutrition and culture for the past 10,000 years. It has been estimated that world rice production must increase by 30% over the next 20 years to meet projected demands from population increase and economic development¹. Rice grown on the most productive irrigated land has achieved nearly maximum production with current strains¹.
Environmental degradation, including pollution, increase in night-time temperature due to global warming², reductions in suitable arable land, water, labour and energy-dependent fertilizer provide additional constraints. These factors make steps to maximize rice productivity particularly important. Increasing yield potential and yield stability will come from a combination of biotechnology and improved conventional breeding. Both will be dependent on a high-quality rice genome sequence. Rice benefits from having the smallest genome of the major cereals, dense genetic maps and relative ease of genetic transformation³. The discovery of extensive genome colinearity among the Poaceae⁴ has established rice as the model organism for the cereal grasses. These properties, along with the finished sequence and other tools under development, set the stage for a complete functional characterization of the rice genome.
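The headline figures in the IRGSP summary can be checked with simple arithmetic. The sketch below assumes the gene density of one gene per 9.9 kb was computed over the 370 Mb of finished sequence rather than the full 389 Mb estimate; the text does not say which denominator was used. Using 389 Mb would give about 10.4 kb per gene. The function names are illustrative only.

```python
def kb_per_gene(sequence_mb, n_genes):
    """Average spacing between genes, in kb, over a given amount of sequence."""
    return sequence_mb * 1000 / n_genes


def percent_coverage(finished_mb, genome_mb):
    """Finished sequence as a percentage of the estimated genome size."""
    return 100 * finished_mb / genome_mb


# Figures quoted above: 370 Mb finished, 389 Mb genome, 37,544 protein-coding genes.
print(round(kb_per_gene(370, 37_544), 1))  # 9.9
print(round(percent_coverage(370, 389)))   # 95
```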
Article
Background: Recent analysis of the human and mouse genomes has shown that a substantial proportion of protein-coding genes and cis-regulatory elements contain transposable element (TE) sequences, implicating TE domestication as a mechanism for the origin of genetic novelty. To understand the general role of TE domestication in eukaryotic genome evolution, it is important to assess the acquisition of functional TE sequences by host genomes in a variety of different species, and to understand in greater depth the population dynamics of these mutational events. Results: Using an in silico screen for host genes that contain TE sequences, we identified a set of 63 mature "chimeric" transcripts supported by expressed sequence tag (EST) evidence in the Drosophila melanogaster genome. We found a paucity of chimeric TEs relative to expectations derived from non-chimeric TEs, indicating that the majority (~80%) of TEs that generate chimeric transcripts are deleterious and are not observed in the genome sequence. Using a pooled-PCR strategy to assay the presence of gene-TE chimeras in wild strains, we found that over half of the observed chimeric TE insertions are restricted to the sequenced strain, and ~15% are found at high frequencies in North American D. melanogaster populations. Estimated population frequencies of chimeric TEs did not differ significantly from non-chimeric TEs, suggesting that the distribution of fitness effects for the observed subset of chimeric TEs is indistinguishable from the general set of TEs in the genome sequence. Conclusion: In contrast to mammalian genomes, we found that fewer than 1% of Drosophila genes produce mRNAs that include bona fide TE sequences. This observation can be explained by the results of our population genomic analysis, which indicates that most potential chimeric TEs in D. melanogaster are deleterious but that a small proportion may contribute to the evolution of novel gene sequences such as nested or intercalated gene structures. Our results highlight the need to establish the fixity of putative cases of TE domestication identified using genome sequences in order to demonstrate their functional importance, and reveal that the contribution of TE domestication to genome evolution may vary drastically among animal taxa.
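At its core, the in silico screen for host genes containing TE sequences is an interval-overlap test between transcript exons and TE annotations. The sketch below assumes half-open coordinates (chrom, start, end) and uses a hypothetical minimum-overlap cutoff. The naive all-pairs comparison is a simplification. It is not the authors' pipeline, which also required EST support.

```python
def overlap_bp(a, b):
    """Overlap in bp of two half-open intervals (chrom, start, end); 0 if none."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def chimeric_transcripts(exons_by_tx, tes, min_overlap=1):
    """Return {transcript: sorted TE names} for transcripts whose exons overlap a TE.

    exons_by_tx: {transcript_id: [(chrom, start, end), ...]}
    tes:         {te_name: (chrom, start, end)}
    min_overlap: hypothetical minimum exon/TE overlap in bp.
    """
    hits = {}
    for tx, exons in exons_by_tx.items():
        # All-pairs check; a genome-wide screen would use an interval index instead.
        found = sorted(
            name for name, te in tes.items()
            if any(overlap_bp(exon, te) >= min_overlap for exon in exons)
        )
        if found:
            hits[tx] = found
    return hits


# Hypothetical annotations on two chromosome arms.
exons = {
    "tx1": [("2L", 100, 200), ("2L", 300, 400)],
    "tx2": [("2L", 600, 700)],
    "tx3": [("3R", 100, 200)],
}
tes = {"te1": ("2L", 350, 500), "te2": ("2L", 100, 200)}
print(chimeric_transcripts(exons, tes))  # {'tx1': ['te1', 'te2']}
```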
Article
The centromere of eukaryotic chromosomes is essential for the faithful segregation and inheritance of genetic information. In the majority of eukaryotic species, centromeres are associated with highly repetitive DNA, and as a consequence, the boundary for a functional centromere is difficult to define. In this study, we demonstrate that the centers of rice centromeres are occupied by a 155-bp satellite repeat, CentO, and a centromere-specific retrotransposon, CRR. The CentO satellite is located within the chromosomal regions to which the spindle fibers attach. CentO is quantitatively variable among the 12 rice centromeres, ranging from 65 kb to 2 Mb, and is interrupted irregularly by CRR elements. The break points of 14 rice centromere misdivision events were mapped to the middle of the CentO arrays, suggesting that the CentO satellite is located within the functional domain of rice centromeres. Our results demonstrate that the CentO satellite may be a key DNA element for rice centromere function.
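A tandem satellite such as the 155-bp CentO repeat can be recognized because a sequence matches itself best when shifted by one repeat unit. The sketch below finds the smallest such shift. It uses a toy 8-bp unit so the example stays short; a CentO-like array would need max_period ≥ 155. The identity cutoff is a hypothetical value, and this self-comparison approach is an illustration rather than the method used in the study.

```python
def tandem_period(seq, max_period, min_identity=0.9):
    """Smallest shift p at which seq matches itself at >= min_identity of positions.

    For a tandem array, self-match identity peaks at the repeat-unit length.
    min_identity is a hypothetical cutoff; real satellite copies are not identical.
    Returns None if no shift up to max_period reaches the cutoff.
    """
    for p in range(1, max_period + 1):
        n = len(seq) - p
        if n <= 0:
            break
        matches = sum(1 for k in range(n) if seq[k] == seq[k + p])
        if matches / n >= min_identity:
            return p
    return None


# Toy 8-bp unit repeated 20 times, plus a copy with one substitution.
array = "ACGTTGCA" * 20
mutated = array[:50] + "T" + array[51:]
print(tandem_period(array, 20), tandem_period(mutated, 20))  # 8 8
```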
Article
We collected and completely sequenced 28,469 full-length complementary DNA clones from Oryza sativa L. ssp. japonica cv. Nipponbare. Through homology searches of publicly available sequence data, we assigned tentative protein functions to 21,596 clones (75.86%). Mapping of the cDNA clones to genomic DNA revealed that there are 19,000 to 20,500 transcription units in the rice genome. Protein informatics analysis against the InterPro database revealed the existence of proteins present in rice but not in Arabidopsis. Sixty-four percent of our cDNAs are homologous to Arabidopsis proteins.
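One simple way to turn mapped cDNA clones into transcription units is to merge alignments that overlap on the same chromosome and strand. The sketch below shows that merge under stated assumptions. It assumes half-open (chrom, strand, start, end) alignments and uses hypothetical coordinates. Real clustering rules, such as requiring shared splice sites, may differ from those used in the study.

```python
def transcription_units(alignments):
    """Merge overlapping cDNA alignments on the same chromosome and strand.

    alignments: iterable of (chrom, strand, start, end), half-open coordinates.
    Returns merged units as (chrom, strand, start, end), sorted.
    """
    units = []
    # Sorting by (chrom, strand, start) places overlapping same-strand hits next to each other.
    for chrom, strand, start, end in sorted(alignments):
        last = units[-1] if units else None
        if last and last[0] == chrom and last[1] == strand and start < last[3]:
            # Overlaps the current unit: extend its end coordinate.
            units[-1] = (chrom, strand, last[2], max(last[3], end))
        else:
            units.append((chrom, strand, start, end))
    return units


hits = [
    ("chr1", "+", 100, 500),
    ("chr1", "+", 400, 900),   # overlaps the first clone: same unit
    ("chr1", "-", 450, 800),   # opposite strand: separate unit
    ("chr1", "+", 1000, 1200),
    ("chr2", "+", 100, 300),
]
print(len(transcription_units(hits)))  # 4
```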
Article
RNA interference (RNAi) is a widespread silencing mechanism that acts at both the posttranscriptional and transcriptional levels. Here, we describe the purification of an RNAi effector complex termed RITS (RNA-induced initiation of transcriptional gene silencing) that is required for heterochromatin assembly in fission yeast. The RITS complex contains Ago1 (the fission yeast Argonaute homolog), Chp1 (a heterochromatin-associated chromodomain protein), and Tas3 (a novel protein). In addition, the complex contains small RNAs that require the Dicer ribonuclease for their production. These small RNAs are homologous to centromeric repeats and are required for the localization of RITS to heterochromatic domains. The results suggest a mechanism for the role of the RNAi machinery and small RNAs in targeting of heterochromatin complexes and epigenetic gene silencing at specific chromosomal loci.
Article
Overlapping transcripts in antisense orientation have the potential to form double-stranded RNA (dsRNA), a substrate for a number of different RNA-modification pathways. One prominent route for dsRNA is its breakdown by Dicer enzyme complexes into small RNAs, a pathway that is widely exploited by RNA interference technology to inactivate defined genes in transgenic lines. The significance of this pathway for endogenous gene regulation remains unclear. RESULTS: We have examined transcription data for overlapping gene pairs in Arabidopsis thaliana. On the basis of an analysis of transcripts with coding regions, we find the majority of overlapping gene pairs to be convergently overlapping pairs (COPs), with the potential for dsRNA formation. In all tissues, COP transcripts are present at a higher frequency compared to the overall gene pool. The probability that both the sense and antisense copy of a COP are co-transcribed matches the theoretical value for coexpression under the assumption that the expression of one partner does not affect the expression of the other. Among COPs, we observe an over-representation of spliced (intron-containing) genes (90%) and of genes with alternatively spliced transcripts. For loci where antisense transcripts overlap with sense transcript introns, we also find a significant bias in favor of alternative splicing and variation of polyadenylation. CONCLUSION: The results argue against a predominant RNA degradation effect induced by dsRNA formation. Instead, our data support alternative roles for dsRNAs. They suggest that at least for a subgroup of COPs, antisense expression may induce alternative splicing or polyadenylation.
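The analysis above classifies antisense gene pairs by overlap geometry. It then compares co-transcription with the value expected if the two partners were expressed independently, which is the product of their individual detection frequencies. The sketch below assumes gene models given as (chrom, strand, start, end) with strands "+" and "-" and half-open coordinates. The category labels other than "convergent" are illustrative names, not the paper's.

```python
def overlap_type(a, b):
    """Classify two gene models (chrom, strand, start, end) by how they overlap."""
    chrom_a, strand_a, start_a, end_a = a
    chrom_b, strand_b, start_b, end_b = b
    if chrom_a != chrom_b or not (start_a < end_b and start_b < end_a):
        return None  # no overlap
    if strand_a == strand_b:
        return "same-strand"
    plus, minus = (a, b) if strand_a == "+" else (b, a)
    if plus[2] <= minus[2] and plus[3] <= minus[3]:
        return "convergent"  # 3' ends overlap (a convergently overlapping pair)
    if minus[2] <= plus[2] and minus[3] <= plus[3]:
        return "divergent"   # 5' ends overlap
    return "enclosed"        # one gene lies entirely within the other


def expected_coexpression(p_sense, p_antisense):
    """Probability both partners are detected if their expression is independent."""
    return p_sense * p_antisense


print(overlap_type(("chr1", "+", 100, 500), ("chr1", "-", 400, 900)))  # convergent
print(expected_coexpression(0.6, 0.5))
```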
Article
Methylation of cytosines in DNA sequences is a major part of epigenetic regulation, resulting in proximal transcriptional silencing and enabling the stable inheritance of a pattern of transcriptional activity. DNA methylation in higher eukaryotes is involved in transposon silencing and regulation of gene expression; however, the full extent to which this mechanism regulates the genome has remained unknown. Tiling arrays representing the entire genome of the flowering plant Arabidopsis thaliana, tiled at 35-bp resolution, provide a platform upon which to analyze the methylated component of the Arabidopsis genome. Hybridization of methylated genomic DNA isolated by 5-methyl-cytosine immunoprecipitation to the whole-genome tiling arrays produced the first comprehensive DNA methylation map of an entire genome, identifying heavy DNA methylation at pericentromeric heterochromatin, repetitive sequences, and regions producing small interfering RNAs. Over one-third of expressed genes contain methylation within transcribed regions, whereas only ~5% of genes show methylation within promoter regions. Genes methylated in transcribed regions are highly expressed and constitutively active, whereas promoter-methylated genes show a greater degree of tissue-specific expression. Whole-genome tiling-array transcriptional profiling of DNA methyltransferase null mutants identified hundreds of genes and intergenic noncoding RNAs with altered expression levels, many of which may be epigenetically controlled by DNA methylation. The approaches developed should assist in the study of DNA methylation in larger and more complex genomes, for which whole-genome tiling arrays are now available.
Article
The miRNA Registry provides a service for the assignment of miRNA gene names prior to publication. A comprehensive and searchable database of published miRNA sequences is accessible via a web interface (http://www.sanger.ac.uk/Software/Rfam/mirna/), and all sequence and annotation data are freely available for download. Release 2.0 of the database contains 506 miRNA entries from six organisms.