
Genome-wide identification of natural RNA aptamers in prokaryotes and eukaryotes

March 2018
Nature Communications 9(1)


DOI:10.1038/s41467-018-03675-1

License
CC BY

Authors:

Sidika Tapsin

Marmara University

Yang Shen

Genome Institute of Singapore

Huibin Zhang

Agency for Science, Technology and Research (A*STAR)


RNAs are well-suited to act as cellular sensors that detect and respond to metabolite changes in the environment, due to their ability to fold into complex structures. Here, we introduce a genome-wide strategy called PARCEL that experimentally identifies RNA aptamers in vitro, in a high-throughput manner. By applying PARCEL to a collection of prokaryotic and eukaryotic organisms, we have revealed 58 new RNA aptamers to three key metabolites, greatly expanding the list of natural RNA aptamers. The newly identified RNA aptamers exhibit significant sequence conservation, are highly structured and show an unexpected prevalence in coding regions. We identified a prokaryotic precursor tmRNA that binds vitamin B2 (FMN) to facilitate its maturation, as well as eukaryotic mRNAs that bind and respond to FMN, suggesting FMN as the second RNA-binding ligand to affect eukaryotic expression. PARCEL results show that RNA-based sensing and gene regulation is more widespread than previously appreciated in different organisms.

Precursor tmRNA can act as an RNA sensor for FMN. a Schematic of the B. subtilis tmRNA genomic locus (top) and predicted secondary structures of the long precursor tmRNA, short precursor tmRNA, and mature tmRNA, using the RNAfold program²⁰ (bottom). b RNA footprinting analysis of the long precursor tmRNA, using a SHAPE-like chemical (NAI), in the presence (lane 4) and absence (lane 3) of 100 µM FMN. Also shown are A ladder (lane 1) and unmodified RNA (lane 2). The red bar indicates bases that become more single-stranded in the presence of FMN. c Predicted secondary structure of the tmRNA long precursor using the RNAfold program²⁰. The red bases correspond to the positions marked by the red bar in b. d Average footprinting analysis (n = 3, SAFA) of mature (top), short precursor (middle), and long precursor tmRNA (bottom), in the presence (red) and absence (black) of 100 µM FMN, in the dark. The beige box indicates the region of increased flexibility in the precursor tmRNAs in the presence of FMN. The stars indicate bases that show statistically significant changes with FMN (p ≤ 0.05, Student's t-test). e qPCR analysis of the RNA expression level of precursor tmRNA and mature tmRNA, across six biological replicates, after addition of 100 µM of riboflavin to the growth media of B. subtilis. Fold changes are normalized to the negative control Veg gene. The known B. subtilis FMN riboswitch is used as the positive control. p-values were calculated by Student's t-test; the error bars indicate the standard deviation of the replicates

…

Measuring RNA-ligand binding by structure probing and deep sequencing. a RNA undergoes structure changes upon ligand binding. This structural change is detected by the double-strand specific nuclease, RNase V1, which cuts at different double-stranded places along the RNA in the presence and absence of the ligand. The cleavage sites are then captured and cloned into a cDNA library for deep sequencing. After mapping the reads to the transcriptome, we can identify which bases have undergone changes in structuredness upon ligand binding (highlighted in beige boxes). b Deep sequencing reveals structure changes of a known TPP riboswitch, thiM, using RNase V1 (top), S1 nuclease (middle), and in-line probing (bottom). The red and black lines indicate the structure profiles of thiM treated with and without 100 µM TPP, respectively. The beige regions highlight regions of structural changes upon ligand binding. c PARCEL identified 85% of known TPP, FMN, and SAM riboswitches in B. subtilis and P. aeruginosa. The black and the white bars indicate the number of known riboswitches that were captured and missed in our study, respectively. d PARCEL sequencing data for the B. subtilis TPP riboswitch, thiT, in the presence and absence of 100 µM TPP (top), 100 µM thiamine (middle), and 100 µM oxythiamine (bottom). PARCEL detected strongest structural change in thiT in the presence of TPP, followed by thiamine and then oxythiamine, which corresponds to the binding affinities of TPP riboswitches for these metabolites⁹. e The plots show normalized V1 read counts of the thiC TPP riboswitch under increasing concentrations of TPP. PARCEL was performed on the B. subtilis transcriptome

…

PARCEL identifies new RNA aptamers in bacterial species. a PARCEL identifies a total of 52 RNA aptamers in B. subtilis and P. aeruginosa. Black and white bars indicate the numbers of known riboswitches and novel aptamers that are identified in our study, respectively. b Distribution of known riboswitches and new RNA aptamers along the 5′ UTR, CDS, and 3′ UTR regions for B. subtilis and P. aeruginosa, showing that a substantial proportion of RNA aptamers are located in the 3′ UTR and CDS regions. c Comparison of score distribution of Alifoldz¹² for RNA aptamers vs. shuffled counterparts. The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. A negative score indicates a stable, conserved consensus structure. p-value was obtained using the non-parametric Kolmogorov–Smirnov test. d, e Comparison of the nucleotide substitution rate (number of substitutions per base-pair) for new RNA aptamers in coding region (KrCDS), new RNA aptamers in UTR (KrUTR), 3′ UTR (K3UTR), 5′ UTR (K5UTR), synonymous sites (Ks), and non-synonymous sites (Ka). The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. To calculate nucleotide substitutions, B. subtilis 168 was compared to B. subtilis subsp. spizizenii W23 (d), and P. aeruginosa PAO1 was compared to P. aeruginosa PA7 (e). Note that Krknown denotes the substitution rate of known riboswitches in B. subtilis (15 in total) as annotated in the RegPrecise database¹¹. Krknown was not calculated in P. aeruginosa as there are too few known TPP and FMN riboswitches. p-values were calculated using the non-parametric Kolmogorov–Smirnov test

…

PARCEL identifies new RNA aptamers in Candida albicans. a Pie chart of the number of C. albicans RNA aptamers that are located in 5′ UTR, CDS, and 3′ UTR. The majority of C. albicans RNA aptamers are found in CDSs. b Comparison of the distribution of Alifoldz scores for RNA aptamers vs. shuffled counterpart. The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. A negative score indicates a stable, conserved consensus structure. P-value was obtained using the non-parametric Kolmogorov–Smirnov test. c Nucleotide substitution rates, calculated as the number of substitutions per base-pair, for RNA aptamers (Kr), 3′ UTR (K3UTR), 5′ UTR(K5UTR), synonymous sites (Ks), and non-synonymous sites (Ka). The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. C. albicans SC5314 was compared to Candida dubliniensis for the calculation. p-value was obtained using the non-parametric Kolmogorov–Smirnov test. d Gel analysis of RPS31 mRNA using in-line probing (left) and RNase V1 (right) in the presence (lane 3) and absence (lane 2) of 100 µM FMN. The A ladder (A, lane 1) is also shown. The black arrows indicate positions along the RNA that changed in the presence of FMN. e A representative Western blot showing RPS31::FLAG (top) and loading (bottom) protein levels in RPS31::FLAG knock-in strains with WT (left) and fmn1∆ (right) backgrounds cultured at different FMN concentrations (mM). Using t-test (n = 8), significant p-values of 0.009 and 0.01 (for 2.5 and 5.0 mM against 10.0 mM, respectively) were determined for fmn1∆, but not WT (p-values of 0.2 and 0.5). f Gel analysis of RPS31 mRNA using in-line probing in the presence of 20, 100, or 500 µM of FMN, FAD or riboflavin. In-line probing of RNA in the absence of metabolite (H2O, lane 2) and A ladder (A, lane 1) are also shown. 
g SAFA analysis of WT RPS31 (top) and codon-optimized RPS31 (bottom) in the presence (red line) and absence (black line) of 100 µM FMN. The beige box indicates the region of structural change in WT RPS31 when it interacts with FMN. This structural change is absent in the codon-optimized RPS31. h A representative Western blot showing codon-optimized RPS31::FLAG (top) and loading (bottom) protein levels in codon-optimized RPS31::FLAG knock-in strains with WT (left) and fmn1∆ (right) backgrounds cultured at different FMN concentrations (mM). Using t-test (n = 3), calculated p-values for 2.5 and 5.0 mM were insignificant for both fmn1∆ (both 0.7) and WT (0.9 and 0.09)

…



Sidika Tapsin¹, Miao Sun², Yang Shen², Huibin Zhang³, Xin Ni Lim¹, Teodorus Theo Susanto¹, Siwy Ling Yang¹, Gui Sheng Zeng⁴, Jasmine Lee⁴, Alexander Lezhava⁵, Ee Lui Ang³, Lian Hui Zhang⁴, Yue Wang⁴, Huimin Zhao³,⁶, Niranjan Nagarajan² & Yue Wan¹


¹ Stem Cell and Development Biology, Genome Institute of Singapore, Singapore 138672, Singapore. ² Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore. ³ Metabolic Engineering Research Laboratory (MERL), Science and Engineering Institutes, Agency for Science, Technology, and Research (A*STAR), 31 Biopolis Way, Nanos #01-01, Singapore 138669, Singapore. ⁴ Institute of Molecular and Cell Biology, Proteos, 61 Biopolis Drive, Singapore 138673, Singapore. ⁵ Translational research group, Genome Institute of Singapore, Singapore 138672, Singapore. ⁶ Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States. These authors contributed equally: Sidika Tapsin, Miao Sun, Yang Shen, Huibin Zhang. Correspondence and requests for materials should be addressed to N.N. (email: nagarajann@gis.a-star.edu.sg) or to Y.W. (email: wany@gis.a-star.edu.sg)

NATURE COMMUNICATIONS | (2018) 9:1289 |DOI: 10.1038/s41467-018-03675-1 |www.nature.com/naturecommunications 1


Content courtesy of Springer Nature, terms of use apply. Rights reserved

Microorganisms are constantly sensing their environment for changes in temperature, pH, metabolites, and nutrients so as to regulate their gene expression programs to best adapt to different signals for growth, survival, and virulence¹. As such, comprehensive mapping of regulatory networks in microbes under different environmental conditions is crucial to understanding their biology. While many of these regulators have been extensively studied at the protein and DNA levels, the role of RNAs as direct sensors and responders remains relatively under-explored. One class of cellular RNA sensors, known as riboswitches, can recognize and respond to specific metabolites by altering gene expression². Upon binding to their ligands, riboswitches undergo conformational changes that result in the regulation of gene expression through diverse means, such as changes in transcription, translation and decay², informing the host of its environmental conditions. The modularity of riboswitches also allows them to be transplanted into different systems, broadening their use as biological sensors.

While in vitro selection methods, such as SELEX, have been applied with variable success to generate new synthetic RNA aptamers³, the ability to comprehensively identify natural RNA aptamers from transcriptomes would expand our toolbox for synthetic biology and deepen our understanding of RNA-based gene regulation in vivo. Currently, potential natural RNA aptamers are mostly identified through computational determination of sequence and structural homology to known riboswitches⁴. However, as RNA can adopt different folds for binding to the same ligand and organisms can diverge greatly in sequence content, computational means of searching for riboswitches through sequence homology have limited scope⁵, and strategies that allow direct experimental detection are highly desirable. One recent strategy, Term-seq, utilizes high-throughput sequencing to detect differential transcription termination events in bacteria under different conditions⁶. However, complementary strategies for detecting riboswitches that act through other mechanisms (such as translation inhibition), as well as riboswitches that bind to metabolites whose intracellular concentrations are not easily altered, need to be developed.

Here, we report an in vitro method for experimentally identifying RNA aptamers in transcriptomes by detecting ligand-induced RNA structural changes using high-throughput sequencing (Fig. 1a). This method allows us to rapidly screen through transcriptomes to identify natural RNA aptamers toward almost any ligand of choice. Specifically, we extract total RNA from organisms grown under different conditions and probe their structures in the presence or absence of different metabolites by using a double-strand-specific nuclease, RNase V1, which recognizes and cleaves at base-paired regions in RNAs. The different cleavage sites, in the presence and absence of metabolites, are cloned into cDNA libraries for deep sequencing. Subtle differences between these two libraries point to the few true ligand-specific, structure-changing RNA elements in the genome, and we developed a sensitive and robust computational analysis pipeline to identify these (Methods). This experimental and computational approach, termed Parallel Analysis of RNA Conformations Exposed to Ligand binding (PARCEL), revealed the breadth of RNA-ligand interactions in prokaryotic and eukaryotic transcriptomes, identifying many new natural RNA aptamers in the process.

Results

Development of PARCEL. To establish PARCEL, we systematically tested different structure-probing strategies to determine the approach that best captures ligand-induced structural changes genome-wide, allowing for a simplified, cost-effective workflow without multiple probing assays⁷. Strategies using double-strand or single-strand specific nucleases (RNase V1 and S1 nuclease, respectively), as well as in-line probing, which probes nucleotide flexibility⁸, were tested for their abilities to detect structural changes using high-throughput sequencing (Fig. 1b, Supplementary Fig. 1, 2, 3). The known thiamine pyrophosphate (TPP) and S-adenosylmethionine (SAM) riboswitches were used as positive controls and other RNA sequences not known to bind TPP or SAM were used as negative controls in this experiment⁹,¹⁰. As expected, the known riboswitches showed large structural changes upon ligand binding (Fig. 1b, Supplementary Fig. 1, 2), while the negative controls did not (Supplementary Fig. 3), indicating that the structure changes captured by nuclease digestion followed by sequencing are highly specific. These structural differences can be less pronounced in libraries prepared using in-line probing (Fig. 1b, Supplementary Fig. 2). This is likely due to the noise introduced by the additional 5′ phosphorylation step that is used in in-line probing library preparation. As degraded cellular RNAs are also phosphorylated and cloned into the library, it is difficult to distinguish in-line probed fragments from degradation fragments. Among the nucleases, we observed a higher degree of correlation between two biological replicates of RNase V1 (R = 0.99) versus S1 nuclease libraries (R = 0.66, Supplementary Fig. 4a), a key feature for differential analysis. The structural changes observed using the nucleases could also be reproduced using low-throughput RNA footprinting and mapped to the secondary structure of the TPP riboswitch (Supplementary Fig. 1b-d). Correspondingly, we selected RNase V1 as the probing strategy of choice in all PARCEL experiments to identify natural RNA aptamers genome-wide.
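
The replicate comparison above can be illustrated with a short sketch. It assumes that R denotes a Pearson correlation coefficient computed over per-base cleavage read counts, which the text here does not state explicitly; the function name `pearson_r` and the counts are hypothetical and are not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-base cleavage counts from two replicate libraries.
rep1 = [12, 40, 3, 88, 51, 7, 0, 23]
rep2 = [10, 44, 5, 90, 47, 9, 1, 20]
r = pearson_r(rep1, rep2)
print(f"replicate correlation R = {r:.3f}")
```

A high R between replicates matters for the differential analysis because any base-level difference between ligand and control libraries has to exceed the replicate-to-replicate scatter to be called.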

As Bacillus subtilis and Pseudomonas aeruginosa are bacteria for which many riboswitches are known, we performed structure probing in the presence and absence of key metabolites known to interact with RNAs in their transcriptomes. To maximize our chances of finding RNA aptamers which may only be expressed under specific conditions, we grew the bacteria in rich or minimal media to exponential or stationary phases (Methods). We then extracted total RNA from the pooled bacteria, performed ribosomal depletion to enrich for mRNAs, and did structure probing, followed by deep sequencing. We had two biological replicates for each experiment and obtained more than 7 million reads per replicate (Supplementary Table 1). RNA aptamers that bind specifically to one ligand should not recognize other unrelated ligands. As such, we developed a novel computational pipeline to identify contiguous positions of structural change (to increase signal-to-noise ratio) that show statistically different numbers of reads that indicate base pairing in one metabolite condition but not in others (Supplementary Fig. 4b, Methods). This approach aggregates signals of variation in each base pair across conditions to define ligand-responsive regions using dynamic programming, and combines this with the computation of a BLAST-like E-value to identify RNA aptamers with statistical confidence (Methods).
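
As a generic illustration of how dynamic programming can pick out a contiguous region from per-base signals, the sketch below finds the highest-scoring run in a list of signed scores using the classic maximum-subarray recurrence. This is not the PARCEL pipeline, whose scoring scheme and E-value computation are described in the Methods; the function name `best_segment` and the score values are assumptions made for illustration, and no E-value is computed here.

```python
def best_segment(scores):
    """Return (start, end, total) of the highest-scoring contiguous run.

    `scores` holds one signed value per base (positive = evidence of change,
    negative = no evidence). Indices are 0-based and `end` is exclusive.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    best_total, best_start, best_end = scores[0], 0, 1
    run_total, run_start = 0.0, 0
    for i, s in enumerate(scores):
        if run_total <= 0:
            # A non-positive running sum cannot help; restart the run here.
            run_total, run_start = s, i
        else:
            run_total += s
        if run_total > best_total:
            best_total, best_start, best_end = run_total, run_start, i + 1
    return best_start, best_end, best_total

# Hypothetical per-base scores along one transcript.
scores = [-1.0, -0.5, 2.0, 1.5, -0.3, 2.2, -2.0, -1.0, 0.4]
start, end, total = best_segment(scores)
print(f"candidate region: bases {start}..{end - 1}, score {total:.1f}")
```

Aggregating over a run rather than testing single bases is what raises the signal-to-noise ratio: an isolated noisy base (such as the -0.3 above) does not break an otherwise consistent region.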

PARCEL finds known riboswitches in B. subtilis and P. aeruginosa. To evaluate PARCEL, we first determined if we could identify known riboswitches that have been previously reported in the literature. We identified 17 out of 20 known riboswitches that interact with key metabolites (TPP, FMN, and SAM) in B. subtilis and P. aeruginosa, including 4/5 known TPP riboswitches, 2/2 FMN riboswitches, 9/11 SAM riboswitches in B. subtilis, as well as 1/1 TPP and 1/1 FMN riboswitches in P. aeruginosa¹¹, highlighting the high sensitivity of the method (85%; Fig. 1c, Supplementary Fig. 5a-d). Furthermore, pair-wise analysis of control libraries that were generated using the same metabolite did not identify any candidate RNA aptamers, indicating that the approach is highly specific to the presence of metabolites (Supplementary Fig. 5e). We noted that undetected known riboswitches have low numbers of RNase V1 and S1 nuclease reads along their regulons, indicating that they are either poorly expressed or present in nuclease-inaccessible regions (Supplementary Fig. 5f, g).
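
The 85% sensitivity quoted above follows directly from the per-class counts in the text. The short tally below reproduces the arithmetic; the dictionary layout and variable names are illustrative only.

```python
# (identified, known) riboswitch counts per organism and ligand, as stated in the text.
counts = {
    ("B. subtilis", "TPP"): (4, 5),
    ("B. subtilis", "FMN"): (2, 2),
    ("B. subtilis", "SAM"): (9, 11),
    ("P. aeruginosa", "TPP"): (1, 1),
    ("P. aeruginosa", "FMN"): (1, 1),
}

identified = sum(found for found, _ in counts.values())
known = sum(total for _, total in counts.values())
sensitivity = identified / known
print(f"{identified}/{known} known riboswitches recovered ({sensitivity:.0%})")
```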

Besides detecting structural changes upon ligand binding, we also determined whether PARCEL can detect RNA-ligand interactions quantitatively. As TPP riboswitches bind most strongly to TPP, followed by thiamine and oxythiamine, we tested the sensitivity of PARCEL in detecting RNA structural changes due to differences in ligand binding affinities⁹. Indeed, we observed the strongest structural change in TPP riboswitches upon binding to TPP, followed by thiamine and then oxythiamine (Fig. 1d, Supplementary Fig. 6a). We also treated the B. subtilis ribosomal RNA-depleted pool with different concentrations of TPP to determine the approximate binding affinity of known TPP riboswitches (Fig. 1e). PARCEL data on the known thiC riboswitch show a graded change in RNA structure under different ligand concentrations, and an approximate Kd of 110 nM in the most ligand-sensitive regions, similar to the previously reported Kd of 100 nM⁹ (Supplementary Fig. 6b). These data collectively demonstrate that PARCEL is quantitative and can be used to approximate relative binding affinities of RNA-ligand interactions.
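
One way an approximate dissociation constant can be read off a titration like this is sketched below. The sketch assumes a simple one-site binding isotherm and a response already normalized to the 0–1 range; neither assumption is confirmed by the text here, and the paper's own fitting procedure may differ. The concentrations are the six TPP levels labeled in Fig. 1e, while the responses are synthetic values generated from a hypothetical Kd rather than measured data. The function names are likewise illustrative.

```python
import math

def fraction_bound(ligand, kd):
    """One-site binding isotherm: fraction of RNA bound at a ligand concentration."""
    return ligand / (ligand + kd)

def fit_kd(ligands, responses, kd_min=1e-3, kd_max=1e3, steps=2001):
    """Least-squares Kd estimate from a scan over a log-spaced grid.

    Kd is returned in the same units as `ligands`.
    """
    best_kd, best_sse = None, math.inf
    log_min, log_max = math.log10(kd_min), math.log10(kd_max)
    for i in range(steps):
        kd = 10 ** (log_min + (log_max - log_min) * i / (steps - 1))
        sse = sum((fraction_bound(l, kd) - r) ** 2
                  for l, r in zip(ligands, responses))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

# TPP concentrations (µM) taken from the Fig. 1e panel labels.
concentrations_um = [0.16, 0.8, 4.0, 20.0, 100.0, 500.0]
# Synthetic, noise-free responses generated from a hypothetical Kd (µM).
true_kd_um = 0.11
responses = [fraction_bound(c, true_kd_um) for c in concentrations_um]

estimate = fit_kd(concentrations_um, responses)
print(f"estimated Kd ~ {estimate * 1000:.0f} nM")
```

A grid scan is used instead of a gradient-based fit only to keep the example dependency-free; with real, noisy read counts a proper nonlinear least-squares routine with confidence intervals would be the more usual choice.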

PARCEL identifies new RNA aptamers in B. subtilis and P. aeruginosa. Beyond known riboswitches, we also identified new aptamers: 17 TPP and 12 FMN RNA aptamers in the B. subtilis and P. aeruginosa transcriptomes, as well as 6 SAM RNA aptamers in the B. subtilis transcriptome (Fig. 2a, Supplementary Table 2). RNA footprinting validated four out of six ligand-induced structure changes identified by PARCEL (Supplementary Fig. 7a-d). While known riboswitches are mostly located upstream of their operons, we observed that 28% and 47% of the newly identified B. subtilis and P. aeruginosa RNA aptamers are found in coding regions, respectively (Fig. 2b, Supplementary Fig. 8). Two of our validated RNA aptamers fall in coding regions (Supplementary Fig. 7a, c), indicating that these coding regions exhibit real structural changes in the presence of TPP.

To study the structural properties of these RNA aptamers, we utilized the program Alifoldz¹² to calculate folding energies across their RNA orthologs in different bacterial species (Methods). The newly identified elements exhibit lower folding energies than their dinucleotide-shuffled controls, indicating that they are more structured (Fig. 2c). As functionally important RNAs frequently show evolutionary constraints in their sequences, we calculated the nucleotide substitution rate of our new RNA aptamers, as compared to other synonymous regions and UTRs. Similar to known riboswitches, the novel RNA aptamers show a significant reduction in nucleotide substitution rate (Fig. 2d, e), suggesting that they are evolutionarily conserved and likely to be functional.
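
To make the notion of a nucleotide substitution rate concrete, the sketch below counts substitutions per compared base between two pre-aligned sequences. It computes an uncorrected proportion of differing sites; the rates reported in Fig. 2d, e may come from a model-corrected estimator described in the Methods, so this is only a simplified stand-in. The function name and the toy sequences are hypothetical.

```python
def substitution_rate(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps; they are not substitutions
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared

# Toy aligned sequences (hypothetical, not taken from either genome).
strain_1 = "ACGUACGUAC"
strain_2 = "ACGUACGAAC"
rate = substitution_rate(strain_1, strain_2)
print(f"substitutions per compared base: {rate:.2f}")
```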

We observed that one of our prokaryotic metabolite-sensitive regions encodes a small non-coding RNA, specifically a tmRNA (Fig. 3a). The B. subtilis tmRNA is transcribed from two promoters to produce both long and short precursor RNAs, which are then processed into the mature tmRNA. To understand the functional role of metabolite sensing in tmRNA, we cloned and in vitro transcribed all three isoforms of tmRNA, then performed structure probing in the presence and absence of FMN in the dark, to avoid FMN-induced photocleavage of RNAs. Interestingly, we observed FMN-induced structure changes in both precursor forms of tmRNA, but not in the mature form (Fig. 3b, c, d, Supplementary Fig. 7d), highlighting that it is the precursor tmRNA that responds to FMN. To determine whether FMN binding influences the processing of precursor tmRNAs, we grew B. subtilis in minimal media with and without the FMN precursor, riboflavin. Addition of riboflavin resulted in a decrease in precursor tmRNA levels and a two-fold increase in mature tmRNA levels (Fig. 3e), supporting the hypothesis that FMN regulates RNA maturation by binding and altering precursor tmRNA structures.

PARCEL finds new eukaryotic FMN aptamers in Candida albicans. To date, only riboswitches that bind to TPP have been found in eukaryotes, and they regulate splicing and 3′ UTR usage¹³,¹⁴. Identifying new eukaryotic riboswitches is important for broadening our understanding of eukaryotic gene regulation. To maximize our chances of finding eukaryotic riboswitches, we screened the fungal pathogen, C. albicans, using a pool of metabolites that correspond to highly abundant classes of riboswitches in bacteria, including FMN, SAM, glycine, lysine, and vitamin B12 (Adocbl). PARCEL identified 23 new RNA aptamers that exhibited structural changes in the presence of the metabolite pool (Supplementary Table 3). 87% of the new C. albicans RNA aptamers reside in coding regions (Fig. 4a, Supplementary Fig. 8), in contrast to known riboswitches and new prokaryotic aptamers, indicating that they may have different functions from classical riboswitches. We validated seven out of nine PARCEL-identified structural changes by performing in-line probing of these novel RNA aptamers in the presence of the pooled metabolites (Supplementary Fig. 9–11), all of which fall in the coding regions of these genes, confirming that the PARCEL-detected structural changes are real. Similar to their prokaryotic counterparts, the eukaryotic RNA aptamers were found to be significantly more structured compared to dinucleotide-shuffled controls, suggesting that structure is likely to be important for their function (Fig. 4b). As many of the new eukaryotic RNA aptamers are located in highly conserved coding regions, we observed an expected increase in conservation of these elements as compared to UTRs, but not a further reduced nucleotide substitution rate compared to other coding sequences (Fig. 4c).
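
The figure captions state that the observed and shuffled score distributions were compared with the non-parametric Kolmogorov–Smirnov test. The sketch below computes only the two-sample KS statistic D, the largest gap between the two empirical cumulative distributions, and does not derive a p-value. The function name and the scores are hypothetical and are not values from this study.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for value in sorted(set(a) | set(b)):
        # Empirical CDF = fraction of observations less than or equal to value.
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical structure z-scores for aptamers and their shuffled controls.
observed = [-6.1, -4.2, -3.5, -2.8]
shuffled = [-1.0, -0.2, 0.3, 0.9]
d_stat = ks_statistic(observed, shuffled)
print(f"KS statistic D = {d_stat:.2f}")
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap); converting it to a p-value additionally depends on the two sample sizes.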

Eukaryotic RNA aptamers undergo gene expression changes with FMN. To better understand the cellular roles of the new eukaryotic RNA aptamers, we performed structure probing in the presence of each individual compound in the metabolite pool on two RNA aptamers identified in the coding regions of the genes RPS31 and ATP1. Interestingly, structure probing of these aptamers revealed that they respond specifically to FMN, and not to other metabolites in the solution (Supplementary Fig. 10a, 11a). Detailed structure probing, in the dark, along the length of these two RNAs identified several regions that changed structure in the presence of FMN (Fig. 4d, Supplementary Fig. 10b), suggesting that FMN binding results in structural remodeling of these RNAs. To determine whether changes in the intracellular concentration

Fig. 1 Measuring RNA-ligand binding by structure probing and deep sequencing. a RNA undergoes structure changes upon ligand binding. This structural change is detected by the double-strand specific nuclease, RNase V1, which cuts at different double-stranded places along the RNA in the presence and absence of the ligand. The cleavage sites are then captured and cloned into a cDNA library for deep sequencing. After mapping the reads to the transcriptome, we can identify which bases have undergone changes in structuredness upon ligand binding (highlighted in beige boxes). b Deep sequencing reveals structure changes of a known TPP riboswitch, thiM, using RNase V1 (top), S1 nuclease (middle), and in-line probing (bottom). The red and black lines indicate the structure profiles of thiM treated with and without 100 µM TPP, respectively. The beige regions highlight regions of structural changes upon ligand binding. c PARCEL identified 85% of known TPP, FMN, and SAM riboswitches in B. subtilis and P. aeruginosa. The black and the white bars indicate the number of known riboswitches that were captured and missed in our study, respectively. d PARCEL sequencing data for the B. subtilis TPP riboswitch, thiT, in the presence and absence of 100 µM TPP (top), 100 µM thiamine (middle), and 100 µM oxythiamine (bottom). PARCEL detected strongest structural change in thiT in the presence of TPP, followed by thiamine and then oxythiamine, which corresponds to the binding affinities of TPP riboswitches for these metabolites⁹. e The plots show normalized V1 read counts of the thiC TPP riboswitch under increasing concentrations of TPP. PARCEL was performed on the B. subtilis transcriptome


[Fig. 2 graphic. Panel labels recovered from the image: No. of RNA aptamers (known riboswitches, new RNA aptamers) for TPP, FMN, and SAM in B. subtilis and for TPP and FMN in P. aeruginosa; distribution across 5′ UTR, CDS, and 3′ UTR for B. subtilis known riboswitches, B. subtilis new RNA aptamers, and P. aeruginosa new RNA aptamers; Z-score (obtained from Alifoldz), observed vs. shuffled, for B. subtilis (p = 6.25×10−6) and P. aeruginosa (p = 0.006); nucleotide substitution rate for KrCDS, KrUTR, K5′UTR, K3′UTR, Ks, Ka, and Krknown in the B.spi vs. B.sub and PAO1 vs. PA7 comparisons (p-values shown: 0.001 and 0.002).]

Fig. 2 PARCEL identifies new RNA aptamers in bacterial species. a PARCEL identifies a total of 52 RNA aptamers in B. subtilis and P. aeruginosa. Black and white bars indicate the numbers of known riboswitches and novel aptamers that are identified in our study, respectively. b Distribution of known riboswitches and new RNA aptamers along the 5′ UTR, CDS, and 3′ UTR regions for B. subtilis and P. aeruginosa, showing that a substantial proportion of RNA aptamers are located in the 3′ UTR and CDS regions. c Comparison of score distribution of Alifoldz¹² for RNA aptamers vs. shuffled counterparts. The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. A negative score indicates a stable, conserved consensus structure. p-value was obtained using the non-parametric Kolmogorov–Smirnov test. d, e Comparison of the nucleotide substitution rate (number of substitutions per base-pair) for new RNA aptamers in coding region (KrCDS), new RNA aptamers in UTR (KrUTR), 3′ UTR (K3UTR), 5′ UTR (K5UTR), synonymous sites (Ks), and non-synonymous sites (Ka). The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. To calculate nucleotide substitutions, B. subtilis 168 was compared to B. subtilis subsp. spizizenii W23 (d), and P. aeruginosa PAO1 was compared to P. aeruginosa PA7 (e). Note that Krknown denotes the substitution rate of known riboswitches in B. subtilis (15 in total) as annotated in the RegPrecise database¹¹. Krknown was not calculated in P. aeruginosa as there are too few known TPP and FMN riboswitches. p-values were calculated using the non-parametric Kolmogorov–Smirnov test


of FMN could alter the expression of RPS31 or ATP1 in vivo, we integrated FLAG-tagged C. albicans RPS31 or ATP1 in a S. cerevisiae FMN1 synthase deletion mutant (fmn1Δ), as S. cerevisiae is known to take up exogenous FMN, unlike C. albicans¹⁵. Both transcript and protein levels of FLAG-tagged C. albicans RPS31 and ATP1 were measured, following growth of the integrated strains under varying FMN concentrations. We found that while transcript levels of FLAG-tagged C. albicans RPS31 and ATP1 in fmn1Δ did not change with increasing FMN concentrations (Supplementary Fig. 10c, 11b), RPS31 and ATP1 protein levels were decreased and increased, respectively (Fig. 4e, Supplementary Fig. 10d–f, 11c), suggesting that FMN sensing could have a regulatory effect on gene expression in vivo, and that

[Fig. 3, panels a–e (image not reproduced). Panel labels: B. subtilis tmRNA locus between BSU33590 and BSU33600 (genome coordinates 3,450,000–3,451,500) with the short and long precursors; NAI footprinting gel of the long precursor with and without FMN, with A positions 139–175 marked; relative NAI probing intensity per base for long precursor, short precursor, and mature tmRNA (water vs. FMN); fold change +/– riboflavin for Veg, FMN riboswitch, and precursor tmRNA (p = 0.04 and p = 0.02 shown in this panel).]

Fig. 3 Precursor tmRNA can act as an RNA sensor for FMN. a Schematic of the B. subtilis tmRNA genomic locus (top) and predicted secondary structures of the long precursor tmRNA, short precursor tmRNA, and mature tmRNA, using the RNAfold program²⁰ (bottom). b RNA footprinting analysis of the long precursor tmRNA, using a SHAPE-like chemical (NAI), in the presence (lane 4) and absence (lane 3) of 100 µM FMN. Also shown are A ladder (lane 1) and unmodified RNA (lane 2). The red bar indicates bases that become more single-stranded in the presence of FMN. c Predicted secondary structure of the tmRNA long precursor using the RNAfold program²⁰. The red bases correspond to the positions marked by the red bar in b. d Average footprinting analysis (n = 3, SAFA) of mature (top), short precursor (middle), and long precursor tmRNA (bottom), in the presence (red) and absence (black) of 100 µM FMN, in the dark. The beige box indicates the region of increased flexibility in the precursor tmRNAs in the presence of FMN. The stars indicate bases that show statistically significant changes with FMN (p ≤ 0.05, Student's t-test). e qPCR analysis of the mRNA expression level of precursor tmRNA and mature tmRNA, across six biological replicates, after addition of 100 µM of riboflavin to the growth media of B. subtilis. Fold changes are normalized to the negative control Veg gene. The known B. subtilis FMN riboswitch is used as the positive control. p-values were calculated by Student's t-test; the error bars indicate standard deviation of the replicates


[Fig. 4, panels a–h (image not reproduced). Panel labels: C. albicans new RNA aptamers across 5′UTR, CDS, and 3′UTR; Z-score (obtained from Alifoldz), observed vs. shuffled (p = 0.016); nucleotide substitution rate for Ka, Kr, Ks, K3′UTR, and K5′UTR, C.alb vs. C.dub; in-line probing and RNase V1 gels of RPS31 with (+) and without (–) 100 µM FMN, with base positions 240–359 marked; wildtype and codon-optimized RPS31 protein levels in WT and fmn1Δ at 2.5, 5.0, and 10.0 mM FMN; in-line probing with FMN, FAD, or riboflavin at 20, 100, and 500 µM; normalized in-line probing intensity at bases 232–258 for WT and codon-optimized RPS31 (water vs. 100 µM FMN). A further p-value of 3.4×10⁻¹² is shown in the figure.]

Fig. 4 PARCEL identifies new RNA aptamers in Candida albicans. a Pie chart of the number of C. albicans RNA aptamers that are located in 5′UTR, CDS, and 3′UTR. The majority of C. albicans RNA aptamers are found in CDSs. b Comparison of the distribution of Alifoldz scores for RNA aptamers vs. shuffled counterpart. The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. A negative score indicates a stable, conserved consensus structure. p-value was obtained using the non-parametric Kolmogorov–Smirnov test. c Nucleotide substitution rates, calculated as the number of substitutions per base-pair, for RNA aptamers (Kr), 3′UTR (K_3UTR), 5′UTR (K_5UTR), synonymous sites (Ks), and non-synonymous sites (Ka). The upper, middle, and lower bounds of the boxplot represent the 75, 50, and 25th percentile of the values, respectively. C. albicans SC5314 was compared to Candida dubliniensis for the calculation. p-value was obtained using the non-parametric Kolmogorov–Smirnov test. d Gel analysis of RPS31 mRNA using in-line probing (left) and RNase V1 (right) in the presence (lane 3) and absence (lane 2) of 100 µM FMN. The A ladder (A, lane 1) is also shown. The black arrows indicate positions along the RNA that changed in the presence of FMN. e A representative Western blot showing RPS31::FLAG (top) and loading (bottom) protein levels in RPS31::FLAG knock-in strains with WT (left) and fmn1Δ (right) backgrounds cultured at different FMN concentrations (mM). Using t-test (n = 8), significant p-values of 0.009 and 0.01 (for 2.5 and 5.0 mM against 10.0 mM, respectively) were determined for fmn1Δ, but not WT (p-values of 0.2 and 0.5). f Gel analysis of RPS31 mRNA using in-line probing in the presence of 20, 100, or 500 µM of FMN, FAD or riboflavin. In-line probing of RNA in the absence of metabolite (H2O, lane 2) and A ladder (A, lane 1) are also shown. g SAFA analysis of WT RPS31 (top) and codon-optimized RPS31 (bottom) in the presence (red line) and absence (black line) of 100 µM FMN. The beige box indicates the region of structural change in WT RPS31 when it interacts with FMN. This structural change is absent in the codon-optimized RPS31. h A representative Western blot showing codon-optimized RPS31::FLAG (top) and loading (bottom) protein levels in codon-optimized RPS31::FLAG knock-in strains with WT (left) and fmn1Δ (right) backgrounds cultured at different FMN concentrations (mM). Using t-test (n = 3), calculated p-values for 2.5 and 5.0 mM were insignificant for both fmn1Δ (both 0.7) and WT (0.9 and 0.09)


these genes could represent the first known eukaryotic FMN riboswitches.

To further understand ligand binding specificities and affinities of this putative FMN riboswitch, we performed detailed structural studies on the RPS31 transcript. We observed that RPS31 RNA binds specifically to FMN and does not respond to structurally similar analogs, such as riboflavin and FAD (Fig. 4f, Supplementary Fig. 11d). Integrating double-stranded (RNase V1), single-stranded (S1 nuclease) structure probing and in-line probing information along the length of the RPS31 transcript into the RNAfold prediction software showed that RPS31 RNA consists of seven stems around a central loop, and that FMN binding results in extensive structural rearrangements (Supplementary Fig. 12a). The FMN-bound RPS31 aptamer consists of six stems around the central FMN-bound loop, and appears to resemble the prokaryotic FMN riboswitch structure¹⁶. This attests to the structural plasticity of RNA molecules, whereby different sequences can be utilized to form similar structures for cellular function.

To further test whether the FMN-induced change in RPS31 protein levels is mediated post-transcriptionally, we designed a codon-optimized version of C. albicans RPS31 (by making synonymous nucleotide changes) to disrupt RPS31 RNA structure without altering its protein sequence. As expected, codon-optimized RPS31 mRNA is structurally different from wildtype RPS31 mRNA, and does not show structure changes between the presence and absence of FMN (Fig. 4g). Codon-optimized RPS31 maintained similar mRNA and protein levels at varying concentrations of FMN (Fig. 4h, Supplementary Fig. 12b–d), supporting the hypothesis that the binding of FMN to native C. albicans RPS31 RNA results in post-transcriptional regulation of RPS31 protein levels.
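The idea of recoding a gene without changing its protein can be illustrated with a short Python sketch. It checks that two coding sequences translate to the same peptide under the standard genetic code and counts how many nucleotides differ. The function names and the alternative encoding `recoded` are hypothetical; only the first sequence (the FLAG tag given in Methods) comes from the text, and this is not the authors' design procedure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: standard
# code only; organism-specific codon reassignments are not handled here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a DNA/RNA coding sequence into a one-letter protein string."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_synonymous(cds_a: str, cds_b: str) -> bool:
    """True if both sequences encode the same protein."""
    return translate(cds_a) == translate(cds_b)


def n_changed(cds_a: str, cds_b: str) -> int:
    """Number of nucleotide positions that differ between two equal-length sequences."""
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(cds_a.upper(), cds_b.upper()))


flag = "GATTACAAGGACGACGATGACAAG"     # FLAG-tag sequence from Methods
recoded = "GACTATAAAGATGATGACGATAAA"  # hypothetical synonymous re-encoding
print(translate(flag), is_synonymous(flag, recoded), n_changed(flag, recoded))
```

Every codon in the hypothetical re-encoding differs from the original at its third position, so the RNA sequence changes while the peptide stays the same.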

Discussion

In summary, we have developed a new strategy named PARCEL that experimentally identifies RNA aptamers transcriptome-wide by detecting ligand-induced structure changes. As PARCEL allows us to rapidly screen through transcriptomes to identify RNA aptamers, we identified a total of 58 novel candidate RNA aptamers in two prokaryotic and one eukaryotic species, including a second class of putative eukaryotic riboswitches. Unlike known riboswitches described in the literature, the newly identified aptamers reside in both UTRs and coding sequences, and are not necessarily linked to the biosynthetic pathways of their respective ligands. Further characterization of three new RNA aptamers showed that they could be riboswitches as FMN-sensing induces RNA structural changes and regulates transcript levels of tmRNA in B. subtilis, and protein levels of RPS31 and ATP1 in C. albicans. As PARCEL can be readily applied to any transcriptome and ligand, we believe that further application of PARCEL to diverse organisms will result in the identification of many novel natural RNA aptamers in the near future, providing new building blocks for biological sensing and deepening our understanding of RNA-based gene regulation in vivo.

Methods

Bacterial and yeast cultures. P. aeruginosa PAO1 and B. subtilis 168 were grown in LB or minimal media to log (OD600 = 0.6–0.8) or stationary phases (OD600 > 2). Total RNA from P. aeruginosa was extracted using Trizol reagent (Thermo Fisher Scientific). Total RNA from B. subtilis was extracted by first incubating B. subtilis in 4 mg per mL of lysozyme for 15 min before using Trizol LS reagent. Ribosomal depleted RNA, Ribo(−) RNA, was obtained by using Ribo-Zero rRNA Removal Kit (Epicentre) according to manufacturer's instructions. S. cerevisiae S288C was grown in YPD to exponential phase (OD600 = 0.6–0.8). C. albicans strain SC5314 was grown in YPD or GMM (yeast nitrogen base without amino acids and with 2% glucose) to exponential (OD600 = 0.6–0.8) or stationary phases (OD600 > 2). Total RNA from S. cerevisiae or C. albicans was extracted using a slightly modified protocol that uses hot acid phenol¹⁷. Poly(A)+ RNA was obtained by using the Poly(A) Purist MAG kit (Thermo Fisher Scientific) according to manufacturer's instructions. Poly(A)+ or Ribo(−) RNA were then structure probed in the presence and absence of metabolites.

The fmn1Δ mutant was created by replacement of FMN1 in the BY4741 strain with KanMX using homologous recombination¹⁸. RPS31 from C. albicans, with a C-terminal FLAG-tag (GATTACAAGGACGACGATGACAAG), was integrated together with URA3 at the ura3Δ site to generate the RPS31::FLAG knock-in strains. ATP1 from C. albicans, with an N-terminal FLAG-tag, was integrated together with URA3 at the ura3Δ site to generate the ATP1::FLAG knock-in strains.

RNA structure probing. Briefly, 250 ng of Poly(A)+ or Ribo(−) RNA was heated to 90 °C for 2 min and cooled on ice for 2 min before adding 10X RNA structure buffer (500 mM Tris pH 7.4, 1.5 M NaCl, 100 mM MgCl2) and metabolites to the RNA. The RNA pool was slowly brought to 37 °C for 30 min and structure probed using RNase V1 (1:2000 dilution, AM2275 Life Technologies) or S1 nuclease (1:500 dilution, Fermentas) at 37 °C for 15 min. The nuclease reactions were inactivated using phenol chloroform extraction and ethanol precipitated. In-line probing reactions were performed in 50 mM Tris-HCl (pH 8.3), 20 mM MgCl2, and 100 mM KCl at 25 °C for 40 h⁸. The in-line probed RNA was phosphorylated using T4 polynucleotide kinase (PNK) in T4 PNK buffer and 1 mM ATP to capture the cleavage sites.

Library preparation. Structure probed RNA was fragmented at 95 °C for 3.5 min in alkaline hydrolysis buffer (Ambion). As fragmentation results in 5′OH, and is hence ligation incompatible, it does not interfere with the downstream library preparation process. Fragmented RNA was then purified using RiboMinus concentration module (Life Technologies), using the modified protocol for RNAs that are <200 bases. The RNA was eluted in 12 µl of nuclease free water and concentrated to 2 µl using a vacuum centrifuge. The RNA was then ligated to 5′ adapter from NEBNext Multiplex Small RNA Library Prep Set for Illumina using T4 RNA ligase 1 (T4 RNA ligase buffer, 1 mM ATP, 10% PEG, 10% DMSO) at 16 °C overnight. The 5′ adapter ligated RNAs were then purified through a 6% TBE urea PAGE gel and size selected for 50–200 bases. The RNA was then ligated to 3′ adapter, reverse transcribed, and PCR amplified using the NEBNext Multiplex Small RNA Library Prep Set (New England Biolabs) for Illumina using manufacturer's instructions. Eighteen cycles of PCR amplification were typically performed for each library.

RNA footprinting analysis. Cleavage and modification sites along structure probed RNA were identified using primer extension. Briefly, a primer located ~30–50 bases downstream of the structure probed region was labeled with γ-32P ATP using T4 PNK. The labeled primer was then purified using a 15% TBE urea PAGE gel. The labeled primer was incubated with the RNA at 65 °C for 5 min, followed by 35 °C for 5 min, and then cooled at 4 °C. To detect the structure probed sites, we added 3 µl of enzyme mix (4:1:1 of first-strand buffer: DTT: NTP) to the reaction, incubated at 52 °C for 1 min, and Superscript III was added to the reaction at 52 °C for 10 min. To generate a sequencing ladder for the RNA, we added 1 µl of ddNTP (5 mM) to the reaction after the enzyme mix, and before adding Superscript III. 4 M sodium hydroxide was added to the reaction to denature the RNA before the samples were loaded onto a 7 M TBE-Urea PAGE sequencing gel. Gel images were quantified using the software Semi-automated footprinting analysis (SAFA)¹⁹.

RNA structure models. RNA secondary structure predictions were generated using RNA footprinting data with RNase V1, S1 nuclease, and NAI as constraints, using the program RNAfold²⁰ with default parameters.

qPCR and Western blotting for wildtype and codon-optimized RPS31. The RPS31::FLAG and ATP1::FLAG strains were inoculated from single colonies into 2 mL SC-ura media and grown overnight at 30 °C, with shaking. Strains with the fmn1Δ mutation were supplemented with 10 mM FMN and 200 µg per mL G418 in the cultures. The overnight cultures (1:100 dilution of OD600 2.0) were used to seed 50 mL YPD cultures supplemented with 2.5, 5.0 or 10.0 mM FMN. Cells were harvested after 4–6 h of growth at 30 °C, with shaking (when OD600 reaches 0.4). The cultures were split for RNA extraction and Western blotting, pelleted, and washed once with PBS. The resultant cell pellets were frozen and stored at −80 °C.

RNA extraction and qPCR. RNA was extracted from frozen yeast pellets using the hot acidic phenol method and treated with TURBO™ DNase (ThermoFisher Scientific)¹⁷. We made cDNA using the Transcriptor First Strand cDNA Synthesis Kit (Roche) and qPCR was performed using SYBR Green Master Mix (Roche) on a Light Cycler 96 instrument (Roche). Primers used are listed below. The RPS31 and ATP1 primers are specific for the knock-in C. albicans RPS31 and ATP1, and do not amplify the endogenous S. cerevisiae RPS31 and ATP1. Normalized fold changes were calculated by normalizing against actin (ACT1) and the respective strain cultured at 10 mM FMN.
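The double normalization described above (against ACT1, then against the same strain at 10 mM FMN) can be sketched with the common 2^(−ΔΔCt) calculation. The text does not state which formula was used, so the ΔΔCt form, the assumption of perfect amplification efficiency, the function name, and all Ct values below are illustrative assumptions.

```python
def normalized_fold_change(ct_target: float, ct_act1: float,
                           ct_target_cal: float, ct_act1_cal: float) -> float:
    """Fold change of a target transcript by the 2^(-ddCt) method.

    ct_target, ct_act1:         Ct values in the sample of interest
    ct_target_cal, ct_act1_cal: Ct values in the calibrator (here, the same
                                strain cultured at 10 mM FMN)
    Assumes 100% amplification efficiency (a doubling per cycle).
    """
    d_ct_sample = ct_target - ct_act1              # normalize to ACT1
    d_ct_calibrator = ct_target_cal - ct_act1_cal  # same, in the calibrator
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)


# Hypothetical Ct values: target crosses threshold one cycle earlier than in
# the calibrator while ACT1 is unchanged, so the fold change is 2.
print(normalized_fold_change(21.0, 18.0, 22.0, 18.0))
```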


Primer set                    Forward                                Reverse
RPS31 set 1                   (HZ_pri072) TCCACCAGACCAACAAAGATTG     (HZ_pri073) ACCAAGTGCAAGGTGGATTC
RPS31 set 2                   (HZ_pri023) GAATCCACCTTGCACTTGGTC      (HZ_pri024) GCCAACTTGTGTTTTCTGTGC
RPS31 set 3                   (HZ_pri015) GCACAGAAAACACAAGTTGGC      (HZ_pri016) CCATGAAAATACCGGCACCAC
ACT1                          (HZ_pri051) ATGGATTCTGAGGTTGCTGC       (HZ_pri052) TGGTCTACCGACGATAGATGG
RPS31_codon-optimized set 1   (HZ_pri118) TGGAGGTCGAGTCATCAGATAC     (HZ_pri119) CGTCTTATTTTCGCAGGGAAGC
RPS31_codon-optimized set 2   (HZ_pri122) TTTCGCAGGGAAGCAGTTAG       (HZ_pri123) TTTTTCCCCCACCCCTTAACC
RPS31_codon-optimized set 3   (HZ_pri120) TAGAGAGGTTGAGGCGTGAATG     (HZ_pri121) GGCACTTACCGCAATATTGACG
ATP1 set 1                    (HZ_pri064) TTACGTACTGCTGCTCGTACAG     (HZ_pri065) GGCAGAGGCAAATCTTTGAGC
ATP1 set 2                    (HZ_pri058) AAGTCGGGGTTGTGTTGTTC       (HZ_pri059) TTCTGGACCAATTGGAACGG
ATP1 set 3                    (HZ_pri062) TCGCTGGTGTTAACGGTTTC       (HZ_pri063) CACCCTTGGTCTTAATAGCATCC

Western blotting. The frozen cell pellet was resuspended in 80 µL of lysis buffer (50 mM Tris pH 7.4, 4% SDS) with proteinase inhibitor (one tablet of cOmplete™, Mini, EDTA-free [Roche] in 2.5 mL of lysis buffer). An equal volume of glass beads (425–600 µm, Sigma-Aldrich) was added and cells were lysed in a Mini-Beadbeater-96 (Biospec Products) for two cycles of 15 s with a 2 min interval. Cell lysates were centrifuged and supernatants were run on 4–20% Mini-Protean TGX Stain-Free protein gels (Bio-Rad). To determine the relative levels of total protein (loading), the gels were first imaged on a ChemiDoc MP System (Bio-Rad) using the stain-free technology²¹. Following wet transfer and blocking with 5% milk in PBST, RPS31::FLAG and ATP1::FLAG were detected using mouse anti-FLAG M2 primary antibody (1:4000 in PBS at 4 °C overnight, Sigma-Aldrich, F1804) and sheep anti-mouse IgG, HRP-linked secondary antibodies (1:20000 in 1% milk at 25 °C for an hour, GE Healthcare, NA931). All images were taken using the ChemiDoc MP System (Bio-Rad) and analyzed with ImageJ²². Corresponding uncropped images of blots (in main figures) can be found in supplementary figures.

Read mapping. Short reads from PARCEL libraries (Illumina HiSeq, 50 bp, single-end) in P. aeruginosa PAO1, B. subtilis 168, and E. coli K12 were aligned to their corresponding reference genomes downloaded from NCBI using the short-read aligner bowtie2 (parameters: -k 1 --local)²³. In the case of S. cerevisiae S288C and C. albicans SC5314, we extracted the UTR annotation from Bruno et al.²⁴,²⁵ and integrated them into their corresponding transcriptomes before alignment by bowtie2. In both cases, only uniquely mapped reads were used for subsequent analysis.
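For orientation, the alignment step can be written as a bowtie2 argument list built in Python. Only the `-k 1 --local` parameters come from the text; the index prefix and file names are hypothetical placeholders, and the sketch builds the command without running it or reproducing the filter for uniquely mapped reads.

```python
def bowtie2_command(index_prefix: str, fastq: str, sam_out: str) -> list:
    """Build a bowtie2 argument list for single-end reads.

    -k 1 and --local are the parameters quoted in the text; -x, -U, and -S are
    bowtie2's options for the index prefix, unpaired reads, and SAM output.
    """
    return ["bowtie2", "-k", "1", "--local",
            "-x", index_prefix, "-U", fastq, "-S", sam_out]


# Hypothetical paths, for illustration only.
cmd = bowtie2_command("ref/bsub168", "parcel_fmn_rep1.fastq", "parcel_fmn_rep1.sam")
print(" ".join(cmd))
# To execute (requires bowtie2 on PATH): subprocess.run(cmd, check=True)
```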

Identification of RNA aptamers. For each position along the genome or transcriptome, we counted the number of reads whose first mapped base was one base downstream of the inspected position. Higher counts suggest greater accessibility to V1 nuclease, and are more likely to be associated with a double-stranded conformation. In all expressed transcripts, positions with zero count could either be associated with a single-stranded conformation, or come from a heavily folded region that is inaccessible to V1 nuclease.
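The counting rule above can be sketched in a few lines of Python: a read whose first mapped base is at position i + 1 contributes one count to position i. The function name, the 0-based coordinates, and the restriction to forward-strand reads are simplifying assumptions made for illustration.

```python
def v1_cut_counts(read_starts, length):
    """Per-position V1 counts from read start coordinates.

    read_starts: 0-based coordinate of the first mapped base of each read
                 (assumption: all reads on the forward strand)
    length:      length of the reference sequence
    counts[i] is the number of reads starting at i + 1, i.e. one base
    downstream of the inspected position i.
    """
    counts = [0] * length
    for start in read_starts:
        pos = start - 1  # the inspected position is one base upstream
        if 0 <= pos < length:
            counts[pos] += 1
    return counts


# Two reads start at base 1, one at base 3; a read starting at base 0 has no
# upstream position and is skipped.
print(v1_cut_counts([1, 1, 3, 0], length=4))
```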

Since RNA structural changes should typically span across multiple bases, we looked for regions that exhibit differential V1 counts to increase the sensitivity/specificity of detecting RNA aptamers. We first evaluated the significance of differential V1 counts at each nucleotide position using the edgeR package²⁶, where we compared samples treated by one specific metabolite (e.g., TPP) to samples from all other conditions. We focused on positions that were generally accessible to the V1 nuclease by applying a minimum abundance threshold (average counts per sample per position, a > 1) and then computed a score $s_i$ for each passed position $i$ based on edgeR-generated p-values ($\mathrm{pval}_i$): $s_i = \ln(0.1) - \ln(\mathrm{pval}_i)$ (in effect, giving negative scores for p-values > 0.1).
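A minimal Python sketch of this per-position score shows how the sign flips at p = 0.1. The function name and the small floor applied to p-values (to avoid taking the logarithm of zero) are assumptions added for illustration.

```python
import math


def position_score(pval: float, floor: float = 1e-300) -> float:
    """Score s_i = ln(0.1) - ln(pval_i) for one nucleotide position.

    The floor guards against pval == 0 and is an assumption of this sketch.
    """
    return math.log(0.1) - math.log(max(pval, floor))


for p in (0.001, 0.1, 1.0):
    print(p, round(position_score(p), 4))
```

Positions with p-values below 0.1 score positive, a p-value of exactly 0.1 scores zero, and larger p-values score negative, bottoming out at ln(0.1) for p = 1.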

The higher the score, $s_i$, the more likely that differential V1 cutting was observed at that specific position. Accordingly, positions that failed to pass the abundance threshold were assigned a penalty score of −10. We then looked for segments of contiguous positions (e.g., a segment from position $m$ to $n$) with the highest aggregate score $S = \sum_{i=m}^{n} s_i$, by applying the Kadane algorithm (maximal subarray problem). We then determined the significance of these high-scoring segments based on Karlin–Altschul statistics, similar to the approach used in BLAST²⁷.
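The segment search can be illustrated with a textbook implementation of Kadane's algorithm in Python. It returns only the single best segment for one score track, whereas the analysis in the text reports multiple high-scoring segments; the function name and the example scores are hypothetical.

```python
def best_segment(scores):
    """Kadane's algorithm: maximum-sum contiguous segment.

    Returns (m, n, S) with 0-based inclusive indices m..n and aggregate
    score S. Requires a non-empty list of scores.
    """
    best_m, best_n, best_sum = 0, 0, scores[0]
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the segment here.
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_m, best_n, best_sum = cur_start, i, cur_sum
    return best_m, best_n, best_sum


# Hypothetical track: the -10 penalty (a position below the abundance
# threshold) splits the track, so the best segment is positions 1..2.
print(best_segment([-1.0, 2.0, 3.0, -10.0, 4.0]))
```

The large penalty makes it unlikely that a segment extends across positions that failed the abundance filter, which is the role the −10 score plays in the text.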

As described by Karlin and Altschul²⁷, the expected value (E-value) of high-scoring segments with an aggregate score of at least $S$ is given by the formula:

$$E_v = K e^{-\lambda S}. \qquad (1)$$

Therefore, we examined the extreme value distribution of the aggregate score $S$ to estimate the two parameters required, i.e., $K$ and $\lambda$. Specifically, $\lambda$ can be calculated from the formula $\sum_i p_i e^{\lambda s_i} = 1$, where $p_i$ is the corresponding probability of the score $s_i$. As $s_i = \ln(0.1) - \ln(\mathrm{pval}_i)$ and $\mathrm{pval}_i$ approximately follows a uniform distribution ($U(0,1)$) due to the assumption that the majority of nucleotide positions do not undergo any structural changes, the equation $\sum_i p_i e^{\lambda s_i} = 1$ can be translated to:

$$\int_{\mathrm{pval}_i = 0}^{1} e^{\lambda\left(\ln(0.1) - \ln(\mathrm{pval}_i)\right)} \, d\,\mathrm{pval}_i = 1.$$

This can be solved to $\frac{0.1^{\lambda}}{1-\lambda} = 1$, and $\lambda = 0.862871$. The parameter $K$ is bounded between $K^- = C^* \frac{\lambda\delta}{e^{\lambda\delta} - 1}$ and $K^+ = C^* \frac{\lambda\delta}{1 - e^{-\lambda\delta}}$. Since $\delta$ is the smallest span of $s_i$, $K$ is bounded between $\lim_{\delta \to 0} K^-$ and $\lim_{\delta \to 0} K^+$, and $C^*$ is defined by the formula:

$$C^* = \frac{\exp\left(-2 \sum_{k=1}^{\infty} \frac{1}{k}\left(E\left[e^{\lambda S_k};\, S_k < 0\right] + \mathrm{Prob}(S_k \ge 0)\right)\right)}{\lambda\, E\left[S_1 e^{\lambda S_1}\right]}.$$

Here, $S_k$ is the random variable representing the sum of $k$ independently chosen $s_i$, i.e., $S_k = \sum s_i = k \ln(0.1) + \sum\left(-\ln \mathrm{pval}_i\right)$. Since $\mathrm{pval}_i \sim U(0,1)$, $\sum\left(-\ln \mathrm{pval}_i\right)$ approximately follows the gamma distribution with shape $k$ and scale 1. Let $X_k = \sum_{i=1}^{k}\left(-\ln \mathrm{pval}_i\right)$; it can then be derived that:

$$E\left[e^{\lambda S_k};\, S_k < 0\right] = 0.1^{\lambda k} \int_{X_k = 0}^{-k \ln 0.1} \frac{X_k^{\,k-1}\, e^{(\lambda - 1) X_k}}{(k-1)!} \, dX_k,$$

$$\mathrm{Prob}(S_k \ge 0) = 1 - \int_{t=0}^{-k \ln 0.1} \frac{t^{\,k-1} e^{-t}}{(k-1)!} \, dt,$$

and

$$\lambda\, E\left[S_1 e^{\lambda S_1}\right] = \lambda\, 0.1^{\lambda}\, \frac{1 - (\lambda - 1)\ln 0.1}{(\lambda - 1)^2}.$$

Taken together, $C^*$ can be solved to take the value of 0.0809635, and the lower and upper bounds of $K$, $K^-$ and $K^+$, both equal $C^*$, i.e., $K = 0.0809635$. We then calculated $E_v$ for high-scoring segments by applying equation (1). Segments that pass the $E_v$ threshold of 10 were considered as candidate RNA aptamers that undergo metabolite-responsive conformational changes. Under all conditions, we report candidate regions that have positions with absolute fold-change f > 2, relative abundance greater than median + standard deviation for the transcript (abundance-filter; to avoid segments with lower accessibility) and low Bonferroni-corrected p-value (<10; to avoid segments with no strongly changing position).
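The two constants can be checked numerically with a short, standard-library Python sketch. It solves 0.1^λ = 1 − λ by bisection and evaluates C* from the series, using the closed form of the gamma distribution function for integer shape. The function names, the truncation of the infinite sum at 100 terms, and the bisection bracket are choices made for this illustration and are not taken from the original analysis.

```python
import math


def solve_lambda(lo: float = 0.1, hi: float = 0.999, iters: int = 100) -> float:
    """Non-trivial root of 0.1**lam - (1 - lam) = 0 on (0, 1), by bisection."""
    f = lambda lam: 0.1 ** lam - (1.0 - lam)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_cdf_int(k: int, x: float) -> float:
    """P(X <= x) for a gamma variable with integer shape k and scale 1."""
    term = math.exp(-x)
    total = term
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - total


def c_star(lam: float, k_max: int = 100) -> float:
    """C* from the series in the text, truncated at k_max terms (assumption)."""
    ln10 = math.log(10.0)  # equals -ln(0.1)
    series = 0.0
    for k in range(1, k_max + 1):
        # E[exp(lam*S_k); S_k < 0], after substituting u = (1 - lam) * X_k
        tilted = (0.1 ** lam / (1.0 - lam)) ** k * gamma_cdf_int(k, (1.0 - lam) * k * ln10)
        tail = 1.0 - gamma_cdf_int(k, k * ln10)  # Prob(S_k >= 0)
        series += (tilted + tail) / k
    denom = lam * 0.1 ** lam * (1.0 - (lam - 1.0) * math.log(0.1)) / (lam - 1.0) ** 2
    return math.exp(-2.0 * series) / denom


def e_value(S: float, lam: float, K: float) -> float:
    """Equation (1): expected number of segments scoring at least S."""
    return K * math.exp(-lam * S)


lam = solve_lambda()
K = c_star(lam)
print(lam, K, e_value(10.0, lam, K))
```

Under these assumptions the computed values agree closely with the λ and K quoted in the text.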

Distribution of RNA aptamers across operons and transcripts. We evaluated the distribution of RNA aptamers across operons in bacteria, and along transcripts in fungi (including a 500 bp window on either side when UTRs were not specified). We plotted the histogram of all RNA aptamer positions, with operons in bacteria and coding regions in fungi being scaled to 1 kbp. There are cases where the same position can be considered as belonging to multiple classes and in such cases, we preferentially assigned positions to the 5′UTR, then to the operon or CDS, and lastly to the 3′UTR.
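The priority rule and the rescaling to 1 kbp can be sketched as two small Python helpers. The class labels, function names, and example coordinates are hypothetical, and the linear rescaling is one plausible reading of "scaled to 1 kbp".

```python
# Priority order stated in the text: 5'UTR first, then operon/CDS, then 3'UTR.
PRIORITY = ("5'UTR", "CDS", "3'UTR")


def assign_class(candidates):
    """Pick one class for a position that overlaps several annotated classes."""
    for label in PRIORITY:
        if label in candidates:
            return label
    raise ValueError("no recognized class among candidates")


def scale_position(pos: int, start: int, end: int, scaled_len: int = 1000) -> float:
    """Map a position inside [start, end) onto a region rescaled to 1 kbp."""
    return (pos - start) * scaled_len / (end - start)


print(assign_class({"3'UTR", "CDS"}))   # overlapping annotations
print(scale_position(150, 100, 300))    # hypothetical 200 bp region
```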

Sequence conservation of RNA aptamers. We estimated the sequence conservation of identified RNA aptamers by measuring the nucleotide substitution rate of these regions relative to their blastn-identified orthologous sequences. If the identified aptamer regions were shorter than 200 bases in length, we extended them on both sides to a maximum of 200 bases. As highly divergent and highly similar sequences would result in an unreliable estimate of nucleotide substitution rate²⁸, we chose fairly divergent, and yet not too divergent species (median Ks ranges from 0.1 to 0.4) for this analysis. The orthologous riboswitches, 3′UTR, 5′UTR, and protein coding regions were identified in other species using blastn, for non-coding sequences, or genBlastG²⁹, for coding sequences. To identify orthologous noncoding sequences in other organisms with high sensitivity, we changed the default blastn parameters as follows: "-e 1e-5 -word_size 7 -gapopen 2 -gapextend 1"³⁰. We aligned the noncoding sequences using MUSCLE³¹, and the coding sequences using MACSE³², to construct the multiple sequence alignment. The nucleotide substitution rates of riboswitches, 3′UTR and 5′UTR were calculated using Kimura's 2-parameter method³³. The synonymous and non-synonymous substitution rates were calculated using KaKs_Calculator³⁴ with the LPB method.
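Kimura's 2-parameter distance has a closed form, K = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the fractions of aligned sites showing transitions and transversions. The Python sketch below applies it to a pairwise alignment; the function name, the skipping of gapped or ambiguous columns, and the toy sequences are assumptions for illustration and do not reproduce the software pipeline used in the study.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter substitutions per site for two aligned sequences.

    Columns containing anything other than A, C, G, T (e.g. gaps) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    # Transition: purine<->purine or pyrimidine<->pyrimidine change.
    transitions = sum(1 for a, b in pairs
                      if a != b and (a in PURINES) == (b in PURINES))
    transversions = sum(1 for a, b in pairs
                        if a != b and (a in PURINES) != (b in PURINES))
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


# Toy alignment with a single A->G transition in 10 sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 6))
```

The logarithms become undefined when the sequences are very divergent, which is one reason the choice of comparison species matters for this kind of estimate.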

Calculating the degree of pairedness for RNA aptamers and controls. We searched for orthologous sequences of RNA aptamers identified in B. subtilis, P. aeruginosa, and C. albicans across the Bacillus, Pseudomonas, and Candida genera using blastn (with parameters: "-e 1e-5 -word_size 7 -gapopen 2 -gapextend 1"). The species that were used in each genus are: B. subtilis XF-1, B. subtilis BSn5, B. malacitensis CR-95, B. natto BEST195, B. licheniformis DSM13, B. subtilis subsp. spizizenii W23, and B. cereus ATCC-14579 for Bacillus; P. aeruginosa PAO1, P. aeruginosa PA7, P. mendocina 1267_PMEN, P. knackmussii B13, P. oryzihabitans USDA-ARS-USMARC-56511, P. pseudoalcaligenes KF707, P. stutzeri A1501, P. stutzeri A1501, P. mendocina ymp, P. entomophila L48, P. putida F1, P. fluorescens SBW25, and P. syringae pv. tomato str. DC3000 for Pseudomonas; C. albicans SC5314, C. albicans WO-1, and C. dubliniensis for Candida. We then built multiple species alignments for each RNA aptamer region using MUSCLE³¹. We used the program Alifoldz¹² to calculate the energy and structural stability of the consensus structure. For each RNA aptamer alignment, a shuffled alignment was generated as a control using the "shuffle.pl" script from the Alifoldz package.

Data availability. All relevant data are available from the authors upon request.

Data has been deposited under GEO accession number GSE106133.

Received: 21 December 2017 Accepted: 5 March 2018

References

1. Gasch, A. P. et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257 (2000).
2. Breaker, R. R. Riboswitches and the RNA world. Cold Spring Harb. Perspect. Biol. 4, a003566 (2012).
3. Conrad, R. C., Baskerville, S. & Ellington, A. D. In vitro selection methodologies to probe RNA function and structure. Mol. Divers. 1, 69–78 (1995).
4. Barrick, J. E. & Breaker, R. R. The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biol. 8, R239 (2007).
5. Wan, Y., Kertesz, M., Spitale, R. C., Segal, E. & Chang, H. Y. Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655 (2011).
6. Dar, D. et al. Term-seq reveals abundant ribo-regulation of antibiotics resistance in bacteria. Science 352, aad9822 (2016).
7. Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010).
8. Regulski, E. E. & Breaker, R. R. In-line probing analysis of riboswitches. Methods Mol. Biol. 419, 53–67 (2008).
9. Winkler, W., Nahvi, A. & Breaker, R. R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419, 952–956 (2002).
10. Winkler, W. C., Nahvi, A., Sudarsan, N., Barrick, J. E. & Breaker, R. R. An mRNA structure that controls gene expression by binding S-adenosylmethionine. Nat. Struct. Biol. 10, 701–707 (2003).
11. Novichkov, P. S. et al. RegPrecise 3.0–a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 14, 745-2164-14-745 (2013).
12. Washietl, S. & Hofacker, I. L. Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J. Mol. Biol. 342, 19–30 (2004).
13. Li, S. & Breaker, R. R. Eukaryotic TPP riboswitch regulation of alternative splicing involving long-distance base pairing. Nucleic Acids Res. 41, 3022–3031 (2013).
14. Wachter, A. et al. Riboswitch control of gene expression in plants by splicing and alternative 3′ end processing of mRNAs. Plant Cell 19, 3437–3450 (2007).
15. Echt, S. et al. Potential anti-infective targets in pathogenic yeasts: structure and properties of 3,4-dihydroxy-2-butanone 4-phosphate synthase of Candida albicans. J. Mol. Biol. 341, 1085–1096 (2004).
16. Winkler, W. C., Cohen-Chalamish, S. & Breaker, R. R. An mRNA structure that controls gene expression by binding FMN. Proc. Natl Acad. Sci. USA 99, 15908–15913 (2002).
17. Collart, M. A. & Oliviero, S. Preparation of yeast RNA. Curr. Protoc. Mol. Biol. Chapter 13, Unit13.12 (2001).
18. Guldener, U., Heck, S., Fielder, T., Beinhauer, J. & Hegemann, J. H. A new efficient gene disruption cassette for repeated use in budding yeast. Nucleic Acids Res. 24, 2519–2524 (1996).
19. Das, R., Laederach, A., Pearlman, S. M., Herschlag, D. & Altman, R. B. SAFA: semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments. RNA 11, 344–354 (2005).
20. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26-7188-6-26 (2011).
21. Posch, A., Kohn, J., Oh, K., Hammond, M. & Liu, N. V3 stain-free workflow for a practical, convenient, and reliable total protein loading control in western blotting. J. Vis. Exp. 82, 50948 (2013).
22. Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
23. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
24. Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
25. Bruno, V. M. et al. Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq. Genome Res. 20, 1451–1458 (2010).
26. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
27. Karlin, S. & Altschul, S. F. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl Acad. Sci. USA 87, 2264–2268 (1990).
28. Tzeng, Y. H., Pan, R. & Li, W. H. Comparison of three methods for estimating rates of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 21, 2290–2298 (2004).
29. She, R. et al. genBlastG: using BLAST searches to build homologous gene models. Bioinformatics 27, 2141–2143 (2011).
30. Lu, J. et al. The birth and death of microRNA genes in Drosophila. Nat. Genet. 40, 351–355 (2008).
31. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
32. Ranwez, V., Harispe, S., Delsuc, F. & Douzery, E. J. MACSE: multiple alignment of coding sequences accounting for frameshifts and stop codons. PLoS One 6, e22594 (2011).
33. Kimura, M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16, 111–120 (1980).
34. Zhang, Z. et al. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genom. Proteom. Bioinform. 4, 259–263 (2006).

Acknowledgements

We thank members of the Wan lab, Nagarajan lab, S. Chen, W.F. Burkholder, A. Sim, H. H. Ng, and B. Lim for discussions. B. subtilis 168 was obtained from the Bacillus Genetic Stock Center. N. Nagarajan is supported by funding from A*STAR. Y. Wan is supported by funding from A*STAR, Society in Science - Branco Weiss Fellowship, and EMBO Young Investigatorship.

Author contributions

Y.W. conceived the project, developed the protocol, and designed the experiments. N.N. and M.S. designed the computational pipeline. Y.W., S.T., X.N.L., T.T.S., G.S.Z., J.L., Y.W., L.H.Z., E.L.A., H.Z. and H.Z. planned and performed all the experiments. N.N., S.L.Y., and M.S. planned and conducted the data analysis. A.L. helped with the sequencing. Y.W. organized and wrote the paper with contributions from H.Z., S.L.Y., N.N. and all other authors.

Additional information

Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467-018-03675-1.

Competing interests: The authors declare no competing interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Identification and characterization of RNA binding sites for (p)ppGpp using RNA-DRaCALA

Article

Full-text available

Jan 2023
NUCLEIC ACIDS RES

Ligand-binding RNAs (RNA aptamers) are widespread in the three domains of life, serving as sensors of metabolites and other small molecules. When aptamers are embedded within RNA transcripts as components of riboswitches, they can regulate gene expression upon binding their ligands. Previous methods for biochemical validation of computationally predicted aptamers are not well-suited for rapid screening of large numbers of RNA aptamers. Therefore, we utilized DRaCALA (Differential Radial Capillary Action of Ligand Assay), a technique designed originally to study protein-ligand interactions, to examine RNA-ligand binding, permitting rapid screening of dozens of RNA aptamer candidates concurrently. Using this method, which we call RNA-DRaCALA, we screened 30 ykkC family subtype 2a RNA aptamers that were computationally predicted to bind (p)ppGpp. Most of the aptamers bound both ppGpp and pppGpp, but some strongly favored only ppGpp or pppGpp, and some bound neither. Expansion of the number of biochemically verified sites allowed construction of more accurate secondary structure models and prediction of key features in the aptamers that distinguish a ppGpp from a pppGpp binding site. To demonstrate that the method works with other ligands, we also used RNA DRaCALA to analyze aptamer binding by thiamine pyrophosphate.

Differential Analysis of RNA Structure Probing Experiments at Nucleotide Resolution: Uncovering Regulatory Functions of RNA Structure

Preprint

Full-text available

Aug 2021

RNAs perform their function by forming specific structures, which can change across cellular conditions. Structure probing experiments combined with next generation sequencing technology have enabled transcriptome-wide analysis of RNA secondary structure in various cellular conditions. Differential analysis of structure probing data in different conditions can reveal the RNA structurally variable regions (SVRs), which is important for understanding RNA functions. Here, we propose DiffScan, a computational framework for normalization and differential analysis of structure probing data in high resolution. DiffScan preprocesses structure probing datasets to remove systematic bias, and then scans the transcripts to identify SVRs and adaptively determines their lengths and locations. The proposed approach is compatible with most structure probing platforms (e.g., icSHAPE, DMS-seq). When evaluated with simulated and benchmark datasets, DiffScan identifies structurally variable regions at nucleotide resolution, with substantial improvement in accuracy compared with existing SVR detection methods. Moreover, the improvement is robust when tested in multiple structure probing platforms. Application of DiffScan in a dataset of multi-subcellular RNA structurome identified multiple regions that form different structures in nucleus and cytoplasm, linking RNA structural variation to regulation of mRNAs encoding mitochondria-associated proteins. This work provides an effective tool for differential analysis of RNA secondary structure, reinforcing the power of structure probing experiments in deciphering the dynamic RNA structurome.

Differential analysis of RNA structure probing experiments at nucleotide resolution: uncovering regulatory functions of RNA structure

Article

Full-text available

Jul 2022

RNAs perform their function by forming specific structures, which can change across cellular conditions. Structure probing experiments combined with next generation sequencing technology have enabled transcriptome-wide analysis of RNA secondary structure in various cellular conditions. Differential analysis of structure probing data in different conditions can reveal the RNA structurally variable regions (SVRs), which is important for understanding RNA functions. Here, we propose DiffScan, a computational framework for normalization and differential analysis of structure probing data in high resolution. DiffScan preprocesses structure probing datasets to remove systematic bias, and then scans the transcripts to identify SVRs and adaptively determines their lengths and locations. The proposed approach is compatible with most structure probing platforms (e.g., icSHAPE, DMS-seq). When evaluated with simulated and benchmark datasets, DiffScan identifies structurally variable regions at nucleotide resolution, with substantial improvement in accuracy compared with existing SVR detection methods. Moreover, the improvement is robust when tested in multiple structure probing platforms. Application of DiffScan in a dataset of multi-subcellular RNA structurome and a subsequent motif enrichment analysis suggest potential links of RNA structural variation and mRNA abundance, possibly mediated by RNA binding proteins such as the serine/arginine rich splicing factors. This work provides an effective tool for differential analysis of RNA secondary structure, reinforcing the power of structure probing experiments in deciphering the dynamic RNA structurome. The authors present DiffScan, an advanced tool for normalization and differential analysis of RNA structure probing experiments, combining their power in deciphering the dynamic RNA structurome and facilitating the discovery of RNA regulatory functions.

Discovering riboswitches: the past and the future

Article

Full-text available

Sep 2022
TRENDS BIOCHEM SCI

Riboswitches are structured noncoding RNA domains used by many bacteria to monitor the concentrations of target ligands and regulate gene expression accordingly. In the past 20 years over 55 distinct classes of natural riboswitches have been discovered that selectively sense small molecules or elemental ions, and thousands more are predicted to exist. Evidence suggests that some riboswitches might be direct descendants of the RNA-based sensors and switches that were likely present in ancient organisms before the evolutionary emergence of proteins. We provide an overview of the current state of riboswitch research, focusing primarily on the discovery of riboswitches, and speculate on the major challenges facing researchers in the field.

Shifted Reverse PAGE: a novel approach based on structure switching for the discovery of riboswitches and aptamers

Preprint

Full-text available

Jul 2022

Riboswitches are regulatory sequences composed of an aptamer domain capable of binding a ligand and an expression platform that allows the control of the downstream gene expression based on a conformational change. Current bioinformatic methods for their discovery have various limitations. To circumvent this, we developed an experimental technique to discover new riboswitches called SR-PAGE (Shifted Reverse Polyacrylamide Gel Electrophoresis). A ligand-based regulatory molecule is recognized by exploiting the conformational change of the sequence following binding with the ligand within a native polyacrylamide gel. Known riboswitches were tested with their corresponding ligands to validate our method. SR-PAGE was imbricated within an SELEX to enrich switching RNAs from a TPP riboswitch-based degenerate library to change its binding preference from TPP to thiamine. The SR-PAGE technique allows performing a large screening for riboswitches, search in several organisms and test more than one ligand simultaneously.

The discovery of novel noncoding RNAs in 50 bacterial genomes

Article

Apr 2024
NUCLEIC ACIDS RES

Structured noncoding RNAs (ncRNAs) contribute to many important cellular processes involving chemical catalysis, molecular recognition and gene regulation. Few ncRNA classes are broadly distributed among organisms from all three domains of life, but the list of rarer classes that exhibit surprisingly diverse functions is growing. We previously developed a computational pipeline that enables the near-comprehensive identification of structured ncRNAs expressed from individual bacterial genomes. The regions between protein coding genes are first sorted based on length and the fraction of guanosine and cytidine nucleotides. Long, GC-rich intergenic regions are then examined for sequence and structural similarity to other bacterial genomes. Herein, we describe the implementation of this pipeline on 50 bacterial genomes from varied phyla. More than 4700 candidate intergenic regions with the desired characteristics were identified, which yielded 44 novel riboswitch candidates and numerous other putative ncRNA motifs. Although experimental validation studies have yet to be conducted, this rate of riboswitch candidate discovery is consistent with predictions that many hundreds of novel riboswitch classes remain to be discovered among the bacterial species whose genomes have already been sequenced. Thus, many thousands of additional novel ncRNA classes likely remain to be discovered in the bacterial domain of life.

Real-time label-free detection of dynamic aptamer-small molecule interactions using a nanopore nucleic acid conformational sensor

Article

Full-text available

Jun 2023
P NATL ACAD SCI USA

Nucleic acids can undergo conformational changes upon binding small molecules. These conformational changes can be exploited to develop new therapeutic strategies through control of gene expression or triggering of cellular responses and can also be used to develop sensors for small molecules such as neurotransmitters. Many analytical approaches can detect dynamic conformational change of nucleic acids, but they need labeling, are expensive, and have limited time resolution. The nanopore approach can provide a conformational snapshot for each nucleic acid molecule detected, but has not been reported to detect dynamic nucleic acid conformational change in response to small -molecule binding. Here we demonstrate a modular, label-free, nucleic acid-docked nanopore capable of revealing time-resolved, small molecule-induced, single nucleic acid molecule conformational transitions with millisecond resolution. By using the dopamine-, serotonin-, and theophylline-binding aptamers as testbeds, we found that these nucleic acids scaffolds can be noncovalently docked inside the MspA protein pore by a cluster of site-specific charged residues. This docking mechanism enables the ion current through the pore to characteristically vary as the aptamer undergoes conformational changes, resulting in a sequence of current fluctuations that report binding and release of single ligand molecules from the aptamer. This nanopore tool can quantify specific ligands such as neurotransmitters, elucidate nucleic acid-ligand interactions, and pinpoint the nucleic acid motifs for ligand binding, showing the potential for small molecule biosensing, drug discovery assayed via RNA and DNA conformational changes, and the design of artificial riboswitch effectors in synthetic biology.

Real-Time Assessment of Intracellular Metabolites in Single Cells through RNA-Based Sensors

Article

Full-text available

Apr 2023

Alvaro D. Ortega

Quantification of the concentration of particular cellular metabolites reports on the actual utilization of metabolic pathways in physiological and pathological conditions. Metabolite concentration also constitutes the readout for screening cell factories in metabolic engineering. However, there are no direct approaches that allow for real-time assessment of the levels of intracellular metabolites in single cells. In recent years, the modular architecture of natural bacterial RNA riboswitches has inspired the design of genetically encoded synthetic RNA devices that convert the intracellular concentration of a metabolite into a quantitative fluorescent signal. These so-called RNA-based sensors are composed of a metabolite-binding RNA aptamer as the sensor domain, connected through an actuator segment to a signal-generating reporter domain. However, at present, the variety of available RNA-based sensors for intracellular metabolites is still very limited. Here, we go through natural mechanisms for metabolite sensing and regulation in cells across all kingdoms, focusing on those mediated by riboswitches. We review the design principles underlying currently developed RNA-based sensors and discuss the challenges that hindered the development of novel sensors and recent strategies to address them. We finish by introducing the current and potential applicability of synthetic RNA-based sensors for intracellular metabolites.

Chameleon-like microbes promote microecological differentiation of Daqu

Article

Sep 2022
FOOD MICROBIOL

Chameleon-like microbes in the fermentation community are an internal factor that facilitate the transformation of the community to the corresponding homeostasis states under specific environmental conditions. High temperature daqu can form three typical microecologies during the preparation process, making it an ideal system for studying chameleon-like microbes. This study integrated multi-omic methods such as metaproteomics, and determined that Neurospora crassa, Aspergillus nidulans, Bacillus subtilis and Oceanobacillus iheyensis were chameleon-like microbes that regulated the metabolic differences of five-member heterocyclic amino acids in daqu, resulting in microecological differentiation. Synthetic microbial consortia consisting of the four chameleon-like microbes with (T6) and without (T4) the dominant functional bacteria Saccharopolyspora erythraea and Virgibacillus haloimitrificans were fermented under simulated in situ conditions. The community constructed by microorganisms with greater functional diversity (T6) was more robust, and its metabolome was more similar to the in situ system. When exposed to environmental disturbances, the functional diversity helped to maintain the community stability by increasing the dissimilarity of chameleon-like microbes in the community and forming different homeostasis.

Genetically encoded biosensors for microbial synthetic biology: From conceptual frameworks to practical applications

Article

Dec 2022
BIOTECHNOL ADV

Genetically encoded biosensors are the vital components of synthetic biology and metabolic engineering, as they are regarded as powerful devices for the dynamic control of genotype metabolism and evolution/screening of desirable phenotypes. This review summarized the recent advances in the construction and applications of different genetically encoded biosensors, including fluorescent protein-based biosensors, nucleic acid-based biosensors, allosteric transcription factor-based biosensors and two-component system-based biosensors. First, the construction frameworks of these biosensors were outlined. Then, the recent progress of biosensor applications in creating versatile microbial cell factories for the bioproduction of high-value chemicals was summarized. Finally, the challenges and prospects for constructing robust and sophisticated biosensors were discussed. This review provided theoretical guidance for constructing genetically encoded biosensors to create desirable microbial cell factories for sustainable bioproduction.

V3 Stain-free Workflow for a Practical, Convenient, and Reliable Total Protein Loading Control in Western Blotting

Article

Full-text available

Dec 2013
JoVE

The western blot is a very useful and widely adopted lab technique, but its execution is challenging. The workflow is often characterized as a "black box" because an experimentalist does not know if it has been performed successfully until the last of several steps. Moreover, the quality of western blot data is sometimes challenged due to a lack of effective quality control tools in place throughout the western blotting process. Here we describe the V3 western workflow, which applies stain-free technology to address the major concerns associated with the traditional western blot protocol. This workflow allows researchers: 1) to run a gel in about 20-30 min; 2) to visualize sample separation quality within 5 min after the gel run; 3) to transfer proteins in 3-10 min; 4) to verify transfer efficiency quantitatively; and most importantly 5) to validate changes in the level of the protein of interest using total protein loading control. This novel approach eliminates the need of stripping and reprobing the blot for housekeeping proteins such as β-actin, β-tubulin, GAPDH, etc. The V3 stain-free workflow makes the western blot process faster, transparent, more quantitative and reliable.

RegPrecise 3.0 – A resource for genome-scale exploration of transcriptional regulation in bacteria

Article

Full-text available

Nov 2013
BMC GENOMICS

Background: Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. The comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). Description: RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include near 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons are available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by an conservative propagation of the reference regulons to closely related genomes. 
Conclusions: RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.

Eukaryotic TPP riboswitch regulation of alternative splicing involving long-distance base pairing

Article

Full-text available

Feb 2013
NUCLEIC ACIDS RES

Thiamin pyrophosphate (TPP) riboswitches are found in organisms from all three domains of life. Examples in bacteria commonly repress gene expression by terminating transcription or by blocking ribosome binding, whereas most eukaryotic TPP riboswitches are predicted to regulate gene expression by modulating RNA splicing. Given the widespread distribution of eukaryotic TPP riboswitches and the diversity of their locations in precursor messenger RNAs (pre-mRNAs), we sought to examine the mechanism of alternative splicing regulation by a fungal TPP riboswitch from Neurospora crassa, which is mostly located in a large intron separating protein-coding exons. Our data reveal that this riboswitch uses a long-distance (∼530-nt separation) base-pairing interaction to regulate alternative splicing. Specifically, a portion of the TPP-binding aptamer can form a base-paired structure with a conserved sequence element (α) located near a 5′ splice site, which greatly increases use of this 5′ splice site and promotes gene expression. Comparative sequence analyses indicate that many fungal species carry a TPP riboswitch with similar intron architecture, and therefore the homologous genes in these fungi are likely to use the same mechanism. Our findings expand the scope of genetic control mechanisms relying on long-range RNA interactions to include riboswitches.

Fiji: An Open-Source Platform for Biological-Image Analysis

Article

Full-text available

Jun 2012
Br J Pharmacol

Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.

Langmead B, Salzberg SL.. Fast gapped-read alignment with Bowtie 2. Nat Methods 9: 357-359

Article

Full-text available

Mar 2012
Br J Pharmacol

As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

ViennaRNA package 2.0

Article

Full-text available

Nov 2011
ALGORITHM MOL BIOL

Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA.

MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

Article

Full-text available

Sep 2011
PLOS ONE

Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translation before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source java file executable with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.

Term-seq reveals abundant ribo-regulation of antibiotics resistance in bacteria

Article

Apr 2016

How bacteria switch between tracks. Bacterial riboswitches prevent the formation of full-length messenger RNA, and hence proteins, via transcriptional termination in response to metabolites. However, identifying riboswitches within the genome has previously required comparative analysis, which may miss species- and environmentally specific responses. Dar et al. developed a method called term-seq to document all riboswitches in a bacterial genome, as well as their metabolite counterparts (see the Perspective by Sommer and Suess). The method revealed a role for pathogenic bacterial riboswitches in antibiotic resistance. Thus, transcription may be one way pathogens fend off antibiotic attack. Science, this issue p. 10.1126/science.aad9822; see also p. 144

Comparison of three methods for estimating rates of synonymous and nonsynonymous nucleotide substitutions

Article

Dec 2004

Three frequently used methods for estimating the synonymous and nonsynonymous substitution rates (Ks and Ka) were evaluated and compared for their accuracies; these methods are denoted by LWL85, LPB93, and GY94, respectively. For this purpose, we used a codon-evolution model to obtain the expected Ka and Ks values for the above three methods and compared the values with those obtained by the three methods. We also proposed some modifications of LWL85 and LPB93 to increase their accuracies. Our computer simulations under the codon-evolution model showed that for sequences less than or equal to 300 codons, the performance of GY94 may not be reliable. For longer sequences, GY94 is more accurate for estimating the Ka/Ks ratio than the modified LPB93 and LWL85 in the majority of the cases studied. This is particularly so when k, the transition/transversion (mutation) rate ratio, is greater than or equal to 3. However, when k is approximately 2 and when the sequence divergence is relatively large, the modified LWL85 performed better than GY94 and the modified LPB93. The inferiority of LPB93 to LWL85 is surprising because LPB93 was intended to improve LWL85. Also, it has been thought that the codon-based method of GY94 is better than the heuristic method of LWL85, but our simulation results showed that in many cases, the opposite was true, even though our simulation was based on the codon-evolution model.

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

Article

Jan 2010

Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org). Contact: mrobinson@wehi.edu.au
