12984–12994 Nucleic Acids Research, 2014, Vol. 42, No. 21 Published online 31 October 2014
doi: 10.1093/nar/gku1035
Computational analysis of amino acids and their sidechain analogs in crowded solutions of RNA nucleobases with implications for the mRNA–protein complementarity hypothesis
Matea Hajnic, Juan Iregui Osorio and Bojan Zagrovic*
Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Vienna 1030, Austria
Received July 17, 2014; Revised September 29, 2014; Accepted October 11, 2014
ABSTRACT
Many critical processes in the cell involve direct binding between RNAs and proteins, making it imperative to fully understand the physicochemical principles behind such interactions at the atomistic level. Here, we use molecular dynamics simulations and 15 μs of sampling to study the behavior of amino acids and amino acid sidechain analogs in high-concentration aqueous solutions of standard RNA nucleobases. Structural and energetic analysis of the simulated systems allows us to derive interaction propensity scales for different amino acid/nucleobase combinations. The derived scales closely match and greatly extend the available experimental data, providing a comprehensive foundation for studying RNA–protein interactions in different contexts. By using these scales, we demonstrate a statistically significant connection between the nucleobase composition of human mRNA coding sequences and the nucleobase interaction propensities of their cognate protein sequences. For example, pyrimidine density profiles of mRNAs match uracil-propensity profiles of their cognate proteins with a median Pearson correlation coefficient of R = −0.70. Our results provide support for the recently proposed hypotheses that mRNAs and their cognate proteins may be physicochemically complementary to each other and bind, especially if unstructured, with the complementarity level being negatively influenced by mRNA adenine content. Finally, we utilize the derived scales to refine the complementarity hypothesis and closely examine its physicochemical underpinnings.
INTRODUCTION
From transcriptional and translational regulation to RNA processing and decay to protein localization, many key processes in the cell depend directly on RNA–protein interactions (1–4). What is more, the list of systems that involve RNA–protein interactions continues to expand dramatically. Recently, for example, high-throughput efforts aimed at capturing the mRNA–protein interactome identified a large number of novel RNA-binding proteins (5,6). Out of a total of approximately 800 mRNA-binding proteins detected in these studies using covalent UV-crosslinking methods, about 25% were found not to contain any known RNA-binding domains, while an even greater number lacked clear functional characterization. Despite the challenges ahead, one may expect that integrative efforts involving biochemical, structural and computational techniques will soon catalog most, if not all, biologically relevant RNA–protein interactions. On the other hand, our understanding of the basic physicochemical principles behind such interactions still remains incomplete. Most importantly, only a few experimental studies have directly explored interactions between individual nucleobases and amino acids in different environments (7–10). While global and local structural contexts do play important roles in defining the properties of RNA–protein binding interfaces, it is reasonable to expect that binding specificity in general also critically depends on the preferences of individual nucleobases and amino acids for each other.
In this reductionist framework, the properties of binding sites are at least in part a consequence of binding preferences that are intrinsic to individual nucleobases and amino acids. Motivated by this, Akinrimisi et al. (9) and Thomas et al. (10) measured the affinities of several naturally occurring amino acids for a set of nitrogenous bases and nucleosides using spectroscopic methods, but those experiments were never performed systematically for all possible combinations. Furthermore, Woese et al. used chromatographic measurements to define a scale of the propensity of amino acids to interact with pyridines as pyrimidine mimetics, which they termed 'polar requirement' (PR) (7,8). Finally, several authors have studied interactions between different nucleotides and polyamino acids, typically focusing on polylysine or polyarginine peptides (11–13). Despite the clear importance of such experimental studies, however, we find it remarkable that they have not been repeated or extended since the 1960s and 1970s, when they were first performed. On the other hand, sizable progress has been made using computational and theoretical approaches (14–28). Most significantly, the available structures of nucleic acid–protein complexes have been statistically analyzed to explore the more general physicochemical principles behind nucleobase/amino acid interactions (14–18) and, in particular, to derive binding preference scales, also known as knowledge-based potentials (19–23). Moreover, a computational equivalent of the PR scale was derived using molecular dynamics (MD) simulations, providing a microscopic picture behind the interaction propensities exhibited by individual amino acids (24). Finally, quantum-mechanical calculations have been used to characterize the interactions between a select subset of bases and amino acids (25–28). Overall, all of these studies suggest that the preferences of individual nucleobases and amino acids for each other in water may be highly differentiated, but a large-scale analysis of this effect has never been systematically performed.

* To whom correspondence should be addressed. Tel: +43 1 4277 52271; Fax: +43 1 4277 9522; Email: bojan.zagrovic@univie.ac.at
Present address: Juan Osorio Iregui, Institute for Theoretical Physics, ETH Zürich 8093, Switzerland.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
An important context in which nucleobase/amino acid interactions may be relevant concerns a foundational question in molecular biology, that of the origin of the universal genetic code (29–31). In particular, the stereochemical hypothesis proposes that the code evolved as a consequence of direct interactions between codons and the amino acids they code for (7,8,32–35). An early formulation of the stereochemical hypothesis was put forth by Carl Woese et al. based on the above-mentioned PR scale, i.e. the propensity of amino acids to interact with pyrimidine mimetics (7,8). Recently, we have demonstrated that pyrimidine density profiles of mRNA coding sequences closely mirror the PR-weighted profiles of their cognate protein sequences (36). In other words, pyrimidine-rich mRNA regions tend to code for cognate protein regions that exhibit a high propensity to interact with pyrimidine mimetics, and vice versa. Moreover, we have used knowledge-based potentials derived from experimental structures of RNA–protein complexes not only to confirm these findings in the case of pyrimidines, but also to extend them to purines (23,37). By providing quantitative evidence for an early, more qualitative proposal by Kyrpides and Ouzounis (38,39), these results allowed us to raise the stereochemical hypothesis to the level of a general relationship between the sequence composition of mRNAs and their cognate proteins, as well as to hypothesize that mRNAs could bind their cognate proteins in a complementary fashion, especially if unstructured (23,36,37). We argued that such binding interactions were an important driving force behind the establishment of the universal genetic code, but also that they may play a critical, yet still not fully characterized, role in the present-day cell as well (23,36,37). Intriguingly, in our analysis of knowledge-based preference scales, we found that guanine and adenine exhibit opposite amino acid binding preferences, resulting in a curious asymmetry: while guanine-binding propensity profiles of proteins closely match purine density profiles of their cognate mRNAs, adenine-binding protein sequence profiles more closely mirror pyrimidine density profiles of mRNAs (23,37). This hints at additional complexities behind the putative cognate complementarity and suggests that there may have been at least two major phases in the development of the genetic code with opposite requirements for complementary matching (37).
In order to better understand the physicochemical principles underlying RNA–protein interactions and to shed more light on the mRNA/protein complementarity hypothesis, here we systematically explore the behavior of individual amino acids and amino acid sidechain analogs in high-concentration aqueous solutions of different RNA nucleobases using classical MD simulations (Figure 1A). For our simulations, we employ the GROMOS 53A6 force field (40), which was parameterized to accurately capture solvation free energies of amino acid sidechain analogs in water and cyclohexane. These hydrophobicity-related properties, in turn, are considered to be an important factor in defining amino acid/nucleobase interaction propensities. Empowered by the spatial and temporal resolution provided by MD simulations, we present an atomistic view of how individual amino acids or sidechain analogs interact with RNA nucleobases in water and define interaction preferences between them using different structural and energetic criteria. Such interaction propensity scales create a rigorous, reductionist foundation for the analysis of RNA–protein interactions in general, which we further employ here for a critical examination and refinement of the mRNA–protein complementarity hypothesis.
MATERIALS AND METHODS
The 20 natural amino acids and 18 of their sidechain analogs (all except for Gly and Pro) were simulated in the presence of a single type of common RNA nucleobase in aqueous solution: adenine (ADE), cytosine (CYT), guanine (GUA) or uracil (URA). In all simulations, a single amino acid or sidechain analog (corresponding to an amino acid residue with a hydrogen atom added to Cβ) was centered in a cubic box of initial size 4 × 4 × 4 nm³, with nucleobases and water molecules placed in random orientations around it so as to achieve a water molar fraction of 0.86. Considering the low water solubility of naturally occurring nucleobases, this molar fraction of water was chosen as a compromise between maximizing the probability of detecting interactions between amino acids (or their sidechain analogs) and nucleobases on the one hand and minimizing nucleobase solubility issues on the other. In total, there were approximately 1250 molecules in each system: one amino acid or sidechain analog, 170 nucleobase molecules and the rest water molecules (see Supplementary Table S1 for details). All amino acids were simulated in their zwitterionic form. In the case of charged amino acids or sidechain analogs, one randomly chosen water molecule was replaced by a counterion (Na⁺ or Cl⁻) in order to obtain an electrically neutral system.
Figure 1. (A) A typical snapshot from the simulation with a single amino acid sidechain analog (Leu) in CYT/water mixture. (B) CYT/CYT, CYT/water and water/water radial distribution functions in Leu simulations (top) with the corresponding Kirkwood–Buff integrals (bottom). The final value for the integrals was taken as the average over the approximately constant window denoted by the arrow.
All simulations were carried out using the Gromacs 4.5.1 simulation package (41), the united-atom GROMOS 53A6 force field (40) and the SPC/E water model (42) with a 2 fs integration step. Parameters for the nucleobases were obtained from those corresponding to full nucleotides in the GROMOS 53A6 force field while ensuring charge neutrality. Long-range electrostatic interactions were treated using Particle Mesh Ewald (PME) summation with a grid spacing of 0.12 nm and an interpolation order of 4. The cut-off for short-range Coulombic and van der Waals interactions was set to 0.9 nm. The temperature and pressure in all simulations were kept at 300 K and 1 bar using the V-rescale thermostat ($\tau_T = 0.1$ ps) (43) and the Parrinello-Rahman barostat ($\tau_p = 2$ ps and compressibility $= 4.5 \times 10^{-5}$ bar$^{-1}$) (44), respectively. After minimization using the steepest descent algorithm in water (10 000–25 000 steps), the systems were first equilibrated in the NVT ensemble for 800 ps and then subjected to 400 ps of equilibration in the NPT ensemble, in both phases with position restraints placed on the amino acid or sidechain analog. All production runs, each 100 ns long, were performed in the NPT ensemble, for a total of 15.2 μs of simulated time over all systems.
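As a cross-check of the settings and the sampling arithmetic above, the production parameters can be collected as GROMACS .mdp options. This is a minimal sketch, not the authors' input file: the option names follow the GROMACS .mdp documentation, the values are those stated in the text, and `tc-grps = System` (a single temperature-coupling group) is an added assumption. The same bookkeeping shows where 15.2 μs comes from: (20 amino acids + 18 sidechain analogs) × 4 nucleobases × 100 ns.

```python
# Sketch only: production settings stated in the text, written as GROMACS .mdp
# options. tc-grps = System is an ASSUMPTION (not stated in the text).
DT_PS = 0.002           # 2 fs integration step, in ps
PRODUCTION_NS = 100.0   # length of each production run

production_mdp = {
    "integrator": "md",
    "dt": DT_PS,
    "nsteps": round(PRODUCTION_NS * 1000 / DT_PS),  # 100 ns / 2 fs
    "coulombtype": "PME",
    "fourierspacing": 0.12,      # nm
    "pme_order": 4,
    "rcoulomb": 0.9,             # nm
    "rvdw": 0.9,                 # nm
    "tcoupl": "V-rescale",
    "tc-grps": "System",         # assumption: one coupling group
    "tau_t": 0.1,                # ps
    "ref_t": 300,                # K
    "pcoupl": "Parrinello-Rahman",
    "tau_p": 2.0,                # ps
    "compressibility": 4.5e-5,   # bar^-1
    "ref_p": 1.0,                # bar
}


def to_mdp(options):
    """Render a dict of options as 'key = value' lines in .mdp style."""
    return "\n".join(f"{key:<16} = {value}" for key, value in options.items()) + "\n"


# Total sampling: 38 solutes (20 amino acids + 18 sidechain analogs),
# each simulated with 4 nucleobase types, 100 ns per run.
n_systems = (20 + 18) * 4
total_us = n_systems * PRODUCTION_NS / 1000.0
```

Calling `print(to_mdp(production_mdp))` emits one `key = value` line per option; `total_us` evaluates to 15.2, matching the total quoted above.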
In order to test whether systems with naturally occurring nucleobases are microscopically stable, we analyzed the values of the first derivative of the natural logarithm of the nucleobase activity, $\ln a_N$, with respect to the natural logarithm of the nucleobase molar fraction, $\ln X_N$, in the simulated systems (45–47). The quantity
$$\frac{\partial \ln a_N}{\partial \ln X_N} = \frac{1}{1 + \rho_N X_N \left(G_{NN} + G_{WW} - 2G_{NW}\right)} \tag{1}$$
where $G_{NN}$, $G_{NW}$ and $G_{WW}$ denote Kirkwood–Buff integrals derived from nucleobase/nucleobase, nucleobase/water and water/water radial distribution functions (RDFs), respectively, must be positive for a system to be microscopically stable. Here, $\rho_N$ stands for the nucleobase number density. The Kirkwood–Buff integrals were calculated using the following formula (45–47):
$$G_{SS}(r) = 4\pi \int_0^{r} r'^2 \left[g_{SS}(r') - 1\right] dr' \tag{2}$$
where $g_{SS}$ denotes the nucleobase/nucleobase ($g_{NN}$), nucleobase/water ($g_{NW}$) or water/water ($g_{WW}$) RDF, from which the corresponding Kirkwood–Buff integrals ($G_{NN}$, $G_{NW}$, $G_{WW}$) were derived. As anchor points for the RDFs, we used the centers of mass of nucleobases and water molecules. Finally, as representative examples for each individual nucleobase type, we performed the above analysis on simulated systems with Leu residues; the results are reported in Figure 1B and Supplementary Table S2.
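The Kirkwood–Buff estimate can be sketched in a few lines of plain Python. The sketch below, under stated assumptions, integrates Equation (2) with the trapezoidal rule on a tabulated RDF, averages the running integral from 1.5 nm outward (the plateau-averaging step illustrated in Figure 1B), and evaluates Equation (1) exactly as written above. The function names and the toy RDF are hypothetical; units are assumed to be nm for r, nm³ for G and nm⁻³ for $\rho_N$, so that the product in Equation (1) is dimensionless.

```python
import math


def running_kb_integral(r, g):
    """Running Kirkwood-Buff integral of Equation (2),
    G(R) = 4*pi * integral_0^R r'^2 [g(r') - 1] dr', by the trapezoidal rule.
    Assumes the RDF grid r (nm) starts at r = 0 or very close to it."""
    if len(r) != len(g) or len(r) < 2:
        raise ValueError("r and g must have the same length (at least 2)")
    running = [0.0]
    for i in range(1, len(r)):
        f_prev = r[i - 1] ** 2 * (g[i - 1] - 1.0)
        f_curr = r[i] ** 2 * (g[i] - 1.0)
        running.append(running[-1] + 0.5 * (f_prev + f_curr) * (r[i] - r[i - 1]))
    return [4.0 * math.pi * value for value in running]


def plateau_average(r, G, r_min=1.5):
    """Average the running integral G(R) over all grid points with R >= r_min (nm)."""
    window = [G_i for r_i, G_i in zip(r, G) if r_i >= r_min]
    if not window:
        raise ValueError("no grid points at or beyond r_min")
    return sum(window) / len(window)


def stability_factor(rho_N, X_N, G_NN, G_WW, G_NW):
    """Equation (1) as written in the text; a value > 0 indicates
    microscopic stability. Units assumed: rho_N in nm^-3, G in nm^3."""
    return 1.0 / (1.0 + rho_N * X_N * (G_NN + G_WW - 2.0 * G_NW))


# Toy example (hypothetical RDF): g = 0 inside a 0.3 nm core and 1 outside it,
# for which the exact integral is G = -(4/3) * pi * 0.3**3 (about -0.113 nm^3).
r_grid = [i * 0.001 for i in range(2001)]  # 0 to 2 nm
g_core = [0.0 if x < 0.3 else 1.0 for x in r_grid]
G_core = plateau_average(r_grid, running_kb_integral(r_grid, g_core))
```

Averaging over a window rather than reading G at a single cut-off damps the statistical noise in the tail of the RDF, which is why the integral is taken over the approximately constant region.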
To quantify the interaction propensity of amino acids or sidechain analogs for different nucleobases, we analyzed their behavior in the simulated mixtures both structurally and energetically. For simplicity, we describe the procedure for sidechain analogs only, but analogous calculations were also performed for all amino acid-containing systems. For structural analysis, RDFs were calculated using the centers of mass of sidechain analogs, nucleobases and water molecules as anchor points. For energetic analysis, we calculated differences between the total force-field potential energies corresponding to sidechain analog–nucleobase ($E_{X\text{-}N}$) and sidechain analog–water ($E_{X\text{-}W}$) interactions:
$$\Delta E_X^{N-W} = E_{X\text{-}N} - E_{X\text{-}W} \quad [\mathrm{kJ/mol}] \tag{3}$$
Moreover, the obtained potential energy differences ($\Delta E_X^{N-W}$) were further subtracted between systems with different nucleobases ($N_1$, $N_2$) in order to obtain relative interaction propensities, i.e. the preferences of each sidechain analog for a specific nucleobase with respect to other nucleobases ($\Delta\Delta E_X^{N_1-N_2}$):
$$\Delta\Delta E_X^{N_1-N_2} = \Delta E_X^{N_1-W} - \Delta E_X^{N_2-W} \quad [\mathrm{kJ/mol}] \tag{4}$$
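Equations (3) and (4) amount to two subtractions per residue. The sketch below builds a relative scale from per-system energies; the dictionary layout and the function names are assumptions, and the energy values are invented for illustration, not results of this study. With this sign convention, a more negative $\Delta\Delta E_X^{N_1-N_2}$ indicates a stronger relative preference of residue X for $N_1$ over $N_2$.

```python
def delta_e(e_x_n, e_x_w):
    """Equation (3): Delta E_X^{N-W} = E_{X-N} - E_{X-W}, in kJ/mol."""
    return e_x_n - e_x_w


def relative_scale(energies, n1, n2):
    """Equation (4) for every residue X present in `energies`:
    DeltaDelta E_X^{N1-N2} = Delta E_X^{N1-W} - Delta E_X^{N2-W}.
    `energies` maps (residue, nucleobase) -> (E_X-N, E_X-W) in kJ/mol."""
    residues = sorted({residue for residue, _ in energies})
    return {
        res: delta_e(*energies[(res, n1)]) - delta_e(*energies[(res, n2)])
        for res in residues
    }


# Invented numbers for illustration only (NOT results of this study).
toy_energies = {
    ("Lys", "GUA"): (-150.0, -120.0),
    ("Lys", "CYT"): (-60.0, -110.0),
    ("Trp", "GUA"): (-80.0, -40.0),
    ("Trp", "CYT"): (-75.0, -45.0),
}
gua_vs_cyt = relative_scale(toy_energies, "GUA", "CYT")
```

Here `gua_vs_cyt` is `{"Lys": -80.0, "Trp": -10.0}`: in these made-up numbers, Lys would show the stronger GUA-over-CYT preference.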
In a related study (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data), we simulated amino acids in the presence of only one type of nitrogenous base (unsubstituted pyrimidines or purines) in water solution, as here, but also in mixed systems with both nitrogenous bases (unsubstituted pyrimidines and purines) present at the same time. The relative interaction propensities of amino acids derived from the mixed systems, where both bases were present at the same time, and those derived from differences between individual systems correlate with each other with a Pearson correlation coefficient of R = 0.98. This suggests that one can obtain relative interaction propensities of amino acids for different nucleobases from individual interaction propensities derived from systems with only one nucleobase type present.
To be able to compare systems with slightly different molar compositions, the calculated potential energies between sidechain analogs and water molecules were rescaled before obtaining the interaction propensity scale, so that all systems correspond to a water molar fraction of exactly 0.86. When rescaling, we implicitly assumed that the few additional water molecules behave on average in the same way as the rest of the water molecules in the system and contribute to the overall sidechain analog–water potential energy in proportion to their number. Analogous structural and energetic analysis was performed for systems containing amino acids, with interaction energies evaluated over all amino acid atoms. Amino acid and sidechain analog interaction propensity scales are given in Supplementary Table S7 in units of kJ/mol. Note, however, that the exact energetic values in our scales depend strongly on the particular features of the simulated systems (such as the molar fraction of water or nucleobases) and should therefore primarily be considered and analyzed in a relative sense.
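The water-energy rescaling reduces to a single proportionality factor. This is a minimal sketch under an explicit assumption not stated in the text: the water molar fraction is taken as $N_W/(N_W + N_N)$, ignoring the single solute molecule and any counterion. With 170 nucleobases this gives a target of about 1044 water molecules (0.86 × 170 / 0.14), i.e. roughly 1215 molecules in total, close to the approximately 1250 molecules per system reported above. The function names are hypothetical.

```python
def target_water_count(n_base, x_water=0.86):
    """Number of waters giving molar fraction x_water.
    ASSUMPTION: X_W = N_W / (N_W + N_N); the solute and counterion are ignored."""
    return x_water * n_base / (1.0 - x_water)


def rescale_water_energy(e_x_w, n_water, n_base, x_water=0.86):
    """Scale E_{X-W} (kJ/mol) in proportion to the number of water molecules,
    as if the system contained exactly target_water_count(n_base) waters.
    This encodes the stated assumption that each extra water contributes
    on average like any other water molecule."""
    return e_x_w * target_water_count(n_base, x_water) / n_water


# Example: a hypothetical system with 170 nucleobases and 1050 waters.
example = rescale_water_energy(-500.0, 1050, 170)
```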
The obtained scales were used as described in Hlevnjak et al. (36) to assess the correlation between protein interaction propensities for different nucleobases and the nucleobase content of their cognate mRNAs over the complete Homo sapiens, Escherichia coli and Methanocaldococcus jannaschii proteomes. In the case of sidechain analog interaction propensity scales, glycines and prolines were ignored on the protein side, together with their codons on the mRNA side. The sequence datasets were extracted from the UniProtKB database (April 2013 release) as described previously (36,48). Window-averaged profiles of individual mRNAs and proteins were calculated as reported previously (36): each position in the profile corresponds to the average value of the property in question over a window (21 residues for proteins and 63 bases for mRNAs) centered at that position. As shown before (36), for window sizes anywhere between 10 and 40 residues, the results depend only marginally on window size (variation < 2%).
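A minimal sketch of the profile-matching step follows. It assumes one-letter residue codes, a coding sequence without its stop codon, odd window sizes and full windows only (edge positions are dropped); the function names and the toy scale in the example are hypothetical. The alignment is also our assumption: centering the 63-base window on the middle base of each codon makes it cover exactly the 21 codons of the matching protein window, so both profiles have the same length. Residues missing from the scale (e.g. Gly and Pro for sidechain-analog scales) are removed together with their codons, as described above.

```python
import math

PYRIMIDINES = set("CU")  # RNA pyrimidines; add "T" if sequences are given as DNA


def protein_profile(residues, scale, window=21):
    """Average scale value over full windows centred on each residue (odd window)."""
    values = [scale[aa] for aa in residues]
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


def mrna_pyrimidine_profile(codons, window_codons=21):
    """Pyrimidine fraction over a 3 * window_codons base window (63 bases for
    21 codons) centred on the middle base of each codon; full windows only."""
    flags = [1.0 if base in PYRIMIDINES else 0.0 for base in "".join(codons)]
    half = window_codons // 2
    width = 3 * window_codons
    return [sum(flags[3 * (k - half):3 * (k + half + 1)]) / width
            for k in range(half, len(codons) - half)]


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined for constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matched_profiles(protein, cds, scale, window=21):
    """Pair each residue with its codon, drop residues absent from the scale
    together with their codons, and return (protein_profile, mrna_profile)."""
    if len(cds) != 3 * len(protein):
        raise ValueError("expected exactly 3 bases per residue (no stop codon)")
    codons = [cds[3 * i:3 * i + 3] for i in range(len(protein))]
    kept = [(aa, codon) for aa, codon in zip(protein, codons) if aa in scale]
    residues = [aa for aa, _ in kept]
    kept_codons = [codon for _, codon in kept]
    return (protein_profile(residues, scale, window),
            mrna_pyrimidine_profile(kept_codons, window))
```

For one protein, `pearson_r(*matched_profiles(protein, cds, scale))` gives its profile-matching R; the proteome-level statistic is then the median of these values over all proteins.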
To test the signicance of median values of prole-
matching Pearson R distributions calculated for complete
proteomes, we generated 10
6
random scales and compared
the medians of their prole-matching Pearson R distribu-
tions to the tested ones for each individual proteome. Ran-
dom scales were generated by drawing numbers from a uni-
form distribution between 0 and 1. Finally, the P-values
were calculated as the fraction of random scales whose me-
dians of the prole-matching Pearson R distributions were
greater than or equal to the tested ones in absolute value.
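The randomization test can be sketched as follows; the function names are hypothetical and the per-protein R values would come from the profile-matching step described above. Note that with n random scales the smallest non-zero P-value this procedure can resolve is 1/n, i.e. 10⁻⁶ for 10⁶ scales; running that many scales over a whole proteome in pure Python would be slow, so this is an illustration of the logic, not a production implementation.

```python
import random
import statistics


def random_scale(residues, rng):
    """One random scale: an independent Uniform[0, 1) draw per residue."""
    return {aa: rng.random() for aa in residues}


def proteome_median_r(per_protein_r):
    """Median of the per-protein profile-matching Pearson R values."""
    return statistics.median(per_protein_r)


def empirical_p_value(observed_median, null_medians):
    """Fraction of random-scale medians at least as large in absolute value
    as the observed median profile-matching R."""
    if not null_medians:
        raise ValueError("need at least one random scale")
    hits = sum(1 for m in null_medians if abs(m) >= abs(observed_median))
    return hits / len(null_medians)
```

Comparing absolute values makes the test two-sided, so strongly anti-correlated medians (such as R = −0.70 in the abstract) count as significant just like strongly positive ones.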
RESULTS
Validation and analysis of binding propensity scales
Natural nucleobases have low water solubility (49), ranging from 1.04 g/l for ADE to 8 g/l for CYT, corresponding to base molar fractions of $X_{ADE} = 1 \times 10^{-4}$ and $X_{CYT} = 1 \times 10^{-3}$, respectively. In order to (i) realistically model nucleobase density at typical RNA–protein interfaces and (ii) reach a critical number of nucleobases that would allow us to observe interactions with amino acids or their sidechain analogs on a reasonable timescale, we simulated systems whose nucleobase concentrations were significantly higher than their macroscopic solubility levels (i.e. $X_N = 0.14$). Practically speaking, we simulated the behavior of amino acids and their sidechain analogs in hydrated, dynamic agglomerates of nucleobases, as illustrated in Figure 1A for the Leu sidechain in CYT solution. While such systems in fact better approximate the effective concentration of nucleobases at typical RNA–protein interfaces, it was critical to first assess their thermodynamic stability at the microscopic level.
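The quoted molar fractions follow from the solubilities by unit conversion. The sketch below assumes that 1 L of such a dilute solution contains about 1000 g of water and uses standard molar masses (adenine 135.13 g/mol, cytosine 111.10 g/mol, water 18.015 g/mol); the function name is hypothetical. It reproduces the orders of magnitude given above (about 1.4 × 10⁻⁴ for ADE and 1.3 × 10⁻³ for CYT), i.e. roughly 100- to 1000-fold below the simulated $X_N = 0.14$.

```python
WATER_MOLAR_MASS = 18.015                       # g/mol
MOLAR_MASS = {"ADE": 135.13, "CYT": 111.10}     # g/mol (adenine, cytosine)
SOLUBILITY_G_PER_L = {"ADE": 1.04, "CYT": 8.0}  # values quoted in the text


def solubility_to_mole_fraction(grams_per_litre, molar_mass, water_grams=1000.0):
    """Mole fraction of a solute from its solubility.
    ASSUMPTION: 1 L of the dilute solution contains about `water_grams` of water."""
    n_base = grams_per_litre / molar_mass
    n_water = water_grams / WATER_MOLAR_MASS
    return n_base / (n_base + n_water)


mole_fractions = {
    base: solubility_to_mole_fraction(SOLUBILITY_G_PER_L[base], MOLAR_MASS[base])
    for base in MOLAR_MASS
}
```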
The stability of a binary, high-concentration mixture of water and nucleobases can be studied by analyzing the first derivative of the natural logarithm of the nucleobase activity with respect to the natural logarithm of the nucleobase mole fraction, $\partial \ln a_N / \partial \ln X_N$ (45–47). This value, which should be positive for a system to be microscopically stable, was calculated from Equation (1), where $G_{NN}$, $G_{NW}$ and $G_{WW}$ denote Kirkwood–Buff integrals derived from nucleobase/nucleobase, nucleobase/water and water/water RDFs, respectively (45–47). A typical set of such RDFs encountered in our simulations is given in Figure 1B (top panel) for the Leu sidechain in CYT solution. Importantly, because Kirkwood–Buff integrals converge poorly, in all cases we estimated $G_{SS}$ as the average of $G_{SS}(r)$ over distances starting from 1.5 nm (Figure 1B, lower panel). Following this procedure, we could indeed show that the stability requirement (i.e. $\partial \ln a_N / \partial \ln X_N > 0$) is fulfilled for all four nucleobase types (Supplementary Table S2). This suggests that although our systems would likely form macroscopic aggregates over long timescales, they are thermodynamically stable on the size- and timescales examined here and can be used as model systems to study the behavior of amino acids and their sidechain analogs in aqueous solutions of nucleobases. The fact that we did not observe the formation of any static precipitates despite the high nucleobase concentrations further corroborates this claim.
We used our simulations to calculate differences between the total force-field potential energies corresponding to amino acid–nucleobase and amino acid–water interactions (and likewise for sidechain analogs). How do these energy-based interaction propensity scales compare with experimental results? The experimental PR scale (8), derived by analyzing the chromatographic mobility of amino acids in water mixtures of substituted pyridines such as dimethylpyridine (DMP), is one of the few examples where interactions between amino acids and nitrogenous bases have been systematically explored in experiment. Specifically, the PR of a given amino acid was defined as the slope of a linear fit between the logarithm of its retention coefficient R and the logarithm of the mole fraction of water in the pyridine–water solvent. In a related study (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data), we performed MD simulations of amino acids and their sidechain analogs in water/DMP mixtures using the same setup as here. The energy-based DMP/amino acid and DMP/sidechain analog interaction propensity scales derived from MD agree closely with the experimental PR scale (8), with Pearson R coefficients of 0.93 and 0.95, respectively, attesting to the general quality of our simulation methodology (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data).
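The PR definition above is a log-log regression slope. The sketch below computes it by ordinary least squares; the function name is hypothetical and the data in the test are synthetic, chosen so that the slope is known exactly. Because both axes are logarithms, the slope does not depend on the logarithm base.

```python
import math


def polar_requirement_slope(water_mole_fractions, retention):
    """Least-squares slope of log(R) against log(X_water)."""
    if len(water_mole_fractions) != len(retention) or len(retention) < 2:
        raise ValueError("need at least two paired measurements")
    xs = [math.log(x) for x in water_mole_fractions]
    ys = [math.log(r) for r in retention]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

For example, synthetic retention values following R = 0.5 · X² give a slope of 2.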
Remarkably, the experimental PR scale (8) also correlates closely with the energy-based amino acid interaction propensity scales derived here for URA (Pearson R = 0.89), ADE (R = 0.84) and CYT (R = 0.77), with a much weaker correlation observed for GUA (R = 0.30) (Figure 2A, inset table, third column). What is more, all of these correlations against the experimental PR scale improve further if one uses sidechain analog scales instead (Figure 2A, inset table, second column), with the URA interaction propensity scale exhibiting the strongest correlation (R = 0.94), followed by ADE (R = 0.93), CYT (R = 0.86) and, finally, GUA (R = 0.58). In Figure 2A, we plot the sidechain analog scale for URA, the nucleobase that is physicochemically and sterically most similar to DMP, against the experimental PR scale (8), and the two exhibit remarkable similarity. Although the experimental PR scale and the computational URA, ADE and CYT scales were derived in very different ways, the close agreement between them can be taken as evidence of the quality of the MD force field and the general computational methodology used. Moreover, such agreement also suggests that, when it comes to capturing nucleobase/amino acid interaction specificity, DMP is actually a good model not only for the naturally occurring pyrimidine bases URA and CYT, but also for the purine ADE.
When we compare our sidechain analog interaction propensities for GUA with the only analogous, extensive scale available from experiment, that of amino acid–guanosine binding constants for eight amino acids (Ser, Thr, Val, Leu, Met, Lys, Phe and Trp) (10), we obtain a Spearman rank-order correlation coefficient of ρ = −0.83 (Figure 2B) and a direct Pearson correlation coefficient of R = 0.79 when the association constants are converted to binding free energies (Figure 2B, inset). Interestingly, in our simulations we not only correctly capture the relative interaction propensities of aromatic sidechain analogs for GUA, but also observe the same propensity trends as in the experiment for relatively similar residues such as Ser and Thr or Val and Leu. What is more, if one excludes the outlier Lys, the magnitude of the rank correlation increases to ρ = −0.96.
On the other hand, the level of correlation drops markedly if one uses the computational scale for amino acids, here also including the value for Gly (ρ = −0.62 and R = 0.46). Finally, the experimentally derived binding free energies of four amino acids for adenosine (Val, Lys, Phe, Trp) (ρ = −0.80 and R = 0.52) and of two for cytidine (Phe, Trp) (10) show the same trend as observed in our sidechain analog interaction propensity scales for the equivalent bases, with similar results for the amino acid scales (ρ = −0.80 and R = 0.50 for the adenosine case). Overall, the above thermodynamic stability analysis combined with the favorable comparison with experiment reassuringly suggests that the essential physical chemistry behind amino acid/nucleobase interactions remains approximately the same even at the relatively high nucleobase concentrations studied here. This furthermore suggests that our simulation-based scales can be used to greatly extend the limited experimental data available and to characterize interactions with nucleobases for all amino acids and sidechain analogs. Interestingly, in many cases our simulations with sidechain analogs match the experimental data obtained with amino acids slightly better than the simulations with the amino acids themselves (Figure 2), a finding for which we do not currently have a full explanation. Part of the reason may be that the GROMOS 53A6 force field was parameterized to match solvation free energies of sidechain analogs in cyclohexane or water, and not those of complete amino acids; the parameters for complete amino acids may therefore be less accurate, leading to a greater discrepancy from experiment in that case. However, as sidechain analogs arguably capture the behavior of protein residues at RNA–protein interfaces better than zwitterionic amino acids do, in the remainder of this text we primarily focus on sidechain analogs, while always giving the results for amino acids as a point of comparison.
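The two correlation measures quoted above behave differently under the conversion from association constants to free energies, which explains their opposite signs. The sketch below (function names hypothetical) converts an association constant K (in M⁻¹) to a standard binding free energy at the 1 M reference concentration, $\Delta G^\circ = -RT \ln(K \cdot 1\,\mathrm{M})$, and computes Spearman's ρ as the Pearson correlation of average ranks. The temperature is an assumed parameter (298.15 K by default), since the experimental temperature is not given here. Because $\Delta G^\circ$ decreases monotonically with K, Spearman's ρ computed against K has the opposite sign of ρ computed against $\Delta G^\circ$.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)


def binding_free_energy(k_assoc, temperature=298.15, c_ref=1.0):
    """Standard binding free energy (kJ/mol) from an association constant in M^-1:
    Delta G = -R T ln(K * c_ref), c_ref = 1 M. The temperature is an ASSUMPTION."""
    return -R_KJ * temperature * math.log(k_assoc * c_ref)


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined for constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation as Pearson R of average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```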
The above energetic analysis is well illustrated by a structural exploration using RDFs. In Figure 3, we show water/sidechain-analog and nucleobase/sidechain-analog RDFs for the most favorable and the least favorable interacting partners of the four RNA nucleobases, as determined by the analysis of interaction energies. URA, CYT and ADE, for example, all exhibit the strongest preference for interacting with Trp relative to all other residues, which is illustrated by the presence of a pronounced first peak in their nucleobase/Trp RDFs. In the case of GUA, on the other hand, the strongest favorable interactions are seen for Lys. The least favorable interactions are invariably seen with the negatively charged Glu and Asp. The presence of a well-defined peak in the CYT/Glu, ADE/Asp and GUA/Glu RDFs, however, suggests that, although unfavorable, some of these interactions do exhibit a sizable level of structural organization. Nonetheless, for all energetically unfavorable interactions, it is clear that the residues in question prefer to interact with and be surrounded by water molecules, as indicated by strong, well-defined first peaks in the respective water/sidechain-analog RDFs.
Figure 2. (A) Correlation between the experimentally derived polar requirement ($\mathrm{PR}_{\mathrm{experiment}}$) scale (8) and the energy-based scale of sidechain analog interaction propensities for URA (in kJ/mol) obtained by simulation. Inset: Pearson correlation coefficients R between all sidechain analog (second column) and amino acid (third column) propensity scales and the PR scale. (B) Rank-order correlation between experimentally measured amino acid–guanosine association constants (10) and the computationally derived sidechain analog interaction energy scale for GUA (in kJ/mol). Inset: correlation between binding free energies (in kJ/mol) at the standard reference concentration of 1 M, as derived from association constants, and the computationally obtained sidechain analog interaction energy scale for GUA (in kJ/mol).
Figure 3. Water/sidechain-analog and nucleobase/sidechain-analog radial distribution functions g(r) for the most favorable (left column) and the least favorable (right column) sidechain analog interacting partners for the four RNA nucleobases, as determined by energy-based interaction propensity scales for: (A) URA, (B) CYT, (C) ADE and (D) GUA.
As discussed above, the GUA-based interaction energy scales differ most from all other scales. Indeed, when correlating the individual scales against each other, we find that the GUA scale deviates most from the other scales, primarily owing to the behavior of the charged sidechain analogs (Figure 4A). Namely, in the GUA/water mixture, Lys and Arg exhibit lower interaction energies with GUA than with water molecules, which is not the case in any other nucleobase/water system except for the CYT/Arg system (Figure 4A). Furthermore, Asp and Glu also exhibit significantly more favorable interaction energies with GUA than in the other energy-based interaction propensity scales (Figure 4A). Although in absolute terms these two anionic sidechains do not interact favorably with GUA (i.e. they exhibit positive energies), the extent of this unfavorable bias is smaller than for any other base (Figure 4A). Similar results are also seen in the simulations with complete amino acids (data not shown).
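The claim that the charged residues drive the deviation of the GUA scale can be checked by recomputing the scale-to-scale correlation with and without them. Below is a minimal sketch, assuming each scale is stored as a mapping from residue name to interaction energy in kJ/mol; the helper names `pearson_r` and `scale_correlation` and the toy numbers are hypothetical and are not the simulated values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scale_correlation(scale_a, scale_b, exclude=()):
    """Correlate two residue -> energy scales over their shared residues, minus `exclude`."""
    residues = sorted((set(scale_a) & set(scale_b)) - set(exclude))
    return pearson_r([scale_a[r] for r in residues], [scale_b[r] for r in residues])


# Hypothetical toy scales (kJ/mol): apolar residues agree, charged residues do not.
SCALE_X = {"Ala": -1.0, "Leu": -2.0, "Phe": -3.0, "Lys": 5.0, "Glu": 6.0}
SCALE_Y = {"Ala": -1.1, "Leu": -2.1, "Phe": -2.9, "Lys": -6.0, "Glu": 2.0}

if __name__ == "__main__":
    print("all residues:   ", round(scale_correlation(SCALE_X, SCALE_Y), 3))
    print("charged removed:", round(scale_correlation(SCALE_X, SCALE_Y, exclude=("Lys", "Glu")), 3))
```

For these toy numbers the correlation is only about 0.13 over all five residues but rises above 0.99 once Lys and Glu are excluded, which is the kind of behavior that identifies a few residues as responsible for a scale's divergence.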
A particularly telling comparison in this regard concerns the behavior of the GUA- and ADE-based scales. If one, for example, examines relative energy-based interaction propensity scales, one observes a remarkable asymmetry in the behavior of GUA and ADE (Supplementary Figure S1). In particular, the relative ADE–CYT scale is strongly inversely correlated with the relative scales involving GUA (GUA–CYT, R = −0.84 and GUA–URA, R = −0.95), with no significant correlations or anti-correlations for the ADE–URA relative scale (Supplementary Figure S1). In Figure 4B, we illustrate this difference for the GUA–CYT and ADE–CYT relative scales, and it is clear that the effect is entirely due to the nature of the interactions of the charged residues with ADE and GUA relative to those with CYT. While GUA strongly prefers to interact with Lys, Arg, Asp and Glu as compared to CYT (for Lys, for example, $E^{\mathrm{sca}}_{\mathrm{GUA-CYT}}$ reaches a magnitude of ca. 100 kJ/mol), ADE almost equally strongly prefers not to interact with these residues, again as compared to CYT (for Lys, $E^{\mathrm{sca}}_{\mathrm{ADE-CYT}}$ likewise reaches a magnitude of ca. 100 kJ/mol) (Figure 4B). This effect clearly demonstrates the paramount importance of specific ring substituents, especially in the case of purine bases, as already observed in our analysis of knowledge-based nucleobase–residue interaction propensity scales (23). Interestingly, while the sidechain analog scale derived here for ADE correlates reasonably well with the equivalent knowledge-based scale (Spearman ρ = 0.57 for the 2+ scale from Polyansky et al. (23)), the correlations for all other scales, including the GUA scale, are significantly weaker (|ρ| < 0.2) (Supplementary Table S3).
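Spearman's ρ, used above to compare the simulation-derived scales with the knowledge-based ones, is Pearson's R computed on ranks, so it only asks whether two scales order the residues the same way. A minimal generic sketch with average ranks for ties follows; it is not necessarily the routine used in the original analysis (`scipy.stats.spearmanr` performs the same computation), and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson R between the two rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Because only ranks enter, a monotone but nonlinear relationship such as `[1, 2, 3, 4]` against `[1, 4, 9, 100]` gives ρ = 1 even though Pearson's R for the raw values is only about 0.82; a threshold such as |ρ| < 0.2 therefore indicates that the two scales order the residues in a nearly unrelated way.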
A similar trend is also seen with the amino acid interaction propensity scales (Supplementary Table S3). On the other hand, the relative GUA scales derived here for sidechain analogs agree somewhat better with those derived in the knowledge-based analysis (23), with, for example, GUA–URA and GUA–CYT correlating with Spearman ρ of 0.40 and 0.38, respectively (Supplementary Table S4). While these correlations between complete scales are relatively weak, it is important to note that the scales agree much better when it comes to the relative placement of the charged residues only, which is ultimately chiefly responsible for the qualitative similarities between the scales, as discussed below.
Figure 4. (A) A direct comparison between energy-based sidechain analog interaction propensity scales for the four nucleobases, with Pearson correlation coefficients given in the graphs. In each graph, the four charged amino acids are labeled in red. (B) Correlation between relative energy-based sidechain analog GUA–CYT and ADE–CYT interaction propensity scales (in kJ/mol) derived from simulations of different systems.
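The relative scales discussed above compare how a residue interacts with two different bases. As a minimal sketch, assuming that a relative X–Y scale is the per-residue difference E_X − E_Y of the two energy-based scales (an assumption for illustration; the exact definition used in the study is not restated in this section), it can be computed as follows. The helpers `relative_scale` and `subtract_reference` are hypothetical. One consequence of this assumed definition is that any per-residue term shared by both scales, such as a common water reference, cancels out.

```python
def relative_scale(scale_x, scale_y):
    """Hypothetical relative X-Y scale: E_X(res) - E_Y(res) for every shared residue (kJ/mol).

    With energies used as propensities (lower = more favorable), a negative value
    means the residue prefers base X over base Y; a positive value means the reverse.
    """
    shared = sorted(set(scale_x) & set(scale_y))
    return {res: scale_x[res] - scale_y[res] for res in shared}


def subtract_reference(scale, reference):
    """Subtract a per-residue reference energy (e.g. a hypothetical water term)."""
    return {res: scale[res] - reference[res] for res in scale}


if __name__ == "__main__":
    # Toy numbers only (kJ/mol), not simulation results.
    gua = {"Lys": -3.0, "Ala": 1.0}
    cyt = {"Lys": 2.0, "Ala": 1.5}
    water = {"Lys": 0.5, "Ala": -0.25}
    print(relative_scale(gua, cyt))
    print(relative_scale(subtract_reference(gua, water), subtract_reference(cyt, water)))
```

Both printed dictionaries are identical, illustrating that a relative scale isolates the base-specific part of a residue's energetics.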
Analysis of the mRNA–cognate protein complementarity hypothesis
Figure 5. Distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases (URA, CYT, ADE and GUA), assessed using computationally derived sidechain analog scales. Inset: median values of distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases, calculated over the entire human proteome (sidechain analogs, ‘sca’, and complete amino acids, ‘aa’).
Figure 6. (A) Distributions of Pearson correlation coefficients between window-averaged PUR content profiles of mRNAs and their cognate proteins’ profiles of relative interaction propensity for different combinations of RNA nucleobases, calculated over the entire human proteome. The propensities were obtained from the energetic analysis of different systems from MD simulations. (B) Typical profiles of mRNA PUR content and protein sequence interaction propensity calculated using the computationally derived sidechain analog GUA–CYT and ADE–CYT relative interaction propensity scales. The two examples were chosen because their Pearson R coefficients correspond to the medians of the respective distributions over the complete human proteome.
We have used the obtained scales to study the relationship between the nucleobase content of mRNA coding sequences and the nucleobase interaction propensities of their cognate protein sequences for the entire H. sapiens, M. jannaschii and E. coli proteomes. We performed this analysis by comparing window-averaged sequence profiles of the two cognate biopolymers as described previously (23,36,37), whereby one obtains a Pearson R for each cognate pair, i.e. a distribution of Pearson Rs over the whole proteome. Note that negative correlations here denote a positive relationship between nucleobase content and interaction propensity, because propensity is defined using an energy scale (the lower the energy, the higher the propensity). Our results show that the PYR density profiles of mRNAs quantitatively match the energy-based URA-, CYT- and ADE-interaction propensity profiles of their cognate protein sequences across the entire human proteome, with no significant correlation being observed for the GUA scale, as demonstrated for H. sapiens in Figure 5. More specifically, the median correlation coefficients for the URA, CYT and ADE sidechain-based scales are 0.68, 0.52 and 0.70, respectively, while for the GUA scale this value drops to 0.11 (Figure 5, inset). For M. jannaschii, the median correlation coefficients of mRNA–protein pairs are comparable to or higher than those observed for the human proteome, while for E. coli the values are slightly lower, but still statistically significant (Supplementary Figures S2A and S3A). Similar values are also seen for the amino acid-based scales. In other words, PYR-rich regions in mRNAs tend to code for regions in their cognate proteins that exhibit more favorable interaction energies with URA, CYT and ADE, relative to water, than do the regions coded by PUR-rich stretches. Interestingly, the correlation coefficients obtained for mRNA density profiles of individual bases are significantly weaker (Supplementary Table S5), as already observed before (23,36).
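The profile comparison described above can be sketched in a few lines. The sketch below assumes, for illustration only, that mRNA PYR content is evaluated per codon (so the nucleotide profile has one value per encoded residue), that both profiles are smoothed with the same centered moving window, and that the default window length of 21 is a placeholder; the exact windowing of refs. (23,36,37) may differ. All function names are hypothetical. Because propensities are energies, a matching pair yields a negative R.

```python
import math
import statistics

PYRIMIDINES = frozenset("CU")


def codon_pyr_content(cds):
    """Fraction of pyrimidines (C, U) in each codon of a coding sequence.

    Accepts DNA or RNA letters (T is treated as U). Remove the stop codon
    beforehand so that the profile has exactly one value per residue.
    """
    seq = cds.upper().replace("T", "U")
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [sum(nt in PYRIMIDINES for nt in seq[i:i + 3]) / 3.0
            for i in range(0, len(seq), 3)]


def window_average(values, window):
    """Centered moving average; windows are truncated at both sequence ends."""
    half = window // 2
    averaged = []
    for i in range(len(values)):
        segment = values[max(0, i - half):i + half + 1]
        averaged.append(sum(segment) / len(segment))
    return averaged


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined for a constant profile)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def profile_match(cds, protein, scale, window=21):
    """Pearson R between the window-averaged codon PYR content and the window-averaged
    per-residue energy profile of the cognate protein (one-letter residue codes)."""
    pyr = codon_pyr_content(cds)
    if len(pyr) != len(protein):
        raise ValueError("protein length must equal the number of codons")
    energies = [scale[aa] for aa in protein]
    return pearson_r(window_average(pyr, window), window_average(energies, window))


def proteome_median(pairs, scale, window=21):
    """Median R over (cds, protein) pairs, the summary statistic of the distribution."""
    return statistics.median([profile_match(c, p, scale, window) for c, p in pairs])
```

As a usage example, with a toy two-residue scale `{"F": -2.0, "K": 2.0}` (hypothetical energies), the sequence `"UUUUUUAAAAAAUUU"` encoding `"FFKKF"` gives R = −1: every pyrimidine-rich UUU codon encodes the residue with the lower (more favorable) energy.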
Table 1. Median values of distributions of Pearson correlation coefficients between window-averaged PUR content profiles of mRNA molecules and their cognate proteins’ profiles of relative interaction propensities for nucleobases, calculated over the entire human proteome. The interaction propensities were obtained from the energetic analysis of both sidechain analog (sca) and amino acid (aa) containing systems.
As mentioned above, the GUA scale does not yield any significant correlation with PYR (or, equivalently, PUR) content on the mRNA side. However, analysis of relative scales reveals that mRNA PUR density profiles closely and quantitatively match the cognate protein profiles capturing the preference of residues to interact with GUA relative to all other nucleobases. For example, protein sequence profiles of relative GUA–CYT binding propensities match their cognate mRNA PUR density profiles with a median Pearson R = −0.69 (P-value = $1 \times 10^{-3}$) over the entire human proteome (Table 1). This means that one half of all mRNA–cognate protein pairs in the human proteome display profile matching that is equal to or better than that of the median representative, Pleckstrin (P08567), shown in Figure 6B. Interestingly, though, a much weaker correlation is seen if, instead of PUR content, which is up to a constant equivalent to PUR–PYR content, one analyzes GUA–CYT content along the mRNA (median R = −0.48). Similar results are also obtained for the relative GUA–URA and GUA–ADE scales (P-values $3 \times 10^{-4}$ and $4 \times 10^{-4}$, respectively) (Table 1). On the other hand, the ADE–CYT scale results in a similar level of matching, but now with mRNA PYR density profiles. In Figure 6B, we illustrate this for the median representative protein Pleckstrin (P08567) and its mRNA. Interestingly, the ADE–URA scale exhibits no significant correlation whatsoever, while the CYT–URA scale exhibits a significant level of matching with mRNA PUR density profiles (P-value = $4 \times 10^{-4}$) (Table 1). Again, we observe the same trend as with the non-relative interaction propensity scales (Supplementary Table S5): the correlation coefficients for mRNA density profiles of individual bases are weaker than those for mRNA PUR density profiles (Supplementary Table S6). The same trends seen for the human proteome extend to other organisms as well (Supplementary Figures S2B and C and S3B and C). As the main reason for the matching detected in our analysis is the genetic code, which is the same for all the organisms studied, it is not surprising that one obtains similar levels of matching regardless of the organism examined. The differences, on the other hand, can be attributed to the exact mRNA and protein composition of the individual proteomes. Finally, qualitatively identical results are obtained for systems with zwitterionic amino acids instead of sidechain analogs (Table 1).
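The remark that PUR content is, up to a constant, equivalent to PUR–PYR content, and the interchangeable use of PUR and PYR profiles in this section, follow from a short derivation. Assume every position in a window is one of A, G, C or U (no ambiguity codes), and let $f_{\mathrm{PUR}}(i)$ and $f_{\mathrm{PYR}}(i)$ denote the window-averaged fractions at position $i$ and $E$ any protein propensity profile:

$$
f_{\mathrm{PUR}}(i) + f_{\mathrm{PYR}}(i) = 1
\quad\Longrightarrow\quad
f_{\mathrm{PUR}}(i) - f_{\mathrm{PYR}}(i) = 2\,f_{\mathrm{PUR}}(i) - 1 = 1 - 2\,f_{\mathrm{PYR}}(i)
$$

$$
R(a\,x + b,\; y) = \operatorname{sgn}(a)\, R(x, y) \quad (a \neq 0)
\quad\Longrightarrow\quad
R(f_{\mathrm{PUR}} - f_{\mathrm{PYR}},\, E) = R(f_{\mathrm{PUR}},\, E) = -R(f_{\mathrm{PYR}},\, E)
$$

A PUR-based R is therefore exactly the negative of the corresponding PYR-based R, so matching can be reported against either profile. By contrast, GUA–CYT content is not an affine function of $f_{\mathrm{PUR}}$ alone (it also depends on how purines split into A and G, and pyrimidines into C and U), so its correlations need not coincide with those obtained for PUR content.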
DISCUSSION
In the present study, we have for the rst time system-
atically analyzed the behavior of amino acids and their
sidechain analogs in high-concentration aqueous solutions
of naturally occurring RNA nucleobases. Our results show
that amino acids and their sidechain analogs display highly
differentiated interaction propensities for different nucle-
obases depending on the ring architecture and, even more
importantly, ring substituents. It is our hope that these
scales will provide: (i) a rigorous, quantitative, physic-
ochemical foundation for rationalizing the specicity in
RNA–protein interactions in different contexts, and (ii) a
powerful tool for sculpting and modifying such specicity
for biomedical and bioengineering purposes.
As discussed above, our simulations were carried out at nucleobase concentrations exceeding the experimentally known solubility limits. However, the strong agreement with extant experimental data, the general absence of stable aggregates and the favorable results of the thermodynamic stability analysis all suggest that the simulated model systems do capture the essential features of amino acid/nucleobase interactions even at high concentrations. Moreover, even if the simulated systems were to drift toward precipitation over time, the partitioning of amino acids and their sidechain analogs between the water- and base-rich fractions occurs much more quickly, allowing one to accurately capture interaction propensities with relatively short simulations. Finally, the number of water molecules in our simulations was chosen such that each base had enough water for one full hydration shell around it. As such, our simulated systems in all likelihood approximate the situation at typical hydrated RNA–protein binding interfaces better than more dilute solutions would.
Our analysis of energy-based interaction preferences was based on the critical assumption that the potential energies between amino acids or sidechain analogs and nucleobases or water accurately capture the free energies of these interactions. In other words, we assumed that it is primarily the enthalpic part of the free energy that is responsible for the relative differences in amino acid–nucleobase interactions, with the entropic component being proportional to it. A similar assumption was made by Stumpe and Grubmüller in their study of amino acid interactions with urea and their influence on protein folding (50). In a related study (A. de Ruiter and B. Zagrovic, in preparation), we have used MD simulations and umbrella sampling to evaluate absolute binding free energies between nucleobases and amino acid sidechain analogs in water. Comparing the sidechain analog interaction propensities derived in this work with these absolute free energies, which fully account for both enthalpy and entropy, we observe a high level of correlation, with Spearman correlation coefficients of ρ = 0.88 (URA), ρ = 0.85 (CYT), ρ = 0.62 (GUA) and ρ = 0.88 (ADE) (A. de Ruiter and B. Zagrovic, in preparation). Although the behavior of amino acids or their sidechain analogs in crowded solutions of nucleobases need not match that in solutions containing only one nucleobase, such a high level of correlation does support the existence of a strong relationship between the free energies and their enthalpic components in the former case. Finally, the fact that the obtained scales agree well both with experimental results (8,10) and with the structural analysis of intermolecular contacts (23) lends further support to this claim.
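The proportionality assumption stated above can be made explicit. Writing the entropic contribution as a fixed multiple κ of the enthalpic one (κ is a symbol introduced here for illustration and is assumed to be the same for all residues within a given scale):

$$
\Delta G = \Delta H - T\Delta S, \qquad -T\Delta S = \kappa\,\Delta H
\quad\Longrightarrow\quad
\Delta G = (1 + \kappa)\,\Delta H
$$

For κ > −1, ΔG is a positive multiple of ΔH, so a scale built from enthalpies ranks residues exactly as one built from free energies would (Spearman ρ = 1, and, since the map is linear, Pearson R = 1). Under this reading, values of ρ below 1, such as the 0.62 reported above for GUA, indicate how far individual residues depart from a common κ. Identifying the simulated potential energies with ΔH additionally relies on the usual condensed-phase approximation that the PΔV term is negligible.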
Here, we have used the derived interaction propensity scales to investigate how the relationships observed at the level of amino acids and nucleobases translate to the level of complete coding sequences of mRNAs and their cognate proteins. Our central aim was to further examine the recently proposed complementarity hypothesis and its relationship with the structure and the origin of the genetic code (23,36,37). In accordance with our results obtained using knowledge-based potentials (23), we have observed that the higher the pyrimidine content of mRNAs, the higher the propensity of their cognate proteins to interact with URA, CYT and ADE, but, importantly, not with GUA (Figure 5). In fact, the opposite behavior of the GUA- and ADE-based scales in their relationship with the PYR-based scales (Supplementary Figure S1) suggests that the key element in determining the specificity of interaction between amino acids and nucleobases is not the nature of the heterocyclic ring, but rather that of the ring substituents. In particular, our present results suggest that this difference stems primarily from the behavior of the charged amino acids, which is reasonable considering that the two purine bases are largely isosteric and differ primarily in their ring substituents and charge distribution. This is also supported by a related analysis in which we showed that unsubstituted purine and pyrimidine rings result in highly correlated scales when it comes to their interactions with amino acids (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data).
In support of this reasoning, we have observed a strong relationship between the average PUR content of mRNA sequences and the preference of their cognate protein sequences to interact with GUA relative to other bases (Figure 6A, Table 1). In accordance with the stereochemical hypothesis and our generalizations of it (23,36,37), GUA exhibits a strong preference for interaction with PUR-coded amino acids relative to all other bases. Importantly, this effect appears to be primarily due to the behavior of the charged amino acids Glu, Asp, Arg and Lys. These results are fully consistent with our previous knowledge-based analysis of residue preferences for different nucleobases: there, GUA interaction preferences on the side of amino acids or proteins correlated extremely well with purine density on the side of their cognate codons or mRNA, while ADE interaction preferences were much closer to those of the pyrimidine bases (23), as also seen here. While the full biological meaning of this result remains to be elucidated, we are confident that it represents an important principle concerning the mRNA–protein relationship in general. Overall, our results support the generalized stereochemical hypothesis of the origin of the genetic code, in which GUA plays the role of an archetypal purine (i.e. purine richness on the side of codons or mRNAs parallels high levels of relative guanine interaction propensity on the side of cognate amino acids or proteins), while the opposite is seen for CYT, URA and ADE (i.e. pyrimidine richness of mRNA mirrors high relative interaction propensity for these nucleobases on the side of cognate amino acids or proteins) (23,36,37). In this context, the presence of adenines negatively affects complementarity levels, as discussed before (37). Intriguingly, despite the fact that the propensity scales were derived for specific bases, the highest levels of matching are observed if one considers PUR (equivalently, PYR) density on the mRNA side and not the density of individual bases (Supplementary Tables S5 and S6). This effect, which was already observed before (23,36,37), still requires a full explanation. However, we believe it suggests that the core of the genetic code was originally defined at the level of a coarse-grained nucleobase alphabet in which differences between individual purines, or between individual pyrimidines, were not critical.
Our results with specically Glu and Asp and their in-
teractions with GUA show that the most basic version of
the stereochemical hypothesis, the one in which the genetic
code evolved on the basis of direct interactions between
amino acids and their codons, can at best hold for a subset
of amino acids only. In particular, Glu and Asp do not ap-
pear to favorably interact with any nucleobases in the aque-
ous environment, although in our knowledge-based analy-
sis (23) they do show a strong preference for interacting with
purine bases and especially GUA. The solution to this seem-
ing paradox is provided by our present results: although the
negatively charged Glu and Asp do not show direct prefer-
ence for binding to GUA, they appear to be the least un-
favorable interacting partners for GUA when compared to
all other nucleobases. It is very possible that the preferences
one sees in the knowledge-based analysis of known protein–
RNA complexes are in part a consequence of such a nega-
tive selection. One way in which this result could be made
consistent with the stereochemical hypothesis and especially
its generalized version, even for Glu and Asp, is if one as-
sumes that the genetic code evolved in a context in which the
role of the translation apparatus was not to link individual
amino acids according to the mRNA template, but rather
short peptides. In this scenario, other amino acids would
provide the source of favorable binding free energy to the
mRNA template, while Glu and Asp would contribute to
the specicity of binding only.
Overall, our study shows that by using MD simu-
lations and extensive sampling we can distinguish be-
tween amino acid or sidechain analog interaction propen-
sities for different nucleobases. Remarkably, the interac-
tion propensities derived from simulations of individual
monomers yield close correspondences at the level of com-
plete proteins and mRNA molecules, giving support to
the mRNA/protein complementarity hypothesis as recently
proposed. Although our present results are highly suggestive, it should nonetheless be emphasized that the only rigorous test of the complementarity hypothesis can come
from direct experimental work. We hope that our present
results will serve not only as a source of motivation in this
direction, but also as a foundation for different computa-
tional and experimental studies of RNA/protein interac-
tions in general (51–53).
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGMENTS
We thank members of the Laboratory of Computational Biophysics at MFPL for useful advice and critical reading of
the manuscript. The authors gratefully acknowledge A. A.
Polyansky for help with randomization tests and C. Oost-
enbrink for helpful advice on Kirkwood-Buff integrals.
FUNDING
Austrian Science Fund FWF [START Y 514-B11 to B.Z.,
in part]; European Research Council [ERC Starting Inde-
pendent 279408 to B.Z.]. Funding for open access charge:
Austrian Science Fund FWF [START Y 514-B11 to B.Z.,
in part]; European Research Council [ERC Starting Inde-
pendent 279408 to B.Z.].
Conict of interest statement. None declared.
REFERENCES
1. Moore,M.J. and Proudfoot,N.J. (2009) Pre-mRNA processing reaches
back to transcription and ahead to translation. Cell, 136, 688–700.
2. Licatalosi,D.D. and Darnell,R.B. (2010) RNA processing and its
regulation: global insights into biological networks. Nat. Rev. Genet.,
11, 75–87.
3. Müller-McNicoll,M. and Neugebauer,K.M. (2013) How cells get the
message: dynamic assembly and function of mRNA-protein
complexes. Nat. Rev. Genet., 14, 275–287.
4. Mercer,T.R. and Mattick,J.S. (2013) Structure and function of long
noncoding RNAs in epigenetic regulation. Nat. Struct. Mol. Biol., 20,
300–307.
5. Baltz,A.G., Munschauer,M., Schwanhaeusser,B., Vasile,A.,
Murakawa,Y., Schueler,M., Youngs,N., Penfold-Brown,D., Drew,K.,
Milek,M. et al. (2012) The mRNA-bound proteome and its global
occupancy prole on protein-coding transcripts. Mol. Cell, 46,
674–690.
6. Castello,A., Fischer,B., Eichelbaum,K., Horos,R., Beckmann,B.M.,
Strein,C., Davey,N.E., Humphreys,D.T., Preiss,T., Steinmetz,L.M.
et al. (2012) Insights into RNA biology from an atlas of mammalian
mRNA-binding proteins. Cell, 149, 1393–1406.
7. Woese,C. (1965) On evolution of genetic code. Proc. Natl. Acad. Sci.
U.S.A., 54, 1546–1552.
8. Woese,C.R. (1973) Evolution of the genetic code.
Naturwissenschaften, 60, 447–459.
9. Akinrimisi,E. and Tso,P. (1964) Interactions of purine with proteins
+ amino acids. Biochemistry (Mosc.), 3, 619–626.
10. Thomas,P. and Podder,S. (1978) Specicity in protein-nucleic acid
interaction––solubility study on amino acid nucleoside interaction.
FEBS Lett., 96, 90–94.
11. Lacey,J.C. and Pruitt,K.M. (1969) Origin of the genetic code. Nature,
223, 799–804.
12. Rifkind,J.M. and Eichhorn,G.L. (1970) Specicity for the interaction
of nucleotides with basic polypeptides. Biochemistry (Mosc.), 9,
1753–1761.
13. Wagner,K.G. and Arfmann,H.-A. (1974) Properties of basic
amino-acid residues. Eur. J. Biochem., 46, 27–34.
14. Luscombe,N.M., Laskowski,R.A. and Thornton,J.M. (2001) Amino
acid-base interactions: a three-dimensional analysis of protein-DNA
interactions at an atomic level. Nucleic Acids Res., 29, 2860–2874.
15. Treger,M. and Westhof,E. (2001) Statistical analysis of atomic
contacts at RNA-protein interfaces. J. Mol. Recognit., 14, 199–214.
16. Jeong,E., Kim,H., Lee,S.W. and Han,K. (2003) Discovering the
interaction propensities of amino acids and nucleotides from
protein-RNA complexes. Mol. Cells, 16, 161–167.
17. Hoffman,M.M., Khrapov,M.A., Cox,J.C., Yao,J.C., Tong,L.N. and
Ellington,A.D. (2004) AANT: the amino acid-nucleotide interaction
database. Nucleic Acids Res., 32, D174–D181.
18. Kondo,J. and Westhof,E. (2011) Classication of pseudo pairs
between nucleotide bases and amino acids by analysis of
nucleotide-protein complexes. Nucleic Acids Res., 39, 8628–8637.
19. Donald,J.E., Chen,W.W. and Shakhnovich,E.I. (2007) Energetics of
protein–DNA interactions. Nucleic Acids Res., 35, 1039–1047.
20. Jonikas,M.A., Radmer,R.J., Laederach,A., Das,R., Pearlman,S.,
Herschlag,D. and Altman,R.B. (2009) Coarse-grained modeling of
large RNA molecules with knowledge-based potentials and structural
filters. RNA, 15, 189–199.
21. Pérez-Cano,L., Solernou,A., Pons,C. and Fernández-Recio,J. (2010)
Structural prediction of protein-RNA interaction by computational
docking with propensity-based statistical potentials. Pac. Symp.
Biocomput., 293–301.
22. Tuszynska,I. and Bujnicki,J.M. (2011) DARS-RNP and
QUASI-RNP: new statistical potentials for protein-RNA docking.
BMC Bioinformatics, 12, 348-363.
23. Polyansky,A.A. and Zagrovic,B. (2013) Evidence of direct
complementary interactions between messenger RNAs and their
cognate proteins. Nucleic Acids Res., 41, 8434–8443.
24. Mathew,D.C. and Luthey-Schulten,Z. (2008) On the physical basis of
the amino acid polar requirement. J. Mol. Evol., 66, 519–528.
25. Biot,C., Buisine,E., Kwasigroch,J.M., Wintjens,R. and Rooman,M.
(2002) Probing the energetic and structural role of amino
acid/nucleobase cation-pi interactions in protein-ligand complexes. J.
Biol. Chem., 277, 40816–40822.
26. Rutledge,L.R., Campbell-Verduyn,L.S., Hunter,K.C. and
Wetmore,S.D. (2006) Characterization of nucleobase-amino acid
stacking interactions utilized by a DNA repair enzyme. J. Phys.
Chem. B, 110, 19652–19663.
27. Rutledge,L.R., Durst,H.F. and Wetmore,S.D. (2008) Computational
comparison of the stacking interactions between the aromatic amino
acids and the natural or (cationic) methylated nucleobases. Phys.
Chem. Chem. Phys., 10, 2801–2812.
28. Ebrahimi,A., Habibi-Khorassani,M., Gholipour,A.R. and
Masoodi,H.R. (2009) Interaction between uracil nucleobase and
phenylalanine amino acid: the role of sodium cation in stacking.
Theor. Chem. Acc., 124, 115–122.
29. Nirenberg,M.W., Jones,O.W., Leder,P., Clark,B.F.C., Sly,W.S. and
Pestka,S. (1963) On the coding of genetic information. Cold Spring
Harb. Symp. Quant. Biol., 28, 549–557.
30. Giulio,M.D. (2005) The origin of the genetic code: theories and their
relationships, a review. Biosystems, 80, 175–184.
31. Koonin,E.V. and Novozhilov,A.S. (2009) Origin and evolution of the
genetic code: the universal enigma. IUBMB Life, 61, 99–111.
32. Woese,C. (1968) Fundamental nature of genetic code––prebiotic
interactions between polynucleotides and polyamino acids or their
derivatives. Proc. Natl. Acad. Sci. U.S.A., 59, 110–117.
33. Woese,C. (1969) Models for evolution of codon assignments. J. Mol.
Biol., 43, 235–240.
34. Yarus,M. (1998) Amino acids as RNA ligands: a
direct-RNA-template theory for the code’s origin. J. Mol. Evol., 47,
109–117.
35. Yarus,M., Widmann,J.J. and Knight,R. (2009) RNA-amino acid
binding: a stereochemical era for the genetic code. J. Mol. Evol., 69,
406–429.
36. Hlevnjak,M., Polyansky,A.A. and Zagrovic,B. (2012) Sequence
signatures of direct complementarity between mRNAs and cognate
proteins on multiple levels. Nucleic Acids Res., 40, 8874–8882.
37. Polyansky,A.A., Hlevnjak,M. and Zagrovic,B. (2013) Proteome-wide
analysis reveals clues of complementary interactions between
mRNAs and their cognate proteins as the physicochemical
foundation of the genetic code. RNA Biol., 10, 1248–1254.
38. Kyrpides,N.C. and Ouzounis,C.A. (1993) Mechanisms of specicity
in mRNA degradation: autoregulation and cognate interactions. J.
Theor. Biol., 163, 373–392.
39. Ouzounis,C.A. and Kyrpides,N.C. (1994) Reverse interpretation: a
hypothetical selection mechanism for adaptive mutagenesis based on
autoregulated mRNA stability. J. Theor. Biol., 167, 373–379.
40. Oostenbrink,C., Villa,A., Mark,A.E. and Gunsteren,W.F. (2004) A
biomolecular force eld based on the free enthalpy of hydration and
solvation: the GROMOS force-eld parameter sets 53A5 and 53A6.
J. Comput. Chem., 25, 1656–1676.
41. Hess,B., Kutzner,C., van der Spoel,D. and Lindahl,E. (2008)
GROMACS 4: algorithms for highly efcient, load-balanced, and
scalable molecular simulation. J. Chem. Theory Comput., 4, 435–447.
42. Berendsen,H., Grigera,J. and Straatsma,T. (1987) The missing term in
effective pair potentials. J. Phys. Chem., 91, 6269–6271.
43. Bussi,G., Donadio,D. and Parrinello,M. (2007) Canonical sampling
through velocity rescaling. J. Chem. Phys., 126, 014101.
44. Parrinello,M. and Rahman,A. (1981) Polymorphic transitions in
single-crystals––a new molecular-dynamics method. J. Appl. Phys.,
52, 7182–7190.
45. Oostenbrink,C. and van Gunsteren,W.F. (2005) Methane clustering in
explicit water: effect of urea on hydrophobic interactions. Phys.
Chem. Chem. Phys., 7, 53–58.
46. Arieh,B.-N. (1992) Statistical thermodynamics for chemists and
biochemists. Springer Science+Business Media, NY.
47. Gazzillo,D. (1995) Stability of uids with more than two
components. Mol. Phys., 84, 303–323.
48. UniProt Consortium (2013) Update on activities at the Universal
Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.
49. Yalkowsky,S.H. and Dannenfelser,R.M. (1992) Aquasol database of
aqueous solubility. College of Pharmacy, University of Arizona,
Tucson, AZ.
50. Stumpe,M.C. and Grubmüller,H. (2007) Interaction of urea with
amino acids: implications for urea-induced protein denaturation. J.
Am. Chem. Soc., 129, 16126–16131.
51. Änkö,M.-L. and Neugebauer,K.M. (2012) RNA–protein interactions
in vivo: global gets specific. Trends Biochem. Sci., 37, 255–262.
52. Puton,T., Kozlowski,L., Tuszynska,I., Rother,K. and Bujnicki,J.M.
(2012) Computational methods for prediction of protein–RNA
interactions. J. Struct. Biol., 179, 261–268.
53. Zanzoni,A., Marchese,D., Agostini,F., Bolognesi,B., Cirillo,D.,
Botta-Orla,M., Livi,C.M., Rodriguez-Mulero,S. and Tartaglia,G.G.
(2013) Principles of self-organization in biological pathways: a
hypothesis on the autogenous association of alpha-synuclein. Nucleic Acids Res., 41, 9987–9998.

Supplementary data (October 2014): Matea Hajnic, Juan Iregui Osorio, Bojan Zagrovic
... The latter analysis also provided a comprehensive dissection of the salt dependence of nucleobase-amino acid interactions and the contribution of DNA sugar and phosphate groups to binding. Moreover, the nucleobase-amino acid affinity scales were also derived based on a simulated partitioning of amino acids between nucleobase-rich phases and water [83,84]. Finally, Vondrasek and coworkers have used the known PDB structures of DNA-protein complexes together with molecular mechanics and DFT-D ab initio calculations to estimate the binding preferences between all 20 natural amino acids and the four DNA bases [79]. ...
... We have shown that this stems from biosynthetically more complex amino acids that are thought to have entered biology later [50,103]. Finally, we have fully corroborated the above findings by using affinities derived by orthogonal approaches including umbrella-sampling MD simulations [81,82] or modeling of partitioning experiments [83,84]. ...
... The above results provide support for the stereochemical hypothesis of the origin of the genetic code, but they emphasize the importance of an extended, polymeric context in which the relatively weak affinities of individual building blocks can be amplified. Moreover, these findings support a novel hypothesis that in the unstructured state, mRNAs and the proteins they encode may be complementary to each other and bind in a coaligned manner, whereby the complementarity level is negatively regulated by mRNA ADE content [50,51, [81][82][83][84][103][104][105]. Since compositional matching is seen for primary sequence profiles, we expect that the strongest interactions will occur if the partners are unstructured, yielding dynamic, multivalent, liquid-like complexes: in addition to IDPs, the hypothesis applies equally well to the unfolded states of otherwise folded proteins [51]. ...
Article
Full-text available
Despite their importance, our understanding of non‐covalent RNA/protein interactions is incomplete. This especially concerns the binding between RNA and unstructured protein regions, a widespread class of such interactions. Here, we review the recent experimental and computational work on RNA/protein interactions in an unstructured context with a particular focus on how such interactions may be shaped by the intrinsic interaction affinities between individual nucleobases and protein sidechains. Specifically, we articulate the claim that the universal genetic code, in part, reflects the binding specificity between nucleobases and amino acids and that, in turn, the code may be seen as the Rosetta stone for understanding RNA‐protein interactions in general. This article is protected by copyright. All rights reserved.
... (note that matched profiles correspond to a negative Pearson R because affinities are defined such that more negative values indicate stronger binding) (43). Similar results were also obtained with several computationally derived nucleobase/amino-acid affinity scales, with the opposite behavior observed only in the case of ADE (41,42,44–47). On the basis of such analyses, it was suggested that proteins in general bind to their autologous mRNA CDSs in a complementary, co-aligned manner, especially if unstructured. ...
Article
During packaging in positive-sense single-stranded RNA (+ssRNA) viruses, coat proteins (CPs) interact directly with multiple regions in genomic RNA (gRNA), but the underlying physicochemical principles remain unclear. Here we analyze the high-resolution cryo-EM structure of bacteriophage MS2 and show that the gRNA/CP binding sites, including the known packaging signal, overlap significantly with regions where gRNA nucleobase-density profiles match the corresponding CP nucleobase-affinity profiles. Moreover, we show that the MS2 packaging signal corresponds to the global minimum in gRNA/CP interaction energy in the unstructured state as derived using a linearly additive model and knowledge-based nucleobase/amino-acid affinities. Motivated by this, we predict gRNA/CP interaction sites for a comprehensive set of 1082 +ssRNA viruses. We validate our predictions by comparing them with site-resolved information on gRNA/CP interactions derived in SELEX and CLIP experiments for 10 different viruses. Finally, we show that in experimentally studied systems CPs frequently interact with autologous coding regions in gRNA, in agreement with both predicted interaction energies and a recent proposal that proteins in general tend to interact with own mRNAs, if unstructured. Our results define a self-consistent framework for understanding packaging in +ssRNA viruses and implicate interactions between unstructured gRNA and CPs in the process.
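The linearly additive model mentioned in this abstract can be illustrated with a short scan of a protein along an RNA. This is a minimal sketch, not the authors' implementation. It assumes that each amino acid faces one codon-length triplet of gRNA nucleotides in a co-aligned register, that the interaction energy is the plain sum of pairwise nucleobase/amino-acid affinities, and that the user supplies the affinity table. The `affinity` mapping and both function names are hypothetical; lower energy means more favorable binding.

```python
from typing import List, Mapping, Tuple

# Hypothetical affinity table: (amino acid, nucleobase) -> pairwise affinity (lower = stronger)
Affinity = Mapping[Tuple[str, str], float]


def interaction_energy(protein: str, rna: str, start: int, affinity: Affinity) -> float:
    """Linearly additive energy with residue i facing rna[start + 3*i : start + 3*i + 3]."""
    total = 0.0
    for i, aa in enumerate(protein):
        for base in rna[start + 3 * i : start + 3 * i + 3]:
            total += affinity[(aa, base)]
    return total


def scan_binding_sites(protein: str, rna: str, affinity: Affinity) -> Tuple[int, List[float]]:
    """Evaluate every register in which the full protein footprint fits on the RNA.

    Returns the start index of the global energy minimum and the complete energy profile.
    """
    span = 3 * len(protein)
    if len(rna) < span:
        raise ValueError("RNA is shorter than the co-aligned protein footprint")
    energies = [interaction_energy(protein, rna, s, affinity)
                for s in range(len(rna) - span + 1)]
    # min() over indices returns the first register with the lowest energy
    best = min(range(len(energies)), key=energies.__getitem__)
    return best, energies
```

With a real knowledge-based scale, `energies` plays the role of the interaction-energy profile and `best` marks its global minimum; this sketch only shows how such a minimum is located, not how it was compared with the MS2 packaging signal.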
... vice versa [51,57–60]. It is unclear, however, whether the proposed stereochemical interactions between nucleic acids and peptides, rather than free amino acids, could have played a significant role in the origin of the code [12]. ...
Article
The origin of the genetic code and the translation system is perhaps the central and most difficult problem in the study of the origin of life, and one of the most difficult in all of evolutionary biology. There is a large number of hypotheses on the emergence and development of modern genetic systems, addressing the origin and early evolution of the genetic code as well as the emergence of replication and translation. The most widely known hypotheses are reviewed here. However, none of these hypotheses describes all stages of the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is the currently dominant scientific idea about the early evolution of biological and prebiological objects. Its main advantage is that it proposes RNAs as the first living systems: molecules that are self-sufficient in terms of reproduction and can function both as the catalytic component of the system and as its template. However, it also has substantial drawbacks. In particular, a processive ribozyme polymerase has not yet been discovered or obtained experimentally. Given the mutual dependence of proteins and nucleic acids in the modern world, many authors propose early-evolution scenarios based on the coevolution of these two classes of organic molecules. Such hypotheses postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world, where the emergence of translation was preceded by an era of self-replicating RNAs. Although such scenarios are less parsimonious from an evolutionary standpoint, since they require the simultaneous emergence and evolution of two classes of organic molecules as well as a synchronized emergence of replication and translation, their great advantage is that they propose the immediate development of much more accurate and processive protein-based replication.
... In this respect, we are confident that atomistic MD simulations, possibly supplemented by enhanced sampling techniques, have the potential to contribute substantially to the definition of a comprehensive molecular grammar of protein/RNA phase behavior, extending pioneering investigations focusing on mRNA-protein complementarity (32). Section 4 (Materials and Methods), 4.1 Unbiased MD simulations: R5 and K5 peptides were generated in an extended conformation with the LEaP program in AmberTools16 (33), while U5 oligonucleotides were built in A-form using Chimera 1.14 (34). Peptide molecules were capped with ACE and NME groups to avoid artificial effects induced by the charged termini. ...
Article
Biomolecular condensates assembled through liquid–liquid phase separation (LLPS) of proteins and RNAs are currently recognized to play an important role in cellular organization. Their assembly depends on the formation of a network of transient, multivalent interactions between flexible scaffold biomolecules. Understanding how protein and RNA sequences determine these interactions and ultimately regulate the phase separation is an open key challenge. Recent in vitro studies have revealed that arginine and lysine residues, which are enriched in most cellular condensates, have markedly distinct propensities to drive the LLPS of protein/RNA mixtures. Here, we employ explicit‐solvent atomistic molecular dynamics simulations to shed light on the microscopic origin of this difference by investigating mixtures of polyU oligonucleotides with either polyR or polyK peptides. In agreement with experiments, our simulations indicate that arginine has a higher affinity for polyU than lysine both in highly diluted conditions and in concentrated solutions with a biomolecular density comparable to that of cellular condensates. The analysis of intermolecular contacts suggests that this differential behavior is due to the propensity of arginine side chains to simultaneously form a higher number of specific interactions with oligonucleotides, including hydrogen bonds and stacking interactions. Our results provide a molecular description of how the multivalency of the guanidinium group enables the coordination of multiple RNA groups by a single arginine residue, thus ultimately stabilizing protein/RNA condensates.
... Sequences including noncanonical amino acids or nucleotides were not analyzed. Most of the amino acid property scales studied were extracted from the AAindex database (42,43) and were complemented by additional consensus scales derived by Atchley et al. (29) and an additional category of recently derived nucleobase affinity scales (44–47). The frameshifted variants of individual protein sequences were generated by removing the first four bases (+1 shift) or the first two bases (−1 shift) of their wild-type mRNA coding sequences and translating them using the universal genetic code. ...
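The frameshifting step described in this excerpt can be sketched as follows; it is a minimal illustration, not the authors' pipeline. Dropping four bases gives a +1 shift because 4 ≡ 1 (mod 3), and dropping two bases gives a −1 shift because 2 ≡ −1 (mod 3). The sketch assumes an RNA-alphabet CDS (T is converted to U), the standard genetic code (NCBI translation table 1), internal stop codons kept as `*`, and incomplete trailing codons discarded; how the original study treated stop codons is not stated here. The name `translate_shifted` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated with bases in U, C, A, G order,
# first position varying slowest; '*' marks stop codons.
_BASES = "UCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate_shifted(cds: str, drop: int) -> str:
    """Translate a CDS after removing its first `drop` bases.

    drop=0 gives the wild-type frame, drop=4 a +1 shift, drop=2 a -1 shift.
    Stop codons are kept as '*'; a trailing partial codon is ignored.
    """
    seq = cds.upper().replace("T", "U")[drop:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))
```

Whether to truncate at the first `*` or keep read-through stops is a design choice that changes the length of the frameshifted sequence, so it should match whatever convention the downstream profile comparison expects.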
Article
Significance: Genetic information stored in DNA is transcribed into messenger RNAs, which are then translated to produce proteins. A shift in the reading frame at any stage of this process typically results in a significantly different protein sequence being produced. Here, we show that, nevertheless, several essential properties of many protein sequences, such as their hydrophobicity profiles, remain largely unchanged upon frameshifting. This finding suggests that frameshifting could be an effective evolutionary strategy for generating novel protein sequences, which retain the functionally relevant physicochemical properties of the sequences from which they derive.
Article
The origin of the genetic code and the translation system is probably the central and most difficult problem in investigations of the origin of life, and one of the most complex problems in evolutionary biology in general. There are multiple hypotheses on the emergence and development of existing genetic systems that propose mechanisms for the origin and early evolution of the genetic code, as well as for the emergence of replication and translation. Here, we discuss the most well-known of these hypotheses, although none of them describes the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is the currently prevailing scientific idea on the early evolution of biological and prebiological structures; its main advantage is the assumption that RNAs, as the first living systems, were self-sufficient, i.e., capable of functioning as both catalysts and templates. However, this hypothesis also has significant limitations. In particular, no ribozymes with processive polymerase activity have yet been discovered or synthesized. Taking into account the mutual dependence of proteins and nucleic acids in the present-day world, many authors propose early-evolution scenarios based on the coevolution of these two classes of organic molecules. These scenarios postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world hypothesis, according to which the emergence of translation was preceded by an era of self-replicating RNAs. Although such scenarios are less parsimonious from the evolutionary point of view, since they require the simultaneous emergence and evolution of two classes of organic molecules as well as the synchronized emergence of replication and translation, their major advantage is that they explain the development of processive and much more accurate protein-dependent replication.
Preprint
Biomolecular condensates assembled through liquid-liquid phase separation (LLPS) of proteins and RNAs are currently recognized to play an important role in cellular organization. Their assembly depends on the formation of a network of transient, multivalent interactions between flexible scaffold biomolecules. Understanding how protein and RNA sequences determine these interactions and ultimately regulate the phase separation is an open key challenge. Recent in vitro studies have revealed that arginine and lysine residues, which are enriched in most cellular condensates, have markedly distinct propensities to drive the LLPS of protein/RNA mixtures. Here, we employ explicit-solvent atomistic Molecular Dynamics (MD) simulations to shed light on the microscopic origin of this difference by investigating mixtures of polyU oligonucleotides with either polyR or polyK peptides. In agreement with experiments, our simulations indicate that arginine has a higher affinity for polyU than lysine both in highly diluted conditions and in concentrated solutions with a biomolecular density comparable to that of cellular condensates. The analysis of intermolecular contacts suggests that this differential behavior is due to the propensity of arginine side chains to simultaneously form a higher number of specific interactions with oligonucleotides, including hydrogen bonds and stacking interactions. Our results provide a molecular description of how the multivalency of the guanidinium group enables the coordination of multiple RNA groups by a single arginine residue, thus ultimately stabilizing protein/RNA condensates.
Article
Prebiotic peptide synthesis and the origin of the genetic code are central issues concerning the origin of life. How the two may have been correlated on the primordial Earth remains perplexing, although numerous experiments have been carried out to explain the prebiotic chemistry of peptide synthesis and the origin of the genetic code. The purpose of this article is to review the chemical reactions that occurred during peptide synthesis and the origin of the genetic code in the aqueous environment of the early Earth, as well as the relationship between the two. Finally, in our view, the chiral properties of biomolecules should be taken into account in prebiotic chemical scenarios, which may lead to breakthroughs in further research in this field.
Chapter
This chapter discusses the genetic code, which appears to be universal. The universal code has a specific arrangement of codons that is definitely not random. There are at least three major concepts of the origin and evolution of the universal genetic code: first, the stereochemical theory, stating that codon assignments are determined by the physicochemical affinity between amino acids and their cognate codons (or anticodons); second, the coevolution theory, stating that the structure of the code coevolved with the biosynthetic pathways of amino acids; and third, the error minimization theory, stating that selection pressure to minimize the negative effects of point mutations and translation errors was the main factor in code development. These theories are not mutually contradictory and are also compatible with the frozen accident hypothesis, i.e., the idea that the standard code may have no special properties but is simply fixed because all existing life forms share a common ancestor, with later changes to the code generally precluded by the deleterious effects of codon reassignment. Mathematical analysis of the structure and possible evolutionary trajectories of the code shows that it is highly robust to translation errors, although many more robust codes exist, suggesting that the standard code could have emerged from a random code through a short sequence of rearrangements of codon series.
A large part of the evolution leading to the standard code seems to be a mixture of a frozen accident and selection for error minimization, although it cannot be excluded that the code coevolved with metabolic pathways owing to weak affinities between amino acids and nucleotide triplets. These scenarios of code evolution, however, are based on formal patterns of uncertain relevance to actual primordial evolution. A true understanding of the origin and development of the code is probably only possible in connection with a plausible scenario for the development of the coding scheme and of the translation machinery itself.
Article
This paper addresses the intriguing speculation that amino acid-nucleic acid interactions may have played a role in the evolutionary development of protein-based life from an early “RNA Universe.” To explore the possible impact of single amino acids in promoting nucleic acid folding, single-molecule Förster resonance energy transfer (smFRET) experiments have been implemented with a DNA hairpin construct (a 7-base-pair double-stranded stem with an A40 loop) as a simple model for secondary structure formation. Exposure to positively charged amino acids (arginine and lysine) is found to clearly stabilize secondary structure. Kinetically, each amino acid promotes folding by generating a large increase in the folding rate with little change in the unfolding rate. From van’t Hoff and Arrhenius analysis of the equilibrium and rate constants as a function of temperature, arginine and lysine are found to significantly increase the overall exothermicity of folding while imposing only a small entropic penalty on the folding process. Detailed investigations into the kinetics and thermodynamics of this amino acid-induced folding stability reveal that arginine and lysine interact with nucleic acids in a manner similar to that of monovalent cations. Specifically, these results are interpreted in the context of an ion atmosphere surrounding the nucleic acid, in which amino acids stabilize folding qualitatively like small monovalent cations, but also with kinetic signatures reflecting the side chain composition.
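The van’t Hoff analysis mentioned in this abstract amounts to a linear fit of ln K against 1/T. The sketch below assumes temperature-independent ΔH and ΔS (no heat-capacity change) and a folding equilibrium constant K = [folded]/[unfolded]; the function name is hypothetical and this is not the authors' analysis code. A negative fitted ΔH corresponds to exothermic folding. The same fit applied to rate constants, ln k = ln A − Ea/(RT), yields the Arrhenius activation energy from the slope.

```python
import math
from typing import Sequence, Tuple

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def vant_hoff_fit(temps_k: Sequence[float], k_eq: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln K = -dH/(R*T) + dS/R.

    Returns (dH in J/mol, dS in J/(mol*K)), assuming both are constant over the
    temperature range (a linear van't Hoff plot).
    """
    if len(temps_k) != len(k_eq) or len(temps_k) < 2:
        raise ValueError("need at least two matched (T, K) points")
    x = [1.0 / t for t in temps_k]        # 1/T
    y = [math.log(k) for k in k_eq]       # ln K
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    # slope = -dH/R, intercept = dS/R
    return -slope * GAS_CONSTANT, intercept * GAS_CONSTANT
```

Curvature in a measured ln K versus 1/T plot would indicate a nonzero heat-capacity change, in which case this two-parameter fit is no longer appropriate.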
Article
The authors Rabie Saidi and Tunca Dogan were omitted from the list of the UniProt consortium in the acknowledgements section of this paper. The corrected consortium list is provided below. The UniProt Consortium UniProt has been prepared by Rolf Apweiler, Alex Bateman, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam–Faruque, Emanuele Alpi, Ricardo Antunes, Joanna Arganiska, Elisabet Barrera Casanova, Benoit Bely, Mark Bingley, Carlos Bonilla, Ramona Britto, Borisas Bursteinas, Wei Mun Chan, Gayatri Chavali, Elena Cibrian–Uhalte, Alan Da Silva, Maurizio De Giorgi, Tunca Dogan, Francesco Fazzini, Paul Gane, Leyla Garcia Castro, Penelope Garmiri, Emma Hatton–Ellis, Reija Hieta, Rachael Huntley, Duncan Legge, Wudong Liu, Jie Luo, Alistair MacDougall, Prudence Mutowo, Andrew Nightingale, Sandra Orchard, Klemens Pichler, Diego Poggioli, Sangya Pundir, Luis Pureza, Guoying Qi, Steven Rosanoff, Rabie Saidi, Tony Sawford, Aleksandra Shypitsyna, Edward Turner, Vladimir Volynkin, Tony Wardell, Xavier Watkins, Hermann Zellner, Matt Corbett, Mike Donnelly, Pieter van Rensburg, Mickael Goujon, Hamish McWilliam and Rodrigo Lopez at the European Bioinformatics Institute (EMBL–EBI); Ioannis Xenarios, Lydie Bougueleret, Alan Bridge, Sylvain Poux, Nicole Redaschi, Lucila Aimo, Andrea Auchincloss, Kristian Axelsen, Parit Bansal, Delphine Baratin, Pierre–Alain Binz, Marie–Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Emmanuel Boutet, Lionel Breuza, Cristina Casal–Casas, Edouard de Castro, Lorenzo Cerutti, Elisabeth Coudert, Beatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Sebastien Gehant, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz–Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Guillaume Keller, Vicente Lara, Philippe Lemercier, Jocelyne Lew, Damien Lieberherr, Thierry Lombardot, Xavier Martin, Patrick Masson, Anne Morgat, Teresa Neto, Salvo Paesano, 
Ivo Pedruzzi, Sandrine Pilbout, Monica Pozzato, Manuela Pruess, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue and Anne–Lise Veuthey at the SIB Swiss Institute of Bioinformatics (SIB); Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Chuming Chen, Yongxing Chen, John S. Garavelli, Hongzhan Huang, Kati Laiho, Peter McGarvey, Darren A. Natale, Baris E. Suzek, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Lai–Su Yeh, Meher Shruti Yerramalla and Jian Zhang at the Protein Information Resource (PIR).
Article
Previous evidence indicates that a number of proteins are able to interact with their cognate mRNAs. These autogenous associations represent important regulatory mechanisms that control gene expression at the translational level. Using the catRAPID approach to predict the propensity of proteins to bind to RNA, we investigated the occurrence of autogenous associations in the human proteome. Our algorithm correctly identified binding sites in well-known cases such as thymidylate synthase, tumor suppressor P53, synaptotagmin-1, serine/arginine-rich splicing factor 2, heat shock 70 kDa protein, U1 small nuclear ribonucleoprotein A (U1A) and ribosomal protein S13. In addition, we found that several other proteins are able to bind to their own mRNAs. A large-scale analysis of biological pathways revealed that aggregation-prone and structurally disordered proteins have the highest propensity to interact with cognate RNAs. These findings are substantiated by experimental evidence on amyloidogenic proteins such as TAR DNA-binding protein 43 and fragile X mental retardation protein. Among the amyloidogenic proteins, we predicted that Parkinson's disease-related α-synuclein is highly prone to interact with cognate transcripts, which suggests the existence of RNA-dependent factors in its function and dysfunction. Indeed, as aggregation is intrinsically concentration dependent, it is possible that autogenous interactions play a crucial role in controlling protein homeostasis.
Article
Despite more than 50 years of effort, the origin of the genetic code remains enigmatic. Among different theories, the stereochemical hypothesis suggests that the code evolved as a consequence of direct interactions between amino acids and appropriate bases. If indeed true, such physicochemical foundation of the mRNA/protein relationship could also potentially lead to novel principles of protein-mRNA interactions in general. Inspired by this promise, we have recently explored the connection between the physicochemical properties of mRNAs and their cognate proteins at the proteome level. Using experimentally and computationally derived measures of solubility of amino acids in aqueous solutions of pyrimidine analogs together with knowledge-based interaction preferences of amino acids for different nucleobases, we have revealed a statistically significant matching between the composition of mRNA coding sequences and the base-binding preferences of their cognate protein sequences. Our findings provide strong support for the stereochemical hypothesis of genetic code's origin and suggest the possibility of direct complementary interactions between mRNAs and cognate proteins even in present-day cells.
Article
Recently, the ability to interact with messenger RNA (mRNA) has been reported for a number of known RNA-binding proteins, but surprisingly also for different proteins without recognizable RNA binding domains including several transcription factors and metabolic enzymes. Moreover, direct binding to cognate mRNAs has been detected for multiple proteins, thus creating a strong impetus to search for functional significance and basic physico-chemical principles behind such interactions. Here, we derive interaction preferences between amino acids and RNA bases by analyzing binding interfaces in the known 3D structures of protein–RNA complexes. By applying this tool to the human proteome, we reveal statistically significant matching between the composition of mRNA sequences and base-binding preferences of protein sequences they code for. For example, purine density profiles of mRNA sequences mirror guanine affinity profiles of cognate protein sequences with quantitative accuracy (median Pearson correlation coefficient R = −0.80 across the entire human proteome). Notably, statistically significant anti-matching is seen only in the case of adenine. Our results provide strong evidence for the stereochemical foundation of the genetic code and suggest that mRNAs and cognate proteins may in general be directly complementary to each other and associate, especially if unstructured.
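The profile comparison described in this abstract can be sketched as a windowed purine-density profile of the CDS, a windowed affinity profile of the encoded protein, and a Pearson correlation between the two. This is a minimal sketch under stated assumptions, not the authors' procedure: purine density is taken as the per-codon fraction of A and G, both profiles use the same centered sliding window of hypothetical half-width, and the per-residue guanine-affinity scale is supplied by the user (no real scale values are included). Consistent with the negative R reported, the sketch assumes more negative affinities mean stronger binding, so a matched pair of profiles gives R close to −1. All function names are hypothetical.

```python
import math
from typing import List, Mapping, Sequence


def window_average(values: Sequence[float], half_width: int) -> List[float]:
    """Centered sliding-window mean; windows are truncated at the sequence ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def purine_profile(cds: str, half_width: int) -> List[float]:
    """Per-codon purine (A + G) fraction of an mRNA CDS, smoothed over neighboring codons."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    raw = [sum(base in "AG" for base in codon) / 3 for codon in codons]
    return window_average(raw, half_width)


def affinity_profile(protein: str, scale: Mapping[str, float], half_width: int) -> List[float]:
    """Per-residue nucleobase affinity of a protein, smoothed with the same window."""
    return window_average([scale[aa] for aa in protein], half_width)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For a real gene, the stop codon should be removed from the CDS so that the number of codons equals the protein length, and a published nucleobase-affinity scale substituted for the user-supplied mapping.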
Article
An integral equation approach is used to investigate the stability limits of ternary fluid mixtures made up of hard spheres with non-additive diameters. With this simple model of non-ideal solutions we present a first application of the general theory, developed in the first paper of this series, for the phase stability of multicomponent systems. The spinodal boundary is localized by searching for singularities of a new function, which generalizes the Bhatia-Thornton concentration-concentration structure factor to mixtures with more than two components: the divergence of its long-wavelength limit (i.e., $S_{\mathrm{CC}}^{(3)}(k=0)$ in our case) signals phase instability. For the ternary fluids considered, we test two 'closures' for the Ornstein-Zernike integral equations: the Ballone-Pastore-Galli-Gazzillo approximation, and the simplest multicomponent version of the Verlet closure, originally proposed for one-component hard spheres. The relevant integral equation results for thermodynamics and structure are compared successfully with recent molecular dynamics simulations. A comparison is also made with one-fluid first-order perturbation theories. Regarding phase stability, the present integral equation study confirms the molecular dynamics analysis qualitatively: at high densities the considered ternary mixtures of non-additive hard spheres become unstable and exhibit a demixing of purely entropic origin.