12984–12994 Nucleic Acids Research, 2014, Vol. 42, No. 21 Published online 31 October 2014
doi: 10.1093/nar/gku1035
Computational analysis of amino acids and their
sidechain analogs in crowded solutions of RNA
nucleobases with implications for the mRNA–protein
complementarity hypothesis
Matea Hajnic, Juan Iregui Osorio and Bojan Zagrovic*
Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Vienna 1030,
Austria
Received July 17, 2014; Revised September 29, 2014; Accepted October 11, 2014
ABSTRACT
Many critical processes in the cell involve direct
binding between RNAs and proteins, making it im-
perative to fully understand the physicochemical
principles behind such interactions at the atomistic
level. Here, we use molecular dynamics simulations
and 15 μs of sampling to study the behavior
of amino acids and amino acid sidechain
analogs in high-concentration aqueous solutions
of standard RNA nucleobases. Structural and ener-
getic analysis of simulated systems allows us to de-
rive interaction propensity scales for different amino
acid/nucleobase combinations. The derived scales
closely match and greatly extend the available ex-
perimental data, providing a comprehensive foun-
dation for studying RNA–protein interactions in dif-
ferent contexts. By using these scales, we demon-
strate a statistically significant connection between
nucleobase composition of human mRNA coding
sequences and nucleobase interaction propensities
of their cognate protein sequences. For example,
pyrimidine density profiles of mRNAs match uracil-
propensity profiles of their cognate proteins with a
median Pearson correlation coefficient of R = −0.70.
Our results provide support for the recently proposed
hypotheses that mRNAs and their cognate proteins
may be physicochemically complementary to each
other and bind, especially if unstructured, with the
complementarity level being negatively influenced by
mRNA adenine content. Finally, we utilize the derived
scales to refine the complementarity hypothesis and
closely examine its physicochemical underpinnings.
INTRODUCTION
From transcriptional and translational regulation to RNA
processing and decay to protein localization, many key pro-
cesses in the cell depend directly on RNA–protein interac-
tions (1–4). What is more, the list of systems that involve
RNA–protein interactions keeps dramatically expanding.
Recently, for example, high-throughput efforts aimed at
capturing the mRNA–protein interactome identified a large
number of novel RNA-binding proteins (5,6). Out of a total
of approximately 800 mRNA-binding proteins detected
in these studies using covalent UV-crosslinking methods,
about 25% were found not to contain any known RNA-
binding domains, while an even greater number lacked clear
functional characterization. Despite the challenges ahead,
one may expect that integrative efforts involving biochemi-
cal, structural and computational techniques will soon cat-
alog most if not all of biologically relevant RNA–protein
interactions. On the other hand, our understanding of the
basic physicochemical principles behind such interactions
still remains incomplete. Most importantly, only a few ex-
perimental studies have been performed in order to directly
explore interactions between individual nucleobases and
amino acids in different environments (7–10). While global
and local structural contexts do play important roles in
defining the properties of RNA–protein binding interfaces,
it is reasonable to expect that binding specificity in general
also critically depends on the preferences of individual nu-
cleobases and amino acids for each other.
In this reductionist framework, the properties of the bind-
ing sites are at least in part a consequence of binding prefer-
ences that are intrinsic to individual nucleobases and amino
acids. Motivated by this, Akinrimisi et al. (9) and Thomas
et al. (10) have measured affinities of several naturally occurring
amino acids for a set of nitrogenous bases and nucleosides
using spectroscopic methods, but those experiments
were never performed systematically for all possible
*To whom correspondence should be addressed. Tel: +43 1 4277 52271; Fax: +43 1 4277 9522; Email: bojan.zagrovic@univie.ac.at
Present address: Juan Osorio Iregui, Institute for Theoretical Physics, ETH Zürich 8093, Switzerland.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
combinations. Furthermore, Woese et al. have used chromatographic
measurements to define a scale of amino acids’
propensity to interact with pyrimidine mimetics pyridines,
which they termed ‘polar requirement’ (PR) (7,8). Finally,
several authors have studied interactions between differ-
ent nucleotides and polyamino acids, focusing typically on
polylysine or polyarginine peptides (11–13). Despite the
clear importance of such experimental studies, however,
we find it remarkable that they have not been repeated or
extended since the 1960s and 1970s when they were first
performed. On the other hand, sizable progress has been
made using computational and theoretical approaches (14–
28). Most significantly, the available structures of nucleic
acid–protein complexes have been statistically analyzed to
explore the more general physicochemical principles be-
hind nucleobase/amino acid interactions (14–18) and, in
particular, derive binding preference scales, also known as
knowledge-based potentials (19–23). Moreover, a computa-
tional equivalent of the PR scale was derived using molec-
ular dynamics (MD) simulations, providing a microscopic
picture behind the interaction propensities exhibited by in-
dividual amino acids (24). Finally, quantum-mechanical
calculations have been used to characterize the interactions
between a select subset of bases and amino acids (25–28).
Overall, all of these studies suggest that the preferences of
individual nucleobases and amino acids for each other in
water may be highly differentiated, but a large-scale analy-
sis of this effect has never been systematically performed.
An important context in which nucleobase/amino acid
interactions may be relevant concerns an important foun-
dational question in molecular biology, that of the origin of
the universal genetic code (29–31). In particular, the stere-
ochemical hypothesis proposes that the code evolved as
a consequence of direct interactions between codons and
amino acids they code for (7,8,32–35). An early formula-
tion of the stereochemical hypothesis was put forth by Carl
Woese et al. based on the above mentioned PR scale, i.e.
the propensity of amino acids to interact with pyrimidine
mimetics (7,8). Recently, we have demonstrated that pyrimidine
density profiles of mRNA coding sequences closely
mirror the PR-weighted profiles of their cognate protein sequences
(36). In other words, pyrimidine-rich mRNA regions
tend to code for cognate protein regions that exhibit
high propensity to interact with pyrimidine mimetics and
vice versa. Moreover, we have used knowledge-based poten-
tials derived from experimental structures of RNA–protein
complexes to not only confirm these findings in the case
of pyrimidines, but also extend them to purines (23,37). By
providing quantitative evidence for an early, more qualita-
tive proposal by Kyrpides and Ouzounis (38,39), these re-
sults allowed us to raise the stereochemical hypothesis to
the level of a general relationship between sequence com-
position of mRNAs and their cognate proteins as well as
to hypothesize that mRNAs could bind their cognate proteins
in a complementary fashion, especially if unstructured
(23,36,37). We argued that such binding interactions
were an important driving force behind the establishment
of the universal genetic code, but also that they may have
a critical, yet still not fully characterized role in present-
day cell as well (23,36,37). Intriguingly, in our analysis of
knowledge-based preference scales, we found that guanine
and adenine exhibit opposite amino acid binding prefer-
ences, resulting in a curious asymmetry: while guanine-
binding propensity profiles on the side of proteins closely
match purine density profiles on the side of their cognate
mRNAs, adenine-binding protein sequence profiles more
closely mirror pyrimidine density mRNA profiles (23,37).
This hints at additional complexities behind the putative
cognate complementarity and suggests that there may have
been at least two major phases in the development of the
genetic code with opposite requirements when it comes to
complementary matching (37).
In order to better understand the underlying physico-
chemical principles behind RNA–protein interactions and
shed more light on the mRNA/protein complementarity
hypothesis, here we systematically explore the behavior of
individual amino acids and amino acid sidechain analogs in
high-concentration aqueous solutions of different RNA nu-
cleobases using classical MD simulations (Figure 1A). For
our simulations, we employ the GROMOS 53A6 force field
(40), which was parameterized to accurately capture solva-
tion free energies of amino acid sidechain analogs in water
and cyclohexane. These hydrophobicity-related properties,
in turn, are considered to be an important factor in defining
amino acid/nucleobase interaction propensities. Empow-
ered by the spatial and temporal resolution provided by MD
simulations, we present an atomistic view of how individual
amino acids or sidechain analogs interact with RNA nucleobases
in water and define interaction preferences between
them using different structural and energetic criteria. Such
interaction propensity scales create a rigorous, reductionist
foundation for the analysis of RNA–protein interactions in
general, which is here further employed for a critical exam-
ination of the mRNA–protein complementarity hypothesis
and its refinement.
MATERIALS AND METHODS
The 20 natural amino acids and 18 of their sidechain
analogs (all except for Gly and Pro) were simulated in the
presence of a single type of common RNA nucleobases in
aqueous solution: adenine (ADE), cytosine (CYT), guanine
(GUA) or uracil (URA). In all simulations, a single amino
acid or a sidechain analog (corresponding to an amino acid
residue with a hydrogen atom added to Cβ) was centered in
a cubic box of initial size 4 × 4 × 4 nm³ with nucleobases
and water molecules placed in random orientations around
it so as to achieve the molar fraction of water of 0.86. Con-
sidering the low water solubility of naturally occurring nu-
cleobases, this molar fraction of water was chosen in order
to reach a compromise between maximizing the probabil-
ity of detecting interaction between amino acids (i.e. their
sidechain analogs) and nucleobases on the one hand and
minimizing nucleobase solubility issues on the other. In to-
tal, there were approximately 1250 molecules in each sys-
tem: one amino acid or sidechain analog, 170 nucleobase
molecules and the rest water molecules (see Supplementary
Table S1 for details). All amino acids were simulated in
their zwitterionic form. In the case of charged amino acids
or sidechain analogs, one randomly chosen water molecule
was replaced by a counter ion (Na⁺ or Cl⁻) in order to obtain
an electrically neutral system.
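The composition quoted above can be checked with a few lines of arithmetic. The sketch below uses only the approximate counts given in the text (about 1250 molecules, one solute, 170 nucleobases); the exact per-system counts are in Supplementary Table S1, so the rounded values here are illustrative rather than definitive.

```python
# Approximate per-system composition taken from the text; exact numbers
# differ slightly between systems (see Supplementary Table S1).
n_total = 1250        # approximate number of molecules per system
n_solute = 1          # one amino acid or sidechain analog
n_nucleobase = 170    # nucleobase molecules

# Everything that is neither solute nor nucleobase is water.
n_water = n_total - n_solute - n_nucleobase

x_water = n_water / n_total
x_nucleobase = n_nucleobase / n_total

print(f"water molecules:              {n_water}")
print(f"molar fraction of water:      {x_water:.2f}")
print(f"molar fraction of nucleobase: {x_nucleobase:.2f}")
```

With these rounded inputs the water fraction comes out close to the target of 0.86 and the nucleobase fraction close to 0.14.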
Figure 1. (A) A typical snapshot from the simulation with a single amino acid sidechain analog (Leu) in CYT/water mixture. (B) CYT/CYT, CYT/water
and water/water radial distribution functions in Leu simulations (top) with the corresponding Kirkwood–Buff integrals (bottom). The final value for the
integrals was taken as the average over the approximately constant window denoted by the arrow.
All simulations were carried out using the Gromacs 4.5.1
simulation package (41), the united-atom GROMOS 53A6
force field (40) and the SPC/E water model (42) with a 2 fs
integration step. Parameters for the nucleobases were ob-
tained from those corresponding to full nucleotides in the
GROMOS 53A6 force field while ensuring charge neutrality.
Long-range electrostatic interactions were treated using
Particle Mesh Ewald (PME) summation with a grid spac-
ing of 0.12 nm and an interpolation order of 4. The cut-off
for short-range Coulombic and van der Waals interactions
was set to 0.9 nm. The temperature and pressure in all simulations
were kept at 300 K and 1 bar using the V-rescale thermostat
($\tau_T$ = 0.1 ps) (43) and the Parrinello-Rahman barostat
($\tau_p$ = 2 ps and compressibility = 4.5 × 10⁻⁵ bar⁻¹) (44),
respectively. After minimization using the steepest descent
algorithm in water (10 000–25 000 steps), the systems were
first equilibrated in the NVT ensemble for 800 ps and then
subjected to 400 ps of equilibration in the NPT ensemble
with the same position restraints placed on the amino acid
or sidechain analog. All production runs, each 100 ns long,
were performed in the NPT ensemble for a total of 15.2 μs
of simulated time over all systems.
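The aggregate sampling follows directly from the number of simulated systems. As a quick check, assuming every solute (20 amino acids plus 18 sidechain analogs) was run once with each of the four nucleobases:

```python
n_amino_acids = 20
n_sidechain_analogs = 18   # all residues except Gly and Pro
n_nucleobases = 4          # ADE, CYT, GUA, URA
run_length_ns = 100        # length of each production run

# One production run per (solute, nucleobase) pair is assumed here.
n_systems = (n_amino_acids + n_sidechain_analogs) * n_nucleobases
total_us = n_systems * run_length_ns / 1000.0

print(f"{n_systems} systems, {total_us:.1f} microseconds in total")
```

This gives 152 systems and 15.2 μs, consistent with the total quoted above.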
In order to test if systems with naturally occurring nu-
cleobases are microscopically stable, we have analyzed the
values of the first derivative of the natural logarithm of the activity
of the nucleobase, $\ln a_N$, with respect to the natural logarithm
of the nucleobase molar fraction, $\ln X_N$, in the simulated
systems (45–47). The quantity:
$$\frac{\partial \ln a_N}{\partial \ln X_N} = \frac{1}{1 + \rho_N X_N \left(G_{NN} + G_{WW} - 2G_{NW}\right)} \tag{1}$$
where $G_{NN}$, $G_{NW}$ and $G_{WW}$ denote Kirkwood–Buff integrals
derived from nucleobase/nucleobase, nucleobase/water
and water/water radial distribution functions (RDFs), respectively,
must be positive for a system to be microscopically
stable. Here, $\rho_N$ stands for the nucleobase number density.
The Kirkwood–Buff integrals were calculated using the
following formula (45–47):
$$G_{SS}(r) = 4\pi \int_0^r r'^2 \left[g_{SS}(r') - 1\right] dr' \tag{2}$$
where $g_{SS}$ denotes nucleobase/nucleobase ($g_{NN}$),
nucleobase/water ($g_{NW}$) or water/water ($g_{WW}$) RDFs,
respectively, from which the corresponding Kirkwood–Buff
integrals ($G_{NN}$, $G_{NW}$, $G_{WW}$) were derived. As anchor points
for RDFs, we used centers of mass of nucleobases and
water molecules. Finally, as representative examples for
each individual nucleobase type, we have performed the
above analysis on simulated systems with Leu residues and
the results are reported in Figure 1B and Supplementary
Table S2.
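To make the stability test concrete, the sketch below integrates a tabulated RDF according to Equation (2) and then evaluates the right-hand side of Equation (1) as it is written in the text. It is a minimal illustration rather than the analysis code used in the study: the function names, the trapezoid-rule integration and all numerical inputs (the toy RDF, the density and the integral values) are assumptions.

```python
import numpy as np

def running_kb_integral(r, g):
    """Running integral G(r) = 4*pi * int_0^r r'^2 [g(r') - 1] dr' (trapezoid rule)."""
    integrand = 4.0 * np.pi * r**2 * (g - 1.0)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)
    return np.concatenate(([0.0], np.cumsum(steps)))

def plateau_value(r, G, r_min):
    """Average the running integral over r >= r_min, where it is roughly constant."""
    return float(G[r >= r_min].mean())

def dlna_dlnx(rho_n, x_n, g_nn, g_ww, g_nw):
    """Right-hand side of Equation (1) as written; positive values indicate stability."""
    return 1.0 / (1.0 + rho_n * x_n * (g_nn + g_ww - 2.0 * g_nw))

# Toy RDF (hypothetical): an exponentially decaying excess over the ideal value of 1.
r = np.linspace(0.0, 2.0, 2001)                 # distance in nm
g = 1.0 + 0.5 * np.exp(-r / 0.1)
G = running_kb_integral(r, g)
G_plateau = plateau_value(r, G, r_min=1.5)      # nm^3

# Hypothetical inputs: density in nm^-3, Kirkwood-Buff integrals in nm^3.
stability = dlna_dlnx(rho_n=2.0, x_n=0.14, g_nn=0.5, g_ww=-0.02, g_nw=-0.1)
print(G_plateau, stability)
```

Averaging the running integral over a window at large distances smooths out the noise that tabulated RDFs from finite simulation boxes typically show at long range.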
To quantify the interaction propensity of amino acids or
sidechain analogs for different nucleobases, we have ana-
lyzed their behavior in the simulated mixtures both struc-
turally and energetically. For simplicity, we describe the pro-
cedure for sidechain analogs only, but analogous calcula-
tions were also performed for all amino acid-containing
systems. For structural analysis, RDFs were calculated by
using as anchor points the centers of mass of amino acid
sidechain analogs, nucleobases and water molecules. For
energetic analysis, we have calculated differences between
the total force-field potential energies corresponding to
sidechain analog-nucleobase ($E_{X\text{-}N}$) and sidechain analog-water
interactions ($E_{X\text{-}W}$):
$$\Delta E^{NW}_X = E_{X\text{-}N} - E_{X\text{-}W} \quad [\text{kJ/mol}]. \tag{3}$$
Moreover, the obtained differences in potential energy
between sidechain analog-nucleobase and sidechain analog-water
interactions ($\Delta E^{NW}_X$) were further subtracted between systems with
different nucleobases ($N_1$, $N_2$) in order to obtain relative interaction
propensities or preferences of each sidechain analog
for a specific nucleobase with respect to other nucleobases
($\Delta\Delta E^{N_1N_2}_X$):
$$\Delta\Delta E^{N_1N_2}_X = \Delta E^{N_1W}_X - \Delta E^{N_2W}_X \quad [\text{kJ/mol}]. \tag{4}$$
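Equations (3) and (4) amount to two successive subtractions, which the following sketch spells out. The energy values are hypothetical placeholders chosen for illustration and are not taken from the simulations; the function and variable names are likewise assumptions.

```python
def delta_e(e_x_n, e_x_w):
    """Equation (3): solute-nucleobase minus solute-water interaction energy (kJ/mol)."""
    return e_x_n - e_x_w

def delta_delta_e(de_n1, de_n2):
    """Equation (4): preference for nucleobase N1 relative to N2 (kJ/mol)."""
    return de_n1 - de_n2

# Hypothetical average interaction energies (kJ/mol) for one sidechain analog.
energies = {
    "GUA": {"E_XN": -120.0, "E_XW": -60.0},
    "CYT": {"E_XN": -40.0, "E_XW": -80.0},
}

de = {base: delta_e(v["E_XN"], v["E_XW"]) for base, v in energies.items()}
dde_gua_cyt = delta_delta_e(de["GUA"], de["CYT"])

print(de, dde_gua_cyt)
```

A negative relative value means that the solute favors the first nucleobase over the second, under the convention that more negative energies are more favorable.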
In a related study (M. Hajnic, J. I. Osorio and B. Za-
grovic, unpublished data), we have simulated amino acids in
the presence of only one type of nitrogenous base (unsubsti-
tuted pyrimidines or purines) in water solution as here, but
also in mixed systems with both nitrogenous bases (unsub-
stituted pyrimidines and purines) present at the same time.
The relative amino acids’ interaction propensities derived
from mixed systems where both bases were present at the
same time and those derived from differences between in-
dividual systems correlate with each other with a Pearson
correlation coefficient R = 0.98. This suggests that one can
obtain relative interaction propensities of amino acids for
different nucleobases from individual interaction propen-
sities derived from systems with only one nucleobase type
present.
To be able to compare systems with slightly different mo-
lar compositions, the calculated potential energies between
sidechain analog and water molecules were rescaled before
obtaining the interaction propensity scale in order to have
all systems correspond to exactly 0.86 molar fraction water.
When rescaling, we implicitly assumed that the few addi-
tional water molecules behave on average in the same way
as the rest of the water molecules in the system and con-
tribute to the overall sidechain analog-water potential en-
ergy proportionally to their number. Analogous structural
and energetic analysis was performed for systems contain-
ing amino acids with interaction energies evaluated over all
amino acid atoms. Amino acid and sidechain analog inter-
action propensity scales are given in Supplementary Table
S7 in units of kJ/mol. Note, however, that the exact ener-
getic values given in our scales depend strongly on the par-
ticular features of simulated systems (such as molar fraction
of water or nucleobases), and as such should primarily be
considered and analyzed in a relative sense.
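One plausible reading of the rescaling step is a simple proportional correction, sketched below. The text states only that water molecules are assumed to contribute to the solute-water energy in proportion to their number, so the specific formula, the inclusion of the solute in the total molecule count and the function name are assumptions made for illustration.

```python
def rescale_water_energy(e_x_w, n_water, n_nucleobase, n_solute=1,
                         x_water_target=0.86):
    """Scale the solute-water energy linearly to the target water molar fraction.

    Assumes each water molecule contributes equally on average, so the energy
    is proportional to the number of water molecules.
    """
    n_other = n_nucleobase + n_solute
    # Water count at which n_w / (n_w + n_other) equals the target fraction.
    n_water_target = x_water_target * n_other / (1.0 - x_water_target)
    return e_x_w * n_water_target / n_water

# Hypothetical example: a system with slightly too many water molecules.
e_rescaled = rescale_water_energy(e_x_w=-80.0, n_water=1079, n_nucleobase=170)
print(round(e_rescaled, 2))
```

A system that already sits at the target fraction is left unchanged by this correction, while one with excess water has the magnitude of its solute-water energy reduced slightly.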
The obtained scales were used as described in Hlevnjak
et al. (36) in order to assess the correlation between pro-
tein interaction propensities for different nucleobases and
the nucleobase content of their cognate mRNAs over the
complete Homo sapiens, Escherichia coli and Methanocaldo-
coccus jannaschii proteomes. In the case of sidechain analog
interaction propensity scales, glycines and prolines were ig-
nored on the protein side together with their codons on the
side of mRNA. The sequence datasets were extracted from
the UniProtKB database (April 2013 release) as described
previously (36,48). Window-averaged profiles of individual
mRNAs and proteins were calculated in the same way as
reported previously (36), where each position in the profile
corresponds to the average value of the property in question
over a window (with the size of 21 residues for proteins and
63 bases for mRNAs) centered at that position. As shown
before (36), for window sizes anywhere between 10 and 40
residues, the results depend only marginally on window size
(variation < 2%).
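The profile comparison described above can be sketched as follows. This is a hedged illustration, not the published pipeline of Hlevnjak et al. (36): keeping only complete windows, sampling the mRNA profile once per codon to align it with the protein profile, and the toy two-residue scale are all assumptions.

```python
import numpy as np

def window_profile(values, window):
    """Moving average over complete windows only (edge handling is an assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

def protein_profile(protein, scale, window=21):
    """Window-averaged interaction propensity along a protein sequence."""
    return window_profile([scale[aa] for aa in protein], window)

def mrna_pyrimidine_profile(mrna, window=63):
    """Window-averaged pyrimidine density, sampled once per codon (assumption)."""
    is_pyrimidine = [1.0 if base in "CU" else 0.0 for base in mrna]
    return window_profile(is_pyrimidine, window)[::3]

def pearson_r(x, y):
    return float(np.corrcoef(x, y)[0, 1])

# Toy input: a hypothetical two-letter scale and a matching toy coding sequence.
toy_scale = {"F": 1.0, "K": 0.0}        # hypothetical values, not from the paper
toy_codons = {"F": "UUU", "K": "AAA"}
protein = "F" * 10 + "K" * 15 + "F" * 5 + "K" * 20
mrna = "".join(toy_codons[aa] for aa in protein)

p_prof = protein_profile(protein, toy_scale)
m_prof = mrna_pyrimidine_profile(mrna)
r_value = pearson_r(p_prof, m_prof)
print(len(p_prof), len(m_prof), round(r_value, 3))
```

In this toy case the two profiles coincide by construction, because a 63-base window covers exactly the 21 codons of the corresponding protein window. In a proteome-wide analysis, one such R value per protein would be collected and the median of the resulting distribution reported.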
To test the significance of median values of profile-matching
Pearson R distributions calculated for complete
proteomes, we generated 10⁶ random scales and compared
the medians of their profile-matching Pearson R distributions
to the tested ones for each individual proteome. Random
scales were generated by drawing numbers from a uniform
distribution between 0 and 1. Finally, the P-values
were calculated as the fraction of random scales whose medians
of the profile-matching Pearson R distributions were
greater than or equal to the tested ones in absolute value.
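The randomization test can be outlined in a few lines. In the sketch below, the step that turns each random scale into a median Pearson R over a proteome is omitted; `random_medians` stands in for that output and its values are hypothetical, as are the function names and the default of 20 residue types (18 would apply to sidechain analog scales).

```python
import numpy as np

def random_scales(n_scales, n_residue_types=20, seed=0):
    """Draw random scales with one value per residue type from U(0, 1)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(n_scales, n_residue_types))

def empirical_p_value(tested_median, random_medians):
    """Fraction of random scales whose |median R| is at least the tested |median R|."""
    random_medians = np.asarray(random_medians, dtype=float)
    return float(np.mean(np.abs(random_medians) >= abs(tested_median)))

scales = random_scales(1000)

# Hypothetical medians obtained from four random scales, for illustration only.
p_value = empirical_p_value(-0.70, [-0.1, 0.2, -0.75, 0.05])
print(scales.shape, p_value)
```

Comparing absolute values makes the test two-sided, so strongly positive and strongly negative medians from random scales both count against the tested scale.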
RESULTS
Validation and analysis of binding propensity scales
Natural nucleobases have low water solubility (49), ranging
from 1.04 g/l for ADE to 8 g/l for CYT, corresponding
to base molar fractions of $X_{ADE}$ = 1 × 10⁻⁴ and $X_{CYT}$ =
1 × 10⁻³, respectively. In order to: (i) realistically model
nucleobase density at typical RNA–protein interfaces and
(ii) reach a critical number of nucleobases that would al-
low us to observe interactions with amino acids or their
sidechain analogs on a reasonable timescale, we have simulated
systems whose nucleobase concentrations were significantly
higher than their macroscopic solubility levels
(e.g. $X_N$ = 0.14). Practically speaking, we have simulated
the behavior of amino acids and their sidechain analogs
in hydrated, dynamic agglomerates of nucleobases as illus-
trated in Figure 1A for the Leu sidechain in CYT solution.
While such systems, in fact, better approximate the effec-
tive concentration of nucleobases at typical RNA–protein
interfaces, it was critical to rst assess their thermodynamic
stability at the microscopic level.
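The conversion from solubility to molar fraction quoted at the start of this section can be reproduced approximately as follows. The sketch assumes a dilute solution containing about 1 kg of water per litre and uses standard molar masses for adenine and cytosine; neither assumption is stated in the text.

```python
M_WATER = 18.015         # g/mol
WATER_G_PER_L = 1000.0   # assumption: roughly 1 kg of water per litre of dilute solution

def mole_fraction(solubility_g_per_l, molar_mass):
    """Mole fraction of a sparingly soluble solute in water."""
    n_solute = solubility_g_per_l / molar_mass
    n_water = WATER_G_PER_L / M_WATER
    return n_solute / (n_solute + n_water)

x_ade = mole_fraction(1.04, 135.13)   # adenine, C5H5N5
x_cyt = mole_fraction(8.0, 111.10)    # cytosine, C4H5N3O

print(f"X_ADE = {x_ade:.1e}, X_CYT = {x_cyt:.1e}")
```

Both values land at the orders of magnitude quoted above, which puts the simulated nucleobase fraction of about 0.14 roughly two to three orders of magnitude above the solubility limit.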
The stability of a binary, high-concentration mixture of
water and nucleobases can be studied by analyzing the first
derivative of the natural logarithm of the activity of the nucleobase
with respect to the natural logarithm of the nucleobase
mole fraction, $\partial \ln a_N / \partial \ln X_N$ (45–47). This value,
which should be positive for systems to be microscopically
stable, was calculated from Equation (1), where $G_{NN}$, $G_{NW}$
and $G_{WW}$ denote Kirkwood–Buff integrals derived from
nucleobase/nucleobase, nucleobase/water and water/water
RDFs, respectively (45–47). A typical set of such RDFs
encountered in our simulations is given in Figure 1B (top
panel) for the Leu sidechain in CYT solution. Importantly,
due to the poor convergence of Kirkwood–Buff integrals,
as an estimate of $G_{SS}$, in all cases we took the average of
$G_{SS}$ over distances starting from 1.5 nm (Figure 1B, lower
panel). Following the above procedure, we could indeed
show that the above requirement (i.e. $\partial \ln a_N / \partial \ln X_N > 0$)
is fulfilled for all four nucleobase types (Supplementary Table
S2). This suggests that although our systems would over
long timescales likely result in a creation of macroscopic
aggregates, they are thermodynamically stable on the size-
and time scales examined here and could be used as model
systems to study the behavior of amino acids and their
sidechain analogs in aqueous solutions of nucleobases. The
fact that despite high nucleobase concentrations we did not
observe formation of any static precipitates further corrob-
orates this claim.
We have used our simulations to calculate differences between
the total force-field potential energies corresponding
to amino acid–nucleobase and amino acid–water interactions
(and the same for sidechains). How do these
energy-based interaction propensity scales compare with
experimental results? The experimental PR scale (8), de-
rived by analyzing the chromatographic mobility of amino
acids in water mixtures of substituted pyridines such as
dimethylpyridine (DMP), is one of the few examples where
interactions between amino acids and nitrogenous bases
have been systematically explored in experiment. Specifically,
PR of a given amino acid was defined as the slope of a
linear fit between the logarithm of its retention coefficient R
and the logarithm of mole fraction of water in the pyridine–
water solvent. In a related study (M. Hajnic, J. I. Osorio
and B. Zagrovic, unpublished data), we have performed MD
simulations of amino acids and their sidechain analogs in
water/DMP mixtures using the same setup as here. The
energy-based DMP/amino acid and DMP/sidechain ana-
log interaction propensity scales derived from MD agree
closely with the experimental PR scale (8) with Pearson R
coefficients of 0.93 and 0.95, respectively, attesting to the
general quality of our simulation methodology (M. Hajnic,
J. I. Osorio and B. Zagrovic, unpublished data).
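The definition of PR given above, a slope in a log-log plot, can be illustrated with a short fit. The data below are synthetic and generated from a power law with a known exponent, so they say nothing about real chromatographic measurements; the function name and the sign convention of the slope are assumptions.

```python
import numpy as np

def polar_requirement(x_water, retention):
    """Slope of a linear fit of log(retention coefficient) against log(water mole fraction)."""
    slope, _intercept = np.polyfit(np.log10(x_water), np.log10(retention), 1)
    return float(slope)

# Synthetic data (hypothetical): a power law with exponent 7.
x_water = np.array([0.5, 0.6, 0.7, 0.8])
retention = 2.0 * x_water**7.0

pr_value = polar_requirement(x_water, retention)
print(round(pr_value, 3))
```

Because only the slope is used, the base of the logarithm and any constant prefactor in the retention coefficient do not affect the result.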
Remarkably, the experimental PR scale (8) also exhibits
close correlation with the energy-based amino acid inter-
action propensity scales derived here for URA (Pearson
R = 0.89), ADE (R = 0.84) and CYT (R = 0.77), with
a significantly weaker correlation observed for GUA (R =
0.30) (Figure 2A, inset table, third column). What is more,
all of these correlations against the experimental PR scale
improve even further if one uses sidechain analog scales
instead (Figure 2A, inset table, second column), with the
URA interaction propensity scale exhibiting the strongest
correlation (R = 0.94), followed by ADE (R = 0.93), CYT
(R = 0.86) and, finally, GUA (R = 0.58). In Figure 2A,
we plot the sidechain analog scale for URA, a nucleobase
which is physicochemically and sterically most similar to
DMP, against the experimental PR scale (8) and the two
exhibit remarkable similarity. Although the experimental
PR scale and the computational URA, ADE and CYT
scales were derived in very different ways, the close agreement
between them can be taken as evidence of the quality of
the MD force field and the general computational methodology
used. Moreover, such agreement also suggests that
when it comes to capturing nucleobase/amino acid interaction
specificity, DMP is actually a good model not only for
naturally occurring pyrimidine bases URA and CYT, but
also for the purine ADE.
When we compare our sidechain analog interaction
propensities for GUA with the only analogous, exten-
sive scale available from experiment, that of amino acid–
guanosine binding constants for eight amino acids (Ser, Thr,
Val, Leu, Met, Lys, Phe and Trp) (10), we obtain a Spearman
rank-order correlation coefficient of ρ = −0.83 (Figure 2B)
and a direct Pearson correlation coefficient of R =
0.79 when the association constants are converted to binding
free energies (Figure 2B, inset). Interestingly, in our simulations
we not only correctly capture the relative interaction
propensities of aromatic sidechain analogs for GUA,
but we also observe the same propensity trends as in the experiment
for the relatively similar residues such as Ser and
Thr or Val and Leu. What is more, if one excludes the outlier
Lys, the rank correlation strengthens to ρ = −0.96. On the
other hand, the level of correlation drops significantly if one
uses the computational scale for amino acids, here also including
the value for Gly (ρ = −0.62 and R = 0.46). Finally,
the experimentally derived binding free energies of four
amino acids for adenosine (Val, Lys, Phe, Trp) (ρ = −0.80
and R = 0.52) and two for cytidine (Phe, Trp) (10) show
the same trend as observed in our sidechain analog interaction
propensity scales for the equivalent bases, with similar
results for amino acid scales (ρ = −0.80 and R = 0.50
for the adenosine case). Overall, a combination of the above
thermodynamic stability analysis and the favorable compar-
ison with experiment reassuringly suggests that the essen-
tial physical chemistry behind amino acid/nucleobase inter-
actions remains approximately the same even at relatively
high nucleobase concentrations as studied here. This fur-
thermore suggests that our simulation-based scales can be
used to greatly extend the limited experimental data avail-
able and characterize interactions with nucleobases for all
amino acids and sidechain analogs. Interestingly, in many
cases, our simulations with sidechain analogs match the
experimental data obtained with amino acids slightly better
than the simulations with amino acids themselves (Figure 2),
a finding we do not currently have a full explanation
for. A part of the reason may be that the GROMOS 53A6
force field was parameterized to match solvation free energies
of sidechain analogs in cyclohexane or water and not
those of complete amino acids. It is possible that a po-
tentially lower accuracy of parameters for complete amino
acids may be responsible for a greater discrepancy from ex-
periment in that case. However, as sidechain analogs cap-
ture the behavior of protein residues at RNA–protein inter-
faces arguably better than the zwitterionic amino acids do,
in the remainder of this text we primarily focus on sidechain
analogs, while always giving the results for amino acids as a
point of comparison.
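The conversion of association constants to binding free energies used in the comparison with experiment follows the standard relation ΔG° = −RT ln(K_a c°), with a reference concentration c° of 1 M. In the sketch below, the temperature of 298.15 K and the example association constants are assumptions, since the experimental conditions are not restated here.

```python
import math

R_GAS = 8.314462618e-3   # gas constant in kJ/(mol K)

def binding_free_energy(k_assoc, temperature=298.15, c_ref=1.0):
    """Standard binding free energy (kJ/mol) from an association constant in 1/M."""
    return -R_GAS * temperature * math.log(k_assoc * c_ref)

# Hypothetical association constants (1/M), for illustration only.
for k_assoc in (1.0, 2.0, 10.0):
    print(k_assoc, round(binding_free_energy(k_assoc), 3))
```

Because the free energy decreases monotonically with the association constant, a scale that correlates positively with binding free energies correlates negatively in rank order with the association constants themselves, which accounts for the opposite signs of the Spearman and Pearson coefficients reported above.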
The above energetic analysis is well illustrated by a
structural exploration using RDFs. In Figure 3, we show
water/sidechain-analog and nucleobase/sidechain-analog
RDFs for the most favorable and the least favorable in-
teracting partners of the four RNA nucleobases, as deter-
mined by the analysis of interaction energies. URA, CYT
and ADE, for example, all exhibit the strongest prefer-
ence for interacting with Trp relative to all other residues,
which is illustrated by the presence of a pronounced first
peak in their nucleobase/Trp RDFs. On the other hand,
in the case of GUA the strongest favorable interactions are
seen for Lys. When it comes to the least favorable interactions,
they are invariably seen with the negatively
charged Glu and Asp. The presence of a well-defined
peak in CYT/Glu, ADE/Asp and GUA/Glu RDFs, however,
suggests that, although unfavorable, some of these interactions
do exhibit a sizable level of structural organization.
Nonetheless, for all energetically unfavorable interactions,
it is clear that the residues in question prefer to interact
with and be surrounded by water molecules, as indicated
by strong, well-defined first peaks in the respective
water/sidechain-analog RDFs.
As discussed above, the GUA-based interaction energy
scales differ most from all other scales. When correlating
the individual scales against each other, we indeed
find that the GUA scale deviates most from other scales,
which is primarily due to the behavior of charged sidechain
analogs (Figure 4A). Namely, in the GUA/water mixture,
Lys and Arg exhibit lower interaction energies with GUA
than with water molecules, which is not the case in any
Figure 2. (A) Correlation between the experimentally derived polar requirement (PR_experiment) scale (8) and the energy-based scale of sidechain analog
interaction propensities for URA (in kJ/mol) obtained by simulation. Inset: Pearson correlation coefficients R between all sidechain analog (second
column) and amino acid (third column) propensity scales and the PR scale. (B) Rank-order correlation between experimentally measured amino acid–
guanosine association constants (10), and the computationally derived sidechain analog interaction energy scale for GUA (in kJ/mol). Inset: correlation
between binding free energies (in kJ/mol) at the standard reference concentration of 1 M, as derived from association constants, and the computationally
obtained sidechain analog interaction energy scale for GUA (in kJ/mol).
Figure 3. Water/sidechain-analog and nucleobase/sidechain-analog ra-
dial distribution functions g(r) for the most favorable (left column) and the
least favorable (right column) sidechain analog interacting partners for the
four RNA nucleobases, as determined by energy-based interaction propensity
scales for: (A) URA, (B) CYT, (C) ADE and (D) GUA.
other nucleobase/water systems except for the CYT/Arg
system (Figure 4A). Furthermore, Asp and Glu also exhibit
significantly more favorable interaction energies with GUA
as compared to other energy-based interaction propensity
scales (Figure 4A). Although in absolute terms these two
anionic sidechains do not interact favorably with GUA (i.e.
they exhibit positive energies), the extent of this unfavorable
bias is the least as compared to other bases (Figure 4A).
Similar results are also seen in the simulations with com-
plete amino acids (data not shown).
A particularly telling comparison in this regard concerns
the behavior of GUA- and ADE-based scales. If one, for ex-
ample, examines relative energy-based interaction propen-
sity scales, one observes a remarkable asymmetry in the
behavior of GUA and ADE (Supplementary Figure S1).
In particular, the relative ADE–CYT scale is strongly inversely
correlated with those involving GUA (GUA–CYT,
R = −0.84 and GUA–URA, R = −0.95) with no significant
correlations or anti-correlations for the ADE–URA relative
scale (Supplementary Figure S1). In Figure 4B, we illustrate
this difference in the case of GUA–CYT and ADE–CYT
relative scales and it is clear that the effect is completely
due to the nature of the interactions of the charged residues
with ADE and GUA relative to that with CYT. While
GUA strongly prefers to interact with Lys, Arg, Asp and
Glu as compared to CYT (with, for example, E
sca
GUA
–CYT
of cca. −100 kJ/mol in the case of Lys), ADE almost
equally strongly prefers not to interact with these residues
(with E
sca
ADE
–CYT
of cca. 100 kJ/mol in the case of Lys)
again as compared to CYT (Figure 4B). This effect clearly
demonstrates the paramount importance of specic ring
substituents especially in the case of purine bases, which
was already observed in our analysis of knowledge-based
nucleobase-residue interaction propensity scales (23). Inter-
estingly, while the sidechain analog scale derived presently
for ADE correlates reasonably well with the equivalent
knowledge-based scale (Spearman = 0.57 for the 2+ scale
from Polyansky et al. (23)), the correlations for all other
scales including the GUA scale are signicantly weaker (||
< 0.2) (Supplementary Table S3).
A similar trend is also seen with amino acid interaction propensity scales (Supplementary Table S3). On the other hand, the relative scales of GUA derived presently for sidechain analogs agree somewhat better with those derived in the knowledge-based analysis (23) with, for example, GUA–URA and GUA–CYT correlating with Spearman ρ of 0.40 and 0.38, respectively (Supplementary Table S4). While these correlations between complete scales are relatively weak, it is important to mention that they agree much better when it comes to the relative placement of
Figure 4. (A) A direct comparison between energy-based sidechain analog interaction propensity scales for the four nucleobases with Pearson correlation coefficients given in the graphs. In each graph, the four charged amino acids are labeled in red. (B) Correlation between relative energy-based sidechain analog GUA–CYT and ADE–CYT interaction propensity scales (in kJ/mol) derived from simulations of different systems.
charged residues only, which is in the end chiefly responsible for the qualitative similarities between the scales, as discussed below.
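To make the comparison of relative scales concrete, the following minimal Python sketch assumes that a relative scale such as GUA–CYT is the per-residue difference between two base-specific energy scales (an assumption made here for illustration; the definition actually used is given in the Methods). All energies are hypothetical placeholders chosen only to mimic the qualitative pattern described above, not simulation results, and the helper names `relative_scale` and `pearson` are likewise illustrative.

```python
from math import sqrt

def relative_scale(scale_a, scale_b):
    """Per-residue difference scale_a - scale_b over the residues both scales share."""
    return {res: scale_a[res] - scale_b[res] for res in scale_a if res in scale_b}

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical placeholder energies in kJ/mol (NOT values from the paper).
gua = {"Lys": -60.0, "Asp": 20.0, "Leu": -10.0, "Phe": -25.0}
ade = {"Lys": 140.0, "Asp": 150.0, "Leu": -12.0, "Phe": -30.0}
cyt = {"Lys": 40.0, "Asp": 90.0, "Leu": -8.0, "Phe": -20.0}

gua_cyt = relative_scale(gua, cyt)  # negative value: residue prefers GUA over CYT
ade_cyt = relative_scale(ade, cyt)  # positive value: residue prefers CYT over ADE

residues = sorted(gua_cyt)
r = pearson([gua_cyt[k] for k in residues], [ade_cyt[k] for k in residues])
print(f"Pearson R between GUA-CYT and ADE-CYT relative scales: {r:.2f}")
```

In this toy data set the charged residues dominate both difference scales with opposite signs, which is enough to drive a strongly negative correlation even though the uncharged residues behave almost identically.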
Analysis of the mRNA-cognate protein complementarity hypothesis
We have used the obtained scales to study the relationship between the nucleobase content of mRNA coding sequences and the nucleobase interaction propensities of their cognate protein sequences for the entire H. sapiens, M. jannaschii and E. coli proteomes. We have performed this analysis by comparing window-averaged sequence profiles of the two cognate biopolymers as elaborated before (23,36,37), whereby one obtains a Pearson R for each cognate pair, i.e. a distribution of Pearson Rs over the whole proteome. Note that negative correlations here denote a positive relationship between nucleobase content and interaction propensity, which comes from the fact that propensity is defined
Figure 5. Distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases (URA, CYT, ADE and GUA) assessed using computationally derived sidechain analog scales. Inset: median values of distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases, calculated over the entire human proteome (sidechain analogs, ‘sca,’ and complete amino acids, ‘aa’).
Figure 6. (A) Distributions of Pearson correlation coefficients between window-averaged PUR content profiles of mRNAs and their cognate proteins’ profiles of relative interaction propensity for different combinations of RNA nucleobases, calculated over the entire human proteome. The propensities were obtained from the energetic analysis of different systems from MD simulations. (B) Typical profiles of mRNA PUR content and protein sequence interaction propensity calculated using the computationally derived sidechain analog GUA–CYT and ADE–CYT relative interaction propensity scales. The two examples were chosen because their Pearson R coefficients correspond to the medians over the respective distributions over the complete human proteome.
using an energy scale (the lower the energy, the higher the propensity). Our results show that PYR density profiles of mRNAs quantitatively match the energy-based URA-, CYT- and ADE-interaction propensity profiles of their cognate protein sequences across the entire human proteome, with no significant correlation being observed for GUA scales, as demonstrated for H. sapiens in Figure 5. More specifically, the median correlation coefficients for URA, CYT and ADE sidechain-based scales are −0.68, −0.52 and −0.70, respectively, while for the GUA scale this value drops to −0.11 (Figure 5, inset). For M. jannaschii, the median correlation coefficients of mRNA–protein pairs are as high as those observed for the human proteome or higher, while for E. coli the values are slightly lower, but still statistically significant (Supplementary Figures S2A and S3A). Similar values are also seen for amino acid-based scales. In other words, PYR-rich regions in mRNAs tend to code for regions in their cognate proteins that exhibit more favorable interaction energies with URA, CYT and ADE relative to water as compared to the PUR-rich regions. Interestingly, the correlation coefficients obtained for mRNA density profiles of individual bases are significantly weaker (Supplementary Table S5), as was already observed before (23,36).
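The profile comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol of refs (23,36,37): the mRNA profile is taken as the pyrimidine fraction per codon so that it aligns one-to-one with residues, the window length of three positions is arbitrary, and `toy_scale` contains hypothetical placeholder energies rather than the derived scales. The function names are likewise illustrative.

```python
from math import sqrt
from statistics import median

PYRIMIDINES = {"C", "U"}

def codon_pyr_fraction(mrna):
    """Pyrimidine fraction of each complete codon (one value per residue)."""
    usable = len(mrna) - len(mrna) % 3
    return [sum(base in PYRIMIDINES for base in mrna[i:i + 3]) / 3
            for i in range(0, usable, 3)]

def window_average(values, window):
    """Sliding mean over `window` consecutive positions."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def profile_correlation(mrna, protein, scale, window):
    """Pearson R between window-averaged PYR content and propensity profiles."""
    pyr_profile = window_average(codon_pyr_fraction(mrna), window)
    propensity_profile = window_average([scale[aa] for aa in protein], window)
    return pearson(pyr_profile, propensity_profile)

# Hypothetical energy-like scale in kJ/mol (lower = higher propensity); NOT from the paper.
toy_scale = {"F": -30.0, "P": -20.0, "K": 50.0, "G": 5.0, "E": 60.0, "S": -10.0}

# Toy cognate pairs: each codon encodes the residue at the same position.
pairs = [
    ("UUUCCCAAAGGGGAAUCU", "FPKGES"),
    ("GAAAAAUCUCCCUUUGGG", "EKSPFG"),
]

correlations = [profile_correlation(m, p, toy_scale, window=3) for m, p in pairs]
print("per-pair R:", [round(r, 2) for r in correlations])
print("median R:", round(median(correlations), 2))
```

Because the scale is an energy, a negative R in this sketch corresponds to PYR-rich stretches coding for residues with higher interaction propensity, which is the sign convention noted in the text. Summarizing the per-pair values by their median mirrors how the proteome-wide distributions are reported.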
As mentioned above, the GUA scale does not yield any significant correlation with PYR, i.e. PUR content on the side of mRNA. However, analysis of relative scales reveals that mRNA PUR density profiles closely and quantitatively match their cognate protein profiles capturing the relative preference of residues to interact with GUA relative to all other nucleobases. For example, protein sequence profiles of relative GUA–CYT binding propensities match their cognate mRNA PUR density profiles with the median Pearson R = −0.69 (P-value = $1 \times 10^{-3}$) over the entire human proteome (Table 1). In other words, one half of all mRNA-cognate protein pairs in the human proteome display profile matching that is equal to or better than that of the median representative, Pleckstrin (P08567), shown in Figure 6B. Interestingly, though, much weaker correlation is seen if instead of PUR content, which is up to a constant equivalent to PUR–PYR content, one here analyzes GUA–CYT content along mRNA (median R = −0.48). Similar results are also obtained for the relative GUA–URA and GUA–ADE scales (P-values $3 \times 10^{-4}$ and $4 \times 10^{-4}$, respectively) (Table 1). On the other hand, the ADE–CYT scale results in a similar level of matching, but now when it comes to mRNA PYR density. In Figure 6B, we illustrate this in the case of the median representative protein Pleckstrin (P08567) and its mRNA. Interestingly, the ADE–URA scale exhibits no significant correlation whatsoever, while the CYT–URA scale exhibits a significant level of matching with mRNA PUR density profiles (P-value = $4 \times 10^{-4}$) (Table 1). Again, we observe the same trend as with non-relative interaction propensity scales (Supplementary Table S5): the correlation coefficients for mRNA density profiles of individual bases are weaker than those for the mRNA PUR density profiles (Supplementary Table S6). The same trends seen for the human proteome extend to other organisms as well (Supplementary Figures S2B and C and S3B and C). As the main reason for the matching detected in our analysis is the genetic code, which is the same for all the organisms studied,
Table 1. Median values of distributions of Pearson correlation coefficients between window-averaged PUR content profiles of mRNA molecules and their cognate proteins’ profiles of relative interaction propensities for nucleobases calculated over the entire human proteome. The interaction propensities were obtained from the energetic analysis of both sidechain (sca) and amino acid (aa) containing systems.
it is not surprising that one obtains similar levels of matching regardless of the organism examined. The differences, on the other hand, can be attributed to the exact mRNA and protein composition in individual proteomes. Finally, qualitatively identical results are obtained for systems with zwitterionic amino acids instead of sidechain analogs (Table 1).
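The remark that PUR content is equivalent, up to a constant, to PUR–PYR content can be made explicit with a short derivation. The symbols below are notation introduced here for illustration only: $p_i$ and $y_i$ are the purine and pyrimidine fractions in window $i$, $d_i$ is their difference, and $s_i$ is the protein propensity profile. The only assumption is that every position in a window is either a purine or a pyrimidine.

```latex
\begin{align}
p_i + y_i &= 1, \\
d_i = p_i - y_i &= 2p_i - 1, \\
R(aX + b,\, S) &= \operatorname{sign}(a)\, R(X, S) \qquad (a \neq 0), \\
R(d, s) = R(p, s), &\qquad R(y, s) = -R(p, s).
\end{align}
```

Since the Pearson coefficient is unchanged by a positive linear rescaling, PUR and PUR–PYR density profiles give identical R, while PYR density profiles give the same magnitude with the opposite sign. Strictly speaking, the equivalence holds up to a linear transformation (a factor of 2 and an offset of 1) rather than an additive constant alone.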
DISCUSSION
In the present study, we have for the first time systematically analyzed the behavior of amino acids and their sidechain analogs in high-concentration aqueous solutions of naturally occurring RNA nucleobases. Our results show that amino acids and their sidechain analogs display highly differentiated interaction propensities for different nucleobases depending on the ring architecture and, even more importantly, ring substituents. It is our hope that these scales will provide: (i) a rigorous, quantitative, physicochemical foundation for rationalizing the specificity in RNA–protein interactions in different contexts, and (ii) a powerful tool for sculpting and modifying such specificity for biomedical and bioengineering purposes.
As discussed above, our simulations were carried out at nucleobase concentration levels exceeding the experimentally known solubility limits. However, a strong agreement with extant experimental data, a general absence of stable aggregates and favorable results of thermodynamic stability analysis all suggest that the simulated model systems do capture the essential features of amino acid/nucleobase interactions even at high concentrations. Moreover, even if the simulated systems were to move over time in the direction of precipitation, the partitioning of amino acids and their sidechain analogs between water- and base-rich fractions occurs much more quickly, allowing one to accurately capture interaction propensities with relatively short simulations. Finally, the number of water molecules in our simulations was such that for each base there was enough water to account for one full hydration shell around it. As such, our simulated systems in all likelihood better approximate the situation at typical hydrated RNA–protein binding interfaces than would more dilute solutions.
Our analysis of energy-based interaction preferences was based on a critical assumption that the potential energies between amino acids or sidechain analogs and nucleobases or water accurately capture the free energies of these interactions. In other words, we assumed that it is primarily the enthalpic part of free energy that is responsible for the relative difference in amino acid–nucleobase interactions, with the entropic component being proportional to it. A similar assumption was made by Stumpe and Grubmüller in their study of amino acid interactions with urea and their influence on protein folding (50). In a related study (A. de Ruiter and B. Zagrovic, in preparation), we have used MD simulations and umbrella sampling to evaluate absolute binding free energies between nucleobases and amino acid sidechain analogs in water. By comparing the sidechain analog interaction propensities derived in this work to the absolute free energies, which fully account for both enthalpy and entropy, we observe a high level of correlation with Spearman correlation coefficients of ρ = 0.88 (URA), ρ = 0.85 (CYT), ρ = 0.62 (GUA) and ρ = 0.88 (ADE) (A. de Ruiter and B. Zagrovic, in preparation). Although the behavior of amino acids or their sidechain analogs in crowded solutions of nucleobases need not necessarily match that with only one nucleobase present, such a high level of correlation does support the existence of a strong relationship between free energies and their enthalpic components in the former case. Finally, the fact that the obtained scales agree well both with experimental results (8,10) and with the structural analysis of intermolecular contacts (23) lends further support to this claim.
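For readers who want to reproduce this kind of scale-to-scale comparison, the sketch below computes a Spearman rank correlation as the Pearson correlation of average ranks, a standard formulation. It is a minimal illustration only: the two value lists are hypothetical placeholders, not the interaction energies or umbrella-sampling free energies referred to above, and the function names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical placeholder values in kJ/mol (NOT data from the paper).
interaction_energies = [-45.0, -30.0, -12.0, 5.0, 60.0]
binding_free_energies = [-9.0, -7.5, -8.0, -1.0, 3.0]

rho = spearman(interaction_energies, binding_free_energies)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is a natural choice here because the assumption being tested is only that potential energies order the residues in the same way as free energies, not that the two quantities are linearly related.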
Here, we have used the derived interaction propensity scales to investigate how the relationships observed at the level of amino acids and nucleobases translate to the level of complete coding sequences of mRNAs and their cognate proteins. Our central aim was to further examine the recently proposed complementarity hypothesis and its relationship with the structure and the origin of the genetic code (23,36,37). In accordance with our results obtained using knowledge-based potentials (23), we have observed that the higher the pyrimidine content of mRNAs, the higher the propensity of their cognate proteins to interact with URA, CYT and ADE, but importantly not with GUA (Figure 5). Actually, the fact that GUA- and ADE-based scales exhibit opposite behavior when it comes to their relationship with PYR-based scales (Supplementary Figure S1) suggests that the key element in determining the specificity of interaction between amino acids and nucleobases is not the nature of the heterocyclic ring, but rather that of ring substituents. In particular, our present results suggest that this difference stems primarily from the behavior of charged amino acids, which is reasonable considering the fact that the two purine bases are largely isosteric and differ primarily when it comes to ring substituents and their charge distribution. This is also supported by a related analysis in which we showed that unsubstituted purine and pyrimidine rings result in highly correlated scales when it comes to their interactions with amino acids (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data).
In support of this reasoning, we have observed a strong relationship between the average PUR content of mRNA sequences and the relative preference of their cognate protein sequences to interact with GUA relative to other bases (Figure 6A, Table 1). In accordance with the stereochemical hypothesis and our generalizations of it (23,36,37), GUA exhibits strong preference for interaction with PUR-coded amino acids relative to all other bases. Importantly, this effect appears to be primarily due to the behavior of charged amino acids Glu, Asp, Arg and Lys. These results are fully consistent with our previous knowledge-based analysis of residue preferences for different nucleobases: there, GUA interaction preferences on the side of amino acids or proteins correlated extremely well with purine density on the side of their cognate codons or mRNA, while ADE interaction preferences were much closer to those of pyrimidine bases (23), as also seen here. While the full biological meaning of this result still remains to be elucidated, we are confident that it represents an important principle concerning the mRNA–protein relationship in general. Overall, our results give support to the generalized stereochemical hypothesis of the origin of the genetic code, in which GUA plays the role of an archetypal purine (i.e. purine richness on the side of codons or mRNAs parallels high levels of relative guanine interaction propensity on the side of cognate amino acids or proteins), while the opposite is seen for CYT, URA and ADE (i.e. pyrimidine richness of mRNA mirrors high relative interaction propensity for these nucleobases when it comes to cognate amino acids or proteins) (23,36,37). In this context, the presence of adenines negatively affects complementarity levels, as discussed before (37). Intriguingly, despite the fact that the propensity scales were derived for specific bases, the highest levels of matching are observed if one considers PUR (i.e. PYR) density on the side of mRNA and not that of individual bases (Supplementary Tables S5 and S6). This effect, which was already observed before (23,36,37), still requires a full explanation. However, we believe it suggests that the core of the genetic code was originally defined at the level of a coarse-grained nucleobase alphabet in which differences between specific purines, or between specific pyrimidines, were not critical.
Our results with specifically Glu and Asp and their interactions with GUA show that the most basic version of the stereochemical hypothesis, the one in which the genetic code evolved on the basis of direct interactions between amino acids and their codons, can at best hold for a subset of amino acids only. In particular, Glu and Asp do not appear to favorably interact with any nucleobases in the aqueous environment, although in our knowledge-based analysis (23) they do show a strong preference for interacting with purine bases and especially GUA. The solution to this seeming paradox is provided by our present results: although the negatively charged Glu and Asp do not show direct preference for binding to GUA, they appear to be the least unfavorable interacting partners for GUA when compared to all other nucleobases. It is very possible that the preferences one sees in the knowledge-based analysis of known protein–RNA complexes are in part a consequence of such a negative selection. One way in which this result could be made consistent with the stereochemical hypothesis and especially its generalized version, even for Glu and Asp, is if one assumes that the genetic code evolved in a context in which the role of the translation apparatus was not to link individual amino acids according to the mRNA template, but rather short peptides. In this scenario, other amino acids would provide the source of favorable binding free energy to the mRNA template, while Glu and Asp would contribute to the specificity of binding only.
Overall, our study shows that by using MD simu-
lations and extensive sampling we can distinguish be-
tween amino acid or sidechain analog interaction propen-
sities for different nucleobases. Remarkably, the interac-
tion propensities derived from simulations of individual
monomers yield close correspondences at the level of com-
plete proteins and mRNA molecules, giving support to
the mRNA/protein complementarity hypothesis as recently
proposed. Although our present results are highly sugges-
tive, it should be nonetheless emphasized that the only rig-
orous test of the complementarity hypothesis can come
from direct experimental work. We hope that our present
results will serve not only as a source of motivation in this
direction, but also as a foundation for different computa-
tional and experimental studies of RNA/protein interac-
tions in general (51–53).
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGMENTS
We thank members of the Laboratory of Computational Biophysics at MFPL for useful advice and critical reading of the manuscript. The authors gratefully acknowledge A. A. Polyansky for help with randomization tests and C. Oostenbrink for helpful advice on Kirkwood-Buff integrals.
FUNDING
Austrian Science Fund FWF [START Y 514-B11 to B.Z., in part]; European Research Council [ERC Starting Independent 279408 to B.Z.]. Funding for open access charge: Austrian Science Fund FWF [START Y 514-B11 to B.Z., in part]; European Research Council [ERC Starting Independent 279408 to B.Z.].
Conflict of interest statement. None declared.
REFERENCES
1. Moore,M.J. and Proudfoot,N.J. (2009) Pre-mRNA processing reaches
back to transcription and ahead to translation. Cell, 136, 688–700.
2. Licatalosi,D.D. and Darnell,R.B. (2010) RNA processing and its
regulation: global insights into biological networks. Nat. Rev. Genet.,
11, 75–87.
3. Müller-McNicoll,M. and Neugebauer,K.M. (2013) How cells get the
message: dynamic assembly and function of mRNA-protein
complexes. Nat. Rev. Genet., 14, 275–287.
4. Mercer,T.R. and Mattick,J.S. (2013) Structure and function of long
noncoding RNAs in epigenetic regulation. Nat. Struct. Mol. Biol., 20,
300–307.
5. Baltz,A.G., Munschauer,M., Schwanhaeusser,B., Vasile,A.,
Murakawa,Y., Schueler,M., Youngs,N., Penfold-Brown,D., Drew,K.,
Milek,M. et al. (2012) The mRNA-bound proteome and its global
occupancy profile on protein-coding transcripts. Mol. Cell, 46,
674–690.
6. Castello,A., Fischer,B., Eichelbaum,K., Horos,R., Beckmann,B.M.,
Strein,C., Davey,N.E., Humphreys,D.T., Preiss,T., Steinmetz,L.M.
et al. (2012) Insights into RNA biology from an atlas of mammalian
mRNA-binding proteins. Cell, 149, 1393–1406.
7. Woese,C. (1965) On evolution of genetic code. Proc. Natl. Acad. Sci.
U.S.A., 54, 1546–1552.
8. Woese,C.R. (1973) Evolution of the genetic code.
Naturwissenschaften, 60, 447–459.
9. Akinrimisi,E. and Tso,P. (1964) Interactions of purine with proteins
+ amino acids. Biochemistry (Mosc.), 3, 619–626.
10. Thomas,P. and Podder,S. (1978) Specificity in protein-nucleic acid
interaction––solubility study on amino acid nucleoside interaction.
FEBS Lett., 96, 90–94.
11. Lacey,J.C. and Pruitt,K.M. (1969) Origin of the genetic code. Nature,
223, 799–804.
12. Rifkind,J.M. and Eichhorn,G.L. (1970) Specificity for the interaction
of nucleotides with basic polypeptides. Biochemistry (Mosc.), 9,
1753–1761.
13. Wagner,K.G. and Arfmann,H.-A. (1974) Properties of basic
amino-acid residues. Eur. J. Biochem., 46, 27–34.
14. Luscombe,N.M., Laskowski,R.A. and Thornton,J.M. (2001) Amino
acid-base interactions: a three-dimensional analysis of protein-DNA
interactions at an atomic level. Nucleic Acids Res., 29, 2860–2874.
15. Treger,M. and Westhof,E. (2001) Statistical analysis of atomic
contacts at RNA-protein interfaces. J. Mol. Recognit., 14, 199–214.
16. Jeong,E., Kim,H., Lee,S.W. and Han,K. (2003) Discovering the
interaction propensities of amino acids and nucleotides from
protein-RNA complexes. Mol. Cells, 16, 161–167.
17. Hoffman,M.M., Khrapov,M.A., Cox,J.C., Yao,J.C., Tong,L.N. and
Ellington,A.D. (2004) AANT: the amino acid-nucleotide interaction
database. Nucleic Acids Res., 32, D174–D181.
18. Kondo,J. and Westhof,E. (2011) Classification of pseudo pairs
between nucleotide bases and amino acids by analysis of
nucleotide-protein complexes. Nucleic Acids Res., 39, 8628–8637.
19. Donald,J.E., Chen,W.W. and Shakhnovich,E.I. (2007) Energetics of
protein–DNA interactions. Nucleic Acids Res., 35, 1039–1047.
20. Jonikas,M.A., Radmer,R.J., Laederach,A., Das,R., Pearlman,S.,
Herschlag,D. and Altman,R.B. (2009) Coarse-grained modeling of
large RNA molecules with knowledge-based potentials and structural
filters. RNA, 15, 189–199.
21. Pérez-Cano,L., Solernou,A., Pons,C. and Fernández-Recio,J. (2010)
Structural prediction of protein-RNA interaction by computational
docking with propensity-based statistical potentials. Pac. Symp.
Biocomput., 293–301.
22. Tuszynska,I. and Bujnicki,J.M. (2011) DARS-RNP and
QUASI-RNP: new statistical potentials for protein-RNA docking.
BMC Bioinformatics, 12, 348-363.
23. Polyansky,A.A. and Zagrovic,B. (2013) Evidence of direct
complementary interactions between messenger RNAs and their
cognate proteins. Nucleic Acids Res., 41, 8434–8443.
24. Mathew,D.C. and Luthey-Schulten,Z. (2008) On the physical basis of
the amino acid polar requirement. J. Mol. Evol., 66, 519–528.
25. Biot,C., Buisine,E., Kwasigroch,J.M., Wintjens,R. and Rooman,M.
(2002) Probing the energetic and structural role of amino
acid/nucleobase cation-pi interactions in protein-ligand complexes. J.
Biol. Chem., 277, 40816–40822.
26. Rutledge,L.R., Campbell-Verduyn,L.S., Hunter,K.C. and
Wetmore,S.D. (2006) Characterization of nucleobase-amino acid
stacking interactions utilized by a DNA repair enzyme. J. Phys.
Chem. B, 110, 19652–19663.
27. Rutledge,L.R., Durst,H.F. and Wetmore,S.D. (2008) Computational
comparison of the stacking interactions between the aromatic amino
acids and the natural or (cationic) methylated nucleobases. Phys.
Chem. Chem. Phys., 10, 2801–2812.
28. Ebrahimi,A., Habibi-Khorassani,M., Gholipour,A.R. and
Masoodi,H.R. (2009) Interaction between uracil nucleobase and
phenylalanine amino acid: the role of sodium cation in stacking.
Theor. Chem. Acc., 124, 115–122.
29. Nirenberg,M.W., Jones,O.W., Leder,P., Clark,B.F.C., Sly,W.S. and
Pestka,S. (1963) On the coding of genetic information. Cold Spring
Harb. Symp. Quant. Biol., 28, 549–557.
30. Giulio,M.D. (2005) The origin of the genetic code: theories and their
relationships, a review. Biosystems, 80, 175–184.
31. Koonin,E.V. and Novozhilov,A.S. (2009) Origin and evolution of the
genetic code: the universal enigma. IUBMB Life, 61, 99–111.
32. Woese,C. (1968) Fundamental nature of genetic code––prebiotic
interactions between polynucleotides and polyamino acids or their
derivatives. Proc. Natl. Acad. Sci. U.S.A., 59, 110–117.
33. Woese,C. (1969) Models for evolution of codon assignments. J. Mol.
Biol., 43, 235–240.
34. Yarus,M. (1998) Amino acids as RNA ligands: a
direct-RNA-template theory for the code’s origin. J. Mol. Evol., 47,
109–117.
35. Yarus,M., Widmann,J.J. and Knight,R. (2009) RNA-amino acid
binding: a stereochemical era for the genetic code. J. Mol. Evol., 69,
406–429.
36. Hlevnjak,M., Polyansky,A.A. and Zagrovic,B. (2012) Sequence
signatures of direct complementarity between mRNAs and cognate
proteins on multiple levels. Nucleic Acids Res., 40, 8874–8882.
37. Polyansky,A.A., Hlevnjak,M. and Zagrovic,B. (2013) Proteome-wide
analysis reveals clues of complementary interactions between
mRNAs and their cognate proteins as the physicochemical
foundation of the genetic code. RNA Biol., 10, 1248–1254.
38. Kyrpides,N.C. and Ouzounis,C.A. (1993) Mechanisms of specificity
in mRNA degradation: autoregulation and cognate interactions. J.
Theor. Biol., 163, 373–392.
39. Ouzounis,C.A. and Kyrpides,N.C. (1994) Reverse interpretation: a
hypothetical selection mechanism for adaptive mutagenesis based on
autoregulated mRNA stability. J. Theor. Biol., 167, 373–379.
40. Oostenbrink,C., Villa,A., Mark,A.E. and van Gunsteren,W.F. (2004) A
biomolecular force field based on the free enthalpy of hydration and
solvation: the GROMOS force-field parameter sets 53A5 and 53A6.
J. Comput. Chem., 25, 1656–1676.
41. Hess,B., Kutzner,C., van der Spoel,D. and Lindahl,E. (2008)
GROMACS 4: algorithms for highly efficient, load-balanced, and
scalable molecular simulation. J. Chem. Theory Comput., 4, 435–447.
42. Berendsen,H., Grigera,J. and Straatsma,T. (1987) The missing term in
effective pair potentials. J. Phys. Chem., 91, 6269–6271.
43. Bussi,G., Donadio,D. and Parrinello,M. (2007) Canonical sampling
through velocity rescaling. J. Chem. Phys., 126, 014101.
44. Parrinello,M. and Rahman,A. (1981) Polymorphic transitions in
single-crystals––a new molecular-dynamics method. J. Appl. Phys.,
52, 7182–7190.
45. Oostenbrink,C. and van Gunsteren,W.F. (2005) Methane clustering in
explicit water: effect of urea on hydrophobic interactions. Phys.
Chem. Chem. Phys., 7, 53–58.
46. Arieh,B.-N. (1992) Statistical thermodynamics for chemists and
biochemists. Springer Science+Business Media, NY.
47. Gazzillo,D. (1995) Stability of fluids with more than two
components. Mol. Phys., 84, 303–323.
48. UniProt Consortium (2013) Update on activities at the Universal
Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.
49. Yalkowsky,S.H. and Dannenfelser,R.M. (1992) Aquasol database of
aqueous solubility. College of Pharmacy, University of Arizona,
Tucson, AZ.
50. Stumpe,M.C. and Grubmüller,H. (2007) Interaction of urea with
amino acids: implications for urea-induced protein denaturation. J.
Am. Chem. Soc., 129, 16126–16131.
51. Änkö,M.-L. and Neugebauer,K.M. (2012) RNA–protein interactions
in vivo: global gets specific. Trends Biochem. Sci., 37, 255–262.
52. Puton,T., Kozlowski,L., Tuszynska,I., Rother,K. and Bujnicki,J.M.
(2012) Computational methods for prediction of protein–RNA
interactions. J. Struct. Biol., 179, 261–268.
53. Zanzoni,A., Marchese,D., Agostini,F., Bolognesi,B., Cirillo,D.,
Botta-Orfila,M., Livi,C.M., Rodriguez-Mulero,S. and Tartaglia,G.G.
(2013) Principles of self-organization in biological pathways: a
hypothesis on the autogenous association of alpha-synuclein. Nucleic
Acids Res., 41, 9987–9998.