
Computational analysis of amino acids and their sidechain analogs in crowded solutions of RNA nucleobases with implications for the mRNA-protein complementarity hypothesis

October 2014
Nucleic Acids Research 42(21)

Source
PubMed

License
CC BY 4.0


Many critical processes in the cell involve direct binding between RNAs and proteins, making it imperative to fully understand the physicochemical principles behind such interactions at the atomistic level. Here, we use molecular dynamics simulations and 15 μs of sampling to study the behavior of amino acids and amino acid sidechain analogs in high-concentration aqueous solutions of standard RNA nucleobases. Structural and energetic analysis of simulated systems allows us to derive interaction propensity scales for different amino acid/nucleobase combinations. The derived scales closely match and greatly extend the available experimental data, providing a comprehensive foundation for studying RNA-protein interactions in different contexts. By using these scales, we demonstrate a statistically significant connection between nucleobase composition of human mRNA coding sequences and nucleobase interaction propensities of their cognate protein sequences. For example, pyrimidine density profiles of mRNAs match uracil-propensity profiles of their cognate proteins with a median Pearson correlation coefficient of R = -0.70. Our results provide support for the recently proposed hypotheses that mRNAs and their cognate proteins may be physicochemically complementary to each other and bind, especially if unstructured, with the complementarity level being negatively influenced by mRNA adenine content. Finally, we utilize the derived scales to refine the complementarity hypothesis and closely examine its physicochemical underpinnings.

(A) A typical snapshot from the simulation with a single amino acid sidechain analog (Leu) in CYT/water mixture. (B) CYT/CYT, CYT/water and water/water radial distribution functions in Leu simulations (top) with the corresponding Kirkwood–Buff integrals (bottom). The final value for the integrals was taken as the average over the approximately constant window denoted by the arrow.

…

(A) Correlation between the experimentally derived polar requirement (PRexperiment) scale (8) and the energy-based scale of sidechain analog interaction propensities for URA (in kJ/mol) obtained by simulation. Inset: Pearson correlation coefficients R between all sidechain analog (second column) and amino acid (third column) propensity scales and the PR scale. (B) Rank-order correlation between experimentally measured amino acid–guanosine association constants (10), and the computationally derived sidechain analog interaction energy scale for GUA (in kJ/mol). Inset: correlation between binding free energies (in kJ/mol) at the standard reference concentration of 1 M, as derived from association constants, and the computationally obtained sidechain analog interaction energy scale for GUA (in kJ/mol).

…

Water/sidechain-analog and nucleobase/sidechain-analog radial distribution functions g(r) for the most favorable (left column) and the least favorable (right column) sidechain analog interacting partners for the four RNA nucleobases, as determined by energy-based interaction propensity scales for: (A) URA, (B) CYT, (C) ADE and (D) GUA.

…

(A) A direct comparison between energy-based sidechain analog interaction propensity scales for the four nucleobases with Pearson correlation coefficients given in the graphs. In each graph, the four charged amino acids are labeled in red. (B) Correlation between relative energy-based sidechain analog GUA–CYT and ADE–CYT interaction propensity scales (in kJ/mol) derived from simulations of different systems.

…

Distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases (URA, CYT, ADE and GUA) assessed using computationally derived sidechain analog scales. Inset: median values of distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases, calculated over the entire human proteome (sidechain analogs, ‘sca,’ and complete amino acids, ‘aa’).

…


12984–12994 Nucleic Acids Research, 2014, Vol. 42, No. 21 Published online 31 October 2014

doi: 10.1093/nar/gku1035

Computational analysis of amino acids and their sidechain analogs in crowded solutions of RNA nucleobases with implications for the mRNA–protein complementarity hypothesis

Matea Hajnic, Juan Iregui Osorio and Bojan Zagrovic

Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Vienna 1030, Austria

Received July 17, 2014; Revised September 29, 2014; Accepted October 11, 2014

To whom correspondence should be addressed. Tel: +43 1 4277 52271; Fax: +43 1 4277 9522; Email: bojan.zagrovic@univie.ac.at

Present address: Juan Osorio Iregui, Institute for Theoretical Physics, ETH Zürich 8093, Switzerland.

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

From transcriptional and translational regulation to RNA processing and decay to protein localization, many key processes in the cell depend directly on RNA–protein interactions (1–4). What is more, the list of systems that involve RNA–protein interactions keeps dramatically expanding. Recently, for example, high-throughput efforts aimed at capturing the mRNA–protein interactome identified a large number of novel RNA-binding proteins (5,6). Out of a total of approximately 800 mRNA-binding proteins detected in these studies using covalent UV-crosslinking methods, about 25% were found not to contain any known RNA-binding domains, while an even greater number lacked clear functional characterization. Despite the challenges ahead, one may expect that integrative efforts involving biochemical, structural and computational techniques will soon catalog most if not all of biologically relevant RNA–protein interactions. On the other hand, our understanding of the basic physicochemical principles behind such interactions still remains incomplete. Most importantly, only a few experimental studies have been performed in order to directly explore interactions between individual nucleobases and amino acids in different environments (7–10). While global and local structural contexts do play important roles in defining the properties of RNA–protein binding interfaces, it is reasonable to expect that binding specificity in general also critically depends on the preferences of individual nucleobases and amino acids for each other.

In this reductionist framework, the properties of the binding sites are at least in part a consequence of binding preferences that are intrinsic to individual nucleobases and amino acids. Motivated by this, Akinrimisi et al. (9) and Thomas et al. (10) have measured affinities of several naturally occurring amino acids for a set of nitrogenous bases and nucleosides using spectroscopic methods, but those experiments were never performed systematically for all possible combinations. Furthermore, Woese et al. have used chromatographic measurements to define a scale of amino acids’ propensity to interact with pyrimidine mimetics pyridines, which they termed ‘polar requirement’ (PR) (7,8). Finally, several authors have studied interactions between different nucleotides and polyamino acids, focusing typically on polylysine or polyarginine peptides (11–13). Despite the clear importance of such experimental studies, however, we find it remarkable that they have not been repeated or extended since the 1960s and 1970s when they were first performed. On the other hand, sizable progress has been made using computational and theoretical approaches (14–28). Most significantly, the available structures of nucleic acid–protein complexes have been statistically analyzed to explore the more general physicochemical principles behind nucleobase/amino acid interactions (14–18) and, in particular, derive binding preference scales, also known as knowledge-based potentials (19–23). Moreover, a computational equivalent of the PR scale was derived using molecular dynamics (MD) simulations, providing a microscopic picture behind the interaction propensities exhibited by individual amino acids (24). Finally, quantum-mechanical calculations have been used to characterize the interactions between a select subset of bases and amino acids (25–28). Overall, all of these studies suggest that the preferences of individual nucleobases and amino acids for each other in water may be highly differentiated, but a large-scale analysis of this effect has never been systematically performed.

An important context in which nucleobase/amino acid interactions may be relevant concerns a foundational question in molecular biology, that of the origin of the universal genetic code (29–31). In particular, the stereochemical hypothesis proposes that the code evolved as a consequence of direct interactions between codons and the amino acids they code for (7,8,32–35). An early formulation of the stereochemical hypothesis was put forth by Carl Woese et al. based on the above-mentioned PR scale, i.e. the propensity of amino acids to interact with pyrimidine mimetics (7,8). Recently, we have demonstrated that pyrimidine density profiles of mRNA coding sequences closely mirror the PR-weighted profiles of their cognate protein sequences (36). In other words, pyrimidine-rich mRNA regions tend to code for cognate protein regions that exhibit high propensity to interact with pyrimidine mimetics and vice versa. Moreover, we have used knowledge-based potentials derived from experimental structures of RNA–protein complexes to not only confirm these findings in the case of pyrimidines, but also extend them to purines (23,37). By providing quantitative evidence for an early, more qualitative proposal by Kyrpides and Ouzounis (38,39), these results allowed us to raise the stereochemical hypothesis to the level of a general relationship between sequence composition of mRNAs and their cognate proteins as well as to hypothesize that mRNAs could bind their cognate proteins in a complementary fashion, especially if unstructured (23,36,37). We argued that such binding interactions were an important driving force behind the establishment of the universal genetic code, but also that they may have a critical, yet still not fully characterized role in the present-day cell as well (23,36,37). Intriguingly, in our analysis of knowledge-based preference scales, we found that guanine and adenine exhibit opposite amino acid binding preferences, resulting in a curious asymmetry: while guanine-binding propensity profiles on the side of proteins closely match purine density profiles on the side of their cognate mRNAs, adenine-binding protein sequence profiles more closely mirror pyrimidine density mRNA profiles (23,37). This hints at additional complexities behind the putative cognate complementarity and suggests that there may have been at least two major phases in the development of the genetic code with opposite requirements when it comes to complementary matching (37).

In order to better understand the underlying physicochemical principles behind RNA–protein interactions and shed more light on the mRNA/protein complementarity hypothesis, here we systematically explore the behavior of individual amino acids and amino acid sidechain analogs in high-concentration aqueous solutions of different RNA nucleobases using classical MD simulations (Figure 1A). For our simulations, we employ the GROMOS 53A6 force field (40), which was parameterized to accurately capture solvation free energies of amino acid sidechain analogs in water and cyclohexane. These hydrophobicity-related properties, in turn, are considered to be an important factor in defining amino acid/nucleobase interaction propensities. Empowered by the spatial and temporal resolution provided by MD simulations, we present an atomistic view of how individual amino acids or sidechain analogs interact with RNA nucleobases in water and define interaction preferences between them using different structural and energetic criteria. Such interaction propensity scales create a rigorous, reductionist foundation for the analysis of RNA–protein interactions in general, which is here further employed for a critical examination of the mRNA–protein complementarity hypothesis and its refinement.

MATERIALS AND METHODS

The 20 natural amino acids and 18 of their sidechain analogs (all except for Gly and Pro) were simulated in the presence of a single type of common RNA nucleobase in aqueous solution: adenine (ADE), cytosine (CYT), guanine (GUA) or uracil (URA). In all simulations, a single amino acid or a sidechain analog (corresponding to an amino acid residue with a hydrogen atom added to Cβ) was centered in a cubic box of initial size 4 × 4 × 4 nm³ with nucleobases and water molecules placed in random orientations around it so as to achieve a water molar fraction of 0.86. Considering the low water solubility of naturally occurring nucleobases, this molar fraction of water was chosen in order to reach a compromise between maximizing the probability of detecting interaction between amino acids (i.e. their sidechain analogs) and nucleobases on the one hand and minimizing nucleobase solubility issues on the other. In total, there were approximately 1250 molecules in each system: one amino acid or sidechain analog, 170 nucleobase molecules and the rest water molecules (see Supplementary Table S1 for details). All amino acids were simulated in their zwitterionic form. In the case of charged amino acids or sidechain analogs, one randomly chosen water molecule was replaced by a counter ion (Na⁺ or Cl⁻) in order to obtain an electrically neutral system.
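As a quick consistency check on the stated composition, the molar fractions can be recomputed from the approximate molecule counts. The sketch below uses only the rounded figures quoted in the text (about 1250 molecules, 170 nucleobases, one solute); the exact per-system counts are those of Supplementary Table S1 and are not reproduced here.

```python
# Rounded counts from the text; exact per-system values are in Supplementary Table S1.
n_total = 1250        # approximate number of molecules per system
n_nucleobase = 170    # nucleobase molecules
n_solute = 1          # one amino acid or sidechain analog
n_water = n_total - n_nucleobase - n_solute

x_water = n_water / n_total
x_nucleobase = n_nucleobase / n_total

print(f"water molar fraction      ~ {x_water:.2f}")
print(f"nucleobase molar fraction ~ {x_nucleobase:.2f}")
```

With these rounded counts the water fraction comes out near the target of 0.86, and the nucleobase fraction near 0.14.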


Figure 1. (A) A typical snapshot from the simulation with a single amino acid sidechain analog (Leu) in CYT/water mixture. (B) CYT/CYT, CYT/water and water/water radial distribution functions in Leu simulations (top) with the corresponding Kirkwood–Buff integrals (bottom). The final value for the integrals was taken as the average over the approximately constant window denoted by the arrow.

All simulations were carried out using the Gromacs 4.5.1 simulation package (41), the united-atom GROMOS 53A6 force field (40) and the SPC/E water model (42) with a 2 fs integration step. Parameters for the nucleobases were obtained from those corresponding to full nucleotides in the GROMOS 53A6 force field while ensuring charge neutrality. Long-range electrostatic interactions were treated using Particle Mesh Ewald (PME) summation with a grid spacing of 0.12 nm and an interpolation order of 4. The cut-off for short-range Coulombic and van der Waals interactions was set to 0.9 nm. The temperature and pressure in all simulations were kept at 300 K and 1 bar using the V-rescale thermostat (τ = 0.1 ps) (43) and the Parrinello-Rahman barostat (τ = 2 ps and compressibility = 4.5 × 10⁻⁵ bar⁻¹) (44), respectively. After minimization using the steepest descent algorithm in water (10 000–25 000 steps), the systems were first equilibrated in the NVT ensemble for 800 ps and then subjected to 400 ps of equilibration in the NPT ensemble with the same position restraints placed on the amino acid or sidechain analog. All production runs, each 100 ns long, were performed in the NPT ensemble for a total of 15.2 μs of simulated time over all systems.
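The total sampling time follows directly from the number of simulated systems. A short arithmetic sketch, assuming one 100 ns production run per solute/nucleobase combination (variable names are illustrative):

```python
n_amino_acids = 20
n_sidechain_analogs = 18   # all except Gly and Pro
n_nucleobases = 4          # ADE, CYT, GUA, URA
run_length_ns = 100        # one production run per system (assumption)

n_systems = (n_amino_acids + n_sidechain_analogs) * n_nucleobases
total_us = n_systems * run_length_ns / 1000.0

print(n_systems, "systems,", total_us, "microseconds in total")
```

This gives 152 systems, consistent with the quoted 15.2 μs of aggregate simulated time.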

In order to test if systems with naturally occurring nucleobases are microscopically stable, we have analyzed the values of the first derivative of the natural logarithm of activity of the nucleobase, $\ln a_N$, with respect to the natural logarithm of nucleobase molar fraction, $\ln X_N$, in the simulated systems (45–47). The quantity:

$$\frac{\partial \ln a_N}{\partial \ln X_N} = \frac{1}{1 + \rho_N X_W \left(G_{NN} + G_{WW} - 2G_{NW}\right)} \qquad (1)$$

where $G_{NN}$, $G_{NW}$ and $G_{WW}$ denote Kirkwood–Buff integrals derived from nucleobase/nucleobase, nucleobase/water and water/water radial distribution functions (RDFs), respectively, must be positive for a system to be microscopically stable. Here, $\rho_N$ stands for the nucleobase number density and $X_W$ for the molar fraction of water. The Kirkwood–Buff integrals were calculated using the following formula (45–47):

$$G_{ij}(r) = 4\pi \int_0^{r} r'^2 \left[g_{ij}(r') - 1\right] \mathrm{d}r' \qquad (2)$$

where $g_{ij}$ denotes nucleobase/nucleobase ($g_{NN}$), nucleobase/water ($g_{NW}$) or water/water ($g_{WW}$) RDFs, respectively, from which the corresponding Kirkwood–Buff integrals ($G_{NN}$, $G_{NW}$, $G_{WW}$) were derived. As anchor points for RDFs, we used centers of mass of nucleobases and water molecules. Finally, as representative examples for each individual nucleobase type, we have performed the above analysis on simulated systems with Leu residues and the results are reported in Figure 1B and Supplementary Table S2.
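The stability test can be sketched numerically as follows. This is a minimal illustration rather than the analysis code used for the paper: it integrates a tabulated RDF with the trapezoidal rule, averages the running integral over a tail window, and evaluates the derivative in the standard binary-mixture Kirkwood–Buff form (with the water molar fraction multiplying the nucleobase number density, which is an assumption about the exact form of Equation (1)). The function names, the toy RDF and the grid are all hypothetical.

```python
import math

def running_kb_integral(r, g):
    """Cumulative trapezoidal estimate of G(R) = 4*pi * int_0^R r'^2 [g(r') - 1] dr'.

    Assumes r is an increasing grid that starts at (or very near) zero.
    """
    G = [0.0]
    for i in range(1, len(r)):
        f_prev = r[i - 1] ** 2 * (g[i - 1] - 1.0)
        f_curr = r[i] ** 2 * (g[i] - 1.0)
        G.append(G[-1] + 4.0 * math.pi * 0.5 * (f_prev + f_curr) * (r[i] - r[i - 1]))
    return G

def plateau_average(r, G, r_min):
    """Mean of the running integral over all grid points with r >= r_min."""
    tail = [G_i for r_i, G_i in zip(r, G) if r_i >= r_min]
    return sum(tail) / len(tail)

def stability_derivative(rho_n, x_w, G_nn, G_ww, G_nw):
    """d ln a_N / d ln X_N in the assumed binary-mixture form; > 0 means stable."""
    return 1.0 / (1.0 + rho_n * x_w * (G_nn + G_ww - 2.0 * G_nw))

# Toy RDF (not simulation data): hard exclusion below 0.3 nm, ideal beyond.
r = [i * 0.001 for i in range(2001)]            # 0 to 2 nm
g = [0.0 if r_i < 0.3 else 1.0 for r_i in r]
G_toy = plateau_average(r, running_kb_integral(r, g), r_min=1.5)
print(G_toy)  # close to minus the excluded volume, -(4/3)*pi*0.3**3 nm^3
```

Averaging over a tail window instead of taking the last point reflects the poor convergence of running Kirkwood–Buff integrals in finite boxes; the window start is left as a parameter.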

To quantify the interaction propensity of amino acids or sidechain analogs for different nucleobases, we have analyzed their behavior in the simulated mixtures both structurally and energetically. For simplicity, we describe the procedure for sidechain analogs only, but analogous calculations were also performed for all amino acid-containing systems. For structural analysis, RDFs were calculated by using as anchor points the centers of mass of amino acid sidechain analogs, nucleobases and water molecules. For energetic analysis, we have calculated differences between the total force-field potential energies corresponding to sidechain analog-nucleobase ($E_{X\text{-}N}$) and sidechain analog-water interactions ($E_{X\text{-}W}$):

$$\Delta E = E_{X\text{-}N} - E_{X\text{-}W} \quad [\mathrm{kJ/mol}]. \qquad (3)$$

Moreover, the obtained differences in potential energy between sidechain analog-nucleobase and sidechain analog-water interactions ($\Delta E$) were further subtracted between systems with different nucleobases ($N_1$, $N_2$) in order to obtain relative interaction propensities or preferences of each sidechain analog for a specific nucleobase with respect to other nucleobases ($\Delta\Delta E$):

$$\Delta\Delta E = \Delta E_{N_1} - \Delta E_{N_2} \quad [\mathrm{kJ/mol}]. \qquad (4)$$
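Equations (3) and (4) amount to two successive subtractions. A minimal sketch, using hypothetical averaged energies rather than values from the simulations, with function and key names invented for illustration:

```python
def interaction_propensity(e_x_n, e_x_w):
    """Equation (3): energy of solute-nucleobase minus solute-water interactions (kJ/mol)."""
    return e_x_n - e_x_w

def relative_propensity(delta_e_n1, delta_e_n2):
    """Equation (4): difference of Equation (3) values between two nucleobase systems."""
    return delta_e_n1 - delta_e_n2

# Hypothetical time-averaged energies (kJ/mol) for one sidechain analog; not data from the paper.
energies = {
    "GUA": {"E_X_N": -60.0, "E_X_W": -35.0},
    "CYT": {"E_X_N": -50.0, "E_X_W": -40.0},
}

delta_e = {base: interaction_propensity(v["E_X_N"], v["E_X_W"])
           for base, v in energies.items()}
gua_vs_cyt = relative_propensity(delta_e["GUA"], delta_e["CYT"])

print(delta_e, gua_vs_cyt)
```

In this toy case the more negative value for GUA would indicate a preference for GUA over CYT.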

In a related study (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data), we have simulated amino acids in the presence of only one type of nitrogenous base (unsubstituted pyrimidines or purines) in water solution as here, but also in mixed systems with both nitrogenous bases (unsubstituted pyrimidines and purines) present at the same time. The relative interaction propensities of amino acids derived from mixed systems where both bases were present at the same time and those derived from differences between individual systems correlate with each other with a Pearson correlation coefficient R = 0.98. This suggests that one can obtain relative interaction propensities of amino acids for different nucleobases from individual interaction propensities derived from systems with only one nucleobase type present.

To be able to compare systems with slightly different molar compositions, the calculated potential energies between sidechain analog and water molecules were rescaled before obtaining the interaction propensity scale in order to have all systems correspond to exactly 0.86 molar fraction water. When rescaling, we implicitly assumed that the few additional water molecules behave on average in the same way as the rest of the water molecules in the system and contribute to the overall sidechain analog-water potential energy proportionally to their number. Analogous structural and energetic analysis was performed for systems containing amino acids with interaction energies evaluated over all amino acid atoms. Amino acid and sidechain analog interaction propensity scales are given in Supplementary Table S7 in units of kJ/mol. Note, however, that the exact energetic values given in our scales depend strongly on the particular features of simulated systems (such as molar fraction of water or nucleobases), and as such should primarily be considered and analyzed in a relative sense.
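One way to read the rescaling step is as a proportional correction of the sidechain analog-water energy by the ratio of target to actual water counts. The sketch below assumes exactly that linear form; the function names and the example numbers are illustrative and are not taken from the source.

```python
def target_water_count(n_other, x_water=0.86):
    """Water count for which n_w / (n_w + n_other) equals x_water."""
    return x_water * n_other / (1.0 - x_water)

def rescale_water_energy(e_x_w, n_water_actual, n_other, x_water=0.86):
    """Scale the solute-water energy in proportion to the number of water molecules
    (assumed linear contribution per water molecule)."""
    return e_x_w * target_water_count(n_other, x_water) / n_water_actual

# Hypothetical example: 170 nucleobases plus one solute, 1079 waters, E = -40 kJ/mol.
n_other = 171
e_rescaled = rescale_water_energy(-40.0, n_water_actual=1079, n_other=n_other)
print(target_water_count(n_other), e_rescaled)
```

Because only a few water molecules separate the systems from the target composition, the correction in this toy case is a few percent.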

The obtained scales were used as described in Hlevnjak et al. (36) in order to assess the correlation between protein interaction propensities for different nucleobases and the nucleobase content of their cognate mRNAs over the complete Homo sapiens, Escherichia coli and Methanocaldococcus jannaschii proteomes. In the case of sidechain analog interaction propensity scales, glycines and prolines were ignored on the protein side together with their codons on the side of mRNA. The sequence datasets were extracted from the UniProtKB database (April 2013 release) as described previously (36,48). Window-averaged profiles of individual mRNAs and proteins were calculated in the same way as reported previously (36), where each position in the profile corresponds to the average value of the property in question over a window (with the size of 21 residues for proteins and 63 bases for mRNAs) centered at that position. As shown before (36), for window sizes anywhere between 10 and 40 residues, the results depend only marginally on window size (variation < 2%).
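The profile comparison can be sketched as follows. This is an illustrative reading of the procedure, not the implementation of reference (36): it assumes that only positions where the full window fits are kept, that the mRNA profile is sampled once per codon so that both profiles cover the same sequence stretch, and that the protein window size is odd. The two-letter scale and sequences are toy inputs.

```python
def window_average(values, window):
    """Mean over a centered window, only at positions where the full window fits."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def profile_correlation(mrna, protein, scale, protein_window=21):
    """Pearson R between the mRNA pyrimidine profile and the protein propensity profile."""
    pyr = [1.0 if nt in "CU" else 0.0 for nt in mrna]
    # A window of 3 * protein_window bases, sampled every third base, spans the
    # same codons as the corresponding protein window.
    mrna_profile = window_average(pyr, 3 * protein_window)[::3]
    protein_profile = window_average([scale[aa] for aa in protein], protein_window)
    return pearson(mrna_profile, protein_profile)

# Toy input with a hypothetical two-letter scale; real scales are in Supplementary Table S7.
toy_scale = {"F": 1.0, "K": 0.0}
toy_codons = {"F": "UUU", "K": "AAA"}
protein = "FFFKKKFKFKKK"
mrna = "".join(toy_codons[aa] for aa in protein)
r_toy = profile_correlation(mrna, protein, toy_scale, protein_window=3)
print(r_toy)
```

In the toy case the scale was chosen to track pyrimidine content exactly, so the two profiles coincide; with the default window the function uses 21 residues and 63 bases as in the text.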

To test the significance of median values of profile-matching Pearson R distributions calculated for complete

proteomes, we generated 10

random scales and compared

the medians of their profile-matching Pearson R distributions to the tested ones for each individual proteome. Random scales were generated by drawing numbers from a uniform distribution between 0 and 1. Finally, the P-values were calculated as the fraction of random scales whose medians of the profile-matching Pearson R distributions were greater than or equal to the tested ones in absolute value.
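The randomization test can be sketched as below. The proteome-wide median calculation is represented by a caller-supplied function, since it is expensive and depends on the sequence data; the stand-in used here is purely hypothetical. The number of random scales is left as a parameter.

```python
import random

def random_scale(amino_acids, rng):
    """One random scale: an independent uniform(0, 1) value per amino acid."""
    return {aa: rng.random() for aa in amino_acids}

def empirical_p_value(observed_median, median_for_scale, amino_acids, n_random, seed=0):
    """Fraction of random scales whose median R is at least as large in absolute
    value as the observed median."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_random):
        random_median = median_for_scale(random_scale(amino_acids, rng))
        if abs(random_median) >= abs(observed_median):
            hits += 1
    return hits / n_random

# Stand-in for the proteome-wide median Pearson R (hypothetical, for illustration only).
def toy_median(scale):
    return scale["A"] - 0.5   # always within [-0.5, 0.5)

p_extreme = empirical_p_value(0.6, toy_median, "ACDE", n_random=1000)
p_null = empirical_p_value(0.0, toy_median, "ACDE", n_random=1000)
print(p_extreme, p_null)
```

The smallest non-zero P-value this procedure can resolve is 1 / n_random, so the number of random scales sets the resolution of the test.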

RESULTS

Validation and analysis of binding propensity scales

Natural nucleobases have low water solubility (49), ranging from 1.04 g/l for ADE to 8 g/l for CYT, corresponding to base molar fractions of $X_{\mathrm{ADE}} = 1 \times 10^{-4}$ and $X_{\mathrm{CYT}} \approx 1 \times 10^{-3}$, respectively. In order to: (i) realistically model nucleobase density at typical RNA–protein interfaces and (ii) reach a critical number of nucleobases that would allow us to observe interactions with amino acids or their sidechain analogs on a reasonable timescale, we have simulated systems whose nucleobase concentrations were significantly higher than their macroscopic solubility levels (e.g. $X_N = 0.14$). Practically speaking, we have simulated the behavior of amino acids and their sidechain analogs in hydrated, dynamic agglomerates of nucleobases as illustrated in Figure 1A for the Leu sidechain in CYT solution. While such systems, in fact, better approximate the effective concentration of nucleobases at typical RNA–protein interfaces, it was critical to first assess their thermodynamic stability at the microscopic level.
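The order of magnitude of the quoted solubility limits can be checked by converting g/l to molar fraction. The sketch assumes a dilute solution with roughly 55.5 mol of water per litre and uses standard molar masses for adenine and cytosine; both are assumptions added here, not values given in the text.

```python
MOLAR_MASS = {"ADE": 135.13, "CYT": 111.10}   # g/mol, standard values (assumption)
WATER_MOL_PER_L = 55.5                        # approx. mol of water per litre of dilute solution

def mole_fraction(solubility_g_per_l, molar_mass):
    """Molar fraction of solute in a dilute aqueous solution."""
    n_solute = solubility_g_per_l / molar_mass
    return n_solute / (n_solute + WATER_MOL_PER_L)

x_ade = mole_fraction(1.04, MOLAR_MASS["ADE"])
x_cyt = mole_fraction(8.0, MOLAR_MASS["CYT"])
print(f"X_ADE ~ {x_ade:.1e}, X_CYT ~ {x_cyt:.1e}")
```

Both values fall two to three orders of magnitude below the nucleobase molar fraction of 0.14 used in the simulations.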

The stability of a binary, high-concentration mixture of water and nucleobases can be studied by analyzing the first derivative of the natural logarithm of activity of the nucleobase with respect to the natural logarithm of the nucleobase mole fraction, $\partial \ln a_N / \partial \ln X_N$ (45–47). This value, which should be positive for systems to be microscopically stable, was calculated from Equation (1), where $G_{NN}$, $G_{NW}$ and $G_{WW}$ denote Kirkwood–Buff integrals derived from nucleobase/nucleobase, nucleobase/water and water/water RDFs, respectively (45–47). A typical set of such RDFs encountered in our simulations is given in Figure 1B (top panel) for the Leu sidechain in CYT solution. Importantly, due to the poor convergence of Kirkwood–Buff integrals, as an estimate of $G_{ij}$, in all cases we took the average of $G_{ij}(r)$ over distances starting from 1.5 nm (Figure 1B, lower panel). Following the above procedure, we could indeed show that the above requirement (i.e. $\partial \ln a_N / \partial \ln X_N > 0$) is fulfilled for all four nucleobase types (Supplementary Table S2). This suggests that although our systems would over long timescales likely result in the creation of macroscopic aggregates, they are thermodynamically stable on the size and time scales examined here and could be used as model systems to study the behavior of amino acids and their sidechain analogs in aqueous solutions of nucleobases. The fact that despite high nucleobase concentrations we did not observe formation of any static precipitates further corroborates this claim.

We have used our simulations to calculate differences between the total force-field potential energies corresponding to amino acid–nucleobase and amino acid–water interactions (and the same for sidechains). How do these energy-based interaction propensity scales compare with experimental results? The experimental PR scale (8), derived by analyzing the chromatographic mobility of amino acids in water mixtures of substituted pyridines such as dimethylpyridine (DMP), is one of the few examples where interactions between amino acids and nitrogenous bases have been systematically explored in experiment. Specifically, PR of a given amino acid was defined as the slope of a linear fit between the logarithm of its retention coefficient R and the logarithm of mole fraction of water in the pyridine–water solvent. In a related study (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data), we have performed MD simulations of amino acids and their sidechain analogs in water/DMP mixtures using the same setup as here. The energy-based DMP/amino acid and DMP/sidechain analog interaction propensity scales derived from MD agree closely with the experimental PR scale (8) with Pearson R coefficients of 0.93 and 0.95, respectively, attesting to the general quality of our simulation methodology (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data).
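The definition of PR as a log-log slope can be illustrated with a small least-squares fit. The sketch assumes that the logarithm of the retention coefficient is regressed on the logarithm of the water mole fraction; the data points are synthetic and chosen so that the expected slope is known in advance.

```python
import math

def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sum((a - mean_x) ** 2 for a in x)
    return num / den

def polar_requirement(water_mole_fractions, retention_coefficients):
    """Slope of log(retention coefficient) versus log(water mole fraction)."""
    log_x = [math.log10(v) for v in water_mole_fractions]
    log_r = [math.log10(v) for v in retention_coefficients]
    return least_squares_slope(log_x, log_r)

# Synthetic power-law data (not experimental): R = 2 * X**7, so the slope should be 7.
x_water = [0.5, 0.6, 0.7, 0.8]
retention = [2.0 * x ** 7.0 for x in x_water]
pr_toy = polar_requirement(x_water, retention)
print(pr_toy)
```

The base of the logarithm does not affect the slope as long as the same base is used on both axes.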

Remarkably, the experimental PR scale (8) also exhibits close correlation with the energy-based amino acid interaction propensity scales derived here for URA (Pearson R = 0.89), ADE (R = 0.84) and CYT (R = 0.77), with a significantly weaker correlation observed for GUA (R = 0.30) (Figure 2A, inset table, third column). What is more, all of these correlations against the experimental PR scale improve even further if one uses sidechain analog scales instead (Figure 2A, inset table, second column), with the URA interaction propensity scale exhibiting the strongest correlation (R = 0.94), followed by ADE (R = 0.93), CYT (R = 0.86) and, finally, GUA (R = 0.58). In Figure 2A, we plot the sidechain analog scale for URA, a nucleobase which is physicochemically and sterically most similar to DMP, against the experimental PR scale (8) and the two exhibit remarkable similarity. Although the experimental PR scale and the computational URA, ADE and CYT scales were derived in very different ways, the close agreement between them can be taken as evidence of the quality of the MD force field and the general computational methodology used. Moreover, such agreement also suggests that when it comes to capturing nucleobase/amino acid interaction specificity, DMP is actually a good model not only for naturally occurring pyrimidine bases URA and CYT, but also purine ADE.

When we compare our sidechain analog interaction

propensities for GUA with the only analogous, exten-

sive scale available from experiment, that of amino acid–

guanosine binding constants for eight amino acids (Ser, Thr,

Val, Leu, Met, Lys, Phe and Trp) (10), we obtain a Spear-

man rank-order correlation coefcient of ␳ =−0.83 (Fig-

ure 2B) and a direct Pearson correlation coefcient of R =

0.79 when the association constants are converted to bind-

ing free energies (Figure 2B, inset). Interestingly, in our sim-

ulations we not only correctly capture the relative interac-

tion propensities of aromatic sidechain analogs for GUA,

but we also observe the same propensity trends as in the ex-

periment for the relatively similar residues such as Ser and

Thr or Val and Leu. What is more, if one excludes the out-

lier Lys, the rank correlation increases to ␳ =−0.96.

On the other hand, the level of correlation drops significantly if one uses the computational scale for amino acids, here also including the value for Gly (ρ = −0.62 and R = 0.46). Finally, the experimentally derived binding free energies of four amino acids for adenosine (Val, Lys, Phe, Trp) (ρ = −0.80 and R = 0.52) and two for cytidine (Phe, Trp) (10) show the same trend as observed in our sidechain analog interaction propensity scales for the equivalent bases, with similar results for amino acid scales (ρ = −0.80 and R = 0.50 for the adenosine case). Overall, a combination of the above thermodynamic stability analysis and the favorable comparison with experiment reassuringly suggests that the essential physical chemistry behind amino acid/nucleobase interactions remains approximately the same even at relatively high nucleobase concentrations as studied here. This furthermore suggests that our simulation-based scales can be used to greatly extend the limited experimental data available and characterize interactions with nucleobases for all amino acids and sidechain analogs. Interestingly, in many cases, our simulations with sidechain analogs match the experimental data obtained with amino acids slightly better than the simulations with amino acids themselves (Figure 2), a finding we do not currently have a full explanation for. A part of the reason may be that the GROMOS53A6 force field was parameterized to match solvation free energies of sidechain analogs in cyclohexane or water and not those of complete amino acids. It is possible that a potentially lower accuracy of parameters for complete amino acids may be responsible for a greater discrepancy from experiment in that case. However, as sidechain analogs capture the behavior of protein residues at RNA–protein interfaces arguably better than the zwitterionic amino acids do, in the remainder of this text we primarily focus on sidechain analogs, while always giving the results for amino acids as a point of comparison.
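The comparison with experiment described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes the standard relation ΔG° = −RT ln(K_a c°) with a reference concentration of 1 M (as stated in the caption of Figure 2) and a temperature of 298.15 K (an assumption made here), and all numerical values are hypothetical placeholders rather than data from the paper or from ref. (10). It also shows why the rank correlation with association constants is negative while the Pearson correlation with free energies is positive: a larger association constant corresponds to a lower (more favorable) energy.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)

def binding_free_energy(k_assoc, temperature=298.15, c_ref=1.0):
    """Standard binding free energy in kJ/mol: dG = -RT ln(K_a * c_ref).

    k_assoc is in 1/M and c_ref in M; the temperature is an assumed value.
    """
    return -R_GAS * temperature * math.log(k_assoc * c_ref)

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical placeholder numbers, NOT values from the paper or from ref. (10).
k_assoc = [0.5, 0.8, 1.2, 2.0, 3.5]    # association constants, 1/M
energy = [4.0, 2.5, 1.0, -1.5, -6.0]   # simulated interaction energies, kJ/mol

dg = [binding_free_energy(k) for k in k_assoc]
rho = spearman(k_assoc, energy)   # rank correlation of K_a with energy
r = pearson(dg, energy)           # linear correlation of dG with energy
print(f"Spearman rho (K_a vs energy) = {rho:.2f}")
print(f"Pearson R (dG vs energy)     = {r:.2f}")
```

Because the logarithm is monotonic, the rank correlation is the same in magnitude whether association constants or free energies are used; only its sign flips.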

The above energetic analysis is well illustrated by a structural exploration using RDFs. In Figure 3, we show water/sidechain-analog and nucleobase/sidechain-analog RDFs for the most favorable and the least favorable interacting partners of the four RNA nucleobases, as determined by the analysis of interaction energies. URA, CYT and ADE, for example, all exhibit the strongest preference for interacting with Trp relative to all other residues, which is illustrated by the presence of a pronounced first peak in their nucleobase/Trp RDFs. On the other hand, in the case of GUA the strongest favorable interactions are seen for Lys. When it comes to the least favorable interactions, in all cases they are invariably seen with the negatively charged Glu and Asp. The presence of a well-defined peak in CYT/Glu, ADE/Asp and GUA/Glu RDFs, however, suggests that, although unfavorable, some of these interactions do exhibit a sizable level of structural organization. Nonetheless, for all energetically unfavorable interactions, it is clear that the residues in question prefer to interact with and be surrounded by water molecules, as indicated by strong, well-defined first peaks in the respective water/sidechain-analog RDFs.
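For readers less familiar with RDFs, the quantity g(r) discussed above can be sketched for a single configuration as below. This is a generic illustration under stated assumptions (a cubic periodic box, the minimum-image convention, point-like sites, lengths in nm) and not the analysis pipeline used in the study; the function name and all parameters are hypothetical. A value of g(r) well above 1 at short distance corresponds to the pronounced first peak referred to in the text, while g(r) ≈ 1 indicates no preference.

```python
import numpy as np

def radial_distribution(pos_a, pos_b, box_length, r_max, n_bins=50):
    """g(r) between two sets of sites in a cubic periodic box.

    r_max should not exceed box_length / 2 for the minimum-image
    convention to be valid. Returns bin centers and g(r).
    """
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    # All pairwise displacement vectors, wrapped to the nearest periodic image
    delta = pos_a[:, None, :] - pos_b[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    dist = np.linalg.norm(delta, axis=-1).ravel()
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    # Pair counts expected for an ideal (uncorrelated) mixture in each shell
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density_b = len(pos_b) / box_length ** 3
    ideal_counts = len(pos_a) * density_b * shell_volume
    r_mid = 0.5 * (edges[1:] + edges[:-1])
    return r_mid, counts / ideal_counts

# Sanity check on uncorrelated random points, for which g(r) should be close to 1
rng = np.random.default_rng(0)
box = 5.0  # nm, hypothetical
sites_a = rng.uniform(0.0, box, size=(400, 3))
sites_b = rng.uniform(0.0, box, size=(400, 3))
r, g = radial_distribution(sites_a, sites_b, box_length=box, r_max=box / 2)
print("mean g(r) beyond 1 nm:", g[r > 1.0].mean())
```

In practice g(r) would be averaged over many trajectory frames; the single-frame version is kept here for brevity.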

As discussed above, the GUA-based interaction energy scales differ most from all other scales. When correlating the individual scales against each other, we indeed find that the GUA scale deviates most from other scales, which is primarily due to the behavior of charged sidechain analogs (Figure 4A). Namely, in the GUA/water mixture, Lys and Arg exhibit lower interaction energies with GUA than with water molecules, which is not the case in any other nucleobase/water system except for the CYT/Arg system (Figure 4A). Furthermore, Asp and Glu also exhibit significantly more favorable interaction energies with GUA as compared to other energy-based interaction propensity scales (Figure 4A). Although in absolute terms these two anionic sidechains do not interact favorably with GUA (i.e. they exhibit positive energies), the extent of this unfavorable bias is the least as compared to other bases (Figure 4A). Similar results are also seen in the simulations with complete amino acids (data not shown).

Figure 2. (A) Correlation between the experimentally derived polar requirement (PR_experiment) scale (8) and the energy-based scale of sidechain analog interaction propensities for URA (in kJ/mol) obtained by simulation. Inset: Pearson correlation coefficients R between all sidechain analog (second column) and amino acid (third column) propensity scales and the PR scale. (B) Rank-order correlation between experimentally measured amino acid–guanosine association constants (10), and the computationally derived sidechain analog interaction energy scale for GUA (in kJ/mol). Inset: correlation between binding free energies (in kJ/mol) at the standard reference concentration of 1 M, as derived from association constants, and the computationally obtained sidechain analog interaction energy scale for GUA (in kJ/mol).

Figure 3. Water/sidechain-analog and nucleobase/sidechain-analog radial distribution functions g(r) for the most favorable (left column) and the least favorable (right column) sidechain analog interacting partners for the four RNA nucleobases, as determined by energy-based interaction propensity scales for: (A) URA, (B) CYT, (C) ADE and (D) GUA.

A particularly telling comparison in this regard concerns the behavior of GUA- and ADE-based scales. If one, for example, examines relative energy-based interaction propensity scales, one observes a remarkable asymmetry in the behavior of GUA and ADE (Supplementary Figure S1). In particular, the relative ADE–CYT scale is strongly inversely correlated with those involving GUA (GUA–CYT, R = −0.84 and GUA–URA, R = −0.95) with no significant correlations or anti-correlations for the ADE–URA relative scale (Supplementary Figure S1). In Figure 4B, we illustrate this difference in the case of GUA–CYT and ADE–CYT relative scales and it is clear that the effect is completely due to the nature of the interactions of the charged residues with ADE and GUA relative to those with CYT. While GUA strongly prefers to interact with Lys, Arg, Asp and Glu as compared to CYT (with, for example, E^sca_GUA–CYT of ca. −100 kJ/mol in the case of Lys), ADE almost equally strongly prefers not to interact with these residues (with E^sca_ADE–CYT of ca. 100 kJ/mol in the case of Lys), again as compared to CYT (Figure 4B). This effect clearly demonstrates the paramount importance of specific ring substituents, especially in the case of purine bases, which was already observed in our analysis of knowledge-based nucleobase-residue interaction propensity scales (23). Interestingly, while the sidechain analog scale derived presently for ADE correlates reasonably well with the equivalent knowledge-based scale (Spearman ρ = 0.57 for the 2+ scale from Polyansky et al. (23)), the correlations for all other scales including the GUA scale are significantly weaker (|ρ| < 0.2) (Supplementary Table S3).

A similar trend is also seen with amino acid interaction propensity scales (Supplementary Table S3). On the other hand, the relative scales of GUA derived presently for sidechain analogs agree somewhat better with those derived in the knowledge-based analysis (23) with, for example, GUA–URA and GUA–CYT correlating with Spearman ρ of 0.40 and 0.38, respectively (Supplementary Table S4). While these correlations between complete scales are relatively weak, it is important to mention that they agree much better when it comes to the relative placement of charged residues only, which is in the end chiefly responsible for the qualitative similarities between the scales, as discussed below.

Figure 4. (A) A direct comparison between energy-based sidechain analog interaction propensity scales for the four nucleobases with Pearson correlation coefficients given in the graphs. In each graph, the four charged amino acids are labeled in red. (B) Correlation between relative energy-based sidechain analog GUA–CYT and ADE–CYT interaction propensity scales (in kJ/mol) derived from simulations of different systems.
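The behavior of the relative scales discussed above can be made concrete with a small sketch. It assumes that a relative X–Y scale is the per-residue difference between the X and Y energy scales, a definition that is consistent with the sign conventions in the text but is an assumption here. All energies are hypothetical placeholders chosen only to mimic the qualitative pattern described (charged residues shifted in opposite directions for GUA and ADE relative to CYT); they are not the values derived in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def relative_scale(scale_x, scale_y):
    """Per-residue difference E_X - E_Y (assumed definition of an X-Y scale)."""
    return {res: scale_x[res] - scale_y[res] for res in scale_x if res in scale_y}

# Hypothetical placeholder energies in kJ/mol, NOT the values from the paper.
gua = {"Lys": -60.0, "Arg": -50.0, "Asp": 20.0, "Glu": 25.0, "Leu": -5.0, "Trp": -20.0}
cyt = {"Lys": 30.0, "Arg": -10.0, "Asp": 80.0, "Glu": 85.0, "Leu": -6.0, "Trp": -25.0}
ade = {"Lys": 120.0, "Arg": 90.0, "Asp": 150.0, "Glu": 160.0, "Leu": -7.0, "Trp": -28.0}

gua_cyt = relative_scale(gua, cyt)
ade_cyt = relative_scale(ade, cyt)
residues = sorted(gua_cyt)
r = pearson([gua_cyt[k] for k in residues], [ade_cyt[k] for k in residues])
print(f"Pearson R between GUA-CYT and ADE-CYT relative scales: {r:.2f}")
```

With these placeholder numbers the neutral residues contribute almost nothing to either relative scale, so the anti-correlation is carried entirely by the four charged residues, which is the qualitative point made in the text.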

Analysis of the mRNA-cognate protein complementarity hypothesis

We have used the obtained scales to study the relationship between the nucleobase content of mRNA coding sequences and the nucleobase interaction propensities of their cognate protein sequences for the entire H. sapiens, M. jannaschii and E. coli proteomes. We have performed this analysis by comparing window-averaged sequence profiles of the two cognate biopolymers as elaborated before (23,36,37), whereby one obtains a Pearson R for each cognate pair, i.e. a distribution of Pearson Rs over the whole proteome. Note that negative correlations here denote a positive relationship between nucleobase content and interaction propensity, which comes from the fact that propensity is defined using an energy scale (the lower the energy, the higher the propensity). Our results show that PYR density profiles of mRNAs quantitatively match the energy-based URA-, CYT- and ADE-interaction propensity profiles of their cognate protein sequences across the entire human proteome, with no significant correlation being observed for GUA scales, as demonstrated for H. sapiens in Figure 5. More specifically, the median correlation coefficients for URA, CYT and ADE sidechain-based scales are −0.68, −0.52 and −0.70, respectively, while for the GUA scale this value drops to −0.11 (Figure 5, inset). For M. jannaschii, the median correlation coefficients of mRNA–protein pairs are as high as those observed for the human proteome or higher, while for E. coli the values are slightly lower, but still statistically significant (Supplementary Figures S2A and S3A). Similar values are also seen for amino acid-based scales. In other words, PYR-rich regions in mRNAs tend to code for regions in their cognate proteins that exhibit more favorable interaction energies with URA, CYT and ADE relative to water as compared to the PUR-rich regions. Interestingly, the correlation coefficients obtained for mRNA density profiles of individual bases are significantly weaker (Supplementary Table S5), as was already observed before (23,36).

Figure 5. Distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases (URA, CYT, ADE and GUA) assessed using computationally derived sidechain analog scales. Inset: median values of distributions of Pearson correlation coefficients between window-averaged PYR content profiles of mRNAs and their cognate proteins’ profiles of interaction propensity for different RNA nucleobases, calculated over the entire human proteome (sidechain analogs, ‘sca’, and complete amino acids, ‘aa’).

Figure 6. (A) Distributions of Pearson correlation coefficients between window-averaged PUR content profiles of mRNAs and their cognate proteins’ profiles of relative interaction propensity for different combinations of RNA nucleobases, calculated over the entire human proteome. The propensities were obtained from the energetic analysis of different systems from MD simulations. (B) Typical profiles of mRNA PUR content and protein sequence interaction propensity calculated using the computationally derived sidechain analog GUA–CYT and ADE–CYT relative interaction propensity scales. The two examples were chosen because their Pearson R coefficients correspond to the medians over the respective distributions over the complete human proteome.
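The profile comparison described above can be illustrated with a short sketch. The exact procedure is the one given in refs. (23,36,37); what follows is only a plausible reading of it, and several details are assumptions made here: the mRNA profile is computed per codon so that it aligns with the protein sequence, the same window length is used for both profiles, and the window length, the toy sequence and the two-residue scale are hypothetical placeholders.

```python
import math

PYRIMIDINES = set("CU")

def window_average(values, window):
    """Sliding-window mean; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averaged = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        averaged.append(total / window)
    return averaged

def codon_pyr_fraction(mrna):
    """Fraction of pyrimidines (C, U) in each complete codon of a coding sequence."""
    usable = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [sum(base in PYRIMIDINES for base in codon) / 3 for codon in codons]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def profile_correlation(mrna, protein, scale, window):
    """Pearson R between window-averaged mRNA PYR content and protein propensity."""
    pyr_profile = window_average(codon_pyr_fraction(mrna), window)
    propensity_profile = window_average([scale[aa] for aa in protein], window)
    return pearson(pyr_profile, propensity_profile)

# Toy example: UUU codes for Phe (F) and GAA for Glu (E).
# The energy values are hypothetical placeholders, NOT the derived scales.
toy_mrna = "UUUUUUGAAGAAUUUGAAUUUGAA"
toy_protein = "FFEEFEFE"
toy_scale = {"F": -5.0, "E": 10.0}   # kJ/mol, lower = higher propensity

r = profile_correlation(toy_mrna, toy_protein, toy_scale, window=3)
print(f"Pearson R for the toy pair: {r:.2f}")
```

Repeating this for every mRNA–protein pair in a proteome yields the distribution of Pearson Rs whose median is reported in the text. In the toy case the pyrimidine-rich codons encode the residue with the lower energy, so the correlation is negative, in line with the sign convention noted above.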

As mentioned above, the GUA scale does not yield any significant correlation with PYR, i.e. PUR content on the side of mRNA. However, analysis of relative scales reveals that mRNA PUR density profiles closely and quantitatively match their cognate protein profiles capturing the preference of residues to interact with GUA relative to all other nucleobases. For example, protein sequence profiles of relative GUA–CYT binding propensities match their cognate mRNA PUR density profiles with the median Pearson R = −0.69 (P-value = 1 × 10^−3) over the entire human proteome (Table 1). What this means is that one half of all mRNA-cognate protein pairs in the human proteome display profile matching that is equal to or better than that of the median representative, Pleckstrin (P08567), shown in Figure 6B. Interestingly, though, much weaker correlation is seen if instead of PUR content, which is up to a constant equivalent to PUR–PYR content, one here analyzes GUA–CYT content along mRNA (median R = −0.48). Similar results are also obtained for the relative GUA–URA and GUA–ADE scales (P-values 3 × 10^−4 and 4 × 10^−4, respectively) (Table 1). On the other hand, the ADE–CYT scale results in a similar level of matching, but now when it comes to mRNA PYR density profiles. In Figure 6B, we illustrate this in the case of the median representative protein Pleckstrin (P08567) and its mRNA. Interestingly, the ADE–URA scale exhibits no significant correlation whatsoever, while the CYT–URA scale exhibits a significant level of matching with mRNA PUR density profiles (P-value = 4 × 10^−4) (Table 1). Again, we observe the same trend as with non-relative interaction propensity scales (Supplementary Table S5) that the correlation coefficients for mRNA density profiles of individual bases are weaker than those for the mRNA PUR density profiles (Supplementary Table S6). The same trends seen for the human proteome extend to other organisms as well (Supplementary Figures S2B and C and S3B and C). As the main reason for the matching detected in our analysis is the genetic code, which is the same for all the organisms studied, it is not surprising that one obtains similar levels of matching no matter in which organism one looks. The differences, on the other hand, can be attributed to the exact mRNA and protein composition in individual proteomes. Finally, qualitatively identical results are obtained for systems with zwitterionic amino acids instead of sidechain analogs (Table 1).

Table 1. Median values of distributions of Pearson correlation coefficients between window-averaged PUR content profiles of mRNA molecules and their cognate proteins’ profiles of relative interaction propensities for nucleobases calculated over the entire human proteome. The interaction propensities were obtained from the energetic analysis of both sidechain (sca) and amino acid (aa) containing systems.
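The interchangeability of PUR, PYR and PUR–PYR content in the analysis above follows from elementary properties of the Pearson coefficient. In the short derivation below, f_PUR(i) and f_PYR(i) denote the purine and pyrimidine fractions in window i and s(i) the corresponding protein propensity profile; this notation is introduced here for illustration and is not taken from the original text.

```latex
\begin{align}
f_{\mathrm{PUR}}(i) + f_{\mathrm{PYR}}(i) &= 1, \\
f_{\mathrm{PUR}}(i) - f_{\mathrm{PYR}}(i) &= 2 f_{\mathrm{PUR}}(i) - 1, \\
R\bigl(a\,x + b,\; s\bigr) &= \operatorname{sign}(a)\, R(x, s) \qquad (a \neq 0), \\
R\bigl(f_{\mathrm{PUR}} - f_{\mathrm{PYR}},\; s\bigr) &= R\bigl(f_{\mathrm{PUR}},\; s\bigr) = -R\bigl(f_{\mathrm{PYR}},\; s\bigr).
\end{align}
```

Since every base in a window is either a purine or a pyrimidine, the PUR–PYR difference is an affine function of PUR content with positive slope, and the Pearson coefficient is unchanged; a correlation computed against PYR content has the same magnitude and the opposite sign. No such identity links GUA–CYT content to PUR content, which is why that comparison can give a different median R.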

DISCUSSION

In the present study, we have for the first time systematically analyzed the behavior of amino acids and their sidechain analogs in high-concentration aqueous solutions of naturally occurring RNA nucleobases. Our results show that amino acids and their sidechain analogs display highly differentiated interaction propensities for different nucleobases depending on the ring architecture and, even more importantly, ring substituents. It is our hope that these scales will provide: (i) a rigorous, quantitative, physicochemical foundation for rationalizing the specificity in RNA–protein interactions in different contexts, and (ii) a powerful tool for sculpting and modifying such specificity for biomedical and bioengineering purposes.

As discussed above, our simulations were carried out at nucleobase concentration levels exceeding the experimentally known solubility limits. However, a strong agreement with extant experimental data, a general absence of stable aggregates and favorable results of thermodynamic stability analysis all suggest that the simulated model systems do capture the essential features of amino acid/nucleobase interactions even at high concentrations. Moreover, even if the simulated systems would over time move in the direction of precipitation, the partitioning of amino acids and their sidechain analogs between water- and base-rich fractions occurs much more quickly, allowing one to accurately capture interaction propensities with relatively short simulations. Finally, the number of water molecules in our simulations was such that for each base there was enough water to account for one full hydration shell around it. As such, our simulated systems in all likelihood better approximate the situation at typical hydrated RNA–protein binding interfaces than would more dilute solutions.

Our analysis of energy-based interaction preferences was based on a critical assumption that the potential energies between amino acids or sidechain analogs and nucleobases or water accurately capture the free energies of these interactions. In other words, we assumed that it is primarily the enthalpic part of free energy that is responsible for the relative difference in amino acid–nucleobase interactions, with the entropic component being proportional to it. A similar assumption was made by Stumpe and Grubmüller in their study of amino acid interactions with urea and their influence on protein folding (50). In a related study (A. de Ruiter and B. Zagrovic, in preparation), we have used MD simulations and umbrella sampling to evaluate absolute binding free energies between nucleobases and amino acid sidechain analogs in water. By comparing the sidechain analog interaction propensities derived in this work to the absolute free energies, which fully account for both enthalpy and entropy, we observe a high level of correlation with Spearman correlation coefficients of ρ = 0.88 (URA), ρ = 0.85 (CYT), ρ = 0.62 (GUA) and ρ = 0.88 (ADE) (A. de Ruiter and B. Zagrovic, in preparation). Although the behavior of amino acids or their sidechain analogs in crowded solutions of nucleobases need not necessarily match that with only one nucleobase present, such a high level of correlation does support the existence of a strong relationship between free energies and their enthalpic components in the former case. Finally, the fact that the obtained scales agree well with both experimental results (8,10) and the structural analysis of intermolecular contacts (23) lends further support to this claim.
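The assumption stated above can be written out explicitly. The proportionality constant α below is introduced here purely for illustration; the text only states that the entropic component is taken to be proportional to the enthalpic one, and treating α as identical for all amino acid/nucleobase pairs is an additional simplification made in this sketch.

```latex
\begin{align}
\Delta G &= \Delta H - T\,\Delta S, \\
T\,\Delta S &\approx \alpha\,\Delta H \quad \text{(assumed, with the same } \alpha \text{ for all pairs)}, \\
\Delta G &\approx (1 - \alpha)\,\Delta H .
\end{align}
```

Under this assumption, and provided α < 1, free energies are a positive multiple of the enthalpic term, so the rank order of interaction partners obtained from potential energies is the same as the one that would be obtained from free energies. Deviations from strict proportionality would lower rank correlations such as the Spearman coefficients quoted above without necessarily removing them.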

Here, we have used the derived interaction propensity scales to investigate how the relationships observed at the level of amino acids and nucleobases translate to the level of complete coding sequences of mRNAs and their cognate proteins. Our central aim was to further examine the recently proposed complementarity hypothesis and its relationship with the structure and the origin of the genetic code (23,36,37). In accordance with our results obtained using knowledge-based potentials (23), we have observed that the higher the pyrimidine content of mRNAs, the higher the propensity of their cognate proteins to interact with URA, CYT and ADE, but importantly not with GUA (Figure 5). Actually, the fact that GUA- and ADE-based scales exhibit opposite behavior when it comes to their relationship with PYR-based scales (Supplementary Figure S1) suggests that the key element in determining the specificity of interaction between amino acids and nucleobases is not the nature of the heterocyclic ring, but rather that of ring substituents. In particular, our present results suggest that this difference stems primarily from the behavior of charged amino acids, which is reasonable considering the fact that the two purine bases are largely isosteric and differ primarily when it comes to ring substituents and their charge distribution. This is also supported by a related analysis in which we showed that unsubstituted purine and pyrimidine rings result in highly correlated scales when it comes to their interactions with amino acids (M. Hajnic, J. I. Osorio and B. Zagrovic, unpublished data).

In support of this reasoning, we have observed a strong relationship between the average PUR content of mRNA sequences and the relative preference of their cognate protein sequences to interact with GUA relative to other bases (Figure 6A, Table 1). In accordance with the stereochemical hypothesis and our generalizations of it (23,36,37), GUA exhibits strong preference for interaction with PUR-coded amino acids relative to all other bases. Importantly, this effect appears to be primarily due to the behavior of charged amino acids Glu, Asp, Arg and Lys. These results are fully consistent with our previous knowledge-based analysis of residue preferences for different nucleobases: there, GUA interaction preferences on the side of amino acids or proteins correlated extremely well with purine density on the side of their cognate codons or mRNA, while ADE interaction preferences were much closer to those of pyrimidine bases (23), as also seen here. While the full biological meaning of this result still remains to be elucidated, we are confident that it represents an important principle concerning the mRNA–protein relationship in general. Overall, our results give support to the generalized stereochemical hypothesis of the origin of the genetic code, in which GUA plays the role of an archetypal purine (i.e. purine richness on the side of codons or mRNAs parallels high levels of relative guanine interaction propensity on the side of cognate amino acids or proteins), while the opposite is seen for CYT, URA and ADE (i.e. pyrimidine richness of mRNA mirrors high relative interaction propensity for these nucleobases when it comes to cognate amino acids or proteins) (23,36,37). In this context, the presence of adenines negatively affects complementarity levels, as discussed before (37). Intriguingly, despite the fact that the propensity scales were derived for specific bases, the highest levels of matching are observed if one considers PUR (i.e. PYR) density on the side of mRNA and not that of individual bases (Supplementary Tables S5 and S6). This effect, which was already observed before (23,36,37), still requires a full explanation. However, we believe it suggests that the core of the genetic code was originally defined at the level of a coarse-grained nucleobase alphabet in which differences between specific purines, i.e. pyrimidines, were not critical.

Our results with specifically Glu and Asp and their interactions with GUA show that the most basic version of the stereochemical hypothesis, the one in which the genetic code evolved on the basis of direct interactions between amino acids and their codons, can at best hold for a subset of amino acids only. In particular, Glu and Asp do not appear to favorably interact with any nucleobases in the aqueous environment, although in our knowledge-based analysis (23) they do show a strong preference for interacting with purine bases and especially GUA. The solution to this seeming paradox is provided by our present results: although the negatively charged Glu and Asp do not show direct preference for binding to GUA, they appear to be the least unfavorable interacting partners for GUA when compared to all other nucleobases. It is very possible that the preferences one sees in the knowledge-based analysis of known protein–RNA complexes are in part a consequence of such a negative selection. One way in which this result could be made consistent with the stereochemical hypothesis and especially its generalized version, even for Glu and Asp, is if one assumes that the genetic code evolved in a context in which the role of the translation apparatus was not to link individual amino acids according to the mRNA template, but rather short peptides. In this scenario, other amino acids would provide the source of favorable binding free energy to the mRNA template, while Glu and Asp would contribute to the specificity of binding only.

Overall, our study shows that by using MD simulations and extensive sampling we can distinguish between amino acid or sidechain analog interaction propensities for different nucleobases. Remarkably, the interaction propensities derived from simulations of individual monomers yield close correspondences at the level of complete proteins and mRNA molecules, giving support to the mRNA/protein complementarity hypothesis as recently proposed. Although our present results are highly suggestive, it should be nonetheless emphasized that the only rigorous test of the complementarity hypothesis can come from direct experimental work. We hope that our present results will serve not only as a source of motivation in this direction, but also as a foundation for different computational and experimental studies of RNA/protein interactions in general (51–53).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGMENTS

We thank members of the Laboratory of Computational Biophysics at MFPL for useful advice and critical reading of the manuscript. The authors gratefully acknowledge A. A. Polyansky for help with randomization tests and C. Oostenbrink for helpful advice on Kirkwood–Buff integrals.

FUNDING

Austrian Science Fund FWF [START Y 514-B11 to B.Z., in part]; European Research Council [ERC Starting Independent 279408 to B.Z.]. Funding for open access charge: Austrian Science Fund FWF [START Y 514-B11 to B.Z., in part]; European Research Council [ERC Starting Independent 279408 to B.Z.].

Conflict of interest statement. None declared.

REFERENCES

1. Moore,M.J. and Proudfoot,N.J. (2009) Pre-mRNA processing reaches

back to transcription and ahead to translation. Cell, 136, 688–700.

2. Licatalosi,D.D. and Darnell,R.B. (2010) RNA processing and its

regulation: global insights into biological networks. Nat. Rev. Genet.,

11, 75–87.

3. Müller-McNicoll,M. and Neugebauer,K.M. (2013) How cells get the

message: dynamic assembly and function of mRNA-protein

complexes. Nat. Rev. Genet., 14, 275–287.

4. Mercer,T.R. and Mattick,J.S. (2013) Structure and function of long

noncoding RNAs in epigenetic regulation. Nat. Struct. Mol. Biol., 20,

300–307.

5. Baltz,A.G., Munschauer,M., Schwanhaeusser,B., Vasile,A.,

Murakawa,Y., Schueler,M., Youngs,N., Penfold-Brown,D., Drew,K.,

Milek,M. et al. (2012) The mRNA-bound proteome and its global

occupancy profile on protein-coding transcripts. Mol. Cell, 46,

674–690.

6. Castello,A., Fischer,B., Eichelbaum,K., Horos,R., Beckmann,B.M.,

Strein,C., Davey,N.E., Humphreys,D.T., Preiss,T., Steinmetz,L.M.

et al. (2012) Insights into RNA biology from an atlas of mammalian

mRNA-binding proteins. Cell, 149, 1393–1406.


7. Woese,C. (1965) On evolution of genetic code. Proc. Natl. Acad. Sci.

U.S.A., 54, 1546–1552.

8. Woese,C.R. (1973) Evolution of the genetic code.

Naturwissenschaften, 60, 447–459.

9. Akinrimisi,E. and Tso,P. (1964) Interactions of purine with proteins

+ amino acids. Biochemistry (Mosc.), 3, 619–626.

10. Thomas,P. and Podder,S. (1978) Specificity in protein-nucleic acid

interaction––solubility study on amino acid nucleoside interaction.

FEBS Lett., 96, 90–94.

11. Lacey,J.C. and Pruitt,K.M. (1969) Origin of the genetic code. Nature,

223, 799–804.

12. Rifkind,J.M. and Eichhorn,G.L. (1970) Specificity for the interaction

of nucleotides with basic polypeptides. Biochemistry (Mosc.), 9,

1753–1761.

13. Wagner,K.G. and Arfmann,H.-A. (1974) Properties of basic

amino-acid residues. Eur. J. Biochem., 46, 27–34.

14. Luscombe,N.M., Laskowski,R.A. and Thornton,J.M. (2001) Amino

acid-base interactions: a three-dimensional analysis of protein-DNA

interactions at an atomic level. Nucleic Acids Res., 29, 2860–2874.

15. Treger,M. and Westhof,E. (2001) Statistical analysis of atomic

contacts at RNA-protein interfaces. J. Mol. Recognit., 14, 199–214.

16. Jeong,E., Kim,H., Lee,S.W. and Han,K. (2003) Discovering the

interaction propensities of amino acids and nucleotides from

protein-RNA complexes. Mol. Cells, 16, 161–167.

17. Hoffman,M.M., Khrapov,M.A., Cox,J.C., Yao,J.C., Tong,L.N. and

Ellington,A.D. (2004) AANT: the amino acid-nucleotide interaction

database. Nucleic Acids Res., 32, D174–D181.

18. Kondo,J. and Westhof,E. (2011) Classification of pseudo pairs

between nucleotide bases and amino acids by analysis of

nucleotide-protein complexes. Nucleic Acids Res., 39, 8628–8637.

19. Donald,J.E., Chen,W.W. and Shakhnovich,E.I. (2007) Energetics of

protein–DNA interactions. Nucleic Acids Res., 35, 1039–1047.

20. Jonikas,M.A., Radmer,R.J., Laederach,A., Das,R., Pearlman,S.,

Herschlag,D. and Altman,R.B. (2009) Coarse-grained modeling of

large RNA molecules with knowledge-based potentials and structural filters. RNA, 15, 189–199.

21. Pérez-Cano,L., Solernou,A., Pons,C. and Fernández-Recio,J. (2010)

Structural prediction of protein-RNA interaction by computational

docking with propensity-based statistical potentials. Pac. Symp.

Biocomput., 293–301.

22. Tuszynska,I. and Bujnicki,J.M. (2011) DARS-RNP and

QUASI-RNP: new statistical potentials for protein-RNA docking.

BMC Bioinformatics, 12, 348-363.

23. Polyansky,A.A. and Zagrovic,B. (2013) Evidence of direct

complementary interactions between messenger RNAs and their

cognate proteins. Nucleic Acids Res., 41, 8434–8443.

24. Mathew,D.C. and Luthey-Schulten,Z. (2008) On the physical basis of

the amino acid polar requirement. J. Mol. Evol., 66, 519–528.

25. Biot,C., Buisine,E., Kwasigroch,J.M., Wintjens,R. and Rooman,M.

(2002) Probing the energetic and structural role of amino

acid/nucleobase cation-pi interactions in protein-ligand complexes. J.

Biol. Chem., 277, 40816–40822.

26. Rutledge,L.R., Campbell-Verduyn,L.S., Hunter,K.C. and

Wetmore,S.D. (2006) Characterization of nucleobase-amino acid

stacking interactions utilized by a DNA repair enzyme. J. Phys.

Chem. B, 110, 19652–19663.

27. Rutledge,L.R., Durst,H.F. and Wetmore,S.D. (2008) Computational

comparison of the stacking interactions between the aromatic amino

acids and the natural or (cationic) methylated nucleobases. Phys.

Chem. Chem. Phys., 10, 2801–2812.

28. Ebrahimi,A., Habibi-Khorassani,M., Gholipour,A.R. and

Masoodi,H.R. (2009) Interaction between uracil nucleobase and

phenylalanine amino acid: the role of sodium cation in stacking.

Theor. Chem. Acc., 124, 115–122.

29. Nirenberg,M.W., Jones,O.W., Leder,P., Clark,B.F.C., Sly,W.S. and

Pestka,S. (1963) On the coding of genetic information. Cold Spring

Harb. Symp. Quant. Biol., 28, 549–557.

30. Giulio,M.D. (2005) The origin of the genetic code: theories and their

relationships, a review. Biosystems, 80, 175–184.

31. Koonin,E.V. and Novozhilov,A.S. (2009) Origin and evolution of the

genetic code: the universal enigma. IUBMB Life, 61, 99–111.

32. Woese,C. (1968) Fundamental nature of genetic code––prebiotic

interactions between polynucleotides and polyamino acids or their

derivatives. Proc. Natl. Acad. Sci. U.S.A., 59, 110–117.

33. Woese,C. (1969) Models for evolution of codon assignments. J. Mol.

Biol., 43, 235–240.

34. Yarus,M. (1998) Amino acids as RNA ligands: a

direct-RNA-template theory for the code’s origin. J. Mol. Evol., 47,

109–117.

35. Yarus,M., Widmann,J.J. and Knight,R. (2009) RNA-amino acid

binding: a stereochemical era for the genetic code. J. Mol. Evol., 69,

406–429.

36. Hlevnjak,M., Polyansky,A.A. and Zagrovic,B. (2012) Sequence

signatures of direct complementarity between mRNAs and cognate

proteins on multiple levels. Nucleic Acids Res., 40, 8874–8882.

37. Polyansky,A.A., Hlevnjak,M. and Zagrovic,B. (2013) Proteome-wide

analysis reveals clues of complementary interactions between

mRNAs and their cognate proteins as the physicochemical

foundation of the genetic code. RNA Biol., 10, 1248–1254.

38. Kyrpides,N.C. and Ouzounis,C.A. (1993) Mechanisms of specificity in mRNA degradation: autoregulation and cognate interactions. J. Theor. Biol., 163, 373–392.

39. Ouzounis,C.A. and Kyrpides,N.C. (1994) Reverse interpretation: a

hypothetical selection mechanism for adaptive mutagenesis based on

autoregulated mRNA stability. J. Theor. Biol., 167, 373–379.

40. Oostenbrink,C., Villa,A., Mark,A.E. and Gunsteren,W.F. (2004) A

biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6.

J. Comput. Chem., 25, 1656–1676.

41. Hess,B., Kutzner,C., van der Spoel,D. and Lindahl,E. (2008)

GROMACS 4: algorithms for highly efficient, load-balanced, and

scalable molecular simulation. J. Chem. Theory Comput., 4, 435–447.

42. Berendsen,H., Grigera,J. and Straatsma,T. (1987) The missing term in

effective pair potentials. J. Phys. Chem., 91, 6269–6271.

43. Bussi,G., Donadio,D. and Parrinello,M. (2007) Canonical sampling through velocity rescaling. J. Chem. Phys., 126, 014101.

44. Parrinello,M. and Rahman,A. (1981) Polymorphic transitions in single-crystals: a new molecular-dynamics method. J. Appl. Phys., 52, 7182–7190.

45. Oostenbrink,C. and van Gunsteren,W.F. (2005) Methane clustering in explicit water: effect of urea on hydrophobic interactions. Phys. Chem. Chem. Phys., 7, 53–58.

46. Ben-Naim,A. (1992) Statistical thermodynamics for chemists and biochemists. Springer Science+Business Media, NY.

47. Gazzillo,D. (1995) Stability of fluids with more than two components. Mol. Phys., 84, 303–323.

48. UniProt Consortium (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.

49. Yalkowsky,S.H. and Dannenfelser,R.M. (1992) Aquasol database of aqueous solubility. College of Pharmacy, University of Arizona, Tucson, AZ.

50. Stumpe,M.C. and Grubmüller,H. (2007) Interaction of urea with amino acids: implications for urea-induced protein denaturation. J. Am. Chem. Soc., 129, 16126–16131.

51. Änkö,M.-L. and Neugebauer,K.M. (2012) RNA–protein interactions in vivo: global gets specific. Trends Biochem. Sci., 37, 255–262.

52. Puton,T., Kozlowski,L., Tuszynska,I., Rother,K. and Bujnicki,J.M. (2012) Computational methods for prediction of protein–RNA interactions. J. Struct. Biol., 179, 261–268.

53. Zanzoni,A., Marchese,D., Agostini,F., Bolognesi,B., Cirillo,D., Botta-Orfila,M., Livi,C.M., Rodriguez-Mulero,S. and Tartaglia,G.G. (2013) Principles of self-organization in biological pathways: a hypothesis on the autogenous association of alpha-synuclein. Nucleic Acids Res., 41, 9987–9998.

SUPPLEMENTARY DATA

Data

October 2014

Matea Hajnic · Juan Iregui Osorio · Bojan Zagrovic
