Amino acid side chain parameters for correlation studies in biology and pharmacology

Abstract

Fifteen physicochemical descriptors of side chains of the 20 natural and of 26 non-coded amino acids are compiled and simple methods for their evaluation described. The relevance of these parameters to account for hydrophobic, steric, and electric properties of the side chains is assessed and their intercorrelation analyzed. It is shown that three principal components, one steric, one bulk, and one electric (electronic), account for 66% of the total variance in the available set. These parameters may prove to be useful for correlation studies in series of bioactive peptide analogues.
Int. J. Peptide Protein Res. 32, 1988, 269-278

JEAN-LUC FAUCHÈRE¹, MARVIN CHARTON², LEMONT B. KIER³, ARIE VERLOOP⁴ and VLADIMIR PLIŠKA⁵

¹Department of Biotechnology, Swiss Federal Institute of Technology, ETH, Zurich, Switzerland; ²Department of Chemistry, Pratt Institute, Brooklyn, New York; ³Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, Virginia, USA; ⁴14/2 A R Naarden, The Netherlands; ⁵Institute of Animal Science, Swiss Federal Institute of Technology, ETH, Zurich, Switzerland

Received 25 February, accepted for publication April 1988
Key words: amino acid side chain parameters; LFER parameters; QSAR in peptides; QSAR parameters
Amino acids are abbreviated according to the recommendations of the IUPAC-IUB Joint Commission on Biochemical Nomenclature (32). All amino acids have the L-configuration. Abbreviations: QSAR, quantitative structure-activity relationships; LFER, linear free energy related; n.m.r., nuclear magnetic resonance.

One of the main limitations of correlation studies for bioactive peptides is the lack of reliable physicochemical amino acid side chain descriptors. This is mainly due to the difficulty in selecting those features which control the peptide-receptor interactions, and in finding conditions under which each property can be individually measured.

In previous work, Sneath (1) quantitatively evaluated the similarity and dissimilarity of the 20 natural amino acids. He reasoned that small structural changes should bring about small changes in the biological activity, and concluded from examples in the oxytocin and angiotensin II series that the use of correlations between biological activity and these factors would be "better than chance," although not very reliable for predictive purposes. Neither the side chain properties defined in that work nor the extracted principal components (aliphaticity, hydrogenation, aromaticity, and hydroxythiolation) are convenient for use in QSAR studies, since they are not continuous parameters. Darvas et al. (2) stated that conventional LFER parameters (3) alone cannot satisfactorily describe peptide-peptide or peptide-protein interactions, and introduced "peptide-tailored" physicochemical descriptors including a summation parameter supposed to represent nonspecific ("aspecific" (2)) side chain interactions which originate in hydrogen bonds, electrostatic and charge transfer effects. Since the summation index contains these effects as a whole, identification of the individual contributions to biological activity is no longer possible.
Recently, Kidera et al. (4) analyzed 188 properties of the naturally occurring amino acids in a remarkable work focused mainly on the prediction of the three-dimensional structure of proteins. In addition to bulk and hydrophobicity, these authors identified β-structure preference and α-helix or bend structure preference as the two main representative factors. These two factors, although of paramount importance in protein folding, are of stochastic nature and cannot be established by direct measurements. For the biological activity of peptide drugs, they can hardly be of greater relevance than features such as charge, aromaticity, or presence of hydrogen bond donors or acceptors. Finally, this study cannot be easily extended to non-natural synthetic amino acids currently used in peptide drug design.

Several examples have demonstrated the usefulness of QSAR studies of bioactive peptides in order to identify the factors effective in binding or proteolysis and to predict more potent, more stable, or more selective analogues (5-8). One convincing example is the study by Hellberg et al. (9) of bradykinin potentiating peptides in which correlations derived from a small number of derivatives modeled and predicted the activity of a large series of analogues. In the preceding studies, no consistent set of side chain parameters was used, each laboratory relying upon its own developed or measured descriptors for correlation or principal components analyses.
The aims of this communication were therefore:
(1) to establish a list of selected hydrophobic, steric, electronic, and other parameters for amino acid side chains;
(2) to extend the list by our own measurements or computation to a number of unnatural synthetic amino acids;
(3) to indicate simple methods for the measurement or predictive calculation of the descriptors from the chemical structure of new side chains;
(4) to establish the degree of separation or intercorrelation of the descriptors for the reported values;
(5) to identify principal components among the side chain properties described by the parameters.
METHODS

The initial set of structural parameters was generally obtained experimentally by direct measurements of the given property of the amino acid or derivative. This was the case for the hydrophobic constant, the polarizability, the pKa of the corresponding carboxylic acid, and for all the steric constants derived from Taft's constants.

The n.m.r. chemical shift of the Cα-carbon of several amino acids was measured here for the first time. The δHc-values were obtained with the free amino acid in neutral D2O at 20° on a Varian XL300 spectrometer (Prof. J.F. Oth, ETH Zurich) with lock on D2O under proton decoupled conditions and elimination of the Overhauser effect. The reference was the sodium salt of 2,2-dimethyl-2-silapentane-5-sulfonate.

Other parameters, such as υreg and υv, could be measured on molecular CPK models (10). Finally, other ones were theoretically derived, such as the graph shape index.

Constants for new amino acid side chains can generally be calculated by empirical rules or obtained from correlations with various molecular features. Details are given here for each individual case.

Correlation analysis and search for principal components was performed by programs of the BMDP library (11).
RESULTS AND DISCUSSION
Hydrophobicity π

The π-values (Tables 1 and 2) express the hydrophobicity of the amino acid side chain according to the equation:

$$\pi(\text{side chain}) = \log P(\text{amino acid}) - \log P(\text{glycine})$$

in which P is the partition coefficient of the amino acid and of glycine in octanol/water (12). The fundamental π-values for natural amino acids (Table 1) have been obtained at pH 7.1 with amino acid derivatives protected and thus uncharged in the backbone residue (the Nα-acetyl-amino-acid amides (13)) by partitioning in octanol/water and by using a similar equation for evaluation:

$$\pi(\text{side chain}) = \log P(\text{Ac-amino acid-NH}_2) - \log P(\text{Ac-Gly-NH}_2)$$

In spite of some controversial acceptance of this scale (14) the authors consider it highly reliable because P was directly estimated in octanol/water and because of the pertinence of the derivatives and physiologically relevant pH used in measurements. The scale was used for the determination of atomic solvation parameters in proteins (15). The other π-values (Table 2) have been determined by thin-layer chromatography and the Rf-values converted to π-values in octanol/water, by a method described earlier (12). All values reported here have been determined experimentally. New values can be obtained by thin-layer chromatography (12) as far as the L-form of the amino acid is available, or by calculation from the structure using either the fragment contributions without correcting factors as estimated in (12), or those of Hansch & Leo (16) with the appropriate correcting factors.

TABLE 1
Side chain parameters (16-parameter set) for natural amino acid side chains (except proline, n = 19)

| Amino acid | π | Ξ | υ | υreg | L | B1 | B5 | α | υv | δHc | σl | nH | nnb | iB | iA | pKa(RCOOH) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ala | 0.31 | 1.28 | 0.52 | 0.53 | 2.87 | 1.52 | 2.04 | 0.046 | 1.0 | 7.3 | -0.01 | 0 | 0 | 0 | 0 | 4.76 |
| Arg | -1.01 | 2.34 | 0.68 | 0.69 | 7.82 | 1.52 | 6.24 | 0.291 | 6.13 | 11.1 | 0.04 | 4 | 3 | 1 | 0 | 4.30 |
| Asn | -0.60 | 1.60 | 0.76 | 0.58 | 4.58 | 1.52 | 4.37 | 0.134 | 2.95 | 8.0 | 0.06 | 2 | 3 | 0 | 0 | 3.64 |
| Asp | -0.77 | 1.60 | 0.76 | 0.59 | 4.74 | 1.52 | 3.78 | 0.105 | 2.78 | 9.2 | 0.15 | 1 | 4 | 0 | 1 | 5.69 |
| Cys | 1.54 | 1.77 | 0.62 | 0.66 | 4.47 | 1.52 | 3.41 | 0.128 | 2.43 | 14.4 | 0.12 | 0 | 0 | 0 | 0 | 3.67 |
| Gln | -0.22 | 1.56 | 0.68 | 0.71 | 6.11 | 1.52 | 3.53 | 0.180 | 3.95 | 10.6 | 0.05 | 2 | 3 | 0 | 0 | 4.54 |
| Glu | -0.64 | 1.56 | 0.68 | 0.72 | 5.97 | 1.52 | 3.31 | 0.151 | 3.78 | 11.4 | 0.07 | 1 | 4 | 0 | 1 | 5.48 |
| Gly | 0.00 | 0.00 | 0.00 | 0.00 | 2.06 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 3.77 |
| His | 0.13 | 2.99 | 0.70 | 0.64 | 5.23 | 1.52 | 5.66 | 0.230 | 4.66 | 10.2 | 0.08 | 1 | 1 | 1 | 0 | 2.84 |
| Ile | 1.80 | 4.19 | 1.02 | 0.96 | 4.92 | 1.90 | 3.49 | 0.186 | 4.00 | 16.1 | -0.01 | 0 | 0 | 0 | 0 | 4.81 |
| Leu | 1.70 | 2.59 | 0.98 | 0.92 | 4.92 | 1.52 | 4.45 | 0.186 | 4.00 | 10.1 | -0.01 | 0 | 0 | 0 | 0 | 4.79 |
| Lys | -0.99 | 1.89 | 0.68 | 0.78 | 6.89 | 1.52 | 4.87 | 0.219 | 4.77 | 10.9 | 0.00 | 2 | 1 | 1 | 0 | 4.27 |
| Met | 1.23 | 2.35 | 0.78 | 0.77 | 6.36 | 1.52 | 4.80 | 0.221 | 4.43 | 10.4 | 0.04 | 0 | 0 | 0 | 0 | 4.25 |
| Phe | 1.79 | 2.94 | 0.70 | 0.71 | 4.62 | 1.52 | 6.02 | 0.290 | 5.89 | 13.9 | 0.03 | 0 | 0 | 0 | 0 | 4.31 |
| Pro | (0.72) | 2.67 | - | - | (4.11) | (1.52) | (4.31) | - | (2.72) | 17.8 | - | 0 | 0 | 0 | 0 | - |
| Ser | -0.04 | 1.31 | 0.53 | 0.55 | 3.97 | 1.52 | 2.70 | 0.062 | 1.60 | 13.1 | 0.11 | 1 | 2 | 0 | 0 | 3.83 |
| Thr | 0.26 | 3.03 | 0.50 | 0.63 | 4.11 | 1.73 | 3.17 | 0.108 | 2.60 | 16.7 | 0.04 | 1 | 2 | 0 | 0 | 3.87 |
| Trp | 2.25 | 3.21 | 0.70 | 0.84 | 7.68 | 1.52 | 5.90 | 0.409 | 8.08 | 13.2 | 0.00 | 1 | 0 | 0 | 0 | 4.75 |
| Tyr | 0.96 | 2.94 | 0.70 | 0.71 | 4.73 | 1.52 | 6.72 | 0.298 | 6.47 | 13.9 | 0.03 | 1 | 2 | 0 | 0 | 4.30 |
| Val | 1.22 | 3.67 | 0.76 | 0.89 | 4.11 | 1.90 | 3.17 | 0.140 | 3.0 | 17.2 | 0.01 | 0 | 0 | 0 | 0 | 4.86 |

π, hydrophobicity. Ξ, graph shape index. υ, upsilon steric parameter. υreg, smoothed upsilon steric parameter. L, B1, B5, STERIMOL length, minimum, and maximum width, respectively; torsion angles are 0° except for Phe and Tyr, where the torsion angle between phenyl and the adjacent group (CH2, NH) is 90°. α, polarizability. υv, normalized van der Waals volume. δHc, n.m.r. chemical shift of alpha-carbon. σl, localized electrical effect. nH, number of hydrogen bond donors. nnb, number of full nonbonding orbitals. iB, indicator of presence or absence of positive charge in side chain. iA, indicator of presence or absence of negative charge in side chain. pKa(RCOOH), negative log of dissociation constant of carboxylated side chain. Torsion angle is 90°.

TABLE 2
[Table body not recoverable.]

Graph shape index Ξ

This parameter (17) is a measure of the steric influence of a group which encodes the three attributes: complexity, branching, and symmetry of the group. It can be directly calculated from the molecular graph structure of the substituent, e.g. of the amino acid side chain. The index Ξ is free of inductive, resonance, or solvation effects. Although Ξ is theoretically derived, it was found to correlate with Taft's substituent index Es, according to the following equation:

$$-E_s = 0.40\,\Xi - 0.60$$

which allows one to predict additional Taft's values that are not attainable by the classical ester hydrolysis model (18). Values of Ξ are given in Tables 1 and 2 and the corresponding values for new amino acid side chains can easily be calculated by methods described in (19, 20). Ξ (and the corresponding calculated Es-values) appears as a valuable steric parameter since it is a measure of the directed spatial influence of the group, it is independent of all electrical and solution effects, and it can be calculated for all types of substituents.
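The regression between Taft's Es and the graph shape index quoted in this section, -Es = 0.40 Ξ - 0.60, can be applied directly to tabulated Ξ values. The following is a minimal sketch of that arithmetic only; the function name and the dictionary layout are illustrative assumptions, and the three Ξ values are the Table 1 entries for Ala, Val, and Ile.

```python
def taft_es_from_graph_shape(xi: float) -> float:
    """Estimate Taft's Es from the graph shape index via -Es = 0.40*Xi - 0.60."""
    return -(0.40 * xi - 0.60)


# Graph shape index values taken from Table 1 (natural side chains).
graph_shape_index = {"Ala": 1.28, "Val": 3.67, "Ile": 4.19}

# Hypothetical helper structure: estimated Es per side chain.
estimated_es = {
    name: taft_es_from_graph_shape(xi) for name, xi in graph_shape_index.items()
}

for name, es in estimated_es.items():
    print(f"{name}: Es (estimated) = {es:.3f}")
```

The estimates inherit the scatter of the underlying regression, so they are best read as rough predictions for side chains whose Es cannot be measured by ester hydrolysis.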
Upsilon steric parameter υ

This steric parameter υ (21) was derived from Taft's constant Es and expressed as a function of the minimal van der Waals radius. Values are available for a large number of substituents (21) and in particular for amino acid side chains (22). Unlike the original Taft's constant, υ is expressed in ångströms. In some cases it was necessary for its derivation to use effective values of υ obtained either from correlations of rate constants for acid-catalyzed ester hydrolysis or from estimation equations. However, the upsilon parameter, since it is based on the Bondi/van der Waals radii, can be held as a most reliable measure of the steric effect.

The values of υreg, reported in the next column of Tables 1 and 2, are directly related, although not identical, to those of upsilon. A tight correlation was observed between υ and the minimal projection surface of the side chain (or of any substituent) taken perpendicular to the Cα-Cβ bond (23). Using the parabolic correlation obtained (with at least 51 groups) and taking the υreg-values on the regression curve, a new set of smoothed steric parameters was obtained. In this set, a few unexplainable discrepancies are eliminated as, for example, the higher value in the original set for the side chain of serine (υ = 0.53) compared to threonine (υ = 0.50). New values are attainable by estimating the projection surface of the relevant CPK molecular models of the uncommon side chain and using the same correlation equation (23). This set of υreg-values, which essentially describes (as do Es and υ) the steric effect as seen from the reaction center in the model compounds, has also proven useful in a number of correlation studies.
STERIMOL multidimensional steric parameters L, B1, B5

The STERIMOL constants characterize the steric bulk of a substituent by its dimensions in three different directions in space (24). We use here the revised version of these parameters, which contains the three quantities L, B1, and B5 (25). L represents the length of the side chain measured in the direction in which it is attached to the glycine backbone, and B1 and B5 are the minimum and maximum width, respectively, of the side chain, measured in directions perpendicular to L. The parameters are calculated by a computer software package directly from the structure of the side chain. The STERIMOL constants, which have been shown to be useful in a number of QSAR studies, are likely to help to investigate structure-activity relationships in peptides, too, especially in the cases where more than one side chain descriptor is required for steric bulk. Since they are easily derived from structure by calculation, their value can be predicted for new amino acid side chains even prior to synthesis.
Polarizability α

The polarizability α is related to the molar refractivity MR, which in turn is experimentally given by:

$$MR = \frac{M}{d}\,\frac{n^2 - 1}{n^2 + 2}$$

(n, index of refraction; M, molecular weight; d, density). Since MR is an additive-constitutive property of a molecule, it can be easily calculated for any substituent. From tabulated values of MR for common groups (16) and by the equation:

$$\alpha = \frac{3}{4\pi N}\,\frac{M}{d}\,\frac{n^2 - 1}{n^2 + 2}$$

the polarizabilities α can be obtained (cf. also (26)) for known as well as for new amino acid side chains (Tables 1 and 2). Clearly α is a function of the molecular volume M/d, and thus a bulk parameter which models dispersion forces. The α-values have been scaled to make the coefficient in the regression equation roughly comparable to those obtained for other parameters. New values are easily obtained by simple arithmetic.
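The two relations above (molar refractivity from refractive index, molecular weight, and density, then polarizability as 3 MR / (4πN)) can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, N is taken as Avogadro's number, benzene is used only as a familiar test substance (it is not one of the side chains in the paper), and the scaling applied to the tabulated α-values is not reproduced.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1; N in the polarizability equation (assumed)


def molar_refractivity(mol_weight: float, density: float, n: float) -> float:
    """Lorentz-Lorenz molar refractivity MR = (M/d) * (n^2 - 1) / (n^2 + 2), in cm^3/mol."""
    return (mol_weight / density) * (n**2 - 1.0) / (n**2 + 2.0)


def polarizability_cm3(mr: float) -> float:
    """Unscaled polarizability alpha = 3 * MR / (4 * pi * N), in cm^3 per molecule."""
    return 3.0 * mr / (4.0 * math.pi * AVOGADRO)


# Illustrative input: liquid benzene (M in g/mol, d in g/cm^3, n at the sodium D line).
mr_benzene = molar_refractivity(mol_weight=78.11, density=0.8765, n=1.5011)
alpha_benzene = polarizability_cm3(mr_benzene)

print(f"MR    = {mr_benzene:.2f} cm^3/mol")
print(f"alpha = {alpha_benzene * 1e24:.2f} A^3")  # 1 cm^3 = 1e24 cubic angstroms
```

Because MR is additive, the same second step applies to a substituent MR obtained by summing tabulated group contributions instead of measuring n and d.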
Normalized van der Waals volume υv

This additional bulk parameter is the van der Waals volume of the amino acid side chain normalized according to the following equation (23):

$$\upsilon_v(\text{side chain}) = \frac{V(\text{side chain}) - V(\text{H})}{V(\text{CH}_2)}$$

where V is the measured van der Waals volume on CPK models for the side chain or the hydrogen atom, respectively. From this, υv = 1 for the side chain of alanine and increases by one unit for each additional CH2-group. Side chains of amino acids such as neopentylglycine and adamantylalanine, which are characterized by very similar υ-values, are distinct when described by υv. This parameter is easily measurable on CPK models for known as well as for new, not yet synthesized amino acids. As a bulk parameter, it models dispersion forces and is highly correlated to the polarizability α.
N.m.r. chemical shift of α-carbon δHc

The n.m.r. chemical shift δHc (H, magnetic field strength; δH, chemical shift; δHc, 13C-chemical shift) of the alpha-carbon in amino acids has been proposed (27) as a descriptor of the electronic properties of the side chain. When expressed in ppm from the 13C-chemical shift of the α-carbon of glycine, it can be considered a pure substituent parameter. As a matter of fact, this parameter primarily reflects the shielding of the Cα-nucleus by the nearby electronic systems of the side chain and thus incorporates the classical inductive and mesomeric effects of the substituent (side chain). However, δHc is not free of steric and hydrophobic contributions, as can be seen, for example, from a certain level of intercorrelation with π and Ξ (Fig. 1).

Values have been measured in a number of cases for the free amino acid and for the amino acid incorporated in a short peptide (28). Several newly measured values are reported here for the first time. The parameter is lower by 1.5 ± 1.1 ppm when the side chain is contained in a peptide. For new amino acids, δHc can be calculated using the empirical rules of Horsley et al. (29). We have tested these rules and observed that in their present form, they do not even permit the calculation of the chemical shift for all natural amino acids. However, they can be applied to new side chains according to the same scheme, using the additivity of the fragment contributions to δHc. The constant δHc has been successfully employed in several QSAR studies of bioactive peptides (8, 27).

FIGURE 1
Significance of the linear correlation coefficients (43 degrees of freedom). Significance levels shown: P > 99.9%, P > 99%, P > 95%.

Localized electrical effect parameter σl

This constant has been clearly defined and appropriately scaled (30) for any given substituent, and obtained for a number of amino acid side chains (26). Values for several uncommon side chains are reported here for the first time (Tables 1 and 2). The constant represents mainly inductive field effects and is well separated from delocalized resonance contributions.

The pKa's of the carboxylic acids R-COOH, in which R is an amino acid side chain, are also compiled for natural side chains in Table 1. However, in contrast to σl, the pKa reflects overlapping localized and delocalized electrical effects.

Hydrogen bonding parameters nH and nnb

These integer parameters expressing the number of OH and NH bonds, and the number of full nonbonding orbitals on O and N atoms, respectively, have been proposed for amino acid side chains (31). They can be evaluated by simple inspection of the structural formula of the substituent. In QSAR studies of peptides they often play the role of indicator variables and can be of great help to detect the implication of hydrogen bonds in single side chains among large series of non hydrogen bonding side chains.
Charge parameters iB and iA

The presence or absence of charges in amino acid side chains can be accounted for by the parameters iB and iA for basic (positively charged) and acidic (negatively charged) groups, respectively. The parameter takes the value 1 or 0 depending on whether such a charge is present or not, but neglecting the fact that ionization may be incomplete at physiologically relevant pH.
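The caveat about incomplete ionization can be made concrete with the Henderson-Hasselbalch relation, which gives the charged fraction of a group at a given pH. The sketch below is illustrative only: the function name is hypothetical, and the two pKa values are placeholders chosen for the example, not side chain values from this paper. The pH of 7.1 is the one quoted for the π measurements.

```python
def fraction_charged(pka: float, ph: float, kind: str) -> float:
    """Charged fraction of an ionizable group from the Henderson-Hasselbalch relation.

    kind="acid": fraction deprotonated (negatively charged).
    kind="base": fraction protonated (positively charged).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


PH = 7.1  # pH used for the partition measurements in the text

# Placeholder pKa values, for illustration only.
acid_fraction = fraction_charged(pka=4.0, ph=PH, kind="acid")
base_fraction = fraction_charged(pka=6.0, ph=PH, kind="base")

print(f"acidic group, pKa 4.0: {acid_fraction:.4f} charged (indicator would be 1)")
print(f"basic group,  pKa 6.0: {base_fraction:.4f} charged (indicator would be 1)")
```

A group whose pKa lies close to the working pH is only partly ionized, which is exactly the information a 0/1 indicator discards.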
Principal component analysis

The data contained in Tables 1 and 2 describe properties of 45 amino acid side chains by means of 15 measurable parameters. The selection of these parameters is arbitrary and largely dictated by their availability. Therefore, both redundancy and missing properties cannot be fully excluded. We have investigated the matrix of correlation of the parameters (Fig. 1) and found a high level of significance of the correlation coefficients between certain pairs of parameters. Since it can be anticipated that no more than three to four distinct properties such as hydrophobicity, steric bulk, and electronic features are expressed by this 15-parameter set, we have investigated the system by principal component analysis. For the particular choice of 45 side chains, initial factor extraction showed that four factors were necessary to explain 75% of the total variance. Orthogonal factor rotation and sorting out of the factor loadings (those less than 0.25 being set to zero) led to the pattern given in Fig. 2. Three factors were retained and tentatively interpreted as side chain properties.

FIGURE 2
Composition of the first three principal components: factor loading matrix. The factors explain 66% of the total variance. Loadings less than 0.25 have been omitted.

The first factor was clearly related to the volume of the side chain, since its loadings were high for the polarizability α, the van der Waals volume υv, and the two STERIMOL parameters L and B5. An apparent inconsistency in this respect was the contribution of υreg, a parameter which should not be primarily related to steric bulk. However, this contribution (0.27) to the first factor was relatively small and near to the threshold value for rejection. The second factor, again, had steric character and was clearly of the Taft type. These steric parameters are vector quantities with both absolute value and direction. They are likely to be proportional to the projection surface of the group perpendicular to the glycine (or peptide) backbone, as are its constitutive steric parameters υ, υreg, Ξ (Es), and B1. Factor 3 appeared to be related to electronic properties as given by the number of nonbonding orbitals, the number of possible hydrogen bonds, the localized electrical (inductive) effect σl, and the presence of charges. Most interesting was the fact that the constants π and δHc were almost evenly loaded in factors 1, 2, and 3 (π as a negative loading in the third factor). This observation does not imply that these parameters are not important for the description of certain side chain properties, but that they are, due to their correlation with other parameters, already sufficiently represented by the first three factors.

In conclusion, our results tend to suggest that three factors are sufficient to describe amino acid side chain properties in LFER correlations. Since the number of parameters and of side chains involved in this study was low, the factor composition cannot be considered generally valid and may vary from one set of side chains to another. However, a certain stability was observed; so, for example, omitting σl and/or joining iB and iA into a single charge parameter did not change the structure of the factors considerably. Furthermore, the parameters described in this study were closely related to one of the desired properties, such as hydrophobicity, bulkiness, or electronic configuration. There is, therefore, little point in employing these factors in QSAR studies of peptides at present and the original parameters should preferably be used in correlation analysis.

Cluster analysis

The small number of side chains and of parameters used in this study did not make it possible to reach the ultimate goal of cluster analysis: to order the substituents in groups in such a way that each member may be considered as a representative of the whole group. However, in the course of a preliminary analysis of the full series of available side chains (n = 45), a few clusters clearly appeared, while a number of miscellaneous side chains did not fall into clear-cut groups. The list of unambiguously formed clusters and the corresponding tree diagram are given in Fig. 3. A first cluster (1) contained five aliphatic side chains of about the size of norvaline, to which the two uncharged aromatic side chains of Phe and Tyr (1') were the next to amalgamate. A second cluster contained relatively small side chains containing either a heteroatom or a polar bond: this cluster (2) may be represented by either Thr or propargylglycine. The third cluster (3) was constituted of bulky aromatic side chains containing at least one heteroatom, as in dihydroxyphenylalanine or in pyrazinylalanine. A fourth cluster (4) was made of the three primary amines Lys, ornithine, and diaminobutyric acid. A cluster (5) was also formed by four β-branched aliphatic side chains, as in Ile or in cyclopentylglycine. A next cluster (6) contained bulky aromatic side chains which, in contrast to those in cluster 3, were relatively hydrophobic (cyclohexylalanine also amalgamated to this cluster); one typical representative would be O-benzylserine. The side chains of glycine and alanine, as expected, did not amalgamate in the earlier clustering steps and behaved as singular species. This was also the case for several other side chains such as those of Bug, Asp and Glu, or Arg. Although a certain stability in clustering was observed (e.g. the amalgamation sequence was very similar for the subseries of the natural amino acids), further work on larger series will be required to establish more significant clusters.

FIGURE 3
Apparent clusters found by cluster analysis of cases based on a 15-parameter set. The diagram shows the order of amalgamation of individual clusters (in boxes) and the connections between them. The ordinate distances correspond to the Mahalanobis distances between the amalgamation points.
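The general idea behind the principal component analysis of the parameter set can be illustrated on a small excerpt of Table 1. The sketch below is not the BMDP procedure used by the authors: it performs plain, unrotated PCA on the correlation matrix with NumPy, uses only five parameters (π, υ, α, υv, σl) for ten natural side chains, and all variable names are illustrative assumptions.

```python
import numpy as np

# Excerpt of Table 1: columns are pi, upsilon, alpha, upsilon_v, sigma_l.
names = ["Gly", "Ala", "Arg", "Asp", "Leu", "Lys", "Phe", "Ser", "Val", "Trp"]
params = ["pi", "upsilon", "alpha", "upsilon_v", "sigma_l"]
X = np.array([
    [0.00, 0.00, 0.000, 0.00, 0.00],
    [0.31, 0.52, 0.046, 1.00, -0.01],
    [-1.01, 0.68, 0.291, 6.13, 0.04],
    [-0.77, 0.76, 0.105, 2.78, 0.15],
    [1.70, 0.98, 0.186, 4.00, -0.01],
    [-0.99, 0.68, 0.219, 4.77, 0.00],
    [1.79, 0.70, 0.290, 5.89, 0.03],
    [-0.04, 0.53, 0.062, 1.60, 0.11],
    [1.22, 0.76, 0.140, 3.00, 0.01],
    [2.25, 0.70, 0.409, 8.08, 0.00],
])

# Standardize each parameter so that all enter with unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix (symmetric, hence eigh).
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals = np.clip(eigvals[order], 0.0, None)
eigvecs = eigvecs[:, order]

explained = eigvals / eigvals.sum()        # fraction of total variance per component
loadings = eigvecs * np.sqrt(eigvals)      # correlation of each parameter with each component
scores = Z @ eigvecs                       # side chains projected onto the components

for k, frac in enumerate(explained, start=1):
    print(f"component {k}: {100 * frac:.1f}% of variance")
print(dict(zip(params, np.round(loadings[:, 0], 2))))
```

With only ten side chains and five parameters, the resulting components are not comparable to the rotated factors discussed in the text; the sketch only shows how explained variance and loadings are obtained from a correlation matrix.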
CONCLUSION

The collection of parameters compiled in Tables 1 and 2 contains a selection of substituent constants for amino acid side chains. As substituent constants they reflect the properties of the side chain more than those of the amino acid and, except for the STERIMOL parameters and the pKa, the reference value for glycine is zero. The list is by no means intended to be complete, but it should quantitatively represent features relevant for, say, investigations of peptide drug-receptor interactions. The constants cover hydrophobic, steric, electronic, hydrogen bond donor/acceptor, and charge properties. No observed conformational preferences have been included since they appear to be consequences of the side chain properties (cf. (26)), as is the biological activity of the corresponding peptides. The parameters are either measurable or computable by simple methods, which we have indicated for each particular case. This fact can be of great value for predictive studies.
ACKNOWLEDGMENT

This work was supported by the Swiss National Science Foundation (grants 3.205-0.85 and 3.559-0.86).
REFERENCES

1. Sneath, P.H. (1966) J. Theoret. Biol. 12, 157-195
2. Darvas, F. (1980) Adv. Pharmacol. Res. Pract. 3, 265-278
3. Hansch, C. & Leo, A. (1979) Substituent Constants for Correlation Analysis in Chemistry and Biology, Wiley, New York
4. Kidera, A., Konishi, Y., Oka, M., Ooi, T. & Scheraga, H.A. (1985) J. Protein Chem. 4, 23-55
5. Pliška, V. (1978) Experientia 34, 1190-1192
6. Fauchère, J.L. (1982) J. Med. Chem. 25, 1428-1431
7. Guillemette, G., Bernier, M., Parent, P., Leduc, R. & Escher, E. (1984) J. Med. Chem. 27, 315-320
8. Nisato, D., Wagnon, J., Callet, G., Mettefeu, D., Assens, J.L., Plouzane, C., Tonnerre, B., Pliška, V. & Fauchère, J.L. (1987) J. Med. Chem. 30, 2287-2291
9. Hellberg, S., Sjöström, M. & Wold, S. (1986) Acta Chem. Scand. B40, 135-140
10. Koltun, W.L. (1965) Biopolymers 3, 665-679
11. BMDP Statistical Software, 1983 Printing with Additions (Dixon, W.J., ed.), University of California Press, Berkeley
12. Pliška, V., Schmidt, M. & Fauchère, J.L. (1981) J. Chromatogr. 216, 79-92
13. Fauchère, J.L. & Pliška, V. (1983) Eur. J. Med. Chem. 18, 369-375
14. Fauchère, J.L. (1985) TIBS 10, 268-269 and Discussion: (1986) TIBS 11, 69-70
15. Eisenberg, D. & McLachlan, A.D. (1986) Nature 319, 199-203
16. Leo, A. (1988) Log P and related parameters database, Medicinal Chemistry Project, Pomona College, Claremont, California
17. Kier, L.B. (1987) Quant. Struct.-Act. Relat. 6, 117-122
18. Taft, R.W. (1956) in Steric Effects in Organic Chemistry (Newman, M.S., ed.), p. 556, Wiley, New York
19. Kier, L.B. (1986) Acta Pharm. Jugosl. 36, 171-188
20. Kier, L.B. (1987) Medicinal Res. Rev. 7, 417-470
21. Charton, M. (1977) in Design of Biopharmaceutical Properties through Prodrugs and Analogs (Roche, E.B., ed.), pp. 228-280, American Pharmaceutical Association, Washington, D.C.
22. Charton, M. (1981) J. Theoret. Biol. 91, 115-123
23. Fauchère, J.L. (1984) in QSAR in Design of Bioactive Compounds (Kuchar, M., ed.), pp. 135-144, Prous, Barcelona
24. Verloop, A., Hoogenstraaten, W. & Tipker, J. (1976) in Drug Design (Ariens, E.J., ed.), vol. 7, pp. 165-207, Academic Press, New York
25. Verloop, A. (1983) in IUPAC, Pesticide Chemistry (Miyamoto, J. & Kearney, P.C., eds.), vol. 1, pp. 339-344, Pergamon, Oxford
26. Charton, M. & Charton, B.I. (1983) J. Theoret. Biol. 102, 121-134
27. Fauchère, J.L. & Lauterwein, J. (1985) Quant. Struct.-Act. Relat. 4, 11-13
28. Wüthrich, K. (1976) NMR in Biological Research: Peptides and Proteins, pp. 170-179, North Holland, Amsterdam
29. Horsley, W., Sternlicht, H. & Cohen, J.S. (1970) J. Am. Chem. Soc. 92, 680-685
30. Charton, M. (1981) Progr. Phys. Org. Chem. 13, 119-251
31. Charton, M. & Charton, B.I. (1982) J. Theoret. Biol. 99, 629-644
32. IUPAC-IUB Joint Commission on Biochemical Nomenclature (1984) European J. Biochem. 138, 9-37

Address:
Dr. J.L. Fauchère
Institut für Biotechnologie
ETH Hönggerberg
CH 8093 Zürich
Switzerland
... The biophysical properties of proteins as they relate to thermal stability have been well studied (5)(6)(7). One established trend is that more stable proteins have a greater proportion of smaller, less-branched aliphatic sidechains versus larger, more-branched ones (8,9), allowing for tighter packing within the hydrophobic core. A quantitative measure of this side-chain branching was developed by Fauchere et al. and termed "graph-shape index" (GSI) (8). ...
... One established trend is that more stable proteins have a greater proportion of smaller, less-branched aliphatic sidechains versus larger, more-branched ones (8,9), allowing for tighter packing within the hydrophobic core. A quantitative measure of this side-chain branching was developed by Fauchere et al. and termed "graph-shape index" (GSI) (8). More stable proteins also tend to have more salt bridges, highlighting the relevance of side-chain charge and of the covariance of the charges present at interacting positions in the sequence (9)(10)(11)(12). ...
Article
Full-text available
Enhancing protein thermal stability is important for biomedical and industrial applications as well as in the research laboratory. Here, we describe a simple machine-learning method which identifies amino acid substitutions that contribute to thermal stability based on comparison of the amino acid sequences of homologous proteins derived from bacteria that grow at different temperatures. A key feature of the method is that it compares the sequences based not simply on the amino acid identity, but rather on the structural and physicochemical properties of the side chain. The method accurately identified stabilizing substitutions in three well-studied systems and was validated prospectively by experimentally testing predicted stabilizing substitutions in a polyamine oxidase. In each case, the method outperformed the widely used bioinformatic consensus approach. The method can also provide insight into fundamental aspects of protein structure, for example, by identifying how many sequence positions in a given protein are relevant to temperature adaptation.
... The data indicated that the three peptides are C-terminally α-amidated. The calculated physicochemical properties of TtAP-1, TtAP-2 and TtAP-3 are shown in Table 3. Mean hydrophobicity was calculated using the hydrophobicity scale of Fauchere et al. [26]. Hydrophobic moment [27], a measure of the amphipathicity of an α-helix, was calculated using the HeliQuest web server [28]. ...
... The data indicate rapid and effective antimicrobial action with the killing of >99.9% of both microorganisms within 15 min (Figure 3). Mean hydrophobicity was calculated using the hydrophobicity scale of Fauchere et al. [26]. Hydrophobic moment [27], a measure of the amphipathicity of an α-helix, was calculated using the HeliQuest web server [28]. ...
Article
Full-text available
Envenomation by the Trinidad thick-tailed scorpion Tityus trinitatis may result in fatal myocarditis and there is a high incidence of acute pancreatitis among survivors. Peptidomic analysis (reversed-phase HPLC followed by MALDI-TOF mass spectrometry and automated Edman degradation) of T. trinitatis venom led to the isolation and characterization of three peptides with antimicrobial activity. Their primary structures were established as TtAP-1 (FLGSLFSIGSKLLPGVFKLFSRKKQ.NH2), TtAP-2 (IFGMIPGLIGGLISAFK.NH2) and TtAP-3 (FFSLIPSLIGGLVSAIK.NH2). In addition, potassium channel and sodium channel toxins, present in the venom in high abundance, were identified by CID-MS/MS sequence analysis. TtAP-1 was the most potent against a range of clinically relevant Gram-positive and Gram-negative aerobes and against the anaerobe Clostridioides difficile (MIC = 3.1–12.5 µg/mL). At a concentration of 1× MIC, TtAP-1 produced rapid cell death (<15 min against Acinetobacter baumannii and Staphylococcus aureus). The therapeutic potential of TtAP-1 as an anti-infective agent is limited by its high hemolytic activity (LC50 = 18 µg/mL against mouse erythrocytes), but the peptide constitutes a template for the design of analogs that maintain the high bactericidal activity against ESKAPE pathogens but are less toxic to human cells. It is suggested that the antimicrobial peptides in the scorpion venom facilitate the action of the neurotoxins by increasing the membrane permeability of cells from either prey or predator.
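The two descriptors named in the excerpts above, mean hydrophobicity and hydrophobic moment, can be sketched in a few lines. This is a minimal illustration, not the HeliQuest implementation: it assumes the usual Eisenberg form of the moment with a 100° turn per residue for an α-helix, and the function names and `TOY_SCALE` values are hypothetical placeholders rather than the published Fauchère–Pliska scale.

```python
import math


def mean_hydrophobicity(seq, scale):
    """Arithmetic mean of per-residue hydrophobicity values."""
    return sum(scale[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq, scale, angle_deg=100.0):
    """Length-normalized hydrophobic moment (Eisenberg form).

    Each residue contributes a vector of length scale[aa], rotated by
    angle_deg per position (100 degrees is the usual alpha-helix value).
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Placeholder values for illustration only; substitute a real scale,
# e.g. the tabulated Fauchere-Pliska side-chain values, before use.
TOY_SCALE = {"L": 1.0, "K": -1.0, "G": 0.0}
```

A real calculation would pass a full 20-residue scale as `scale`; keeping the scale as a parameter makes it easy to compare results across different hydrophobicity scales.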
... The node features are chosen according to the nature of LLPS based on experimental evidence and existing physical models (see [38,39]). We use five features including charge [40], hydrophobicity [41], polarity [42], flexibility index [43] and IDR scale [44] to represent amino acids in the molecular graph. We repeat the training of BG models with 105 random initial parameters. ...
Preprint
Full-text available
Protein Liquid-Liquid Phase Separation (LLPS) plays an essential role in cellular processes and is known to be associated with various diseases. However, our understanding of this enigmatic phenomenon remains limited. In this work, we propose a graph-neural-network (GNN)-based interpretable machine learning approach to study the intricate nature of protein structure-function relationships associated with LLPS. For many protein properties of interest, information relevant to the property is expected to be confined to local domains. For LLPS proteins, the presence of intrinsically disordered regions (IDRs) in the molecule is arguably the most important information; an adaptive GNN model which preferentially shares information within such units and avoids mixing in information from other parts of the molecule may thus enhance the prediction of LLPS proteins. To allow for the accentuation of domain-restricted information, we propose a novel graph-based model with the ability to partition each protein graph into task-dependent subgraphs. Such a model is designed not only to achieve better predictive performance but also to be highly interpretable, and thus have the ability to suggest novel biological insights. In addition to achieving state-of-the-art results on the prediction of LLPS proteins from protein structure for both regulator and scaffold proteins, we examine the properties of the graph partitions identified by our model, showing these to be consistent with the annotated IDRs believed to be largely responsible for LLPS. Moreover, our method is designed in a generic way such that it can be applied to other graph-based predictive tasks with minimal adaptation.
... Interestingly, the "charged", "polar", "positively-charged" and "negatively-charged" features from XGB-PCP were also found as the key physicochemical properties in Fig 7C and 7D. This evidence was well supported by the FAUJ880111 or Positive charge [78] feature from XGB-AAI prediction (Fig 7A and 7B). The presence of charged residues within an epitope can have several effects on antigen recognition and immune response, via electrostatic interactions, to enhance the strength of the binding and play a crucial role in determining the binding affinity and specificity of the epitope [79]. ...
Article
Full-text available
Hepatitis C virus (HCV) infection is a concerning health issue that causes chronic liver diseases. Despite many successful therapeutic outcomes, no effective HCV vaccines are currently available. Focusing on T cell activity, the primary effector for HCV clearance, T cell epitopes of HCV (TCE-HCV) are considered promising elements to accelerate HCV vaccine efficacy. Thus, accurate and rapid identification of TCE-HCVs is recommended to obtain more efficient therapy for chronic HCV infection. In this study, a novel sequence-based stacked approach, termed TROLLOPE, is proposed to accurately identify TCE-HCVs from sequence information. Specifically, we employed 12 different sequence-based feature descriptors from heterogeneous perspectives, such as physicochemical properties, composition-transition-distribution information and composition information. These descriptors were used in cooperation with 12 popular machine learning (ML) algorithms to create 144 base-classifiers. To maximize the utility of these base-classifiers, we used a feature selection strategy to determine a collection of potential base-classifiers and integrated them to develop the meta-classifier. Comprehensive experiments based on both cross-validation and independent tests demonstrated the superior predictive performance of TROLLOPE compared with conventional ML classifiers, with cross-validation and independent test accuracies of 0.745 and 0.747, respectively. Finally, a user-friendly online web server of TROLLOPE (http://pmlabqsar.pythonanywhere.com/TROLLOPE) has been developed to serve research efforts in the large-scale identification of potential TCE-HCVs for follow-up experimental verification.
... The molecular masses of the peptides, determined using MALDI-TOF mass spectrometry, were consistent with the proposed structures and are also shown in Table 1. The calculated physicochemical properties of the five HDPs are presented in Table 2. Mean hydrophobicity using the hydrophobicity scale of Fauchere and Pliska [16] and hydrophobic moment [17], a measure of the amphipathicity of an α-helix, were calculated using the HeliQuest web-server [18]. Predicted helical domains were calculated using the PEP2D program [19]. ...
Article
Full-text available
Frogs from the extensive amphibian family Hylidae are a rich source of peptides with therapeutic potential. Peptidomic analysis of norepinephrine-stimulated skin secretions from the Giant Gladiator Treefrog Boana boans (Hylidae: Hylinae) collected in Trinidad led to the isolation and structural characterization of five host-defense peptides with limited structural similarity to figainin 2 and picturin peptides from other frog species belonging to the genus Boana. In addition, the skin secretions contained high concentrations of tryptophyllin-BN (WRPFPFL) in both C-terminally α-amidated and non-amidated forms. Figainin 2BN (FLGVALKLGKVLGKALLPLASSLLHSQ) and picturin 1BN (GIFKDTLKKVVAAVLTTVADNIHPK) adopt α-helical conformations in trifluoroethanol–water mixtures and in the presence of cell membrane models (sodium dodecylsulfate and dodecylphosphocholine micelles). The CD data also indicate contributions from turn structures. Both peptides and picturin 2BN (GLMDMLKKVGKVALTVAKSALLP) inhibited the growth of clinically relevant Gram-negative and Gram-positive bacteria with MIC values in the range 7.8–62.5 µM. Figainin 2BN was potently cytotoxic to A549, MDA-MB-231 and HT-29 human tumor-derived cells (LC50 = 7–14 µM) but displayed comparable potency against non-neoplastic HUVEC cells (LC50 = 15 µM), indicative of a lack of selectivity for cancer cells.
Article
Background: The construction of gels from low molecular weight gelators (LMWG) has been extensively studied in bio-nanotechnology and other fields. However, gaps in understanding still prevent the prediction of LMWG from the full design of those gel systems. Multicomponent gels are even more complicated because multiple interfering effects coexist in the composite gel systems. Aim of review: This review emphasizes a systems view of multicomponent low molecular weight gels (MLMWGs) and summarizes recent progress on the construction of desired networks of MLMWGs, including self-sorting and co-assembly, as well as the challenges and approaches to understanding MLMWGs, with the hope that the opportunities from natural products and peptides can speed up the understanding process and close the gaps between the design and prediction of structures. Key scientific concepts of review: This review is focused on three key concepts. Firstly, understanding complicated multicomponent gel systems requires a systems perspective on MLMWGs. Secondly, several protocols can be applied to control self-sorting and co-assembly behaviors in those multicomponent gel systems, including complementary structures, chirality induction and dynamic control. Thirdly, the discussion is anchored in the challenges and strategies of understanding MLMWGs, and some examples are provided for the understanding of multicomponent gels constructed from small natural products and subtly designed short peptides.
Article
Full-text available
Hirudin from Hirudo medicinalis is a bivalent α‐Thrombin (αT) inhibitor, targeting the enzyme active site and exosite‐I, and is currently used in anticoagulant therapy along with its simplified analogue hirulog. Haemadin, a small protein (57 amino acids) isolated from the land‐living leech Haemadipsa sylvestris, selectively inhibits αT with a potency identical to that of recombinant hirudin (KI = 0.2 pM), with which it shares a common disulfide topology and overall fold. At variance with hirudin, haemadin targets exosite‐II and therefore (besides the free protease) it also blocks thrombomodulin‐bound αT without inhibiting the active intermediate meizothrombin, thus offering potential advantages over hirudin. Here, we produced in reasonably high yields and pharmaceutical purity (>98%) wild‐type haemadin and the oxidation resistant Met5 → nor‐Leucine analogue, both inhibiting αT with a KI of 0.2 pM. Thereafter, we used site‐directed mutagenesis, spectroscopic, ligand‐displacement, and Hydrogen/Deuterium Exchange‐Mass Spectrometry techniques to map the αT regions relevant for the interaction with full‐length haemadin and with the synthetic N‐ and C‐terminal peptides Haem(1–10) and Haem(45–57). Haem(1–10) competitively binds to/inhibits αT active site (KI = 1.9 μM) and its potency was enhanced by 10‐fold after Phe3 → β‐Naphthylalanine exchange. Conversely to full‐length haemadin, haem(45–57) displays intrinsic affinity for exosite‐I (KD = 1.6 μM). Hence, we synthesized a peptide in which the sequences 1–9 and 45–57 were joined together through a 3‐Glycine spacer to yield haemanorm, a highly potent (KI = 0.8 nM) inhibitor targeting αT active site and exosite‐I. Haemanorm can be regarded as a novel class of hirulog‐like αT inhibitors with potential pharmacological applications.
Article
Full-text available
Gene therapy via retroviral vectors holds great promise for treating a variety of serious diseases. It requires the use of additives to boost infectivity. Amyloid-like peptide nanofibers (PNFs) were shown to efficiently enhance retroviral gene transfer. However, the underlying mode of action of these peptides remains largely unknown. Data-mining is an efficient method to systematically study structure–function relationships and unveil patterns in a database. This data-mining study elucidates the multi-scale structure–property–activity relationship of transduction-enhancing peptides for retroviral gene transfer. In contrast to previous reports, we find that not the amyloid fibrils themselves, but rather µm-sized β-sheet rich aggregates enhance infectivity. Specifically, microscopic aggregation of β-sheet rich amyloid structures with a hydrophobic surface pattern and positive surface charge is identified as the key material property. We validate the reliability of the amphiphilic sequence pattern and the general applicability of the key properties by rationally creating new active sequences and identifying short amyloidal peptides from various pathogenic and functional origins. Data-mining—even for small datasets—enables the development of new efficient retroviral transduction enhancers and provides important insights into the diverse bioactivity of the functional material class of amyloids.
Article
Full-text available
Partition coefficients in n-octanol—water have been determined for the naturally occurring and some synthetic α-amino acids using thin-layer chromatography in different solvent systems. Both literature and newly measured values of partition coefficients in n-octanol—water have been used as reference values. The relationship between RF values and partition coefficients (P) has been established. Hansch hydrophobic parameters π have been evaluated for the side-chains. Fragmental group contributions to the overall π-value have also been evaluated. In this way P and π values of any non-listed amino acid become easily accessible either by thin-layer chromatography or by computation.
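The quantities named in this abstract can be illustrated with a short sketch. It assumes the standard chromatographic definition R_M = log10(1/R_F − 1) and a side-chain π taken as the difference in log P between an amino acid and glycine (side chain H); the abstract does not state either formula, so both are labeled assumptions here, and the function names are hypothetical.

```python
import math


def rm_from_rf(rf):
    """R_M value from a thin-layer R_F value: R_M = log10(1/R_F - 1)."""
    if not 0.0 < rf < 1.0:
        raise ValueError("R_F must lie strictly between 0 and 1")
    return math.log10(1.0 / rf - 1.0)


def side_chain_pi(log_p_amino_acid, log_p_glycine):
    """Hansch-type side-chain constant, assumed here to be
    pi = log P(amino acid) - log P(glycine)."""
    return log_p_amino_acid - log_p_glycine
```

With these definitions, a spot that migrates exactly halfway (R_F = 0.5) has R_M = 0, and an amino acid whose log P exceeds that of glycine gets a positive π, meaning a hydrophobic side chain.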
Article
Full-text available
In order to describe the conformational and other physical properties of the 20 naturally occurring amino acid residues with a minimum number of parameters, several multivariate statistical analyses were applied to 188 of their physical properties and ten orthogonal properties (factors) were obtained for the 20 amino acids without losing the information contained in the original physical properties. The analysis consisted of three main steps. First, 72 of the physical properties were eliminated from further consideration because they did not pass statistical tests that they follow a normal distribution. Second, the remaining 116 physical properties of the amino acids were classified by a cluster analysis to eliminate duplications of highly correlated physical properties. This led to nine clusters, each of which was characterized by an average characteristic property, namely bulk, two hydrophobicity indices for free amino acids, one hydrophobicity index for amino acid residues in a protein, two types of β-structure preference, α-helix preference, and two types of bend-structure preference. The physical properties within a given cluster were highly correlated with each other, but the correlation between clusters was low. Third, a factor analysis was applied to the nine average classified properties and 16 additional physical properties to obtain a small number of orthogonal properties (ten factors). Four of these factors arise from the nine characteristic properties, and the remaining six factors were obtained from the 16 physical properties not included in the nine characteristic properties. Finally, most of the 188 physical properties could be expressed as a sum of these ten orthogonal factors, with appropriate weighting factors.
Since these factors contain information relating almost all properties of all 20 amino acids, it is possible to estimate the numerical values of a property for one or two amino acids for which experimental data for this property are not available. For example, the estimated values for the Zimm-Bragg parameters at 20 °C are 0.66 and 0.92 for proline and cysteine, respectively, computed from the first four factors.
Article
The 13C-NMR chemical shift δH of the α-carbon in amino acids is proposed as a new parameter for QSAR studies of biologically active oligopeptides. δH, which is known for the common amino acids, is mainly a measure of the electronic shielding of the α-carbon. In the aliphatic series, it is computable by empirical rules and very sensitive to β-branching. In this study, δH is shown to describe accurately, as a single continuous parameter, the variations of the biological activity of angiotensin II when the residue in position 5 is varied.
Article
Three graph-based indexes of molecular shape have been found to correlate well with the Taft steric indexes, Es, for electron influence-free alkyl groups and ten symmetric-top groups used by Charton to compute a Van der Waals radius-based index, v. The equation model includes the 1κα and 3κα shape indexes and the 0κ index, equivalent to the Shannon information content per molecule. This latter index encodes the group symmetry. The equation model is simplified to the expression Ξ = 2κ - 3κα - 0κ, where Xi (Ξ) is proposed as a graph-based shape index encoding the steric influence of a group.
Chapter
Introduction Definition of Localized Effect Substituent Constants Evaluation of σI Constants The σD Constants Estimation of Substituent Constants Separation of Electrical Effects Highly Variable Substituent “Constants” Conclusions