Content uploaded by Vladimir Pliska
Author content
All content in this area was uploaded by Vladimir Pliska on Oct 23, 2017
Content may be subject to copyright.
Int. J. Peptide Protein Res. 32, 1988, 269-278
Amino acid side chain parameters for correlation studies in biology and
pharmacology
JEAN-LUC FAUCHÈRE¹, MARVIN CHARTON², LEMONT B. KIER³, ARIE VERLOOP⁴ and VLADIMIR PLISKA⁵
¹Department of Biotechnology, Swiss Federal Institute of Technology, ETH, Zurich, Switzerland; ²Department of Chemistry, Pratt Institute, Brooklyn, New York; ³Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, Virginia, USA; ⁴1412 AR Naarden, The Netherlands; ⁵Institute of Animal Science, Swiss Federal Institute of Technology, ETH, Zurich, Switzerland
Received 25 February, accepted for publication April 1988
Fifteen physicochemical descriptors of side chains of the 20 natural and of 26 non-coded amino acids are compiled and simple methods for their evaluation described. The relevance of these parameters to account for hydrophobic, steric, and electric properties of the side chains is assessed and their intercorrelation analyzed. It is shown that three principal components, one steric, one bulk, and one electric (electronic), account for 66% of the total variance in the available set. These parameters may prove to be useful for correlation studies in series of bioactive peptide analogues.

Key words: amino acid side chain parameters; LFER parameters; QSAR in peptides; QSAR parameters
Amino acids are abbreviated according to the recommendations of the IUPAC-IUB Joint Commission on Biochemical Nomenclature (32). All amino acids have the L-configuration. QSAR, quantitative structure-activity relationships; LFER, linear free energy related; n.m.r., nuclear magnetic resonance.

One of the main limitations of correlation studies for bioactive peptides is the lack of reliable physicochemical amino acid side chain descriptors. This is mainly due to the difficulty in selecting those features which control the peptide-receptor interactions, and in finding conditions under which each property can be individually measured.

In previous work, Sneath (1) quantitatively evaluated the similarity and dissimilarity of the 20 natural amino acids. He reasoned that small structural changes should bring about small changes in the biological activity, and concluded on examples in the oxytocin and angiotensin II series that the use of correlations between biological activity and these factors would be "better than chance," although not very reliable for predictive purposes. Neither the side chain properties defined in this work nor the extracted principal components (aliphaticity, hydrogenation, aromaticity, and hydroxythiolation) are convenient for use in QSAR studies, since they are not continuous parameters. Darvas et al. (2) stated that conventional LFER parameters (3) alone cannot satisfactorily describe peptide-peptide or peptide-protein interactions, and introduced "peptide-tailored" physicochemical descriptors including a summation parameter supposed to represent nonspecific ("aspecific" (2)) side chain interactions which originate in hydrogen bonds, electrostatic and charge transfer effects. Since the summation index contains these effects as a whole, identification of the individual contributions to biological activity is no longer possible.
Recently, Kidera et al. (4) analyzed 188 properties of the naturally occurring amino acids in a remarkable work focused mainly on the prediction of the three-dimensional structure of proteins. In addition to bulk and hydrophobicity, these authors identified β-structure preference and α-helix or bend structure preference as the two main representative factors. These two factors, although of paramount importance in protein folding, are of stochastic nature and cannot be established by direct measurements. For the biological activity of peptide drugs, they can hardly be of greater relevance than features such as charge, aromaticity, or presence of hydrogen bond donors or acceptors. Finally, this study cannot be easily extended to non-natural synthetic amino acids currently used in peptide drug design.

Several examples have demonstrated the usefulness of QSAR studies of bioactive peptides in order to identify the factors effective in binding or proteolysis and to predict more potent, more stable, or more selective analogues (5-8). One convincing example is the study by Hellberg et al. (9) of bradykinin potentiating peptides in which correlations derived from a small number of derivatives modeled and predicted the activity of a large series of analogues. In the preceding studies, no consistent set of side chain parameters was used, each laboratory relying upon its own developed or measured descriptors for correlation or principal components analyses.

The aims of this communication were therefore:
(1) to establish a list of selected hydrophobic, steric, electronic, and other parameters for amino acid side chains;
(2) to extend the list by our own measurements or computation to a number of unnatural synthetic amino acids;
(3) to indicate simple methods for the measurement or predictive calculation of the descriptors from the chemical structure of new side chains;
(4) to establish the degree of separation or intercorrelation of the descriptors for the reported values;
(5) to identify principal components among the side chain properties described by the parameters.
METHODS

The initial set of structural parameters was generally obtained experimentally by direct measurements of the given property of the amino acid or derivative. This was the case for the hydrophobic constant, the polarizability, the pKa of the corresponding carboxylic acid, and for all the steric constants derived from Taft's constants.

The n.m.r. chemical shift of the Cα-carbon of several amino acids was measured here for the first time. The δH_c-values were obtained with the free amino acid in neutral D2O at 20° on a Varian XL300 spectrometer (Prof. J.F. Oth, ETH Zurich) with lock on D2O under proton decoupled conditions and elimination of the Overhauser effect. The reference was the sodium salt of 2,2-dimethyl-2-silapentane-5-sulfonate.

Other parameters, such as υ_reg and υ_v, could be measured on molecular CPK models (10). Finally, other ones were theoretically derived, such as the graph shape index.

Constants for new amino acid side chains can generally be calculated by empirical rules or obtained from correlations with various molecular features. Details are given here for each individual case.

Correlation analysis and search for principal components was performed by programs of the BMDP library (11).

RESULTS AND DISCUSSION
Hydrophobicity π

The π-values (Tables 1 and 2) express the hydrophobicity of the amino acid side chain according to the equation:

$$\pi(\text{side chain}) = \log P(\text{amino acid}) - \log P(\text{glycine})$$

in which P is the partition coefficient of the amino acid and of glycine in octanol/water (12). The fundamental π-values for natural amino acids (Table 1) have been obtained at pH 7.1 with amino acid derivatives protected and thus uncharged in the backbone residue, the Nα-acetyl-amino-acid amides (13), by partitioning in octanol/water and by using a similar equation for evaluation:

$$\pi(\text{side chain}) = \log P(\text{Ac-amino acid-NH}_2) - \log P(\text{Ac-Gly-NH}_2)$$

In spite of some controversial acceptance of this scale (14) the authors consider it highly reliable because P was directly estimated in octanol/water and because of the pertinence of the derivatives and physiologically relevant pH used in measurements. The scale was used for the determination of atomic solvation parameters in proteins (15). The other π-values (Table 2) have been determined by thin-layer chromatography and the Rf-values converted to π-values in octanol/water, by a method described earlier (12). All values reported here have been determined experimentally. New values can be obtained by thin-layer chromatography (12) as far as the L-form of the amino acid is available, or by calculation from the structure using either the fragment contributions without correcting factors as estimated in (12), or those of Hansch & Leo (16) with the appropriate correcting factors.

TABLE 1
Side chain parameters (16-parameter set) for natural amino acid side chains (except proline, n = 19)

Amino acid      π      Ξ      υ  υ_reg      L     B1     B5      α    υ_v   δH_c    σ_l    n_H   n_nb    i_B    i_A    pKa
Ala          0.31   1.28   0.52   0.53   2.87   1.52   2.04  0.046    1.0    7.3  -0.01      0      0      0      0   4.76
Arg         -1.01   2.34   0.68   0.69   7.82   1.52   6.24  0.291   6.13   11.1   0.04      4      3      1      0   4.30
Asn         -0.60   1.60   0.76   0.58   4.58   1.52   4.37  0.134   2.95    8.0   0.06      2      3      0      0   3.64
Asp         -0.77   1.60   0.76   0.59   4.74   1.52   3.78  0.105   2.78    9.2   0.15      1      4      0      1   5.69
Cys          1.54   1.77   0.62   0.66   4.47   1.52   3.41  0.128   2.43   14.4   0.12      0      0      0      0   3.67
Gln         -0.22   1.56   0.68   0.71   6.11   1.52   3.53  0.180   3.95   10.6   0.05      2      3      0      0   4.54
Glu         -0.64   1.56   0.68   0.72   5.97   1.52   3.31  0.151   3.78   11.4   0.07      1      4      0      1   5.48
Gly          0.00   0.00   0.00   0.00   2.06   1.00   1.00   0.00   0.00   0.00   0.00      0      0      0      0   3.77
His          0.13   2.99   0.70   0.64   5.23   1.52   5.66  0.230   4.66   10.2   0.08      1      1      1      0   2.84
Ile          1.80   4.19   1.02   0.96   4.92   1.90   3.49  0.186   4.00   16.1  -0.01      0      0      0      0   4.81
Leu          1.70   2.59   0.98   0.92   4.92   1.52   4.45  0.186   4.00   10.1  -0.01      0      0      0      0   4.79
Lys         -0.99   1.89   0.68   0.78   6.89   1.52   4.87  0.219   4.77   10.9   0.00      2      1      1      0   4.27
Met          1.23   2.35   0.78   0.77   6.36   1.52   4.80  0.221   4.43   10.4   0.04      0      0      0      0   4.25
Phe          1.79   2.94   0.70   0.71   4.62   1.52   6.02  0.290   5.89   13.9   0.03      0      0      0      0   4.31
Pro        (0.72)   2.67      -      - (4.11) (1.52) (4.31)      - (2.72)   17.8      -      0      0      0      0      -
Ser         -0.04   1.31   0.53   0.55   3.97   1.52   2.70  0.062   1.60   13.1   0.11      1      2      0      0   3.83
Thr          0.26   3.03   0.50   0.63   4.11   1.73   3.17  0.108   2.60   16.7   0.04      1      2      0      0   3.87
Trp          2.25   3.21   0.70   0.84   7.68   1.52   5.90  0.409   8.08   13.2   0.00      1      0      0      0   4.75
Tyr          0.96   2.94   0.70   0.71   4.73   1.52   6.72  0.298   6.47   13.9   0.03      1      2      0      0   4.30
Val          1.22   3.67   0.76   0.89   4.11   1.90   3.17  0.140    3.0   17.2   0.01      0      0      0      0   4.86

π, hydrophobicity; Ξ, graph shape index; υ, upsilon steric parameter; υ_reg, smoothed upsilon steric parameter; L, B1, B5, STERIMOL length, minimum width, and maximum width, respectively (torsion angles are 0° except for Phe and Tyr, where the torsion angle between phenyl and the adjacent group (CH2, NH) is 90°); α, polarizability; υ_v, normalized van der Waals volume; δH_c, n.m.r. chemical shift of alpha-carbon; σ_l, localized electrical effect; n_H, number of hydrogen bond donors; n_nb, number of full nonbonding orbitals; i_B, indicator of presence or absence of positive charge in side chain; i_A, indicator of presence or absence of negative charge in side chain; pKa, -log of dissociation constant of carboxylated side chain (R-COOH).

[TABLE 2, which lists the same parameters for the non-coded amino acid side chains, is not recoverable from this copy.]

Graph shape index Ξ

This parameter (17) is a measure of the steric influence of a group which encodes three attributes: complexity, branching, and symmetry of the group. It can be directly calculated from the molecular graph structure of the substituent, e.g. of the amino acid side chain. The index Ξ is free of inductive, resonance, or solvation effects. Although Ξ is theoretically derived, it was found to correlate with Taft's substituent index Es, according to the following equation:

$$-E_s = 0.40\,\Xi - 0.60$$

which allows one to predict additional Taft's values that are not attainable by the classical ester hydrolysis model (18). Values of Ξ are given in Tables 1 and 2 and the corresponding values for new amino acid side chains can easily be calculated by methods described in (19, 20). Ξ (and the corresponding calculated Es-values) appears as a valuable steric parameter since it is a measure of the directed spatial influence of the group, it is independent of all electrical and solution effects, and it can be calculated for all types of substituents.
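The linear relation between the graph shape index and Taft's Es quoted above lends itself to a short numerical illustration. The following is a minimal sketch, not the authors' procedure: the function name `taft_es_from_shape_index` is hypothetical, and the shape index values are simply those transcribed from Table 1 for four aliphatic side chains.

```python
def taft_es_from_shape_index(xi: float) -> float:
    """Estimate Taft's Es from the graph shape index using -Es = 0.40 * Xi - 0.60."""
    return -(0.40 * xi - 0.60)


# Graph shape index values transcribed from Table 1 (aliphatic side chains only).
SHAPE_INDEX = {"Ala": 1.28, "Leu": 2.59, "Val": 3.67, "Ile": 4.19}

if __name__ == "__main__":
    for name, xi in SHAPE_INDEX.items():
        es = taft_es_from_shape_index(xi)
        print(f"{name}: shape index = {xi:.2f}, estimated Es = {es:+.2f}")
```

Because the slope is positive in -Es, a larger shape index (more branching close to the backbone) gives a more negative estimated Es, which is the usual direction for increasing steric demand.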
Upsilon steric parameter υ

This steric parameter υ (21) was derived from Taft's constant Es and expressed as a function of the minimal van der Waals radius. Values are available for a large number of substituents (21) and in particular for amino acid side chains (22). Unlike the original Taft's constant, υ is expressed in angstroms. In some cases it was necessary for its derivation to use effective values of υ obtained either from correlations of rate constants for acid catalyzed ester hydrolysis or from estimation equations. However, the upsilon parameter, since it is based on the Bondi/van der Waals radii, can be held as a most reliable measure of the steric effect.

The values of υ_reg, reported in the next column of Tables 1 and 2, are directly related, although not identical, to those of upsilon. A tight correlation was observed between υ and the minimal projection surface of the side chain (or of any substituent) taken perpendicular to the Cα-Cβ bond (23). Using the parabolic correlation obtained (with at least 51 groups) and taking the υ_reg-values on the regression curve, a new set of smoothed steric parameters was obtained. In this set, a few unexplainable discrepancies are eliminated as, for example, the higher value in the original set for the side chain of serine (υ = 0.53) compared to threonine (υ = 0.50). New values are attainable by estimating the projection surface of the relevant CPK molecular models of the uncommon side chain and using the same correlation equation (23). This set of υ_reg-values, which essentially describes (as do Es and υ) the steric effect as seen from the reaction center in the model compounds, has also proven useful in a number of correlation studies.
STERIMOL multidimensional steric parameters L, B1, B5

The STERIMOL constants characterize the steric bulk of a substituent by its dimensions in three different directions in space (24). We use here the revised version of these parameters, which contains the three quantities L, B1, and B5 (25). L represents the length of the side chain measured in the direction in which it is attached to the glycine backbone, and B1 and B5 are the minimum and maximum width, respectively, of the side chain, measured in directions perpendicular to L. The parameters are calculated by a computer software package directly from the structure of the side chain. The STERIMOL constants, which have been shown to be useful in a number of QSAR studies, are likely to help to investigate structure-activity relationships in peptides, too, especially in the cases where more than one side chain descriptor is required for steric bulk. Since they are easily derived from structure by calculation, their value can be predicted for new amino acid side chains even prior to synthesis.
Polarizability α

The polarizability α is related to the molar refractivity MR, which in turn is experimentally given by:

$$\mathrm{MR} = \frac{M}{d}\,\frac{n^2 - 1}{n^2 + 2}$$

(n, index of refraction; M, molecular weight; d, density). Since MR is an additive-constitutive property of a molecule, it can be easily calculated for any substituent. From tabulated values of MR for common groups (16) and by the equation:

$$\alpha = \frac{3}{4\pi N}\,\frac{M}{d}\,\frac{n^2 - 1}{n^2 + 2}$$

(N, Avogadro's number) the polarizabilities α can be obtained (cf. also (26)) for known as well as for new amino acid side chains (Tables 1 and 2). Clearly α is a function of the molecular volume M/d, and thus a bulk parameter which models dispersion forces. The α-values have been scaled to make the coefficient in the regression equation roughly comparable to those obtained for other parameters. New values are easily obtained by simple arithmetic.
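The "simple arithmetic" behind the molar refractivity and the polarizability can be written out as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the scaling applied to the tabulated α-values is not reproduced, and the example inputs for liquid water (M = 18.015 g/mol, d = 0.997 g/cm³, n = 1.333) are approximate textbook values chosen only for illustration; they do not come from this paper.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole


def molar_refractivity(molar_mass: float, density: float, refractive_index: float) -> float:
    """Lorentz-Lorenz molar refractivity MR = (M/d) * (n^2 - 1) / (n^2 + 2), in cm^3/mol."""
    n2 = refractive_index ** 2
    return (molar_mass / density) * (n2 - 1.0) / (n2 + 2.0)


def polarizability(molar_mass: float, density: float, refractive_index: float) -> float:
    """Polarizability alpha = 3 / (4 * pi * N) * MR, in cm^3 per molecule (unscaled)."""
    mr = molar_refractivity(molar_mass, density, refractive_index)
    return 3.0 / (4.0 * math.pi * AVOGADRO) * mr


if __name__ == "__main__":
    # Illustrative inputs for liquid water (assumed values, not taken from the paper).
    mr = molar_refractivity(18.015, 0.997, 1.333)
    alpha = polarizability(18.015, 0.997, 1.333)
    print(f"MR = {mr:.2f} cm^3/mol, alpha = {alpha:.3e} cm^3")
```

For a substituent, MR would normally be taken as a sum of tabulated group contributions rather than measured, and only the final conversion to α would use the second function's prefactor.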
Normalized van der Waals volume υ_v

This additional bulk parameter is the van der Waals volume of the amino acid side chain normalized according to the following equation (23):

$$\upsilon_v(\text{side chain}) = \frac{V(\text{side chain}) - V(\mathrm{H})}{V(\mathrm{CH_2})}$$

where V is the measured van der Waals volume on CPK models for the side chain or the hydrogen atom, respectively. From this, υ_v = 1 for the side chain of alanine and increases by one unit for each additional CH2-group. Side chains of amino acids such as neopentylglycine and adamantylalanine, which are characterized by very similar υ-values, are distinct when described by υ_v. This parameter is easily measurable on CPK models for known as well as for new, not yet synthesized amino acids. As a bulk parameter, it models dispersion forces and is highly correlated to the polarizability α.
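The close relation between the normalized van der Waals volume and the polarizability can be checked on the natural side chains. The sketch below computes a plain Pearson correlation coefficient in pure Python; the helper name `pearson` is hypothetical, and the (α, υ_v) pairs are transcribed from Table 1 (proline omitted because its polarizability is not listed).

```python
import math

# (polarizability alpha, normalized van der Waals volume v_v), transcribed from Table 1.
ALPHA_AND_VOLUME = {
    "Ala": (0.046, 1.00), "Arg": (0.291, 6.13), "Asn": (0.134, 2.95),
    "Asp": (0.105, 2.78), "Cys": (0.128, 2.43), "Gln": (0.180, 3.95),
    "Glu": (0.151, 3.78), "Gly": (0.000, 0.00), "His": (0.230, 4.66),
    "Ile": (0.186, 4.00), "Leu": (0.186, 4.00), "Lys": (0.219, 4.77),
    "Met": (0.221, 4.43), "Phe": (0.290, 5.89), "Ser": (0.062, 1.60),
    "Thr": (0.108, 2.60), "Trp": (0.409, 8.08), "Tyr": (0.298, 6.47),
    "Val": (0.140, 3.00),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    alphas = [pair[0] for pair in ALPHA_AND_VOLUME.values()]
    volumes = [pair[1] for pair in ALPHA_AND_VOLUME.values()]
    print(f"r(alpha, v_v) = {pearson(alphas, volumes):.3f}")
```

A coefficient close to 1 indicates that the two bulk descriptors carry largely the same information, so using both in one regression would add little.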
N.m.r. chemical shift of α-carbon δH_c

The n.m.r. chemical shift δH_c (H, magnetic field strength; δH, chemical shift; δH_c, 13C-chemical shift) of the alpha-carbon in amino acids has been proposed (27) as a descriptor of the electronic properties of the side chain. When expressed in ppm from the 13C-chemical shift of the α-carbon of glycine, it can be considered a pure substituent parameter. As a matter of fact, this parameter primarily reflects the shielding of the Cα-nucleus by the nearby electronic systems of the side chain and thus incorporates the classical inductive and mesomeric effects of the substituent (side chain). However, δH_c is not free of steric and hydrophobic contributions, as can be seen, for example, from a certain level of intercorrelation with π and Ξ (Fig. 1).

Values have been measured in a number of cases for the free amino acid and for the amino acid incorporated in a short peptide (28). Several newly measured values are reported here for the first time. The parameter is lower by 1.5 ± 1.1 ppm when the side chain is contained in a peptide. For new amino acids, δH_c can be calculated using the empirical rules of Horsley et al. (29). We have tested these rules and observed that in their present form, they do not even permit the calculation of the chemical shift for all natural amino acids. However, they can be applied to new side chains according to the same scheme, using the additivity of the fragment contributions to δH_c. The constant δH_c has been successfully employed in several QSAR studies of bioactive peptides (8, 27).

FIGURE 1
Significance of the linear correlation coefficients (43 degrees of freedom). Significance levels shown: P > 99.9%, P > 99%, P > 95%. [Correlation matrix graphic not recoverable from this copy.]

Localized electrical effect parameter σ_l

This constant has been clearly defined and appropriately scaled (30) for any given substituent, and obtained for a number of amino acid side chains (26). Values for several uncommon side chains are reported here for the first time (Tables 1 and 2). The constant represents mainly inductive field effects and is well separated from delocalized resonance contributions.

The pKa's of the carboxylic acids R-COOH, in which R is an amino acid side chain, are also compiled for natural side chains in Table 1. However, in contrast to σ_l, the pKa reflects overlapping localized and delocalized electrical effects.

Hydrogen bonding parameters n_H and n_nb

These integer parameters expressing the number of OH and NH bonds, and the number of full nonbonding orbitals on O and N atoms, respectively, have been proposed for amino acid side chains (31). They can be evaluated by simple inspection of the structural formula of the substituent. In QSAR studies of peptides they often play the role of indicator variables and can be of great help to detect the implication of hydrogen bonds in single side chains among large series of non hydrogen bonding side chains.

Charge parameters i_B and i_A

The presence or absence of charges in amino acid side chains can be accounted for by the parameters i_B and i_A for basic (positively charged) and acidic (negatively charged) groups, respectively. The parameter takes the value 1 or 0 depending on whether such a charge is present or not, but neglecting the fact that ionization may be incomplete at physiologically relevant pH.
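How far a 0/1 charge indicator departs from the actual degree of ionization can be estimated with the Henderson-Hasselbalch relation. The following is a minimal sketch, not part of the original parameter set: the function names are hypothetical, and the side chain pKa values used in the example (about 6.0 for the imidazole of His, about 10.5 for the amine of Lys, about 3.9 for the carboxyl of Asp) are typical textbook figures assumed here for illustration only. The pH of 7.1 is the value used for the partition measurements in this paper.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of an ionizable group in its protonated form at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fraction_charged_base(pka: float, ph: float) -> float:
    """A basic group carries its positive charge when protonated."""
    return fraction_protonated(pka, ph)


def fraction_charged_acid(pka: float, ph: float) -> float:
    """An acidic group carries its negative charge when deprotonated."""
    return 1.0 - fraction_protonated(pka, ph)


if __name__ == "__main__":
    ph = 7.1
    # Assumed typical side chain pKa values (illustrative, not from the paper).
    print(f"His (base, pKa 6.0):  {fraction_charged_base(6.0, ph):.3f} charged")
    print(f"Lys (base, pKa 10.5): {fraction_charged_base(10.5, ph):.3f} charged")
    print(f"Asp (acid, pKa 3.9):  {fraction_charged_acid(3.9, ph):.3f} charged")
```

Under these assumptions a group whose pKa lies far from the working pH is well described by the indicator, while a group with a pKa near that pH is only partly ionized, which is the simplification the text acknowledges.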
Principal component analysis

The data contained in Tables 1 and 2 describe properties of 45 amino acid side chains by means of 15 measurable parameters. The selection of these parameters is arbitrary and largely dictated by their availability. Therefore, both redundancy and missing properties cannot be fully excluded. We have investigated the matrix of correlation of the parameters (Fig. 1) and found a high level of significance of the correlation coefficients between certain pairs of parameters. Since it can be anticipated that no more than three to four distinct properties such as hydrophobicity, steric bulk, and electronic features are expressed by this 15-parameter set, we have investigated the system by principal component analysis. For the particular choice of 45 side chains, initial factor extraction showed that four factors were necessary to explain 75% of the total variance. Orthogonal factor rotation and sorting out of the factor loadings (those less than 0.25 being set to zero) led to the pattern given in Fig. 2. Three factors were retained and tentatively interpreted as side chain properties.

FIGURE 2
Composition of the first three principal components: factor loading matrix. The factors explain 66% of the total variance. Loadings less than 0.25 have been omitted. [Loading diagram for factors 1, 2, and 3 not recoverable from this copy.]

The first factor was clearly related to the volume of the side chain, since its loadings were high for the polarizability α, the van der Waals volume υ_v, and the two STERIMOL parameters L and B5. An apparent inconsistency in this respect was the contribution of υ_reg, a parameter which should not be primarily related to steric bulk. However, this contribution (0.27) to the first factor was relatively small and near to the threshold value for rejection. The second factor, again, had steric character and was clearly of the Taft type. These steric parameters are vector quantities with both absolute value and direction. They are likely to be proportional to the projection surface of the group perpendicular to the glycine (or peptide) backbone, as are its constitutive steric parameters υ, υ_reg, Ξ (Es), and B1. Factor 3 appeared to be related to electronic properties as given by the number of nonbonding orbitals, the number of possible hydrogen bonds, the localized electrical (inductive) effect σ_l, and the presence of charges.

Most interesting was the fact that constants π and δH_c were almost evenly loaded in factors 1, 2, and 3 (π has a negative loading in the third factor). This observation does not imply that these parameters are not important for the description of certain side chain properties, but that they are, due to their correlation with other parameters, already sufficiently represented by the first three factors.

In conclusion, our results tend to suggest that three factors are sufficient to describe amino acid side chain properties in LFER correlations. Since the number of parameters and of side chains involved in this study was low, the factor composition cannot be considered generally valid and may vary from one set of side chains to another. However, a certain stability was observed; so, for example, omitting σ_l and/or joining i_B and i_A into a single charge parameter did not change the structure of the factors considerably. Furthermore, the parameters described in this study were closely related to one of the desired properties, such as hydrophobicity, bulkiness, or electronic configuration. There is, therefore, little point in employing these factors in QSAR studies of peptides at present and the original parameters should preferably be used in correlation analysis.

Cluster analysis

The small number of side chains and of parameters used in this study did not make it possible to reach the ultimate goal of cluster analysis: to order the substituents in groups in such a way that each member may be considered as a representative of the whole group. However, in the course of a preliminary analysis of the full series of available side chains (n = 45), a few clusters clearly appeared, while a number of miscellaneous side chains did not fall into clear-cut groups. The list of unambiguously formed clusters and the corresponding tree diagram are given in Fig. 3. A first cluster, 1, contained five aliphatic side chains of about the size of norvaline, to which the two uncharged aromatic side chains of Phe and Tyr, 1', were the next to amalgamate. A second cluster contained relatively small side chains containing either a heteroatom or a polar bond: this cluster 2 may be represented by either Thr or propargylglycine. The third cluster was constituted of bulky aromatic side chains containing at least one heteroatom, 3, as in dihydroxyphenylalanine or in pyrazinylalanine. A fourth cluster was made of the three primary amines Lys, ornithine and diaminobutyric acid, 4. A cluster was also formed by four β-branched aliphatic side chains, 5, as in Ile or in cyclopentylglycine. A next cluster contained bulky aromatic side chains which, in contrast to those in cluster 3, were relatively hydrophobic (cyclohexylalanine also amalgamated to this cluster), 6; one typical representative would be O-benzylserine. The side chains of glycine and alanine, as expected, did not amalgamate in the earlier clustering steps and behaved as singular species. This was also the case for several other side chains such as those of Bug, Asp and Glu, or Arg. Although a certain stability in clustering was observed (e.g. the amalgamation sequence was very similar for the subseries of the natural amino acids), further work on larger series will be required to establish more significant clusters.

FIGURE 3
Apparent clusters found by cluster analysis of cases based on a 15-parameter set. The diagram shows the order of amalgamation of individual clusters (in boxes) and the connections between them. The ordinate distances correspond to the Mahalanobis distances between the amalgamation points. [Tree diagram not recoverable from this copy.]
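The initial factor extraction step of the principal component analysis can be illustrated on a small subset of Table 1. The sketch below is an assumption-laden illustration, not a reproduction of the BMDP analysis: it uses only four parameters (π, υ, α, υ_v) for the 19 natural side chains, extracts just the first component of the correlation matrix by power iteration, and performs no factor rotation. All function names are hypothetical.

```python
import math

# (pi, upsilon, alpha, v_v) for the natural side chains, transcribed from Table 1.
TABLE1_SUBSET = {
    "Ala": (0.31, 0.52, 0.046, 1.00), "Arg": (-1.01, 0.68, 0.291, 6.13),
    "Asn": (-0.60, 0.76, 0.134, 2.95), "Asp": (-0.77, 0.76, 0.105, 2.78),
    "Cys": (1.54, 0.62, 0.128, 2.43), "Gln": (-0.22, 0.68, 0.180, 3.95),
    "Glu": (-0.64, 0.68, 0.151, 3.78), "Gly": (0.00, 0.00, 0.000, 0.00),
    "His": (0.13, 0.70, 0.230, 4.66), "Ile": (1.80, 1.02, 0.186, 4.00),
    "Leu": (1.70, 0.98, 0.186, 4.00), "Lys": (-0.99, 0.68, 0.219, 4.77),
    "Met": (1.23, 0.78, 0.221, 4.43), "Phe": (1.79, 0.70, 0.290, 5.89),
    "Ser": (-0.04, 0.53, 0.062, 1.60), "Thr": (0.26, 0.50, 0.108, 2.60),
    "Trp": (2.25, 0.70, 0.409, 8.08), "Tyr": (0.96, 0.70, 0.298, 6.47),
    "Val": (1.22, 0.76, 0.140, 3.00),
}


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)] for a in range(p)]


def leading_component(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        new = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    product = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
    eigenvalue = sum(vec[a] * product[a] for a in range(p))
    return eigenvalue, vec


if __name__ == "__main__":
    corr = correlation_matrix(list(TABLE1_SUBSET.values()))
    eigenvalue, loadings = leading_component(corr)
    # The trace of a correlation matrix equals the number of parameters.
    print(f"first component explains {eigenvalue / len(corr):.0%} of the variance")
    for label, weight in zip(("pi", "upsilon", "alpha", "v_v"), loadings):
        print(f"  {label:8s} {weight:+.2f}")
```

With only four descriptors the numbers cannot be compared with the 66% and 75% figures reported for the full 15-parameter set; the point is only to show how redundancy between bulk descriptors concentrates variance in one component.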
CONCLUSION
The collection of parameters compiled in Tables 1 and 2 contains a selection of substituent constants for amino acid side chains. As substituent constants they reflect the properties of the side chain more than those of the amino acid and, except for the STERIMOL parameters and the pKa, the reference value for glycine is zero. The list is by no means intended to be complete, but it should quantitatively represent features relevant for, say, investigations of peptide drug-receptor interactions. The constants cover hydrophobic, steric, electronic and hydrogen bond donor/acceptor, and charge properties. No observed conformational preferences have been included since they appear to be consequences of the side chain properties (cf. (26)) as is the biological activity of the corresponding peptides. The parameters are either measurable or computable by simple methods, which we have indicated for each particular case. This fact can be of great value for predictive studies.
ACKNOWLEDGMENT
This work was supported by the Swiss National Science Foundation (grants 3.205-0.85 and 3.559-0.86).
REFERENCES
1. Sneath, P.H. (1966) J. Theoret. Biol. 12, 157-195
2. Darvas, F. (1980) Adv. Pharmacol. Res. Pract. 3, 265-278
3. Hansch, C. & Leo, A. (1979) Substituent Constants for Correlation Analysis in Chemistry and Biology, Wiley, New York
4. Kidera, A., Konishi, Y., Oka, M., Ooi, T. & Scheraga, H.A. (1985) J. Protein Chem. 4, 23-55
5. Pliska, V. (1978) Experientia 34, 1190-1192
6. Fauchère, J.L. (1982) J. Med. Chem. 25, 1428-1431
7. Guillemette, G., Bernier, M., Parent, P., Leduc, R. & Escher, E. (1984) J. Med. Chem. 27, 315-320
8. Nisato, D., Wagnon, J., Callet, G., Mettefeu, D., Assens, J.L., Plouzane, C., Tonnerre, B., Pliska, V. & Fauchère, J.L. (1987) J. Med. Chem. 30, 2287-2291
9. Hellberg, S., Sjöström, M. & Wold, S. (1986) Acta Chem. Scand. B40, 135-140
10. Koltun, W.L. (1965) Biopolymers 3, 665-679
11. BMDP Statistical Software, 1983 - Printing with Additions (Dixon, W.J., ed.), University of California Press, Berkeley
12. Pliska, V., Schmidt, M. & Fauchère, J.L. (1981) J. Chromatogr. 216, 79-92
13. Fauchère, J.L. & Pliska, V. (1983) European J. Med. Chem. 18, 369-375
14. Fauchère, J.L. (1985) TIBS 10, 268-269 and Discussion: (1986) TIBS 11, 69-70
15. Eisenberg, D. & McLachlan, A.D. (1986) Nature 319, 199-203
16. Leo, A. (1988) Log P and related parameters database, Medicinal Chemistry Project, Pomona College, Claremont, California
17. Kier, L.B. (1987) Quant. Struct.-Act. Relat. 6, 117-122
18. Taft, R.W. (1956) in Steric Effects in Organic Chemistry (Newman, M.S., ed.), p. 556, Wiley, New York
19. Kier, L.B. (1986) Acta Pharm. Jugosl. 36, 171-188
20. Kier, L.B. (1987) Medicinal Res. Rev. 7, 417-470
21. Charton, M. (1977) in Design of Biopharmaceutical Properties through Prodrugs and Analogs (Roche, E.B., ed.), pp. 228-280, American Pharmaceutical Association, Washington, D.C.
22. Charton, M. (1981) J. Theoret. Biol. 91, 115-123
23. Fauchère, J.L. (1984) in QSAR in Design of Bioactive Compounds (Kuchar, M., ed.), pp. 135-144, Prous, Barcelona
24. Verloop, A., Hoogenstraaten, W. & Tipker, J. (1976) in Drug Design (Ariens, E.J., ed.), vol. 7, pp. 165-207, Academic Press, New York
25. Verloop, A. (1983) in IUPAC, Pesticide Chemistry (Miyamoto, J. & Kearney, P.C., eds.), vol. 1, pp. 339-344, Pergamon, Oxford
26. Charton, M. & Charton, B.I. (1983) J. Theoret. Biol. 102, 121-134
27. Fauchère, J.L. & Lauterwein, J. (1985) Quant. Struct.-Act. Relat. 4, 11-13
28. Wüthrich, K. (1976) NMR in Biological Research: Peptides and Proteins, pp. 170-179, North Holland, Amsterdam
29. Horsley, W., Sternlicht, H. & Cohen, J.S. (1970) J. Am. Chem. Soc. 92, 680-685
30. Charton, M. (1981) Progr. Phys. Org. Chem. 13, 119-251
31. Charton, M. & Charton, B.I. (1982) J. Theoret. Biol. 99, 629-644
32. IUPAC-IUB Joint Commission on Biochemical Nomenclature (1984) European J. Biochem. 138, 9-37
Address:
Dr. J.L. Fauchère
Institut für Biotechnologie
ETH Hönggerberg
CH 8093 Zürich
Switzerland