
DoMCoSAR: A Novel Approach for Establishing the Docking Mode That Is Consistent with the Structure−Activity Relationship. Application to HIV-1 Protease Inhibitors and VEGF Receptor Tyrosine Kinase Inhibitors


DoMCoSAR is a novel approach for statistically determining the docking mode that is consistent with a structure-activity relationship. The approach establishes the binding mode for the compounds in a chemical series with the assumption that all molecules exhibit the same binding mode. It involves three stages. In the first stage all molecules that belong to a given chemical series are docked to the active site of the protein target. The only bias used in the docking at this stage involves the location of the protein binding site. Coordinates of the common substructure (CS) that results from the unbiased docking are then clustered to establish the major substructure docking modes. In the second stage all molecules are docked to the major docking modes (MDMs) with constraints based on the common substructure. The third stage generates, for the major docking modes, interaction-based descriptors that include electrostatic, VDW, strain, and solvation contributions. The problem of docking mode evaluation is now reduced to the question of which descriptor set is more predictive. To establish a quantitative comparison of the descriptor sets associated with the major docking modes, we use 50 instances of random 4-fold cross-validation. For each 4-fold cross-validation the predictive squared correlation coefficient (R²) is computed. t-Tests are applied to establish significance of the differences in mean R² for one docking mode versus another. We test the methodology on two test cases: HIV-1 protease inhibitors (Holloway et al. J. Med. Chem. 1995, 38, 305-317) and vascular endothelial growth factor (VEGF) receptor tyrosine kinase oxoindoles (Sun et al. J. Med. Chem. 1998, 41, 2588-2603). For both test cases there is statistically significant preference for the binding mode consistent with the X-ray structure. The appeal of this methodology is that researchers gain the objectivity of statistical justification for the selected docking mode.
The methodology is relatively insensitive to subtle variations of the protein structure that include, but are not limited to, side chain and small backbone rearrangement during binding. In addition, predictive models that result from the approach can be used to further optimize chemical series.
Michal Vieth* and David J. Cummins
Eli Lilly and Company, Lilly Research Laboratories, Lilly Corporate Center, DC 1513, Indianapolis, Indiana 46285
Received December 14, 1999
Introduction
High-throughput screening (HTS) of combinatorial and corporate libraries allows for screening thousands of compounds in a short period of time.3,4 Even though HTS techniques have been developed and fully implemented at many pharmaceutical companies around the world, there is a finite cost associated with the screening of each compound. Thus for some selected targets screening smaller, biased libraries of compounds may prove beneficial.5 A popular technique for reducing a library of molecules to a manageable size is rapid docking and scoring,6-8 also known as structure-based virtual screening. The popularity of this approach can be inferred from the fact that the principles of structure-based screening date back to the late 19th century with Fischer’s concept of the lock-and-key mechanism of enzyme action.9 It is believed that limiting a library to molecules that fit and complement the receptor active site should improve the rate at which the hits are identified. Although appealing, the methodology of rapid docking and scoring has not been properly evaluated and the limitations and pitfalls have not been addressed.
Another application of computer docking methodology is optimization of lead molecules. In many cases an X-ray structure of the target is available or a homology model can be produced. Docking techniques allow for placement of the lead molecule in the context of the receptor active site,10,11 and in some cases new synthetic directions emerge for improving the potency (or other characteristics) of a given lead series.1 In addition, docking of compounds from a lead series can be helpful in creation of predictive quantitative structure-activity relationship (QSAR) models.1,12,13 These models can be used to prioritize synthesis of compounds and quantitatively evaluate potential modifications of the lead series.
Most available docking algorithms meet a minimum standard of being able to reproduce orientations of the protein-ligand X-ray structures.14-17 In the majority of real life problems, however, the exact structure of the protein from the complex is not a priori available. Binding of highly similar ligands can cause minor or, in the case of protein kinases, quite substantial rearrangements of the protein.18,19 Prediction of protein rearrangements upon binding is a problem similar to predicting structural changes observed between homologous proteins. According to CASP3 competition results, current methods are not able to deal effectively with this problem.20,21 Despite a frequent inability to predict structural changes upon binding, chemical changes in a series of potent ligands may be able to aid in the selection of the docking mode.
* To whom correspondence should be addressed. Phone: (317) 277-3959. Fax: (317) 276-6545. E-mail: m.vieth@lilly.com.
J. Med. Chem. 2000, 43, 3020-3032
10.1021/jm990609e CCC: $19.00 © 2000 American Chemical Society
Published on Web 07/21/2000
This paper addresses the goal of obtaining a docking mode that is consistent with the structure-activity relationship (SAR). We introduce a methodology that allows a medicinal chemist to choose a docking solution that is consistent with the observed activity of a series of ligands. In addition some potential pitfalls associated with utilizing homology models for prediction are highlighted. Finally, the results of fully automated models are contrasted with models involving some manual intervention.
Materials and Methods
The docking protocol utilizes a combination of simulated annealing and a CHARMm-based energy function22 and is described in detail elsewhere.14,23 For the sake of introduction we present a summary of the protocol. An initial random distribution of ligand replicas is generated around the active site. The docking protocol involving dynamics of ligand replicas consists of three stages. In the first stage, the energy surface is explored by annealing ligands starting at high temperatures with soft-core nonbonded interactions. In the second stage local minima are identified in the basins provided by the first stage. The soft core is gradually hardened, and the starting temperature for the annealing is decreased. In the last stage of the protocol the soft core potential disappears and the resulting structures are locally minimized. In this work the protocol is adjusted slightly, as described below.
Parameters and Docking Potential. A previous study established that utilization of the soft core potential is essential for an efficient docking protocol. This work implements a different form of the soft core potential24 to allow for physical characteristics of model systems such as SPC water dimer. This new soft core potential is defined independently for the electrostatic and van der Waals (VDW) components as given in eq 1; otherwise the usual form for the nonbonded interactions is utilized.24 The switching to the soft core potential occurs at the distance Rcut defined by eq 2. The coefficients a and b are chosen so that the energy and force terms agree at Rcut. Note that for the electrostatic attraction both Emax and a are negative. The qualitative and quantitative results for the docking on model systems remain unchanged for the test systems described in previous papers.14,23 For the first heating stage of the docking protocol, a value of Emax equal to 1.5, -10.0, and 20.0 is utilized for the VDW, electrostatic attraction, and repulsion terms, respectively. In the cooling stage the Emax values are set to 3.0, -20.0, and 40.0. In the third stage the values are changed to 30, -200, and 400, and in the last stage the regular form of the potential is used. The polar hydrogen representation of the protein is used together with PARAM19/TOPH19 parameter and topology files.22 All-hydrogen ligand parameter files are created by Quanta98.25 Ligand charges are generated by Gaussian94 with the HF/6-31G* basis set.26 For the HIV-1 protease inhibitor set we also use five other charge assignment methods available in InsightII for comparison purposes.27 The ligand centers of mass are restrained harmonically if their distance from the center of the active site exceeds 6 Å. The center of the active site is defined as the center of mass of the ligand from the X-ray structure of the complex.
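As an illustration of how the coefficients a and b follow from the matching conditions, the sketch below works through a single pair term of the simple form E0(r) = c / r**n. This functional form, the function names, and the numerical values in the usage note are assumptions made for the example; the actual CHARMm nonbonded terms are more involved. For this form, requiring E0(Rcut) = Emax/2 together with equal energy and force at Rcut gives b = n and a = (Emax/2) / Rcut**n.

```python
def soft_core_params(c, n, e_max):
    """Return (r_cut, a, b) for a hypothetical pair term E0(r) = c / r**n.

    r_cut solves E0(r_cut) = e_max / 2; a and b make the soft form
    e_max - a * r**b match E0 in both energy and force at r_cut.
    """
    if c == 0 or e_max == 0 or (c > 0) != (e_max > 0):
        raise ValueError("c and e_max must be nonzero and share a sign")
    r_cut = (2.0 * c / e_max) ** (1.0 / n)
    b = float(n)                  # follows from matching the derivative
    a = 0.5 * e_max / r_cut ** b  # follows from matching the energy
    return r_cut, a, b


def soft_core_energy(r, c, n, e_max):
    """Soft form inside r_cut (where |E0| exceeds |e_max| / 2), E0 outside."""
    r_cut, a, b = soft_core_params(c, n, e_max)
    if r < r_cut:
        return e_max - a * r ** b
    return c / r ** n
```

With an attractive term (c < 0) and a negative Emax such as -10.0, the returned a is negative, in line with the remark in the text; at r = 0 the soft form returns exactly Emax, so the singularity is removed.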
Docking and Generation of Descriptors. The assumption is made that all molecules in the chemical series adopt the same binding mode. This assumption allows for establishing the most probable binding mode for a series of compounds after extracting probable common substructure (CS) binding modes from unbiased docking experiments. It is especially important for docking to an imperfect protein model, which could prevent some molecules, sometimes the most potent ones, from accessing the true binding mode because of steric changes that are frequently caused by side chain or small backbone rearrangements. Because of the assumption of a common binding mode, DoMCoSAR is currently limited to congeneric chemical series and is not applicable to virtual screening of diverse sets of molecules. However, DoMCoSAR can be used with any docking approach including but not limited to the DOCK,10 AutoDock,17 and GOLD16 packages.
The entire procedure is schematically depicted in Figure 1. For each ligand, the experiment proceeds by randomly generating 20 replicas of ligands and annealing them following the protocol described in the previous papers.14,23 The mean docking time for this stage of the studies is 12 min/replica on an SGI 200 MHz R10000 chip. The ligand replicas are clustered to obtain the final solution for each molecule/ligand in the series. The clustering method used is a root mean square (rms)-based Ward’s hierarchical, agglomerative clustering28 applied only to the heavy atom coordinates. The clustering is carried out until the largest radius of the clusters is less than 1.5 Å.29 The 1.5 Å cluster radius used as the stopping criterion is arbitrary, but it was chosen so that the cluster centers were consistent with visual interpretation of the docking results.
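The clustering step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: poses are lists of heavy-atom (x, y, z) tuples in a common frame (no superposition), the cluster radius is taken to be the largest rms distance from a member to the cluster centroid, and merging simply stops when the cheapest Ward merge would reach the 1.5 Å radius. All function names are hypothetical. A library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` could replace the hand-written loop.

```python
import math


def rms(a, b):
    """RMS deviation between two poses given as lists of (x, y, z) tuples."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def centroid(poses):
    """Atom-by-atom average of a list of poses."""
    n = len(poses)
    return [tuple(sum(p[i][k] for p in poses) / n for k in range(3))
            for i in range(len(poses[0]))]


def radius(poses):
    """Largest rms distance from any member pose to the cluster centroid."""
    c = centroid(poses)
    return max(rms(p, c) for p in poses)


def ward_cluster(poses, max_radius=1.5):
    """Agglomerative clustering that merges the pair with the smallest
    Ward cost until the next merge would reach max_radius."""
    clusters = [[p] for p in poses]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                d = rms(centroid(a), centroid(b))
                # Ward cost: increase in within-cluster sum of squares
                cost = len(a) * len(b) / (len(a) + len(b)) * d * d
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        if radius(merged) >= max_radius:
            break
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

The same routine can be applied twice, as the text describes: first to the replicas of each ligand, and then to the common-substructure coordinates of the final solutions of all ligands.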
\[ E_{ij} = E_{\max} - a\, r_{ij}^{b} \quad \text{if } E_{ij} > \frac{E_{\max}}{2} \tag{1} \]
\[ E_{ij}(R_{\mathrm{cut}}) = \frac{E_{\max}}{2} \tag{2} \]
Figure 1. Schematic representation of the docking mode selection. First, the ligands are parametrized and a common substructure (CS) is defined for the series. Docking to the binding site for all molecules is performed followed by rms clustering of the resulting CS coordinates. The centers of significant clusters define the coordinates of the major docking modes (MDMs). All molecules are docked again with restraints so that the entire series is docked in a way corresponding to the MDMs. For each MDM, descriptors are generated based on the CHARMm force field. Each MDM will have a different corresponding descriptor set due to a different position of the series with respect to the receptor. Predictability of each descriptor set is assessed based on random 4-fold cross-validation performed 50 times. The most predictive descriptor set is chosen to indicate the docking mode that is most consistent with the SAR.
DoMCoSAR Journal of Medicinal Chemistry, 2000, Vol. 43, No. 16 3021
The CS of all final solutions for all ligands are subsequently clustered to obtain the major docking modes (MDMs) for substructures. The major docking modes are usually populated by more than 40% of the ligands. Once the major docking modes are established, all ligands are redocked for each selected docking mode with restraints to the position of the substructure MDMs utilizing the short version of the annealing schedule.14 For this stage of the protocol the mean redocking time is 3 min/replica on an SGI 200 MHz R10000 chip. NOE distance restraints are employed between the CS atoms and the dummy atoms positioned at the MDM coordinates. A force constant of 20 kcal/Å² is used with Rmax = 0.5 Å. At this stage 4 ligand replicas are utilized rather than the 20 used in the initial docking. At the end of this stage the entire chemical series is positioned in the active site in a number of ways with CS occupying the positions corresponding to MDMs. For each MDM, interaction energy descriptors are generated for the entire chemical series. Four structure-based descriptors were selected for building predictive models. Electrostatic (eps = 2r) and VDW contributions to the interaction energy with the receptor were chosen, as well as the strain energy and the atom type weighted difference in total exposed surface area30 between receptor and ligand in the complex and in the isolated state. The strain energy is defined as the difference between the internal energy in the docked conformation and the internal energy of the final ligand conformation that was extracted from the active site and minimized with 200 steps of conjugate gradient minimization.
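The restraint and strain terms described above can be illustrated with a short sketch. The flat-bottom form below (no penalty up to Rmax, harmonic beyond it) is an assumption about how such NOE-style restraints typically behave; the text gives only the force constant and Rmax, so the prefactor convention (k versus k/2) and the function names are hypothetical.

```python
def flat_bottom_restraint(distance, r_max, force_constant):
    """Zero inside r_max; harmonic penalty k * (d - r_max)**2 beyond it.

    Assumed functional form: the prefactor convention is not stated in the text.
    """
    excess = distance - r_max
    return force_constant * excess * excess if excess > 0.0 else 0.0


def strain_energy(e_internal_docked, e_internal_relaxed):
    """Internal energy of the docked conformation minus that of the same
    ligand after it is extracted from the site and minimized."""
    return e_internal_docked - e_internal_relaxed
```

For example, with the quoted values of 20 for the force constant and 0.5 Å for Rmax, a CS atom 1.5 Å from its dummy atom would contribute 20 energy units under this convention, while any atom within 0.5 Å contributes nothing.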
At this stage each docking mode is represented by a set of descriptors. If one considers only two MDMs, the problem of selecting the one that is more consistent with the SAR reduces to the question: which descriptor set is more predictive? To answer this question a 4-fold cross-validation study is performed. A 4-fold cross-validation consists of dividing the entire set of molecules into four subsets at random. In each subset, the activity of each molecule is predicted from a (QSAR) model derived from the molecules in the three remaining subsets. Thus, four different models are built on four different training sets, each of which is 75% of the size of the whole data set. In this way the activity of all molecules in the set is predicted (see Figure 2). 4-Fold cross-validation (especially when repeated numerous times) gives a far more realistic estimate of the predictive power of the method than a leave-one-out test, since the 4-fold test predicts substantial fractions of “new” data and is less prone to giving misleadingly optimistic results if the data set contains pairs of identical or highly similar molecules.
The method of partial least squares31,32 is applied using the SAS system33 to determine a set of regression coefficients for the training set of molecules (eq 3), where $y_k$ is the experimentally determined pIC50 = -log(IC50) for the kth compound, k = 1...K, where K is the total size of the data set, $c_{ij}$ is the contribution to the activity of the descriptor $D_{ijk}$ for the jth binding mode in each of two latent variables, and N is the number of descriptors. In this study N = 4 and the two best docking modes j = 1, 2 are compared. Our measure of predictability is the squared predictive correlation coefficient R² defined in eq 4, where $\bar{y}$ is the mean of the observed pIC50s, $\hat{y}_i$ is the predicted pIC50 for the ith compound, and $\bar{\hat{y}}$ is the mean predicted pIC50.
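The repeated 4-fold procedure and the R² of eq 4 can be sketched as follows. To keep the example self-contained, a one-descriptor least-squares line stands in for the partial least squares model fitted in SAS; that substitution, the function names, and the seed are assumptions made for illustration only.

```python
import random


def four_fold_splits(n, rng, folds=4):
    """Randomly partition indices 0..n-1 into `folds` test sets."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::folds] for f in range(folds)]


def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for the PLS model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predictive_r2(y, y_hat):
    """Squared correlation between observed and predicted values (eq 4)."""
    n = len(y)
    my, mh = sum(y) / n, sum(y_hat) / n
    num = sum((a - my) * (b - mh) for a, b in zip(y, y_hat)) ** 2
    den = sum((a - my) ** 2 for a in y) * sum((b - mh) ** 2 for b in y_hat)
    return num / den


def repeated_cv_r2(x, y, repeats=50, seed=0):
    """One predictive R2 per random 4-fold split; every compound is
    predicted exactly once per split from a model that never saw it."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        y_hat = [0.0] * len(y)
        for test in four_fold_splits(len(y), rng):
            held = set(test)
            train = [i for i in range(len(y)) if i not in held]
            slope, icpt = fit_line([x[i] for i in train], [y[i] for i in train])
            for i in test:
                y_hat[i] = slope * x[i] + icpt
        scores.append(predictive_r2(y, y_hat))
    return scores
```

Running `repeated_cv_r2` once per docking mode, with the same seed so that both descriptor sets see the same data splits, yields the two sets of 50 R² values that are compared in the text.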
Since there is a chance that a single data split is anecdotal, we perform 50 random 4-fold data splits, each time building a fresh set of predictive models. For visualization the distribution of the 50 resulting values of R² is estimated using Normal kernel density estimation, where the optimal smoothing bandwidth is estimated by generalized cross-validation.34 Thus, for each descriptor set corresponding to each docking mode we have a distribution of R². There are several statistics that are called R², Q², etc.; the above formulation of R² ranges from 0 (no correlation) to 1 (perfect correlation). One may examine the distributions and visually determine which one is better. However, if the distributions overlap it may be desirable to use an objective criterion to determine which descriptor set is on average more predictive. The t-test can be used to obtain p-values35 for the differences in means of two distributions. The t statistic is defined by eq 5, where r and s denote two docking modes. A p-value is determined by comparing the statistic t* with the values one may observe from a T distributed random variable. Extremely high or low values of the t* statistic are unlikely to be due to chance alone and thus give strong support to the hypothesis that the two means are different. A p-value less than 0.05 is often taken to indicate that the means are significantly different.
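A minimal sketch of the t* statistic, applied to the two sets of R² values as described above, is given below. It uses the unequal-variance form suggested by eq 5, with each sample variance divided by its own sample size; the function name is hypothetical, and the p-value lookup against the T distribution is left out.

```python
import math
import statistics


def t_statistic(r2_mode_r, r2_mode_s):
    """Difference of the two means divided by its standard error, where
    each term is the unbiased sample variance over the sample size."""
    k_r, k_s = len(r2_mode_r), len(r2_mode_s)
    mean_r = statistics.mean(r2_mode_r)
    mean_s = statistics.mean(r2_mode_s)
    se2 = (statistics.variance(r2_mode_r) / k_r
           + statistics.variance(r2_mode_s) / k_s)
    return (mean_r - mean_s) / math.sqrt(se2)
```

A large positive value indicates that mode r has the higher mean R², and a large negative value favors mode s.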
There is one difficulty with the use of the t-test in the above setting. The t-test assumes that the values realized are independent and identically distributed, with common variance and possibly different means. The use of the t-test in this setting involves a violation of the assumption of independence, since the values of R² are the result of models that are computed from training sets that are overlapping across the 50 iterations. For example, in iteration 1 a given observation will be predicted from a model trained on 75% of the data. In iteration 2, the same observation will be predicted from a different model trained on another 75% portion of the data. To the degree that the training sets overlap, the models will tend to give the same predictions and the resulting values of R² will be correlated. The practical result of this is that the value of K in the equation for the t statistic is inflated and the true p-value may be higher than what is computed. However, if the p-value from the above procedure is very much lower than 0.05, then the true p-value will also be below 0.05. Adjusting the statistic to properly reflect this dependence is a topic for further research.
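A rough estimate of the overlap, under the simplifying assumptions that the folds are of equal size and that the 50 splits are drawn independently: any other compound falls in the training set used to predict a given observation with probability of about 3/4 in each iteration, so

```latex
\[
P(\text{in both training sets}) \approx \left(\tfrac{3}{4}\right)^{2} = \tfrac{9}{16},
\qquad
\frac{\text{expected shared compounds}}{\text{training set size}}
\approx \frac{9/16}{3/4} = \tfrac{3}{4}.
\]
```

On this estimate, the two models that predict the same observation in different iterations share roughly three quarters of their training compounds, which is why the 50 values of R² cannot be treated as fully independent.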
Figure 2. 4-Fold cross-validation scheme. The data set is randomly divided into four equal subsets. The activities of the molecules in each subset (test set) are predicted from a model trained on the activities of molecules for the remaining subsets (train set). The procedure yields predictions for all molecules. The random division of the molecules is repeated 50 times so that each time the test sets are different.
\[ y_k = \sum_{i=1}^{N} c_{ij} D_{ijk} \tag{3} \]
\[ R^2 = \frac{\left[\sum_{i=1}^{K} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right]^2}{\sum_{i=1}^{K} (y_i - \bar{y})^2 \, \sum_{i=1}^{K} (\hat{y}_i - \bar{\hat{y}})^2} \tag{4} \]
\[ t^* = \frac{\bar{\hat{y}}_r - \bar{\hat{y}}_s}{\sqrt{\dfrac{\sum_{k} (\bar{\hat{y}}_r - \hat{y}_{kr})^2}{(K_r - 1)K_r} + \dfrac{\sum_{k} (\bar{\hat{y}}_s - \hat{y}_{ks})^2}{(K_s - 1)K_s}}} \tag{5} \]
Results
We apply the described methodology to two literature data sets. The first data set of hydroxyethylenes targeted as HIV protease inhibitors1 is widely accepted in the literature as a benchmark to test structure-based QSAR methodologies and descriptors.12,36 The set of coordinates for the protein and inhibitors was kindly supplied to us by Kate Holloway. The X-ray structure of HIV-1 protease in a complex with L-689,502 including the bound water molecule was used for docking. The active site aspartic acid of the first chain was protonated. The set of supplied coordinates was used to evaluate the automated docking results. It is possible that the procedure is optimistically biased toward success if the HIV-1 protease from the complex carries an imprint of the ligand shape, thus favoring the true binding mode. To avoid this putative shape and electrostatic bias, we also repeat the entire protocol with the native HIV-1 protease with open flaps and no bound water molecule (PDB code 3phv).37 The chemical series with CS highlighted is shown in Table 1. Following Perez12 we used compounds 1, 3-29, and 31-34 for our analysis with the expectation that this data set should be able to generate predictive models.
The second data set includes 53 oxoindoles with cellular IC50 data in a VEGF receptor tyrosine kinase assay.2 The chemical series is represented in Table 2 with CS highlighted in blue. In this case the absence of an X-ray structure at the time of the study necessitated the use of a homology model. We deliberately included a data set that has several approximations to show a range of situations that might occur in problems faced in the pharmaceutical industry.
HIV-1 Protease Inhibitor Data Set/Protein from HIV-1/L-689,502 Complex. For this data set, the coordinates of the crystallographically determined hydroxy R-carbon atom were defined as the center of the active site. Each of the 32 molecules in the chemical series was subjected to the multiple copy simultaneous sampling (MCSS)38-based annealing involving 20 replicas.14,23 For each ligand the rms-based Ward’s clustering was used to remove redundant docking solutions. The clustering of the resulting CS (shown in red in Figure 3A) revealed the existence of 12 significant clusters that are represented in Table 3. We chose to analyze in detail two major docking modes corresponding to the two most populated clusters. The two- and three-dimensional representations of the chosen docking modes are depicted in Figure 3A,B. In practice, it may be necessary to perform an analysis of more than two modes. For each MDM the entire series was redocked with NOE restraints for all CS atoms with four ligand replicas following the same annealing protocol. The top portion of Figure 4 shows the distribution of cross-validated R² from the 4-fold cross-validations for the top two MDMs. The t-test shows that mode 2 is more predictive than mode 1. Thus we are convinced that the mode corresponding to the crystallographic complex is more consistent with the SAR. The mean value of the cross-validated R² for mode 2 is 0.44. Analysis of individual descriptors revealed that the electrostatic contribution is poorly correlated with pIC50. In addition, examination of the automatically docked structures showed that the six CS hydrogen bonds for mode 2 are not preserved across the entire series. Further redocking using an identical annealing schedule with additional restraints for all six hydrogen bonds led to a refined mode 2. The mean value of the cross-validated R² for the improved mode 2 is 0.62. This value is similar to the leave-one-out R² reported by Merck researchers after performing manual modeling of all structures. It is possible that more manual intervention in the final docked solution could further improve the predictability of the model. It is our belief, however, that the “average” modeler can obtain predictive models without much manual intervention. The bottom of Figure 4 shows the comparison of the distribution of the cross-validated R² for the improved mode 2 with the distribution for the fully automated mode 2.
Figure 3. Two-dimensional (left-hand side) and three-dimensional (right-hand side) representation of the docking modes. For the two-dimensional pictures the CS is shown in red, the protein atoms are shown in magenta, and the hydrogen bonds are depicted as black dotted lines. For the three-dimensional pictures the protein backbone is represented as a cyan tube, carbons of the inhibitors’ R-groups are green, CS atoms are orange for mode 1 and yellow for mode 2, nitrogens are blue, oxygens are red, protein carbons are gray, and hydrogen bonds are represented as dotted green lines: (A) mode 1; (B) mode 2, which is consistent with the X-ray structure.
Table 1. HIV-1 Protease Inhibitor Set (a)
(a) pIC50 is the negative log of IC50.
Table 2. VEGF Kinase Inhibitor Set (a)
(a) pIC50 is the negative log of cellular IC50.
How do the dock-derived QSAR models compare with two-dimensional QSAR models? To address this question, models were derived from the identical 50 sets of train/test splits of the data using 166 MACCS keys39,40 as descriptors. The bottom portion of Figure 4 shows that the mean R² for the MACCS keys of 0.33 is significantly lower than that for mode 2 (p-value < 0.005). However, the MACCS keys show better predictive power than mode 1 of docking. These results suggest that MACCS keys can be used in general to indicate the lower bound of predictability that is expected from more sophisticated three-dimensional-based approaches. It is worth noting that the improved mode 2 provides a very predictive model that can be applied to prioritize synthesis of new analogues and evaluate novel ideas. Thus, DoMCoSAR demonstrates the interrelationship between docking mode and SAR: SAR can be helpful in determining the binding mode, and in turn a good binding mode can be used to create predictive models.
HIV-1 Protease Data Set/Native HIV-1 Protein. One possible drawback of the above results is a bias caused by the use of the “almost perfect” protein model. We did, after all, use the protein cocrystallized with one member of the series. The protein could well contain information about the shape of the entire series. To avoid this possible bias we have repeated the entire protocol, but this time the native HIV-1 protease structure was used. As in the example above the active site Asp of the first chain was protonated. The native protein structure is slightly different in the active site, especially in the flap region. The open conformation is likely to influence the interaction of the entire series with the flaps. Despite this, the reported results suggest that one can obtain predictive models for the series even with the native protein, if the same ligand coordinates as for HIV-1/L-689,502 are used.1 However, it remains unclear whether an automated approach could provide the geometry of the inhibitors required to build predictive models if only the native protein were available.
The entire series of 32 ligands was automatically
docked to the native HIV-1 protease. Clustering of CS
revealed the two most populated MDMs. In this case
the most populated cluster corresponds to the true
solution. The second CS docking mode represents the
inverted true solution just as in the case of the L-689,-
502-inhibited enzyme. Table 4 shows the detailed
results of the CS clustering. The center of the major
cluster corresponding to the true solution in the native
HIV-1 is on average a little more distant from the
reference structures than the CS of true solution in the
HIV-1/L-689,502 complex. For the two native protein
MDMs the entire series of 32 ligands was redocked with
restraints to the CS positions and four conserved CS
hydrogen bonds to the protein. For each docking mode,
four docking-based descriptors were generated for each
ligand. In this case 50 4-fold cross-validations revealed
that the true docking mode has a mean R2of 0.40 which
is significantly better than the mean R2of 0.19 for the
inverted docking mode (p-value <0.005). The QSARs
resulting from the docking to the native HIV-1 protease
Table 3. CS Clusters from an Unbiased Docking to
HIV-1/L-689,502
cluster rank/
mode no. cluster
populationaZscorebrms of the CS
reference/X-rayc
1 20 10.45 7.26
2 15 7.64 1.18
3 12 5.95 7.25
4 11 5.39 8.22
5 9 4.26 7.97
6 9 4.26 7.07
7 8 3.70 7.46
8 6 2.58 6.25
9 5 2.01 8.29
10 5 2.01 8.28
11 5 2.01 7.14
12 4 1.45 4.54
13 3 0.89 6.13
14 3 0.89 5.44
15 3 0.89 7.57
aCluster population refers to the number of CS from the docking
simulations of entire molecules that belong to a given cluster. For
example, a population of 20 for mode 1 means that 20 out of 32
series members exhibit this docking mode. bZscore35 represents
the normalized population of a cluster. For this data set Zscores
greater than 2.0 could be considered significant. cAverage of all
heavy atom rms is shown for the CS between the cluster center
and the supplied positions for the reference compounds.1
Figure 4. Distribution of the 50 predictive R2values. The
descriptor sets use the same data splits for each of the 50 trials.
(Top) The mode 1 distribution is shown in blue. The mean
value of R2is 0.14. The mode 2 R2distribution is shown in
red with the mean at 0.44. The p-value indicates that mode 2
is more predictive. (Bottom) The MACCS keys predictive R2
distribution is shown in black with a mean of 0.33. The
improved mode 2 (by manual intervention to restrain hydrogen
bonds across the series) R2distribution is shown in green with
a mean of 0.62. The p-values indicate that the improved mode
2 is more predictive than mode 2, which is more predictive
than MACCS keys.
3028 Journal of Medicinal Chemistry, 2000, Vol. 43, No. 16 Vieth and Cummins
are thus less predictive than those obtained with the
L-689,502-inhibited enzyme. Even so, it is impressive
to obtain a result that is significantly better than
random even under somewat less than optimal condi-
tions. It is also of interest to note that using different
ligand charge assignment methods can in many cases
alter the predictability of both QSAR models. Starting
from the final structures generated by DoMCoSAR, we
examined five charge assignment methods available in
Insight 97.527 In each case DoMCoSAR structures were
minimized with 500 steps of conjugate gradient to allow
for slight adjustments. The results are reported in Table
5. Ironically, the more involved ab initio charges provide
less predictive models than those with charges gener-
ated with Insight in a matter of seconds. We also noted
that the use of strain energy does not add to the
predictability of the models for Insight-generated ligand
charges. The most predictive models (as judged by 50 4-fold
cross-validation results) can be obtained with cff91
Insight charges for the native HIV-1 complexes. However,
for the L-689,502-inhibited protein the Insight-generated
Amber charges provide the most predictive
model, with an average R² of 0.73. In all cases the
L-689,502-docked complexes give significantly more
predictive models than the native protein complexes.
Thus the quality of the QSAR models seems to strongly
correlate with the quality of the protein model. However,
the important finding is that for this SAR, DoMCoSAR
can use even the uncomplexed protein as a basis to
construct predictive and interpretable models.
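The significance claims above can be checked roughly from the 4-way cross-validated means and the spreads listed next to them in Table 5. The sketch below assumes an unpaired two-sample (Welch) t statistic with 50 trials per model; eq 5 of the paper is not reproduced in this section and may take a different form (for instance, a paired test would also be reasonable because the same data splits are shared). The function name is illustrative.

```python
import math

def welch_t(mean_a, sd_a, mean_b, sd_b, n_a=50, n_b=50):
    """Two-sample t statistic for a difference in mean predictive R^2.

    Assumption: unequal-variance (Welch) form with n_a and n_b trials.
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Table 5 values: Amber charges with the L-689,502-inhibited protein
# (0.73, spread 0.024) versus cvff91 charges with the native protein
# (0.52, spread 0.040).
t_value = welch_t(0.73, 0.024, 0.52, 0.040)
print(round(t_value, 1))
```

Under these assumptions the t value is roughly 32, far beyond any conventional significance threshold for 50 trials per group, which is consistent with the statement that the L-689,502-docked complexes give significantly more predictive models.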
Oxoindole Data Set. To test the generality of the
approach and to identify possible limitations, we applied
DoMCoSAR to a set of 53 oxoindoles targeted at tyrosine
kinases. The entire series used in this study is depicted
in Table 2 with CS highlighted in red. IC50 data come
from a cell-based assay. We did not have access to the
crystallographically determined coordinates of VEGF
RTK-1 for which the oxoindoles show the best inhibitory
effect; however, two publicly available X-ray structures
of the oxoindole series with FGF tyrosine kinase (PDB
codes 1agw and 1fgi; ref 41) were available. FGFR kinase
shares 44% sequence identity with VEGF kinase. The
PSI-Blast (ref 42) sequence alignment and the MODELER (ref 43)
package were used to generate a model of VEGF kinase
based on 1fgi (ref 41). No water molecules were retained.

Figure 5. Two-dimensional (left-hand side) and three-dimensional (right-hand side) representations of the docking modes. For
the two-dimensional pictures the CS is shown in red, the protein atoms are shown in magenta, and the hydrogen bonds are
depicted as black dotted lines. For the three-dimensional pictures the protein backbone is represented as a cyan tube, carbons of
inhibitor R-groups are green, CS atoms are orange for mode 1 and yellow for mode 2, nitrogens are blue, oxygens are red, protein
carbons are gray, and hydrogen bonds are represented as dotted green lines: (A) mode 1; (B) mode 2, which is consistent with the X-ray
structure.

Table 4. CS Clusters for an Unbiased Docking to the Native HIV-1 Protease (ref 37)

cluster rank/   cluster          Zscore (b)   rms of the CS
mode no.        population (a)                reference/X-ray (c)
 1              16               15.17        1.51
 2              14               13.13        8.36
 3               7                5.99        8.45
 4               6                4.97        8.00
 5               4                2.93        8.73
 6               4                2.93        7.71
 7               4                2.93        9.18
 8               3                1.91        5.14
 9               3                1.91        1.78
10               3                1.91        6.85
11               2                0.89        6.87
12               2                0.89        6.92
13               2                0.89        7.38
14               2                0.89        8.33
15               2                0.89        8.66

a-c See corresponding footnotes in Table 3.

Unbiased docking and clustering revealed 11 major
binding modes (Table 6). The pictorial representation
of two modes is shown in Figure 5A,B. As was done with
the HIV-1 protease test case, we have chosen the two
most significantly populated modes to evaluate in 50
trials of 4-fold cross-validation. The distribution of
predictive R² values for the two modes is shown in the top
portion of Figure 6. The distributions as a whole, as well as
their means, are well separated (p-value <0.005);
however, the mean R² of the more predictive set is still
below 0.2. The bottom portion of Figure 6 reveals that
models from MACCS key descriptors also perform poorly
in cross-validation for this data set. The fact that both
three-dimensional docking and two-dimensional MACCS
keys are unable to build predictive models suggests
numerous complications. One may conjecture that such
behavior could result from attempting to build models
based on cellular assay data, since there could be factors
Table 5. Comparison of DoMCoSAR QSARs for Different Ligand Charge Assignments and Different Proteins

ligand charge (a)        protein (b)          R² (c)   leave-one-out   4-way cross-       weights for terms, pIC50 (f)
                                                       R² (d)          validated R² (e)
6-31G*, 4 descriptors    native               0.55     0.41            0.40 ± 0.041       -3.1 -0.25Elec -0.15VdW +0.03Eis +1.17Strain
                                              (0.38)   (0.20)          (0.19 ± 0.051)
6-31G*, 3 descriptors    native               0.49     0.34            0.34 ± 0.047       -6.8 -0.37Elec -0.24VdW +0.06Eis
                                              (0.39)   (0.22)          (0.22 ± 0.059)
Quanta template          native               0.61     0.46            0.46 ± 0.049       -7.8 -0.42Elec -0.28VdW +0.06Eis
                                              (0.33)   (0.15)          (0.15 ± 0.055)
cvff                     native               0.62     0.46            0.45 ± 0.043       -5.7 -0.46Elec -0.22VdW +0.05Eis
                                              (0.37)   (0.18)          (0.19 ± 0.060)
cvff91                   native               0.66     0.53            0.52 ± 0.040       -6.8 -0.44Elec -0.22VdW +0.04Eis
                                              (0.38)   (0.14)          (0.15 ± 0.053)
Amber                    native               0.59     0.48            0.46 ± 0.039       -4.4 -0.29Elec -0.25VdW +0.05Eis
                                              (0.37)   (0.20)          (0.20 ± 0.054)
6-31G*, 4 descriptors    L689-502-inhibited   0.76     0.65            0.62 ± 0.056       -13.9 -0.13Elec -0.34VdW +0.023Eis -0.07Strain
6-31G*                   L689-502-inhibited   0.76     0.71            0.69 ± 0.031       -15.1 -0.17Elec -0.34VdW +0.003Eis
Quanta template          L689-502-inhibited   0.76     0.71            0.70 ± 0.027       -14.2 -0.17Elec -0.33VdW -0.004Eis
cvff                     L689-502-inhibited   0.78     0.72            0.70 ± 0.030       -13.4 -0.20Elec -0.31VdW +0.0001Eis
cvff91                   L689-502-inhibited   0.76     0.69            0.68 ± 0.027       -14.5 -0.18Elec -0.32VdW +0.001Eis
Amber                    L689-502-inhibited   0.78     0.73            0.73 ± 0.024       -12.1 -0.11Elec -0.33VdW -0.005Eis

a Ligand charges were generated using Insight 97.5 (ref 27), except 6-31G*, which were generated with Gaussian (ref 26). All ligands had a formal
charge of 0. Protein polar hydrogen representation with CHARMm PARAM19/TOPH19 topology and parameter files (ref 22) was used in all
calculations.
b Protein structure used for dockings and descriptor calculations.
c Correlation coefficients R² for all 32 ligands. For the native HIV-1 protease the numbers in parentheses refer to the inverted/“nonnative” docking mode.
d Squared correlation coefficient resulting from leave-one-out cross-validation. For the native HIV-1 protease the numbers in parentheses
refer to the inverted/“nonnative” docking mode.
e Average squared correlation coefficient resulting from 50 4-fold cross-validation experiments. Standard
deviations are also reported, which can be used in conjunction with eq 5 to compute t statistics. For the native HIV-1 protease the numbers
in parentheses refer to the inverted/“nonnative” docking mode. In all cases the true docking mode produces significantly better QSAR
models than the inverted mode. The 6-31G* model with 3 descriptors is the only QSAR model that is not significantly better than the two-dimensional
MACCS QSAR. Gaussian-generated charges do not produce the most predictive models; more predictive models can be constructed if
cvff91 or Amber charges on ligands are used for the native and L-689-502-inhibited HIV-1 protease, respectively. For all cases except the
native HIV-1 protease with 6-31G* ligand charges, 3-descriptor models are significantly more predictive than 4-descriptor models.
f The resulting equation for the -log(IC50) for all 32 ligands for the X-ray-like docking mode. Elec denotes the electrostatic ligand and
ligand-protein interaction energy; VdW denotes the van der Waals ligand and ligand-protein interaction energy; Eis denotes the surface-based
Eisenberg solvation energy; Strain denotes the ligand’s strain energy. Note that the weightings are not indicative of the importance of the
descriptors. van der Waals energies are most correlated with activities in all cases, with leave-one-out R² ranging from 0.3 (6-31G* ligand
charges with native protein) to 0.68 (Amber ligand charges and L-689-502-inhibited protein).

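As a worked reading of the last column of Table 5, the sketch below evaluates the linear equation for the Amber/L689-502-inhibited row, treating the leading number as the intercept. The descriptor values passed in are hypothetical (the per-ligand energies and their units are not listed in this section), and the function and dictionary names are illustrative.

```python
def predict_pic50(intercept, weights, descriptors):
    """Evaluate pIC50 = intercept + sum(weight * descriptor value)."""
    return intercept + sum(weights[name] * descriptors[name] for name in weights)

# Weights from the Amber / L689-502-inhibited row of Table 5.
amber_weights = {"Elec": -0.11, "VdW": -0.33, "Eis": -0.005}
amber_intercept = -12.1

# Hypothetical descriptor values for one ligand (not taken from the paper).
ligand = {"Elec": -20.0, "VdW": -55.0, "Eis": 10.0}

pic50 = predict_pic50(amber_intercept, amber_weights, ligand)
print(round(pic50, 2))
```

Because favorable interaction energies are negative, the negative weights on Elec and VdW raise the predicted pIC50, while the large negative intercept offsets them. As footnote f cautions, the size of a weight does not by itself indicate the importance of a descriptor, since the descriptors differ in scale.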
Table 6. CS Clusters for an Unbiased Docking of Oxoindole Series to VEGF Model

cluster   cluster      Zscore   rms of the CS
rank      population            from X-ray (ref 41)
 1        37           7.50     3.12
 2        32           6.42     0.55
 3        21           4.03     6.36
 4        17           3.17     9.00
 5        13           2.30     8.48
 6        12           2.08     5.48
 7        11           1.87     8.00
 8         8           1.22     6.81
 9         8           1.22     3.89
10         8           1.22     5.25
11         7           1.00     4.01
12         6           0.78     6.87
13         5           0.56     8.05
14         4           0.35     6.08
15         4           0.35     1.77

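The rms column in the cluster tables compares the CS heavy atoms of a cluster center with a reference placement. A minimal sketch of such a calculation follows, assuming that both coordinate sets are already in the same protein frame (so no superposition is applied), that atoms are paired by index, and that hydrogens have been removed by the caller. The coordinates are hypothetical.

```python
import math

def heavy_atom_rms(coords_a, coords_b):
    """Root-mean-square deviation between two index-matched atom lists."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical three-atom substructure and a copy displaced by (3, 4, 0).
reference_cs = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0)]
docked_cs = [(x + 3.0, y + 4.0, z) for x, y, z in reference_cs]

rms = heavy_atom_rms(reference_cs, docked_cs)
print(round(rms, 2))
```

A rigid displacement of 5 units gives an rms of 5, which helps calibrate the table: values near 1 indicate a CS placed close to the reference, whereas values of 6 to 9 correspond to a substantially different placement in the site.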
Figure 6. Distribution of the 50 predictive R² values. All data
splits are identical for all descriptor sets. (Top) The mode 1
distribution is shown in blue. The mean value of R² is 0.01.
The mode 2 R² distribution is shown in red with a mean of
0.15. The p-value indicates that mode 2 is more predictive.
(Bottom) The MACCS keys predictive R² distribution is shown
in black with a mean of 0.05.12
differentially affecting permeability and selectivity of
the compounds. This conjecture appears to be partially
supported by the recent enzyme VEGF assay data for
22 members of the series which shows no correlation
with VEGF cellular data (ref 44). Another possible explanation
is that multiple binding modes could occur, thus violat-
ing the primary assumption of this work and making
the series difficult for traditional QSAR methods to
handle. This is a topic of current research. Another
significant factor contributing to poor predictability of
the docking modes could be related to the fact that the
homology model may not represent the true protein well.
We note that our homology model matched the ATP
binding site of the recently disclosed VEGF X-ray
structure (ref 45) very well; thus the protein modeling is less
likely to be the cause of the poor QSAR models in this
case. However, despite the poor predictive value of the
model, it is the docking mode that is consistent with
the X-ray structures of the two kinases that has the
superior cross-validation profile. For this series, even
though we were unable to create a good predictive
model, DoMCoSAR was able to identify the binding
mode consistent with X-ray structures of related com-
plexes.
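The comparison protocol used for both test cases (50 random 4-fold cross-validations on identical data splits, followed by a comparison of the mean predictive R²) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: ordinary least squares stands in for whatever regression method was used, predictive R² is taken as 1 - PRESS/SS, and the two descriptor sets and activities are synthetic. Reusing the same seed guarantees that both descriptor sets see the same splits.

```python
import numpy as np

def predictive_r2_trials(X, y, n_trials=50, k=4, seed=0):
    """Predictive R^2 (1 - PRESS/SS) for n_trials random k-fold splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    ss_total = np.sum((y - y.mean()) ** 2)
    scores = np.empty(n_trials)
    for trial in range(n_trials):
        folds = np.array_split(rng.permutation(n), k)
        pred = np.empty(n)
        for test_idx in folds:
            train_mask = np.ones(n, dtype=bool)
            train_mask[test_idx] = False
            # Least-squares fit with an intercept on the training folds.
            A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
            coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
            A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
            pred[test_idx] = A_test @ coef
        scores[trial] = 1.0 - np.sum((y - pred) ** 2) / ss_total
    return scores

# Synthetic stand-ins: descriptors from an informative mode and an
# uninformative mode for 32 ligands (all values are made up).
rng = np.random.default_rng(1)
n_ligands = 32
X_mode_a = rng.normal(size=(n_ligands, 3))
y = X_mode_a @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=n_ligands)
X_mode_b = rng.normal(size=(n_ligands, 3))

r2_a = predictive_r2_trials(X_mode_a, y, seed=42)  # same seed, same splits
r2_b = predictive_r2_trials(X_mode_b, y, seed=42)
print(r2_a.mean(), r2_b.mean())
```

The two arrays of 50 values play the role of the distributions in Figures 4 and 6; a t-test on their means then decides which docking mode is more consistent with the SAR.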
Conclusions
We have demonstrated that for a chemical series,
DoMCoSAR can be helpful to statistically validate and
discriminate between possible docking modes. The ap-
plication of t-tests to the docking mode-dependent
predictive R2is a novel way to address the question of
docking mode SAR consistency. For the two cases
examined here, DoMCoSAR indicates that the docking
mode consistent with the X-ray structure of the complex
produces a QSAR model with higher predictive power
than QSARs derived from other modes. For some
chemical series, building docking-based predictive mod-
els is possible, but it is difficult to generalize this
statement. We found that the quality of the protein
model influences the predictability of the QSARs. For
the series of HIV-1 protease inhibitors, we demonstrated
that the true binding mode could be identified using the
uncomplexed enzyme, which also serves to build predictive
models. However, the use of the complexed HIV-1
protein with the bound water molecule did provide
significantly better models. As observed for the oxoin-
dole series, highly predictive docking-based models
could not be created despite the undeniable value of
generating the hypothetical binding mode. In addition
we have postulated that models derived from MACCS
keys or other two-dimensional descriptors can be effectively
used to estimate the lower bound of a series’
predictive ability.
This work gives supporting evidence that docking can
be extremely helpful to medicinal chemistry lead opti-
mization efforts. However, we feel that the utility of
docking as a virtual screening tool needs to be carefully
examined. As we have demonstrated, only about 50%
of the molecules of the series exhibit the unbiased
docking mode that is consistent with the X-ray struc-
ture. Thus, docking of databases carries a danger that
the score for some molecules will be evaluated based
on the incorrect binding mode. This is especially true
for the receptors that undergo conformational rear-
rangement upon binding. We will address the question
of scoring of database-docked structures in an upcoming
report.
Acknowledgment. We thank Kate Holloway from
Merck Research Laboratories for sending us the modeled
structures of HIV-1 protease inhibitors and for very
insightful comments. We also thank the referees of the
manuscript whose comments vastly improved the qual-
ity of the presented work. We are grateful to Bernard
Brooks for providing valuable suggestions for the func-
tional form of the docking potential. We thank Dr. Jon
Erickson for useful comments and suggestions. Helpful
discussions with Angel Ortiz, Charlie Brooks, Richard
Higgs, Robert Babine, Jim Wikel, and Jean-Pierre Wery
are appreciated.
References
(1) Holloway, K. M.; Wai, J. M.; Halgren, T. A.; Fitzgerald, P. M. D.; Vacca, J. P.; Dorsey, B. D.; Levin, R. B.; Wayne, J. T.; Chen, L. J.; deSolms, J. S.; Gaffin, N.; Ghosh, A. K.; Giuliani, E. A.; Graham, S. L.; Guare, J. P.; Hungate, R. W.; Lyle, T. A.; Sanders, W. M.; Tucker, T. J.; Wiggins, M.; Wiscount, C. M.; Woltersdorf, O. W.; Young, S. D.; Darke, P. L.; Zugay, J. A. A Priori Prediction of Activity for HIV-1 Protease Inhibitors Employing Energy Minimization in the Active Site. J. Med. Chem. 1995, 38, 305-317.
(2) Sun, L.; Tran, N.; Tang, F.; App, H.; Hirth, P.; McMahon, G.; Tang, C. Synthesis and Biological Evaluations of 3-Substituted Indolin-2-ones: A Novel Class of Tyrosine Kinase Inhibitors that Exhibit Selectivity toward Particular Receptor Tyrosine Kinases. J. Med. Chem. 1998, 41, 2588-2603.
(3) Gallop, M. A.; Barret, R. W.; Dower, W. J.; Fodor, S. P. A.; Gordon, E. M. Applications of combinatorial technologies to drug discovery. 1. Background and peptide combinatorial libraries. J. Med. Chem. 1994, 37, 1233-1251.
(4) Gordon, E. M.; Barret, R. W.; Dower, W. J.; Fodor, S. P.; Gallop, M. A. Applications of combinatorial technologies to drug discovery. 2. Combinatorial organic synthesis, library screening strategies, and future directions. J. Med. Chem. 1994, 37, 1385-401.
(5) Zheng, Q.; Kyle, D. J. Computational screening of combinatorial libraries. Bioorg. Med. Chem. 1996, 4, 631-638.
(6) Muegge, I.; Martin, Y. C.; Hajduk, P. J.; Fesik, S. W. Evaluation of PMF scoring in docking weak ligands to the FK506 binding protein. J. Med. Chem. 1999, 42, 2498-503.
(7) Stewart, K. D.; Bentley, J. A.; Cory, M. DOCKing ligands into receptors: The test case of α-Chymotrypsin. Tetrahedron Comput. Methodol. 1990, 3, 713-722.
(8) Charifson, P. S.; Corkery, J. J.; Murcko, M. A.; Walters, P. W. Consensus Scoring: A Method for Obtaining Improved Hit Rates from Docking Databases of Three-Dimensional Structures into Proteins. J. Med. Chem. 1999, 42, 5100-5109.
(9) Fisher, E.; Thierfelder, H. Chem. Ber. 1894, 27, 2031.
(10) Kuntz, I. D.; Blaney, J. M.; Oatley, S. J.; Langridge, R.; Ferrin, T. E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982, 161, 269-288.
(11) Gschwend, D. A.; Good, A. C.; Kuntz, I. D. Molecular Docking Towards Drug Discovery. J. Mol. Recognit. 1996, 9, 175-186.
(12) Perez, C.; M. P.; Ortiz, A.; Gago, F. Comparative Binding Energy Analysis of HIV-1 Protease Inhibitors: Incorporation of Solvent Effects and Validation as a Powerful Tool in Receptor-Based Drug Design. J. Med. Chem. 1998, 41, 836-852.
(13) Jalaie, M.; Erickson, J. A. Homology Model Directed Alignment Selection for Comparative Molecular Field Analysis: Application to Photosystem II Inhibitors. J. Comput.-Aided Mol. Des. 2000, 14, 181-197.
(14) Vieth, M.; Hirst, J. D.; Dominy, B. N.; Daigler, H.; Brooks III, C. L. Assessing Search Strategies for Flexible Docking. J. Comput. Chem. 1998, 19, 1623-1631.
(15) Judson, R. S.; Tan, Y. T.; Mori, E.; Melius, C.; Jeager, E. P.; Treasurywala, A. M.; Mathiowetz, A. Docking flexible molecules: A case study of three proteins. J. Comput. Chem. 1995, 16, 1405-1419.
(16) Jones, G.; Willet, P. Docking of small-molecule ligands into active sites. Curr. Opin. Biotechnol. 1995, 6, 652-656.
(17) Goodsel, D. S.; Olson, A. J. Automated docking of substrates to proteins by simulated annealing. Proteins 1990, 8, 195-202.
(18) Goldsmith, E. J.; Cobb, M. H. Protein kinases. Curr. Opin. Struct. Biol. 1994, 4, 833-840.
(19) Wilson, K. P.; McCaffrey, P. G.; Hsiao, K.; Pazhanisamy, S.; Galullo, V.; Bemis, G. W.; Fitzgibbon, M. J.; Caron, P. R.; Murcko, M. A.; Su, M. S. S. The structural basis for the specificity of pyridinylimidazole inhibitors of p38 MAP kinase. Chem. Biol. 1997, 4, 423-431.
(20) Jones, T. A.; Kleywegt, G. J. CASP3 Comparative Modeling Evaluation. Proteins: Struct. Funct. Genet. Suppl. 1999, 3, 30-46.
(21) Bates, P. A.; Sternberg, M. J. E. Model Building by Comparison at CASP3: Using Expert Knowledge and Computer Automation. Proteins: Struct. Funct. Genet. Suppl. 1999, 3, 47-54.
(22) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. CHARMM: A program for macromolecular energy, minimization and dynamics calculations. J. Comput. Chem. 1983, 4, 187-217.
(23) Vieth, M.; Hirst, J. D.; Kolinski, A.; Brooks III, C. L. Assessing Energy Functions for Flexible Docking. J. Comput. Chem. 1998, 19, 1612-1622.
(24) Brooks, B. Personal communication.
(25) QUANTA, 4.6 ed.; Molecular Simulations Inc.: San Diego, CA, 1997.
(26) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Gill, P. M.; Johnson, B. G.; Robb, M. A.; Cheeseman, J. R.; Keith, T.; Petersson, G. A.; Montgomery, J. A.; Raghavavachari, K.; Al-Laham, M. A.; Zakrzewski, A. G.; Ortiz, J. V.; Foresman, J. B.; Cioslowski, J.; Stefanov, B. B.; Nanayakkara, A.; Challacombe, M.; Peng, C. Y.; Ayala, P. Y.; Chen, W.; Wong, M. W.; Anders, J. L.; Replogle, E. S.; Gomperts, R.; Martin, R. L.; Fox, D. J.; Binkley, J. S.; Defrees, D. J.; Baker, J.; Stewart, J. P.; Head-Gordon, M.; Gonzales, C.; Pople, J. A. Gaussian 94; Gaussian, Inc.: Pittsburgh, PA, 1995.
(27) Insight II, release 97.5; Molecular Simulations Inc.: San Diego, CA, 1998.
(28) Jain, A. K.; Dubes, R. C. Algorithms for Clustering Data; Prentice Hall: Englewood Cliffs, NJ, 1988.
(29) Vieth, M.; Hirst, J. D.; Brooks, C. L., III. Do active site conformations of small ligands correspond to low free-energy solution structures? J. Comput.-Aided Mol. Des. 1998, 12, 563-572.
(30) Wesson, L.; Eisenberg, D. Atomic solvation parameters applied to molecular dynamics of proteins in solution. Protein Sci. 1992, 2, 227-235.
(31) Wold, H. Soft Modelling; Wold, H., Ed.; North-Holland: Amsterdam, 1981.
(32) Wold, H. Soft Modeling by Latent Variables; the Nonlinear Iterative Partial Least Squares Approach; Wold, H., Ed.; Academic Press: London, 1975.
(33) SAS; SAS Institute Inc.: Cary, NC, 1989-96.
(34) Silverman, B. W. Density Estimation for Statistics and Data Analysis; Chapman & Hall: New York, London, 1986.
(35) Bulmer, M. G. Principles of Statistics; Dover Publications: New York, 1979.
(36) Muegge, I.; Martin, Y. C. A general and fast scoring function for protein-ligand interactions: a simplified potential approach. J. Med. Chem. 1999, 42, 791-804.
(37) Lapatto, R.; Blundell, T.; Hemmings, A.; Overington, J.; Wilderspin, A.; Wood, S.; Merson, J. R.; Whittle, P. J.; Danley, D. E.; Geoghegan, K. F.; Hawrylik, S. J.; Lee, S. E.; Scheld, K. G.; Hobart, P. M. X-ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms structural homology among retroviral enzymes. Nature 1989, 342, 299-302.
(38) Miranker, A.; Karplus, M. Functionality maps of binding sites: A multiple copy simultaneous search method. Proteins 1991, 11, 29-34.
(39) MACCS-II; Molecular Design Ltd.: San Leandro, CA.
(40) Brown, R. D.; Martin, Y. C. The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J. Chem. Inf. Comput. Sci. 1997, 37, 1-9.
(41) Mohammadi, M.; McMahon, G.; Sun, L.; Tang, C.; Hirth, P.; Yeh, B. K.; Hubbard, S. R.; Schlessinger, J. Structures of the tyrosine kinase domain of fibroblast growth factor receptor in complex with inhibitors. Science 1997, 276, 955-960.
(42) Altschul, S. F.; Madden, T. L.; Schaffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389-3402.
(43) Sali, A.; Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 1993, 234, 779-815.
(44) Sun, L.; Tran, N.; Liang, C.; Tang, F.; Rice, A.; Schreck, R.; Waltz, K.; Shawver, L. K.; McMahon, G.; Tang, C. Design, Synthesis, and Evaluation of Substituted 3-[(3- or 4-Carboxyethylpyrrol-2-yl)methylidenyl]indoline-2-ones as Inhibitors of VEGF, FGF and PDGF Receptor Tyrosine Kinases. J. Med. Chem. 1999, 42, 5120-5130.
(45) McTigue, M. A.; Wickersham, J.; Pinko, C.; Showalter, R.; Parast, C.; Tempczyk-Russell, A.; Gehring, M.; B. M.; Kan, C.; Villafranca, J.; Appelt, K. Crystal structure of the kinase domain of human vascular endothelial growth factor receptor 2: a key enzyme in angiogenesis. Structure 1999, 7, 319-330.