Molecular mechanics methods for predicting protein–ligand binding†
Niu Huang, Chakrapani Kalyanaraman, Katarzyna Bernacki and
Matthew P. Jacobson*
Received 12th June 2006, Accepted 8th August 2006
First published as an Advance Article on the web 1st September 2006
DOI: 10.1039/b608269f
Ligand binding affinity prediction is one of the most important applications of computational
chemistry. However, accurately ranking compounds with respect to their estimated binding
affinities to a biomolecular target remains highly challenging. We provide an overview of recent
work using molecular mechanics energy functions to address this challenge. We briefly review
methods that use molecular dynamics and Monte Carlo simulations to predict absolute and
relative ligand binding free energies, as well as our own work in which we have developed a
physics-based scoring method that can be applied to hundreds of thousands of compounds by
invoking a number of simplifying approximations. In our previous studies, we have demonstrated
that our scoring method is a promising approach for improving the discrimination between
ligands that are known to bind and those that are presumed not to, in virtual screening of large
compound databases. In new results presented here, we explore several improvements to our
computational method including modifying the dielectric constant used for the protein and ligand
interiors, and empirically scaling energy terms to compensate for deficiencies in the energy model.
Future directions for further improving our physics-based scoring method are also discussed.
1. Introduction
The application of techniques of computational chemistry to
studying structural and functional properties of biological
macromolecules has increased dramatically due to rapid ad-
vances in computer power, improvements in force fields, and
development of numerical algorithms.
1
However, free energy
calculation remains challenging in both theoretical and prac-
tical aspects. Accurately calculating the binding affinities of
small-molecule ligands to biomolecular targets is one of the
ultimate goals of structure-based drug design (SBDD). A
variety of computational models and tools have been devel-
oped and tested in reproducing experimental binding data for
different target systems. The theoretical complexity and accu-
racy vary greatly, ranging from simple statistical multivariate
equations to computationally intensive free energy perturba-
tion methods. Several reviews on theories and applications are
available in the literature.
2–8
Two important applications of computational SBDD meth-
ods are lead discovery and lead optimization. Lead discovery
is the process of identifying new compounds that bind with
reasonable affinity (typically low micromolar or better) to a
particular macromolecular target and inhibit its function.
Virtual screening methods can be used to identify compounds,
among large and diverse databases, that are most likely to
bind to a receptor; these can then be prioritized for experi-
mental testing.
9–11
Lead optimization is the process of chemi-
cally modifying a lead compound for improved properties,
including, usually, improved binding to the receptor. To be
useful in this context, computational methods must be capable
of predicting with some precision the relative binding affinities
of similar compounds. The molecular mechanics methods we
and others have developed for scoring protein–ligand com-
plexes have utility for both lead discovery and lead optimiza-
tion. With respect to our own methods, we focus here on lead
discovery applications, reviewing our previously published
work as well as introducing new improvements to the compu-
tational methods.
Numerous structure-based virtual screening methods (also
referred to as small-molecule docking) have been developed to
assist lead discovery. These methods orient and score small
molecules for shape and chemical complementarity to a
macromolecular binding site. Critical issues include the meth-
ods for exploring the conformational space of the flexible
ligands and receptor (sampling) and the estimation of relative
binding affinities for the ligand–receptor complexes (scoring).
In high-throughput docking, a scoring function has to be
simple to compute in order to dock a large chemical library
(~10⁵–10⁶ compounds). Currently, most commonly used
scoring functions in high throughput docking can be classified
as empirical (e.g. FlexX, Glide)
12,13
or knowledge-based (e.g.
PMF, SMoG).
14,15
Empirical scoring functions contain adjus-
table parameters that are determined by fitting to many crystal
complexes with available experimental binding affinities.
Knowledge-based functions are derived from statistical ana-
lysis of the interaction distances between different atom types
in crystal structures of protein–ligand complexes.
Department of Pharmaceutical Chemistry, University of California
San Francisco, UCSF MC 2240, Genentech Hall, Room N472C, 600
16th St., San Francisco, CA 94158-2517, USA. E-mail:
matt.jacobson@ucsf.edu; Fax: +1-415-514-4260; Tel: +1-415-514-9811
† The HTML version of this article has been enhanced with additional
colour images.
INVITED ARTICLE www.rsc.org/pccp | Physical Chemistry Chemical Physics
Phys. Chem. Chem. Phys., 2006, 8, 5166–5177. This journal is © the Owner Societies 2006
An alternate approach is to attempt to score protein–ligand
complexes (i.e., estimate their absolute or relative binding free
energies) using physics-based energy functions. This approach
is challenging because the physics of ligand binding is compli-
cated. Factors such as electronic polarizability, entropic losses
in the ligand and receptor, and solvation effects can all impact
absolute and relative binding affinities. Two important
strengths of physics-based scoring methods are (1) they do
not require parameterization using ligand binding affinity data
and crystal structural information, and hence are not subject
to concerns about over-fitting, and (2) it is possible to system-
atically pursue improvements to the scoring function by using
more sophisticated energy models and sampling schemes.
The rest of this manuscript is organized as follows. In
section 2, we concisely review important uses of molecular
mechanics methods in structure-based drug discovery. Section
3 describes our computational method for applying molecular
mechanics scoring in high-throughput docking applications.
Section 4 reviews our published data testing this approach,
and section 5 reports further optimization of the method,
examining in some detail the contributions from intrinsic
and environmental effects during ligand binding. We conclude
by discussing future directions for molecular mechanics meth-
ods in structure-based drug design.
2. Brief review of molecular mechanics methods
In principle, the most rigorous methods to approximate the
ligand–receptor binding enthalpy are based on quantum me-
chanical (QM) methods. Recently, a quantum mechanical/
molecular mechanical (QM/MM) algorithm was developed to
improve ligand binding pose prediction by replacing the force
field charges of ligands with QM/MM-calculated charges in the
protein environment.
16
Moreover, a semi-empirical QM based
scoring function was validated to capture binding affinity
trends in a diverse range of protein–ligand complexes, as well
as the ability to discriminate between native and decoy
poses.
17
However, at the present time such calculations are
too expensive to be applied in high-throughput
applications, and it is difficult to incorporate solvent effects.
In contrast to QM methods, molecular mechanics methods
are based on classical mechanics, allowing computational
simulations to be performed on large biomolecular systems
containing more than 100 000 atoms. Therefore, molecular
mechanics is currently the most feasible means to model the
interactions between ligands and receptor in a physically
realistic manner, especially in large-scale applications (i.e.
high-throughput docking). Free energy calculations derived
from molecular mechanics (MM) simulations of protein–
ligand complexes can account for flexibility for both the
protein and the ligand as well as solvation effects, and both
accuracy and efficiency can be achieved within certain approx-
imations. We focus here on reviewing MM based binding
affinity prediction methods.
(a) Molecular mechanics force fields
Most commonly used force fields for biomolecular applications
have the following form (eqn (1)),
18,19
where bonded interactions include the bond, angle and dihedral
terms (b, θ and φ, respectively). Non-bonded interactions
include the van der Waals (vdW) term, represented by the
Lennard-Jones (LJ) 6-12 potential, and electrostatic
interactions, which are treated as Coulombic interactions between
point charges centered on each atom.
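As a minimal sketch of the non-bonded terms in eqn (1), the Python below evaluates the LJ 6-12 term in its R_min form and the Coulomb term between point charges. The function names, the combining rules for the pair parameters and the unit-conversion constant (which absorbs the 1/4π prefactor for energies in kcal/mol, distances in Å and charges in units of e) are illustrative assumptions, not the conventions of any particular force field.

```python
import math

# Assumed conversion factor: kcal*Angstrom/(mol*e^2). Individual force
# fields use slightly different values; this one is only approximate.
COULOMB_CONSTANT = 332.06


def lennard_jones(r, epsilon, r_min):
    """LJ 6-12 term in the R_min form: eps * [(Rmin/r)^12 - 2 (Rmin/r)^6]."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Coulomb interaction between two point charges in a uniform dielectric D."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


def nonbonded_energy(coords, charges, epsilons, r_mins, dielectric=1.0):
    """Sum the vdW and electrostatic terms over all atom pairs.

    Simplifications (assumptions of this sketch): no bonded exclusions,
    no cutoffs, geometric-mean epsilon and arithmetic-mean R_min.
    """
    total = 0.0
    n_atoms = len(coords)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            r = math.dist(coords[i], coords[j])
            eps_ij = math.sqrt(epsilons[i] * epsilons[j])
            rmin_ij = 0.5 * (r_mins[i] + r_mins[j])
            total += lennard_jones(r, eps_ij, rmin_ij)
            total += coulomb(r, charges[i], charges[j], dielectric)
    return total
```

With this form of the LJ term, the pair energy equals −ε exactly at r = R_min, which gives a quick sanity check on any parameter set.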
Water can be treated in the same manner as the macromolecule,
i.e., using point charges on each atom, and many
explicit solvent models have been developed and used in
estimating absolute or relative binding affinities for protein–
ligand complexes.
20
However, the extensive conformational
sampling required to converge simulations using explicit water
can lead to high computational expense, and many researchers
have pursued the development of more approximate but
efficient models for solvent. Commonly used implicit solvent
models represent water as a continuum dielectric medium,
therefore greatly reducing the computational expense of cal-
culating the solvent–solvent and solute–solvent interactions.
Such solvent models are typically parameterized to estimate
solvation free energies and thus implicitly account for solvent
entropy. The most widely used implicit solvent models are the
Poisson–Boltzmann/surface area (PB/SA)
21
and the general-
ized Born/surface area model (GB/SA).
22–24
PB methods
numerically solve the Poisson–Boltzmann equation. GB mod-
els can be considered a semi-analytical approximation to
solution of the PB equation.
23
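To make the idea of a semi-analytical GB approximation concrete, the sketch below evaluates the electrostatic solvation energy with the widely used pairwise generalized Born expression of Still and coworkers. It assumes that effective Born radii are already available (computing them is the difficult part and is not shown), and it is not the surface-based GB variant used later in this article; the function name, default dielectric constants and conversion factor are assumptions.

```python
import math

# Assumed conversion factor: kcal*Angstrom/(mol*e^2), approximate.
COULOMB_CONSTANT = 332.06


def gb_polarization_energy(coords, charges, born_radii, eps_in=1.0, eps_out=80.0):
    """Pairwise generalized Born electrostatic solvation energy (kcal/mol).

    G_pol = -1/2 (1/eps_in - 1/eps_out) * sum_ij q_i q_j / f_GB(r_ij), with
    f_GB = sqrt(r^2 + R_i R_j exp(-r^2 / (4 R_i R_j))).
    The double sum runs over all i and j, so the i == j self terms
    reproduce the Born energy of each isolated charge.
    """
    prefactor = -0.5 * COULOMB_CONSTANT * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n_atoms = len(coords)
    for i in range(n_atoms):
        for j in range(n_atoms):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            total += charges[i] * charges[j] / f_gb
    return prefactor * total
```

For a single ion the expression reduces to the Born formula, and the solvation energy vanishes when the interior and exterior dielectric constants are equal.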
Molecular dynamics is the simulation of the time-dependent
motion of molecules according to Newton’s laws of motion.
25
Based on the principles of statistical mechanics,
26
the macro-
scopic properties of the molecular system can be calculated
from the microscopic configurations recorded in a sufficiently
long trajectory during the MD simulation. Therefore, force
field based MD simulations have been widely used in free-
energy calculations for sampling conformational space and
determining equilibrium averages, including in methods such
as free-energy perturbation (FEP), thermodynamic integration
(TI) and molecular mechanics Poisson–Boltzmann/surface area
(MM-PB/SA), which have been used to predict ligand binding
free energies in good agreement with experimental data.
27–33
$$
E = \sum_{\mathrm{bonds}} K_b (b - b_0)^2
  + \sum_{\mathrm{angles}} K_\theta (\theta - \theta_0)^2
  + \sum_{\mathrm{torsions}} K_\phi \left[ 1 + \cos(n\phi - \delta) \right]
  + \sum_{\mathrm{vdW}} \varepsilon_{ij} \left[ \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{6} \right]
  + \sum_{\mathrm{electrostatics}} \frac{q_i q_j}{4 \pi D r_{ij}}
\tag{1}
$$
(b) Key thermodynamics principles
Generally, predicting the relative rather than absolute binding
free energies of a series of ligands to the same molecular target
is adequate for most practical applications. The binding free
energy difference ($\Delta\Delta G_{\mathrm{bind}}$) for ligand $L_1$ versus ligand $L_2$ to
receptor R can be evaluated by employing the free-energy
cycle shown in Fig. 1.
34
Although directly calculating the experimentally measurable
binding free energies of the two different ligands, $\Delta G_{\mathrm{bind}}(L_1)$
and $\Delta G_{\mathrm{bind}}(L_2)$, is very difficult, the free energy differences to
alchemically transform one ligand into another similar ligand
in the free ($\Delta\Delta G^{L}_{\mathrm{mut}}$) and bound ($\Delta\Delta G^{R*L}_{\mathrm{mut}}$) states are easier to
calculate via theoretical methods, and the relative binding free
energy can be calculated as
$$
\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}(L_1) - \Delta G_{\mathrm{bind}}(L_2)
  = \Delta\Delta G^{R*L}_{\mathrm{mut}} - \Delta\Delta G^{L}_{\mathrm{mut}}
\tag{2}
$$
The free energies for the alchemical transformations can be
computed using rigorous statistical mechanical methods such
as FEP and TI in conjunction with MD simulations or Monte
Carlo (MC) simulations.
2,35–39
These and similar methods have
been extensively reviewed elsewhere.
2–7
FEP and TI relate the
free energy of a system and the ensemble average of an energy
function that describes that system. In practical application,
they generally treat solvent molecules and ions explicitly.
However, the computational demand of adequate sampling
makes such methods most amenable for estimating relative
binding affinities between similar ligands, while relative binding
affinities between diverse ligands and absolute binding affinity
predictions pose more of a challenge. Neither is applicable
to large numbers of compounds. Such methods are currently
most useful in the context of the lead optimization process, when
the relative binding affinities of dozens of derivatives of the
same chemical scaffold are considered. These methods typically
give good predictions of relative binding free energies (a mean
absolute error <1 kcal mol⁻¹ is frequently reported).
31–33
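The relation between a free energy difference and an ensemble average that underlies FEP can be illustrated with the Zwanzig exponential-averaging formula, ΔG = −kT ln⟨exp(−ΔU/kT)⟩, where ΔU is the energy difference between the target and reference states evaluated on configurations sampled from the reference state. The sketch below is a bare-bones illustration under that assumption; the function name and the input format are hypothetical, and production calculations split the transformation into many intermediate windows.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def fep_free_energy(delta_u, temperature=298.15):
    """Zwanzig estimate of a free energy difference (kcal/mol).

    delta_u: energy differences U_target - U_reference (kcal/mol), one per
    configuration sampled from the reference ensemble.
    """
    kt = K_B * temperature
    scaled = [-du / kt for du in delta_u]
    # Log-sum-exp form avoids overflow when some exponents are large.
    largest = max(scaled)
    log_mean = largest + math.log(
        sum(math.exp(s - largest) for s in scaled) / len(scaled)
    )
    return -kt * log_mean
```

Because the exponential average is dominated by the lowest-energy samples, the estimate never exceeds the plain mean of ΔU and converges slowly when the two states overlap poorly, which is one reason these methods suit closely related ligands.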
(c) MM-PB/SA
End-point free energy methods such as MM-PB/SA
40
benefit
from computational efficiency relative to rigorous free energy
methods such as FEP and TI as only the initial and final states
of the system are evaluated.
8
MM-PB/SA and similar methods
instead employ the free-energy cycle shown in Fig. 2.
This approximation allows the absolute binding free energy
of ligand L to the receptor to be estimated by decomposing the
binding energy into a gas-phase free energy and a solvation
free energy of transferring the free ligand, free receptor and
ligand–receptor complex from the gas phase to aqueous solution
(eqn (3)), where the free energies for each species are
evaluated individually (eqn (4)):
$$
\Delta G_{\mathrm{bind}} = \Delta G^{R*L}_{\mathrm{water}} - \Delta G^{R}_{\mathrm{water}} - \Delta G^{L}_{\mathrm{water}}
\tag{3}
$$
$$
G_{\mathrm{water}} = G_{\mathrm{gas}} + G_{\mathrm{solv}} = (H_{\mathrm{gas}} - TS) + G_{\mathrm{solv}}
\tag{4}
$$
The key improvement in efficiency comes from treating $G_{\mathrm{solv}}$
using an implicit solvent model,
41
generally either PB/SA or
GB/SA, which treat the solvation free energy as decomposable
into electrostatic and nonpolar components:
$$
G_{\mathrm{solv}} = G_{\mathrm{elec}} + G_{\mathrm{nonpolar}}
\tag{5}
$$
The enthalpic components are treated using a force field
approximation:
$$
H_{\mathrm{gas}} \approx E_{\mathrm{gas}} = E_{\mathrm{bond}} + E_{\mathrm{angle}} + E_{\mathrm{torsion}} + E_{\mathrm{elec}} + E_{\mathrm{vdW}}
\tag{6}
$$
The MM-PB/SA method, as originally formulated, averages
the gas phase enthalpy and solvation free energy over multiple
configurations sampled from molecular dynamics (MD) simu-
lations with explicit solvent. In some cases, estimates of the
entropy losses upon binding are also included. The MM-PB/
SA method has been widely applied to predict ligand–receptor
binding geometries and to calculate absolute or relative binding
affinities in good agreement with experimental data (mean
absolute error of ~1 to 2 kcal mol⁻¹ in many cases).
30,42–47
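The end-point bookkeeping of eqns (3)–(6) can be sketched as follows: each snapshot contributes a gas-phase force field energy, an implicit-solvent solvation free energy and, optionally, an entropy term, and the binding estimate is the difference of the averages for complex, receptor and ligand. The dictionary keys and function names are hypothetical, and the per-snapshot energies are assumed to come from an external MD and implicit solvent calculation.

```python
from statistics import mean


def snapshot_free_energy(e_gas, g_solv, t_s=0.0):
    """G_water for one snapshot: E_gas + G_solv - TS (all in kcal/mol)."""
    return e_gas + g_solv - t_s


def endpoint_binding_energy(complex_frames, receptor_frames, ligand_frames):
    """Average each species over its snapshots, then take complex - receptor - ligand.

    Each frame is a dict with keys 'e_gas' and 'g_solv', plus an optional
    't_s' entropy term (hypothetical input format for this sketch).
    """
    def average(frames):
        return mean(snapshot_free_energy(**frame) for frame in frames)

    return average(complex_frames) - average(receptor_frames) - average(ligand_frames)
```

A usage example with made-up numbers: two complex snapshots averaging −160 kcal/mol, a receptor at −130 and a ligand at −20 give an estimate of −10 kcal/mol.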
(d) Linear interaction energy (LIE)
LIE shares some similarities with the MM-PB/SA method, in
that it also uses averages calculated from explicit solvent
simulations, and also considers only the bound and unbound
“end points”. Åqvist and coworkers
7,48
implemented this
semi-empirical method to estimate ligand binding affinities
based on the following linear approximation:
$$
\Delta G_{\mathrm{bind}} = \alpha \Delta \langle E_{\mathrm{elec}} \rangle + \beta \Delta \langle E_{\mathrm{vdW}} \rangle
\tag{7}
$$
where $\langle E_{\mathrm{elec}} \rangle$ and $\langle E_{\mathrm{vdW}} \rangle$ are the ensemble averages of the
electrostatic and van der Waals interaction energies between
the ligand and its environment over an MD trajectory, respectively,
and Δ refers to the difference between these ensemble
averages in water and in the receptor binding site. The scaling
factors α and β are determined empirically.
49
Other researchers
have added other terms, such as solvent accessible surface area
(SASA, γ).
50
This method was initially developed to predict
the binding affinities of a set of endothiapepsin inhibitors, and
was found to give accurate results for both absolute and
relative binding free energies for a large number of protein–ligand
systems (mean absolute error of ~1.0 kcal mol⁻¹).
7,48,51
Fig. 1 Free-energy cycle that can be used to calculate relative binding
free energies. R is the free receptor in solution, L is the free ligand in
solution, R*L is the protein–ligand complex in solution, $\Delta\Delta G^{L}_{\mathrm{mut}}$ is the
free energy change to alchemically change ligand $L_1$ into $L_2$ in
solution, and $\Delta\Delta G^{R*L}_{\mathrm{mut}}$ is the free energy difference of transforming
$R*L_1$ to $R*L_2$ in solution.
Fig. 2 Free-energy cycle used to calculate the binding free energy of
ligand L to the receptor R in solution. $R_{\mathrm{gas}}$ is the free receptor in
vacuum, $R_{\mathrm{water}}$ is the free receptor in solution, $L_{\mathrm{gas}}$ is the free ligand in
vacuum, $L_{\mathrm{water}}$ is the free ligand in solution, $\Delta G^{R}_{\mathrm{solv}}$ is the free energy to
solvate R, and $\Delta G^{R*L}_{\mathrm{solv}}$ is the free energy to solvate complex R*L.
Recently, the LIE approach has been applied to study DNA
structural stability
52
and protein–protein interactions.
53
Also,
a method similar to LIE was applied to reproduce the relative
binding energy of HIV-1 RT inhibitors by scaling different
energetic components (vdW, electrostatic, solvation and non-
polar solvation).
54
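A minimal sketch of the LIE estimate in eqn (7) is given below. It assumes Δ is taken as the bound-state average minus the free-state (ligand in water) average, and that per-frame ligand–environment interaction energies have already been extracted from the two simulations. The function name is hypothetical and no particular values of α and β are implied; the ones in the usage note are placeholders.

```python
from statistics import mean


def lie_binding_energy(elec_bound, elec_free, vdw_bound, vdw_free, alpha, beta):
    """Linear interaction energy estimate: alpha*D<E_elec> + beta*D<E_vdW>.

    *_bound: per-frame ligand-environment energies from the complex simulation.
    *_free:  per-frame ligand-environment energies from the ligand-in-water
             simulation. The bound-minus-free convention is an assumption.
    """
    delta_elec = mean(elec_bound) - mean(elec_free)
    delta_vdw = mean(vdw_bound) - mean(vdw_free)
    return alpha * delta_elec + beta * delta_vdw
```

For example, with placeholder coefficients α = 0.5 and β = 0.16, average electrostatic energies of −51 (bound) and −46 (free) kcal/mol and average vdW energies of −31 and −16 kcal/mol give 0.5 × (−5) + 0.16 × (−15) = −4.9 kcal/mol.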
(e) Molecular mechanics scoring in high-throughput docking
All of the MM-based free energy calculation methods dis-
cussed thus far are computationally expensive and generally
limited to evaluating dozens or hundreds of compounds. These
methods can also be complicated to apply, requiring expert
training. A few studies have been published on applying
molecular mechanics based scoring functions to refine and
rescore ligands in a high-throughput virtual screening con-
text.
55,56
We have developed a physics-based rescoring method
that can be applied to hundreds of thousands of compounds,
i.e., as is typical in lead discovery applications, using molecular
mechanics energy functions similar to those employed in the
more computationally intensive methods discussed above.
57,58
Theoretically, our rescoring protocol (shown in Fig. 3) is
similar to MM-PB/SA but applies energy minimization rather
than molecular dynamics. This further approximation greatly
increases computational efficiency, but in principle could be a
significant limitation compared with the ensemble averaging
over MD simulations performed in MM-PB/SA and LIE.
However, recently, Kuhn and coworkers performed an exten-
sive study suggesting that applying the MM-PB/SA energy
function to a single, relaxed complex structure is an adequate
and sometimes more accurate approach than the standard
averaging over molecular dynamics ensembles.
44
In Table 1, we compare our rescoring method with some of
the other methods discussed above. We view our rescoring
method as intermediate between high-throughput docking
methods and more rigorous molecular mechanics-based meth-
ods, in terms of both the number of approximations made and
computational expense. It is orders of magnitude slower than
most simple docking scoring functions, but orders of magni-
tude faster than more rigorous free energy estimates. Ulti-
mately, we can envisage following up the physics-based
rescoring with even more computationally intensive (but pre-
sumably more accurate) methods for a subset of ligands
selected by our rescoring approach.
Fig. 3 Our physics-based refinement and rescoring protocol (Reproduced with permission from J. Chem. Inf. Model., 2006, 46, 243–253.
Copyright 2006 American Chemical Society).
58
The superscript R refers to the free receptor in solution, L to the ligand in solution, and R*L to the
protein–ligand complex in solution. $E_{\mathrm{bind}}$ is the predicted ligand binding energy, the free receptor energy in solution ($E^{R}$) is a constant value, $E^{L}$ is
the energy of the optimized free ligand in solution, and $E^{R*L}$ is the energy of the optimized ligand–protein complex in solution.
Table 1 Comparison of our physics-based rescoring method with the MM-PB/SA and LIE methods
| | PLOP/Rescoring | MM-PB/SA | LIE |
|---|---|---|---|
| Force field | All-atom | All-atom | All-atom |
| Non-bonded interaction energy | Single minimized structure | Averaged ensemble | Linear interaction energy model |
| Ligand strain | Partial treatment, with flexible ligand minimization | Treated via MD | Treated via MD during sampling, N/A in scoring |
| Receptor strain | Rigid or can be partially included by minimization and sidechain search | Treated via MD | Treated via MD during sampling, N/A in scoring |
| Ligand desolvation | Implicit solvent | Explicit solvent during sampling, implicit solvent in scoring | Explicit solvent during sampling, N/A in scoring |
| Receptor desolvation | Implicit solvent | Explicit solvent during sampling, implicit solvent in scoring | Explicit solvent during sampling, N/A in scoring |
| Entropy | N/A | Optionally approximated via normal mode analysis | N/A in scoring |
| Computational timing | One minute per ligand | Hours or days per ligand | Hours or days per ligand |
3. Methods
Our molecular mechanics scoring method consists of two
steps: predicting the binding poses of ligands using a docking
program, and then rescoring those protein–ligand complexes
using a more computationally intensive molecular-mechanics
based energy function. The rescoring procedure uses the OPLS
all-atom force field and a generalized Born implicit solvent
model, and accounts for ligand/receptor desolvation, and to a
lesser extent, ligand strain energies, in a more physically
realistic manner than the docking algorithm.
(a) High throughput virtual screening
In principle, any docking method can be used to predict the
conformations of ligands bound to a protein. In our prior work,
we have used two different docking programs for this purpose:
Glide
13,59
and DOCK 3.5.54.
60–62
The new work reported here
uses the latter program, and we briefly review the protocol we
employed.
An automated docking approach was used to facilitate the
docking calculations with minimal user intervention. Most of
the labor-intensive, manual steps are now performed in an
automated fashion, including binding site preparation, sphere
generation, scoring grids computation, docking calculation
and data analysis (N Huang, B Shoichet & J Irwin, in
preparation). Briefly, each protein was prepared for docking
in the same manner. Matching spheres, required for initial
placement of the ligand during database screening, were
obtained from the position of the crystallographic ligand using
the program SPHGEN.
63
Four different types of grids were
generated before the docking calculations, including an ex-
cluded volume grid obtained from DISTMAP,
60
a united
AMBER-based van der Waals potential grid computed by
CHEMGRID,
60
an electrostatic potential grid calculated
using DelPhi
21
and a ligand desolvation grid computed using
SOLVMAP (B Shoichet, unpublished results). The program
DOCK 3.5.54 was used to dock compounds into the protein
binding site. Ensembles of pre-calculated conformers from
conformationally expanded databases are used to significantly
speed up docking calculations.
61,64
On average, sampling
millions of poses for a single ligand takes only one second. For
each ligand orientation, the conformational ensemble is first
filtered for steric complementarity. Ligand conformations
are scored based on the docking total energy
($E_{\mathrm{tot}} = E_{\mathrm{ele}} + E_{\mathrm{vdW}} - \Delta G_{\mathrm{lig\text{-}solv}}$), which is the sum of electrostatic ($E_{\mathrm{ele}}$) and
van der Waals ($E_{\mathrm{vdW}}$) interaction energies corrected by the
ligand partial desolvation energy ($\Delta G_{\mathrm{lig\text{-}solv}}$).
62
Final energies
were computed after rigid-body minimization. Then, a single
docking pose with the best total energy score was saved for
each docked molecule.
(b) Molecular mechanics rescoring
The rescoring procedure for a single protein–ligand complex is
shown in Fig. 3, and this procedure has been fully automated.
The first step is to generate OPLS force field parameters for
each ligand using IMPACT,
65
after which the coordinate and
parameter files are passed to Protein Local Optimization
Program (PLOP),
66–68
the Jacobson group’s in-house software
(free for academic use; commercial distribution under the
name Prime). The protein–ligand complex and the free ligand
were then submitted to energy minimization in GB solvent.
The binding energy ($E_{\mathrm{bind}} = E^{R*L} - E^{L} - E^{R}$) was calculated
by subtracting the energies of the optimized free ligand in
solution ($E^{L}$) and the free protein in solution ($E^{R}$) from the
optimized ligand–protein complex’s energy in solution ($E^{R*L}$),
as described previously (eqn (3)). In our previous work, the
protein was kept rigid during minimization of the ligand–
protein complex to reduce computational expense.
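The bookkeeping of the rescoring step can be sketched in a few lines: because the receptor is held rigid, its energy is a single constant, and each ligand contributes one minimized complex energy and one minimized free-ligand energy. The function names and the input format are hypothetical, and the energies are assumed to have been produced by a separate minimization program.

```python
def binding_energy(e_complex, e_ligand, e_receptor):
    """E_bind = E(R*L) - E(L) - E(R), all from minimized structures in implicit solvent."""
    return e_complex - e_ligand - e_receptor


def rank_by_binding_energy(ligand_energies, e_receptor):
    """Rank ligands from most to least favorable predicted binding energy.

    ligand_energies: dict mapping a ligand name to (e_complex, e_ligand).
    e_receptor: constant energy of the rigid free receptor.
    """
    scored = [
        (name, binding_energy(e_complex, e_ligand, e_receptor))
        for name, (e_complex, e_ligand) in ligand_energies.items()
    ]
    return sorted(scored, key=lambda item: item[1])
```

Since the receptor energy is the same for every ligand, it shifts all scores equally and does not affect the ranking; it matters only if the absolute values are to be interpreted.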
All energy minimizations were performed using PLOP
66–68
with the all-atom OPLS force field (OPLS-AA)
69,70
and the
Surface Generalized Born (SGB) implicit solvent model.
23,71
PLOP implements a multi-scale truncated-Newton (MSTN)
minimization algorithm. The algorithm is adapted from
TNPACK
72
and optimized by applying multiscale methods,
analogous to those used in molecular dynamics (e.g.,
r-RESPA).
73
The molecular mechanics forces are divided
into short-(bond, angle, torsion, and local non-bonded) and
long-range components, with the long-range forces updated
only intermittently (never during the inner TN cycles, and
infrequently during the outer cycles). The speedup of MSTN
relative to the unmodified TNPACK algorithm is a factor of
4.0–4.5 with the parameters used here. The algorithm is also
optimized for minimizations with Generalized Born implicit
solvent, using a self-consistent procedure that increases the
computational expense, relative to vacuum, by only a
factor of ~3. Cutoffs for the non-bonded interactions are
residue-based, and depend on the type of side chain (charged
or neutral).
(c) Enrichment calculations
We have evaluated our method primarily by the rate of
“enrichment”, measured as the proportion of true binders found in
selected subsets from docking (or rescoring) calculations compared
with the proportion expected from random selection.
The enrichment factor (EF) is calculated as
$$
EF_{\mathrm{subset}} = \frac{\mathrm{Binders}_{\mathrm{subset}} / N_{\mathrm{subset}}}{\mathrm{Binders}_{\mathrm{total}} / N_{\mathrm{total}}}
$$
62
For instance, for a
given protein system with 100 known binders ($\mathrm{Binders}_{\mathrm{total}}$) in
a database of 100 000 compounds ($N_{\mathrm{total}}$), only one of the
known binders ($\mathrm{Binders}_{\mathrm{subset}}$) would be expected to be found
in any randomly chosen subset of 1000 molecules ($N_{\mathrm{subset}}$).
This corresponds to an enrichment factor of 1. If ten known
binders (10% of known binders) were actually found in the top
1000 molecules of the ranked database (1% of database) by
docking, then the enrichment factor at that point (1% of
database) would be equal to 10, which is the number of known
binders actually found (10 known binders) divided by the
number of known binders expected from random selection
(1 known binder).
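The enrichment factor defined above is straightforward to compute from a ranked list. The sketch below is one possible implementation; the function name, the use of compound identifiers and the rounding of the subset size are assumptions made for illustration.

```python
def enrichment_factor(ranked_ids, known_binders, fraction):
    """EF = (Binders_subset / N_subset) / (Binders_total / N_total).

    ranked_ids: compound identifiers ordered from best to worst score.
    known_binders: identifiers of the known binders.
    fraction: top fraction of the ranked database to inspect (e.g. 0.01).
    """
    n_total = len(ranked_ids)
    n_subset = max(1, round(fraction * n_total))
    binders = set(known_binders)
    binders_total = sum(1 for cid in ranked_ids if cid in binders)
    if binders_total == 0:
        raise ValueError("no known binders present in the ranked list")
    binders_subset = sum(1 for cid in ranked_ids[:n_subset] if cid in binders)
    return (binders_subset / n_subset) / (binders_total / n_total)
```

Applied to the worked example in the text (100 known binders among 100 000 compounds, ten of them ranked within the top 1000), the function gives an enrichment factor of 10 at 1% of the database.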
4. Our studies performed to date
(a) Enzyme specificity studies
The first published application of our physics-based scoring of
ligand–protein complexes was in the context of a virtual
metabolite screening method designed to help assign enzy-
matic function for alpha–beta barrel proteins.
57
Estimating
relative binding affinities in this case is particularly challenging
because the active sites of these enzymes contain numerous
charged groups (lysines, carboxylates, histidines, and one or
more metal ions) as shown in Fig. 4 (left). Using the physics-
based rescoring procedure, the ranks of known substrates are
generally improved significantly compared with the docking
alone (Table 2). In addition to the substrate and product being
ranked highly, the other top-ranked ligands are strongly
enriched in compounds with high chemical similarity to the
substrate (e.g., different substitution patterns on a similar
scaffold), as shown in Fig. 4 (right). Importantly, the rescoring
procedure appears to be capable of capturing selectivity (Table
2). That is, all of the enzymes have strikingly similar folds and
belong to the same superfamily, but recognize dramatically
different substrates and even perform different reactions. The
molecular mechanics-based rescoring procedure clearly cap-
tures the selectivity of the enzymes for their cognate substrates,
and vice versa.
(b) Enrichment studies on nine therapeutically important
enzymes
We next investigated the ability of our physics-based rescoring
method to enrich known inhibitors of a diverse set of ther-
apeutically relevant targets. We evaluated the strengths and
limitations of our rescoring procedure by the extent to which
known inhibitors were enriched against a background of
100 000 drug-like decoys on 9 enzyme systems.
58
Encoura-
gingly, for all 9 cases, the maximum enrichment factor in-
creased upon rescoring, by up to a factor of 6 (Table 3). The
improvement in enrichment is most robust and sometimes
dramatic within the top 1% of the ranked database, i.e., the first
thousand compounds. In 4 of the 9 test cases, the rescoring
method robustly improves enrichment, relative to docking
alone, well beyond the top 1% of the ranked database. In
the other test cases, however, the results of the docking and
rescoring methods are roughly comparable beyond the top
1%. The improved early enrichment is likely due to the more
realistic treatment of ligand and, especially, receptor desolva-
tion in the rescoring procedure; the fully flexible minimization
of the ligands in the receptor during the rescoring stage may
also contribute to the improved enrichment. To our knowl-
edge, this work represents the most extensive test to date of the
utility of an all-atom force field/implicit solvent model scoring
function in the context of high-throughput virtual screening.
Not surprisingly, incorrect protonation states on the ligands,
receptor, or co-factors significantly affect the electrostatic
potential, which in turn strongly affects the rescoring
calculations. It appears that the simpler scoring function
employed in the docking method is less sensitive to such
errors, while the more physically reasonable molecular
mechanics energy employed in the rescoring requires accurate
treatment of protonation and charge states to correctly account
for the electrostatic properties of ligand–receptor binding
complexes.
Fig. 4 Left: binding site of mandelate racemase (PDB ID 1MDR). The docked pose of the substrate, S-mandelate, and the co-crystallized
structure of an inhibitor, S-atrolactic acid, are shown in CPK models. Right: chemical similarity to the substrate as a function of % of ranked
database, as measured by a property-based Tanimoto coefficient, which decreases from 1 as the chemical similarity decreases. Chemical similarity is defined by
descriptors that include both the numbers of common functional groups and whole molecule descriptors such as dipole and volume. Enrichment of
compounds that are chemically similar to the known substrate after docking (dark/blue line) and rescoring (light/red line) is shown. The results
have been smoothed using a moving average to decrease noise and emphasize the overall trends. Reproduced with
permission from Biochemistry, 2005, 44, 2059–2071 (ref. 57). Copyright 2005 American Chemical Society.
Table 2 Selectivity among enolase superfamily members (from Kalyanaraman et al., 2005; ref. 57). Ranks of substrates are given in % of the ranked database.

           Before rescoring                        After rescoring
Protein    MR     GlucD   MAL    OSBS   Enolase    MR     GlucD   MAL    OSBS   Enolase
MR         6.5    22.6    24.7   10.7   6.5        0.4    4.5     9.3    18.8   21.9
GlucD      >25    0.03    3.9    >25    0.93       >25    0.02    5.0    >25    8.6
MAL        4.9    0.2     9.1    5.1    2.7        6.2    11.0    1.1    23.8   22.6
OSBS       12.9   11.5    25.7   6.1    11.7       2.1    6.5     7.0    0.2    12.3
Enolase    3.7    5.7     1.9    >25    0.04       21.2   1.4     1.1    >25    0.2

Each row represents the ranks of the 5 substrates when docked against one of the 5 enzymes. The diagonal elements represent the ranks of the
cognate substrates, which improve significantly upon rescoring. If the computation is capturing selectivity, these values should be lower than those
of the off-diagonal elements. The substrates are as follows: MR = S-mandelate, GlucD = D-glucarate, MAL = L-threo-(2S,3S)-3-methylaspartate,
OSBS = 2-succinyl-2-hydroxy-2,4-cyclohexadiene-1-carboxylate, enolase = 2-phospho-glycerate.
This journal is © the Owner Societies 2006. Phys. Chem. Chem. Phys., 2006, 8, 5166–5177
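The selectivity criterion stated for Table 2 (the cognate substrate should rank ahead of the four non-cognate substrates for each enzyme) can be checked directly from the tabulated ranks. The sketch below is illustrative only: the helper name is hypothetical, and entries reported only as exceeding 25% are stored as 25.0, which is an assumption made so that they can be compared numerically.

```python
ENZYMES = ["MR", "GlucD", "MAL", "OSBS", "Enolase"]

# Rows: enzyme docked against; columns: substrates in the same order.
# Values are ranks in % of the ranked database, copied from Table 2.
# Entries reported only as exceeding 25% are stored as 25.0 (an assumption).
BEFORE = [
    [6.5, 22.6, 24.7, 10.7, 6.5],
    [25.0, 0.03, 3.9, 25.0, 0.93],
    [4.9, 0.2, 9.1, 5.1, 2.7],
    [12.9, 11.5, 25.7, 6.1, 11.7],
    [3.7, 5.7, 1.9, 25.0, 0.04],
]
AFTER = [
    [0.4, 4.5, 9.3, 18.8, 21.9],
    [25.0, 0.02, 5.0, 25.0, 8.6],
    [6.2, 11.0, 1.1, 23.8, 22.6],
    [2.1, 6.5, 7.0, 0.2, 12.3],
    [21.2, 1.4, 1.1, 25.0, 0.2],
]


def selective_enzymes(ranks):
    """Enzymes whose cognate substrate ranks strictly ahead of all other substrates."""
    selected = []
    for i, row in enumerate(ranks):
        off_diagonal = [value for j, value in enumerate(row) if j != i]
        if row[i] < min(off_diagonal):
            selected.append(ENZYMES[i])
    return selected


selective_before = selective_enzymes(BEFORE)
selective_after = selective_enzymes(AFTER)
```

Under these assumptions, the strict criterion is met for three of the five enzymes before rescoring and for all five after rescoring.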
(c) Combining physics-based receptor-preparation and
rescoring methods in screening against E. coli dihydrofolate
reductase (DHFR)
As a final example, we describe our participation in the
McMaster Data-Mining and Docking Competition (ref. 74). The
goal of this contest was to predict which compounds, out of a
database of 50 000, would bind to E. coli DHFR. The
molecular mechanics rescoring method played a key role in
this work, which also contained a new twist. Instead of dock-
ing to the crystal structure, we deliberately modified the
receptor to incorporate knowledge of ‘‘induced-fit’’ effects
associated with varying DHFR inhibitors’ scaffolds (as de-
rived from existing DHFR crystal structures). Specifically, we
developed and applied a receptor preparation procedure in
which torsion angles of loops and side chains are deliberately
sampled to open a key portion of the binding site. This
procedure used the same MM-GB/SA energy function as the
rescoring procedure, highlighting a major advantage of phy-
sics-based scoring functions: ligands and the protein receptor
can be treated consistently using the same scoring function,
making it possible to predict conformational changes asso-
ciated with ligand binding. Fig. 5 summarizes the results.
5. New results and discussion
In principle, the improved enrichment shown by our rescoring
method, relative to high-throughput docking programs, re-
flects improved estimation of relative binding affinities, at least
for a subset of the known inhibitors. However, the free energy
depends on a balance of many different intrinsic and environ-
mental contributions as discussed previously. We have under-
taken an extensive analysis of the energetic components that
contribute to the discrimination between true actives and
decoys, and developed new improvements based on this
analysis. Here, we present the most recent results for two very
different protein binding sites, both of which are important
drug targets. Thrombin has a large, solvent exposed polar
binding surface (Fig. 6a) while estrogen receptor (ER) has a
deeply buried and mostly hydrophobic binding pocket (Fig.
6b). Thrombin was included as a test case in our previously
published work while estrogen receptor is a new test case. We
included estrogen receptor in this study because, unlike most
other test cases we have examined since our original published
studies, the results we obtained using the molecular mechanics
rescoring were initially worse than those obtained using the
DOCK 3.5.54 scoring function. Since the molecular mechanics
scoring in principle captures important physical effects that the
docking scoring function does not, such as receptor desolva-
tion and ligand strain, we set out to understand why it
performed more poorly. This investigation has led to significant
improvements in the rescoring method. Generally, the
observations on these two protein systems are applicable to
other proteins with similar binding properties that we studied
(data not shown).
Table 3 Measures of enrichment of the known inhibitors for nine enzyme systems achieved by docking alone (D) and the rescoring procedure (R)
(from Huang et al., 2006; ref. 58). Column groups: (A) % of ranked database to find 25% of known inhibitors; (B) maximum enrichment
factor achieved; (C) % of ranked database where maximum enrichment factor occurred.

                      Number of           (A)           (B)           (C)
Enzyme     PDB code   known inhibitors    D      R      D      R      D      R
DHFR       3dfr       100                 0.3    0.1    110    239    0.1    0.1
GART       1c2t       50                  0.9    0.8    46     159    0.3    0.1
AR         1ah3       722                 3.5    4.0    8      12     2.0    0.1
PARP       1efy       45                  4.6    2.3    6      11     5.2    3.8
PNP        1b8o       25                  1.2    0.1    60     358    0.2    0.1
SAHH       1a7a       37                  2.1    1.8    14     19     1.3    2.0
Thrombin   1ba8       243                 4.2    0.8    25     49     0.1    0.1
AChE       1e66       554                 5.0    5.1    21     25     0.4    0.1
TS         2bbq       171                 1.5    0.5    25     52     0.3    0.1

In this work, both the known inhibitors and the drug-like decoys were taken from the MDL Drug Data Report (MDDR). Abbreviations: AR,
aldose reductase; DHFR, dihydrofolate reductase; GART, glycinamide ribonucleotide transformylase; PARP, poly(ADP-ribose) polymerase;
PNP, purine nucleoside phosphorylase; SAHH, S-adenosylhomocysteine hydrolase; AChE, acetylcholinesterase; TS, thymidylate synthase.
Fig. 5 The percent of known inhibitors identified (y axis) in increasingly
large subsets of the ranked database (x axis) for E. coli DHFR
(reproduced with permission from J. Biomol. Screen, 2005, 10, 675–681
(ref. 74), copyright 2005 by SAGE Publications Inc.). The grey line
represents the results expected from random selection of ligands.
The dotted (blue) line and solid (blue) line represent the docking
enrichment using the original holo crystal structure and the remodeled
structure, respectively. The solid (orange) line with circles represents
the rescoring of inhibitor poses from docking against the remodeled
structure.
(a) Dielectric constant and dielectric boundary
One limitation of implicit solvent models is that the calculated
binding interaction energies can change dramatically as a
result of changes in the dielectric constant used for the protein
and ligand interiors ($D_{\mathrm{in}}$) and the definition of the dielectric
boundary. It has been suggested that the dielectric constant
inside a macromolecule is not a universal constant but simply
a parameter that depends on the model used (ref. 76). From a
modeling standpoint, a dielectric constant of 1 is correct only
if electronic polarizability and conformational dynamics are
explicitly included in the calculation. Electronic polarizability
alone can result in an effective dielectric constant of roughly
2 to 4 (ref. 77). Poisson–Boltzmann calculations (e.g., for predicting
pKa values of protein side chains) frequently treat the internal
dielectric as an empirically adjustable parameter, and routinely
use an internal dielectric between 4 and 20 to obtain the
most reliable results.
Nonetheless, we used $D_{\mathrm{in}}$ = 1 in all of the prior work
reported in section 4, which used a rigid receptor and fixed
atomic partial charges (no electronic polarizability), because
the GB solvation model as well as the atomic force field
charges were optimized using this dielectric constant. We have
empirically tested different values of $D_{\mathrm{in}}$, without otherwise
adjusting the solvent model or force field, in the context of
enriching known inhibitors in high-throughput docking. The
results suggest that using $D_{\mathrm{in}}$ = 2 can lead to more robust
enrichment. For thrombin, increasing the value of $D_{\mathrm{in}}$ from 1
to 2 does not change the overall enrichment significantly (Fig.
7a). On the other hand, for ER, the larger dielectric constant
impacts the results more significantly and positively (Fig. 7b).
This difference between the two binding sites is most likely due
to the extent of solvent exposure: in the polar, solvent-exposed
thrombin binding site, the electrostatic interactions are
screened primarily by the high dielectric of water, whereas in
the buried ER binding site, water plays little role in electro-
static screening and the internal dielectric is critical.
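The argument about solvent exposure can be illustrated with a toy calculation on a single pair of opposite unit charges. The sketch below assumes the widely used Still-type generalized Born pair expression, which is not necessarily the GB variant used in our rescoring. The Born radii (1.5 Å for an exposed pair, 20 Å for a deeply buried pair), the 4 Å separation and all function names are hypothetical, and self-energy terms are omitted.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2)


def f_gb(r, born_i, born_j):
    """Still-type effective distance used by generalized Born models."""
    product = born_i * born_j
    return math.sqrt(r * r + product * math.exp(-r * r / (4.0 * product)))


def pair_interaction(q_i, q_j, r, born_i, born_j, d_in, d_out=80.0):
    """Coulomb term plus GB cross term for one pair of charges (kcal/mol)."""
    coulomb = COULOMB_CONSTANT * q_i * q_j / (d_in * r)
    # Cross term of the GB polarization energy; self terms are omitted here.
    screening = (-COULOMB_CONSTANT * (1.0 / d_in - 1.0 / d_out)
                 * q_i * q_j / f_gb(r, born_i, born_j))
    return coulomb + screening


def change_on_doubling_d_in(born_radius, r=4.0):
    """Change in a +1/-1 pair interaction when the interior dielectric goes from 1 to 2."""
    e_d1 = pair_interaction(1.0, -1.0, r, born_radius, born_radius, d_in=1.0)
    e_d2 = pair_interaction(1.0, -1.0, r, born_radius, born_radius, d_in=2.0)
    return e_d2 - e_d1


exposed_shift = change_on_doubling_d_in(born_radius=1.5)   # small Born radii: solvent exposed
buried_shift = change_on_doubling_d_in(born_radius=20.0)   # large Born radii: deeply buried
```

In this toy model the exposed pair is already screened almost entirely by the solvent term, so its interaction energy changes by well under 1 kcal/mol, whereas the buried pair changes by tens of kcal/mol. The numbers are qualitative only.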
(b) Ligand reorganization energy
Generally, non-bonded intermolecular interactions are considered
to dominate the ligand–receptor binding process.
However, if the bound conformation of the ligand is different
from the conformation of the free ligand in solution, the
intramolecular energy of the ligand can contribute to the
binding free energy. In our rescoring protocol, ligand reorga-
nization energies have been approximated using a molecular
mechanics force field and the GB/SA model: the ligands were
minimized both alone and bound to the protein, and the ligand
intramolecular energies in these two states were compared. We
have assumed that the bound conformation of the ligand is
unique and that structural fluctuations of the bound ligand
contribute negligibly to the free energy of binding. This
assumption may be reasonable, at least for tight binding
ligands. However, a more dramatic assumption implicit in
the published rescoring procedure is that the free ligand can be
approximated by a single low-energy conformation.
For most flexible ligands, this assumption is likely to be
poor. Nonetheless, obtaining a true ensemble for ligands in
solution would be computationally expensive, and we assumed
that even a crude treatment of ligand strain would be better
than none at all. In subsequent work we have reexamined this
assumption. As an alternative to the published scheme, we
simply extracted the ligand conformation from the minimized
complex structure and evaluated its energy using this fixed
geometry, and this value ($E_{L}$) was used for binding energy
calculation ($E_{\mathrm{bind}} = E_{R \cdot L} - E_{L} - E_{R}$). Surprisingly, this
modification to the method resulted in non-trivial improvements,
especially with respect to the early enrichment for
protein systems binding highly strained ligands. For both
thrombin and ER, the maximum enrichment factors are nearly
doubled using the new procedure relative to the published
procedure (Fig. 8). In both cases, the overall enrichment is
consistently improved throughout the top 20% of the database.
Fig. 6 Binding surfaces of two representative protein systems, thrombin and estrogen receptor. The crystallographic ligand is represented by a
CPK model coloured by atom type. The key hydrogen bond interactions between the protein and ligands are illustrated with dashed (yellow) lines.
The molecular images were generated with UCSF Chimera (ref. 75).
This improved enrichment has been observed in many other
systems as well (data not shown), where higher energy ligand
structures are required to maximize the ligand–receptor inter-
actions. Interestingly, similar approximations have been made
in MM-PB/SA calculations (refs. 44–46) and LIE methods (refs. 78, 79), where
MD simulations were only performed on the ligand–receptor
complex instead of three independent simulations of free
ligand, free receptor and the ligand–receptor complex. Aqvist
proposed that the intramolecular ligand strain energy correlates
linearly with the intermolecular electrostatic interaction,
making it possible to ignore the strain energy in the LIE
model (ref. 79). Our own view is that simply minimizing the ligand
in solvent may increase errors due to neglecting the entropy of
the ligand in solution and also due to inaccuracies in the ligand
torsional parameters. Further work will be necessary to determine
whether ensemble averaging of the ligand in solution,
as well as improvements in ligand torsional parameters (refs. 69, 70, 80),
will provide more accurate treatment of ligand strain.
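The difference between the published scheme and the fixed-geometry alternative discussed in this section amounts to which ligand reference energy is subtracted from the complex energy. The sketch below is a minimal illustration: the class, the function names and the energies are hypothetical and are not output from the rescoring program.

```python
from dataclasses import dataclass


@dataclass
class RescoringEnergies:
    """Minimized energies in kcal/mol (hypothetical illustrative values)."""
    complex_energy: float         # minimized ligand-receptor complex
    receptor_energy: float        # free receptor
    ligand_bound_geometry: float  # ligand evaluated at its geometry from the complex
    ligand_free_minimized: float  # ligand minimized alone in implicit solvent


def binding_energy_published(e: RescoringEnergies) -> float:
    """Published scheme: the free ligand is a single minimized conformation."""
    return e.complex_energy - e.ligand_free_minimized - e.receptor_energy


def binding_energy_fixed_geometry(e: RescoringEnergies) -> float:
    """Alternative scheme: the ligand energy is taken at the bound geometry."""
    return e.complex_energy - e.ligand_bound_geometry - e.receptor_energy


def ligand_strain(e: RescoringEnergies) -> float:
    """Strain estimate implied by the published scheme."""
    return e.ligand_bound_geometry - e.ligand_free_minimized


example = RescoringEnergies(
    complex_energy=-5230.0,
    receptor_energy=-5100.0,
    ligand_bound_geometry=-80.0,
    ligand_free_minimized=-92.0,
)
published = binding_energy_published(example)
fixed = binding_energy_fixed_geometry(example)
strain = ligand_strain(example)
```

By construction the two scores differ by exactly the strain estimate (published = fixed + strain), so the fixed-geometry scheme is equivalent to dropping the single-conformation strain term from the score.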
Fig. 7 Enrichment plots obtained after docking alone (solid dark/blue line), after rescoring using interior dielectric constant of 1 (solid light/
orange line) and rescoring using interior dielectric constant of 2 (dotted light/green line). Left: the percent of known ligands identified in
increasingly large subsets of the ranked database. The diagonal (grey) line represents the results expected from random selection of ligands. Right:
enrichment factor as a function of the fraction of the ranked database.
Fig. 8 Enrichment plots for two representative protein systems obtained after docking alone (solid dark/blue line), after rescoring with free ligand
minimization (solid light/orange line) and rescoring without free ligand minimization (dotted light/orange line).
(c) Additive effects
Combining the improvements discussed in the previous two
sections, both the early and overall enrichments are significantly
improved for ER (Fig. 9b), with the maximum enrichment
factor nearly doubled and the overall enrichment improved
by ~20–30%.
(d) Scaling the energy components
Accurate free energy calculations depend on a proper balance
of many different energetic components. As we have empha-
sized, our rescoring method strikes a balance between compu-
tational speed and accuracy, and in particular neglects
entropic losses and protein flexibility (in the results discussed
here). Empirically scaling certain energy components as a
post-rescoring process, in a manner similar to the LIE scheme,
may be useful to compensate for some of these limitations (ref. 54).
Indeed, a simple scaling scheme seems to consistently improve
the enrichment for all the targets we have studied.
In general, scaling up the electrostatic interaction energy,
relative to the other components of the scoring function, seems
to improve results for binding sites containing charged side
chains. For example, scaling up the electrostatic interaction
energy by a factor of 2 significantly improves the enrichment
in thrombin (dark/purple solid line, Fig. 10a). If we follow
Aqvist’s argument (ref. 79) that intermolecular electrostatic interactions
correlate with intramolecular ligand strain energies, we
can assume that scaling up the electrostatic energies may
compensate for the lack of explicit treatment of intramolecular
ligand strain energies. Indeed, the effect of free
ligand minimization is diminished after we scale up the
electrostatic interaction energies in a charged binding site like
thrombin (dark/purple dotted line, Fig. 10a).
Fig. 9 Enrichment plots obtained after docking alone (solid dark/blue line), after rescoring (solid light/orange line), and after rescoring using an interior
dielectric constant of 2 without free ligand minimization (dotted light/green line).
Fig. 10 Enrichment plots obtained after rescoring (solid light/orange line), rescoring with scaling of particular energetic components (solid dark/
purple line) and rescoring without the free ligand minimization procedure (dotted line). See text for details.
By contrast, for hydrophobic sites like ER, scaling up the
vdW term improves results. The maximum enrichment factor
for ER is nearly doubled by scaling up the vdW interaction
energy by a factor of 2 (dark/purple solid line, Fig. 10b). We
speculate that the MM-GB/SA scoring function underesti-
mates the non-polar binding contributions to the free energy
of binding, and that increasing the vdW term compensates for
this deficiency. Note that we did not attempt to scale all of the
energetic components to maximize the enrichment perfor-
mance for specific targets. We suspect that it will not be
possible to find a universal set of scaling factors that perform
excellently across many targets.
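The post-rescoring scaling described in this section can be sketched as a weighted sum of energy components followed by re-ranking. The component names, weights and energies below are hypothetical and only illustrate how doubling one term can reorder a ranked list. They are not fitted parameters or results from this work.

```python
from typing import Dict, List


def scaled_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of energy components; unlisted components keep a weight of 1."""
    return sum(weights.get(name, 1.0) * value for name, value in components.items())


def rank_ligands(ligands: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Ligand names sorted from most to least favorable scaled score."""
    return sorted(ligands, key=lambda name: scaled_score(ligands[name], weights))


# Hypothetical per-ligand energy components in kcal/mol.
ligands = {
    "lig_A": {"electrostatic": -10.0, "vdw": -30.0, "solvation": 15.0},
    "lig_B": {"electrostatic": -25.0, "vdw": -15.0, "solvation": 18.0},
    "lig_C": {"electrostatic": -5.0, "vdw": -20.0, "solvation": 5.0},
}

unscaled_ranking = rank_ligands(ligands, {})
electrostatics_doubled = rank_ligands(ligands, {"electrostatic": 2.0})
vdw_doubled = rank_ligands(ligands, {"vdw": 2.0})
```

In this toy set, doubling the electrostatic weight promotes the ligand with the strongest polar interactions, while doubling the vdW weight favors the ligands with the largest non-polar contact terms, mirroring the different behavior of charged and hydrophobic binding sites.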
6. Conclusions
Ligand binding affinity prediction is one of the most important
applications of computational chemistry in the field of struc-
ture-based drug design. However, accurately scoring/ranking
database compounds with respect to their estimated binding
affinities to a biomolecular target remains highly challenging.
Molecular mechanics based energy functions have been used
in combination with MD simulations to predict absolute and
relative ligand binding free energies, using methods such as
FEP, TI and MM-PB/SA. However, such MD-based free
energy methods are computationally expensive and can be
complicated to apply. We have developed a physics-based
rescoring method that can be applied to hundreds of thou-
sands of compounds by invoking a number of simplifying
approximations and by developing new computational meth-
ods, especially the multi-scale truncated Newton minimization
algorithm.
In our previous studies and the new results presented here,
we have demonstrated that our rescoring method is a promis-
ing approach for improving the discrimination between known
ligands and decoys in virtual screening of large compound
databases. As we have emphasized, our rescoring method
strikes a balance between computational speed and accuracy,
and ultimately, we can envisage following up the physics-based
rescoring with even more computationally intensive (but pre-
sumably more accurate) methods for a subset of ligands. For
example, free energy methods like FEP can capture protein
and ligand entropy losses due to binding, which are ignored in
our scoring method. In addition, the new generation of
polarizable force fields could in principle be used to treat
electronic polarizability effects that are neglected by the fixed
charge force fields we have used.
From a more pragmatic standpoint, we believe that the two
most significant limitations of the rescoring method in its
current form are related to incorrect poses generated by the
docking algorithm and the rigid receptor approximation ap-
plied in this work. A simple extension of the current method is
to subject a small number of dissimilar docking poses to
rescoring, minimizing the receptor along with the ligand
during the rescoring stage, and use the most favorable binding
energy for rank-ordering ligands.
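The multi-pose extension proposed here reduces, at the ranking stage, to keeping the most favorable rescored energy per ligand. A minimal sketch follows, with hypothetical ligand names and energies; the rescoring of each pose is assumed to have been done already.

```python
from typing import Dict, List, Tuple


def rank_by_best_pose(pose_energies: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank ligands by the most favorable (lowest) binding energy over their poses."""
    best = {
        ligand: min(energies)
        for ligand, energies in pose_energies.items()
        if energies  # skip ligands for which no pose was rescored
    }
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical rescored binding energies (kcal/mol) for a few dissimilar poses per ligand.
pose_energies = {
    "lig_A": [-30.0, -42.0, -35.0],
    "lig_B": [-40.0, -38.0],
    "lig_C": [-20.0],
}
ranking = rank_by_best_pose(pose_energies)
```

With a single pose per ligand this reduces to the current procedure, so the extension changes the ranking only for ligands whose top docking pose is not the most favorable after rescoring.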
Acknowledgements
We thank Brian Shoichet, John Irwin, Alan Graves, Johannes
Hermann, and the rest of the Shoichet group for many helpful
conversations that were critical in guiding this work and for
technical assistance. NIH grants GM071790, AI035707, and
GM56531 are acknowledged for financial support. QB3 at
UCSF is thanked for computational support, MDL Inc. for
providing the MDDR database and ISIS software (to Prof.
Brian Shoichet, UCSF), the Shoichet lab for making computers
available for this work, and Schrödinger Inc. for use of
IMPACT. M.P.J. is a member of the Scientific Advisory
Board of Schrödinger Inc.
References
1 Computational Biochemistry and Biophysics, ed. O. M. Becker, A.
D. MacKerell, Jr, B. Roux and M. Watanabe, Marcel Dekker,
Inc., New York, 2001.
2 D. L. Beveridge and F. M. DiCapua, Annu. Rev. Biophys. Biophys.
Chem., 1989, 18, 431–492.
3 P. Kollman, Chem. Rev., 1993, 93, 2395–2417.
4 Ajay and M. A. Murcko, J. Med. Chem., 1995, 38, 4953–4967.
5 A. Ajay, M. A. Murcko and P. F. W. Stouten, in Practical
application of computer-aided drug design, ed. P. S. Charifson,
Marcel Dekker, Inc., New York, 1997, pp. 165–194.
6 T. Lazaridis, Curr. Org. Chem., 2002, 6, 1319–1332.
7 B. O. Brandsdal, F. Osterberg, M. Almlof, I. Feierberg,
V. B. Luzhkov and J. Aqvist, Adv. Protein Chem., 2003, 66,
123–158.
8 J. M. Swanson, R. H. Henchman and J. A. McCammon, Biophys.
J., 2004, 86, 67–74.
9 B. K. Shoichet, S. L. McGovern, B. Wei and J. J. Irwin, Curr.
Opin. Chem. Biol., 2002, 6, 439–446.
10 W. P. Walters, M. T. Stahl and M. A. Murcko, Drug Discovery
Today, 1998, 3, 160–178.
11 B. K. Shoichet, Nature, 2004, 432, 862–865.
12 M. Rarey, B. Kramer, T. Lengauer and G. Klebe, J. Mol. Biol.,
1996, 261, 470–489.
13 R. A. Friesner, J. L. Banks, R. B. Murphy, T. A. Halgren, J. J.
Klicic, D. T. Mainz, M. P. Repasky, E. H. Knoll, M. Shelley, J. K.
Perry, D. E. Shaw, P. Francis and P. S. Shenkin, J. Med. Chem.,
2004, 47, 1739–1749.
14 I. Muegge and Y. C. Martin, J. Med. Chem., 1999, 42, 791–804.
15 R. DeWitte and E. Shakhnovich, J. Am. Chem. Soc., 1996, 118,
11733–11744.
16 A. E. Cho, V. Guallar, B. J. Berne and R. Friesner, J. Comput.
Chem., 2005, 26, 915–931.
17 K. Raha and K. M. Merz, Jr, J. Med. Chem., 2005, 48,
4558–4575.
18 B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S.
Swaminathan and M. Karplus, J. Comput. Chem., 1983, 4, 187–217.
19 A. D. MacKerell, Jr, in Computational Biochemistry and Biophy-
sics, ed. O. M. Becker, A. D. MacKerell, Jr, B. Roux and M.
Watanabe, Marcel Dekker, Inc., New York, 2001.
20 W. L. Jorgensen and J. Tirado-Rives, Proc. Natl. Acad. Sci. U. S.
A., 2005, 102, 6665–6670.
21 A. Nicholls and B. Honig, J. Comput. Chem., 1991, 12, 435–445.
22 W. C. Still, A. Tempczyk, R. C. Hawley and T. Hendrickson,
J. Am. Chem. Soc., 1990, 112, 6127–6129.
23 A. Ghosh, C. S. Rapp and R. A. Friesner, J. Phys. Chem. B, 1998,
102, 10983–10990.
24 V. Tsui and D. A. Case, Biopolymers, 2000, 56, 275–291.
25 M. Karplus and G. A. Petsko, Nature, 1990, 347, 631–639.
26 D. A. McQuarrie, Statistical Mechanics, Harper & Row, New
York, 1976.
27 I. D. Kuntz, E. C. Meng and B. K. Shoichet, Acc. Chem. Res.,
1994, 27, 117–123.
28 D. A. Pearlman and P. S. Charifson, J. Med. Chem., 2001, 44,
3417–3423.
29 D. A. Pearlman and P. S. Charifson, J. Med. Chem., 2001, 44,
502–511.
30 J. Wang, P. Morin, W. Wang and P. A. Kollman, J. Am. Chem.
Soc., 2001, 123, 5221–5230.
31 B. G. Rao, E. E. Kim and M. A. Murcko, J. Comput. Aided Mol.
Des., 1996, 10, 23–30.
32 M. R. Reddy and M. D. Erion, J. Am. Chem. Soc., 2001, 123,
6246–6252.
33 C. Oostenbrink and W. F. van Gunsteren, Proteins: Structure, Function, and
Bioinformatics, 2004, 54, 237–246.
34 C. J. Cramer, Essentials of Computational Chemistry Theories and
Models, John Wiley & Sons Ltd, Chichester, UK, 2002.
35 U. C. Singh, F. K. Brown, P. K. Bash and P. A. Kollman, J. Am.
Chem. Soc., 1987, 109, 1607.
36 D. M. Ferguson, D. A. Pearlman, W. C. Swope and P. A. Koll-
man, J. Comput. Chem., 1992, 13, 362–370.
37 J. A. McCammon, Curr. Opin. Struct. Biol., 1991, 1, 196–200.
38 D. J. Price and W. L. Jorgensen, J. Comput. Aided Mol. Des., 2001,
15, 681–695.
39 R. C. Rizzo, J. Tirado-Rives and W. L. Jorgensen, J. Med. Chem.,
2001, 44, 145–154.
40 P. A. Kollman, I. Massova, C. Reyes, B. Kuhn, S. Huo, L. Chong,
M. Lee, T. Lee, Y. Duan, W. Wang, O. Donini, P. Cieplak, J.
Srinivasan, D. A. Case and T. E. Cheatham, III, Acc. Chem. Res.,
2000, 33, 889–897.
41 R. C. Rizzo, T. Aynechi, D. A. Case and I. D. Kuntz, J. Chem.
Theory Comput., 2006, 2, 128–139.
42 S. Huo, J. Wang, P. Cieplak, P. A. Kollman and I. D. Kuntz,
J. Med. Chem., 2002, 45, 1412–1419.
43 T. Steinbrecher, D. A. Case and A. Labahn, J. Med. Chem., 2006,
49, 1837–1844.
44 B. Kuhn, P. Gerber, T. Schulz-Gasch and M. Stahl, J. Med. Chem.,
2005, 48, 4040–4048.
45 P. Bonnet and R. A. Bryce, J. Mol. Graphics Modell., 2005, 24,
147–156.
46 B. Kuhn and P. A. Kollman, J. Med. Chem., 2000, 43, 3786–3791.
47 J. Wang, X. Kang, I. D. Kuntz and P. A. Kollman, J. Med. Chem.,
2005, 48, 2432–2444.
48 J. Aqvist, C. Medina and J. E. Samuelsson, Protein Eng., 1994, 7,
385–391.
49 W. Wang, J. Wang and P. A. Kollman, Proteins, 1999, 34,
395–402.
50 R. H. Smith, Jr, W. L. Jorgensen, J. Tirado-Rives, M. L. Lamb, P.
A. Janssen, C. J. Michejda and M. B. Kroeger Smith, J. Med.
Chem., 1998, 41, 5272–5286.
51 R. Zhou, R. A. Friesner, A. Ghosh, R. C. Rizzo, W. L. Jorgensen
and R. M. Levy, J. Phys. Chem. B, 2001, 105, 10388–10397.
52 U. Bren, V. Martinek and J. Florian, J. Phys. Chem. B, 2006, 110,
10557–10566.
53 M. Almlof, J. Aqvist, A. O. Smalas and B. O. Brandsdal, Biophys.
J., 2006, 90, 433–442.
54 Z. Zhou and J. D. Madura, Proteins: Structure, Function, and
Bioinformatics, 2004, 57, 493–503.
55 E. Perola, W. P. Walters and P. S. Charifson, Proteins, 2004, 56,
235–249.
56 W. B. Floriano, N. Vaidehi, G. Zamanakos and W. A. Goddard,
III, J. Med. Chem., 2004, 47, 56–71.
57 C. Kalyanaraman, K. Bernacki and M. P. Jacobson, Biochemistry,
2005, 44, 2059–2071.
58 N. Huang, C. Kalyanaraman, J. J. Irwin and M. P. Jacobson, J.
Chem. Inf. Model, 2006, 46, 243–253.
59 T. A. Halgren, R. B. Murphy, R. A. Friesner, H. S. Beard, L. L.
Frye, W. T. Pollard and J. L. Banks, J. Med. Chem., 2004, 47,
1750–1759.
60 E. C. Meng, B. K. Shoichet and I. D. Kuntz, J. Comput. Chem.,
1992, 13, 505–524.
61 D. M. Lorber and B. K. Shoichet, Protein Sci., 1998, 7, 938–950.
62 B. Q. Wei, W. A. Baase, L. H. Weaver, B. W. Matthews and B. K.
Shoichet, J. Mol. Biol., 2002, 322, 339–355.
63 I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge and T. E.
Ferrin, J. Mol. Biol., 1982, 161, 269–288.
64 MDDR, MDL Inc., San Leandro, CA.
65 IMPACT, 2003, Schrödinger Inc., New York.
66 M. P. Jacobson, G. A. Kaminski, R. A. Friesner and C. A. Rapp,
J. Phys. Chem. B, 2002, 106, 11673–11680.
67 M. P. Jacobson, D. L. Pincus, C. S. Rapp, T. J. Day, B. Honig, D.
E. Shaw and R. A. Friesner, Proteins, 2004, 55, 351–367.
68 X. Li, M. P. Jacobson and R. A. Friesner, Proteins, 2004, 55,
368–382.
69 W. L. Jorgensen, D. S. Maxwell and J. Tirado-Rives, J. Am. Chem.
Soc., 1996, 118, 11225–11236.
70 G. A. Kaminski, R. A. Friesner, J. Tirado-Rives and W. L.
Jorgensen, J. Phys. Chem. B, 2001, 105, 6474–6487.
71 E. Gallicchio, L. Y. Zhang and R. M. Levy, J. Comput. Chem.,
2002, 23, 517–529.
72 D. X. Xie and T. Schlick, SIAM J. Optimization, 1999, 10,
132–154.
73 M. Tuckerman, B. J. Berne and G. J. Martyna, J. Chem. Phys.,
1992, 97, 1990–2001.
74 K. Bernacki, C. Kalyanaraman and M. P. Jacobson, J. Biomol.
Screen, 2005, 10, 675–681.
75 E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M.
Greenblatt, E. C. Meng and T. E. Ferrin, J. Comput. Chem., 2004,
25, 1605–1612.
76 C. N. Schutz and A. Warshel, Proteins, 2001, 44, 400–417.
77 J. J. Havranek and P. B. Harbury, Proc. Natl. Acad. Sci. U. S. A.,
1999, 96, 11145–11150.
78 T. Hansson, J. Marelius and J. Aqvist, J. Comput. Aided Mol. Des.,
1998, 12, 27–35.
79 J. Aqvist and J. Marelius, Comb. Chem. High Throughput Screen,
2001, 4, 613–626.
80 J. L. Banks, H. S. Beard, Y. Cao, A. E. Cho, W. Damm, R. Farid,
A. K. Felts, T. A. Halgren, D. T. Mainz, J. R. Maple, R. Murphy,
D. M. Philipp, M. P. Repasky, L. Y. Zhang, B. J. Berne, R. A.
Friesner, E. Gallicchio and R. M. Levy, J. Comput. Chem., 2005,
26, 1752–1780.