Molecular mechanics methods for predicting protein-ligand binding

December 2006
Physical Chemistry Chemical Physics 8(44):5166-77

DOI:10.1039/b608269f

Authors:

Niu Huang

National Institute of Biological Sciences, China

Chakrapani Kalyanaraman

UCSF University of California, San Francisco

Matthew P Jacobson

UCSF University of California, San Francisco

Ligand binding affinity prediction is one of the most important applications of computational chemistry. However, accurately ranking compounds with respect to their estimated binding affinities to a biomolecular target remains highly challenging. We provide an overview of recent work using molecular mechanics energy functions to address this challenge. We briefly review methods that use molecular dynamics and Monte Carlo simulations to predict absolute and relative ligand binding free energies, as well as our own work in which we have developed a physics-based scoring method that can be applied to hundreds of thousands of compounds by invoking a number of simplifying approximations. In our previous studies, we have demonstrated that our scoring method is a promising approach for improving the discrimination between ligands that are known to bind and those that are presumed not to, in virtual screening of large compound databases. In new results presented here, we explore several improvements to our computational method including modifying the dielectric constant used for the protein and ligand interiors, and empirically scaling energy terms to compensate for deficiencies in the energy model. Future directions for further improving our physics-based scoring method are also discussed.

Selectivity among enolase superfamily members (from Kalyanaraman et al., 2005) 57

Our physics-based refinement and rescoring protocol (reproduced with permission from J. Chem. Inf. Model., 2006, 46, 243–253; copyright 2006 American Chemical Society; ref. 58). The superscript R refers to the free receptor in solution, L to the ligand in solution, and R*L to the protein–ligand complex in solution. $E_{bind}$ is the predicted ligand binding energy, the free receptor energy in solution ($E^{R}$) is a constant value, $E^{L}$ is the energy of the optimized free ligand in solution, and $E^{R*L}$ is the energy of the optimized ligand–protein complex in solution.

Left: binding site of mandelate racemase (PDB ID 1MDR). The docked pose of the substrate, S-mandelate, and the co-crystallized structure of an inhibitor, S-atrolactic acid, are shown as CPK models. Right: chemical similarity to the substrate as a function of the percentage of the ranked database, as measured by a property-based Tanimoto coefficient, which decreases from 1 as the chemical similarity decreases. Chemical similarity is defined by descriptors that include both the numbers of common functional groups and whole-molecule descriptors such as dipole and volume. Enrichment of compounds that are chemically similar to the known substrate is shown after docking (dark/blue line) and after rescoring (light/red line). The results have been smoothed with a moving average to decrease noise and emphasize the overall trends. Reproduced with permission from Biochemistry, 2005, 44, 2059–2071 (ref. 57). Copyright 2005 American Chemical Society.

Molecular mechanics methods for predicting protein–ligand binding†

Niu Huang, Chakrapani Kalyanaraman, Katarzyna Bernacki and Matthew P. Jacobson*

Received 12th June 2006, Accepted 8th August 2006

First published as an Advance Article on the web 1st September 2006

DOI: 10.1039/b608269f

1. Introduction

The application of computational chemistry techniques to the study of the structural and functional properties of biological macromolecules has increased dramatically owing to rapid advances in computer power, improvements in force fields, and the development of numerical algorithms. However, free energy calculation remains challenging in both theory and practice. Accurately calculating the binding affinities of small-molecule ligands to biomolecular targets is one of the ultimate goals of structure-based drug design (SBDD). A variety of computational models and tools have been developed and tested for their ability to reproduce experimental binding data for different target systems. Their theoretical complexity and accuracy vary greatly, ranging from simple statistical multivariate equations to computationally intensive free energy perturbation methods. Several reviews of theories and applications are available in the literature (refs. 2–8).

Two important applications of computational SBDD methods are lead discovery and lead optimization. Lead discovery is the process of identifying new compounds that bind with reasonable affinity (typically low micromolar or better) to a particular macromolecular target and inhibit its function. Virtual screening methods can be used to identify the compounds, among large and diverse databases, that are most likely to bind to a receptor; these can then be prioritized for experimental testing (refs. 9–11). Lead optimization is the process of chemically modifying a lead compound to improve its properties, usually including its binding to the receptor. To be useful in this context, computational methods must be capable of predicting with some precision the relative binding affinities of similar compounds. The molecular mechanics methods that we and others have developed for scoring protein–ligand complexes have utility for both lead discovery and lead optimization. With respect to our own methods, we focus here on lead discovery applications, reviewing our previously published work and introducing new improvements to the computational methods.

Numerous structure-based virtual screening methods (also referred to as small-molecule docking) have been developed to assist lead discovery. These methods orient and score small molecules for shape and chemical complementarity to a macromolecular binding site. Critical issues include the methods for exploring the conformational space of the flexible ligands and receptor (sampling) and the estimation of relative binding affinities for the ligand–receptor complexes (scoring). In high-throughput docking, a scoring function has to be simple to compute in order to dock a large chemical library (~10^…–10^… compounds). Currently, the most commonly used scoring functions in high-throughput docking can be classified as empirical (e.g., FlexX, Glide; refs. 12, 13) or knowledge-based (e.g., PMF, SMoG; refs. 14, 15). Empirical scoring functions contain adjustable parameters that are determined by fitting to many crystal complexes with available experimental binding affinities. Knowledge-based functions are derived from statistical analysis of the interaction distances between different atom types in crystal structures of protein–ligand complexes.

Department of Pharmaceutical Chemistry, University of California San Francisco, UCSF MC 2240, Genentech Hall, Room N472C, 600 16th St., San Francisco, CA 94158-2517, USA. E-mail: matt.jacobson@ucsf.edu; Fax: +1-415-514-4260; Tel: +1-415-514-9811

† The HTML version of this article has been enhanced with additional colour images.

5166 | Phys. Chem. Chem. Phys., 2006, 8, 5166–5177 This journal is © the Owner Societies 2006

INVITED ARTICLE www.rsc.org/pccp | Physical Chemistry Chemical Physics

An alternative approach is to attempt to score protein–ligand complexes (i.e., estimate their absolute or relative binding free energies) using physics-based energy functions. This approach is challenging because the physics of ligand binding is complicated. Factors such as electronic polarizability, entropic losses in the ligand and receptor, and solvation effects can all affect absolute and relative binding affinities. Physics-based scoring methods have two important strengths: (1) they do not require parameterization using ligand binding affinity data and crystal structural information, and hence are not subject to concerns about over-fitting, and (2) it is possible to pursue improvements to the scoring function systematically by using more sophisticated energy models and sampling schemes.

The rest of this manuscript is organized as follows. In section 2, we concisely review important uses of molecular mechanics methods in structure-based drug discovery. Section 3 describes our computational method for applying molecular mechanics scoring in high-throughput docking applications. Section 4 reviews our published data testing this approach, and section 5 reports further optimization of the method, examining in some detail the contributions from intrinsic and environmental effects during ligand binding. We conclude by discussing future directions for molecular mechanics methods in structure-based drug design.

2. Brief review of molecular mechanics methods

In principle, the most rigorous methods for approximating the ligand–receptor binding enthalpy are based on quantum mechanics (QM). Recently, a quantum mechanical/molecular mechanical (QM/MM) algorithm was developed to improve ligand binding pose prediction by replacing the force field charges of ligands with charges calculated by QM/MM in the protein environment. Moreover, a semi-empirical QM-based scoring function was shown to capture binding affinity trends across a diverse range of protein–ligand complexes and to discriminate between native and decoy poses. At present, however, such calculations are too expensive for high-throughput applications, and it is difficult to incorporate solvent effects.

In contrast to QM methods, molecular mechanics methods are based on classical mechanics, allowing computational simulations to be performed on large biomolecular systems containing more than 100 000 atoms. Molecular mechanics is therefore currently the most feasible means of modeling the interactions between ligands and receptors in a physically realistic manner, especially in large-scale applications (i.e., high-throughput docking). Free energy calculations derived from molecular mechanics (MM) simulations of protein–ligand complexes can account for the flexibility of both the protein and the ligand as well as for solvation effects, and both accuracy and efficiency can be achieved within certain approximations. We focus here on reviewing MM-based binding affinity prediction methods.

(a) Molecular mechanics force fields

Most commonly used force fields for biomolecular applications have the form given in eqn (1) (refs. 18, 19), in which the bonded interactions comprise the bond, angle and dihedral terms ($b$, $\theta$ and $\phi$, respectively). The non-bonded interactions comprise the van der Waals (vdW) term, represented by the Lennard-Jones (LJ) 6-12 potential, and the electrostatic term, treated as Coulombic interactions between point charges centered on each atom.
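As a rough illustration of the non-bonded terms just described, the following minimal sketch evaluates a Lennard-Jones 6-12 term and a Coulomb term for every atom pair. It is not the implementation of any particular force field: the dictionary keys, the combining rules for the pair parameters, and the rounded Coulomb conversion factor are assumptions made for this example, and bonded exclusions and cutoffs are ignored.

```python
import math

# Coulomb conversion factor in kcal Angstrom mol^-1 e^-2 (rounded; an assumption of this sketch).
COULOMB_K = 332.06


def lj_energy(r, eps, r_min):
    """Lennard-Jones 6-12 energy with well depth eps located at separation r_min."""
    ratio6 = (r_min / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb_energy(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy between two point charges separated by distance r."""
    return COULOMB_K * q_i * q_j / (dielectric * r)


def nonbonded_energy(atoms, dielectric=1.0):
    """Sum LJ and Coulomb terms over all atom pairs.

    Each atom is a dict with the hypothetical keys xyz, charge, eps and r_min.
    No bonded exclusions or distance cutoffs are applied.
    """
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            a, b = atoms[i], atoms[j]
            r = math.dist(a["xyz"], b["xyz"])
            eps_ij = math.sqrt(a["eps"] * b["eps"])     # assumed geometric-mean combining rule
            r_min_ij = 0.5 * (a["r_min"] + b["r_min"])  # assumed arithmetic-mean combining rule
            total += lj_energy(r, eps_ij, r_min_ij)
            total += coulomb_energy(r, a["charge"], b["charge"], dielectric)
    return total
```

The `dielectric` argument plays the role of the dielectric constant in the Coulomb term; production force fields additionally exclude or scale interactions between atoms separated by only a few bonds.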

Water can be treated in the same manner as the macromolecule, i.e., using point charges on each atom, and many explicit solvent models have been developed and used in estimating absolute or relative binding affinities for protein–ligand complexes. However, the extensive conformational sampling required to converge simulations using explicit water can lead to high computational expense, and many researchers have pursued the development of more approximate but more efficient solvent models. Commonly used implicit solvent models represent water as a continuum dielectric medium, thereby greatly reducing the computational expense of calculating the solvent–solvent and solute–solvent interactions. Such solvent models are typically parameterized to estimate solvation free energies and thus implicitly account for solvent entropy. The most widely used implicit solvent models are the Poisson–Boltzmann/surface area (PB/SA) model and the generalized Born/surface area (GB/SA) model (refs. 22–24). PB methods solve the Poisson–Boltzmann equation numerically. GB models can be considered a semi-analytical approximation to the solution of the PB equation.

Molecular dynamics (MD) is the simulation of the time-dependent motion of molecules according to Newton's laws of motion. Based on the principles of statistical mechanics, the macroscopic properties of a molecular system can be calculated from the microscopic configurations recorded in a sufficiently long MD trajectory. Force field-based MD simulations have therefore been widely used in free-energy calculations for sampling conformational space and determining equilibrium averages, including in methods such as free-energy perturbation (FEP), thermodynamic integration (TI) and molecular mechanics Poisson–Boltzmann/surface area (MM-PB/SA), which have been used to predict ligand binding free energies in good agreement with experimental data (refs. 27–33).

(b) Key thermodynamics principles

Generally, predicting the relative rather than the absolute binding free energies of a series of ligands to the same molecular target is adequate for most practical applications. The binding free energy difference ($\Delta\Delta G_{bind}$) between ligand $L_1$ and ligand $L_2$ binding to receptor R can be evaluated by employing the free-energy cycle shown in Fig. 1.

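$$E = \sum_{\mathrm{bonds}} K_b (b - b_0)^2 + \sum_{\mathrm{angles}} K_\theta (\theta - \theta_0)^2 + \sum_{\mathrm{torsions}} K_\phi \left[ 1 + \cos(n\phi - \delta) \right] + \sum_{\mathrm{vdW}} \varepsilon_{ij} \left[ \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{6} \right] + \sum_{\mathrm{electrostatics}} \frac{q_i q_j}{4\pi D r_{ij}} \qquad (1)$$

Although directly calculating the experimentally measurable binding free energies of the two different ligands ($\Delta G_{bind}(L_1)$ and $\Delta G_{bind}(L_2)$) is very difficult, the free energy changes for alchemically transforming one ligand into another, similar ligand in the free state ($\Delta\Delta G_{mut}^{L}$) and in the bound state ($\Delta\Delta G_{mut}^{R*L}$) are easier to calculate by theoretical methods, and the relative binding free energy can then be calculated as

$$\Delta\Delta G_{bind} = \Delta G_{bind}(L_2) - \Delta G_{bind}(L_1) = \Delta\Delta G_{mut}^{R*L} - \Delta\Delta G_{mut}^{L} \qquad (2)$$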

The free energies for the alchemical transformations can be computed using rigorous statistical mechanical methods such as FEP and TI in conjunction with MD or Monte Carlo (MC) simulations (refs. 2, 35–39). These and similar methods have been extensively reviewed elsewhere (refs. 2–7). FEP and TI relate the free energy of a system to the ensemble average of an energy function that describes that system. In practical applications, they generally treat solvent molecules and ions explicitly. However, the computational demands of adequate sampling make such methods most amenable to estimating relative binding affinities between similar ligands; predicting relative binding affinities between diverse ligands, or absolute binding affinities, poses more of a challenge. Neither method is applicable to large numbers of compounds. Such methods are currently most useful in lead optimization, when the relative binding affinities of dozens of derivatives of the same chemical scaffold are considered. These methods typically give good predictions of relative binding free energies (a mean absolute error < 1 kcal mol$^{-1}$ is frequently reported; refs. 31–33).
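To make the link between an ensemble average and a free energy concrete, the sketch below uses the exponential-averaging (Zwanzig) relation, a standard single-step FEP estimator, and then closes the thermodynamic cycle by subtracting the free-ligand leg from the bound leg. It is only an illustration: the function names, the rounded value of kT, and the assumption that the energy differences have already been sampled in the reference state are ours, and practical calculations use many intermediate states.

```python
import math

KT = 0.593  # k_B * T in kcal/mol near 298 K (rounded; an assumption of this sketch)


def fep_free_energy(delta_energies, kt=KT):
    """Zwanzig estimate dG = -kT ln < exp(-dE / kT) > over sampled energy differences.

    delta_energies holds E(target state) - E(reference state) for configurations
    sampled in the reference state.
    """
    shift = min(delta_energies)  # factor out the smallest value for numerical stability
    avg = sum(math.exp(-(de - shift) / kt) for de in delta_energies) / len(delta_energies)
    return shift - kt * math.log(avg)


def relative_binding_free_energy(de_bound, de_free, kt=KT):
    """Cycle closure: bound-state mutation free energy minus free-state mutation free energy."""
    return fep_free_energy(de_bound, kt) - fep_free_energy(de_free, kt)
```

Because the average is exponential, the estimate is dominated by low-energy-difference configurations, which is one reason adequate sampling is so demanding.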

End-point free energy methods such as MM-PB/SA benefit from computational efficiency relative to rigorous free energy methods such as FEP and TI, because only the initial and final states of the system are evaluated. MM-PB/SA and similar methods instead employ the free-energy cycle shown in Fig. 2.

This approximation allows the absolute binding free energy of ligand L to the receptor to be estimated by decomposing the binding energy into a gas-phase free energy and the solvation free energies of transferring the free ligand, the free receptor and the ligand–receptor complex from the gas phase to aqueous solution (eqn (3)), where the free energy of each species is evaluated individually (eqn (4)):

$$\Delta G_{bind} = \Delta G_{water}^{R*L} - \Delta G_{water}^{R} - \Delta G_{water}^{L} \qquad (3)$$

$$\Delta G_{water} = G_{gas} + G_{solv} = (H_{gas} - TS) + G_{solv} \qquad (4)$$

The key improvement in efficiency comes from treating $G_{solv}$ with an implicit solvent model, generally either PB/SA or GB/SA, which treats the solvation free energy as decomposable into electrostatic and nonpolar components:

$$G_{solv} = G_{elec} + G_{nonpolar} \qquad (5)$$

The enthalpic components are treated using a force field approximation:

$$H_{gas} \approx E_{gas} = E_{bond} + E_{angle} + E_{torsion} + E_{elec} + E_{vdW} \qquad (6)$$

The MM-PB/SA method, as originally formulated, averages the gas-phase enthalpy and solvation free energy over multiple configurations sampled from molecular dynamics (MD) simulations with explicit solvent. In some cases, estimates of the entropy losses upon binding are also included. The MM-PB/SA method has been widely applied to predict ligand–receptor binding geometries and to calculate absolute or relative binding affinities in good agreement with experimental data (mean absolute error of ~1 to 2 kcal mol$^{-1}$ in many cases; refs. 30, 42–47).
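The end-point bookkeeping described above can be sketched in a few lines. The example assumes that, for each species, the gas-phase force field energy and the two implicit-solvent terms have already been computed for every snapshot by some external program; the dictionary keys and function names are hypothetical, and the optional entropy term is passed in as a precomputed T*S value.

```python
from statistics import mean


def species_free_energy(snapshots, t_times_s=0.0):
    """Average E_gas + G_elec + G_nonpolar over snapshots, minus an optional T*S estimate.

    Each snapshot is a dict with the hypothetical keys e_gas, g_elec and g_nonpolar
    (all in kcal/mol).
    """
    per_snapshot = [s["e_gas"] + s["g_elec"] + s["g_nonpolar"] for s in snapshots]
    return mean(per_snapshot) - t_times_s


def mmpbsa_binding_energy(complex_snaps, receptor_snaps, ligand_snaps):
    """End-point estimate: complex minus free receptor minus free ligand."""
    return (
        species_free_energy(complex_snaps)
        - species_free_energy(receptor_snaps)
        - species_free_energy(ligand_snaps)
    )
```

In practice the three snapshot sets are often extracted from a single trajectory of the complex, which is a further approximation not shown here.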

(d) Linear interaction energy (LIE)

LIE shares some similarities with the MM-PB/SA method, in that it also uses averages calculated from explicit solvent simulations and also considers only the bound and unbound “end points”. Åqvist and coworkers (refs. 7, 48) implemented this semi-empirical method to estimate ligand binding affinities based on the following linear approximation:

$$\Delta G_{bind} = \alpha \Delta\langle E_{elec} \rangle + \beta \Delta\langle E_{vdW} \rangle \qquad (7)$$

where $\langle E_{elec} \rangle$ and $\langle E_{vdW} \rangle$ are the ensemble averages, over an MD trajectory, of the electrostatic and van der Waals interaction energies between the ligand and its environment, and $\Delta$ refers to the difference between these ensemble averages in water and in the receptor binding site. The scaling factors $\alpha$ and $\beta$ are determined empirically. Other researchers have added further terms, such as the solvent accessible surface area (SASA, $\gamma$). The method was initially developed to predict the binding affinities of a set of endothiapepsin inhibitors, and has been found to give accurate absolute and relative binding free energies for a large number of protein–ligand systems (mean absolute error of ~1.0 kcal mol$^{-1}$; refs. 7, 48, 51).
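The linear approximation of eqn (7) amounts to two differences of trajectory averages. The sketch below assumes that per-frame ligand-environment interaction energies have already been extracted from the bound and free simulations; the dictionary keys are hypothetical, and the scaling factors are deliberately left as required arguments because they are fitted empirically.

```python
from statistics import mean


def lie_binding_energy(bound_frames, free_frames, alpha, beta):
    """Linear interaction energy estimate from two sets of trajectory frames.

    Each frame is a dict with the hypothetical keys elec and vdw holding the
    ligand-environment interaction energies (kcal/mol) for that frame.
    """
    d_elec = mean(f["elec"] for f in bound_frames) - mean(f["elec"] for f in free_frames)
    d_vdw = mean(f["vdw"] for f in bound_frames) - mean(f["vdw"] for f in free_frames)
    return alpha * d_elec + beta * d_vdw
```

For example, with made-up frame energies, `lie_binding_energy(bound, free, alpha=0.5, beta=0.2)` simply weights the electrostatic and van der Waals differences by the supplied factors.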

Fig. 1 Free-energy cycle that can be used to calculate relative binding free energies. R is the free receptor in solution, L is the free ligand in solution, R*L is the protein–ligand complex in solution, $\Delta\Delta G_{mut}^{L}$ is the free energy change to alchemically change ligand $L_1$ into $L_2$ in solution, and $\Delta\Delta G_{mut}^{R*L}$ is the free energy difference of transforming R*$L_1$ to R*$L_2$ in solution.

Fig. 2 Free-energy cycle used to calculate the binding free energy of ligand L to the receptor R in solution. $R_{gas}$ is the free receptor in vacuum, $R_{water}$ is the free receptor in solution, $L_{gas}$ is the free ligand in vacuum, $L_{water}$ is the free ligand in solution, $\Delta G_{solv}^{R}$ is the free energy to solvate R, and $\Delta G_{solv}^{R*L}$ is the free energy to solvate the complex R*L.

Recently, the LIE approach has been applied to study DNA structural stability and protein–protein interactions. In addition, a method similar to LIE was used to reproduce the relative binding energies of HIV-1 RT inhibitors by scaling different energetic components (vdW, electrostatic, solvation and nonpolar solvation).

(e) Molecular mechanics scoring in high-throughput docking

All of the MM-based free energy calculation methods discussed thus far are computationally expensive and generally limited to evaluating dozens or hundreds of compounds. These methods can also be complicated to apply, requiring expert training. A few studies have been published on applying molecular mechanics-based scoring functions to refine and rescore ligands in a high-throughput virtual screening context (refs. 55, 56). We have developed a physics-based rescoring method that can be applied to hundreds of thousands of compounds, as is typical in lead discovery applications, using molecular mechanics energy functions similar to those employed in the more computationally intensive methods discussed above (refs. 57, 58). Theoretically, our rescoring protocol (shown in Fig. 3) is similar to MM-PB/SA but applies energy minimization rather than molecular dynamics. This further approximation greatly increases computational efficiency, but in principle could be a significant limitation compared with the ensemble averaging over MD simulations performed in MM-PB/SA and LIE. However, Kuhn and coworkers recently performed an extensive study suggesting that applying the MM-PB/SA energy function to a single, relaxed complex structure is an adequate and sometimes more accurate approach than the standard averaging over molecular dynamics ensembles.

In Table 1, we compare our rescoring method with some of the other methods discussed above. We view our rescoring method as intermediate between high-throughput docking methods and more rigorous molecular mechanics-based methods, in terms of both the number of approximations made and the computational expense. It is orders of magnitude slower than most simple docking scoring functions, but orders of magnitude faster than more rigorous free energy estimates. Ultimately, we can envisage following up the physics-based rescoring with even more computationally intensive (but presumably more accurate) methods for a subset of ligands selected by our rescoring approach.

Fig. 3 Our physics-based refinement and rescoring protocol (reproduced with permission from J. Chem. Inf. Model., 2006, 46, 243–253; copyright 2006 American Chemical Society; ref. 58). The superscript R refers to the free receptor in solution, L to the ligand in solution, and R*L to the protein–ligand complex in solution. $E_{bind}$ is the predicted ligand binding energy, the free receptor energy in solution ($E^{R}$) is a constant value, $E^{L}$ is the energy of the optimized free ligand in solution, and $E^{R*L}$ is the energy of the optimized ligand–protein complex in solution.

Table 1 Comparison of our physics-based rescoring method with the MM-PB/SA and LIE methods

| | PLOP/Rescoring | MM-PB/SA | LIE |
|---|---|---|---|
| Force field | All-atom | All-atom | All-atom |
| Non-bonded interaction energy | Single minimized structure | Averaged ensemble | Linear interaction energy model |
| Ligand strain | Partial treatment, with flexible ligand minimization | Treated via MD | Treated via MD during sampling, N/A in scoring |
| Receptor strain | Rigid, or can be partially included by minimization and sidechain search | Treated via MD | Treated via MD during sampling, N/A in scoring |
| Ligand desolvation | Implicit solvent | Explicit solvent during sampling, implicit solvent in scoring | Explicit solvent during sampling, N/A in scoring |
| Receptor desolvation | Implicit solvent | Explicit solvent during sampling, implicit solvent in scoring | Explicit solvent during sampling, N/A in scoring |
| Entropy | N/A | Optionally approximated via normal mode analysis | N/A in scoring |
| Computational timing | One minute per ligand | Hours or days per ligand | Hours or days per ligand |

3. Methods

Our molecular mechanics scoring method consists of two steps: predicting the binding poses of ligands using a docking program, and then rescoring those protein–ligand complexes using a more computationally intensive molecular mechanics-based energy function. The rescoring procedure uses the OPLS all-atom force field and a generalized Born implicit solvent model, and accounts for ligand and receptor desolvation and, to a lesser extent, ligand strain energies in a more physically realistic manner than the docking algorithm.

(a) High throughput virtual screening

In principle, any docking method can be used to predict the conformations of ligands bound to a protein. In our prior work, we have used two different docking programs for this purpose: Glide (refs. 13, 59) and DOCK 3.5.54 (refs. 60–62). The new work reported here uses the latter program, and we briefly review the protocol we employed.

An automated docking approach was used to carry out the docking calculations with minimal user intervention. Most of the labor-intensive, manual steps are now performed in an automated fashion, including binding site preparation, sphere generation, scoring grid computation, docking calculation and data analysis (N Huang, B Shoichet & J Irwin, in preparation). Briefly, each protein was prepared for docking in the same manner. Matching spheres, required for initial placement of the ligand during database screening, were obtained from the position of the crystallographic ligand using the program SPHGEN. Four different types of grids were generated before the docking calculations: an excluded volume grid obtained from DISTMAP, a united-atom AMBER-based van der Waals potential grid computed by CHEMGRID, an electrostatic potential grid calculated using DelPhi, and a ligand desolvation grid computed using SOLVMAP (B Shoichet, unpublished results). The program DOCK 3.5.54 was used to dock compounds into the protein binding site. Ensembles of pre-calculated conformers from conformationally expanded databases are used to speed up the docking calculations significantly (refs. 61, 64). On average, sampling millions of poses for a single ligand takes only one second. For each ligand orientation, the conformational ensemble is first filtered for steric complementarity. Ligand conformations are then scored by the docking total energy ($E_{tot} = E_{ele} + E_{vdW} - \Delta G_{lig\text{-}solv}$), which is the sum of the electrostatic ($E_{ele}$) and van der Waals ($E_{vdW}$) interaction energies corrected by the ligand partial desolvation energy ($\Delta G_{lig\text{-}solv}$). Final energies were computed after rigid-body minimization, and a single docking pose with the best total energy score was saved for each docked molecule.
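The final selection step, keeping the single best-scoring pose per molecule, can be sketched as follows. This is not DOCK code: the pose dictionaries and key names are hypothetical, and the sign with which the ligand desolvation term enters the total (subtracted here) is an assumption of this example.

```python
def docking_total_energy(e_ele, e_vdw, dg_lig_solv):
    """Electrostatic plus van der Waals energy, corrected for ligand desolvation.

    The desolvation term is subtracted; this sign convention is an assumption.
    """
    return e_ele + e_vdw - dg_lig_solv


def best_pose(poses):
    """Return the pose with the lowest (most favorable) total energy.

    Each pose is a dict with the hypothetical keys e_ele, e_vdw and dg_lig_solv.
    """
    return min(
        poses,
        key=lambda p: docking_total_energy(p["e_ele"], p["e_vdw"], p["dg_lig_solv"]),
    )
```

Applied to the list of scored poses for one molecule, `best_pose` returns the one that would be saved and passed on to rescoring.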

(b) Molecular mechanics rescoring

The rescoring procedure for a single protein–ligand complex is shown in Fig. 3; this procedure has been fully automated. The first step is to generate OPLS force field parameters for each ligand using IMPACT, after which the coordinate and parameter files are passed to the Protein Local Optimization Program (PLOP; refs. 66–68), the Jacobson group's in-house software (free for academic use; distributed commercially under the name Prime). The protein–ligand complex and the free ligand were then submitted to energy minimization in GB solvent. The binding energy ($E_{bind} = E^{R*L} - E^{L} - E^{R}$) was calculated by subtracting the energies of the optimized free ligand in solution ($E^{L}$) and the free protein in solution ($E^{R}$) from the energy of the optimized ligand–protein complex in solution ($E^{R*L}$), as described previously (eqn (3)). In our previous work, the protein was kept rigid during minimization of the ligand–protein complex to reduce computational expense.
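Once the minimized energies are available, the rescoring itself reduces to a subtraction and a sort. The sketch below assumes the complex and free-ligand energies were produced by an external minimization step; the function names and the input layout are hypothetical and do not correspond to the PLOP interface.

```python
def binding_energy(e_complex, e_ligand, e_receptor):
    """Binding energy: minimized complex minus free ligand minus free receptor."""
    return e_complex - e_ligand - e_receptor


def rescore(ligand_energies, e_receptor):
    """Rank ligands from most to least favorable predicted binding energy.

    ligand_energies maps a ligand name to a (complex energy, free ligand energy)
    pair; e_receptor is the single free-receptor energy shared by all ligands.
    """
    scored = [
        (name, binding_energy(e_c, e_l, e_receptor))
        for name, (e_c, e_l) in ligand_energies.items()
    ]
    return sorted(scored, key=lambda item: item[1])
```

Because the free-receptor energy is the same constant for every ligand when the receptor is held rigid, it shifts all scores equally and does not affect the ranking.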

All energy minimizations were performed using PLOP (refs. 66–68) with the all-atom OPLS force field (OPLS-AA; refs. 69, 70) and the Surface Generalized Born (SGB) implicit solvent model (refs. 23, 71). PLOP implements a multi-scale truncated-Newton (MSTN) minimization algorithm. The algorithm is adapted from TNPACK and optimized by applying multiscale methods analogous to those used in molecular dynamics (e.g., r-RESPA). The molecular mechanics forces are divided into short-range (bond, angle, torsion, and local non-bonded) and long-range components, with the long-range forces updated only intermittently (never during the inner TN cycles, and infrequently during the outer cycles). With the parameters used here, MSTN is faster than the unmodified TNPACK algorithm by a factor of 4.0–4.5. The algorithm is also optimized for minimizations with Generalized Born implicit solvent, using a self-consistent procedure that increases the computational expense, relative to vacuum, by only a factor of ~3. Cutoffs for the non-bonded interactions are residue-based and depend on the type of side chain (charged or neutral).

We have evaluated our method primarily by the rate of “enrichment”, measured as the proportion of true binders found in selected subsets from docking (or rescoring) calculations compared with the proportion expected from random selection. The enrichment factor (EF) is calculated as $EF_{subset} = \{Binders_{subset}/N_{subset}\}/\{Binders_{total}/N_{total}\}$. For instance, for a given protein system with 100 known binders ($Binders_{total}$) in a database of 100 000 compounds ($N_{total}$), only one of the known binders ($Binders_{subset}$) would be expected to be found in any randomly chosen subset of 1000 molecules ($N_{subset}$). This corresponds to an enrichment factor of 1. If ten known binders (10% of the known binders) were actually found in the top 1000 molecules of the ranked database (1% of the database) by docking, then the enrichment factor at that point would be 10, which is the number of known binders actually found (10) divided by the number expected from random selection (1).
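The enrichment factor defined above is straightforward to compute from a ranked list. The following sketch is an illustration under our own naming; the helper that takes a ranked list and a fraction, and its rounding of the subset size, are assumptions rather than part of the published protocol.

```python
def enrichment_factor(binders_subset, n_subset, binders_total, n_total):
    """Fraction of binders in the subset divided by the fraction in the whole database."""
    return (binders_subset / n_subset) / (binders_total / n_total)


def enrichment_at_fraction(ranked_ids, known_binders, fraction):
    """Enrichment factor within the top `fraction` of a ranked database.

    ranked_ids is ordered from best to worst score; known_binders is a set of ids.
    """
    n_total = len(ranked_ids)
    n_subset = max(1, round(fraction * n_total))
    found = sum(1 for cid in ranked_ids[:n_subset] if cid in known_binders)
    return enrichment_factor(found, n_subset, len(known_binders), n_total)
```

With the numbers from the worked example in the text, `enrichment_factor(10, 1000, 100, 100000)` evaluates to 10 (up to floating-point rounding), and `enrichment_factor(1, 1000, 100, 100000)` to 1.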

4. Our studies performed to date

(a) Enzyme speciﬁcity studies

The first published application of our physics-based scoring of ligand–protein complexes was in the context of a virtual metabolite screening method designed to help assign enzymatic function for alpha–beta barrel proteins. Estimating relative binding affinities in this case is particularly challenging because the active sites of these enzymes contain numerous charged groups (lysines, carboxylates, histidines, and one or more metal ions), as shown in Fig. 4 (left). With the physics-based rescoring procedure, the ranks of known substrates are generally improved significantly compared with docking alone (Table 2). In addition to the substrate and product being ranked highly, the other top-ranked ligands are strongly enriched in compounds with high chemical similarity to the substrate (e.g., different substitution patterns on a similar scaffold), as shown in Fig. 4 (right). Importantly, the rescoring procedure appears to be capable of capturing selectivity (Table 2). That is, all of the enzymes have strikingly similar folds and belong to the same superfamily, but they recognize dramatically different substrates and even perform different reactions. The molecular mechanics-based rescoring procedure clearly captures the selectivity of the enzymes for their cognate substrates, and vice versa.

(b) Enrichment studies on nine therapeutically important

enzymes

We next investigated the ability of our physics-based rescoring method to enrich known inhibitors of a diverse set of therapeutically relevant targets. We evaluated the strengths and limitations of our rescoring procedure by the extent to which known inhibitors were enriched against a background of 100 000 drug-like decoys on 9 enzyme systems. Encouragingly, for all 9 cases, the maximum enrichment factor increased upon rescoring, by up to a factor of 6 (Table 3). The improvement in enrichment is most robust and sometimes dramatic within the top 1% of the ranked database, i.e., the first thousand compounds. In 4 of the 9 test cases, the rescoring method robustly improves enrichment, relative to docking alone, well beyond the top 1% of the ranked database. In the other test cases, however, the results of the docking and rescoring methods are roughly comparable beyond the top 1%. The improved early enrichment is likely due to the more realistic treatment of ligand and, especially, receptor desolvation in the rescoring procedure; the fully flexible minimization of the ligands in the receptor during the rescoring stage may also contribute to the improved enrichment. To our knowledge, this work represents the most extensive test to date of the utility of an all-atom force field/implicit solvent model scoring function in the context of high-throughput virtual screening.
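The enrichment measures discussed here (the maximum enrichment factor, where in the ranked database it occurs, and how much of the database must be searched to recover 25% of the known inhibitors) can all be derived from a ranked list of binder/decoy labels. The sketch below assumes such a list is available; the function names and the toy data are hypothetical and are not taken from the published protocol.

```python
import math


def max_enrichment(ranked_labels):
    """Return (maximum EF, fraction of the database at which it occurs).

    ranked_labels: booleans ordered from best to worst score,
    True for a known binder and False for a decoy.
    """
    n_total = len(ranked_labels)
    binders_total = sum(ranked_labels)
    if binders_total == 0:
        raise ValueError("the ranked list contains no known binders")
    best_ef, best_fraction, found = 0.0, 0.0, 0
    for n_subset, is_binder in enumerate(ranked_labels, start=1):
        found += is_binder
        ef = (found * n_total) / (n_subset * binders_total)
        if ef > best_ef:  # keep the earliest position on ties
            best_ef, best_fraction = ef, n_subset / n_total
    return best_ef, best_fraction


def fraction_to_recover(ranked_labels, target=0.25):
    """Fraction of the ranked database needed to find `target` of all binders."""
    binders_total = sum(ranked_labels)
    if binders_total == 0:
        raise ValueError("the ranked list contains no known binders")
    needed = math.ceil(target * binders_total)
    found = 0
    for n_subset, is_binder in enumerate(ranked_labels, start=1):
        found += is_binder
        if found >= needed:
            return n_subset / len(ranked_labels)
    return 1.0


# Toy ranked list: 4 binders among 10 compounds (illustrative only).
toy = [True, False, True, False, False, False, True, False, False, True]
print(max_enrichment(toy), fraction_to_recover(toy, 0.25))
```

On real screens the enrichment factor at very small subsets is noisy, so a practical implementation might only evaluate it beyond a minimum subset size; that choice is left open here.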

Not surprisingly, incorrect protonation states on the ligands, receptor, or co-factors significantly affect the electrostatic potential, which in turn strongly affects the rescoring calculations. It appears that the simpler scoring function employed in the docking method is less sensitive to such errors, while the more physically reasonable molecular mechanics energy employed in the rescoring requires accurate treatment of protonation and charge states to correctly account for the electrostatic properties of ligand–receptor binding complexes.

Fig. 4 Left: binding site of mandelate racemase (PDB ID 1MDR). The docked pose of the substrate, S-mandelate, and the co-crystallized structure of an inhibitor, S-atrolactic acid, are shown in CPK models. Right: chemical similarity to the substrate as a function of % of ranked database, as measured by a property-based Tanimoto coefficient, which decreases from 1 as the chemical similarity decreases. Reproduced with … descriptors that include both the numbers of common functional groups and whole molecule descriptors such as dipole and volume. Enrichment of compounds that are chemically similar to the known substrate after docking (dark/blue line) and rescoring (light/red line) are shown. The results have been smoothed using the moving average to decrease noise and emphasize the overall trends.

Table 2 Selectivity among enolase superfamily members (from Kalyanaraman et al., 2005). Ranks of substrates (%) before and after rescoring

| Proteins | Before: MR | Before: GlucD | Before: MAL | Before: OSBS | Before: Enolase | After: MR | After: GlucD | After: MAL | After: OSBS | After: Enolase |
|---|---|---|---|---|---|---|---|---|---|---|
| MR | 6.5 | 22.6 | 24.7 | 10.7 | 6.5 | 0.4 | 4.5 | 9.3 | 18.8 | 21.9 |
| GlucD | >25 | 0.03 | 3.9 | >25 | 0.93 | >25 | 0.02 | 5.0 | >25 | 8.6 |
| MAL | 4.9 | 0.2 | 9.1 | 5.1 | 2.7 | 6.2 | 11.0 | 1.1 | 23.8 | 22.6 |
| OSBS | 12.9 | 11.5 | 25.7 | 6.1 | 11.7 | 2.1 | 6.5 | 7.0 | 0.2 | 12.3 |
| Enolase | 3.7 | 5.7 | 1.9 | >25 | 0.04 | 21.2 | 1.4 | 1.1 | >25 | 0.2 |

Each row represents the ranks of the 5 substrates when docked against one of the 5 enzymes. The diagonal elements represent the ranks of the cognate substrates, which improve significantly upon rescoring. If the computation is capturing selectivity, these values should be lower than those of the off-diagonal elements. The substrates are as follows: MR = S-mandelate, GlucD = D-glucarate, MAL = L-threo (2s,3s)-3-methyl aspartate, OSBS = 2-succinyl-2-hydroxy-2,4-cyclohexadiene-1-carboxylate, enolase = 2-phospho-glycerate.
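The selectivity criterion stated for Table 2 (the cognate substrate should rank better than every non-cognate substrate docked to the same enzyme) can be checked directly from the tabulated ranks. The sketch below transcribes the Table 2 values; entries reported only as exceeding 25% are stored as the bound 25.0, which is an assumption made here so that they can be compared numerically. The function name is illustrative.

```python
ENZYMES = ["MR", "GlucD", "MAL", "OSBS", "Enolase"]
CAP = 25.0  # assumption: stands in for entries reported only as "greater than 25%"

# Rows are enzymes, columns are substrates in the same order (ranks in %).
before = [
    [6.5, 22.6, 24.7, 10.7, 6.5],
    [CAP, 0.03, 3.9, CAP, 0.93],
    [4.9, 0.2, 9.1, 5.1, 2.7],
    [12.9, 11.5, 25.7, 6.1, 11.7],
    [3.7, 5.7, 1.9, CAP, 0.04],
]
after = [
    [0.4, 4.5, 9.3, 18.8, 21.9],
    [CAP, 0.02, 5.0, CAP, 8.6],
    [6.2, 11.0, 1.1, 23.8, 22.6],
    [2.1, 6.5, 7.0, 0.2, 12.3],
    [21.2, 1.4, 1.1, CAP, 0.2],
]


def cognate_ranked_best(ranks):
    """For each enzyme, is the diagonal rank strictly lower than all others in its row?"""
    flags = []
    for i, row in enumerate(ranks):
        others = [r for j, r in enumerate(row) if j != i]
        flags.append(row[i] < min(others))
    return flags


for label, table in (("before", before), ("after", after)):
    flags = cognate_ranked_best(table)
    print(label, dict(zip(ENZYMES, flags)), sum(flags), "of", len(flags))
```

By this strict criterion the cognate substrate is the best-ranked one for 3 of the 5 enzymes before rescoring (MR is tied and MAL is not) and for all 5 after rescoring.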

rescoring methods in screening against E. coli dihydrofolate

reductase (DHFR)

As a final example, we describe our participation in the McMaster Data-Mining and Docking Competition. The goal of this contest was to predict which compounds, out of a database of 50 000, would bind to E. coli DHFR. The molecular mechanics rescoring method played a key role in this work, which also contained a new twist. Instead of docking to the crystal structure, we deliberately modified the receptor to incorporate knowledge of "induced-fit" effects associated with varying DHFR inhibitors' scaffolds (as derived from existing DHFR crystal structures). Specifically, we developed and applied a receptor preparation procedure in which torsion angles of loops and side chains are deliberately sampled to open a key portion of the binding site. This procedure used the same MM-GB/SA energy function as the rescoring procedure, highlighting a major advantage of physics-based scoring functions: ligands and the protein receptor can be treated consistently using the same scoring function, making it possible to predict conformational changes associated with ligand binding. Fig. 5 summarizes the results.

5. New results and discussion

In principle, the improved enrichment shown by our rescoring method, relative to high-throughput docking programs, reflects improved estimation of relative binding affinities, at least for a subset of the known inhibitors. However, the free energy depends on a balance of many different intrinsic and environmental contributions as discussed previously. We have undertaken an extensive analysis of the energetic components that contribute to the discrimination between true actives and decoys, and developed new improvements based on this analysis. Here, we present the most recent results for two very different protein binding sites, both of which are important drug targets. Thrombin has a large, solvent exposed polar binding surface (Fig. 6a) while estrogen receptor (ER) has a deeply buried and mostly hydrophobic binding pocket (Fig. 6b). Thrombin was included as a test case in our previously published work while estrogen receptor is a new test case. We included estrogen receptor in this study because, unlike most other test cases we have examined since our original published studies, the results we obtained using the molecular mechanics rescoring were initially worse than those obtained using the DOCK 3.5.54 scoring function. Since the molecular mechanics scoring in principle captures important physical effects that the docking scoring function does not, such as receptor desolvation and ligand strain, we set out to understand why it performed more poorly. This investigation has led to significant improvements in the rescoring method. Generally, the observations on these two protein systems are applicable to other proteins with similar binding properties that we studied (data not shown).

Table 3 Measures of enrichment of the known inhibitors for nine enzyme systems achieved by docking alone (D) and the rescoring procedure (R) (from Huang et al., 2006)

| Enzyme | PDB code | Number of known inhibitors | % of ranked database to find 25% of known inhibitors (D) | % of ranked database to find 25% of known inhibitors (R) | Maximum enrichment factor achieved (D) | Maximum enrichment factor achieved (R) | % of ranked database where maximum enrichment factor occurred (D) | % of ranked database where maximum enrichment factor occurred (R) |
|---|---|---|---|---|---|---|---|---|
| DHFR | 3dfr | 100 | 0.3 | 0.1 | 110 | 239 | 0.1 | 0.1 |
| GART | 1c2t | 50 | 0.9 | 0.8 | 46 | 159 | 0.3 | 0.1 |
| AR | 1ah3 | 722 | 3.5 | 4.0 | 8 | 12 | 2.0 | 0.1 |
| PARP | 1efy | 45 | 4.6 | 2.3 | 6 | 11 | 5.2 | 3.8 |
| PNP | 1b8o | 25 | 1.2 | 0.1 | 60 | 358 | 0.2 | 0.1 |
| SAHH | 1a7a | 37 | 2.1 | 1.8 | 14 | 19 | 1.3 | 2.0 |
| Thrombin | 1ba8 | 243 | 4.2 | 0.8 | 25 | 49 | 0.1 | 0.1 |
| AChE | 1e66 | 554 | 5.0 | 5.1 | 21 | 25 | 0.4 | 0.1 |
| TS | 2bbq | 171 | 1.5 | 0.5 | 25 | 52 | 0.3 | 0.1 |

In this work, both the known inhibitors and the drug-like decoys were taken from the MDL Drug Data Report (MDDR). Abbreviations: AR, aldose reductase; DHFR, dihydrofolate reductase; GART, glycinamide ribonucleotide transformylase; PARP, poly(ADP-ribose) polymerase; PNP, purine nucleoside phosphorylase; SAHH, S-adenosylhomocysteine hydrolase; AChE, acetylcholinesterase; TS, thymidylate synthase.

Fig. 5 The percent of known inhibitors identified (y axis) in increasingly large subsets of the ranked database (x axis) (reproduced with … by SAGE Publications Inc.), for E. coli DHFR. The grey line represents the results expected from random selection of ligands. The dotted (blue) line and solid (blue) line represent the docking enrichment using the original holo crystal structure and the remodeled structure, respectively. The solid (orange) line with circles represents the rescoring of inhibitor poses from docking against the remodeled structure.
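As a quick arithmetic check on the Table 3 summary (maximum enrichment factor improved for all nine enzymes, by up to a factor of about 6), the fold change can be computed from the tabulated docking (D) and rescoring (R) values. The dictionary below transcribes those values; the variable names are illustrative.

```python
# Maximum enrichment factors from Table 3 as (docking, rescoring) pairs.
max_ef = {
    "DHFR": (110, 239),
    "GART": (46, 159),
    "AR": (8, 12),
    "PARP": (6, 11),
    "PNP": (60, 358),
    "SAHH": (14, 19),
    "Thrombin": (25, 49),
    "AChE": (21, 25),
    "TS": (25, 52),
}

# Fold improvement of the maximum EF upon rescoring, per enzyme.
fold_change = {name: r / d for name, (d, r) in max_ef.items()}
largest = max(fold_change, key=fold_change.get)
all_improved = all(r > d for d, r in max_ef.values())

for name, value in sorted(fold_change.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {value:4.1f}x")
print("largest improvement:", largest, "all improved:", all_improved)
```

The largest fold change is for PNP (358/60, just under 6), and every enzyme shows a higher maximum enrichment factor after rescoring.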

(a) Dielectric constant and dielectric boundary

One limitation of implicit solvent models is that the calculated binding interaction energies can change dramatically as a result of changes in the dielectric constant used for the protein and ligand interiors ($D$) and the definition of the dielectric boundary. It has been suggested that the dielectric constant inside a macromolecule is not a universal constant but simply a parameter that depends on the model used. From a modeling standpoint, a dielectric constant of 1 is correct only if electronic polarizability and conformational dynamics are explicitly included in the calculation. Electronic polarizability alone can result in an effective dielectric constant of roughly 2 to 4. Poisson–Boltzmann calculations (e.g., for predicting pKa's of protein side chains) frequently treat the internal dielectric as an empirically adjustable parameter, and routinely use an internal dielectric between 4 and 20 to obtain the most reliable results.

Nonetheless, we used $D = 1$ in all of the prior work reported in section 4, which used a rigid receptor and fixed atomic partial charges (no electronic polarizability), because the GB solvation model as well as the atomic force field charges were optimized using this dielectric constant. We have empirically tested different values of $D$, without otherwise adjusting the solvent model or force field, in the context of enriching known inhibitors in high-throughput docking. The results suggest that using $D = 2$ can lead to more robust enrichment. For thrombin, increasing the value of $D$ from 1 to 2 does not change the overall enrichment significantly (Fig. 7a). On the other hand, for ER, the larger dielectric constant impacts the results more significantly and positively (Fig. 7b). This difference between the two binding sites is most likely due to the extent of solvent exposure: in the polar, solvent-exposed thrombin binding site, the electrostatic interactions are screened primarily by the high dielectric of water, whereas in the buried ER binding site, water plays little role in electrostatic screening and the internal dielectric is critical.
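To make the role of the interior dielectric concrete, the sketch below evaluates two textbook expressions: the direct Coulomb energy between two point charges, which scales as 1/D, and the dielectric prefactor (1/D_interior − 1/D_solvent) that multiplies the polarization term in generalized Born models. This is only an illustration of how the energy terms respond to the interior dielectric; it is not the GB/SA implementation used in the rescoring, and the example charges, distance, and solvent dielectric of 80 are assumptions.

```python
COULOMB_CONSTANT = 332.0637  # kcal*Angstrom/(mol*e^2), standard conversion factor


def coulomb_energy(q1, q2, distance, interior_dielectric=1.0):
    """Direct Coulomb energy (kcal/mol) between two point charges (in e, Angstrom)."""
    if distance <= 0 or interior_dielectric <= 0:
        raise ValueError("distance and dielectric must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (interior_dielectric * distance)


def gb_prefactor(interior_dielectric, solvent_dielectric=80.0):
    """Dielectric prefactor of a generalized Born polarization term."""
    return 1.0 / interior_dielectric - 1.0 / solvent_dielectric


# Hypothetical buried ion pair: +1 and -1 charges separated by 3 Angstrom.
for d_in in (1.0, 2.0):
    e_coul = coulomb_energy(+1.0, -1.0, 3.0, interior_dielectric=d_in)
    print(f"D = {d_in:.0f}: Coulomb = {e_coul:8.2f} kcal/mol, "
          f"GB prefactor = {gb_prefactor(d_in):.4f}")
```

Doubling the interior dielectric halves the direct charge-charge term and roughly halves the GB prefactor. For a buried site with little water nearby, the direct term is not otherwise screened, which is consistent with the greater sensitivity reported for ER.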

(b) Ligand reorganization energy

Generally, non-bonded intermolecular interactions are considered to dominate the ligand–receptor binding process. However, if the bound conformation of the ligand is different from the conformation of the free ligand in solution, the intramolecular energy of the ligand can contribute to the binding free energy. In our rescoring protocol, ligand reorganization energies have been approximated using a molecular mechanics force field and the GB/SA model, where the ligands were minimized both alone and bound to the protein in order to compare ligand intramolecular energies in these two states. We have assumed that the bound conformation of the ligand is unique and that structural fluctuations of the bound ligand contribute negligibly to the free energy of binding. This assumption may be reasonable, at least for tight binding ligands. However, a more dramatic assumption implicit in the published rescoring procedure is that the free ligand can be approximated by a single low-energy conformation.

For most flexible ligands, this assumption is likely to be poor. Nonetheless, obtaining a true ensemble for ligands in solution would be computationally expensive, and we assumed that even a crude treatment of ligand strain would be better than none at all. In subsequent work we have reexamined this assumption. As an alternative to the published scheme, we simply extracted the ligand conformation from the minimized complex structure and evaluated its energy using this fixed geometry, and this value ($E$) was used for binding energy calculation ($E_{\mathrm{bind}} = E_{R*L} - E$). Surprisingly, this modification to the method resulted in non-trivial improvements, especially with respect to the early enrichment for protein systems binding highly strained ligands. For both thrombin and ER, the maximum enrichment factors are nearly doubled using the new procedure relative to the published procedure (Fig. 8). In both cases, the overall enrichment is consistently improved throughout the top 20% of the database.

Fig. 6 Binding surfaces of two representative protein systems, thrombin and estrogen receptor. The crystallographic ligand is represented by a CPK model coloured by atom type. The key hydrogen bond interactions between the protein and ligands are illustrated with dashed (yellow) lines. The molecular images were generated with UCSF Chimera.

This improved enrichment has been observed in many other systems as well (data not shown), where higher energy ligand structures are required to maximize the ligand–receptor interactions. Interestingly, similar approximations have been made in MM-PB/SA calculations (refs. 44–46) and LIE methods (refs. 78,79), where MD simulations were only performed on the ligand–receptor complex instead of three independent simulations of free ligand, free receptor and the ligand–receptor complex. Aqvist proposed that the intramolecular ligand strain energy correlates linearly with the intermolecular electrostatic interaction, making it possible to ignore the strain energy in the LIE model. Our own view is that simply minimizing the ligand in solvent may increase errors due to neglecting the entropy of the ligand in solution and also due to inaccuracies in the ligand torsional parameters. Further work will be necessary to determine whether ensemble averaging of the ligand in solution, as well as improvements in ligand torsional parameters (refs. 69,70,80), will provide more accurate treatment of ligand strain.
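The difference between the two treatments of the free ligand can be written out as simple energy bookkeeping. The sketch below assumes a binding energy of the form complex minus receptor minus ligand; the explicit receptor term, the function names, and all numerical values are assumptions made for illustration and are not taken from the published calculations. With a rigid receptor the receptor term is the same for every ligand and so does not affect the ranking.

```python
def binding_energy(e_complex, e_receptor, e_ligand):
    """Binding energy estimate: complex minus separated receptor and ligand."""
    return e_complex - e_receptor - e_ligand


def ligand_strain(e_ligand_bound_geometry, e_ligand_minimized):
    """Intramolecular strain: bound-geometry energy minus minimized free-ligand energy."""
    return e_ligand_bound_geometry - e_ligand_minimized


# Hypothetical energies in kcal/mol.
e_complex = -5230.0           # minimized complex
e_receptor = -5100.0          # rigid receptor, identical for all ligands
e_ligand_bound = -80.0        # ligand extracted from the complex, fixed geometry
e_ligand_minimized = -92.0    # ligand minimized alone in implicit solvent

# Scheme with free-ligand minimization (strain is penalized).
e_bind_with_min = binding_energy(e_complex, e_receptor, e_ligand_minimized)
# Scheme without free-ligand minimization (strain is not penalized).
e_bind_without_min = binding_energy(e_complex, e_receptor, e_ligand_bound)

strain = ligand_strain(e_ligand_bound, e_ligand_minimized)
print(e_bind_with_min, e_bind_without_min, strain)
```

In this bookkeeping the two schemes differ by exactly the strain term, so ligands that bind in strained conformations are the ones whose ranks change when the free-ligand minimization is dropped.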

Fig. 7 Enrichment plots obtained after docking alone (solid dark/blue line), after rescoring using interior dielectric constant of 1 (solid light/

orange line) and rescoring using interior dielectric constant of 2 (dotted light/green line). Left: the percent of known ligands identiﬁed in

increasingly large subsets of the ranked database. The diagonal (grey) line represents the results expected from random selection of ligands. Right:

enrichment factor as a function of the fraction of the ranked database.

Fig. 8 Enrichment plots for two representative protein systems obtained after docking alone (solid dark/blue line), after rescoring with free ligand

minimization (solid light/orange line) and rescoring without free ligand minimization (dotted light/orange line).


Combining the improvements discussed in the previous two sections, both the early and overall enrichments are significantly improved for ER (Fig. 9b), with the maximum enrichment factors nearly doubled and an improvement of ~20–30% in the overall enrichment.

(d) Scaling the energy components

Accurate free energy calculations depend on a proper balance

of many diﬀerent energetic components. As we have empha-

sized, our rescoring method strikes a balance between compu-

tational speed and accuracy, and in particular neglects

entropic losses and protein ﬂexibility (in the results discussed

here). Empirically scaling certain energy components as a

post-rescoring process, in a manner similar to the LIE scheme,

may be useful to compensate for some of these limitations.

Indeed, a simple scaling scheme seems to consistently improve

the enrichment for all the targets we have studied.

In general, scaling up the electrostatic interaction energy, relative to the other components of the scoring function, seems to improve results for binding sites containing charged side chains. For example, scaling up the electrostatic interaction energy by a factor of 2 significantly improves the enrichment in thrombin (dark/purple solid line, Fig. 10a). If we follow Aqvist's argument that intermolecular electrostatic interactions correlate with intramolecular ligand strain energies, we can assume that scaling up the electrostatic energies may compensate for the lack of explicit treatment of intramolecular ligand strain energies. Indeed, the effect of free ligand minimization is diminished after we scale up the electrostatic interaction energies in charged binding sites such as thrombin (dark/purple dotted line, Fig. 10a).

Fig. 9 Enrichment plots obtained after docking alone (solid dark/blue line) and after rescoring (solid light/orange line), rescoring using interior dielectric constant of 2 without free ligand minimization (dotted light/green line).

Fig. 10 Enrichment plots obtained after rescoring (solid light/orange line), rescoring with scaling particular energetic components (solid dark/purple line) and rescoring without free ligand minimization procedure (dotted line). See text for details.

By contrast, for hydrophobic sites like ER, scaling up the vdW term improves results. The maximum enrichment factor for ER is nearly doubled by scaling up the vdW interaction energy by a factor of 2 (dark/purple solid line, Fig. 10b). We speculate that the MM-GB/SA scoring function underestimates the non-polar binding contributions to the free energy of binding, and that increasing the vdW term compensates for this deficiency. Note that we did not attempt to scale all of the energetic components to maximize the enrichment performance for specific targets. We suspect that it will not be possible to find a universal set of scaling factors that performs excellently across many targets.
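The post-rescoring scaling described in this section amounts to re-weighting stored energy components and re-sorting the ligands. The sketch below assumes each ligand's score has been decomposed into an electrostatic interaction term, a vdW interaction term, and a remainder. That decomposition, the class and function names, and the toy numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyComponents:
    name: str
    electrostatic: float  # intermolecular electrostatic interaction (kcal/mol)
    vdw: float            # intermolecular van der Waals interaction (kcal/mol)
    other: float          # all remaining terms of the score (kcal/mol)


def scaled_score(c, w_elec=1.0, w_vdw=1.0):
    """Score with empirically scaled electrostatic and vdW components."""
    return w_elec * c.electrostatic + w_vdw * c.vdw + c.other


def rank_by_score(ligands, w_elec=1.0, w_vdw=1.0):
    """Ligand names ordered from most to least favorable scaled score."""
    ordered = sorted(ligands, key=lambda c: scaled_score(c, w_elec, w_vdw))
    return [c.name for c in ordered]


# Toy ligands: A is dominated by electrostatics, B by vdW contacts.
ligands = [
    EnergyComponents("A", electrostatic=-30.0, vdw=-20.0, other=25.0),
    EnergyComponents("B", electrostatic=-10.0, vdw=-35.0, other=18.0),
]

print(rank_by_score(ligands))              # unscaled
print(rank_by_score(ligands, w_elec=2.0))  # electrostatics scaled by 2
print(rank_by_score(ligands, w_vdw=2.0))   # vdW scaled by 2
```

Because the components are stored, different weights can be tried without repeating the minimization, which is what makes this a cheap post-processing step.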

6. Conclusions

Ligand binding aﬃnity prediction is one of the most important

applications of computational chemistry in the ﬁeld of struc-

ture-based drug design. However, accurately scoring/ranking

database compounds with respect to their estimated binding

aﬃnities to a biomolecular target remains highly challenging.

Molecular mechanics based energy functions have been used

in combination with MD simulations to predict absolute and

relative ligand binding free energies, using methods such as

FEP, TI and MM-PB/SA. However, such MD-based free

energy methods are computationally expensive and can be

complicated to apply. We have developed a physics-based

rescoring method that can be applied to hundreds of thou-

sands of compounds by invoking a number of simplifying

approximations and by developing new computational meth-

ods, especially the multi-scale truncated Newton minimization

algorithm.

In our previous studies and the new results presented here,

we have demonstrated that our rescoring method is a promis-

ing approach for improving the discrimination between known

ligands and decoys in virtual screening of large compound

databases. As we have emphasized, our rescoring method

strikes a balance between computational speed and accuracy,

and ultimately, we can envisage following up the physics-based

rescoring with even more computationally intensive (but pre-

sumably more accurate) methods for a subset of ligands. For

example, free energy methods like FEP can capture protein

and ligand entropy losses due to binding, which are ignored in

our scoring method. In addition, the new generation of

polarizable force ﬁelds could in principle be used to treat

electronic polarizability eﬀects that are neglected by the ﬁxed

charge force ﬁelds we have used.

From a more pragmatic standpoint, we believe that the two most significant limitations of the rescoring method in its current form are related to incorrect poses generated by the docking algorithm and the rigid receptor approximation applied in this work. A simple extension of the current method is to subject a small number of dissimilar docking poses to rescoring, minimize the receptor along with the ligand during the rescoring stage, and use the most favorable binding energy for rank-ordering ligands.
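The final step of the proposed extension (rescoring several poses per ligand and ranking by the most favorable energy) is straightforward to express. The sketch below assumes the per-pose binding energies have already been computed; the ligand identifiers, values, and function names are hypothetical.

```python
def best_pose_energies(pose_energies):
    """Map each ligand to its most favorable (lowest) rescored binding energy."""
    return {ligand: min(energies)
            for ligand, energies in pose_energies.items() if energies}


def rank_ligands(pose_energies):
    """Ligands ordered from most to least favorable best-pose energy."""
    best = best_pose_energies(pose_energies)
    return sorted(best, key=best.get)


# Hypothetical rescored energies (kcal/mol) for a few dissimilar poses per ligand.
pose_energies = {
    "lig1": [-30.2, -41.5, -12.0],
    "lig2": [-38.0],
    "lig3": [-25.0, -44.1],
}

print(best_pose_energies(pose_energies))
print(rank_ligands(pose_energies))
```

Taking the minimum over poses means that a ligand is no longer penalized when its top-scoring docking pose happens to be incorrect, provided a better pose is among those rescored.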

Acknowledgements

We thank Brian Shoichet, John Irwin, Alan Graves, Johannes Hermann, and the rest of the Shoichet group for many helpful conversations that were critical in guiding this work and for technical assistance. NIH grants GM071790, AI035707, and GM56531 are acknowledged for financial support. QB3 at UCSF is thanked for computational support, MDL Inc. for providing the MDDR database and ISIS software (to Prof. Brian Shoichet, UCSF), the Shoichet lab for making computers available for this work, and Schrödinger Inc. for use of IMPACT. M.P.J. is a member of the Scientific Advisory Board of Schrödinger Inc.

References

1 Computational Biochemistry and Biophysics, ed. O. M. Becker, A.

D. MacKerell, Jr, B. Roux and M. Watanabe, Marcel Dekker,

Inc., New York, 2001.

2 D. L. Beveridge and F. M. DiCapua, Annu. Rev. Biophys. Biophys.

Chem., 1989, 18, 431–492.

3 P. Kollman, Chem. Rev., 1993, 93, 2395–2417.

4 Ajay and M. A. Murcko, J. Med. Chem., 1995, 38, 4953–4967.

5 A. Ajay, M. A. Murcko and P. F. W. Stouten, in Practical application of computer-aided drug design, ed. P. S. Charifson, Marcel Dekker, Inc., New York, 1997, pp. 165–194.

6 T. Lazaridis, Curr. Org. Chem., 2002, 6, 1319–1332.

7 B. O. Brandsdal, F. Osterberg, M. Almlof, I. Feierberg,

V. B. Luzhkov and J. Aqvist, Adv. Protein Chem., 2003, 66,

123–158.

8 J. M. Swanson, R. H. Henchman and J. A. McCammon, Biophys.

J., 2004, 86, 67–74.

9 B. K. Shoichet, S. L. McGovern, B. Wei and J. J. Irwin, Curr.

Opin. Chem. Biol., 2002, 6, 439–446.

10 W. P. Walters, M. T. Stahl and M. A. Murcko, Drug Discovery

Today, 1998, 3, 160–178.

11 B. K. Shoichet, Nature, 2004, 432, 862–865.

12 M. Rarey, B. Kramer, T. Lengauer and G. Klebe, J. Mol. Biol.,

1996, 261, 470–489.

13 R. A. Friesner, J. L. Banks, R. B. Murphy, T. A. Halgren, J. J.

Klicic, D. T. Mainz, M. P. Repasky, E. H. Knoll, M. Shelley, J. K.

Perry, D. E. Shaw, P. Francis and P. S. Shenkin, J. Med. Chem.,

2004, 47, 1739–1749.

14 I. Muegge and Y. C. Martin, J. Med. Chem., 1999, 42, 791–804.

15 R. DeWitte and E. Shakhnovich, J. Am. Chem. Soc., 1996, 118,

11733–11744.

16 A. E. Cho, V. Guallar, B. J. Berne and R. Friesner, J. Comput.

Chem., 2005, 26, 915–931.

17 K. Raha and K. M. Merz, Jr, J. Med. Chem., 2005, 48,

4558–4575.

18 B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan and M. Karplus, J. Comput. Chem., 1983, 4, 187–217.

19 A. D. MacKerell, Jr, in Computational Biochemistry and Biophysics, ed. O. M. Becker, A. D. MacKerell, Jr, B. Roux and M. Watanabe, Marcel Dekker, Inc., New York, 2001.

20 W. L. Jorgensen and J. Tirado-Rives, Proc. Natl. Acad. Sci. U. S.

A., 2005, 102, 6665–6670.

21 A. Nicholls and B. Honig, J. Comput. Chem., 1991, 12, 435–445.

22 W. C. Still, A. Tempczyk, R. C. Hawley and T. Hendrickson,

J. Am. Chem. Soc., 1990, 112, 6127–6129.

23 A. Ghosh, C. S. Rapp and R. A. Friesner, J. Phys. Chem. B, 1998,

102, 10983–10990.

24 V. Tsui and D. A. Case, Biopolymers, 2000, 56, 275–291.

25 M. Karplus and G. A. Petsko, Nature, 1990, 347, 631–639.

26 D. A. McQuarrie, Statistical Mechanics, Harper & Row, New

York, 1976.

27 I. D. Kuntz, E. C. Meng and B. K. Shoichet, Acc. Chem. Res., 1994, 27, 117–123.

28 D. A. Pearlman and P. S. Charifson, J. Med. Chem., 2001, 44,

3417–3423.


29 D. A. Pearlman and P. S. Charifson, J. Med. Chem., 2001, 44,

502–511.

30 J. Wang, P. Morin, W. Wang and P. A. Kollman, J. Am. Chem. Soc., 2001, 123, 5221–5230.

31 B. G. Rao, E. E. Kim and M. A. Murcko, J. Comput. Aided Mol.

Des., 1996, 10, 23–30.

32 M. R. Reddy and M. D. Erion, J. Am. Chem. Soc., 2001, 123,

6246–6252.

33 C. Oostenbrink and W. F. van Gunsteren, Proteins: Structure, Function, and Bioinformatics, 2004, 54, 237–246.

34 C. J. Cramer, Essentials of Computational Chemistry Theories and

Models, John Wiley & Sons Ltd, Chichester, UK, 2002.

35 U. C. Singh, F. K. Brown, P. A. Bash and P. A. Kollman, J. Am. Chem. Soc., 1987, 109, 1607.

36 D. M. Ferguson, D. A. Pearlman, W. C. Swope and P. A. Koll-

man, J. Comput. Chem., 1992, 13, 362–370.

37 J. A. McCammon, Curr. Opin. Struct. Biol., 1991, 1, 196–200.

38 D. J. Price and W. L. Jorgensen, J. Comput. Aided Mol. Des., 2001,

15, 681–695.

39 R. C. Rizzo, J. Tirado-Rives and W. L. Jorgensen, J. Med. Chem.,

2001, 44, 145–154.

40 P. A. Kollman, I. Massova, C. Reyes, B. Kuhn, S. Huo, L. Chong,

M. Lee, T. Lee, Y. Duan, W. Wang, O. Donini, P. Cieplak, J.

Srinivasan, D. A. Case and T. E. Cheatham, III, Acc. Chem. Res.,

2000, 33, 889–897.

41 R. C. Rizzo, T. Aynechi, D. A. Case and I. D. Kuntz, J. Chem.

Theory Comput., 2006, 2, 128–139.

42 S. Huo, J. Wang, P. Cieplak, P. A. Kollman and I. D. Kuntz,

J. Med. Chem., 2002, 45, 1412–1419.

43 T. Steinbrecher, D. A. Case and A. Labahn, J. Med. Chem., 2006,

49, 1837–1844.

44 B. Kuhn, P. Gerber, T. Schulz-Gasch and M. Stahl, J. Med. Chem.,

2005, 48, 4040–4048.

45 P. Bonnet and R. A. Bryce, J. Mol. Graphics Modell., 2005, 24,

147–156.

46 B. Kuhn and P. A. Kollman, J. Med. Chem., 2000, 43, 3786–3791.

47 J. Wang, X. Kang, I. D. Kuntz and P. A. Kollman, J. Med. Chem.,

2005, 48, 2432–2444.

48 J. Aqvist, C. Medina and J. E. Samuelsson, Protein Eng., 1994, 7,

385–391.

49 W. Wang, J. Wang and P. A. Kollman, Proteins, 1999, 34,

395–402.

50 R. H. Smith, Jr, W. L. Jorgensen, J. Tirado-Rives, M. L. Lamb, P.

A. Janssen, C. J. Michejda and M. B. Kroeger Smith, J. Med.

Chem., 1998, 41, 5272–5286.

51 R. Zhou, R. A. Friesner, A. Ghosh, R. C. Rizzo, W.L. Jorgensen

and R. M. Levy, J. Phys. Chem. B, 2001, 105, 10388–10397.

52 U. Bren, V. Martinek and J. Florian, J. Phys. Chem. B, 2006, 110,

10557–10566.

53 M. Almlof, J. Aqvist, A. O. Smalas and B. O. Brandsdal, Biophys.

J., 2006, 90, 433–442.

54 Z. Zhou and J. D. Madura, Proteins: Structure, Function, and

Bioinformatics, 2004, 57, 493–503.

55 E. Perola, W. P. Walters and P. S. Charifson, Proteins, 2004, 56,

235–249.

56 W. B. Floriano, N. Vaidehi, G. Zamanakos and W. A. Goddard,

III, J. Med. Chem., 2004, 47, 56–71.

57 C. Kalyanaraman, K. Bernacki and M. P. Jacobson, Biochemistry,

2005, 44, 2059–2071.

58 N. Huang, C. Kalyanaraman, J. J. Irwin and M. P. Jacobson, J.

Chem. Inf. Model, 2006, 46, 243–253.

59 T. A. Halgren, R. B. Murphy, R. A. Friesner, H. S. Beard, L. L.

Frye, W. T. Pollard and J. L. Banks, J. Med. Chem., 2004, 47,

1750–1759.

60 E. C. Meng, B. K. Shoichet and I. D. Kuntz, J. Comput. Chem.,

1992, 13, 505–524.

61 D. M. Lorber and B. K. Shoichet, Protein Sci., 1998, 7, 938–950.

62 B. Q. Wei, W. A. Baase, L. H. Weaver, B. W. Matthews and B. K.

Shoichet, J. Mol. Biol., 2002, 322, 339–355.

63 I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge and T. E. Ferrin, J. Mol. Biol., 1982, 161, 269–288.

64 MDDR, MDL Inc., San Leandro, CA.

65 IMPACT, 2003, Schrödinger Inc., New York.

66 M. P. Jacobson, G. A. Kaminski, R. A. Friesner and C. A. Rapp,

J. Phys. Chem. B, 2002, 106, 11673–11680.

67 M. P. Jacobson, D. L. Pincus, C. S. Rapp, T. J. Day, B. Honig, D.

E. Shaw and R. A. Friesner, Proteins, 2004, 55, 351–367.

68 X. Li, M. P. Jacobson and R. A. Friesner, Proteins, 2004, 55,

368–382.

69 W. L. Jorgensen, D. S. Maxwell and J. Tirado-Rives, J. Am. Chem.

Soc., 1996, 118, 11225–11236.

70 G. A. Kaminski, R. A. Friesner, J. Tirado-Rives and W. L.

Jorgensen, J. Phys. Chem. B, 2001, 105, 6474–6487.

71 E. Gallicchio, L. Y. Zhang and R. M. Levy, J. Comput. Chem.,

2002, 23, 517–529.

72 D. X. Xie and T. Schlick, SIAM J. Optimization, 1999, 10,

132–154.

73 M. Tuckerman, B. J. Berne and G. J. Martyna, J. Chem. Phys.,

1992, 97, 1990–2001.

74 K. Bernacki, C. Kalyanaraman and M. P. Jacobson, J. Biomol.

Screen, 2005, 10, 675–681.

75 E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M.

Greenblatt, E. C. Meng and T. E. Ferrin, J. Comput. Chem., 2004,

25, 1605–1612.

76 C. N. Schutz and A. Warshel, Proteins, 2001, 44, 400–417.

77 J. J. Havranek and P. B. Harbury, Proc. Natl. Acad. Sci. U. S. A.,

1999, 96, 11145–11150.

78 T. Hansson, J. Marelius and J. Aqvist, J. Comput. Aided Mol. Des.,

1998, 12, 27–35.

79 J. Aqvist and J. Marelius, Comb. Chem. High Throughput Screen,

2001, 4, 613–626.

80 J. L. Banks, H. S. Beard, Y. Cao, A. E. Cho, W. Damm, R. Farid,

A. K. Felts, T. A. Halgren, D. T. Mainz, J. R. Maple, R. Murphy,

D. M. Philipp, M. P. Repasky, L. Y. Zhang, B. J. Berne, R. A.

Friesner, E. Gallicchio and R. M. Levy, J. Comput. Chem., 2005,

26, 1752–1780.

This journal is

c

the Owner Societies 2006 Phys. Chem. Chem. Phys., 2006, 8, 5166–5177 | 5177

Robustly interrogating machine learning based scoring functions: what are they learning?

Preprint

Full-text available

Nov 2023

Motivation Machine learning-based scoring functions (MLBSFs) have been found to exhibit inconsistent performance on different benchmarks and be prone to learning dataset bias. For the field to develop MLBSFs that learn a generalisable understanding of physics, a more rigorous understanding of how they perform is required. Results In this work, we compared the performance of a diverse set of popular MLBSFs (RFScore, SIGN, OnionNet-2, Pafnucy, and PointVS) to our proposed baseline models that can only learn dataset biases on a range of benchmarks. We found that these baseline models were competitive in accuracy to these MLBSFs in almost all proposed benchmarks, indicating these models only learn dataset biases. Our tests and provided platform, ToolBoxSF, will enable researchers to robustly interrogate MLBSF performance and determine the effect of dataset biases on their predictions. Availability and Implementation https://github.com/guydurant/toolboxsf Contact deane@stats.ox.ac.uk Supplementary information Supplementary data are available at Bioinformatics online.

Synthesis, Crystal Structure, and DFT Study of Tricyclo[4.3.1.03,7]decane Scaffold

Article

Jun 2024

Synthesis, crystal and molecular structure, DFT and antifungal activity studies of (E)-2-(3-methoxy-4-((6-(trifluoromethyl)pyrimidin-4-yl)oxy)styryl)-5-((2-(trifluoromethyl)benzyl)thio)-1,3,4-oxadiazole

Article

Jun 2024
J MOL STRUCT

In-silico assessment of bioactive compounds from chewing stick (Salvadora persica) against N-acetylneuraminate lyase (5ZKA) of Fusobacterium nucleatum involved in salicyclic acid metabolism

Article

Jun 2024
J MOL STRUCT

Synthesis, Crystal Structure Analysis and DFT Studies of Two Benzospirocyclic Ketones

Article

May 2024

Unveiling the potential of recently FDA-approved drugs as quorum sensing inhibitors against P. Aeruginosa using high-performance computational techniques

Article

Jan 2024
J BIOMOL STRUCT DYN

Synthesis, crystal structure and DFT study of 2-(3-bromophenyl)-1-(4-morpholinyl)ethanone

Article

Jan 2024

A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling

Article

Jan 2024
J CHEM INF MODEL

TrIP─Transformer Interatomic Potential Predicts Realistic Energy Surface Using Physical Bias

Article

Dec 2023

Synthesis, crystal structure, DFT, vibrational properties, Hirshfeld surface and antitumor activity studies of 3-((4-methylpiperazin-1-yl) methyl)-1-octyl-5-(p-tolyl)-1H-pyrrolo[2,3-c]pyridine

Article

Sep 2023
J MOL STRUCT

Binding Affinity and Specificity from Computational Studies

Article

Full-text available

Dec 2002

Themis Lazaridis

Computational methods available for the calculation of relative and absolute binding affinities (free energy simulations, continuum electrostatics, linear interaction energy approximations, and empirical solvation models) are reviewed together with recent applications to biological systems. The decomposability of the binding free energy into physically meaningful components is examined and results obtained for these components are presented. Some of these components, such as the direct interactions, the translational/rotational entropy loss, and the desolvation free energy are well recognized. Recent calculations have shown that the translational/rotational entropy loss is not as large as some theoretical calculations have previously suggested because of substantial residual movements in the bound complex. Recent work also points to the importance of contributions that are often neglected in binding affinity calculations, such as the protein reorganization energy and, for flexible ligands, the ligand reorganization energy. Future work should concentrate on the improvement of the energy functions and simulation protocols for the achievement of more precise and accurate predictions.

Free Energy Calculations: Applications to Chemical and Biochemical Phenomena

Article

Nov 1993

Peter Kollman


Statistical Mechanics

Article

Jan 1975

Donald A. McQuarrie

Statistical Mechanics

Article

Feb 1977

A solution of the Smoluchowski equation for rotational Brownian motion applicable to large step-like temporal field changes

Article

Oct 1982
J MOL STRUCT

The Smoluchowski equation, describing rotational Brownian motion under the action of large step-like temporal changes in an electric field, is solved by means of a new method. This method essentially makes use of an expansion of the orientational distribution function in terms of biorthogonal functions. The evaluation of the ensemble averages ⟨P1⟩ and ⟨P2⟩ by this procedure leads to relatively simple expressions with different relaxation times, which are dependent on the interaction energy. Furthermore, these relaxation times are in any case shorter than Debye's dipole relaxation time.

ChemInform Abstract: Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models

Article

Mar 2001
ChemInform

Peter A. Kollman


Free energy from simulations. Current Opinion in Structural Biology 1991, 1:196–200

Article

Apr 1991
CURR OPIN STRUC BIOL

J. Andrew McCammon

Free energies derived from computer simulations can aid in the interpretation or prediction of experimental data on biomolecular structure, thermodynamics and kinetics. Progress made during the past year has improved the accuracy and speed of free energy calculations, and has provided new insights into molecular associations, protein folding and electron transfer.

Practical Application of Computer-Aided Drug Design

Article

Jan 1997

Paul S Charifson


Force Field Validation Using Protein Side Chain Prediction

Article

Oct 2002

The prediction of protein side chain conformations is used to evaluate the accuracy of force field parameters. Specifically, new torsional parameters have recently been reported for the OPLS-AA force field, which achieved substantially better accuracy with respect to high level gas-phase quantum chemical calculations [J. Phys. Chem. B 2001, 105, 6474]. Here we demonstrate that these new parameters also lead to qualitatively improved side chain prediction accuracy. The primary emphasis is on the prediction of single side chain conformations, with the rest of the protein held fixed at the native configuration. Errors due to incomplete sampling can thus be essentially eliminated, using a combination of rotamer search and energy minimization. In addition, the protein environment is modeled realistically using implicit solvation and an explicit representation of crystal packing effects. Aided by the development of new algorithms, these calculations have been performed with modest computational requirements (a cluster of PCs) on a database of 36 proteins (5000 total residues). The side chain prediction tests that we employ are quite general and can be used to evaluate nonbonded or solvation parameters as well. As such, they provide a useful complement to decoy studies for force field validation.

SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 1. Methodology and Supporting Evidence

Article

Nov 1996

In this paper, we present SMoG (Small Molecule Growth), a novel, straightforward method for de novo lead design and the evidence for its effectiveness. It is based on a simple model for ligand-protein interactions and a scoring that is directly related to the free energy through a knowledge-based potential. A large number of structures are examined by an efficient Metropolis Monte Carlo molecular growth algorithm that generates molecules through the adjoining of functional groups directly in the binding region. Thus SMoG is a method that is able to rank a large number of potential compounds according to binding free energy in a short time. In this sense, SMoG represents a step toward an ideal computational tool for ligand design.
