Development and validation of a modular, extensible docking program: DOCK 5

December 2006
Journal of Computer-Aided Molecular Design 20(10-11):601-19

ORIGINAL PAPER

Development and validation of a modular, extensible docking

program: DOCK 5

Demetri T. Moustakas · P. Therese Lang · Scott Pegg · Eric Pettersen · Irwin D. Kuntz · Natasja Brooijmans · Robert C. Rizzo

Received: 12 April 2006 / Accepted: 22 July 2006 / Published online: 6 December 2006

© Springer Science+Business Media B.V. 2006

Abstract We report on the development and valida-

tion of a new version of DOCK. The algorithm has been

rewritten in a modular format, which allows for easy

implementation of new scoring functions, sampling

methods and analysis tools. We validated the sampling

algorithm with a test set of 114 protein–ligand com-

plexes. Using an optimized parameter set, we are able to

reproduce the crystal ligand pose to within 2 Å of the

crystal structure for 79% of the test cases using our rigid

ligand docking algorithm with an average run time of

1 min per complex and for 72% of the test cases using

our ﬂexible ligand docking algorithm with an average

run time of 5 min per complex. Finally, we perform an

analysis of the docking failures in the test set and

determine that the sampling algorithm is generally suf-

ﬁcient for the binding pose prediction problem for up to

7 rotatable bonds; i.e. 99% of the rigid ligand docking

cases and 95% of the ﬂexible ligand docking cases are

sampled successfully. We point out that success rates

could be improved through more advanced modeling of

the receptor prior to docking and through improvement

of the force ﬁeld parameters, particularly for structures

containing metal-based cofactors.

Keywords Automated docking · Scoring functions · Structure-based drug design · Flexible docking · Binding mode prediction · Incremental construction · Validation

Introduction

Transient non-covalent interactions are critical for bio-

logical processes. The sequencing of a variety of ge-

nomes and the development of proteomics techniques

have enabled scientists to study these interactions on the

widest scales [1]. Advances in X-ray crystallography,

nuclear magnetic resonance spectroscopy, and other

experimental structure techniques provide the ability to

study these interactions at an atomic level of detail [2].

One important application of these advances is the de-

sign of small molecules that interact with cellular pro-

cesses to modify biological activity and treat disease.

D. T. Moustakas and P. T. Lang are joint ﬁrst authors

Electronic Supplementary Material The structure ﬁles for

the test set and the optimized input ﬁles used to generate this

data can be found at the DOCK web site

(http://dock.compbio.ucsf.edu).

D. T. Moustakas

Joint Graduate Program in Bioengineering, University of

California, San Francisco, 600 16th Street, Genentech Hall,

Box 2240, San Francisco, CA 94143, USA

D. T. Moustakas

Joint Graduate Program in Bioengineering, University of

California, Berkeley, Berkeley, CA, USA

P. T. Lang · N. Brooijmans

Graduate Program in Chemistry and Chemical Biology,

University of California, San Francisco, 600 16th Street,

Genentech Hall, Box 2240, San Francisco, CA 94143, USA

S. Pegg · E. Pettersen · I. D. Kuntz (✉)

Department of Pharmaceutical Chemistry, University of

California, San Francisco, 600 16th Street, Genentech Hall,

Box 2240, San Francisco, CA 94143, USA

e-mail: kuntz@cgl.ucsf.edu

R. C. Rizzo

Department of Applied Mathematics and Statistics, Stony

Brook University, Room 1-101, Stony Brook, NY

11794-3600, USA

J Comput Aided Mol Des (2006) 20:601–619

DOI 10.1007/s10822-006-9060-4

The drug discovery process typically requires be-

tween 10 years and 15 years from early discovery until

FDA approval [3]. Computational tools—such as vir-

tual screening, homology modeling and cheminfor-

matics—are applied both to facilitate various stages of

research and to create models that explain experi-

mental data [4–6]. Molecular docking, which can

broadly be deﬁned as the prediction of the orientation

of two molecules with respect to one another, is a

computational technique that has been successfully

used in both of these capacities [7]. In drug design

applications, one molecule is typically a protein or

nucleic acid drug target—the receptor—and the other

is a potential ligand. In these applications, docking is

used to identify novel ligands that interact with a bio-

molecular target and to predict the geometric position

(binding mode) of ligands with respect to the target of

interest.

DOCK background

DOCK is one example of a family of molecular

docking packages available, which includes Glide,

FlexX, and GOLD (Table 1) [8–11]. Each of these

programs consists of two key parts: a search algorithm

and a scoring function. The search algorithm samples

both the relative orientations of the two molecules and

their conformations. It must be thorough enough to

ensure adequate coverage of the binding free energy

landscape in order to ﬁnd the global minimum of the

scoring function. The scoring function ranks the vari-

ous geometries generated by the search algorithm,

proposing the top-scoring pose as the global minimum.

It must rapidly evaluate receptor–ligand complex sta-

bility with sufﬁcient accuracy such that the global

minimum of the scoring function agrees with experi-

mental data.

The number of degrees of freedom in recep-

tor–ligand interactions is very large, and several

approximations must be made to ensure that the

docking problem is tractable. Many different

approaches, ranging from freezing non-essential mo-

tions to the use of preferred conformations, have been

developed to reduce the number of degrees of freedom

sampled [12]. In the DOCK algorithm, for example,

the receptor is considered to be conformationally rigid,

requiring only the ligand conformational, translational

and rotational degrees of freedom to be sampled during complex formation. This assumption is reasonable in docking applications in which either the receptor conformation does not change dramatically upon ligand binding or the aim is to stabilize a particular receptor conformation.

In order to guide the search for ligand orientations

with respect to the receptor, a negative image of the

active site volume is created by placing spheres on the

solvent accessible surface area of the receptor, thus

restricting the ligand orientational sampling to the

most relevant region on the surface of the receptor

[13]. To sample the internal degrees of freedom of the

ligand, DOCK uses the incremental construction

algorithm, anchor-and-grow, which separates the ligand flexibility into two steps [14, 15] (Fig. 1). First, the largest rigid substructure of the ligand (anchor) is identified and rigidly oriented in the active site by matching its heavy atom centers to the receptor sphere centers (orientation). The anchor orientations

are evaluated and optimized using the scoring func-

tion and the energy minimizer. The orientations are

then ranked according to their score, spatially clus-

tered by heavy atom root mean squared deviation

(RMSD), and prioritized (pruning). Next, the

remaining ﬂexible portion of the ligand is built onto

the best anchor orientations within the context of the

receptor (grow). It is assumed that the shape of the

binding site will help restrict the sampling of ligand

conformations to those that are most relevant for the

receptor geometry.
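The anchor-and-grow procedure described above can be read as a layer-by-layer search with pruning. The following Python sketch illustrates only that control flow, under stated assumptions: the function name `anchor_and_grow`, the `keep` parameter, and the toy score are hypothetical, and the minimization and RMSD clustering steps performed by DOCK are omitted.

```python
from itertools import product


def anchor_and_grow(anchor_poses, layers, score, keep=3):
    """Grow conformers layer by layer, keeping the `keep` best-scoring ones.

    anchor_poses: candidate anchor orientations (simple labels in this sketch).
    layers: one list of candidate torsion values per flexible layer.
    score: callable mapping a partial conformer (tuple) to a number;
           lower is better, as for an energy score.
    """
    # Rank the rigid anchor orientations and prune.
    partial = sorted(((pose,) for pose in anchor_poses), key=score)[:keep]
    for torsion_options in layers:
        # Add the next layer in every allowed torsion state ...
        grown = [conf + (t,) for conf, t in product(partial, torsion_options)]
        # ... then rank and prune before the next round of growth.
        partial = sorted(grown, key=score)[:keep]
    return partial


def toy_score(conf):
    # Hypothetical stand-in for the grid energy: distance from a "native" state.
    native = (5, 60, 0)
    return sum(abs(a - b) for a, b in zip(conf, native))


best = anchor_and_grow([0, 5, 10], [[0, 60, 120], [0, 180]], toy_score, keep=2)
print(best[0])
```

Because only `keep` partial conformers survive each round, the cost grows linearly with the number of layers rather than exponentially, at the risk of discarding a branch that would have scored well once fully grown.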

Table 1 Summary of scoring functions and sampling algorithms for commonly used docking programs

Method        Ligand sampling method   Receptor sampling method   Scoring function   Solvation scoring (c, d)
DOCK 4/5      IC                       SE                         MM                 DDD, GB, PB
FlexX/FlexE   IC                       SE                         ED                 NA
Glide         CE + MC                  TS                         MM + ED            DS
GOLD          GA                       GA                         MM + ED            NA

(a) Sampling methods are defined as Genetic Algorithm (GA), Conformational Expansion (CE), Monte Carlo (MC), incremental construction (IC), merged target structure ensemble (SE), torsional search (TS)

(b) Scoring functions are defined as either empirically derived (ED) or based on molecular mechanics (MM)

(c) If the package does not accommodate this option, the symbol NA (Not Available) is used

(d) Additional accuracy can be added to the scoring function using implicit solvent models. The most commonly used options are distance dependent dielectric (DDD), a parameterized desolvation term (DS), generalized Born (GB) and linearized Poisson Boltzmann (PB)

In order to evaluate a large number of ligand poses

in a reasonable amount of time, approximate scoring

functions must be used. Once again, numerous solu-

tions to this problem have been proposed, including a

variety of empirical and physics-based terms [12].

DOCK uses an energy scoring function based on the

AMBER molecular mechanics force ﬁeld [14, 16].

Only the interactions between the ligand and protein

are considered, leaving only intermolecular van der

Waals (VDW) and electrostatic components in the

function. Since the receptor is considered to be rigid,

the receptor contribution to the potential energy can

be pre-calculated and stored on a grid [16]. These

approximations enable the program to evaluate large

libraries of small molecules against a receptor in a

reasonable period of time.
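To make the grid idea concrete, the sketch below precomputes receptor-only sums on a lattice and then scores a ligand by lookup. It is a minimal illustration, not the DOCK implementation: the function names, the nearest-grid-point lookup, the distance-dependent dielectric of 4r, and the geometric-mean combination of VDW coefficients (which lets the receptor terms factor out) are all assumptions made for this example.

```python
import math
from itertools import product

COULOMB = 332.0  # approximate conversion to kcal/mol for charges in e, distances in Å


def build_grid(receptor, origin, spacing, shape):
    """Precompute receptor-only sums at each grid point.

    receptor: list of (xyz, charge, sqrt_A, sqrt_B) tuples.
    Returns {(i, j, k): (electrostatic, repulsive, attractive)}.
    """
    grid = {}
    for idx in product(*(range(n) for n in shape)):
        point = tuple(o + spacing * i for o, i in zip(origin, idx))
        es = rep = att = 0.0
        for xyz, q, sqrt_a, sqrt_b in receptor:
            r = max(math.dist(point, xyz), 0.5)  # clamp to avoid division by zero
            es += COULOMB * q / (4.0 * r * r)    # assumed dielectric of 4r
            rep += sqrt_a / r ** 12
            att += sqrt_b / r ** 6
        grid[idx] = (es, rep, att)
    return grid


def grid_score(ligand, grid, origin, spacing):
    """Score a ligand by nearest-grid-point lookup (no interpolation)."""
    total = 0.0
    for xyz, q, sqrt_a, sqrt_b in ligand:
        idx = tuple(round((c - o) / spacing) for c, o in zip(xyz, origin))
        es, rep, att = grid[idx]
        total += q * es + sqrt_a * rep - sqrt_b * att
    return total


def direct_score(ligand, receptor):
    """Reference: explicit sum over all ligand-receptor atom pairs."""
    total = 0.0
    for xyz_i, q_i, a_i, b_i in ligand:
        for xyz_j, q_j, a_j, b_j in receptor:
            r = max(math.dist(xyz_i, xyz_j), 0.5)
            total += COULOMB * q_i * q_j / (4.0 * r * r)
            total += a_i * a_j / r ** 12 - b_i * b_j / r ** 6
    return total


# Toy coordinates and parameters, chosen only for illustration.
receptor = [((0.0, 0.0, 0.0), -0.5, 900.0, 25.0), ((4.0, 0.0, 0.0), 0.3, 700.0, 20.0)]
ligand = [((2.0, 2.0, 0.0), 0.4, 800.0, 22.0)]  # sits exactly on a grid point
grid = build_grid(receptor, origin=(0.0, 0.0, 0.0), spacing=0.5, shape=(9, 9, 3))
```

For a ligand atom that lies on a grid point, `grid_score` agrees with `direct_score`; off the grid the lookup is an approximation. The grid is built once per receptor, so the cost of scoring a pose scales with the number of ligand atoms only.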

This paper describes a new version of the DOCK

program and explores the critical variables that con-

trol its ability to ﬁnd correct binding modes in a suite

of test problems. Our motivation is to provide a

modular docking package that permits the easy

development of new scoring functions, search algo-

rithms, and analysis tools. Thus, each functional unit

of the DOCK algorithm was implemented as a self-

contained and portable module that interacts with the

user through a well-deﬁned interface (Fig. 2). The

object-oriented language C++ was chosen to allow

each component of the DOCK algorithm to be

implemented as a class, which encapsulates both the

data structures and functions [17]. DOCK 5 incorpo-

rates several new routines, including parallelization of

the algorithm through an external library, modiﬁca-

tion of the ligand structural class to enable greater

user control over sampling, and clustering of the ﬁnal

results by root mean square deviation. The implica-

tions of these additions will be discussed in this

paper. Additional scoring functions and alternate

sampling techniques have been implemented as

well and will be discussed in future papers (http://dock.compbio.ucsf.edu).

Previous studies have examined the scoring function

and the matching algorithm of DOCK in detail ([14]

and equations 1–6 in [16]). In this paper, we pay par-

ticular attention to the robustness of the anchor-and-

grow portion of the DOCK algorithm. We seek to

maximize the success of complex structure prediction

by independently optimizing the various steps in the

anchor-and-grow algorithm. In the process, we also

quantify and bound the errors for cases in which ﬂex-

ible docking fails and provide direction for potential

areas of improvement.

Fig. 1 The "anchor-and-grow" conformational search algorithm. The algorithm performs the following steps: (1) DOCK

perceives the molecule’s rotatable bonds, which it uses to

identify an anchor segment and overlapping rigid layer segments.

(2) Rigid docking is used to generate multiple poses of the

anchor within the receptor. (3) The ﬁrst layer atoms are added to

each anchor pose, and multiple conformations of the layer 1

atoms are generated. An energy score within the context of the

receptor is computed for each conformation. (4) The partially

grown conformations are ranked by their score and are spatially

clustered. The least energetically favorable and least spatially diverse

conformations are discarded. (5) The next rigid layer is added to

each remaining conformation, generating a new set of confor-

mations. (6) Once all layers have been added, the set of

completely grown conformations and orientations is returned

Overview of test set

The validation of any software program requires

careful testing of all aspects of the algorithm and

assessment of its utility in all anticipated applications

of the software. Molecular docking is commonly used

in several modes, namely ligand binding mode pre-

diction, virtual screening, and prioritization of a set

of related compounds based on their afﬁnity. How-

ever, predicting the correct binding mode of a li-

gand–receptor complex is a requisite step for the

successful comparison of different ligands and

therefore will be the focus of this paper. It is

important to note, however, that predicting binding

orientations is not the only metric for the accuracy

and utility of docking algorithms. Optimizing DOCK

for applications, including ranking libraries of small

molecules and calculating absolute free energies

of binding, will be addressed in other papers

(http://dock.compbio.ucsf.edu).

Large-scale validation of docking algorithms was

long hampered by the lack of a large number of high

quality protein–ligand complex crystal structures.

Thanks to advances in automation in molecular biol-

ogy and crystallography, the number of structures in

the Protein Data Bank (PDB) continues to grow at a

rapid pace [18]. The developers of GOLD were ﬁrst to

test their program on a large number of available

structures [19]. Their test set was compiled using a

number of criteria to select candidate protein–ligand

complex structures. The protein must be of pharma-

cological interest and the ligands must be drug-like. In

addition, complexes were chosen that exhibited inter-

esting and unusual interactions between the ligand and

the protein. The ﬁnal set of 100 (more recently ex-

panded to 134) protein–ligand complexes has served as

the basis for other, larger test sets [11, 20–22].

More recently, the CCDC/Astex set compiled 305

protein–ligand complex structures by expanding the

original GOLD test set [22]. However, the authors

note that many of the new entries contain larger li-

gands that have more rotatable bonds, making this set

less drug-like. The crystal structures in the CCDC/

Astex set were evaluated for crystallographic errors

and inconsistencies, yielding a "clean" set of 224 protein–ligand complexes. To create the test set for the

DOCK validation studies, we ﬁltered out 84 complexes

with eight or more rotatable ligand bonds. In addition,

several of the complexes had properties that we felt made them inappropriate for a validation set. These issues included ligands that were covalently bound to the receptor (PDB code 1ASE), ligands with missing electron density (PDB code 1EED), and known sequence misregistry in the receptor (PDB code 3HVT). Ligands with vanadium that required VDW types in which we were not completely confident were also removed. The final test set contained 114 drug-like complexes (see Methods, Table 2).

Fig. 2 The major DOCK 5 classes and their interconnections. The bold arrows denote the connections between the classes that implement the DOCK sampling algorithm. The path traced by the arrows illustrates the sequence of operations performed upon a ligand molecule during docking. The bold lines (without arrowheads) denote functional connections between classes. These connections allow one class to call functions implemented in another. This diagram demonstrates that the classes implementing the DOCK sampling methods are heavily connected to a layer of classes that implement the physics engine: the force field, the scoring functions, and the energy minimizers. The thin lines denote hierarchical relationships between a master class and modular subclasses. These hierarchical arrangements allow new functional classes (scoring functions, energy minimizers, etc.) to be plugged into the existing DOCK algorithm in a modular fashion

Methods

DOCK 4 to DOCK 5 conversion

The new DOCK rigid body orienting code was written

as a direct implementation of the isomorphous sub-

graph matching method of Kuhl et al. [23]. All receptor

sphere pairs and atom center pairs are considered for

inclusion in a matching clique. This is more computationally demanding than the clique matching algorithm implemented in previous versions of DOCK, which restricted the clique search by binning pairs of spheres and pairs of atom centers by distance. Only sphere pairs and center pairs that were within the same distance bin were considered as potential matches [14]. The new DOCK clique

matching implementation avoids bin boundaries that

prevent some receptor sphere and ligand atom pairs

from matching, and, as a result, it can ﬁnd good mat-

ches missed by previous versions of DOCK. The rigid

body rotation code was also corrected to avoid a sin-

gularity that occurred if the spheres in the match lay

within the same plane. Both of these changes improved

orientational sampling.
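The bin-boundary problem can be shown with two nearly equal distances. The sketch below is illustrative only: the helper names, the 1.0 Å bin width, and the 0.25 Å tolerance are hypothetical values, and the tolerance test is an assumption about how pairs might be compared when no bins are used.

```python
def same_bin(d_sphere, d_atom, bin_width):
    """Binned comparison: distances match only if they fall in the same bin."""
    return int(d_sphere // bin_width) == int(d_atom // bin_width)


def within_tolerance(d_sphere, d_atom, tolerance):
    """Direct comparison: distances match if they differ by at most `tolerance`."""
    return abs(d_sphere - d_atom) <= tolerance


# A sphere-sphere distance and an atom-atom distance that differ by 0.04 Å
# but straddle the boundary between the [3, 4) and [4, 5) bins.
d_sphere, d_atom = 3.98, 4.02

binned_match = same_bin(d_sphere, d_atom, bin_width=1.0)
direct_match = within_tolerance(d_sphere, d_atom, tolerance=0.25)
print(binned_match, direct_match)
```

The binned test rejects the pair even though the two distances are almost identical, whereas the direct comparison accepts it; considering all pairs removes this artifact at the price of a larger search.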

The anchor-and-grow algorithm in the new version

of DOCK was also modiﬁed to prevent premature

pruning of the growth tree. The DOCK 5 anchor-and-grow code was completely rewritten, and its implementation differs from that of DOCK 4 in several ways. The anchor-and-grow implementation in DOCK 5

ﬁxed a series of bugs that caused some branches of the

search to be pruned when they should have been pre-

served for the next round of growth. The mechanism of

minimization of partially grown conformers was also

changed to allow the entire partial conformer to move,

instead of just the latest layer, enabling more accurate

ranking and pruning of the partially grown conformers.

In addition, the simplex minimizer was re-coded

based on the original Nelder and Mead algorithm [24].

The new minimizer implementation consistently found

lower energy minima when using the same set of 1,000

ligand orientations in a receptor, indicating that it was

performing better than the previous version (data not

shown). Together with the whole-conformer minimization described above, these changes may explain why DOCK 4 performs

more poorly when run with the DOCK 5 optimized

parameters (see below).

The ﬁnal version of the new DOCK code, including

all functions described below and all bug ﬁxes, was

posted to the DOCK web site as version 5.4.0 (http://dock.compbio.ucsf.edu). All experiments performed

with the new implementation of DOCK used this

version and will be referred to as DOCK 5 for conve-

nience. All experiments performed with the previous

version of DOCK used version 4.0.1 and will be re-

ferred to as DOCK 4.

Conversion of the DOCK codebase from C to C++

The design of the new DOCK 5 architecture balances

the speed of the code, or computational performance,

against its modularity and extensibility. The code was

developed using ANSI C++ to ensure portability across

multiple platforms [17]. The only external library used

by DOCK 5 is MPICH for parallel processing [25]. To

enable easy modiﬁcation or replacement of DOCK 5

algorithm components, the DOCK 5 class structure

was designed so that there are classes for each major

DOCK algorithm function, and these classes interface

with each other by passing instances of the DOCK 5

molecule class. Within the major functions, there are

two layers of classes: those that implement the ligand sampling functions (rigid orienting, conformational searching, and minimizing) and those that implement the underlying physics engine (the force field definitions and the scoring functions). The sampling classes are applied sequentially to the ligand molecule; the physics engine classes are utilized by the sampling classes to score the ligand–receptor interaction after each step.

Table 2 Complexes used in the test set (total of 114 complexes)

Protein data bank identifier

1A28  1COM  1FLR  1OKL  1TYL  2MCP
1A6W  1COY  1HAK  1PBD  1UKZ  2PCP
1A9U  1CPS  1HDC  1PDZ  1ULB  2PHH
1ABE  1D3H  1HSL  1PHD  1WAP  2PK4
1ABF  1D4P  1HYT  1PHG  1XID  2TMN
1ACJ  1DBB  1IMB  1PTV  1XIE  2YPI
1ACM  1DBJ  1IVB  1QCF  1YDR  3CPA
1ACO  1DG5  1LAH  1QPE  2AAD  3ERD
1AI5  1DID  1LCP  1QPQ  2ACK  3GPB
1AOE  1DOG  1LDM  1RNT  2ADA  3HVT
1AQW  1DR1  1LST  1ROB  2AK3  4AAH
1AZM  1DWB  1LYL  1RT2  2CHT  4COX
1BYG  1EBG  1MDR  1SNC  2CMD  4CTS
1C5C  1ETT  1MLD  1SRJ  2CPP  4FBP
1C5X  1F0R  1MRG  1TDB  2CTC  4LBD
1C83  1F0S  1MRK  1TNG  2DBL  5ABP
1CBX  1F3D  1MUP  1TNH  2GBP  5CPP
1CIL  1FGI  1NGP  1TNI  2H4N  6RNT
1CKP  1FKI  1NIS  1TNL  2LGS  7TIM

As a speciﬁc example of modularity, the DOCK 5

scoring functions are implemented as a master score

class with ﬁve scoring function subclasses. The master

score class acts as an interface to the scoring subclasses,

enabling the user to designate primary and secondary

scoring functions at runtime. This design was chosen

because the individual scoring functions were best

implemented as individual classes; they each require

different input and use different internal data struc-

tures. While they could have been implemented into

one large scoring class, the result would have been

quite large and disjoint. This solution was also applied

to the ligand conformational search, energy minimi-

zation and post-docking analysis classes.
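The master-class arrangement can be sketched as follows. DOCK 5 itself is written in C++ and has five scoring subclasses; this Python outline uses two hypothetical subclasses (`ContactScore`, `EnergyScore`) and toy ligand data purely to show how a master class can expose primary and secondary scoring functions selected at runtime.

```python
from abc import ABC, abstractmethod


class ScoringFunction(ABC):
    """Common interface that every scoring subclass implements."""

    @abstractmethod
    def compute(self, ligand):
        """Return a score for the ligand; lower is better."""


class ContactScore(ScoringFunction):
    # Hypothetical subclass: counts close receptor contacts (toy data).
    def compute(self, ligand):
        return -float(ligand["contacts"])


class EnergyScore(ScoringFunction):
    # Hypothetical subclass: returns a precomputed interaction energy (toy data).
    def compute(self, ligand):
        return ligand["vdw"] + ligand["electrostatic"]


class MasterScore:
    """Interface to the subclasses; primary and secondary are chosen at runtime."""

    available = {"contact": ContactScore, "energy": EnergyScore}

    def __init__(self, primary, secondary=None):
        self.primary = self.available[primary]()
        self.secondary = self.available[secondary]() if secondary else None

    def primary_score(self, ligand):
        return self.primary.compute(ligand)

    def secondary_score(self, ligand):
        if self.secondary is None:
            raise ValueError("no secondary scoring function selected")
        return self.secondary.compute(ligand)


ligand = {"contacts": 12, "vdw": -30.5, "electrostatic": -4.5}
scorer = MasterScore(primary="contact", secondary="energy")
print(scorer.primary_score(ligand), scorer.secondary_score(ligand))
```

Adding a new scoring function in this layout means writing one subclass and registering it; the sampling code that calls `primary_score` does not change.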

The DOCK 5 molecule class was designed to con-

tain the minimum information required to specify a

three-dimensional ligand conformation (atom coordi-

nates, bond connectivity, atom partial charges, atom

types and bond types) to minimize the memory re-

quired to store a molecule, allowing large arrays of

molecules to be stored in RAM. Standard C-style ar-

rays were used to store the molecular data to maximize

the speed of accessing this information.

Test set preparation

The proteins and ligands were extracted from the PDB

ﬁles, which were downloaded from the PDB website

(www.rcsb.org, Table 2). The ligands were assigned

atom types and bond types manually, and hydrogens

were added using Sybyl [26]. Subsequently, AM1-BCC

partial electrostatic charges were calculated using the

Antechamber package distributed with Amber 8 [27,

28]. The number of rotatable bonds of each of the li-

gands was measured using DOCK, and ligands

with > 7 rotatable bonds were eliminated from the test

set. We chose seven or fewer bonds to give a reasonable representation of DOCK's performance using

compounds similar to those of most interest in drug

discovery [29–31]. The ﬁnal test set that was used

consisted of 114 non-covalent protein–ligand com-

plexes [32] (Table 2).

For the proteins, we removed all waters, covalently

linked sugars, sulfates, and halogens that were not

part of the ligand. Co-factors, such as heme, ATP,

and NADPH, were kept, atom and bond types were

assigned manually, and Gasteiger–Hückel partial electrostatic charges were calculated using the "Compute" module in Sybyl [26, 33, 34]. Ions, such as

calcium and zinc, were considered to be part of the

protein and the correct charge was assigned manually.

Different VDW parameters for zinc were used

depending on the coordination state of the zinc atom

in the protein–ligand complex (Table 3). Hydrogens

were added to the protein residues using the "Biopolymer" module in Sybyl, as were AMBER partial

charges and VDW parameters [26, 37]. No additional

optimization of the protein structure was carried out

at this point.

The GRID accessory program of DOCK was used

to pre-calculate scoring function potential grids [16].

All parameters were left at their default values, except for "energy_cutoff_distance," which was set to 9,999, resulting in the inclusion of all protein atoms in the energy calculation. For matching, the dms program was used to generate a molecular surface for each receptor [38]. The SPHGEN accessory program of DOCK was used to create a negative image of the surface using spheres [39, 40]. For the purpose of this validation study, a general procedure was established to generate a sphere cluster for every protein in the test set. In this procedure, we select all the spheres found within 10 Å of any ligand atom. The receptor box delimiting the active site was calculated with the accessory program SHOWBOX using the sphere set with an additional 5 Å boundary. We have explored additional box sizes ranging from 1 Å to 9 Å padding and found that there is little sensitivity to the exact padding amount (i.e. success rate for rigid ligand docking of 80 ± 1%, time increase 10% with padding size increase, and an average test set energy of −50 ± 0.1 DOCK units). The final procedure creates sphere sets with an average of 101 docking spheres and boxes of ~20 Å. These receptor sphere sets are larger

than what one would typically use in most docking

applications. This adds stringency to our testing of

DOCK 5 by increasing the orientational and transla-

tional space that it must search.
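The sphere selection and box construction described above amount to a distance filter followed by a padded bounding box. The sketch below is a minimal illustration with hypothetical function names and toy coordinates; it does not reproduce the SPHGEN or SHOWBOX programs or their file formats.

```python
import math


def select_spheres(sphere_centers, ligand_atoms, cutoff=10.0):
    """Keep every sphere whose center lies within `cutoff` Å of any ligand atom."""
    return [
        s for s in sphere_centers
        if any(math.dist(s, atom) <= cutoff for atom in ligand_atoms)
    ]


def padded_box(points, padding=5.0):
    """Axis-aligned box enclosing `points`, enlarged by `padding` Å on every side."""
    lower = tuple(min(p[k] for p in points) - padding for k in range(3))
    upper = tuple(max(p[k] for p in points) + padding for k in range(3))
    return lower, upper


# Toy coordinates in Å, chosen only for illustration.
spheres = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
ligand_atoms = [(0.0, 0.0, 0.0)]

selected = select_spheres(spheres, ligand_atoms)
box = padded_box(selected)
print(selected, box)
```

The third sphere is 20 Å from the only ligand atom and is dropped; the box then spans the two remaining sphere centers plus 5 Å of padding in each direction.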

Table 3 Zinc VDW parameters used to generate grids

Coordination state           Radius (Å)   Well depth (kcal/mol)
Tetra-coordinated zinc (a)   1.700        0.067
Penta-coordinated zinc (b)   1.100        0.0125

(a) Parameters used for receptors with tetra-coordinated zinc ions [35]

(b) Parameters used for receptors with penta-coordinated zinc ions [36]

Optimized hydrogen locations for test set receptors

To assess the effect of hydrogen placements on dock-

ing outcomes, we also optimized the hydrogen atom

placement and hydrogen-bonding network for the

receptor using the "Dock Prep" module in Chimera

[41]. In this module, the hybridization states of the

non-hydrogen atoms of a PDB structure are deter-

mined by an enhanced version of the IDATM atom-

typing algorithm [42]. Then, all hydrogens that can be

unambiguously positioned are added to the ﬁle. To

assist in positioning ambiguous hydrogens, hydrogen-

bonding interactions are examined. The deﬁnitions of

hydrogen-bonding donors and acceptors as well as

hydrogen-bonding angle and distance criteria are based

on the values found in Mills and Dean [43]. Relevant

hydrogen bonds (H-bonds) are examined from shortest

to longest, with satisfaction of shorter bonds having

priority. For H-bonds where it is unclear which end is

acting as the donor (e.g. water–water), use of that bond

is postponed until either end is resolved further,

though any lower-priority bonds that conﬂict geomet-

rically with the postponed bond are eliminated from

consideration at that time. If neither end is resolved by

other interactions, the ambiguity is decided arbitrarily.

Should examination of H-bond interactions not com-

pletely determine the positions of all of the hydrogens

bound to a heavy atom, they are positioned to ﬁrst

satisfy potential H-bond interactions, then any

remaining hydrogens are positioned to avoid steric

clashes with other atoms. For histidine residues, nor-

mally one nitrogen will be protonated (chosen based

on H-bond/steric considerations); however if both ring

nitrogens are H-bond donors, they will both be pro-

tonated.

Selection of active site waters

All waters within 3 Å RMSD of any ligand heavy atom

were selected. These waters were included as part of

the receptor. The new receptor–water complexes were

then subjected to the same hydrogen bonding optimi-

zation as above.

DOCK parameter optimization

To characterize the performance of DOCK 5 in

regenerating known complex structures, we explored

the optimum parameters for use with rigid and ﬂexible

ligand docking strategies (see Appendix 1). Unless

otherwise stated, all docking experiments were carried

out on 2.2 GHz dual processor Opteron 828s running

Linux Fedora Core 3. The code was compiled using

open-source GNU compilers (http://www.gnu.org).

The optimized parameters have been implemented as

the defaults. We note that our primary criterion for

optimization was success in ﬁnding the proper ligand

geometry and not the CPU time required per com-

pound. Unless otherwise stated, these parameters were

used for all experiments in this paper.

Greedy clustering of conformational ensemble

The greedy clustering algorithm is designed to elimi-

nate redundant ligand orientations from consideration.

DOCK generates a set of ligand orientations that are ranked by the scoring function. The top-ranked unclustered orientation is selected, and its RMSD to every subsequent unclustered orientation in the list is calculated. If the RMSD between the two orientations falls within the clustering threshold, the second orientation is assigned to the cluster of the first. This process is repeated with the next unclustered orientation until the last unclustered orientation has been selected. Once the entire list has been processed, only the best scoring ligand pose in each cluster, designated as the cluster head, is retained.
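The greedy clustering step can be sketched in a few lines. This is an illustrative outline under stated assumptions: the function names and data layout (a list of `(score, coordinates)` pairs with lower scores being better) are hypothetical, and the RMSD is computed without any symmetry correction.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equally ordered coordinate lists."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def greedy_cluster(poses, threshold):
    """Return the cluster heads of `poses`, a list of (score, coords) pairs."""
    ranked = sorted(poses, key=lambda pose: pose[0])  # best score first
    clustered = [False] * len(ranked)
    heads = []
    for i, (_, coords_i) in enumerate(ranked):
        if clustered[i]:
            continue
        # The best remaining unclustered pose becomes a new cluster head.
        clustered[i] = True
        heads.append(ranked[i])
        # Absorb every later unclustered pose that lies within the threshold.
        for j in range(i + 1, len(ranked)):
            if not clustered[j] and rmsd(coords_i, ranked[j][1]) <= threshold:
                clustered[j] = True
    return heads


# Toy single-atom "poses" in Å, chosen only for illustration.
poses = [
    (-9.0, [(0.5, 0.0, 0.0)]),
    (-10.0, [(0.0, 0.0, 0.0)]),
    (-7.0, [(5.5, 0.0, 0.0)]),
    (-8.0, [(5.0, 0.0, 0.0)]),
]
heads = greedy_cluster(poses, threshold=2.0)
print([score for score, _ in heads])
```

Because the list is processed in score order, each cluster head is by construction the best scoring member of its cluster.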

Evaluation of MPI functionality

Parallel processing is fully integrated into the DOCK

calculation. The DOCK program starts a single master

node and a set of processing nodes. The master node

performs ﬁle processing and molecule input/output,

whereas the processing nodes perform the actual

docking calculations. If the number of processors is set

to 1, the code defaults to non-MPI behavior. With two processors, one acts as the master and only one performs docking, so there will be minimal difference in performance between 1 and 2 processors. Improved performance will only become evident with more than two nodes. It should be emphasized that the

primary beneﬁt in using DOCK 5 in parallel mode is to

reduce bookkeeping tasks associated with manually

splitting up a database into multiple chunks, which

then must be submitted to different processors indi-

vidually. DOCK 5 automatically partitions out subsets

of a database to various nodes, collates and ranks the

ﬁnal results, and takes care of all intermediate book-

keeping.
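The division of labor described above can be illustrated without MPI. The sketch below is a serial stand-in, not DOCK's implementation: `partition`, `run_master`, and the `dock` callback are hypothetical names, and the round-robin split is an assumption, since the text does not say how subsets are assigned to nodes.

```python
def partition(ligands, n_workers):
    """Split the database into n_workers subsets (round-robin; an assumed scheme)."""
    return [ligands[i::n_workers] for i in range(n_workers)]


def run_master(ligands, n_procs, dock):
    """Serial stand-in for the master node.

    One processor is reserved for the master, so n_procs processors give
    n_procs - 1 workers; with a single processor the same process does
    everything (the non-MPI path).
    """
    n_workers = max(1, n_procs - 1)
    results = []
    for chunk in partition(ligands, n_workers):
        # In a real MPI run each chunk would be docked on its own node.
        results.extend((dock(name), name) for name in chunk)
    # Collate and rank: lowest (most favorable) score first.
    return sorted(results)
```

With `n_procs` equal to 1 or 2 there is a single worker in both cases, which is consistent with the minimal difference the text reports between one and two processors.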

To gauge the performance of parallelization of the

DOCK 5 algorithm, two small subsets of the NCI

database from the ZINC database were constructed

[25, 44]. The two subsets, one containing 500 and the

other 1,000 small molecules, were filtered to have ≤5 and ≤14 rotatable bonds, respectively. The receptor

used as a target for this study was HIV-1 reverse

transcriptase in complex with nevirapine (PDB code

1VRT). Because the receptor was not part of the test

set, nevirapine was ﬂexibly redocked using the opti-

mized parameters, which yielded a ligand orientation

0.28 Å RMSD from the crystal structure orientation.

In addition, a library consisting of 1,000 copies of nevirapine was generated to remove dependence on the

order and size of the compound library. All parallel-

ization study calculations were executed at the Com-

putational Science Center at Brookhaven National

Laboratory (http://www.bnl.gov/csc) on a cluster con-

sisting of 34 nodes with dual 3.2 GHz Xeon processors

running Linux. Tests were performed using between 2

and 68 nodes. The code was compiled using open-source GNU compilers and MPI software mpich version 1.2.7 from Argonne National Laboratory (http://wwwunix.mcs.anl.gov/mpi/mpich).

Results

We ﬁrst consider the results of rigidly docking ligands,

which used a conformation taken directly from the

complex crystal structure, to the complex crystal

structure conformation of the receptor. We then pres-

ent the results of ﬂexible ligand docking tests. In each

case, we consider (a) the overall performance of each

sampling algorithm, (b) the ability of each algorithm to

reproduce the crystal ligand orientation as the top-

scoring pose, (c) the effect of the initial ligand con-

formation on the performance of the algorithm, (d) any

additional information contained in the set of all

sampled ligand orientations, and (e) the ability to ex-

tract additional information by clustering docking re-

sults. We also compare the performance of DOCK 5 to

equivalent DOCK 4 experiments. Finally, we analyze

the cases in which DOCK 5 fails to reproduce the

crystal structure and propose some directions for

improvement of both the DOCK algorithm and our

test set preparation method.

Rigid ligand docking

Overall performance

Unless otherwise noted, all experiments described in

this section involved rigid docking of the complex

crystal structure ligand conformation to the receptor

complex crystal structure. For each case in the test set,

the heavy atom RMSD between the top-scoring

docked ligand pose and the complex crystal structure

ligand pose was evaluated. A DOCK 5 run was considered successful when the RMSD between the top-scoring ligand orientation and the crystal ligand orientation was less than 2.0 Å. DOCK 5 selects the correct pose as the lowest energy structure for 79% (90/114) of the test cases using the rigid docking protocol, with an average time of 55 s per complex.
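The success criterion can be written down explicitly. The following is a minimal sketch under stated assumptions: atoms are `(element, x, y, z)` tuples listed in the same order in both poses, hydrogens are skipped, and no correction is made for symmetry-equivalent atoms. The function names are illustrative and are not taken from DOCK.

```python
import math


def heavy_atom_rmsd(docked, crystal):
    """RMSD over non-hydrogen atoms; atoms are (element, x, y, z) in matching order."""
    if len(docked) != len(crystal):
        raise ValueError("poses must list the same atoms in the same order")
    pairs = [(a, b) for a, b in zip(docked, crystal) if a[0] != "H"]
    if not pairs:
        raise ValueError("no heavy atoms to compare")
    sq = sum((a[k] - b[k]) ** 2 for a, b in pairs for k in (1, 2, 3))
    return math.sqrt(sq / len(pairs))


def success_rate(top_pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose top-scoring pose lies below the RMSD cutoff."""
    hits = sum(1 for r in top_pose_rmsds if r < cutoff)
    return hits / len(top_pose_rmsds)
```

For the counts reported here, 90 successes out of 114 complexes gives 90/114 ≈ 0.789, which is the 79% quoted in the text.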

Dependence on ligand conformation

An ensemble of ligand conformations was generated using the anchor-and-grow algorithm to vary each of the ligand's rotatable bonds. This expansion generated a conformation ensemble for each ligand that covered all torsional parameters that DOCK samples. Each generated conformation was rigidly docked to the receptor, and the results from all the dockings were binned according to the magnitude of the ligand's conformational perturbation (Fig. 3a). The curve shows a dramatic and continual decrease in the success rate as the perturbation magnitude increases, with little success for any ligand conformation more than 0.5 Å heavy atom RMSD away from the crystal conformation. Therefore, any conformation generation method must generate ligand conformations within 0.5 Å heavy atom RMSD of the crystal conformation for rigid docking to have a reasonable chance to succeed.

Analysis of total orientational ensemble

To this point, we have disregarded ‘‘near misses,’’

which we deﬁne as any generated orientations within

RMSD from the crystal structure that are close to

the top of the ranked conformation list, but are not the

best scoring poses. We can examine the remaining

poses either by including all poses that differ by a ﬁxed

energy unit from the most favorable geometry or by

including those that differ by a ﬁxed number of ranked

poses from the most favorable energy. In order to

quantify the extent of these partial successes, all gen-

erated ligand poses for each test case were preserved

and sorted by their energy scores.

An energy gap is deﬁned as the difference between

the DOCK score of the top scoring ligand orientation

and the score of a ligand ranked further down the list.

Considering all docked ligand orientations within an energy gap of 2.5 DOCK units (an average of five ligand orientations) increases the rigid ligand docking success rate to 90% for the entire test set, while considering an average of 50 orientations increases the rigid docking success rate to 99% (Fig. 4a, b). These results indicate that the orienting method samples near-crystal ligand orientations well, but the current energy scoring function cannot discriminate well between the top-ranked orientations.
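The two ways of widening the success criterion, by energy gap or by rank, can be sketched as follows. This is an illustration only: each pose is assumed to be a `(score, rmsd_to_crystal)` pair with lower scores more favorable, and the function names are hypothetical.

```python
def success_within_gap(poses, gap, cutoff=2.0):
    """True if any pose scoring within `gap` of the best score is below the RMSD cutoff."""
    best = min(score for score, _ in poses)
    return any(r < cutoff for score, r in poses if score - best <= gap)


def success_within_top_n(poses, n, cutoff=2.0):
    """True if any of the n best-scoring poses is below the RMSD cutoff."""
    ranked = sorted(poses, key=lambda p: p[0])
    return any(r < cutoff for _, r in ranked[:n])
```

With `gap=0` or `n=1` both functions reduce to the top-scoring-pose criterion used for the headline success rates.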

Geometric clustering of poses

Each ligand conformational ensemble was spatially

clustered according to inter-pose RMSD values (see

Methods section for algorithm details). After examining a range of potential cut-offs, an optimal value of 1.0 Å was chosen (Fig. 5). Using this clustering

threshold, only 15 clusterheads are required to achieve

a success rate of 99%, compared with the top 50

ranked unclustered orientations. This result is encour-

aging, suggesting that the clustering helps sort through

the conformers efﬁciently.

Flexible ligand docking

Overall performance

Unless otherwise noted, all experiments described in this section involved flexible docking of the ligand to the receptor complex crystal structure. As with the rigid docking tests, the heavy atom RMSD between the top-scoring docked ligand pose and the complex crystal structure ligand pose was evaluated for each complex in the test set. The success rate over the entire test set using the optimized flexible ligand anchor-and-grow protocol was 72% (82/114) with an average time of 314 s per complex.

Fig. 3 (a) Rigid docking success rates (n), as calculated by any conformation being within heavy atom RMSD of the complex crystal orientation, shown as a function of the ligand internal conformation perturbation magnitude (RMSD). (b) Flexible growth success rates (S), as calculated by any conformation being within heavy atom RMSD of the complex crystal orientation, shown as a function of the magnitude of the anchor perturbation (RMSD)

Fig. 4 (a) The rigid (n) and flexible (S) docking success rate as a function of the DOCK score energy gap (kcal/mol) for all conformers generated. (b) The rigid and flexible docking success rate as a function of the number of ranked conformers examined

Dependence on anchor position

The anchor-and-grow algorithm belongs to the set of

incremental construction algorithms for searching li-

gand conformational space [14, 15]. It uses a rigid

docking step for the ‘‘anchors’’ to identify likely anchor

positions (anchor orienting), and a torsion angle search

step to generate ligand conformations rooted at the

previously identiﬁed anchor positions (ﬂexible growth).

In order for ﬂexible docking to succeed, both of these

individual steps must be successful.

To measure the dependence of success rate on the precision of the anchor location, the crystal position of the anchor for each complex in the test set was perturbed randomly from 0 Å to more than 10 Å. Each perturbed anchor position was then considered as the starting point for flexible growth (Fig. 3b). With the anchor starting less than 0.5 Å heavy atom RMSD from the crystal orientation, the growth algorithm can find the experimental orientation 99% of the time. However, the results demonstrate a rapid decrease in success rate as the anchor is moved further away from its crystal structure position, decreasing to 76% at a 1.0 Å perturbation and to 54% at 2.0 Å. These data imply that if the flexible ligand docking algorithm can place the anchor within 0.5 Å heavy atom RMSD of the crystal anchor position, DOCK 5 has a very high probability of predicting the full binding mode correctly.

Analysis of total conformational ensemble

We examined the entire ensemble of conformers generated by flexible docking, as we described previously in the rigid ligand docking analysis. Considering all docked ligand conformations within a 2.5 DOCK unit energy gap (an average of five ligand orientations) increases the success rate to 82%, while considering an average of 100 orientations increases the success rate to 95% (Fig. 4a, b). Again, these results indicate that the sampling density produced by the optimized parameters is quite high, but there is little discrimination between very similar poses by the current scoring function.

Geometric clustering of poses

As with the rigid ligand docking tests, each conformational ensemble was spatially clustered according to interpose RMSD (see Methods section for algorithm details). A clustering threshold of 1.0 Å, as determined in the rigid docking section, was used (Fig. 5). Using this clustering threshold, only 50 clusterheads must be examined to reach a success rate of 95% as compared to 100 purely ranked orientations. Once again, this result is encouraging, as it requires a small number of ligand poses to be retained for rescoring with more advanced scoring functions that are better at discriminating between very similar ligand poses.

Fig. 5 The rigid (filled) and flexible (open) docking success rate as a function of the number of cluster heads examined. Clusters with heavy atom RMSD cutoffs of 1.0 Å (d), 3.0 Å (m), and 5.0 Å (r) were compared

Comparison to DOCK 4

Using the optimized DOCK 5 parameters, we per-

formed the same rigid and ﬂexible ligand docking

experiments on the entire test set using the last avail-

able version of DOCK 4. The performance of the

current implementation of DOCK 5 compared favor-

ably with the DOCK 4 performance (Table 4). We

attribute the improved accuracy in performance to

improvements outlined in the Methods Section. How-

ever, when comparing the speed of docking experi-

ments between DOCK 4 and DOCK 5, DOCK 4 is

ﬁvefold faster for rigid docking and 30-fold faster for

ﬂexible ligand docking than DOCK 5 (Table 5). We

attribute this increased calculation time to extra stages

of minimization and sampling in DOCK 5, as well as

additional overhead necessary to preserve the modu-

larity of the code (see Methods).

Comparison to other docking methods

Developers of Glide, GOLD and FlexX have also

evaluated their methods using similar test sets and

made some of their analyses available [9, 45, 46]. Based

on this data, we note that DOCK’s ﬂexible docking

success rate of 70% is comparable to Glide’s and

FlexX’s success rates of 82% and 61%, respectively

(Table 6). Unfortunately, GOLD has not posted the

results for the entire CCDC/Astex test set, so a com-

plete comparison could not be made. However, for the

subset of the test set they did report, DOCK’s success

rate of 67% is once again reasonable as compared to the

success rate of 77% for GOLD, considering that the

DOCK scoring function does not use either empirically

weighted parameters or adjustable parameters.

Analysis of successes and failures of docking

protocols

Docking failures fall into two categories: sampling (soft) and scoring (hard) failures [47]. For scoring failures, an orientation near the crystal structure was sampled in the course of the DOCK run, but the scoring function failed to rank it at the top of the list. A sampling failure indicates that the DOCK run failed to sample any orientations within 2 Å RMSD of the crystal structure. The major caveat of this classification scheme is the assumption that the model of both the receptor and the ligand, including the VDW parameters, electrostatics, and hydrogen orientations and protonation states, reflects those that occur in the experimental structure [48]. Here, we analyze the flexible ligand docking failures within the sampling-scoring classification scheme.
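The classification can be stated as a short decision rule. The sketch below assumes each sampled pose is a `(score, rmsd_to_crystal)` pair with lower scores more favorable; `classify_run` is an illustrative name, not a DOCK routine.

```python
def classify_run(poses, cutoff=2.0):
    """Label one docking run as a success, a scoring failure, or a sampling failure."""
    if not poses:
        raise ValueError("at least one sampled pose is required")
    _, top_rmsd = min(poses, key=lambda p: p[0])
    if top_rmsd < cutoff:
        return "success"  # the top-scoring pose is near the crystal pose
    if any(r < cutoff for _, r in poses):
        return "scoring failure"  # a near-crystal pose was sampled but not ranked first
    return "sampling failure"  # no near-crystal pose was sampled at all
```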

Failures resulting from receptor modeling/structural

problems

The original CCDC/Astex test set was ﬁltered for

experimental errors using a variety of metrics [22]. We

plotted the ﬂexible ligand success rate as a function of

various metrics of the quality of the X-ray structures to

determine if the selection criteria were appropriate for

testing the DOCK algorithm (Fig. 6). There appears to

be at best a weak correlation between the RMSD of

the best scoring DOCK pose and either crystal reso-

lution or b-factor of active site or backbone atoms,

indicating that the cut-offs chosen for the original set

were reasonable for docking purposes.

We next explored whether speciﬁc atom types caused

problems with the DOCK force ﬁeld terms by corre-

lating the test set success rate with the presence and type

of active site cofactor (Table 7). The only clear problem

involved metal ions in the receptor. These structures

showed a much lower success rate, accounting for nearly

half of both the rigid and ﬂexible ligand docking failures.

However, there still are a number of failures in the

portion of the test set without cofactors in the active site

that require further characterization. Unless otherwise

mentioned, all studies below were performed on this

subset, referred to as the Cofactor Free (CF) subset.

Table 4 Success based on DOCK version (see Methods)

DOCK version    Rigid ligand    Flexible ligand
4.0.1           71.9%           42.1%
5.4.0           79.0%           71.9%

Table 5 Average length of time in seconds for docking calculation using the optimized parameter set (see Appendix 1)

                       Average          Minimum    Maximum
DOCK 4 rigid lig       10.9 ± 12.1      0.99       66.8
DOCK 4 flexible lig    7.1 ± 6.04       0.44       33.5
DOCK 5 rigid lig       55.4 ± 37.5      6.0        198.0
DOCK 5 flexible lig    314.7 ± 449.8    2.0        2638.0

Table 6 Comparison of DOCK success rates to other docking programs for flexible ligand docking

Program    No. of complexes    Success    DOCK success
GOLD       43                  77%        67%
Glide      71                  82%        70%
FlexX      71                  61%        70%

For all members of the test set, the experimental

resolution of the crystal structures was too poor to

identify hydrogen atom locations. We originally mod-

eled the hydrogen atom positions using a rule-based

method. To test this scheme, we applied a more ad-

vanced hydrogen addition procedure that accounted

for steric clashes and hydrogen-bonding networks to

the CF subset (see Methods). As a follow-up, we assumed all crystallographically bound waters found within 3 Å of any ligand heavy atom were critical for

binding and included them in the receptor model as

well. We found that both of these procedures improved

the ﬂexible ligand docking success rate (Table 8).
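The water selection rule is a simple distance filter. A minimal sketch, assuming water oxygens are given as `(x, y, z)` tuples and ligand atoms as `(element, x, y, z)` tuples; the function name is hypothetical and the code is not part of the preparation pipeline used in the paper.

```python
import math


def waters_near_ligand(water_oxygens, ligand_atoms, cutoff=3.0):
    """Keep waters whose oxygen lies within `cutoff` angstroms of any ligand heavy atom."""
    heavy = [(x, y, z) for element, x, y, z in ligand_atoms if element != "H"]
    return [w for w in water_oxygens
            if any(math.dist(w, h) <= cutoff for h in heavy)]
```

Ligand hydrogens are ignored, so a water that is close only to a hydrogen is not retained.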

Failures resulting from ligand ﬂexibility

In addition to the selection criteria imposed on the

original test set, we also ﬁltered out complexes in

which the ligand had greater than seven rotatable

bonds (see Methods). We reexamined this choice on

the CF subset by plotting the rigid and ﬂexible ligand

docking success rate as a function of the number of

ﬂexible bonds (Fig. 7). As expected, the results show a

decrease in the success rate with increasing ligand size,

but with no dramatic drop-off.

Sampling versus scoring failures

We now return to classiﬁcation of DOCK failures

based on scoring and sampling classiﬁcations [47].

First, we examined the test set failure cases with active site cofactors (Table 9). Within this set, nine examples were scoring failures for both rigid and flexible ligand docking, indicating that new VDW and electrostatic parameters need to be developed for magnesium, heme groups, and some coordination states of zinc. In addition, there were three flexible ligand scoring failures that were rigid successes, thus suggesting that the flexible algorithm was able to identify additional orientations with better scores than the experimental ligand orientation. Only two flexible ligand docking cases were sampling failures. We expected flexible ligand docking sampling failures due to the increased ligand degrees of freedom compared with rigid ligand docking, but it does not appear to be a severe problem in this test set containing ligands with less than eight rotatable bonds. Finally, one of the rigid ligand docking scoring failures was a flexible ligand success. In this case, there was a large VDW clash between one of the ligand atoms and the receptor. The anchor-and-grow algorithm was able to build the ligand in the active site to avoid this clash, which the rigid ligand docking algorithm could not accommodate.

Fig. 6 Correlation of flexible ligand success (filled) and failure (striped) rates with crystallographic resolution (Å) and experimental B-factor (Å²). For active site B-factors, the active site was defined as any atom within of the experimental ligand orientation

Table 7 Success as function of active site cofactor

                              Total count    Rigid success    Flexible success
Entire test set               114            79.0%            71.9%
CF subset                     76             81.6%            76.3%
Active site cofactor          38             73.7%            63.2%
Active site metal cofactor    28             64.3%            50.0%

Table 8 Flexible ligand success as function of CF test set preparation (total of 76 complexes)

Test set preparation technique                Success
Standard                                      76.3%
Hydrogen optimization                         78.9%
Active site waters + hydrogen optimization    80.3%

We repeated this analysis with the CF subset

(Table 10). Here, there was one rigid ligand docking

sampling failure, which also failed for ﬂexible ligand

docking. Upon closer examination of the receptor site,

a residue making critical interactions with the ligand

was not resolved in the experimental complex structure

(PDB code 1A6W). We anticipate that there may not

be enough contacts to correctly place the molecule.

Seven examples were scoring failures for both rigid and

ﬂexible ligand docking. In this subset, though, we

cannot attribute the failure to unusual atom types,

indicating that the scoring function is incorrectly

modeling some portion of the energy landscape. There

were also seven scoring failures for ﬂexible ligand

docking that were successes for rigid ligand docking,

once again suggesting that the ﬂexible docking algo-

rithm identiﬁed additional orientations that scored

better than the experimental orientation.

As in the cofactor set above, there were only three additional flexible ligand docking sampling failures. One of these was also a scoring failure in rigid ligand docking, implying that this failure case may actually be due to a combination of both sampling and scoring factors. The remaining two flexible ligand docking sampling failures once again indicate that the flexible algorithm was able to identify alternative orientations that scored better than the crystal complex orientation. Finally, five rigid ligand docking scoring failures were flexible ligand docking successes, signifying that the flexible ligand docking algorithm is able to compensate for intermolecular clashes in the active site of the experimental structure that the rigid ligand algorithm simply cannot accommodate (data not shown).

Fig. 7 Rigid and flexible docking success (filled) and failure (striped) rates as a function of the number of rotatable bonds in each ligand in CF test set

Table 9 Comparison of success and failure cases of both rigid and flexible docking for complexes in test set with cofactors in active site (total of 38 complexes)

                             Rigid sampling failure    Rigid scoring failure    Rigid success
Flexible sampling failure    0                         0                        2
Flexible scoring failure     0                         9                        3
Flexible success             0                         1                        23

Table 10 Comparison of success and failure cases of both rigid and flexible docking for complexes in CF subset (total of 76 complexes)

                             Rigid sampling failure    Rigid scoring failure    Rigid success
Flexible sampling failure    1                         1                        2
Flexible scoring failure     0                         7                        7
Flexible success             0                         5                        53
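As a consistency check, the cell counts in Tables 9 and 10 can be summed to recover the totals and success rates reported in Table 7. The snippet below only transcribes the published counts; the layout (rows are flexible outcomes, columns are rigid outcomes) and the helper name `marginals` are choices made for this illustration.

```python
# Rows: flexible sampling failure, flexible scoring failure, flexible success.
# Columns: rigid sampling failure, rigid scoring failure, rigid success.
TABLE_9 = [[0, 0, 2], [0, 9, 3], [0, 1, 23]]   # complexes with active site cofactors
TABLE_10 = [[1, 1, 2], [0, 7, 7], [0, 5, 53]]  # Cofactor Free (CF) subset


def marginals(table):
    """Return (total complexes, rigid success rate, flexible success rate)."""
    total = sum(sum(row) for row in table)
    rigid_success = sum(row[2] for row in table)  # last column
    flexible_success = sum(table[2])              # last row
    return total, rigid_success / total, flexible_success / total
```

The cofactor cells sum to 38 complexes with rigid and flexible success rates of 28/38 ≈ 73.7% and 24/38 ≈ 63.2%; the CF cells sum to 76 complexes with 62/76 ≈ 81.6% and 58/76 ≈ 76.3%. These match the corresponding rows of Table 7.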

Analysis of DOCK score for docking protocols

To analyze the ability of DOCK to reproduce the li-

gand–receptor interaction energy as measured by the

DOCK scoring function, we plotted the score from the

top-ranking pose for both rigid and ﬂexible ligand

docking that were successful against the DOCK score

of the complex crystal structure (Fig. 8a, b). Each

crystal structure ligand was minimized with 1,000 steps

of the DOCK simplex minimizer. The signiﬁcant fea-

ture of both plots is that the docked pose generally

scores more favorably than the minimized crystal

structure. When rigid ligand docking is compared with

ﬂexible ligand docking, the ﬂexibly docked ligand

conformations almost always have a lower score

(Fig. 8c). These results indicate that increasing the

amount of ligand orientational and conformational

sampling increasingly identiﬁes deeper wells in the

binding energy landscape. When we plotted the ﬂexible

ligand success rate against the minimized crystal score,

there was little correlation, though DOCK was ob-

served to perform better using crystal structures with

scores more negative than –20 DOCK units (Fig. 8d).

This lack of correlation indicates that, while having a

negative interaction energy for the crystal structure

will increase the probability of DOCK ﬁnding the

correct binding orientation, this metric is not a good

predictive indicator of DOCKing success.

Database docking using MPI

Substantial speedup is observed for up to about 14 processors for the 500 compound library and 18 processors for the 1,000 compound library (Fig. 9). Interestingly, the library with 1,000 copies of nevirapine

shows almost perfectly parallel behavior up to 68

processors. We hypothesize that the speedup for the

heterogeneous libraries will continue to approach ideal

as larger libraries with increased numbers of rotatable

bonds are used, but will never be completely linear due

to overhead from input and output and lag resulting

from communication between the nodes.
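Speedup here is the single-processor wall time divided by the wall time on n processors (Fig. 9). A small sketch of that ratio and the corresponding parallel efficiency follows; the function names are illustrative, and efficiency is an added convenience measure that the paper does not report.

```python
def speedup(t_single, t_parallel):
    """Wall time on one processor divided by wall time on n processors."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("times must be positive")
    return t_single / t_parallel


def parallel_efficiency(t_single, t_parallel, n_procs):
    """Speedup per processor; 1.0 corresponds to perfectly parallel behavior."""
    return speedup(t_single, t_parallel) / n_procs
```

For example, a hypothetical run that takes 3600 s on one processor and 900 s on eight processors has a speedup of 4 and an efficiency of 0.5.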

Discussion

In this paper we have described a new version of the DOCK program. Our main purpose was to develop modular code that was straightforward to modify and which showed improved performance over the old version. By using an object-oriented language for DOCK 5, we were able to accomplish this goal, and we demonstrate, here, how routines such as the simplex minimizer and the clustering algorithm can be added or replaced without changes in other parts of the program. The successful parallelization of the calculation and the addition of post-processing clustering were simple but useful modifications to the algorithm, which encourages further investigations and algorithm experimentation.

Fig. 8 (a) Successful rigid ligand docking scores (kcal/mol) as a function of minimized crystal structure ligand scores (kcal/mol), (b) Successful flexible ligand docking scores (kcal/mol) as a function of minimized crystal structure ligand scores (kcal/mol), (c) Successful flexible ligand docking energy scores (kcal/mol) as a function of successful rigid ligand docking energy scores (kcal/mol), (d) Comparison of the RMSD between all top ranked flexible ligand orientations and the minimized crystal ligand orientations to the minimized crystal interaction energy as measured by the DOCK score (kcal/mol)

The performance of DOCK 5 on a curated test set of

114 protein–ligand complexes proved to be superior to

DOCK 4, with an over-all success rate of 79% for rigid

ligand docking and 72% for ﬂexible ligand docking,

compared with 72 and 42%, respectively for DOCK 4.

We ascribe the improvements to signiﬁcant changes in

the ﬂexible search sampling and pruning procedures

and to code corrections. The difference in performance

of DOCK 5 for rigid and ﬂexible docking is relatively

modest (79% vs. 72%) even though the search for

ﬂexible ligands includes both conﬁgurational and con-

formational spaces. Using the receptor structure to

prune the conformational search tree is clearly a rea-

sonably efﬁcient procedure. Although, the DOCK 5

code takes longer on average to run a calculation than

DOCK 4, we feel this drawback is balanced by the

improved results and the modularity of DOCK 5. Ef-

forts to increase throughput are underway.

We also wish to stress the importance of having a

high quality test set for evaluation of docking pro-

grams. X-ray crystallography typically provides essen-

tial but incomplete data for the calculations we wish to

carry out. For example, in the majority of cases,

hydrogen positions must be determined. In other cases,

critical water molecules must be placed and some

residues need to be modeled where experimental data

is lacking. The ligand conformations may also contain

signiﬁcant uncertainties. Finally, we must be aware of

the inherent assumptions underlying the force ﬁeld

parameters used in the molecular modeling steps. All

of these considerations speak to the need for careful

inspection of test set complexes. Our results demon-

strate this issue: the success rate for reconstitution of

the complex geometries was shown to depend on the

nature of the cofactors, the optimization of hydrogen

placements, and the inclusion of critical waters.

The primary result that emerges from the analysis of

the docking failures is that the current force ﬁeld re-

quires improvement, particularly in the treatment of

metal-containing cofactors. We also note that binding

conformations and conﬁgurations are determined by the

free energy of the system while we are only, at best,

estimating the enthalpy. Finally, we do identify a few

situations in ﬂexible ligand docking where the confor-

mational sampling is insufﬁcient. A test set with ligands

containing more than seven rotatable bonds would,

presumably, show an increase in these sampling failures.

We hypothesize that the key weakness is the pruning

algorithm, which we will explore in future studies.

What are the routes to improvement? An obvious

starting point is the use of more accurate methods for

preparing experimental structures, including tools for

accurate pKa prediction and de novo identification of

critical waters. For the docking calculation itself, it

would be helpful to improve VDW and electrostatic

parameters for all atoms heavier than oxygen, partic-

ularly for metal atoms. Ideally, one would directly in-

clude charge polarization and ligation geometry in the

force ﬁeld. In addition, modiﬁcations to the force ﬁeld

to better approximate the free energy—e.g. general-

ized Born or Poisson Boltzmann implicit solvation

electrostatics with surface area corrections to account

for the hydrophobic effect—would also improve mod-

eling accuracy. The DOCK 5 platform is positioned to

enable future developments and work is underway to

incorporate them into future releases.

Conclusions

In this study, we have evaluated a new version of DOCK. We have found that it predicts binding geometries of a structurally diverse test set comparably to similar algorithms and better than the previous version of DOCK. Simultaneously, we have thoroughly explored the sampling portions of the algorithm and found that the majority of binding pose prediction failures are a result of scoring function deficiencies. In further exploration of these failures, we have determined that the docking success seems to be a function of whether there are alternative orientations that score well (as defined by the scoring function) rather than the interaction energy of the experimental structure itself. Finally, we have implemented new functionalities and shown that they improve the success rates of both rigid and flexible ligand docking. In general, we have a new tool that not only performs well on a typical test set but is an ideal tool to explore any number of new algorithms in the context of the molecular docking problem.

Fig. 9 Speedup (calculated as length of time for calculation on a single processor/length of time for calculation on n processors) for docking a library of 500 different small molecules (s), 1,000 different small molecules (M), and 1,000 copies of nevirapine (S) using flexible ligand docking as a function of the number of processors in MPI mode. A perfectly parallel calculation (–) is plotted for comparison

Acknowledgements Gratitude is expressed to Dr. Bentley

Strockbine and Sudipto Mukherjee for computational assistance

with MPI calculations. Demetri Moustakas, Natasja Brooijmans,

P. Therese Lang and Irwin D. Kuntz would like to thank the NIH

grant GM 56531 (Paul Ortiz de Montellano, PI) for support. P.

Therese Lang would also like to thank the Burroughs Welcome

Foundation and the American Foundation for Pharmaceutical

Education for additional support. The authors would like to

thank Scott Brozell, Mathew Jacobson, and Brian Shoichet and

members of his group for helpful conversations.

Appendix 1

Rigid docking parameter optimization

The parameters listed in Appendix 1 control the sampling of ligand poses within the receptor active site during rigid ligand docking. The parameters that control the step sizes for the simplex minimizer (simplex_trans_step, simplex_rot_step, and simplex_tors_step) were optimized in a previous study and were held at those values [14, 49]. For the remaining parameters, the number of orientations (max_orientations) and the number of minimization steps (simplex_final_max_iterations), a series of rigid ligand docking experiments was performed to optimize the DOCK score for the top ranking pose averaged over the entire test set and the success rate, defined as the orientation of the top ranking pose being within 2 Å heavy atom RMSD from the crystal ligand. The success rate and DOCK scores initially improved as the number of orientations and the amount of minimization increased and then converged (Fig. 10). We selected the lowest converged values (1,000 orientations and 1,000 minimization steps) as optimal.
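Choosing "the lowest converged value" can be made concrete with a small helper. This is a sketch under assumptions: the convergence tolerance is a free parameter that the text does not specify, the function name is hypothetical, and the numbers in the usage note are invented for illustration rather than read from Fig. 10.

```python
def lowest_converged(values, metric, tol):
    """Smallest parameter value from which the metric stays within tol of its final value.

    `values` must be in ascending order and `metric[i]` is the result
    measured at `values[i]` (for example, the test set success rate).
    """
    if len(values) != len(metric) or not values:
        raise ValueError("values and metric must be non-empty and the same length")
    final = metric[-1]
    chosen = values[-1]
    # Walk back from the largest setting until the metric leaves the tolerance band.
    for value, measured in zip(reversed(values), reversed(metric)):
        if abs(measured - final) > tol:
            break
        chosen = value
    return chosen
```

With hypothetical success rates of 0.60, 0.70, 0.79, and 0.79 at 50, 100, 1,000, and 10,000 orientations and a tolerance of 0.01, the helper returns 1,000.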

Flexible docking parameter optimization

For the more complex flexible ligand algorithm, the parameter optimization was performed first on the anchor docking, and the best parameters were then used for optimizing the growth. The parameters that control the sampling in both these steps are listed in Appendix 2. As for rigid ligand docking, the parameters that control step sizes for the simplex minimizer were set to the previously defined optimal values.

Fig. 10 Optimization of parameters for rigid ligand docking. Parameters of 50 (h), 100 (s), 1,000 (O), and 10,000 (.) minimization steps (simplex_final_max_iterations) are examined as a function of the number of orientations (max_orientations)

Appendix 1 Description of and optimized default values for parameters that affect rigid ligand docking

| Parameter name | Parameter description | Value |
| --- | --- | --- |
| max_orientations | The number of ligand poses sampled by the rigid orienting algorithm | 1,000 |
| simplex_score_converge | The score threshold used to determine simplex convergence | 0.1 |
| simplex_trans_step | The maximum initial translation step size for the simplex minimizer | 1.0 Å |
| simplex_rot_step | The maximum initial rotational Euler angle step size for the simplex minimizer | 0.1 radian |
| simplex_tors_step | The maximum initial dihedral angle step size | 10° |
| simplex_final_max_iterations | The maximum number of simplex iterations | 1,000 |
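As a rough illustration of how the Appendix 1 defaults might be collected for a run, the sketch below renders them as plain text. The parameter names and values come from the table; the whitespace-separated `name value` line layout and the helper `render_params` are assumptions made for this example and should be checked against the DOCK documentation before use.

```python
# Values copied from Appendix 1; units are dropped, and the "name value"
# line layout is an assumption about the input-file format.
RIGID_PARAMS = {
    "max_orientations": 1000,
    "simplex_score_converge": 0.1,
    "simplex_trans_step": 1.0,
    "simplex_rot_step": 0.1,
    "simplex_tors_step": 10.0,
    "simplex_final_max_iterations": 1000,
}

def render_params(params, width=32):
    """Return one left-aligned 'name value' line per parameter."""
    return "\n".join(f"{name:<{width}}{value}" for name, value in params.items())

text = render_params(RIGID_PARAMS)
print(text)
```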

616 J Comput Aided Mol Des (2006) 20:601–619


The first step in the anchor-and-grow algorithm is ring identification or anchor segmentation. All bonds within molecular rings are treated as rigid. This classification scheme is a first-order approximation of molecular flexibility, since some flexibility can exist in non-aromatic rings. To treat phenomena such as sugar puckering and chair-boat cyclohexane conformations, the user needs to supply each ring conformation as a separate input molecule. If the molecule does not have a ring, the largest rigid segment is specified as the anchor. Additional bonds may be specified as rigid by the user. For simplicity, all runs in this study used the default of largest anchor only. If the molecule had multiple anchors of the same size, the first anchor on the anchor list was used. Once the anchor had been identified, the parameters that control the number of anchor orientations (max_orientations), the number of anchor minimization steps (simplex_anchor_max_iterations), and the cutoff for the anchor pruning (num_confs_for_next_growth) were explored. Because the anchors are substructures of the ligand, the parameter convergence was monitored as a function of the RMSD between the anchor orientation and the corresponding substructure of the crystal ligand, averaged over all generated orientations before the pruning function. When the number of anchor orientations and minimization steps were varied systematically, the number of minimization steps converged at 500 (Fig. 11a). We expected this optimized value to be lower than that for rigid docking because anchors are typically smaller than the final ligand.
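The rule that all bonds within rings are treated as rigid can be sketched on a plain molecular graph. The example below is one simple way to find ring bonds and is not taken from the DOCK source: a bond is a ring bond if its two atoms remain connected after that bond is ignored. The atom numbering and the function name `ring_bonds` are illustrative.

```python
from collections import deque

def ring_bonds(n_atoms, bonds):
    """Return the set of bonds (a, b) that lie on at least one ring."""
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    in_ring = set()
    for a, b in bonds:
        # Breadth-first search from a to b that never uses the a-b bond itself.
        seen = {a}
        queue = deque([a])
        found = False
        while queue and not found:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if current == a and neighbor == b:
                    continue  # skip the bond under test
                if neighbor == b:
                    found = True
                    break
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        if found:
            in_ring.add((a, b))
    return in_ring

# Heavy atoms of an ethylbenzene-like graph: ring atoms 0-5, side chain 6-7.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
rigid = ring_bonds(8, bonds)
print(sorted(rigid))
```

Only the six ring bonds are flagged; the two side-chain bonds are left as candidates for flexible treatment.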

Because the anchor orientations are pruned before the growth step, we used the optimized number of minimization steps while exploring the number of anchor orientations and the pruning cutoff. An anchor pruning cutoff of 100 was chosen as a balance between convergence and the length of the calculation, and this cutoff was held fixed for the final exploration of the number of orientations. The optimal number of orientations was selected to be 500 because the combination of these three variables generated the highest number of anchors near the crystal structure (Fig. 11a). Note that if the number of orientations was increased beyond the selected value, the number of anchors near the crystal structure dropped dramatically. We hypothesized that this resulted from a combination of increased sampling and pruning. The pruning function was designed to identify a representative orientation from each energy well that the matching algorithm finds (see Introduction: DOCK background). As sampling increased, the ranked orientations began to converge toward the bottom of the deepest energy wells, sampling less of the alternative, higher-energy wells. Because the pruning function is designed to supply the most diverse set of ligand orientations, fewer orientations made it through the pruning step as the sampling was increased. We felt that this effect was reducing the potential sampling for the algorithm and plan to explore alternatives in future studies.
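The interaction between ranking and diversity described above can be illustrated with a greedy rank-and-prune sketch. This is a stand-in written for this example, not the pruning function implemented in DOCK 5: poses are visited from best to worst score, and a pose is kept only if it is at least `min_rmsd` away from every pose already kept. The names `prune`, `max_keep`, and `min_rmsd`, and the toy scores, are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists."""
    squared = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(squared / len(a))

def prune(poses, max_keep, min_rmsd):
    """Keep up to max_keep poses, best score first, that are mutually distinct.

    poses is a list of (score, coords); a lower score is better.
    """
    kept = []
    for score, coords in sorted(poses, key=lambda pose: pose[0]):
        if all(rmsd(coords, other) >= min_rmsd for _, other in kept):
            kept.append((score, coords))
            if len(kept) == max_keep:
                break
    return kept

# Two poses in the same deep well and one pose in a distant, shallower well.
poses = [
    (-49.0, [(0.5, 0.0, 0.0)]),
    (-50.0, [(0.0, 0.0, 0.0)]),
    (-30.0, [(10.0, 0.0, 0.0)]),
]
survivors = prune(poses, max_keep=100, min_rmsd=2.0)
print([score for score, _ in survivors])
```

In this toy case the second-best pose sits in the same well as the best one and is discarded, so only two of the three poses survive. Extra sampling that lands in an already represented well adds no survivors under such a scheme.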

The next step in the anchor-and-grow algorithm is flexible bond identification. Each flexible bond is associated with a label defined in an editable file. This file is specified with the flex_definition_file parameter. Each label in the file contains a definition based on the atom types and chemical environment of the bonded atoms. Typically, bonds with some degree of double bond character are excluded from minimization so that planarity is preserved. Each label is also associated with a set of preferred torsion positions. The location of each flexible bond is used to partition the molecule into rigid segments. A segment is the largest local set of atoms that contains only non-flexible bonds.
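Partitioning a molecule into rigid segments amounts to finding connected components once the flexible bonds are removed. The sketch below assumes the flexible bonds are already known, and it picks the largest segment as the anchor, taking the first one on ties as in the runs described in this study. The function names and the toy molecule are illustrative only.

```python
def rigid_segments(n_atoms, bonds, flexible):
    """Split atoms into segments connected only through non-flexible bonds."""
    flexible_set = {frozenset(bond) for bond in flexible}
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        if frozenset((a, b)) not in flexible_set:
            adjacency[a].append(b)
            adjacency[b].append(a)

    seen = set()
    segments = []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        segment = []
        while stack:
            current = stack.pop()
            segment.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        segments.append(sorted(segment))
    return segments

def pick_anchor(segments):
    """Largest segment; max() returns the first maximal item, so ties go to the earliest."""
    return max(segments, key=len)

# Ring atoms 0-5 with a three-atom side chain 6-7-8; two bonds are flexible.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7), (7, 8)]
segments = rigid_segments(9, bonds, flexible=[(0, 6), (6, 7)])
anchor = pick_anchor(segments)
print(segments, anchor)
```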

Appendix 2 Description of and optimized default values for parameters that affect flexible ligand docking

| Parameter name | Parameter description | Value |
| --- | --- | --- |
| max_orientations | The number of anchor poses sampled by the rigid orienting algorithm | 500 |
| num_anchor_orients_for_growth | The maximum number of anchor orientations promoted to the conformational search | 100 |
| num_confs_for_next_growth | The number of partially grown ligand conformers stored at each stage of the flexible growth procedure | 100 |
| simplex_anchor_max_iterations | The maximum number of simplex iterations applied to the ligand anchor during anchor docking | 500 |
| simplex_grow_max_iterations | The maximum number of simplex iterations applied to the ligand during the flexible growth procedure | 500 |

Using the optimal anchor parameters, we varied the number of minimization steps for each layer of growth (simplex_grow_max_iterations) and the cutoff on the number of conformers for the growth pruning function (num_confs_for_next_growth). Because the DOCK run now creates a complete pose, we returned to using a combination of the score for the top-ranking pose averaged over the entire test set and the success rate to monitor convergence. As with rigid ligand docking, the success rate improved modestly with improved sampling and eventually converged (Fig. 11). However, although DOCK scores improved as the number of orientations and the amount of minimization increased, the values did not converge. We once again attribute this phenomenon to the pruning function. Therefore, we used the success rate to select the lowest converged values, 500 minimization steps and a growth pruning cutoff of 100 conformers, as optimal.
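Selecting the "lowest converged value" can be expressed as a small helper. The sketch below assumes a parameter is considered converged once the monitored metric stays within a tolerance of the value obtained at the largest setting tested. That tolerance rule, the function name, and the numbers are assumptions for illustration; the numbers are invented and are not results from this study.

```python
def lowest_converged(values, metric, tol):
    """Smallest setting from which the metric stays within tol of its final value.

    values must be in ascending order, with metric[i] measured at values[i].
    """
    reference = metric[-1]
    chosen = values[-1]
    # Walk down from the largest setting until the metric leaves the tolerance band.
    for value, measured in zip(reversed(values), reversed(metric)):
        if abs(measured - reference) <= tol:
            chosen = value
        else:
            break
    return chosen

# Invented success rates for four hypothetical minimization-step settings.
steps = [50, 100, 500, 1000]
rates = [0.60, 0.68, 0.72, 0.72]
best = lowest_converged(steps, rates, tol=0.01)
print(best)
```

A metric that keeps changing up to the last setting, as the DOCK scores did here, returns the largest value tested, which signals that no converged setting was found in the scanned range.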

References

1. Kopec KK, Bozyczko-Coyne D, Williams M (2005) Biochem Pharmacol 69:1133
2. Congreve M, Murray CW, Blundell TL (2005) Drug Discovery Today 10:895
3. Kraljevic S, Stambrook PJ, Pavelic K (2004) EMBO Rep 5:837
4. Schnecke V, Bostrom J (2006) Drug Discovery Today 11:43
5. Hillisch A, Pineda LF, Hilgenfeld R (2004) Drug Discovery Today 9:659
6. Posner BA (2005) Curr Opin Drug Discovery Dev 8:487
7. Alvarez JC (2004) Curr Opin Chem Biol 8:365
8. Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD (2003) Proteins 52:609
9. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS (2004) J Med Chem 47:1739
10. Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL (2004) J Med Chem 47:1750
11. Kramer B, Rarey M, Lengauer T (1999) Proteins 37:228
12. Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Nat Rev Drug Discovery 3:935
13. Shoichet BK, Bodian DL, Kuntz ID (1992) J Comput Chem 13:380
14. Ewing TJA, Kuntz ID (1997) J Comput Chem 18:1175
15. Leach AR, Kuntz ID (1992) J Comput Chem 13:730
16. Meng EC, Shoichet BK, Kuntz ID (1992) J Comput Chem 13:505
17. Lischner R (2003) C++ in a nutshell. 1st edn. O’Reilly Media, Inc, Sebastopol, CA

Fig. 11 Optimization of parameters for flexible ligand docking. (a) Parameter optimization for anchor sampling portion of flexible ligand docking. TOP: Parameters of 0 (h), 50 (s), 100 (n), and 500 (O) anchor minimization steps (simplex_anchor_max_iterations) are plotted as a function of the number of orientations (max_orientations). BOTTOM: Parameters of 50 (vertical stripes), 500 (filled), and 5,000 (diagonal stripes) anchor orientations (max_orientations) are compared using an anchor pruning cutoff (num_confs_for_next_growth) of 100. (b) Parameter optimization for growth sampling portion of flexible ligand docking. Growth pruning cutoffs (num_confs_for_next_growth) of 25 (s), 50 (n), 100 (O), and 200 (e) are plotted as a function of the number of growth minimization steps (simplex_grow_max_iterations)


18. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) Nucleic Acids Res 28:235
19. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) J Mol Biol 267:727
20. Pang YP, Perola E, Xu K, Prendergast FG (2001) J Comput Chem 22:1750
21. Perola E, Walters WP, Charifson PS (2004) Proteins 56:235
22. Nissink JW, Murray C, Hartshorn M, Verdonk ML, Cole JC, Taylor R (2002) Proteins 49:457
23. Kuhl FS, Crippen GM, Friesen DK (1984) J Comput Chem 5:24
24. Nelder JA, Mead R (1965) Comput J 7:308
25. Gropp W, Lusk E, Doss N, Skjellum A (1996) Parallel Computing 22:789
26. SYBYL, Tripos, Inc., St. Louis, Missouri, 63144
27. Case DA, Darden TA, Cheatham III, TE, Simmerling CL, Wang J, Duke RE, Luo R, Merz KM, Wang B, Pearlman DA, Crowley M, Brozell S, Tsui V, Gohlke H, Mongan J, Hornak V, Cui G, Beroza P, Schafmeister C, Caldwell JW, Ross WS, Kollman PA (2004) AMBER 8, University of California, San Francisco
28. Jakalian A, Bush BL, Jack DB, Bayly CI (2000) J Comput Chem 21:132
29. Hann MM, Oprea TI (2004) Curr Opin Chem Biol 8:255
30. Oprea TI (2002) J Comput-Aided Mol Des 16:325
31. Oprea TI, Davis AM, Teague SJ, Leeson PD (2001) J Chem Inf Model 41:1308
32. Brooijmans N (2003) Theoretical studies of molecular recognition, Graduate Department of Chemistry and Chemical Biology, University of California, San Francisco, San Francisco, CA
33. Purcell WP, Singer JA (1967) J Chem Eng Data 12:235
34. Gasteiger J, Marsili M (1980) Tetrahedron 36:3219
35. Aqvist J, Warshel A (1990) J Am Chem Soc 112:2860
36. Merz KM, Murcko MA, Kollman PA (1991) J Am Chem Soc 113:4484
37. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA (1995) J Am Chem Soc 117:5179
38. Richards FM (1977) Ann Rev Biophys Bioeng 6:151
39. DesJarlais RL, Sheridan RP, Seibel GL, Dixon JS, Kuntz ID, Venkataraghavan R (1988) J Med Chem 31:722
40. Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE (1982) J Mol Biol 161:269
41. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) J Comput Chem 25:1605
42. Meng EC, Lewis RA (1991) J Comput Chem 12:891
43. Mills JEJ, Dean PM (1996) J Comput-Aided Mol Des 10:607
44. Irwin JJ, Shoichet BK (2005) J Chem Inf Model 45:177
45. The results for the FlexX test set are available at http://www.biosolveit.de/FlexX/
46. The results for the GOLD test set are available at http://www.ccdc.cam.ac.uk/products/life_sciences/validate/gold_validation/value.html
47. Verkhivker GM, Bouzida D, Gehlhaar DK, Rejto PA, Arthurs S, Colson AB, Freer ST, Larson V, Luty BA, Marrone T, Rose PW (2000) J Comput-Aided Mol Des 14:731
48. Kuntz ID, Agard DA (2003) Adv Protein Chem 66:1
49. Gschwend DA, Kuntz ID (1996) J Comput-Aided Mol Des 10:123
