Pushing the limits of what is achievable in
protein–DNA docking: benchmarking
HADDOCK’s performance
Marc van Dijk and Alexandre M. J. J. Bonvin*
Bijvoet Center for Biomolecular Research, Science Faculty, Utrecht University, Padualaan 8, 3584 CH Utrecht,
The Netherlands
Received January 9, 2010; Revised and Accepted March 17, 2010
ABSTRACT
The intrinsic flexibility of DNA and the difficulty of
identifying its interaction surface have long been
challenges that prevented the development of efficient
protein–DNA docking methods. We have
demonstrated the ability of our flexible data-driven
docking method HADDOCK to deal with these
before, by using custom-built DNA structural
models. Here we put our method to the test on a
set of 47 complexes from the protein–DNA docking
benchmark. We show that HADDOCK is able to
predict many of the specific DNA conformational
changes required to assemble the interface(s). Our
DNA analysis and modelling procedure captures the
bend and twist motions occurring upon complex
formation and uses these to generate custom-built
DNA structural models, more closely resembling the
bound form, for use in a second docking round. We
achieve throughout the benchmark an overall
success rate of 94% of one-star solutions or
higher (interface root mean square deviation ≤4 Å
and fraction of native contacts >10%) according to
CAPRI criteria. Our improved protocol successfully
predicts even the challenging protein–DNA complexes
in the benchmark. Finally, our method is the
first to readily dock multiple molecules (N > 2)
simultaneously, pushing the limits of what is currently
achievable in the field of protein–DNA docking.
INTRODUCTION
The computational docking field is proceeding ever faster
to become an integral part of the research workflow in life
sciences. Most of the developments in docking method-
ology were pioneered in the fields of small molecule
docking and protein–protein docking (1–3). Docking
has become a valuable tool in drug design, molecular
interaction studies, NMR and X-ray structural studies,
biochemical experiment design and validation (4–6).
While docking is flourishing in these fields, less progress
has been made in the development of successful protein–
DNA docking algorithms. This is in part due to two
system-dependent problems: (i) identifying the location
of the interaction interface(s) on the DNA and (ii)
modelling DNA conformational changes while maintain-
ing a correct representation of the DNA double-helix
during a simulation. The field of protein–DNA docking
is, however, receiving renewed interest as the vital role of
protein–DNA interactions in regulating gene expression
and guarding genome integrity has become apparent (7).
As a consequence, new protein–DNA docking methods
are put forward and proven protein–protein docking
concepts are extended to deal with these systems (8–17).
We have in the past adapted our data-driven docking
method HADDOCK to deal with protein–DNA systems
(18) and showed that it is able to deal with the two main
challenges mentioned above. The ability of HADDOCK
to use experimental data to drive the docking greatly
facilitates the identification and positioning of the inter-
action interfaces during the docking (19,20). The incorp-
oration of flexibility, both explicitly during the docking
and implicitly by the use of custom-built DNA structural
models, has proven to facilitate the conformational
changes in the protein and DNA needed to establish the
complex. The protocol was initially tested by docking the
unbound structures of three monomeric transcription
factors to their respective operator half-sites [phage 434
Cro (21), phage Arc (22) and Escherichia coli Lac (23)].
The resulting near native docking solutions reproduced
many of the contacts observed in the experimental struc-
tures as well as specific conformational changes in the
DNA. Our initial protein–DNA docking protocol has
been successfully used in a number of practical applica-
tions by various laboratories worldwide (24–28). Driven
by this success we have worked on improving the method’s
performance and user friendliness by facilitating the gen-
eration of custom DNA structural models (29) as well as
*To whom correspondence should be addressed. Tel: +31 30 2533859; Fax: +31 30 2537623; Email: a.m.j.j.bonvin@uu.nl
5634–5647 Nucleic Acids Research, 2010, Vol. 38, No. 17 Published online 13 May 2010
doi:10.1093/nar/gkq222
© The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
establishing a protein–DNA docking benchmark as a test
bed for future developments (30). Next to that,
HADDOCK has been made available to the community
as a web server (http://www.haddocking.org;
http://haddock.chem.uu.nl).
Here we bring all these elements together and challenge
our method using the 47 test cases from the protein–DNA
benchmark to define the limits of our current approach.
We focus on the same two questions addressed in the
previous work (18): how successful is the method in
dealing with conformational changes upon complex for-
mation and how well is it able to identify the correct inter-
action interfaces? Compared to the three test cases used
previously, the 47 test cases in the benchmark pose some
considerable challenges. The initial test cases were all
major groove interacting transcription factors in their
monomeric form, targeting one operator half-site that
effectively spans one helical turn of DNA. The
DNA-interacting domain of these transcription factors changes
conformation only in the side-chains of the
DNA-interacting residues. The global conformational
changes in the DNA were expressed as a uniform bend
and change in groove width. In contrast, among the 47 test
cases of the benchmark, not only transcription factors but
also enzymes and structural proteins are present. These
interact using a variation of structural domains, often
involving multiple proteins, targeted to one or multiple
sites on the DNA. Furthermore, the DNA length is
often more than one helical turn. As a consequence, con-
formational changes can no longer be expressed in a
smooth and uniform way but rather as an accumulation
of local DNA bending and twisting events. To cope with
these challenges we have improved our method for the
generation of custom DNA structural models by extend-
ing its ability to capture the main bend and twist motions
occurring in the DNA upon complex formation, and by
subsequently using this information for the generation of
custom DNA models.
The new results, again, show that the use of explicit
flexibility in combination with implicit flexibility by
means of an ensemble of custom-built DNA structural
models, greatly improves the protein–DNA docking effi-
ciency with respect to rigid-body docking. This is especial-
ly clear for the intermediate and difficult categories of the
benchmark where DNA conformational changes readily
occur. The use of experimental information for the
docking of a representative subset of the benchmark, dem-
onstrates the ability of our method to identify the correct
interfaces and assemble the complex under ‘real life’
docking conditions. Furthermore, our method is the first
to dock multiple molecules simultaneously, a valuable
feature in a benchmark containing 40% of multi-
component complexes. Top ranking docking solutions
throughout the benchmark readily score one and two
stars according to the CAPRI quality criteria (31) and
three-star predictions are getting within reach for ‘easy’
test cases.
To our knowledge this is the first time a protein–DNA
docking study of such a magnitude has been performed.
Our results stress the importance of conformational adap-
tation in the docking of protein–DNA complexes and
show the potential of HADDOCK to deal with them.
We hope that they will stimulate the docking community
to put their methods to the test on the same benchmark
and foster further developments.
MATERIALS AND METHODS
Protein–DNA docking benchmark
The performance of HADDOCK was evaluated using the
coordinate files for the bound and unbound proteins of 47
protein–DNA complexes available in the protein–DNA
benchmark version 1.2 [http://haddock.chem.uu.nl/dna/benchmark.html (30)].
Canonical B-DNA 3D structural
models were built using the 3D-DART web server
[http://haddock.chem.uu.nl/dna (29)]. Their conformation
was of BII type with the sugar pucker in the C2′-endo
conformation [sugar pseudo-rotation phase angle (P)
= 155°, DNA backbone torsion angles: α = 309°,
β = 159°, γ = 37°, δ = 146°, ε = 218°, ζ = 191° and χ = 260°].
Restraints used in the docking
Ambiguous interaction restraints, based on the true
interface. Ideal ambiguous interaction restraint (AIR)
sets were generated based on the true interface(s)
of the reference complexes as follows: (i) retrieval of all
intermolecular atom–atom contacts below a cutoff of
5.0 Å; (ii) transformation of the atom–atom contacts to
their respective residue–residue counterparts, distinguishing
between three categories: amino-acid to nucleotide
base contacts, amino-acid to nucleotide sugar–phosphate
backbone contacts or amino-acid to full nucleotide
contacts. Contacts that originated from amino-acid
residues having a relative main- or side-chain solvent
accessibility of <30% as measured by NACCESS (32) were
discarded.
All residues used in creating the interaction restraint file
were defined as ‘active’. In effect we used the same pro-
cedure to generate AIRs as in the case of experimental
information with the difference that they are only
defined between the residues that are known to be in
close vicinity in the reference complex.
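The two-step conversion from atom–atom contacts to categorised residue–residue contacts can be sketched in a few lines of Python. This is a minimal illustration rather than the script used for the benchmark: the function name `ideal_air_pairs`, the flat atom tuples and the single relative accessibility value per residue (standing in for the separate main- and side-chain values reported by NACCESS) are assumptions made for the example.

```python
from math import dist


def ideal_air_pairs(protein_atoms, dna_atoms, accessibility,
                    cutoff=5.0, min_accessibility=30.0):
    """Map protein-DNA atom contacts to categorised residue pairs.

    protein_atoms: list of (residue_id, (x, y, z))
    dna_atoms:     list of (nucleotide_id, part, (x, y, z)) where part is
                   "base" or "backbone" (hypothetical input layout)
    accessibility: {residue_id: relative solvent accessibility in %}
    """
    parts_seen = {}
    for residue, p_xyz in protein_atoms:
        # Contacts from buried residues (<30% accessible) are discarded.
        if accessibility.get(residue, 0.0) < min_accessibility:
            continue
        for nucleotide, part, n_xyz in dna_atoms:
            # Step (i): keep atom-atom contacts below the distance cutoff.
            if dist(p_xyz, n_xyz) < cutoff:
                parts_seen.setdefault((residue, nucleotide), set()).add(part)

    # Step (ii): collapse atom contacts into the three residue categories.
    pairs = {}
    for pair, parts in parts_seen.items():
        pairs[pair] = parts.pop() if len(parts) == 1 else "full nucleotide"
    return pairs


protein = [("K31", (0.0, 0.0, 0.0)), ("V10", (0.0, 0.0, 0.0))]
dna = [("T5", "base", (3.0, 0.0, 0.0)),
       ("T5", "backbone", (0.0, 4.0, 0.0)),
       ("C6", "backbone", (0.0, 0.0, 4.9)),
       ("G7", "base", (6.0, 0.0, 0.0))]
pairs = ideal_air_pairs(protein, dna, {"K31": 55.0, "V10": 12.0})
```

In this toy input the buried residue V10 contributes no restraints, K31 contacts both the base and the backbone of T5 (a full nucleotide contact) and only the backbone of C6, while G7 lies beyond the 5.0 Å cutoff.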
AIRs based on experimental information. To evaluate the
performance of HADDOCK in docking protein–DNA
complexes using experimental information, we selected
six representative test cases from the ‘easy’ (3cro,
1by4), ‘intermediate’ (1azp, 1jj4) and ‘difficult’ (1a74,
1zme) categories of the benchmark. For these we collected
biochemical and biophysical information from literature
sources. Only residues that are solvent accessible in the
unbound proteins, using the same criteria as described
above, were considered. For those DNA bases shown to
be involved in specific interactions with the protein, only
atoms able to interact by hydrogen-bond or non-bonded
interactions were defined. This selection was further
subdivided into atoms facing either the major or minor
groove in case information about the protein-binding
mode was available (Table 1). In case of non-specific inter-
actions with the DNA, only the atoms of the sugar–phos-
phate backbone that are able to interact via hydrogen
bonds or non-bonded interactions were defined (Table 1).
Solvent accessible residues located in the predicted inter-
action interface, for which no experimental information
was available, were defined as ‘passive’. Residues for
which experimental information was available were
defined as ‘active’. An overview of the data used is listed
in Table 2.
DNA restraints. In order to preserve the helical conform-
ation during the flexible stages of the docking the
DNA was restrained as described before (18). For the
docking of the unbound protein(s) to a canonical
B-DNA structural model, the dihedral angles of the
sugar–phosphate backbone of the input structure (inp)
were measured and used as restraints (restricted
to α = α_inp ± 10°, β = β_inp ± 40°, γ = γ_inp ± 20°,
δ = δ_inp ± 50°, ε = ε_inp ± 10° and ζ = ζ_inp ± 50°).
For the docking of
the unbound protein(s) to the ensemble of custom-built
DNA structural models, the same protocol for sugar–
phosphate backbone restraints was used but the restraint
error values were reduced to half of those in the canonical
B-DNA case.
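The restraint windows can be illustrated with a short Python sketch. It assumes that the six torsion labels in the text correspond to α, β, γ, δ, ε and ζ with the tolerances quoted above; the function names and the dictionary layout are hypothetical and do not reflect the HADDOCK restraint file format.

```python
# Tolerances (degrees) quoted for docking to canonical B-DNA.
CANONICAL_TOLERANCE = {
    "alpha": 10.0, "beta": 40.0, "gamma": 20.0,
    "delta": 50.0, "epsilon": 10.0, "zeta": 50.0,
}


def backbone_restraints(measured, custom_models=False):
    """Return {torsion: (target, tolerance)} from measured input angles.

    For the custom-built DNA models the tolerances are halved.
    """
    scale = 0.5 if custom_models else 1.0
    return {name: (angle, CANONICAL_TOLERANCE[name] * scale)
            for name, angle in measured.items()}


def is_satisfied(angle, target, tolerance):
    """Check a dihedral against its window, respecting 360-degree periodicity."""
    difference = (angle - target + 180.0) % 360.0 - 180.0
    return abs(difference) <= tolerance


canonical = backbone_restraints({"alpha": 309.0, "epsilon": 218.0})
custom = backbone_restraints({"alpha": 309.0, "epsilon": 218.0},
                             custom_models=True)
```

Halving the windows for the second round keeps the pre-bent custom models closer to their starting geometry, which matches the stated intent of limiting DNA conformational freedom in that run.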
Docking protocol
The default protein–DNA docking protocol as described
before (18) and implemented in HADDOCK version 2.0
(33) was used for all the docking runs. This protocol
includes the random removal of 50% of the ambiguous
interaction restraints for each docking trial. Several
docking-specific modifications were made as follows.
Bound–bound docking. Only rigid-body docking was performed,
generating 2000 solutions. Protein and DNA structures
were used in the bound conformation obtained from the
reference complex.
Table 2. Definition of the AIRs based on experimental data for the six selected test cases

‘Easy’
1by4 (37); references (38–46)
Protein Act: (K31,R32)[a,b] → T5,C6,G25,A26; (E24,K27)[a,b] → G3/4,C27/28; (K72,K73,R80)[b] → A2,G3/4
Protein Pas: V34,A75,V76,Q77,R55,N56,Q59,R62
DNA Act: (T5,C6,A26,C27,C28,T29)[a], (G3,G4)[a,c,d], (A2,T24)[a,c], T23[c], G25[a,d]

3cro (21); references (47–50)
Protein Act: (K29,Q31,S32,K42-P44)[a] L35[b] → C14,T15/T23,33
Protein Pas: K9,T18-T20,G27,V28,Q30,Q34,I36,E37,V40,T41,R45,F46
DNA Act: (C6,A7,T16-T18,C24,A25,T34-T36)[a], (T32,T33)[a,b,c], (T4,A5,T13,C14,T15,T22,A23,G31)[a,c]

‘Intermediate’
1azp (51); references (52–55)
Protein Act: W24[e] → G3,G15; V26[b],M29[b],S31[e],V45[e] → C2-A4,T13-G15; (K22,T33,R42)[e] → T5-G7,C10-A12
Protein Pas: K21,R25,G27,K28,K39,T40,A44,S46,E47
DNA Act: C2,G3[f],A4,T5,C6,G7,C10,G11,A12,T13,C14,G15[f]

1jj4 (56); references (57,58)
Protein Act: (N13,K16,C17,R19-R21)[a] → T26-C27
Protein Pas: S34,T35,H37
DNA Act: (A3,C4,T30)[a], (C5,G28,G29)[a,d], (T25-C27)[c]

‘Difficult’
1a74 (59); references (60–65)
Protein Act: (H97,N122)[a,b] → A35,G36; (A54-N56,T59,R60,R65,R73,G75)[a] → T1-C7
Protein Pas: V51,G57,P58,T66,V71,H77,H100,K119
DNA Act: (T1-C7)[a,b,d], (A35,G36)[b], G40[d]

1zme (66); references (67–72)
Protein Act: (R9,R11,H12,R80,R82,H83)[a]
Protein Pas: A4,K14,K39-S43,A75,K85,K100-S114
DNA Act: (C2,G3,G4,C15,C17,G18,C20,G21,G22,C33,C34,G35)[a] (T26-C32,C9-T14)

Active residues (Act) are grouped according to the available information. Continuous stretches of residues are separated by a dash. Arrows indicate
active restraints for specific pairs of residues. Passive residues (Pas) are only defined for the protein. Since 1by4, 1jj4 and 1a74 are symmetrical dimers
only the restraints for one subunit are shown. Base-specific restraints for 3cro, 1by4, 1jj4, 1a74 and 1zme are targeted to the atoms of the nucleotides
facing the major groove and those of 1azp to those facing the minor groove (Table 1).
[a] Conserved residues.
[b] Mutagenesis data.
[c] Ethylation interference data.
[d] Methylation interference data.
[e] NMR native state amide hydrogen exchange.
[f] Raman spectroscopy.
Table 1. Nucleotide atom subsets used in the definition of AIRs

DNA base   Minor groove atoms           Major groove atoms
Thy        H3, O2, C2′                  H3, O4, C4, C5, C6, C7′
Ade        N1, N3, C2, C4′              H61, H62, N1, N7, C5, C6, C8′
Gua        H1, H21, H22, N3, C2, C4′    H1, H21, N7, O6, C5, C6, C8′
Cyt        N3, O2, C2′                  H41, H42, N3, C4, C5, C6′

Non-specific backbone atoms
Sugar–phosphate backbone   C1′, C2′, O3′, O5′, P, O1P, O2P

Subsets are defined for atoms capable of interacting using non-bonded or hydrogen bonded interactions. Individual subsets are defined for those
atoms facing the DNA major and minor groove for the four bases and for the sugar–phosphate backbone atoms.
Unbound–unbound docking using a canonical B-DNA struc-
tural model. A single component HADDOCK run was
performed using the unbound proteins to yield a better
sampling of side-chains and loop conformations. The
residues of the interface either defined based on the refer-
ence complex or on experimental information were
allowed to sample additional conformations during the
semi-flexible refinement stage. Here, semi-flexible refine-
ment signifies the combination of the semi-flexible
simulated annealing stage in torsion angle space and
the final water refinement stage in Cartesian space.
Four protein models and the original unbound protein
structure were used together with the canonical B-DNA
model as an input ensemble for unbound–unbound
docking. A total of 4000 docking solutions (every
combination of models is sampled 800 times) were
generated in the rigid body docking stage and the top
10% based on the HADDOCK score were used in the
subsequent semi-flexible refinement stage. During the
semi-flexible simulated annealing stage, the full DNA
excluding the terminal base pairs was treated as semi-flexible.
The amino-acid residues within 5.0 Å of any
partner molecule were automatically defined as semi-flexible.
Unbound–unbound docking using five custom-built DNA
structural models. The same protocol as for unbound–unbound
docking starting from canonical B-DNA was
used, with three differences: five custom-built DNA structural
models were used instead of canonical B-DNA; the
conformational freedom of the DNA in the semi-flexible
simulated annealing stage was limited by automatically
defining both the amino-acid residues and nucleotides
within 5.0 Å of any partner molecule as semi-flexible; and the
error ranges for the sugar–phosphate backbone dihedral
angles described above were reduced by half. Every
combination of protein–DNA input models is sampled
160 times in the rigid body docking stage. The procedure
for generating custom DNA structural models used as
input for this docking run is described below.
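The sampling numbers quoted for the two unbound–unbound runs can be checked with simple arithmetic. The sketch below assumes that the custom-model run reuses the same five protein starting structures as the canonical run, which the text implies by referring to 'the same protocol'; the function and variable names are illustrative only.

```python
def rigid_body_trials(n_protein_models, n_dna_models, trials_per_combination):
    """Total rigid-body solutions when every model pair is sampled equally."""
    return n_protein_models * n_dna_models * trials_per_combination


# Five protein structures (four models plus the unbound original) and one
# canonical B-DNA model, each combination sampled 800 times.
canonical_total = rigid_body_trials(5, 1, 800)

# The same five protein structures against five custom-built DNA models,
# each combination sampled 160 times.
custom_total = rigid_body_trials(5, 5, 160)

# The top 10% by HADDOCK score proceed to semi-flexible refinement.
refined = canonical_total // 10
```

Both runs therefore generate the same number of rigid-body solutions, so differences between them are not a consequence of unequal sampling.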
Generation of custom DNA structural models
The generation of five custom DNA structural models is
based on an analysis and a modelling step.
Analysis. The 10 best solutions from the top ranking
cluster, both according to the HADDOCK score, were
selected. The DNA structures in these solutions were
analyzed using 3DNA (34,35) and the DNA bend
analysis algorithm used in the 3D-DART server (29).
This resulted in average parameter values for the six
base pair (step) parameters (36) for every base pair
(step) in the structure. These describe the conformation
of the DNA. The average global bend vector with
respect to a common reference frame between every suc-
cessive base pair in the structures was calculated by 3D-
DART. This information was used in the modelling stage.
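The averaging in the analysis step can be sketched as follows. The example assumes the per-model base pair step parameters have already been extracted into plain dictionaries (a hypothetical layout, not the 3DNA output format), and it uses the sample standard deviation, a choice the text does not specify.

```python
from statistics import mean, stdev


def average_step_parameters(models):
    """Average each parameter at each base pair step over all models.

    models: list of models; each model is a list with one dict per base
            pair step, mapping parameter name to value.
    Returns one dict per step mapping parameter name to (mean, stdev).
    """
    summary = []
    for step_index in range(len(models[0])):
        step_summary = {}
        for name in models[0][step_index]:
            values = [model[step_index][name] for model in models]
            step_summary[name] = (mean(values), stdev(values))
        summary.append(step_summary)
    return summary


# Two toy models with a single base pair step each.
toy_models = [
    [{"roll": 2.0, "twist": 36.0}],
    [{"roll": 4.0, "twist": 34.0}],
]
averages = average_step_parameters(toy_models)
```

In the procedure described here the input would be the 10 best solutions of the top-ranking cluster rather than two toy models.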
Modelling. The modelling of custom DNA structures is
based on the progressive introduction of global and
local DNA conformational changes to a canonical
B-DNA starting model.
(i) A default set of base pair (step) parameters repre-
senting a canonical B-DNA conformation with the
same sequence as the reference structure is
generated by 3D-DART using the ‘fiber’ utility of
the 3DNA software suite.
(ii) The Roll and Tilt values in the default set are
updated by 3D-DART to reflect the average
global bend vector for every base pair step in the
sequence. The central base pair is used as origin of
the global reference frame and default Twist values
are used for correcting the vectors' direction relative
to the reference frame. The introduced bend vector
between base pairs is scaled, enabling sampling of
conformation change beyond the limits of the values
defined by the average ± the standard deviation
determined in the analysis stage. The scaling factor
is set between 2.0 and 3.0 for those ensembles that
show little deviation from a canonical helix and
between 4.0 and 6.0 for the remaining test cases.
For the docking of 1a74 using experimentally
derived restraints the scaling factor was set to 10.0
to match the amount of DNA bend to the curved
interaction surface of the protein (see ‘Results’
section).
(iii) All base pair step parameters are updated to reflect
the average values as determined by the analysis
stage, resulting in a new weighted parameter $P_{Wxi}$
at base pair step $i$ defined as follows:

$$P_{Wxi} = \sqrt[2]{\sigma_{pi}/\sigma_{p}} \cdot S \cdot P_{xi}, \qquad (1)$$

where $P_{xi}$ is the average value for the given parameter
at base pair step $i$ obtained from the analysis stage,
$\sigma_{pi}$ defines the standard deviation for the given
parameter at base pair step $i$ and $\sigma_{p}$ is the standard
deviation for the given parameter for all base pair
steps. $S$ is a parameter-specific scaling factor that
compensates for the over- or under-estimation of a
given parameter as a result of the HADDOCK semi-flexible
refinement stages. $S$ was set to: twist: 0.8, roll:
0.8, tilt: 0.8, rise: 0.0, slide: 0.2 and shift: 0.8.
The new value $P_{ni}$ for the parameter at base pair
step $i$ is now calculated as follows:

$$P_{ni} = P_{d} + (P_{Wxi} - P_{d})\,V \qquad (2)$$

Here $P_{d}$ is the default value from canonical B-DNA
for the given parameter at base pair step $i$ and $V$ is
a variance value used to sample the parameter
above or below its adjusted average (set to 0.8 by
default).
(iv) The default base pair parameters are updated in
the same way as for the base pair step parameters.
The base pair parameter-specific scaling factors
(S) used are: shear: 1.0, stretch: 1.0, stagger: 1.0,
buckle: 1.0 and propeller twist: 1.0. The
variance parameter Vis set to 0.8 by default.
(v) The updated list of base pair (step) parameters is
used to build a 3D DNA structure using the same
parameters for the sugar pucker and phosphate
backbone dihedral angles as in the case of canonical
B-DNA.
Analysis
The quality of the generated solutions was evaluated
using the CAPRI criteria expressed as stars: three stars
(high quality): Fnat >0.5, l- or i-r.m.s.d <1.0 Å;
two stars (medium quality): Fnat >0.3, l-r.m.s.d <5.0 Å
or i-r.m.s.d <2.0 Å; one star (acceptable quality):
Fnat >0.1, l-r.m.s.d <10.0 Å or i-r.m.s.d <4.0 Å. Fnat is
the fraction of native contacts within a 5 Å cutoff, i-r.m.s.d
is the interface backbone (Cα, P) r.m.s.d and l-r.m.s.d is
the ligand backbone r.m.s.d calculated by superimposition
on all phosphate atoms of the reference DNA and subsequently
on all Cα atoms of the reference protein. For the
results in Figure 4 and the docking using experimentally
derived restraints, the reported r.m.s.d values were
calculated after superimposition of all heavy atoms of
the reference belonging to either the DNA, the protein,
the interface or the full complex. The r.m.s.d values were
calculated using ProFit (A.C.R. Martin,
http://www.bioinf.org.uk/software/profit).
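The star assignment can be expressed as a small classifier. This is a sketch of the thresholds as stated in this section, reading each comma as a logical 'and' between the Fnat condition and the r.m.s.d conditions; the function name `capri_stars` is an assumption and this is not the evaluation script used in the study.

```python
def capri_stars(fnat, l_rmsd, i_rmsd):
    """Return the CAPRI star rating (0-3) for one docking solution.

    fnat:   fraction of native contacts (0.0-1.0)
    l_rmsd: ligand backbone r.m.s.d in angstrom
    i_rmsd: interface backbone r.m.s.d in angstrom
    """
    if fnat > 0.5 and (l_rmsd < 1.0 or i_rmsd < 1.0):
        return 3  # high quality
    if fnat > 0.3 and (l_rmsd < 5.0 or i_rmsd < 2.0):
        return 2  # medium quality
    if fnat > 0.1 and (l_rmsd < 10.0 or i_rmsd < 4.0):
        return 1  # acceptable quality
    return 0      # incorrect


def fraction_acceptable(solutions):
    """Fraction of (fnat, l_rmsd, i_rmsd) tuples scoring one star or higher."""
    rated = [capri_stars(*solution) for solution in solutions]
    return sum(1 for stars in rated if stars >= 1) / len(rated)
```

Checking the categories from the strictest downwards means a solution with a high Fnat but a larger r.m.s.d falls through to the next category instead of being rejected outright.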
Hardware
HADDOCK docking runs were performed on a Transtec
(Transtec AG, Tübingen, Germany) computer cluster
operating with 48 Opteron processors (2.0 GHz, 64-bit).
As a measure of CPU requirements, one complete run
starting with 4000 structures in the rigid-body docking
stage could be performed in 4 h on 48 processors.
RESULTS
The power of HADDOCK as a method relies, among
other things, on its use of AIRs and explicit flexibility. An
AIR defines that a residue on the surface of a biomolecule
should be in close vicinity to another residue or group of
residues on the partner biomolecule when they form the
complex. By default this is described as an ambiguous
distance restraint between all atoms of the source
residue to all atoms of all reference residue(s) that are
assumed to be in the interface in the complex. The effective
distance between all those atoms, $d^{\mathrm{eff}}_{iAB}$, is calculated as
follows:

$$d^{\mathrm{eff}}_{iAB} = \left( \sum_{m_{iA}=1}^{N_{A\mathrm{atom}}} \sum_{k=1}^{N_{\mathrm{res}B}} \sum_{n_{kB}=1}^{N_{B\mathrm{atom}}} \frac{1}{d^{6}_{m_{iA} n_{kB}}} \right)^{-1/6}. \qquad (3)$$

Here $N_{A\mathrm{atom}}$ indicates all atoms of the source residue on
molecule A, $N_{\mathrm{res}B}$ the residues defined to be at the interface
of the reference molecule B, and $N_{B\mathrm{atom}}$ all atoms of a
residue on molecule B. The $1/r^{6}$ summation somewhat
mimics the attractive part of the Lennard–Jones potential
and ensures that the AIRs are satisfied as soon as any two
atoms of the biomolecules are in contact. The AIRs are
incorporated as an additional energy term to the energy
function that is minimized during the docking. The am-
biguous nature of these restraints easily allows experimen-
tal data that often provide evidence for a residue making
contacts to be used as driving force for the docking. As
such the AIRs define a network of restraints between the
possible interaction interface(s) of the molecules to be
docked without defining the relative orientation of the
molecules, minimizing the necessary search through con-
formational space needed to assemble the interfaces.
Because the AIRs are part of the energy function they
might also contribute to induce the conformational
changes during the flexible stage of the docking.
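The behaviour of the effective distance is easiest to see numerically. The sketch below assumes the outer exponent of Equation 3 is −1/6, as in standard r⁻⁶ averaging of ambiguous distance restraints; the function name and the coordinate layout are hypothetical and this is not HADDOCK code.

```python
from math import dist


def effective_distance(source_atoms, partner_residues):
    """Effective distance between one source residue and partner residues.

    source_atoms:     list of (x, y, z) for the source residue on molecule A
    partner_residues: list of residues on molecule B, each a list of (x, y, z)
    """
    total = 0.0
    for atom_a in source_atoms:
        for residue in partner_residues:
            for atom_b in residue:
                # Each atom pair contributes 1/d^6 to the sum.
                total += dist(atom_a, atom_b) ** -6
    return total ** (-1.0 / 6.0)


source = [(0.0, 0.0, 0.0)]
near_and_far = [[(3.0, 0.0, 0.0)], [(30.0, 0.0, 0.0)]]
d_eff = effective_distance(source, near_and_far)
```

Because of the sixth-power weighting, the sum is dominated by the shortest atom pair: here the partner at 30 Å barely changes the result, so the effective distance stays just below 3 Å. One close contact is therefore enough to satisfy the restraint, whichever partner residue provides it.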
To objectively answer the question: ‘how successful is
HADDOCK in dealing with conformational changes
upon complex formation?’ the effects of the quality and
quantity of AIRs on complex formation and conform-
ational change should be kept to a minimum. This was
realized by constructing ideal AIR restraint sets based on
the true interface(s) of the reference complexes (see
‘Materials and Methods’ section). Using these restraints
we first evaluated the ability of HADDOCK to recon-
struct the complex from its components in their bound
conformation. Challenges in reconstruction due to struc-
tural characteristics, the inability of the restraints to drive
correct complex formation or selection of top ranking so-
lutions due to scoring problems can be identified at this
stage. Next we used the same restraints to drive the
docking between the unbound protein and a canonical
B-DNA 3D structural model using our two-stage
protein–DNA docking approach. We focused on the two
stages individually, first evaluating the effects of explicit
flexibility on the docking by comparing the docking
solutions from rigid body refinement with those after
semi-flexible refinement. Subsequently we analyzed the
conformation of the DNA in the final docking solutions.
Here, the focus was on the ability of HADDOCK to intro-
duce those specific DNA conformational changes in terms
of DNA bending and twisting that can lead to the final
conformation of the DNA in the complex. With this in-
formation an ensemble of custom DNA structural models
was generated using a modified protocol of our 3D-
DART DNA modelling web server (see ‘Materials and
Methods’ section). The resulting models were used as
input for a second, ‘refinement’, docking run. The results
were compared with those of the previous run starting
from a canonical B-DNA structural model to analyze
the effect of this implicit treatment of flexibility. Finally,
the same two-stage docking protocol was applied to a
subset of six test cases from the benchmark using AIR
restraints based on experimental information obtained
from literature sources.
Bound–bound docking
A bound–bound docking experiment is essentially an
exercise of separating the reference complex into its
individual biomolecules and reconstructing it again.
As the different components are already in their bound
conformation flexibility is not required and only rigid-
body docking needs to be performed. The ability of
HADDOCK to sample conformational space in search
of the correct interaction interface(s) using ideal AIR
restraints was evaluated using the CAPRI star ranking
as a quality measure commonly used in protein–protein
docking (31). These criteria define one-star predictions as
‘acceptable’, two-star as ‘medium’ and three-star as ‘high’
quality with respect to their reference structure (see
‘Materials and Methods’ section).
The results illustrate that for 75% of the test cases three-
star solutions are generated (Figure 1, dark-grey bars).
For the first half of the test cases (left half of Figure 1)
more than 95% of the solutions ranked one-star or higher
but for the remaining, a sharp decline in the total number
of star-ranked solutions was observed. The latter group
of test cases corresponds mostly with the ‘intermediate’
and ‘difficult’ categories of the benchmark. They are
characterized by larger and more segmented interface(s).
Many of them require rearrangements of protein domains,
loops and secondary structure elements at the interfaces
upon interaction to generate a well-packed complex.
These, for instance, involve enzymes that perform their
catalytic function on single nucleotides that are flipped
out of the helix into a catalytic pocket of the protein
(1emh, 7mht), restriction enzymes clamping themselves
around the DNA (3bam,1rva) or proteins with complex
dimerization interfaces (1tro, 1f4k). Effective docking of
the bound conformation of these cases is hindered by non-
bonded repulsions associated with interface penetration
and the correct alignment of the segmented interfaces
during the rotation and translation stages of the rigid
body refinement. This limits the efficiency of the rigid-body
bound–bound docking and in part explains the
lower total number of star-ranked solutions for these
cases.
Despite the differences in total number of star-ranked
solutions, the 10 best solutions selected based on the
HADDOCK score coincided in all cases with the best
solutions based on the CAPRI criteria. This indicates that
the HADDOCK scoring function at this stage is sufficient
to retrieve the best solutions.
Unbound–unbound docking starting from a canonical
B-DNA structural model
We proceeded with the docking of the unbound conform-
ation of the proteins with canonical B-DNA models using
ideal AIRs. To increase the sampling of conformational
space for the proteins, especially those that use flexible
loops to interact with DNA grooves, we first performed
a simulated annealing on the interface residues followed
by a refinement in explicit water. This procedure resulted
in an ensemble of five structures, including the original
unbound protein, sampling different conformations of
the interface. In 66% of the cases, conformations closer
to the bound conformation than the unbound reference
protein were sampled. The protein–DNA docking
protocol, at this stage, effectively incorporates two
modes of flexibility: implicit sampling by means of the
ensemble of protein starting structures and explicit
sampling of protein and DNA conformational space
during semi-flexible refinement.
Figure 2 illustrates the docking results using only rigid-
body docking (A) and the effect of a subsequent semi-
flexible refinement (B). Here, the cumulative bar graphs
Figure 1. Cumulative bar graph expressing the quality of the docking solutions according to the CAPRI star rating for all 2000 bound–bound
rigid-body docking solutions (y-axis: percentage of acceptable solutions, one star or higher; x-axis: complex PDB id). Complexes are sorted according to the total number of obtained stars. CAPRI criteria are defined as: three stars (high
quality): Fnat >0.5, l-r.m.s.d or i-r.m.s.d <1.0 Å; two stars (medium quality): Fnat >0.3, l-r.m.s.d <5.0 Å or i-r.m.s.d <2.0 Å; one star (acceptable
quality): Fnat >0.1, l-r.m.s.d <10.0 Å or i-r.m.s.d <4.0 Å. Fnat is the fraction of native contacts within a 5 Å cutoff.
Nucleic Acids Research, 2010, Vol. 38, No. 17 5639
show the percentage of CAPRI one-star (white bars) and
two-star solutions (grey bars) over all rigid-body (4000)
and refined (400) solutions. Overall, 96% of the cases
improve due to explicit flexibility. For a number of
complexes, one- and two-star solutions were already
obtained after rigid-body docking. In all cases, except for
1dfm, the number of one- or two-star solutions increased
significantly after semi-flexible refinement. The number of
star-ranked solutions obtained after rigid-body docking
and their subsequent improvement due to explicit flexibility
clearly divide the complexes into three groups that
coincide reasonably well with the ‘easy’, ‘intermediate’
and ‘difficult’ categories of the benchmark. For the ‘easy’
category the inclusion of explicit flexibility readily results
in a shift from one- to two-star solutions; for the ‘intermediate’
category the number of one-star solutions greatly
improves; and for the ‘difficult’ category one-star solutions
are often only achieved because of explicit flexibility.
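The star rating used throughout these results can be written down compactly. The following is a minimal sketch of the thresholds quoted in the caption of Figure 1, assuming that the comma in each criterion means "and" (Fnat must pass together with at least one of the two r.m.s.d thresholds). The function names are illustrative and are not part of HADDOCK.

```python
def capri_stars(fnat, l_rmsd, i_rmsd):
    """Return 0-3 stars using the thresholds quoted in the Figure 1 caption.

    Assumption: Fnat must pass AND at least one of l-r.m.s.d / i-r.m.s.d
    must pass for a given star level.
    """
    if fnat > 0.5 and (l_rmsd < 1.0 or i_rmsd < 1.0):
        return 3  # high quality
    if fnat > 0.3 and (l_rmsd < 5.0 or i_rmsd < 2.0):
        return 2  # medium quality
    if fnat > 0.1 and (l_rmsd < 10.0 or i_rmsd < 4.0):
        return 1  # acceptable quality
    return 0


def percent_acceptable(solutions):
    """Percentage of (fnat, l_rmsd, i_rmsd) tuples rated one star or higher."""
    if not solutions:
        return 0.0
    hits = sum(1 for s in solutions if capri_stars(*s) >= 1)
    return 100.0 * hits / len(solutions)
```

Applied to every solution of a docking run, `percent_acceptable` gives the kind of quantity plotted per complex in the cumulative bar graphs.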
Unbound–unbound docking starting from custom-built
B-DNA structural models
The previous docking results show the improvements that
can be obtained when using explicit flexibility versus rigid-
body docking. In all cases, the DNA and the proteins
could adapt their conformation to better interact with
each other. For the DNA, these conformational changes
range from small local changes in helical bend and groove
width, while maintaining a relatively straight helix, to larger
global changes that effectively bend and twist the DNA
structure. However, the amount of conformational space
that can be sampled during the semi-flexible refinement
stage is limited. Starting from a canonical B-DNA structural
model, the semi-flexible refinement stage improved
the DNA model on average by 0.84 ± 0.36 Å all-heavy-atom
r.m.s.d with respect to the reference. This clearly
cannot account for the often large DNA conformational
Figure 2. Cumulative bar graphs expressing the quality of the best 400 docking solutions according to the HADDOCK score in terms of CAPRI one-star (grey) and two-star (white) results, for the two-stage unbound–unbound protein–DNA docking using true interface derived restraints (axes: complex PDB id; percentage of 1 and 2 star solutions). Results are presented for: the rigid-body docking starting from a canonical B-DNA model (A); after the semi-flexible refinement (B); and after semi-flexible refinement using an ensemble of custom DNA 3D structural models (C). Complexes are sorted according to the total number of obtained stars in (B), reclassifying the benchmark into ‘easy’, ‘intermediate’ and ‘difficult’ categories. See caption of Figure 1 for the definition of the CAPRI criteria.
changes observed in the benchmark (ranging from 3 up
to 10 Å).
The amount and consistency of the DNA conformational
changes that did occur during semi-flexible refinement
can, however, provide an indication of the extent of
conformational change to be expected in the final complex,
as we have shown before (18). By analyzing the conformational
changes in the top 10 solutions of the best cluster,
both selected based on the HADDOCK score, we
generated five new DNA structural models with custom
conformations reflecting the conformational changes that
took place in the DNA during the first docking round for
every test case (see ‘Materials and methods’ section).
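The exact analysis and modelling procedure is described in the ‘Materials and methods’ section and is not reproduced here. Purely as an illustration of the idea of turning the conformational changes observed in the top 10 solutions into a small set of target values, the sketch below spreads a chosen number of models over the mean plus or minus a multiple of the standard deviation of one DNA parameter (for example a global bend angle). The function name, the even spacing and the `max_sd` parameter are assumptions made for this example, not the authors' implementation.

```python
from statistics import mean, stdev


def sample_parameter(values, n_models=5, max_sd=1.0):
    """Spread n_models target values evenly over mean +/- max_sd * SD.

    values   : parameter measured in each analysed solution (e.g. bend angle)
    n_models : number of custom models to build (hypothetical default of 5)
    max_sd   : how many standard deviations to sample on each side
    """
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation, needs >= 2 values
    if n_models == 1:
        return [mu]
    step = 2.0 * max_sd / (n_models - 1)
    return [mu + (-max_sd + i * step) * sd for i in range(n_models)]
```

With bend angles of 10, 12, 14, 16 and 18 degrees, the middle model sits at the 14 degree average and the outer models one standard deviation on either side; a larger `max_sd` widens the sampled range.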
The effect of using a custom-built DNA structural
ensemble on the docking results obtained after semi-flexible
refinement is illustrated in Figure 2C. Again, the
cumulative bar graph shows the percentage of CAPRI
one-star (white bars) and two-star solutions (grey bars)
among all (400) refined docking solutions according to
the HADDOCK score.
In a number of cases there is a marked increase in one-
and/or two-star solutions due to the use of the ensemble,
while in other cases there is no improvement or even a
reduction. Because the ensemble contains custom-built
DNA structures in different conformations, it is possible
that one or several of these are less successful in sampling
relevant conformational space than the canonical B-DNA
model used in the first run. However, if even only one of
the five models is significantly better than canonical
B-DNA, and the scoring and clustering stage selects
solutions obtained from this model, then an improvement
is achieved compared to only semi-flexible refinement.
Figure 3 better illustrates the results by
individual graphs showing for every test case the various
r.m.s.d values and fraction of native contacts for the 10
best solutions of the top-ranking cluster, both selected
based on the HADDOCK score. The figure shows statis-
tics for the corresponding solutions after semi-flexible
refinement, the solutions from the rigid-body stage
starting from canonical B-DNA, and the solutions after
semi-flexible refinement using an ensemble of custom-built
DNA starting structures (source data can be found in
Supplementary Tables S1–S3 of the Supplementary
Data). With respect to the best 10 solutions, our
Figure 3. All heavy atom r.m.s.d values (Å) from the reference complex [(A) DNA only, (B) full complex, (C) interface] and fraction of native contacts [Fnat, (D)] for the 10 best solutions of the best cluster, both selected based on the HADDOCK score, after rigid-body docking (open squares) and semi-flexible refinement (closed circles) starting from a canonical B-DNA structural model and after semi-flexible refinement (open triangle) starting from an ensemble of custom-built DNA models. Test cases are grouped into the ‘easy’, ‘intermediate’ and ‘difficult’ categories.
two-stage docking protocol improved the results in 91%
of the cases relative to rigid-body docking. The use of an
ensemble of custom-built DNA structural models (the
second stage of the docking) further improved the
results in 72% of the cases compared to the first stage
only. For most complexes there is a marked improvement
in terms of r.m.s.d from the reference complex when
progressing from rigid-body docking to the use of an
ensemble of custom-built DNA structural models. The
improvement in DNA, interface and all heavy-atom
r.m.s.d becomes more significant with the increasing diffi-
culty of the test cases. This trend is to be expected as the
conformational changes between unbound and bound
structures are small in the ‘easy’ category and become
more pronounced in the ‘intermediate’ and ‘difficult’
categories of the benchmark. These results show the effi-
ciency of the DNA modelling procedure in capturing the
essential motions that occur in the DNA upon com-
plex formation. The fraction of native contacts improves
significantly throughout the benchmark even when the so-
lutions improve little in terms of r.m.s.d. Apart
from this, the convergence in the 10 best solutions in
general improves, which is apparent in the smaller
standard deviations (Figure 3) and an improved
clustering (Supplementary Table S3, Supplementary
Data).
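For readers who want to reproduce the fraction of native contacts discussed above, the following is a minimal sketch, assuming that a contact is a protein residue and a DNA residue with at least one pair of heavy atoms within the 5 Å cutoff mentioned in the caption of Figure 1. The data layout (dictionaries mapping a residue identifier to a list of coordinates) and the function names are choices made for this example.

```python
import math


def contacts(protein, dna, cutoff=5.0):
    """Set of (protein_residue, dna_residue) pairs in contact.

    protein, dna: dict mapping a residue id to a list of (x, y, z)
    heavy-atom coordinates in Angstrom.
    """
    pairs = set()
    for p_res, p_atoms in protein.items():
        for d_res, d_atoms in dna.items():
            # A residue pair is in contact if any atom pair is within cutoff.
            if any(math.dist(a, b) <= cutoff for a in p_atoms for b in d_atoms):
                pairs.add((p_res, d_res))
    return pairs


def fnat(ref_protein, ref_dna, model_protein, model_dna, cutoff=5.0):
    """Fraction of reference (native) contacts that are present in the model."""
    native = contacts(ref_protein, ref_dna, cutoff)
    if not native:
        return 0.0
    model = contacts(model_protein, model_dna, cutoff)
    return len(native & model) / len(native)
```

Because Fnat counts residue pairs rather than atomic positions, it can improve even when the r.m.s.d changes little, which is consistent with the trend described above.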
Unbound–unbound docking using experimentally derived
restraints
In a ‘real-life’ docking situation, AIRs are typically
defined based on experimental data or interface predic-
tions (19,20). The quality and quantity of available data
can influence the correct assembly of the interaction inter-
face(s) and the conformational changes brought about in
the flexible stages of the docking. To evaluate the perform-
ance of our two-stage protein–DNA docking protocol
under these circumstances we selected six representative
test cases from the ‘easy’, ‘intermediate’ and ‘difficult’
categories of the benchmark (two of each). These are, re-
spectively, the protein–DNA complexes formed by the
phage 434 Cro (3cro) transcription factor and retinoid X
receptor (1by4), the hyperthermophile chromosomal
protein SAC7D (1azp) and papillomavirus type 18 E2
(1jj4) protein, the homing endonuclease I-PpoI (1a74)
and the proline utilization transcription activator PUT3
(1zme). For these we defined AIRs based on experimental
data collected from literature sources (see ‘Materials and
methods’ section). Docking the protein and DNA in their
bound conformation (Table 3, bound-rigid) using rigid-body
energy minimization only illustrates that the AIRs
defined based on experimental data are also able to
reconstruct the correct interaction interface(s) in all cases,
resulting in high-quality predictions. The overall results for the
unbound docking again show a significant improvement in
terms of r.m.s.d from the reference complexes and fraction
of native contacts when progressing from rigid body
docking to semi-flexible refinement and finally a second
docking round starting from an ensemble of custom-
built DNA structural models (Table 3). The best
docking solutions superimposed onto their reference struc-
tures are presented in Figure 4.
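The progression across docking stages can be checked directly from the CAPRI column of Table 3. The sketch below transcribes the one-, two- and three-star counts in the top 10 for the three unbound stages and tallies them; the dictionary layout and function names are choices made for this example.

```python
# (one-star, two-star, three-star) counts in the top 10, transcribed from Table 3.
TABLE3_CAPRI = {
    "1by4": {"unbound rigid": (4, 0, 0), "unbound flex": (5, 0, 0), "DNA lib": (4, 3, 0)},
    "3cro": {"unbound rigid": (10, 0, 0), "unbound flex": (6, 2, 0), "DNA lib": (3, 7, 0)},
    "1azp": {"unbound rigid": (0, 0, 0), "unbound flex": (0, 0, 0), "DNA lib": (5, 0, 0)},
    "1jj4": {"unbound rigid": (3, 0, 0), "unbound flex": (6, 0, 0), "DNA lib": (9, 1, 0)},
    "1a74": {"unbound rigid": (0, 0, 0), "unbound flex": (8, 0, 0), "DNA lib": (9, 1, 0)},
    "1zme": {"unbound rigid": (0, 0, 0), "unbound flex": (4, 0, 0), "DNA lib": (8, 0, 0)},
}


def cases_with_stars(stage, min_stars=1):
    """PDB ids with at least one top-10 solution rated min_stars or better."""
    return [
        pdb
        for pdb, stages in TABLE3_CAPRI.items()
        if sum(stages[stage][min_stars - 1:]) > 0
    ]


def total_star_ranked(stage):
    """Total number of star-ranked top-10 solutions over the six test cases."""
    return sum(sum(stages[stage]) for stages in TABLE3_CAPRI.values())
```

With these counts, three of the six cases have a star-ranked solution after unbound rigid-body docking, five after semi-flexible refinement and all six when the custom-built DNA library is used, while the total number of star-ranked top-10 solutions rises from 17 to 31 to 50.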
Although the overall results improved for all six test
cases, differences were observed. The bound and
unbound components of the retinoid X receptor–DNA
complex (1by4) differ little from each other in terms of
r.m.s.d from the reference and rigid body docking
readily generates one-star solutions. The complex is
composed of two proteins that interact with the DNA
major groove but not with each other. Independent
movement of both proteins resulted in a relatively large
variation in the 10 best solutions after semi-flexible
refinement when starting from a canonical B-DNA model. The
use of a custom-built DNA library does not reduce this
variation but does significantly improve the fraction of
native contacts and the number of medium-quality solutions. The
phage 434 Cro–DNA complex (3cro) is a similar case
with the exception that the proteins dimerize. This
results in far less variation in the 10 best solutions after
the flexible stages and a sequential improvement of the
r.m.s.d values and fraction of native contacts at each
step of the docking. The hyperthermophile chromosomal
protein SAC7D–DNA complex (1azp) binds in a non-
specific manner to the DNA minor groove. The experi-
mental data available for this complex are less well
defined than for the other test cases. Despite this, the
two-stage docking protocol did reproduce the characteristic
minor groove widening observed for this system, resulting
in a significant improvement in r.m.s.d when using an
ensemble of custom-built DNA structural models. The
specific kink in the DNA structure observed at the
second C–G base pair (61°) in the reference complex
was, however, predicted at the third G–A base pair step
(25°) in the docking solutions. The potential of our two-stage
docking protocol to deal with large DNA conformational
changes is best illustrated in the case of the homing
endonuclease I-PpoI–DNA complex (1a74). Here, the
overall bend of 38° is reproduced in the best solutions
(45°). The information available for this complex results
in a well defined, curved, interaction interface on the
protein and indicates that there is little conformational
difference of the protein in its bound and unbound state.
As such, the sharp bend introduced in the DNA by the
analysis and modelling step could be sampled up to 10
times the standard deviation from the average to match
the protein surface (see ‘Materials and methods’ section).
The proline utilization transcription activator PUT3
(1zme) is a difficult case from both protein and DNA per-
spectives. The protein contains two globular DNA
binding domains connected to a core domain with a
long flexible linker. The NMR ensemble of the unbound
protein contains the DNA binding domains in many dif-
ferent orientations that prevent effective docking in the
rigid-body stage. Therefore, we cut the protein at the
flexible linkers, resulting in three parts that were docked
as separate bodies. Peptide linker restraints were defined
between the amino acids at the scission sites. After
semi-flexible refinement, we reconnected the different parts in
the 10 best solutions and used the resulting protein
ensemble for the second docking stage starting from an
ensemble of custom-built DNA structural models.
DISCUSSION
The use of AIRs is essential to the success of the
HADDOCK docking methodology in general. These are
used to position the protein at the interaction interface of
the DNA and, together with the flexible stages of the
docking, to facilitate conformational changes. We have
shown previously the importance of AIRs in protein–
DNA docking (18) using three monomeric transcription
factor DNA complexes as test cases. In the current study
we refined our initial method and evaluated its performance
on a benchmark of 47 protein–DNA complexes (30).
Compared with the initial three test cases the benchmark
contains complexes from various structural functional
classes in which one or multiple proteins interact with the
DNA using various binding modes. Because of the presence
of multiple proteins or DNA-binding domains, 40% of the
benchmark required docking following a multi-body
(N>2) approach. This challenging benchmark offers a
good platform to evaluate the capabilities of our docking
method. In the following, we discuss the two questions
that were the focus of both this study and the previous
work describing the initial protein–DNA docking method.
How well is the method able to identify the correct
interaction interface(s)?
The assembly of the interaction interface(s) is a process
driven by AIRs. In ‘real-life’ docking settings the AIRs are
typically defined based on experimental data and/or inter-
face predictions. The quality of the docking solutions is
therefore closely related to the amount and quality of
Table 3. Performance of the two-stage docking protocol when using AIRs based on experimental information: the r.m.s.d values from the reference and fraction of native contacts for the top ten docking solutions of the top ranking cluster both selected based on the HADDOCK score

                  r.m.s.d (Å)                                              Fnat (e)      CAPRI (f)
                  Total (a)     Interface (b) DNA (c)       Protein (d)                  *, **, ***
‘Easy’
1by4
  Bound rigid     0.41 ± 0.08   0.34 ± 0.07   0.00 ± 0.00   0.38 ± 0.07   0.89 ± 0.02   0,0,10
  Unbound rigid   4.33 ± 0.72   4.01 ± 0.53   1.41 ± 0.00   4.66 ± 0.73   0.11 ± 0.04   4,0,0
  Unbound flex    6.72 ± 2.10   5.87 ± 1.71   1.90 ± 0.19   6.98 ± 2.21   0.17 ± 0.05   5,0,0
  DNA lib         5.52 ± 2.43   4.91 ± 2.32   1.61 ± 0.14   5.85 ± 2.46   0.27 ± 0.09   4,3,0
3cro
  Bound rigid     0.32 ± 0.16   0.38 ± 0.19   0.00 ± 0.00   0.44 ± 0.22   0.85 ± 0.09   0,0,10
  Unbound rigid   3.79 ± 0.60   3.51 ± 0.63   3.70 ± 0.00   3.50 ± 0.83   0.15 ± 0.05   10,0,0
  Unbound flex    3.57 ± 0.63   3.29 ± 0.68   2.86 ± 0.30   3.19 ± 0.68   0.27 ± 0.07   6,2,0
  DNA lib         2.89 ± 0.40   2.62 ± 0.73   2.08 ± 0.21   2.96 ± 0.43   0.40 ± 0.06   3,7,0
‘Intermediate’
1azp
  Bound rigid     0.33 ± 0.07   0.31 ± 0.07   0.00 ± 0.00   0.11 ± 0.00   0.92 ± 0.03   0,0,10
  Unbound rigid   7.12 ± 2.06   7.09 ± 2.25   3.25 ± 0.00   3.58 ± 0.02   0.02 ± 0.02   0,0,0
  Unbound flex    6.90 ± 2.00   6.68 ± 2.26   2.87 ± 0.32   3.64 ± 0.13   0.04 ± 0.04   0,0,0
  DNA lib         4.56 ± 0.79   4.00 ± 0.45   1.83 ± 0.26   3.76 ± 0.16   0.10 ± 0.04   5,0,0
1jj4
  Bound rigid     0.39 ± 0.10   0.40 ± 0.09   0.00 ± 0.00   0.10 ± 0.03   0.82 ± 0.07   0,0,10
  Unbound rigid   4.23 ± 0.37   4.76 ± 0.48   3.19 ± 0.00   1.47 ± 0.05   0.09 ± 0.02   3,0,0
  Unbound flex    4.25 ± 0.43   4.55 ± 0.58   3.19 ± 0.21   2.40 ± 0.02   0.16 ± 0.07   6,0,0
  DNA lib         3.22 ± 0.30   3.62 ± 0.38   2.38 ± 0.14   2.37 ± 0.05   0.21 ± 0.07   9,1,0
‘Difficult’
1a74
  Bound rigid     0.06 ± 0.01   0.07 ± 0.01   0.00 ± 0.00   0.01 ± 0.00   0.84 ± 0.01   0,0,10
  Unbound rigid   5.43 ± 0.99   6.88 ± 0.97   7.44 ± 0.00   1.68 ± 0.14   0.04 ± 0.02   0,0,0
  Unbound flex    4.95 ± 0.38   6.30 ± 0.46   7.12 ± 0.32   1.84 ± 0.14   0.14 ± 0.04   8,0,0
  DNA lib         2.72 ± 0.25   3.37 ± 0.32   3.76 ± 0.19   1.78 ± 0.12   0.24 ± 0.05   9,1,0
1zme
  Bound rigid     0.48 ± 0.11   0.46 ± 0.08   0.00 ± 0.00   0.01 ± 0.00   0.79 ± 0.06   0,0,10
  Unbound rigid   6.29 ± 0.64   5.49 ± 0.68   4.28 ± 0.00   5.67 ± 0.61   0.06 ± 0.03   0,0,0
  Unbound flex    6.15 ± 0.62   5.29 ± 0.59   4.68 ± 0.33   5.88 ± 0.27   0.12 ± 0.06   4,0,0
  DNA lib         5.27 ± 0.62   4.63 ± 0.80   3.35 ± 0.13   5.55 ± 0.48   0.15 ± 0.04   8,0,0

Average all heavy atom r.m.s.d values from the reference structure (Å; mean ± standard deviation) calculated over:
(a) The entire complex.
(b) The interface.
(c) The DNA only for the 10 top ranking solutions.
(d) The protein only for the 10 top ranking solutions.
The r.m.s.d values are reported for: bound rigid-body docking (bound rigid); unbound rigid-body docking (unbound rigid); semi-flexible refinement (unbound flex.) starting from canonical B-DNA; unbound semi-flexible docking using a library of custom-built DNA structural models as input (DNA library).
(e) Fnat is the fraction of native contacts.
(f) Number of one-, two- and three-star CAPRI ranked solutions obtained in the top 10 solutions.
available data in terms of their accuracy and information
content. We started from an ideal situation in which
the restraints were derived from the intermolecular
contacts in the reference complex. Bound docking
resulted for 75% of cases in three-star (high quality)
predictions among the top 10 solutions based on the
HADDOCK score (Figure 1). The percentage of
generated high-quality solutions and the total number of
star-ranked solutions, however, declined for the ‘inter-
mediate’ and ‘difficult’ cases due to interface topology
features such as segmentation and rearrangement of struc-
ture elements. Such rearrangements occur in protein
domains, loops and secondary-structure elements at the
interfaces during the process of complex formation; they
are required to form a well-packed complex. The differ-
ence between the bound and unbound conformation of
the protein and DNA interfaces in the benchmark (30)
further illustrates this. Consequently, in a bound–bound
docking setting, the docking efficiency is hindered by non-
bonded repulsions associated with interface penetration
and by the correct alignment of the segmented interfaces
during the rotation and translation stages of the rigid
body refinement. The increase in the total number of
star-ranked solutions for many of the ‘difficult’ test cases
in unbound–unbound docking relative to bound–bound
docking further illustrates this process as rearrangements
Figure 4. Best solutions from unbound flexible docking using an ensemble of custom-built DNA structural models (blue) superimposed onto the reference structure (yellow). The complexes are grouped according to their docking difficulty (‘easy’, ‘intermediate’ and ‘difficult’) as indicated in the benchmark. The CAPRI score for each solution is indicated as one or two stars after the PDB code as well as the fraction of native contacts (a), the interface (b) and DNA r.m.s.d (c) from the reference structure. r.m.s.d values (Å) were calculated after superimposition on all heavy atoms of the selected regions of the reference complex. The figures were generated using Pymol (DeLano Scientific LLC, www.pymol.org).
are allowed to take place. Still there are a number of test
cases such as 1tro and 1f4k in which non-bonded repul-
sions hamper the docking. Given that these cases can be
identified beforehand, the docking efficiency could be
improved by scaling down the non-bonded energy terms
(inter_rigid term to 0.001 or lower in HADDOCK); this
allows penetration to occur during the docking. An initial
test with a scaled down non-bonded energy term for the
above-mentioned two test cases resulted in a significant
increase in the number of one- and two-star solutions
(Supplementary Table S4, Supplementary Data). This
shows that the AIRs are not the limiting factor but also
raises the question whether a change in the non-bonded
energy term scaling factor could be beneficial throughout
the benchmark. Our experience in protein–protein
docking, however, indicates that the scoring then becomes
more challenging, which might be detrimental in the end.
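To see why scaling down the non-bonded term allows interface penetration, consider the following minimal sketch. It uses a generic 12-6 Lennard-Jones term as a stand-in for the non-bonded energy; the `epsilon` and `sigma` values and the `scale` argument are illustrative assumptions and do not reproduce the HADDOCK force field or the way the inter_rigid setting is applied internally.

```python
def lj_energy(r, epsilon=0.1, sigma=3.5):
    """12-6 Lennard-Jones energy for one atom pair at distance r (Angstrom).

    epsilon and sigma are illustrative values, not force-field parameters.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def scaled_nonbonded(distances, scale=1.0):
    """Sum of pair energies multiplied by a global scaling factor.

    scale plays the role of a down-scaling factor such as 0.001
    (hypothetical stand-in for the setting mentioned in the text).
    """
    return scale * sum(lj_energy(r) for r in distances)
```

A clashing pair at 2.0 Å carries a large positive energy at full scale, whereas with `scale=0.001` the same clash contributes a thousandth of that penalty, so overlapping poses are no longer rejected on energetic grounds. The scoring then has less repulsive signal to discriminate between poses, which is the trade-off noted above.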
The unbound two-stage flexible docking using the same
restraints (Figures 2 and 3) resulted in the prediction of
one- to two-star solutions depending on the level of diffi-
culty of the test cases. Although these results are signifi-
cantly better than unbound rigid-body docking only, they
still indicate that conformational changes are the limiting
factor in protein–DNA docking.
The same series of docking experiments was performed
with a representative selection of six test cases using AIRs
defined based on experimental information (Table 3,
Figure 4). The results were comparable to the use of
ideal restraints in terms of the CAPRI quality criteria.
This clearly illustrates that readily available non-structural
experimental data are sufficient to assemble the correct
interaction interface(s) in these challenging, often multi-
component, protein–DNA systems. Still, the quality of
the generated solutions is directly related to the quality
of the used experimental information. Sparse- and/or
low-quality information will likely result in poor-quality
docking solutions, especially for multi-component
systems. The AIRs can, however, be defined based on a
wider variety of information sources than used in the
current work. For instance, NMR data or even statistical
protein–DNA interaction potentials, are promising means
of improving the results either by driving the docking or
filtering solutions afterwards. With respect to the latter, we
should note that the many different solutions generated in
this benchmark docking effort provide a compelling set of
decoy structures that can be useful for the development
and validation of scoring functions.
How successful is the method in dealing with
conformational changes upon complex formation?
The correct treatment of conformational changes upon
complex formation is likely the most challenging aspect
of protein–DNA docking. Both protein(s) and DNA
readily change their conformation upon complex forma-
tion. The extent of this change forms the basis of the
protein–DNA benchmark categorization. Our two-stage
protein–DNA docking method was designed to deal
with this challenge and its performance is best illustrated
in the docking of unbound proteins with canonical
B-DNA using ideal AIRs. While a single docking run
was sufficient to generate two-star solutions for the
‘easy’ cases, the two-stage protocol was often required to
generate one- to two-star solutions for the ‘intermediate’ and
‘difficult’ cases. Altogether, this approach was successful
in generating at least one-star solutions for 96% of the
complete benchmark. This illustrates that the explicit flexibility
implemented in HADDOCK is sufficient to generate
two-star solutions in the ‘easy’ cases, where conformational
changes are limited, but that this approach fails for cases
where such changes are more pronounced, such as in the
‘intermediate’ and ‘difficult’ cases. For the latter, our
DNA analysis and modelling procedure is capable of extracting
the main bend and twist motions that occur in the
DNA upon complex formation and of using these for the
benefit of DNA modelling. In that way, a larger part of
the relevant DNA conformational space can be sampled
than what is feasible within a single round of semi-flexible
refinement. Even results of the ‘easy’ test cases with
limited conformational changes are improved by this
two-stage procedure. Finally, the use of experimentally
derived AIRs on a subset of six test cases showed that
our method also significantly improved the docking
results under real-life conditions, when less ideal AIRs
are available.
Although the semi-flexible refinement stage of
HADDOCK is able to introduce many of the DNA con-
formational changes required for correct complex forma-
tion it has difficulties predicting DNA groove expansion
facilitated by negative base pair step sliding (for example
in 1a74 and 1g9z). Consequently, this mode of conform-
ational change is not detected by our DNA analysis pro-
cedure and not introduced in the custom-built DNA
ensemble. Although the improvements in r.m.s.d to the
reference complex and fraction of native contacts clearly
illustrate that our method outperforms rigid-body docking,
this limitation does raise questions about the quality of the
DNA in the generated solutions. This, however, remains a
difficult issue due to the lack of DNA structure validation procedures.
Furthermore, our method predominantly focuses on the
conformational changes in the DNA, but proteins can
also change their conformation upon complex formation,
sometimes quite drastically as, for example, in the restriction
endonuclease MvaI (2oaa). While small conformational
changes can be accounted for by means of flexible refinement
and the use of protein ensembles that sample different
interface conformations, large conformational changes
such as loop and domain rearrangements or disorder-to-order
transitions remain a challenge. Such events are
present in some of the test cases where the use of an
ensemble of custom-built DNA structural models did
not improve the results significantly. This still leaves
plenty of opportunities for improvements, for instance in
those cases where protein domain rearrangements are
facilitated by flexible ‘hinges’ connecting them. Such
domains can be docked as separate bodies, enabling
them to sample conformational space individually. This
procedure has been successfully used for the proline util-
ization transcription activator PUT3 (1zme) in this study.
The flexible protein–DNA docking approach described
in this article can benefit protein–DNA interaction studies
at several levels. It can be used to generate models of
protein–DNA complexes from the structures of the
unbound proteins and a canonical B-DNA in the
presence of suitable experimental data without any prior
knowledge of the DNA conformational changes required
to establish the complex. It should also be useful for
studying the effects of mutations or different operator se-
quences on complex formation. In addition, it can assist in
experimental structural studies by, for instance, providing
initial DNA structural models to guide and speed up the
NMR analysis and assignment process.
In summary, by allowing the inclusion of a large variety
of experimental and/or prediction data, together with a
flexible description of the DNA, the proposed docking
approach should be a useful tool in structural studies of
protein–DNA complexes.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
European Community (FP6 I3 project ‘EU-NMR’,
contract no. RII3-026145 and FP7 I3 project ‘eNMR’,
contract no. 213010-e-NMR) and VICI grant from the
Netherlands Organization for Scientific Research (NWO)
to A.M.J.J.B. (grant no. 700.96.442). Funding for open
access charge: VICI grant from the Netherlands
Organization for Scientific Research (NWO) (grant no.
700.96.442 to A.M.J.J.B.).
Conflict of interest statement. None declared.
REFERENCES
1. Halperin,I., Ma,B., Wolfson,H. and Nussinov,R. (2002) Principles
of docking: An overview of search algorithms and a guide to
scoring functions. Proteins,47, 409–443.
2. Schneidman-Duhovny,D., Nussinov,R. and Wolfson,H.J. (2004)
Predicting molecular interactions in silico: II. Protein-protein and
protein-drug docking. Curr. Med. Chem.,11, 91–107.
3. Ritchie,D.W. (2008) Recent progress and future directions in
protein-protein docking. Curr. Protein Pept. Sci.,9, 1–15.
4. Gane,P.J. and Dean,P.M. (2000) Recent advances in structure-
based rational drug design. Curr. Opin. Struct. Biol.,10, 401–404.
5. Joseph-McCarthy,D. (1999) Computational approaches to
structure-based ligand design. Pharmacol. Ther.,84, 179–191.
6. Kuntz,I.D. (1992) Structure-based strategies for drug design and
discovery. Science,257, 1078–1082.
7. Dunn,R.K. and Kingston,R.E. (2007) Gene regulation in the
postgenomic era: technology takes the wheel. Mol. Cell,28,
708–714.
8. Adesokan,A.A., Roberts,V.A., Lee,K.W., Lins,R.D. and
Briggs,J.M. (2003) Prediction of HIV-1 integrase/viral DNA
interactions in the catalytic domain by fast molecular docking.
J. Med. Chem.,47, 821–828.
9. Aloy,P., Moont,G., Gabb,H.A., Querol,E., Aviles,F.X. and
Sternberg,M.J. (1998) Modelling repressor proteins docking to
DNA. Proteins,33, 535–549.
10. Bastard,K., Thureau,A., Lavery,R. and Prevost,C. (2003) Docking
macromolecules with flexible segments. J. Comput. Chem.,24,
1910–1920.
11. Fan,L. and Roberts,V.A. (2006) Complex of linker histone H5
with the nucleosome and its implications for chromatin packing.
Proc. Natl Acad. Sci. USA,103, 8384–8389.
12. Fanelli,F. and Ferrari,S. (2006) Prediction of MEF2A-DNA
interface by rigid body docking: a tool for fast estimation of
protein mutational effects on DNA binding. J. Struct. Biol.,153,
278–283.
13. Knegtel,R.M., Boelens,R. and Kaptein,R. (1994) Monte Carlo
docking of protein-DNA complexes: incorporation of DNA
flexibility and experimental data. Protein Eng.,7, 761–767.
14. Liu,Z., Guo,J.T., Li,T. and Xu,Y. (2008) Structure-based
prediction of transcription factor binding sites using a protein-
DNA docking approach. Proteins,72, 1114–1124.
15. Poulain,P., Saladin,A., Hartmann,B. and Prevost,C. (2008)
Insights on protein-DNA recognition by coarse grain modelling.
J. Comput. Chem.,29, 2582–2592.
16. Roberts,V.A., Case,D.A. and Tsui,V. (2004) Predicting
interactions of winged-helix transcription factors with DNA.
Proteins,57, 172–187.
17. Sandmann,C., Cordes,F. and Saenger,W. (1996) Structure model
of a complex between the factor for inversion stimulation (FIS)
and DNA: modeling protein-DNA complexes with dyad
symmetry and known protein structures. Proteins,25, 486–500.
18. van Dijk,M., van Dijk,A.D., Hsu,V., Boelens,R. and
Bonvin,A.M. (2006) Information-driven protein-DNA docking
using HADDOCK: it is a matter of flexibility. Nucleic Acids Res.,
34, 3317–3325.
19. Melquiond,A.S.J. and Bonvin,A.M.J.J. (2009) Experimental
Constraint-Driven Docking. In Zacharias,M. (ed.), Protein-protein
Complexes: Analysis, Modelling and Drug Design. Imperial College
Press, London, pp. 183–209.
20. van Dijk,A.D., Boelens,R. and Bonvin,A.M. (2005) Data-driven
docking for the study of biomolecular complexes. FEBS J.,272,
293–312.
21. Mondragon,A. and Harrison,S.C. (1991) The phage 434 Cro/OR1
complex at 2.5 A resolution. J. Mol. Biol.,219, 321–334.
22. Raumann,B.E., Rould,M.A., Pabo,C.O. and Sauer,R.T. (1994)
DNA recognition by beta-sheets in the Arc repressor-operator
crystal structure. Nature,367, 754–757.
23. Chuprina,V.P., Rullmann,J.A., Lamerichs,R.M., van Boom,J.H.,
Boelens,R. and Kaptein,R. (1993) Structure of the complex of lac
repressor headpiece and an 11 base-pair half-operator determined
by nuclear magnetic resonance spectroscopy and restrained
molecular dynamics. J. Mol. Biol.,234, 446–462.
24. Bessiere,D., Lacroix,C., Campagne,S., Ecochard,V., Guillet,V.,
Mourey,L., Lopez,F., Czaplicki,J., Demange,P., Milon,A. et al.
(2008) Structure-function analysis of the THAP zinc finger of
THAP1, a large C2CH DNA-binding module linked to Rb/E2F
pathways. J. Biol. Chem.,283, 4352–4363.
25. Cai,S., Zhu,L., Zhang,Z. and Chen,Y. (2007) Determination of
the three-dimensional structure of the Mrf2-DNA complex using
paramagnetic spin labeling. Biochemistry, 46, 4943–4950.
26. Gamsjaeger,R., Swanton,M.K., Kobus,F.J., Lehtomaki,E.,
Lowry,J.A., Kwan,A.H., Matthews,J.M. and Mackay,J.P. (2008)
Structural and biophysical analysis of the DNA binding
properties of myelin transcription factor 1. J. Biol. Chem., 283,
5158–5167.
27. Liu,W., Vierke,G., Wenke,A.K., Thomm,M. and Ladenstein,R.
(2007) Crystal structure of the archaeal heat shock regulator from
Pyrococcus furiosus: a molecular chimera representing eukaryal
and bacterial features. J. Mol. Biol., 369, 474–488.
28. Singh,S., Hager,M.H., Zhang,C., Griffith,B.R., Lee,M.S.,
Hallenga,K., Markley,J.L. and Thorson,J.S. (2006) Structural
insight into the self-sacrifice mechanism of enediyne resistance.
ACS Chem. Biol., 1, 451–460.
29. van Dijk,M. and Bonvin,A.M. (2009) 3D-DART: a DNA
structure modelling server. Nucleic Acids Res., 37, W235–W239.
30. van Dijk,M. and Bonvin,A.M. (2008) A protein-DNA docking
benchmark. Nucleic Acids Res., 36, e88.
31. Janin,J. (2005) Assessing predictions of protein-protein
interaction: the CAPRI experiment. Protein Sci., 14, 278–283.
32. Hubbard,S.J. and Thornton,J.M. (1993) ‘NACCESS’, computer
program, Department of Biochemistry and Molecular Biology,
University College London.
33. de Vries,S.J., van Dijk,A.D., Krzeminski,M., van Dijk,M.,
Thureau,A., Hsu,V., Wassenaar,T. and Bonvin,A.M. (2007)
HADDOCK versus HADDOCK: new features and performance
of HADDOCK2.0 on the CAPRI targets. Proteins, 69, 726–733.
34. Lu,X.J. and Olson,W.K. (2003) 3DNA: a software package for
the analysis, rebuilding and visualization of three-dimensional
nucleic acid structures. Nucleic Acids Res., 31, 5108–5121.
35. Lu,X.J. and Olson,W.K. (2008) 3DNA: a versatile, integrated
software system for the analysis, rebuilding and visualization of
three-dimensional nucleic-acid structures. Nat. Protoc., 3,
1213–1227.
36. Dickerson,R.E. (1989) Definitions and nomenclature of nucleic
acid structure parameters. J. Biomol. Struct. Dyn., 6, 627–634.
37. Zhao,Q., Chasse,S.A., Devarakonda,S., Sierk,M.L., Ahvazi,B. and
Rastinejad,F. (2000) Structural basis of RXR-DNA interactions.
J. Mol. Biol., 296, 509–520.
38. Danielsen,M., Hinck,L. and Ringold,G.M. (1989) Two amino
acids within the knuckle of the first zinc finger specify DNA
response element activation by the glucocorticoid receptor. Cell,
57, 1131–1138.
39. Glass,C.K. (1994) Differential recognition of target genes by
nuclear receptor monomers, dimers, and heterodimers. Endocr.
Rev., 15, 391–407.
40. Haussler,M.R., Whitfield,G.K., Haussler,C.A., Hsieh,J.C.,
Thompson,P.D., Selznick,S.H., Dominguez,C.E. and Jurutka,P.W.
(1998) The nuclear vitamin D receptor: biological and molecular
regulatory properties revealed. J. Bone Miner. Res., 13, 325–349.
41. Koszewski,N.J., Reinhardt,T.A. and Horst,R.L. (1996) Vitamin D
receptor interactions with the murine osteopontin response
element. J. Steroid Biochem. Mol. Biol., 59, 377–388.
42. Lee,M.S., Kliewer,S.A., Provencal,J., Wright,P.E. and Evans,R.M.
(1993) Structure of the retinoid X receptor alpha DNA binding
domain: a helix required for homodimeric DNA binding. Science,
260, 1117–1121.
43. Mader,S., Kumar,V., de Verneuil,H. and Chambon,P. (1989)
Three amino acids of the oestrogen receptor are essential to its
ability to distinguish an oestrogen from a glucocorticoid-
responsive element. Nature, 338, 271–274.
44. Nelson,C.C., Hendy,S.C., Faris,J.S. and Romaniuk,P.J. (1996)
Retinoid X receptor alters the determination of DNA binding
specificity by the P-box amino acids of the thyroid hormone
receptor. J. Biol. Chem., 271, 19464–19474.
45. Rastinejad,F., Perlmann,T., Evans,R.M. and Sigler,P.B. (1995)
Structural determinants of nuclear receptor assembly on DNA
direct repeats. Nature, 375, 203–211.
46. Umesono,K. and Evans,R.M. (1989) Determinants of target gene
specificity for steroid/thyroid hormone receptors. Cell, 57,
1139–1146.
47. Harrison,S.C., Anderson,J.E., Koudelka,G.B., Mondragon,A.,
Subbiah,S., Wharton,R.P., Wolberger,C. and Ptashne,M. (1988)
Recognition of DNA sequences by the repressor of bacteriophage
434. Biophys. Chem., 29, 31–37.
48. Koudelka,G.B. (1998) Recognition of DNA structure by 434
repressor. Nucleic Acids Res., 26, 669–675.
49. Koudelka,G.B. and Lam,C.Y. (1993) Differential recognition of
OR1 and OR3 by bacteriophage 434 repressor and Cro. J. Biol.
Chem., 268, 23812–23817.
50. Wharton,R.P., Brown,E.L. and Ptashne,M. (1984) Substituting an
alpha-helix switches the sequence-specific DNA interactions of a
repressor. Cell, 38, 361–369.
51. Robinson,H., Gao,Y.G., McCrary,B.S., Edmondson,S.P.,
Shriver,J.W. and Wang,A.H. (1998) The hyperthermophile
chromosomal protein Sac7d sharply kinks DNA. Nature, 392,
202–205.
52. Clark,A.T., Smith,K., Muhandiram,R., Edmondson,S.P. and
Shriver,J.W. (2007) Carboxyl pK(a) values, ion pairs, hydrogen
bonding, and the pH-dependence of folding the hyperthermophile
proteins Sac7d and Sso7d. J. Mol. Biol., 372, 992–1008.
53. Dostal,L., Chen,C.Y., Wang,A.H. and Welfle,H. (2004) Partial
B-to-A DNA transition upon minor groove binding of protein
Sac7d monitored by Raman spectroscopy. Biochemistry, 43,
9600–9609.
54. Kahsai,M.A., Martin,E., Edmondson,S.P. and Shriver,J.W. (2005)
Stability and flexibility in the structure of the hyperthermophile
DNA-binding protein Sac7d. Biochemistry, 44, 13500–13509.
55. Peters,W.B., Edmondson,S.P. and Shriver,J.W. (2005) Effect of
mutation of the Sac7d intercalating residues on the temperature
dependence of DNA distortion and binding thermodynamics.
Biochemistry, 44, 4794–4804.
56. Kim,S.S., Tam,J.K., Wang,A.F. and Hegde,R.S. (2000) The
structural basis of DNA target discrimination by papillomavirus
E2 proteins. J. Biol. Chem., 275, 31245–31254.
57. Bedrosian,C.L. and Bastia,D. (1990) The DNA-binding domain
of HPV-16 E2 protein interaction with the viral enhancer:
protein-induced DNA bending and role of the nonconserved core
sequence in binding site affinity. Virology, 174, 557–575.
58. Sanchez,I.E., Dellarole,M., Gaston,K. and de Prat Gay,G. (2008)
Comprehensive comparison of the interaction of the E2 master
regulator with its cognate target DNA sites in 73 human
papillomavirus types by sequence statistics. Nucleic Acids Res., 36,
756–769.
59. Flick,K.E., Jurica,M.S., Monnat,R.J. Jr and Stoddard,B.L. (1998)
DNA binding and cleavage by the nuclear intron-encoded homing
endonuclease I-PpoI. Nature, 394, 96–101.
60. Argast,G.M., Stephens,K.M., Emond,M.J. and Monnat,R.J. Jr
(1998) I-PpoI and I-CreI homing site sequence degeneracy
determined by random mutagenesis and sequential in vitro
enrichment. J. Mol. Biol., 280, 345–353.
61. Eklund,J.L., Ulge,U.Y., Eastberg,J. and Monnat,R.J. Jr (2007)
Altered target site specificity variants of the I-PpoI His-Cys box
homing endonuclease. Nucleic Acids Res., 35, 5839–5850.
62. Ellison,E.L. and Vogt,V.M. (1993) Interaction of the intron-
encoded mobility endonuclease I-PpoI with its target site. Mol.
Cell Biol., 13, 7531–7539.
63. Galburt,E.A., Chadsey,M.S., Jurica,M.S., Chevalier,B.S., Erho,D.,
Tang,W., Monnat,R.J. Jr and Stoddard,B.L. (2000)
Conformational changes and cleavage by the homing
endonuclease I-PpoI: a critical role for a leucine residue in the
active site. J. Mol. Biol., 300, 877–887.
64. Muscarella,D.E., Ellison,E.L., Ruoff,B.M. and Vogt,V.M. (1990)
Characterization of I-Ppo, an intron-encoded endonuclease that
mediates homing of a group I intron in the ribosomal DNA of
Physarum polycephalum. Mol. Cell Biol., 10, 3386–3396.
65. Wittmayer,P.K., McKenzie,J.L. and Raines,R.T. (1998)
Degenerate DNA recognition by I-PpoI endonuclease. Gene, 206,
11–21.
66. Swaminathan,K., Flynn,P., Reece,R.J. and Marmorstein,R. (1997)
Crystal structure of a PUT3-DNA complex reveals a novel
mechanism for DNA recognition by a protein containing a
Zn2Cys6 binuclear cluster. Nat. Struct. Biol., 4, 751–759.
67. Axelrod,J.D., Majors,J. and Brandriss,M.C. (1991) Proline-
independent binding of PUT3 transcriptional activator protein
detected by footprinting in vivo. Mol. Cell Biol., 11, 564–567.
68. Brandriss,M.C. (1987) Evidence for positive regulation of the
proline utilization pathway in Saccharomyces cerevisiae. Genetics,
117, 429–435.
69. Marczak,J.E. and Brandriss,M.C. (1989) Isolation of constitutive
mutations affecting the proline utilization pathway in
Saccharomyces cerevisiae and molecular analysis of the PUT3
transcriptional activator. Mol. Cell Biol., 9, 4696–4705.
70. Marczak,J.E. and Brandriss,M.C. (1991) Analysis of constitutive
and noninducible mutations of the PUT3 transcriptional
activator. Mol. Cell Biol., 11, 2609–2619.
71. Siddiqui,A.H. and Brandriss,M.C. (1989) The Saccharomyces
cerevisiae PUT3 activator protein associates with proline-specific
upstream activation sequences. Mol. Cell Biol., 9, 4706–4712.
72. Walters,K.J., Dayie,K.T., Reece,R.J., Ptashne,M. and Wagner,G.
(1997) Structure and mobility of the PUT3 dimer. Nat. Struct.
Biol., 4, 744–750.


... HADDOCK employs the integration of flexibility, both explicitly during the docking and implicitly by the use of customized DNA structural models. It is also proven to facilitate the identification and positioning of the interaction interfaces during docking [98]. There are also some disadvantages of using HADDOCK. ...
... There are also some disadvantages of using HADDOCK. Despite the fact that the semi-flexible refinement stage of HADDOCK is efficient, it has difficulties predicting the DNA groove expansion facilitated by negative base-pair step sliding [98]. Moreover, HADDOCK requires a Unix environment with the Crystallography and NMR System (CNS) as its structural calculation engine. ...
Article
Aptamers are single-stranded DNA or RNA oligonucleotides generated by SELEX that exhibit binding affinity and specificity against a wide variety of target molecules. Compared to RNA aptamers, DNA aptamers are much more stable and are therefore widely adopted in a number of applications, especially in diagnostics. The tediousness and rigor associated with certain steps of SELEX have intensified efforts to adopt in silico molecular docking approaches together with in vitro SELEX procedures in developing DNA aptamers. Inspired by these endeavors, we present an overview of the in silico molecular docking approaches in DNA aptamer generation, detailing the stepwise procedures and shedding some light on the various software packages used. The in silico maturation strategy and the limitations of the in silico approaches are also underscored.
... Prior to the docking, the input structures are converted into their coarse-grained counterparts and hydrogen-bonding base pairs are automatically detected so that a special set of parameters and restraints are used for those during the docking. We evaluate the performance of coarse-grained protein-nucleic acid docking using 44 unbound-unbound complexes from the protein-DNA benchmark (van Dijk and Bonvin, 2010). The results show a similar performance in terms of success rate and model quality while reducing the computational costs by ∼6-fold compared to standard atomistic simulations. ...
... The available experimental information was collected from the literature and includes conserved residues, mutagenesis data, ethylation interference data, methylation interference data, NMR native state amide hydrogen exchange, and Raman spectroscopy as described in van Dijk and Bonvin (2010). As in the previous study (van Dijk and Bonvin, 2010), the sampling was slightly increased to 2,000/400/400 for it0/it1/itw docking stages, respectively. ...
Article
Modeling biomolecular assemblies is an important field in computational structural biology. The inherent complexity of their energy landscape and the computational cost associated with modeling large and complex assemblies are major drawbacks for integrative modeling approaches. The so-called coarse-graining approaches, which reduce the degrees of freedom of the system by grouping several atoms into larger “pseudo-atoms,” have been shown to alleviate some of those limitations, facilitating the identification of the global energy minima assumed to correspond to the native state of the complex, while making the calculations more efficient. Here, we describe and assess the implementation of the MARTINI force field for DNA into HADDOCK, our integrative modeling platform. We combine it with our previous implementation for protein-protein coarse-grained docking, enabling coarse-grained modeling of protein-nucleic acid complexes. The system is modeled using MARTINI topologies and interaction parameters during the rigid body docking and semi-flexible refinement stages of HADDOCK, and the resulting models are then converted back to atomistic resolution by an atom-to-bead distance restraints-guided protocol. We first demonstrate the performance of this protocol using 44 complexes from the protein-DNA docking benchmark, which shows an overall ~6-fold speed increase and maintains similar accuracy as compared to standard atomistic calculations. As a proof of concept, we then model the interaction between the PRC1 and the nucleosome (a former CAPRI target in round 31), using the same information available at the time the target was offered, and compare all-atom and coarse-grained models.
... These successes lead to the question of whether it is possible to accurately predict the structure of complexes containing a much wider range of biomolecules, including ligands, ions, nucleic acids and modified residues, within a deep-learning framework. A wide range of predictors for various specific interaction types has been developed [16–28], as well as one generalist method developed concurrently with the present work [29], but the accuracy of such deep-learning attempts has been mixed and often below that of physics-inspired methods [30,31]. Almost all of these methods are also highly specialized to particular interaction types and cannot predict the structure of general biomolecular complexes containing many types of entities. ...
Article
The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2–6]. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein–ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein–nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody–antigen prediction accuracy compared with AlphaFold-Multimer v.2.3 [7,8]. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.
... Thus, docking methods will often combine with strategies such as normal modes or a later stage using molecular dynamics to overcome limitations in scoring functions (29,30). The most successful efforts come from the High-Ambiguity Driven Docking (HADDOCK) (31,30) group combining docking with ambiguous data and DNA flexibility (via normal modes). They have led to helpful benchmark sets of different difficulties for assessing protein-DNA docking performance (32). ...
Article
Structural, regulatory and enzymatic proteins interact with DNA to maintain a healthy and functional genome. Yet, our structural understanding of how proteins interact with DNA is limited. We present MELD-DNA, a novel computational approach to predict the structures of protein-DNA complexes. The method combines molecular dynamics simulations with general knowledge or experimental information through Bayesian inference. The physical model is sensitive to sequence-dependent properties and conformational changes required for binding, while information accelerates sampling of bound conformations. MELD-DNA can: (i) sample multiple binding modes; (ii) identify the preferred binding mode from the ensembles; and (iii) provide qualitative binding preferences between DNA sequences. We first assess performance on a dataset of 15 protein-DNA complexes and compare it with state-of-the-art methodologies. Furthermore, for three selected complexes, we show sequence dependence effects of binding in MELD predictions. We expect that the results presented herein, together with the freely available software, will impact structural biology (by complementing DNA structural databases) and molecular recognition (by bringing new insights into aspects governing protein-DNA interactions).
... This is partly due to the limitations in scoring functions to handle highly charged systems as well as due to the difficulties in modeling nucleic acid conformational changes during binding. Studies like those conducted by Haddock [30,32,33] have incorporated DNA flexibility through normal modes but have not yet taken advantage of the sequence-dependent deformation profiles of DNA. ...
Article
The β subunit of E. coli DNA polymerase III is a DNA sliding clamp associated with increasing the processivity of DNA synthesis. In its free form, it is a circular homodimer structure that can accommodate double-stranded DNA in a nonspecific manner. An open state of the clamp must be accessible before loading the DNA. The opening mechanism is still a matter of debate, as is the effect of bound DNA on opening/closing kinetics. We use a combination of atomistic, coarse-grained, and enhanced sampling strategies in both explicit and implicit solvents to identify opening events in the sliding clamp. Such simulations of large nucleic acids and their complexes are becoming available and are being driven by improvements in force fields and the creation of faster computers. Different models support alternative opening mechanisms, either through an in-plane or out-of-plane opening event. We further note some of the current limitations, despite advances, in modeling these highly charged systems with implicit solvent.
Article
Proteins and nucleic acids are essential biological macromolecules for cell life. Indeed, interactions between proteins and DNA regulate many biological processes such as protein synthesis, signal transduction, DNA storage, or DNA replication and repair. Despite their importance, less than 4% of total structures deposited in the Protein Data Bank (PDB) correspond to protein-DNA complexes, and very few computational methods are available to model their structure. We present here the pyDockDNA web server, which can successfully model a protein-DNA complex with a reasonable predictive success rate (as benchmarked on a standard dataset of protein-DNA complex structures, where DNA is in B-DNA conformation). The server implements the pyDockDNA program, as a module of pyDock suite, thus including third-party programs, modules, and previously developed tools, as well as new modules and parameters to handle the DNA properly. The user is asked to enter Protein Data Bank files for protein and DNA input structures (or suitable models) and select the chains to be docked. The server calculations are mainly divided into three steps: sampling by FTDOCK, scoring with new energy-based parameters and the possibility of applying external restraints. The user can select different options for these steps. The final output screen shows a 3D representation of the top 10 models and a table sorting the model according to the scoring function selected previously. All these output files can be downloaded, including the top 100 models predicted by pyDockDNA. The server can be freely accessed for academic use (https://model3dbio.csic.es/pydockdna).
Article
As it is expressed predominantly in cardiac tissue, the beta1-adrenoceptor (β1-AR) is broadly accepted as one of the main targets for drugs against cardiovascular ailments. However, the discovery of β1-AR ligands is gravely challenged by the lack of an efficient screening method. This work developed a general strategy for pursuing β1-AR ligands from herbal extracts by immobilizing haloalkane dehalogenase (Halo)-tagged β1-AR onto microspheres coated with 6-chlorohexanoic acid, and applying the immobilized β1-AR in the analysis of ligand-receptor interaction. The morphology was characterized by scanning electron microscopy (SEM) and X-ray photoelectron spectroscopy (XPS). The chromatographic specificity of the immobilized receptor column was evaluated by determining the association constants of atenolol, esmolol and metoprolol using stepwise frontal analysis plus the injection amount-dependent method. Potential ligands binding to β1-AR were screened by collecting the peak with a retention time longer than the void time, and the collected fraction was identified by reverse-phase liquid chromatography coupled with tandem mass spectrometry. The association constants of the three drugs to β1-AR were (3.33±0.29)×10⁶ M⁻¹, (2.33±0.23)×10⁶ M⁻¹ and (2.06±0.03)×10⁶ M⁻¹, indicating a desired specificity of the immobilized receptor for recognizing its ligands. Molecular docking showed that van der Waals, hydrogen bonds, and hydrophobic interactions were the principal interaction forces for the receptor-drug complexes. Benzoylmesaconine was screened as the potential ligand of β1-AR in Radix Aconiti Lateralis Praeparata extract. The association constant of the ligand was (1.06±0.02)×10⁵ M⁻¹, hinting that structural modification may be required before clinical application. The immobilized β1-AR may provide a rapid method for screening potential ligands in herbal extracts.
Article
Nucleic acid-ligand complexes underlie numerous cellular processes, such as gene function expression and regulation, in which their three-dimensional structures are important to understand their functions and thus to develop therapeutic interventions. Given the high cost and technical difficulties in experimental methods, computational methods such as molecular docking have been actively used to investigate nucleic acid-ligand interactions in which an accurate scoring function is crucial. However, because of the limited number of experimental nucleic acid-ligand binding data and structures, the scoring function development for nucleic acid-ligand interactions falls far behind that for protein-protein and protein-ligand interactions. Here, based on our statistical mechanics-based iterative approach, we have developed an iterative knowledge-based scoring function for nucleic acid-ligand interactions, named as ITScore-NL, by explicitly including stacking and electrostatic potentials. Our ITScore-NL scoring function was extensively evaluated for its ability in the binding mode and binding affinity predictions on three diverse test sets and compared with state-of-the-art scoring functions. Overall, ITScore-NL obtained significantly better performance than the other 12 scoring functions and predicted near-native poses with rmsd ≤ 1.5 Å for 71.43% of the cases when the top three binding modes were considered and a good correlation of R = 0.64 in binding affinity prediction on the large test set of 77 nucleic acid-ligand complexes. These results suggested the accuracy of ITScore-NL and the necessity of explicitly including stacking and electrostatic potentials.
Chapter
Docking, a molecular modelling method, has wide applications in identification and optimization in modern drug discovery. This chapter addresses recent advances in docking methodologies, such as fragment docking, covalent docking, inverse docking, post-processing, hybrid techniques and homology modelling, and in docking protocols, such as searching and scoring functions. Advances in scoring functions, e.g. consensus scoring, quantum mechanics methods, clustering and entropy-based methods, and fingerprinting, are used to overcome the limitations of the commonly used force-field, empirical and knowledge-based scoring functions. The chapter covers the crucial necessities and different algorithms of docking and scoring. Different aspects such as protein flexibility, ligand sampling and flexibility, and the performance of scoring functions are also discussed.
Fundamental Necessities
Molecular docking programs emphasize the following basic requirements (Mahajan, & Gill, 2014; Krovat, Steindl, & Langer, 2005):
1. A target protein structure, with or without a bound ligand, determined by experimental techniques such as NMR or X-ray crystallography; if no protein structure is available, it is predicted by a technique such as threading or homology modelling.
2. A database containing existing or virtual compounds for the docking process.
3. A sampling and scoring method; the chosen scoring and searching algorithms require a computational framework to work efficiently.
4. The three-dimensional structure of the protein-ligand complex has to be studied in depth at atomic resolution.
Article
There is a growing interest in structural studies of DNA by both experimental and computational approaches. Often, 3D-structural models of DNA are required, for instance, to serve as templates for homology modeling, as starting structures for macromolecular docking or as scaffold for NMR structure calculations. The conformational adaptability of DNA when binding to a protein is often an important factor and at the same time a limitation in such studies. As a response to the demand for 3D-structural models reflecting the intrinsic plasticity of DNA we present the 3D-DART server (3DNA-Driven DNA Analysis and Rebuilding Tool). The server provides an easy interface to a powerful collection of tools for the generation of DNA-structural models in custom conformations. The computational engine beyond the server makes use of the 3DNA software suite together with a collection of home-written Python scripts. The server is freely available at http://haddock.chem.uu.nl/dna without any login requirement.
Article
Endonucleases encoded by mobile group I introns are highly specific DNases that induce a double-strand break near the site to which the intron moves. I-PpoI from the acellular slime mold Physarum polycephalum mediates the mobility of intron 3 (Pp LSU 3) in the extrachromosomal nuclear ribosomal DNA of this organism. We showed previously that cleavage by I-PpoI creates a four-base staggered cut near the point of intron insertion. We have now characterized several further properties of the endonuclease. As determined by deletion analysis, the minimal target site recognized by I-PpoI was a sequence of 13 to 15 bp spanning the cleavage site. The purified protein behaved as a globular dimer in sedimentation and gel filtration. In gel mobility shift assays in the presence of EDTA, I-PpoI formed a stable and specific complex with DNA, dissociating with a half-life of 45 min. By footprinting and interference assays with methidiumpropyl-EDTA-iron(II), I-PpoI contacted a 22- to 24-bp stretch of DNA. The endonuclease protected most of the purines found in both the major and minor grooves of the DNA helix from modification by dimethyl sulfate (DMS). However, the reactivity to DMS was enhanced at some purines, suggesting that binding leads to a conformational change in the DNA. The pattern of DMS protection differed fundamentally in the two partially symmetrical halves of the recognition sequence.
Article
A mutation has been identified that prevents Saccharomyces cerevisiae cells from growing on proline as the sole source of nitrogen, causes noninducible expression of the PUT1 and PUT2 genes, and is completely recessive. In the put3-75 mutant, the basal level of expression (ammonia as nitrogen source) of PUT1-lacZ and PUT2-lacZ gene fusions as measured by β-galactosidase activity is reduced 4- and 7-fold, respectively, compared with the wild-type strain. Normal regulation is not restored when the cells are grown on arginine as the sole nitrogen source and put3-75 cells remain sensitive to the proline analog, l-azetidine-2-carboxylic acid, indicating that the block is not at the level of transport of the inducer, proline. In a cross between the put3-75 strain and the semidominant, constitutive mutation PUT3c-68, only parental ditype tetrads were found, indicating allelism of the two mutations. Further support for allelism derives from the comparison of enzyme levels in heteroallelic and heterozygous diploid strains. The constitutive allele appears to be fully dominant to the noninducible allele but only partially dominant to the wild type, suggesting an interaction between the wild-type and PUT3c-68 gene products. The PUT3 gene maps on chromosome XI, about 5.7 cM from the centromere. The phenotypes of alleles of the PUT3 gene, either recessive and noninducible (the put3-75 phenotype) or semidominant and constitutive (the PUT3c-68 phenotype), and their pleiotropy suggest that the PUT3 gene product is a positive activator of the proline utilization pathway.
Article
The docking of repressor proteins to DNA starting from the unbound protein and model-built DNA coordinates is modeled computationally. The approach was evaluated on eight repressor/DNA complexes that employed different modes for protein/DNA recognition. The global search is based on a protein-protein docking algorithm that evaluates shape and electrostatic complementarity, which was modified to consider the importance of electrostatic features in DNA-protein recognition. Complexes were then ranked by an empirical score for the observed amino acid/nucleotide pairings (i.e., protein-DNA pair potentials) derived from a database of 20 protein/DNA complexes. A good prediction had at least 65% of the correct contacts modeled. This approach was able to identify a good solution at rank four or better for three out of the eight complexes. Predicted complexes were filtered by a distance constraint based on experimental data defining the DNA footprint. This improved coverage to four out of eight complexes having a good model at rank four or better. The additional use of amino acid mutagenesis and phylogenetic data defining residues on the repressor resulted in between 2 and 27 models that would have to be examined to find a good solution for seven of the eight test systems. This study shows that starting with unbound coordinates one can predict three-dimensional models for protein/DNA complexes that do not involve gross conformational changes on association. Proteins 33:535–549, 1998. © 1998 Wiley-Liss, Inc.
Article
PUT3 is a member of a family of at least 79 fungal transcription factors that contain a six-cysteine, two-zinc domain called a 'Zn2Cys6 binuclear cluster'. We have determined the crystal structure of the DNA binding region from the PUT3 protein bound to its cognate DNA target. The structure reveals that the PUT3 homodimer is bound asymmetrically to the DNA site. This asymmetry orients a β-strand from one protein subunit into the minor groove of the DNA resulting in a partial amino acid-base pair intercalation and extensive direct and water-mediated protein interactions with the minor groove of the DNA. These interactions facilitate a sequence dependent kink at the centre of the DNA site and specify the intervening base pairs separating two DNA half-sites that are contacted in the DNA major groove. A comparison with the GAL4−DNA and PPR1−DNA complexes shows how a family of related DNA binding proteins can use a diverse set of mechanisms to discriminate between the base pairs separating conserved DNA half-sites.
Article
We present a comprehensive software package, 3DNA, for the analysis, reconstruction and visualization of three‐dimensional nucleic acid structures. Starting from a coordinate file in Protein Data Bank (PDB) format, 3DNA can handle antiparallel and parallel double helices, single‐stranded structures, triplexes, quadruplexes and other complex tertiary folding motifs found in both DNA and RNA structures. The analysis routines identify and categorize all base interactions and classify the double helical character of appropriate base pair steps. The program makes use of a recently recommended reference frame for the description of nucleic acid base pair geometry and a rigorous matrix‐based scheme to calculate local conformational parameters and rebuild the structure from these parameters. The rebuilding routines produce rectangular block representations of nucleic acids as well as full atomic models with the sugar–phosphate backbone and publication quality ‘standardized’ base stacking diagrams. Utilities are provided to locate the base pairs and helical regions in a structure and to reorient structures for effective visualization. Regular helical models based on X‐ray diffraction measurements of various repeating sequences can also be generated within the program.
Article
The proteins Sac7d and Sso7d belong to a class of small chromosomal proteins from the hyperthermophilic archaeon Sulfolobus acidocaldarius and S. solfataricus, respectively. These proteins are extremely stable to heat, acid and chemical agents. Sac7d binds to DNA without any particular sequence preference and thereby increases its melting temperature by approximately 40 °C. We have now solved and refined the crystal structure of Sac7d in complex with two DNA sequences to high resolution. The structures are examples of a nonspecific DNA-binding protein bound to DNA, and reveal that Sac7d binds in the minor groove, causing a sharp kinking of the DNA helix that is more marked than that induced by any sequence-specific DNA-binding proteins. The kink results from the intercalation of specific hydrophobic side chains of Sac7d into the DNA structure, but without causing any significant distortion of the protein structure relative to the uncomplexed protein in solution.
Article
Coarse grain modelling of macromolecules is a new approach, potentially well adapted to answer numerous issues, ranging from physics to biology. We propose here an original DNA coarse grain model specifically dedicated to protein-DNA docking, a crucial, but still largely unresolved, question in molecular biology. Using a representative set of protein-DNA complexes, we first show that our model is able to predict the interaction surface between the macromolecular partners taken in their bound form. In a second part, the impact of the DNA sequence and electrostatics, together with the DNA and protein conformations on docking is investigated. Our results strongly suggest that the overall DNA structure mainly contributes in discriminating the interaction site on cognate proteins. Direct electrostatic interactions between phosphate groups and amino acid side chains strengthen the binding. Overall, this work demonstrates that coarse grain modeling can reveal itself a precious auxiliary for a general and complete description and understanding of protein-DNA association mechanisms.
Article
Most drugs have been discovered in random screens or by exploiting information about macromolecular receptors. One source of this information is in the structures of critical proteins and nucleic acids. The structure-based approach to design couples this information with specialized computer programs to propose novel enzyme inhibitors and other therapeutic agents. Iterated design cycles have produced compounds now in clinical trials. The combination of molecular structure determination and computation is emerging as an important tool for drug development. These ideas will be applied to acquired immunodeficiency syndrome (AIDS) and bacterial drug resistance.