
Pushing the limits of what is achievable in protein-DNA docking: Benchmarking HADDOCK's performance

May 2010
Nucleic Acids Research 38(17):5634-47

DOI:10.1093/nar/gkq222

Source
PubMed

License
CC BY-NC 2.5

Authors:

Alexandre M J J Bonvin

Utrecht University

The intrinsic flexibility of DNA and the difficulty of identifying its interaction surface have long been challenges that prevented the development of efficient protein–DNA docking methods. We have previously demonstrated the ability of our flexible data-driven docking method HADDOCK to deal with these challenges by using custom-built DNA structural models. Here we put our method to the test on a set of 47 complexes from the protein–DNA docking benchmark. We show that HADDOCK is able to predict many of the specific DNA conformational changes required to assemble the interface(s). Our DNA analysis and modelling procedure captures the bend and twist motions occurring upon complex formation and uses these to generate custom-built DNA structural models, more closely resembling the bound form, for use in a second docking round. We achieve throughout the benchmark an overall success rate of 94% of one-star solutions or higher (interface root mean square deviation ≤4 Å and fraction of native contacts >10%) according to CAPRI criteria. Our improved protocol successfully predicts even the challenging protein–DNA complexes in the benchmark. Finally, our method is the first to readily dock multiple molecules (N > 2) simultaneously, pushing the limits of what is currently achievable in the field of protein–DNA docking.

Cumulative bar graph expressing the quality of the docking solutions according to the CAPRI star rating for all 2000 bound–bound rigid-body docking solutions. Complexes are sorted according to the total number of obtained stars. CAPRI criteria are defined as; three stars (high quality): Fnat > 0.5, l-r.m.s.d or i-r.m.s.d < 1.0 Å; two stars (medium quality): Fnat > 0.3, l-r.m.s.d < 5.0 Å or i-r.m.s.d < 2.0 Å; one star (acceptable quality): Fnat > 0.1, l-r.m.s.d < 10.0 Å or i-r.m.s.d < 4.0 Å. Fnat is the fraction of native contacts within a 5 Å cutoff.

…

Cumulative bar graphs expressing the quality of the best 400 docking solutions according to the HADDOCK score in terms of CAPRI one-star (grey) and two-star (white) results, for the two-stage unbound–unbound protein–DNA docking using true interface derived restraints. Results are presented for; the rigid-body docking starting from a canonical B-DNA model (A); after the semi-flexible refinement (B) and after semi-flexible refinement using an ensemble of custom DNA 3D structural models (C). Complexes are sorted according to the total number of obtained stars in (B), reclassifying the benchmark into ‘easy’, ‘intermediate’ and ‘difficult’ categories. See caption of Figure 1 for the definition of the CAPRI criteria.

…

Table 2. Definition of the AIRs based on experimental data for the six selected test-cases

…

All heavy atom r.m.s.d values from the reference complex [(A) DNA only, (B) full complex, (C) interface] and fraction of native contacts [Fnat, (D)] for the 10 best solutions of the best cluster, both selected based on the HADDOCK score, after rigid-body docking (open squares) and semi-flexible refinement (closed circles) starting from a canonical B-DNA structural model and after semi-flexible refinement (open triangle) starting from an ensemble of custom-built DNA models.

…

Best solutions from unbound flexible docking using an ensemble of custom-built DNA structural models (blue) superimposed on to the reference structure (yellow). The complexes are grouped according to their docking difficulty (‘easy’, ‘intermediate’ and ‘difficult’) as indicated in the benchmark. The CAPRI score for each solution is indicated as one or two stars after the PDB code as well as the fraction of native contacts (a), the interface (b) and DNA r.m.s.d (c) from the reference structure. r.m.s.d values (Å) were calculated after superimposition on all heavy atoms of the selected regions of the reference complex. The figures were generated using Pymol (DeLano Scientific LLC, www.pymol.org).

…


Pushing the limits of what is achievable in protein–DNA docking: benchmarking HADDOCK’s performance

Marc van Dijk and Alexandre M. J. J. Bonvin*

Bijvoet Center for Biomolecular Research, Science Faculty, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands

Received January 9, 2010; Revised and Accepted March 17, 2010


INTRODUCTION

The computational docking field is proceeding ever faster to become an integral part of the research workflow in life sciences. Most of the developments in docking methodology were pioneered in the fields of small molecule docking and protein–protein docking (1–3). Docking has become a valuable tool in drug design, molecular interaction studies, NMR and X-ray structural studies, biochemical experiment design and validation (4–6). While docking is flourishing in these fields, less progress has been made in the development of successful protein–DNA docking algorithms. This is in part due to two system-dependent problems: (i) identifying the location of the interaction interface(s) on the DNA and (ii) modelling DNA conformational changes while maintaining a correct representation of the DNA double-helix during a simulation. The field of protein–DNA docking is, however, receiving renewed interest as the vital role of protein–DNA interactions in regulating gene expression and guarding genome integrity has become apparent (7). As a consequence, new protein–DNA docking methods are put forward and proven protein–protein docking concepts are extended to deal with these systems (8–17).

We have in the past adapted our data-driven docking method HADDOCK to deal with protein–DNA systems (18) and showed that it is able to deal with the two main challenges mentioned above. The ability of HADDOCK to use experimental data to drive the docking greatly facilitates the identification and positioning of the interaction interfaces during the docking (19,20). The incorporation of flexibility, both explicitly during the docking and implicitly by the use of custom-built DNA structural models, has proven to facilitate the conformational changes in the protein and DNA needed to establish the complex. The protocol was initially tested by docking the unbound structures of three monomeric transcription factors to their respective operator half-sites [phage 434 Cro (21), phage Arc (22) and Escherichia coli Lac (23)]. The resulting near-native docking solutions reproduced many of the contacts observed in the experimental structures as well as specific conformational changes in the DNA. Our initial protein–DNA docking protocol has been successfully used in a number of practical applications by various laboratories worldwide (24–28). Driven by this success we have worked on improving the method’s performance and user friendliness by facilitating the generation of custom DNA structural models (29) as well as establishing a protein–DNA docking benchmark as a test bed for future developments (30). Next to that, HADDOCK has been made available to the community as a web server (http://www.haddocking.org; http://haddock.chem.uu.nl).

*To whom correspondence should be addressed. Tel: +31 30 2533859; Fax: +31 30 2537623; Email: a.m.j.j.bonvin@uu.nl

Nucleic Acids Research, 2010, Vol. 38, No. 17, 5634–5647. Published online 13 May 2010. doi:10.1093/nar/gkq222

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from nar.oxfordjournals.org at University Library Utrecht on October 15, 2010.

Here we bring all these elements together and challenge our method using the 47 test cases from the protein–DNA benchmark to define the limits of our current approach. We focus on the same two questions addressed in the previous work (18): how successful is the method in dealing with conformational changes upon complex formation and how well is it able to identify the correct interaction interfaces? Compared to the three test cases used previously, the 47 test cases in the benchmark pose some considerable challenges. The initial test cases were all major groove interacting transcription factors in their monomeric form, targeting one operator half-site that effectively spans one helical turn of DNA. The DNA-interacting domain of these transcription factors changes conformation only in the side-chains of the DNA-interacting residues. The global conformational changes in the DNA were expressed as a uniform bend and change in groove width. In contrast, among the 47 test cases of the benchmark, not only transcription factors but also enzymes and structural proteins are present. These interact using a variety of structural domains, often involving multiple proteins, targeted to one or multiple sites on the DNA. Furthermore, the DNA length is often more than one helical turn. As a consequence, conformational changes can no longer be expressed in a smooth and uniform way but rather as an accumulation of local DNA bending and twisting events. To cope with these challenges we have improved our method for the generation of custom DNA structural models by extending its ability to capture the main bend and twist motions occurring in the DNA upon complex formation, and by subsequently using this information for the generation of custom DNA models.

The new results, again, show that the use of explicit flexibility in combination with implicit flexibility by means of an ensemble of custom-built DNA structural models greatly improves the protein–DNA docking efficiency with respect to rigid-body docking. This is especially clear for the intermediate and difficult categories of the benchmark where DNA conformational changes readily occur. The use of experimental information for the docking of a representative subset of the benchmark demonstrates the ability of our method to identify the correct interfaces and assemble the complex under ‘real life’ docking conditions. Furthermore, our method is the first to dock multiple molecules simultaneously, a valuable feature in a benchmark containing 40% of multi-component complexes. Top ranking docking solutions throughout the benchmark readily score one and two stars according to the CAPRI quality criteria (31) and three-star predictions are getting within reach for ‘easy’ test cases.

To our knowledge this is the first time a protein–DNA docking study of such a magnitude has been performed. Our results stress the importance of conformational adaptation in the docking of protein–DNA complexes and show the potential of HADDOCK to deal with them. We hope that they will stimulate the docking community to put their methods to the test on the same benchmark and foster further developments.

MATERIALS AND METHODS

Protein–DNA docking benchmark

The performance of HADDOCK was evaluated using the coordinate files for the bound and unbound proteins of 47 protein–DNA complexes available in the protein–DNA benchmark version 1.2 [http://haddock.chem.uu.nl/dna/benchmark.html (30)]. Canonical B-DNA 3D structural models were built using the 3D-DART web server [http://haddock.chem.uu.nl/dna (29)]. Their conformation was of BII type with the sugar pucker in the C2′-endo conformation [sugar pseudo-rotation phase angle (P) = 155°, DNA backbone torsion angles: α = 309°, β = 159°, γ = 37°, δ = 146°, ε = 218°, ζ = 191° and χ = 260°].

Restraints used in the docking

Ambiguous interaction restraints, based on the true interface. Ideal ambiguous interaction restraint (AIR) sets were generated based on the true interface(s) of the reference complexes as follows: (i) retrieval of all intermolecular atom–atom contacts below a cutoff of 5.0 Å; (ii) transformation of the atom–atom contacts to their respective residue–residue counterparts, distinguishing between three categories: amino-acid to nucleotide base contacts, amino-acid to nucleotide sugar–phosphate backbone contacts or amino-acid to full nucleotide contacts. Contacts that originated from amino-acid residues having a relative main- or side-chain solvent accessibility of <30% as measured by NACCESS (32) were discarded.

All residues used in creating the interaction restraint file were defined as ‘active’. In effect we used the same procedure to generate AIRs as in the case of experimental information, with the difference that they are only defined between the residues that are known to be in close vicinity in the reference complex.
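The contact-to-restraint transformation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `residue_contacts`, the tuple layout of the atom records and the `BACKBONE_ATOMS` set are assumptions made for this example, and the NACCESS solvent-accessibility filter is left out.

```python
import math
from collections import defaultdict

# Assumed naming of sugar-phosphate backbone atoms; any other nucleotide
# atom is treated as a base atom in this sketch.
BACKBONE_ATOMS = {"P", "O1P", "O2P", "OP1", "OP2", "O5'", "C5'", "C4'",
                  "O4'", "C3'", "O3'", "C2'", "C1'"}


def residue_contacts(protein_atoms, dna_atoms, cutoff=5.0):
    """Map atom-atom contacts below `cutoff` (in Angstrom) to residue pairs.

    Each atom is a tuple (residue_id, atom_name, (x, y, z)). The result maps
    (amino_acid_id, nucleotide_id) to 'base', 'backbone' or 'full'.
    """
    kinds = defaultdict(set)
    for p_res, _p_name, p_xyz in protein_atoms:
        for d_res, d_name, d_xyz in dna_atoms:
            if math.dist(p_xyz, d_xyz) < cutoff:
                part = "backbone" if d_name in BACKBONE_ATOMS else "base"
                kinds[(p_res, d_res)].add(part)
    # A nucleotide contacted through both its base and its backbone is
    # reported as a full-nucleotide contact.
    return {pair: ("full" if len(parts) == 2 else next(iter(parts)))
            for pair, parts in kinds.items()}
```

All amino acids and nucleotides that appear in the returned pairs would then be marked ‘active’ when writing the restraint file.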

AIRs based on experimental information. To evaluate the performance of HADDOCK in docking protein–DNA complexes using experimental information, we selected six representative test cases from the ‘easy’ (3cro, 1by4), ‘intermediate’ (1azp, 1jj4) and ‘difficult’ (1a74, 1zme) categories of the benchmark. For these we collected biochemical and biophysical information from literature sources. Only residues that are solvent accessible in the unbound proteins, using the same criteria as described above, were considered. For those DNA bases shown to be involved in specific interactions with the protein, only atoms able to interact by hydrogen-bond or non-bonded interactions were defined. This selection was further subdivided into atoms facing either the major or minor groove in case information about the protein-binding mode was available (Table 1). In case of non-specific interactions with the DNA, only the atoms of the sugar–phosphate backbone that are able to interact via hydrogen bonds or non-bonded interactions were defined (Table 1). Solvent accessible residues located in the predicted interaction interface, for which no experimental information was available, were defined as ‘passive’. Residues for which experimental information was available were defined as ‘active’. An overview of the data used is listed in Table 2.

DNA restraints. In order to preserve the helical conformation during the flexible stages of the docking, the DNA was restrained as described before (18). For the docking of the unbound protein(s) to a canonical B-DNA structural model, the dihedral angles of the sugar–phosphate backbone of the input structure (inp) were measured and used as restraints (restricted to α = α_inp ± 10°, β = β_inp ± 40°, γ = γ_inp ± 20°, δ = δ_inp ± 50°, ε = ε_inp ± 10° and ζ = ζ_inp ± 50°). For the docking of the unbound protein(s) to the ensemble of custom-built DNA structural models, the same protocol for sugar–phosphate backbone restraints was used but the restraint error values were reduced to half of those in the canonical B-DNA case.
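As a small worked example of these restraint windows, the sketch below turns measured backbone torsions into allowed ranges. The names `TOLERANCE_DEG` and `dihedral_windows` are hypothetical, the δ tolerance is read as ±50°, and the wrapping of angles into the 0–360° range is an assumption of this example rather than something stated in the text.

```python
# Tolerances (degrees) quoted for docking with canonical B-DNA.
TOLERANCE_DEG = {"alpha": 10.0, "beta": 40.0, "gamma": 20.0,
                 "delta": 50.0, "epsilon": 10.0, "zeta": 50.0}


def dihedral_windows(measured, scale=1.0):
    """Return {torsion: (lower, upper)} in degrees, wrapped into [0, 360).

    `measured` holds the torsion angles of the input DNA model; `scale`
    is 1.0 for canonical B-DNA and 0.5 for the custom-built models, whose
    error values are halved.
    """
    windows = {}
    for name, value in measured.items():
        tolerance = TOLERANCE_DEG[name] * scale
        windows[name] = ((value - tolerance) % 360.0,
                         (value + tolerance) % 360.0)
    return windows


# Torsion angles of the canonical B-DNA model given in the benchmark section.
canonical = {"alpha": 309.0, "beta": 159.0, "gamma": 37.0,
             "delta": 146.0, "epsilon": 218.0, "zeta": 191.0}
print(dihedral_windows(canonical))
print(dihedral_windows(canonical, scale=0.5))
```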

Docking protocol

The default protein–DNA docking protocol as described before (18) and implemented in HADDOCK version 2.0 (33) was used for all the docking runs. This protocol includes the random removal of 50% of the ambiguous interaction restraints for each docking trial. Several docking-specific modifications were made as follows.

Bound–bound docking. Only rigid-body docking was performed, generating 2000 solutions. Protein and DNA structures were used in the bound conformation obtained from the reference complex.

Table 2. Definition of the AIRs based on experimental data for the six selected test-cases

‘Easy’

1by4 (37)
Protein, Act: (K31,R32)^a,b → T5,C6,G25,A26; (E24,K27)^a,b → G3/4,C27/28; (K72,K73,R80) → A2,G3/4
Protein, Pas: V34,A75,V76,Q77,R55,N56,Q59,R62
DNA, Act: (T5,C6,A26,C27,C28,T29), (G3,G4)^a,c,d, (A2,T24)^a,c, T23, G25^a,d
References: (38–46)

3cro (21)
Protein, Act: (K29,Q31,S32,K42-P44), L35 → C14,T15/T23,33
Protein, Pas: K9,T18-T20,G27,V28,Q30,Q34,I36,E37,V40,T41,R45,F46
DNA, Act: (C6,A7,T16-T18,C24,A25,T34-T36), (T32,T33)^a,b,c, (T4,A5,T13,C14,T15,T22,A23,G31)^a,c
References: (47–50)

‘Intermediate’

1azp (51)
Protein, Act: W24 → G3,G15; V26, M29, S31, V45 → C2-A4,T13-G15; (K22,T33,R42) → T5-G7,C10-A12
Protein, Pas: K21,R25,G27,K28,K39,T40,A44,S46,E47
DNA, Act: C2,G3,A4,T5,C6,G7,C10,G11,A12,T13,C14,G15
References: (52–55)

1jj4 (56)
Protein, Act: (N13,K16,C17,R19-R21)
Protein, Pas: S34,T35,H37 → T26-C27
DNA, Act: (A3,C4,T30), (C5,G28,G29)^a,d, (T25-C27)
References: (57,58)

‘Difficult’

1a74 (59)
Protein, Act: (H97,N122)^a,b → A35,G36; (A54-N56,T59,R60,R65,R73,G75) → T1-C7
Protein, Pas: V51,G57,P58,T66,V71,H77,H100,K119
DNA, Act: (T1-C7)^a,b,d, (A35,G36), G40
References: (60–65)

1zme (66)
Protein, Act: (R9,R11,H12,R80,R82,H83)
Protein, Pas: A4,K14,K39-S43,A75,K85,K100-S114
DNA, Act: (C2,G3,G4,C15,C17,G18,C20,G21,G22,C33,C34,G35), (T26-C32,C9-T14)
References: (67–72)

Active residues (Act) are grouped according to the available information. Continuous stretches of residues are separated by a dash. Arrows indicate active restraints for specific pairs of residues. Passive residues (Pas) are only defined for the protein. Since 1by4, 1jj4 and 1a74 are symmetrical dimers only the restraints for one subunit are shown. Base-specific restraints for 3cro, 1by4, 1jj4, 1a74 and 1zme are targeted to the atoms of the nucleotides facing the major groove and those of 1azp to those facing the minor groove (Table 1).

^a Conserved residues.
^b Mutagenesis data.
^c Ethylation interference data.
^d Methylation interference data.
^e NMR native state amide hydrogen exchange.
^f Raman spectroscopy.

Table 1. Nucleotide atom subsets used in the definition of AIRs

DNA base   Minor groove atoms          Major groove atoms
Thy        H3, O2, C2                  H3, O4, C4, C5, C6, C7
Ade        N1, N3, C2, C4              H61, H62, N1, N7, C5, C6, C8
Gua        H1, H21, H22, N3, C2, C4    H1, H21, N7, O6, C5, C6, C8
Cyt        N3, O2, C2                  H41, H42, N3, C4, C5, C6

Non-specific backbone atoms
Sugar–phosphate backbone   C1′, C2′, O3′, O5′, P, O1P, O2P

Subsets are defined for atoms capable of interacting using non-bonded or hydrogen bonded interactions. Individual subsets are defined for those atoms facing the DNA major and minor groove for the four bases and for the sugar–phosphate backbone atoms.


Unbound–unbound docking using a canonical B-DNA structural model. A single component HADDOCK run was performed using the unbound proteins to yield a better sampling of side-chains and loop conformations. The residues of the interface, defined either based on the reference complex or on experimental information, were allowed to sample additional conformations during the semi-flexible refinement stage. Here, semi-flexible refinement signifies the combination of the semi-flexible simulated annealing stage in torsion angle space and the final water refinement stage in Cartesian space. Four protein models and the original unbound protein structure were used together with the canonical B-DNA model as an input ensemble for unbound–unbound docking. A total of 4000 docking solutions (every combination of models is sampled 800 times) were generated in the rigid body docking stage and the top 10% based on the HADDOCK score were used in the subsequent semi-flexible refinement stage. During the semi-flexible simulated annealing stage, the full DNA excluding the terminal base pairs was treated as semi-flexible. The amino-acid residues within 5.0 Å of any partner molecule were automatically defined as semi-flexible.

Unbound–unbound docking using five custom-built DNA structural models. The same protocol as for unbound–unbound docking starting from canonical B-DNA was used, with the following differences: five custom-built DNA structural models were used instead of canonical B-DNA; the conformational freedom of the DNA in the semi-flexible simulated annealing stage was limited by automatically defining both the amino-acid residues and nucleotides within 5.0 Å of any partner molecule as semi-flexible; the error ranges for the sugar–phosphate backbone dihedral angles as described above were reduced by half. Every combination of protein–DNA input models is sampled 160 times in the rigid body docking stage. The procedure for generating custom DNA structural models used as input for this docking run is described below.
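The sampling numbers of the two unbound runs can be checked with a few lines of arithmetic. This is only a bookkeeping sketch: the function name is hypothetical, and the total of 4000 rigid-body solutions for the custom-model run is inferred here from five protein models, five DNA models and 160 samples per combination rather than quoted from the text.

```python
def rigid_body_solutions(n_protein_models, n_dna_models, samples_per_combination):
    """Number of rigid-body solutions when every model pair is sampled equally."""
    return n_protein_models * n_dna_models * samples_per_combination


# Canonical B-DNA run: 4 protein models + the unbound structure, 1 DNA model.
canonical_run = rigid_body_solutions(5, 1, 800)
# Custom-model run: the same 5 protein structures against 5 DNA models.
custom_run = rigid_body_solutions(5, 5, 160)
# The top 10% by HADDOCK score go on to semi-flexible refinement.
refined = canonical_run // 10

print(canonical_run, custom_run, refined)
```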

Generation of custom DNA structural models

The generation of five custom DNA structural models is based on an analysis and a modelling step.

Analysis. The 10 best solutions from the top ranking cluster, both according to the HADDOCK score, were selected. The DNA structures in these solutions were analyzed using 3DNA (34,35) and the DNA bend analysis algorithm used in the 3D-DART server (29). This resulted in average parameter values for the six base pair (step) parameters (36) for every base pair (step) in the structure. These describe the conformation of the DNA. The average global bend vector with respect to a common reference frame between every successive base pair in the structures was calculated by 3D-DART. This information was used in the modelling stage.

Modelling. The modelling of custom DNA structures is based on the progressive introduction of global and local DNA conformational changes to a canonical B-DNA starting model.

(i) A default set of base pair (step) parameters representing a canonical B-DNA conformation with the same sequence as the reference structure is generated by 3D-DART using the ‘fiber’ utility of the 3DNA software suite.

(ii) The Roll and Tilt values in the default set are updated by 3D-DART to reflect the average global bend vector for every base pair step in the sequence. The central base pair is used as origin of the global reference frame and default Twist values are used for correcting the direction of the vectors relative to the reference frame. The introduced bend vector between base pairs is scaled, enabling sampling of conformational change beyond the limits of the values defined by the average ± the standard deviation determined in the analysis stage. The scaling factor is set between 2.0 and 3.0 for those ensembles that show little deviation from a canonical helix and between 4.0 and 6.0 for the remaining test cases. For the docking of 1a74 using experimentally derived restraints the scaling factor was set to 10.0 to match the amount of DNA bend to the curved interaction surface of the protein (see ‘Results’ section).

(iii) All base pair step parameters are updated to reflect the average values as determined by the analysis stage, resulting in a new weighted parameter $P_{Wx_i}$ at base pair step $i$ defined as follows:

$$P_{Wx_i} = 2\sqrt{\sigma_{p_i}/\sigma_{p}}\;\ldots\;P_{x_i}, \qquad (1)$$

where $P_{x_i}$ is the average value for the given parameter at base pair step $i$ obtained from the analysis stage, $\sigma_{p_i}$ defines the standard deviation for the given parameter at base pair step $i$ and $\sigma_{p}$ is the standard deviation for the given parameter for all base pair steps. $S$ is a parameter-specific scaling factor that compensates for the over- or under-estimation of a given parameter as a result of the HADDOCK semi-flexible refinement stages. $S$ was set to: twist: 0.8, roll: 0.8, tilt: 0.8, rise: 0.0, slide: 0.2 and shift: 0.8. The new value $P_{n_i}$ for the parameter at base pair step $i$ is now calculated as follows:

$$P_{n_i} = P_{d} + (P_{Wx_i} - P_{d})\,V \qquad (2)$$

Here $P_{d}$ is the default value from canonical B-DNA for the given parameter at base pair step $i$ and $V$ is a variance value used to sample the parameter above or below its adjusted average (set to 0.8 by default).

(iv) The default base pair parameters are updated in the same way as for the base pair step parameters. The base pair parameter-specific scaling factors (S) used are: shear: 1.0, stretch: 1.0, stagger: 1.0, buckle: 1.0 and propeller twist: 1.0. The variance parameter V is set to 0.8 by default.

(v) The updated list of base pair (step) parameters is used to build a 3D DNA structure using the same parameters for the sugar pucker and phosphate backbone dihedral angles as in the case of canonical B-DNA.

Analysis

The quality of the generated solutions was evaluated using the CAPRI criteria expressed as stars: three stars (high quality): Fnat > 0.5, l- or i-r.m.s.d < 1.0 Å; two stars (medium quality): Fnat > 0.3, l-r.m.s.d < 5.0 Å or i-r.m.s.d < 2.0 Å; one star (acceptable quality): Fnat > 0.1, l-r.m.s.d < 10.0 Å or i-r.m.s.d < 4.0 Å. Fnat is the fraction of native contacts within a 5 Å cutoff, i-r.m.s.d is the interface backbone (Cα, P) r.m.s.d and l-r.m.s.d is the ligand backbone r.m.s.d calculated by superimposition on all phosphate atoms of the reference DNA and subsequently on all Cα atoms of the reference protein. For the results in Figure 4 and the docking using experimentally derived restraints, the reported r.m.s.d values were calculated after superimposition of all heavy atoms of the reference belonging to either the DNA, the protein, the interface or the full complex. The r.m.s.d values were calculated using ProFit (A.C.R. Martin, http://www.bioinf.org.uk/software/profit).
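The star rating above is a simple threshold scheme and can be written down directly. The sketch below is illustrative only: `capri_stars` and `fraction_native_contacts` are hypothetical helper names, contacts are represented as sets of residue-pair identifiers, and the r.m.s.d values are assumed to have been computed beforehand (for example with ProFit).

```python
def fraction_native_contacts(native_contacts, model_contacts):
    """Fnat: share of the reference (native) contacts reproduced by the model."""
    native = set(native_contacts)
    if not native:
        raise ValueError("the reference complex has no contacts")
    return len(native & set(model_contacts)) / len(native)


def capri_stars(fnat, l_rmsd, i_rmsd):
    """Return 3, 2, 1 or 0 stars using the thresholds quoted in the text.

    fnat is a fraction between 0 and 1; l_rmsd and i_rmsd are in Angstrom.
    """
    if fnat > 0.5 and (l_rmsd < 1.0 or i_rmsd < 1.0):
        return 3  # high quality
    if fnat > 0.3 and (l_rmsd < 5.0 or i_rmsd < 2.0):
        return 2  # medium quality
    if fnat > 0.1 and (l_rmsd < 10.0 or i_rmsd < 4.0):
        return 1  # acceptable quality
    return 0      # not counted as a successful prediction
```

A solution with Fnat = 0.4, l-r.m.s.d = 6.0 Å and i-r.m.s.d = 1.5 Å, for instance, rates two stars because the interface criterion alone is sufficient.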

Hardware

HADDOCK docking runs were performed on a Transtec (Transtec AG, Tübingen, Germany) computer cluster operating with 48 2.0 GHz 64-bit Opteron processors. As a measure of CPU requirements, one complete run starting with 4000 structures in the rigid-body docking stage could be performed in 4 h on 48 processors.

RESULTS

The power of HADDOCK as a method relies, among other things, on its use of AIRs and explicit flexibility. An AIR defines that a residue on the surface of a biomolecule should be in close vicinity to another residue or group of residues on the partner biomolecule when they form the complex. By default this is described as an ambiguous distance restraint between all atoms of the source residue and all atoms of all reference residue(s) that are assumed to be in the interface in the complex. The effective distance between all those atoms, $d^{\mathrm{eff}}_{iAB}$, is calculated as follows:

$$d^{\mathrm{eff}}_{iAB} = \left( \sum_{m_{iA}=1}^{N_{A\,\mathrm{atom}}} \sum_{k=1}^{N_{\mathrm{res}B}} \sum_{n_{kB}=1}^{N_{B\,\mathrm{atom}}} \frac{1}{d^{6}_{m_{iA} n_{kB}}} \right)^{-1/6}. \qquad (3)$$

Here $N_{A\,\mathrm{atom}}$ indicates all atoms of the source residue on molecule A, $N_{\mathrm{res}B}$ the residues defined to be at the interface of the reference molecule B, and $N_{B\,\mathrm{atom}}$ all atoms of a residue on molecule B. The $1/r^{6}$ summation somewhat mimics the attractive part of the Lennard–Jones potential and ensures that the AIRs are satisfied as soon as any two atoms of the biomolecules are in contact. The AIRs are incorporated as an additional energy term to the energy function that is minimized during the docking. The ambiguous nature of these restraints easily allows experimental data that often provide evidence for a residue making contacts to be used as driving force for the docking. As such the AIRs define a network of restraints between the possible interaction interface(s) of the molecules to be docked without defining the relative orientation of the molecules, minimizing the necessary search through conformational space needed to assemble the interfaces. Because the AIRs are part of the energy function they might also contribute to inducing the conformational changes during the flexible stage of the docking.
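To make the behaviour of the effective distance concrete, the sketch below evaluates the sum of inverse sixth powers of all atom–atom distances between one source residue and a set of partner residues, raised to the power −1/6. The function name and the coordinate layout are assumptions of this example; it is not HADDOCK code.

```python
import math


def effective_distance(source_atoms, partner_residues):
    """Effective AIR distance for one source residue.

    source_atoms: list of (x, y, z) coordinates of the source residue on A.
    partner_residues: list of residues on B, each a list of (x, y, z) atoms.
    """
    total = 0.0
    for atom_a in source_atoms:              # atoms of the source residue
        for residue in partner_residues:     # interface residues on B
            for atom_b in residue:           # atoms of each residue on B
                total += math.dist(atom_a, atom_b) ** -6
    return total ** (-1.0 / 6.0)


# One close contact dominates the sum: atoms at 3 and 30 Angstrom give an
# effective distance just below 3 Angstrom.
print(effective_distance([(0.0, 0.0, 0.0)],
                         [[(3.0, 0.0, 0.0)], [(30.0, 0.0, 0.0)]]))
```

Because the shortest distances dominate, the effective distance is never larger than the closest atom pair, which is why a single contact between any two atoms is enough to satisfy the restraint.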

To objectively answer the question ‘how successful is HADDOCK in dealing with conformational changes upon complex formation?’, the effects of the quality and quantity of AIRs on complex formation and conformational change should be kept to a minimum. This was realized by constructing ideal AIR restraint sets based on the true interface(s) of the reference complexes (see ‘Materials and Methods’ section). Using these restraints we first evaluated the ability of HADDOCK to reconstruct the complex from its components in their bound conformation. Challenges in reconstruction due to structural characteristics, the inability of the restraints to drive correct complex formation or selection of top ranking solutions due to scoring problems can be identified at this stage. Next we used the same restraints to drive the docking between the unbound protein and a canonical B-DNA 3D structural model using our two-stage protein–DNA docking approach. We focused on the two stages individually, first evaluating the effects of explicit flexibility on the docking by comparing the docking solutions from rigid body refinement with those after semi-flexible refinement. Subsequently we analyzed the conformation of the DNA in the final docking solutions. Here, the focus was on the ability of HADDOCK to introduce those specific DNA conformational changes in terms of DNA bending and twisting that can lead to the final conformation of the DNA in the complex. With this information an ensemble of custom DNA structural models was generated using a modified protocol of our 3D-DART DNA modelling web server (see ‘Materials and Methods’ section). The resulting models were used as input for a second, ‘refinement’, docking run. The results were compared with those of the previous run starting from a canonical B-DNA structural model to analyze the effect of this implicit treatment of flexibility. Finally, the same two-stage docking protocol was applied to a subset of six test cases from the benchmark using AIR restraints based on experimental information obtained from literature sources.

Bound–bound docking

A bound–bound docking experiment is essentially an

exercise of separating the reference complex into its

individual biomolecules and reconstructing it again.

As the diﬀerent components are already in their bound

conformation ﬂexibility is not required and only rigid-

body docking needs to be performed. The ability of

HADDOCK to sample conformational space in search

5638 Nucleic Acids Research, 2010, Vol. 38, No. 17

at University Library Utrecht on October 15, 2010nar.oxfordjournals.orgDownloaded from

of the correct interaction interface(s) using ideal AIR

restraints was evaluated using the CAPRI star ranking

as a quality measure commonly used in protein–protein

docking (31). These criteria deﬁne one-star predictions as

‘acceptable’, two-star as ‘medium’ and three-star as ‘high’

quality with respect to their reference structure (see

‘Materials and Methods’ section).
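The star ranking can be sketched in a few lines of Python. This is a minimal illustration, assuming the thresholds quoted in the caption of Figure 1 and reading the comma in each criterion as a logical "and"; the function name `capri_stars` and its argument names are hypothetical and not part of HADDOCK.

```python
def capri_stars(fnat, l_rmsd, i_rmsd):
    """Return the CAPRI star rating (0-3) for one docking solution.

    Thresholds follow the criteria quoted in the caption of Figure 1.
    fnat is a fraction between 0 and 1; both r.m.s.d values are in angstrom.
    """
    if fnat > 0.5 and (l_rmsd < 1.0 or i_rmsd < 1.0):
        return 3  # high quality
    if fnat > 0.3 and (l_rmsd < 5.0 or i_rmsd < 2.0):
        return 2  # medium quality
    if fnat > 0.1 and (l_rmsd < 10.0 or i_rmsd < 4.0):
        return 1  # acceptable quality
    return 0  # not star-ranked


# Hypothetical example values, not taken from the benchmark.
print(capri_stars(0.89, 0.6, 0.34))
print(capri_stars(0.04, 7.0, 6.9))
```

Testing the strictest criterion first means that each solution receives the highest rating it qualifies for.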

The results illustrate that for 75% of the test cases three-

star solutions are generated (Figure 1, dark-grey bars).

For the ﬁrst half of the test cases (left half of Figure 1)

more than 95% of the solutions ranked one-star or higher

but for the remaining, a sharp decline in the total number

of star-ranked solutions was observed. The latter group

of test cases corresponds mostly with the ‘intermediate’

and ‘diﬃcult’ categories of the benchmark. They are

characterized by larger and more segmented interface(s).

Many of them require rearrangements of protein domains,

loops and secondary structure elements at the interfaces

upon interaction to generate a well-packed complex.

These, for instance, involve enzymes that perform their

catalytic function on single nucleotides that are ﬂipped

out of the helix into a catalytic pocket of the protein

(1emh, 7mht), restriction enzymes clamping themselves

around the DNA (3bam,1rva) or proteins with complex

dimerization interfaces (1tro, 1f4k). Eﬀective docking of

the bound conformation of these cases is hindered by non-

bonded repulsions associated with interface penetration

and the correct alignment of the segmented interfaces

during the rotation and translation stages of the rigid

body reﬁnement. This limits the eﬃciency of the rigid-

body bound-bound docking and in part explains the

lower total number of star-ranked solutions for these

cases.

Despite the diﬀerences in total number of star-ranked

solutions, the 10 best solutions selected based on the HADDOCK score coincided in all cases with the best solutions based on the CAPRI criteria. This indicates that

the HADDOCK scoring function at this stage is suﬃcient

to retrieve the best solutions.
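The selection described above can be illustrated as follows. This is a sketch under stated assumptions: each solution is represented as a hypothetical `(haddock_score, stars)` tuple, lower (more negative) scores are treated as better, and the function name `tally_top_solutions` is invented for this example.

```python
def tally_top_solutions(solutions, n_top=10):
    """Count one-, two- and three-star solutions among the best-scoring ones.

    `solutions` is a list of (haddock_score, stars) tuples, where stars is
    an integer from 0 to 3. Lower scores are ranked first.
    """
    best = sorted(solutions, key=lambda s: s[0])[:n_top]
    counts = [0, 0, 0]  # one-star, two-star, three-star
    for _, stars in best:
        if stars >= 1:
            counts[stars - 1] += 1
    return tuple(counts)


# Hypothetical scores and ratings, for illustration only.
example = [(-120.5, 3), (-60.0, 1), (-118.0, 3), (-10.0, 0), (-90.2, 2)]
print(tally_top_solutions(example, n_top=3))
```

Comparing this tally with the one obtained by ranking on the CAPRI criteria directly shows whether the scoring function retrieves the best solutions.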

Unbound–unbound docking starting from a canonical

B-DNA structural model

We proceeded with the docking of the unbound conform-

ation of the proteins with canonical B-DNA models using

ideal AIRs. To increase the sampling of conformational

space for the proteins, especially those that use ﬂexible

loops to interact with DNA grooves, we ﬁrst performed

a simulated annealing on the interface residues followed

by a reﬁnement in explicit water. This procedure resulted

in an ensemble of ﬁve structures, including the original

unbound protein, sampling diﬀerent conformations of

the interface. In 66% of the cases, conformations closer

to the bound conformation than the unbound reference

protein were sampled. The protein–DNA docking

protocol, at this stage, eﬀectively incorporates two

modes of ﬂexibility: implicit sampling by means of the

ensemble of protein starting structures and explicit

sampling of protein and DNA conformational space

during semi-ﬂexible reﬁnement.
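Whether an ensemble member lies closer to the bound conformation than the unbound reference can be checked with a simple r.m.s.d comparison. The sketch below assumes that all coordinate sets contain the same atoms in the same order and have already been superimposed on the bound structure; the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superimposed coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def members_closer_to_bound(bound, unbound, ensemble):
    """Return indices of ensemble members with a lower r.m.s.d to the bound
    conformation than the unbound reference structure has."""
    reference = rmsd(unbound, bound)
    return [i for i, member in enumerate(ensemble) if rmsd(member, bound) < reference]


# Toy two-atom coordinates in angstrom, for illustration only.
bound = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
unbound = [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]
ensemble = [
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],  # closer to bound than the unbound form
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],  # further away
]
print(members_closer_to_bound(bound, unbound, ensemble))
```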

Figure 2 illustrates the docking results using only rigid-

body docking (A) and the eﬀect of a subsequent semi-

ﬂexible reﬁnement (B). Here, the cumulative bar graphs

Figure 1. Cumulative bar graph expressing the quality of the docking solutions according to the CAPRI star rating for all 2000 bound–bound rigid-body docking solutions (y-axis: percentage of acceptable solutions, one star or higher; x-axis: complex, PDB id). Complexes are sorted according to the total number of obtained stars. CAPRI criteria are defined as: three stars (high quality): Fnat > 0.5, l-r.m.s.d or i-r.m.s.d < 1.0 Å; two stars (medium quality): Fnat > 0.3, l-r.m.s.d < 5.0 Å or i-r.m.s.d < 2.0 Å; one star (acceptable quality): Fnat > 0.1, l-r.m.s.d < 10.0 Å or i-r.m.s.d < 4.0 Å. Fnat is the fraction of native contacts within a 5 Å cutoff.


show the percentage of CAPRI one-star (white bars) and

two-star solutions (grey bars) over all rigid-body (4000)

and reﬁned (400) solutions. Overall, 96% of the cases

improve due to explicit ﬂexibility. For a number of

complexes, one- and two-star solutions were already

obtained after rigid-body docking. In all cases, except for

1dfm, the number of one- or two-star solutions increased

signiﬁcantly after semi-ﬂexible reﬁnement. The number of

star ranking solutions obtained after rigid-body docking

and their subsequent improvement due to explicit flexibility clearly divides the complexes into three groups that

coincide reasonably well with the ‘easy’, ‘intermediate’

and ‘diﬃcult’ categories of the benchmark. For the ‘easy’

category the inclusion of explicit ﬂexibility readily results

in a shift from one- to two-star solutions, for the ‘inter-

mediate’ category the number of one-star solutions greatly

improves and for the ‘diﬃcult’ category one-star solutions

are often only achieved because of explicit ﬂexibility.

Unbound–unbound docking starting from custom-built

B-DNA structural models

The previous docking results show the improvements that

can be obtained when using explicit ﬂexibility versus rigid-

body docking. In all cases, the DNA and the proteins

could adapt their conformation to better interact with

each other. For the DNA, these conformational changes

range from small local changes in helical bend and groove

width, while maintaining a relatively straight helix, to larger

global changes that eﬀectively bend and twist the DNA

structure. However, the amount of conformational space

that can be sampled during the semi-ﬂexible reﬁnement

stage is limited. Starting from a canonical B-DNA struc-

tural model, the semi-ﬂexible reﬁnement stage improved

the DNA model on average by 0.84 ± 0.36 Å all-heavy-atom

r.m.s.d with respect to the reference. This clearly

cannot account for the often large DNA conformational

Figure 2. Cumulative bar graphs expressing the quality of the best 400 docking solutions according to the HADDOCK score in terms of CAPRI one-star (grey) and two-star (white) results, for the two-stage unbound–unbound protein–DNA docking using true interface derived restraints (y-axis: percentage of one- and two-star solutions; x-axis: complex, PDB id, grouped as ‘easy’, ‘intermediate’ and ‘difficult’). Results are presented for: the rigid-body docking starting from a canonical B-DNA model (A); after the semi-flexible refinement (B); and after semi-flexible refinement using an ensemble of custom DNA 3D structural models (C). Complexes are sorted according to the total number of obtained stars in (B), reclassifying the benchmark into ‘easy’, ‘intermediate’ and ‘difficult’ categories. See caption of Figure 1 for the definition of the CAPRI criteria.


changes observed in the benchmark (ranging from 3 up to 10 Å).

The amount and consistency of the DNA conformational

changes that did occur during semi-flexible refinement

can, however, provide an indication of the extent of

conformational change to be expected in the ﬁnal complex

as we have shown before (18). By analyzing the conform-

ational changes in the top 10 solutions of the best cluster,

both selected based on the HADDOCK score, we

generated ﬁve new DNA structural models with custom

conformations reﬂecting the conformational changes that

took place in the DNA during the ﬁrst docking round for

every test case (see ‘Materials and methods’ section).

The effect of using a custom-built DNA structural

ensemble on the docking results obtained after semi-

ﬂexible reﬁnement is illustrated in Figure 2C. Again, the

cumulative bar graph shows the percentage of CAPRI

one-star (white bars) and two-star solutions (grey bars)

among all (400) reﬁned docking solutions according to

the HADDOCK score.

In a number of cases there is a marked increase in one-

and/or two-star solutions due to the use of the ensemble,

while in other cases there is no improvement or even a

reduction. Because the ensemble contains custom-built DNA structures in different conformations, one or several of these may be less successful in sampling relevant conformational space than the canonical B-DNA model used in the first run. However, if even one of the five models is significantly better than canonical B-DNA, and the scoring and clustering stages select solutions obtained from this model, then an improvement is achieved compared to semi-flexible refinement only. Figure 3 better illustrates the results by

individual graphs showing for every test case the various

r.m.s.d values and fraction of native contacts for the 10

best solutions of the top-ranking cluster, both selected

based on the HADDOCK score. The ﬁgure shows statis-

tics for the corresponding solutions after semi-ﬂexible

reﬁnement, the solutions from the rigid-body stage

starting from canonical B-DNA, and the solutions after

semi-ﬂexible reﬁnement using an ensemble of custom-built

DNA starting structures (source data can be found in

Supplementary Tables S1–S3 of the Supplementary

Data). With respect to the best 10 solutions, our

Figure 3. All heavy atom r.m.s.d values from the reference complex [(A) DNA only, (B) full complex, (C) interface] and fraction of native contacts [Fnat, (D)] for the 10 best solutions of the best cluster, both selected based on the HADDOCK score, after rigid-body docking (open squares) and semi-flexible refinement (closed circles) starting from a canonical B-DNA structural model and after semi-flexible refinement (open triangles) starting from an ensemble of custom-built DNA models. In each panel the complexes are grouped into ‘easy’, ‘intermediate’ and ‘difficult’ categories.


two-stage docking protocol improved the results in 91%

of the cases relative to rigid-body docking. The use of an

ensemble of custom-built DNA structural models (the

second stage of the docking) further improved the

results in 72% of the cases compared to the ﬁrst stage

only. For most complexes there is a marked improvement

in terms of r.m.s.d from the reference complex, when pro-

gressing from rigid-body docking to the use of an

ensemble of custom built DNA structural models. The

improvement in DNA, interface and all heavy-atom

r.m.s.d becomes more signiﬁcant with the increasing diﬃ-

culty of the test cases. This trend is to be expected as the

conformational changes between unbound and bound

structures are small in the ‘easy’ category and become

more pronounced in the ‘intermediate’ and ‘diﬃcult’

categories of the benchmark. These results show the eﬃ-

ciency of the DNA modelling procedure in capturing the

essential motions that occur in the DNA upon com-

plex formation. The fraction of native contacts improves

signiﬁcantly throughout the benchmark even when the so-

lutions improve little in terms of r.m.s.d. Apart

from this, the convergence in the 10 best solutions in

general improves, which is apparent in the smaller

standard deviations (Figure 3) and an improved

clustering (Supplementary Table S3, Supplementary

Data).
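The fraction of native contacts used throughout this comparison can be sketched as follows. This is an illustration only: it assumes that a contact is a protein residue and a nucleotide with any pair of heavy atoms within the 5 Å cutoff quoted in the caption of Figure 1, and the data layout (a dictionary mapping residue identifiers to coordinate lists) and function names are hypothetical.

```python
import math


def interface_contacts(protein, dna, cutoff=5.0):
    """Return the set of (protein_residue, nucleotide) pairs in contact.

    `protein` and `dna` map a residue identifier to a list of heavy-atom
    (x, y, z) coordinates in angstrom. Two residues are in contact when any
    pair of their atoms lies within `cutoff`.
    """
    contacts = set()
    for res_id, res_atoms in protein.items():
        for nuc_id, nuc_atoms in dna.items():
            if any(math.dist(a, b) <= cutoff for a in res_atoms for b in nuc_atoms):
                contacts.add((res_id, nuc_id))
    return contacts


def fraction_native_contacts(native, model):
    """Fraction of the reference (native) contacts reproduced in the model."""
    if not native:
        raise ValueError("reference complex has no interface contacts")
    return len(native & model) / len(native)


# Toy single-atom residues, for illustration only.
dna = {"G3": [(3.0, 0.0, 0.0)], "C4": [(12.0, 0.0, 0.0)]}
ref_protein = {"R10": [(0.0, 0.0, 0.0)], "K12": [(10.0, 0.0, 0.0)]}
model_protein = {"R10": [(0.0, 0.0, 0.0)], "K12": [(10.0, 8.0, 0.0)]}

native = interface_contacts(ref_protein, dna)
model = interface_contacts(model_protein, dna)
print(fraction_native_contacts(native, model))
```

Because Fnat counts contacts rather than measuring distances, it can improve even when the r.m.s.d values change little, which matches the trend noted above.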

Unbound–unbound docking using experimentally derived

restraints

In a ‘real-life’ docking situation, AIRs are typically

deﬁned based on experimental data or interface predic-

tions (19,20). The quality and quantity of available data

can inﬂuence the correct assembly of the interaction inter-

face(s) and the conformational changes brought about in

the ﬂexible stages of the docking. To evaluate the perform-

ance of our two-stage protein–DNA docking protocol

under these circumstances we selected six representative

test cases from the ‘easy’, ‘intermediate’ and ‘diﬃcult’

categories of the benchmark (two of each). These are, re-

spectively, the protein–DNA complexes formed by the

phage 434 Cro (3cro) transcription factor and retinoid X

receptor (1by4), the hyperthermophile chromosomal

protein SAC7D (1azp) and papillomavirus type 18 E2

(1jj4) protein, the homing endonuclease I-PpoI (1a74)

and the proline utilization transcription activator PUT3

(1zme). For these we deﬁned AIRs based on experimental

data collected from literature sources (see ‘Materials and

Methods’ section). Docking the protein and DNA in their

bound conformation (Table 3, bound-rigid) using rigid-

body energy minimization only illustrates that the AIRs

deﬁned based on experimental data are also able to recon-

struct the correct interaction interface(s) in all cases result-

ing in high quality predictions. The overall results for the

unbound docking again show a signiﬁcant improvement in

terms of r.m.s.d from the reference complexes and fraction

of native contacts when progressing from rigid body

docking to semi-ﬂexible reﬁnement and ﬁnally a second

docking round starting from an ensemble of custom-

built DNA structural models (Table 3). The best

docking solutions superimposed onto their reference struc-

tures are presented in Figure 4.

Although the overall results improved for all six test

cases, diﬀerences were observed. The bound and

unbound components of the retinoid X receptor–DNA

complex (1by4) diﬀer little from each other in terms of

r.m.s.d from the reference and rigid body docking

readily generates one-star solutions. The complex is

composed of two proteins that interact with the DNA

major groove but not with each other. Independent

movement of both proteins resulted in a relatively large

variation in the 10 best solutions after semi-ﬂexible reﬁne-

ment when starting from a canonical B-DNA model. The

use of a custom built DNA library does not reduce this

variation but does signiﬁcantly improve the fraction of

native contacts and medium quality solutions. The

phage 434 Cro–DNA complex (3cro) is a similar case

with the exception that the proteins dimerize. This

results in far less variation in the 10 best solutions after

the ﬂexible stages and a sequential improvement of the

r.m.s.d values and fraction of native contacts at each

step of the docking. The hyperthermophile chromosomal

protein SAC7D–DNA complex (1azp) binds in a non-

speciﬁc manner to the DNA minor groove. The experi-

mental data available for this complex are less well

deﬁned than for the other test cases. Despite this, the

two-stage docking protocol did reproduce the characteris-

tic minor groove widening observed for this system result-

ing in a signiﬁcant improvement in r.m.s.d when using an

ensemble of custom built DNA structural models. The

speciﬁc kink in the DNA structure observed at the

second C–G base pair (61°) in the reference complex

was, however, predicted at the third G–A base pair step

(25°) in the docking solutions. The potential of our two-

stage docking protocol to deal with large DNA conform-

ational changes is best illustrated in the case of the homing

endonuclease I-PpoI–DNA complex (1a74). Here, the

overall bend of 38is reproduced in the best solutions

(45). The information available for this complex results

in a well deﬁned, curved, interaction interface on the

protein and indicates that there is little conformational

diﬀerence of the protein in its bound and unbound state.

As such, the sharp bend introduced in the DNA by the

analysis and modelling step could be sampled up to 10

times the standard deviation from the average to match

the protein surface (see ‘Materials and methods’ section).
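The idea of sampling DNA bend values around the average observed after the first docking round can be sketched as below. This is a hypothetical illustration and not the 3D-DART protocol itself: it assumes that a single global bend angle has been measured for each of the top-ranking solutions, and that target values are spaced evenly between the average and a chosen multiple of the standard deviation. The function name, the even spacing and the toy angles are all assumptions.

```python
import statistics


def target_bend_angles(observed_bends, n_models=5, max_sd_multiple=10.0):
    """Propose bend angles (degrees) for a small ensemble of DNA models.

    The values run from the mean observed bend up to
    mean + max_sd_multiple * standard deviation, in equal steps.
    """
    if n_models < 2:
        raise ValueError("at least two models are needed to span a range")
    mean = statistics.mean(observed_bends)
    sd = statistics.stdev(observed_bends)
    step = max_sd_multiple * sd / (n_models - 1)
    return [mean + i * step for i in range(n_models)]


# Toy bend angles in degrees, not taken from the benchmark.
print(target_bend_angles([10, 12, 14]))
```

A small standard deviation across the top solutions signals a consistent bend, so extrapolating along that direction is one plausible way to reach conformations that a single refinement round cannot sample.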

The proline utilization transcription activator PUT3

(1zme) is a diﬃcult case from both protein and DNA per-

spectives. The protein contains two globular DNA

binding domains connected to a core domain with a

long ﬂexible linker. The NMR ensemble of the unbound

protein contains the DNA binding domains in many dif-

ferent orientations that prevent eﬀective docking in the

rigid body stage. Therefore, we cut the protein at the

ﬂexible linkers, resulting in three parts that were docked

as separated bodies. Peptide linker restraints were deﬁned

between the amino acids at the scission sites. After semi-

ﬂexible reﬁnement, we reconnected the diﬀerent parts in

the 10 best solutions and used the resulting protein

ensemble for the second docking stage starting from an

ensemble of custom built DNA structural models.
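Restraints that keep separately docked protein parts connectable can be written as distance restraints between the backbone atoms on either side of each cut. The sketch below generates such lines in the style of CNS `assign` distance-restraint statements; the exact restraint definitions used in this study are not given in this section, so the segment identifiers, residue numbers, target distance and tolerance are illustrative assumptions.

```python
def linker_restraints(cut_sites, distance=1.3, tolerance=0.2):
    """Build one distance restraint per scission site.

    `cut_sites` is a list of (segid_before, resid_before, segid_after,
    resid_after) tuples. Each restraint ties the backbone C atom before the
    cut to the backbone N atom after it, at roughly peptide-bond length.
    """
    lines = []
    for seg_a, res_a, seg_b, res_b in cut_sites:
        lines.append(
            f"assign (segid {seg_a} and resid {res_a} and name C) "
            f"(segid {seg_b} and resid {res_b} and name N) "
            f"{distance:.1f} {tolerance:.1f} {tolerance:.1f}"
        )
    return lines


# Hypothetical cut sites for a protein split into three bodies A, B and C.
for line in linker_restraints([("A", 66, "B", 67), ("B", 100, "C", 101)]):
    print(line)
```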


DISCUSSION

The use of AIRs is essential to the success of the

HADDOCK docking methodology in general. These are

used to position the protein at the interaction interface of

the DNA and, together with the ﬂexible stages of the

docking, to facilitate conformational changes. We have

shown previously the importance of AIRs in protein–

DNA docking (18) using three monomeric transcription

factor DNA complexes as test cases. In the current study

we reﬁned our initial method and evaluated its performance

on a benchmark of 47 protein–DNA complexes (30).

Compared with the initial three test cases the benchmark

contains complexes from various structural functional

classes in which one or multiple proteins interact with the

DNA using various binding modes. Because of the presence

of multiple proteins or DNA-binding domains, 40% of the

benchmark required docking following a multi-body

(N>2) approach. This challenging benchmark oﬀers a

good platform to evaluate the capabilities of our docking

method. We will discuss in the following the two questions

that were the focus of both this study as well as the previous

work describing the initial protein–DNA docking method.

How well is the method able to identify the correct

interaction interface(s)?

The assembly of the interaction interface(s) is a process

driven by AIRs. In ‘real-life’ docking settings the AIRs are

typically deﬁned based on experimental data and/or inter-

face predictions. The quality of the docking solutions is

therefore closely related to the amount and quality of

Table 3. Performance of the two-stage docking protocol when using AIRs based on experimental information: the r.m.s.d values from the reference and fraction of native contacts for the top ten docking solutions of the top ranking cluster, both selected based on the HADDOCK score

Run | Total r.m.s.d (Å) | Interface r.m.s.d (Å) | DNA r.m.s.d (Å) | Protein r.m.s.d (Å) | Fnat | CAPRI (*, **, ***)

‘Easy’

1by4
Bound rigid | 0.41 (0.08) | 0.34 (0.07) | 0.00 | 0.38 (0.07) | 0.89 (0.02) | 0,0,10
Unbound rigid | 4.33 (0.72) | 4.01 (0.53) | 1.41 (0.00) | 4.66 (0.73) | 0.11 (0.04) | 4,0,0
Unbound flex | 6.72 (2.10) | 5.87 (1.71) | 1.90 (0.19) | 6.98 (2.21) | 0.17 (0.05) | 5,0,0
DNA lib | 5.52 (2.43) | 4.91 (2.32) | 1.61 (0.14) | 5.85 (2.46) | 0.27 (0.09) | 4,3,0

3cro
Bound rigid | 0.32 (0.16) | 0.38 (0.19) | 0.00 | 0.44 (0.22) | 0.85 (0.09) | 0,0,10
Unbound rigid | 3.79 (0.60) | 3.51 (0.63) | 3.70 (0.00) | 3.50 (0.83) | 0.15 (0.05) | 10,0,0
Unbound flex | 3.57 (0.63) | 3.29 (0.68) | 2.86 (0.30) | 3.19 (0.68) | 0.27 (0.07) | 6,2,0
DNA lib | 2.89 (0.40) | 2.62 (0.73) | 2.08 (0.21) | 2.96 (0.43) | 0.40 (0.06) | 3,7,0

‘Intermediate’

1azp
Bound rigid | 0.33 (0.07) | 0.31 (0.07) | 0.00 | 0.11 (0.00) | 0.92 (0.03) | 0,0,10
Unbound rigid | 7.12 (2.06) | 7.09 (2.25) | 3.25 (0.00) | 3.58, 0.02 † | † | 0,0,0
Unbound flex | 6.90 (2.00) | 6.68 (2.26) | 2.87 (0.32) | 3.64, 0.13, 0.04 † | † | 0,0,0
DNA lib | 4.56 (0.79) | 4.00 (0.45) | 1.83 (0.26) | 3.76 (0.16) | 0.10 (0.04) | 5,0,0

1jj4
Bound rigid | 0.39 (0.10) | 0.40 (0.09) | 0.00 | 0.10 (0.03) | 0.82 (0.07) | 0,0,10
Unbound rigid | 4.23 (0.37) | 4.76 (0.48) | 3.19 (0.00) | 1.47 (0.05) | 0.09 (0.02) | 3,0,0
Unbound flex | 4.25 (0.43) | 4.55 (0.58) | 3.19 (0.21) | 2.40 (0.02) | 0.16 (0.07) | 6,0,0
DNA lib | 3.22 (0.30) | 3.62 (0.38) | 2.38 (0.14) | 2.37 (0.05) | 0.21 (0.07) | 9,1,0

‘Difficult’

1a74
Bound rigid | 0.06 (0.01) | 0.07 (0.01) | 0.00 | 0.01 (0.00) | 0.84 (0.01) | 0,0,10
Unbound rigid | 5.43 (0.99) | 6.88 (0.97) | 7.44 (0.00) | 1.68 (0.14) | 0.04 (0.02) | 0,0,0
Unbound flex | 4.95 (0.38) | 6.30 (0.46) | 7.12 (0.32) | 1.84, 0.14, 0.04 † | † | 8,0,0
DNA lib | 2.72 (0.25) | 3.37 (0.32) | 3.76 (0.19) | 1.78 (0.12) | 0.24 (0.05) | 9,1,0

1zme
Bound rigid | 0.48 (0.11) | 0.46 (0.08) | 0.00 | 0.01 (0.00) | 0.79 (0.06) | 0,0,10
Unbound rigid | 6.29 (0.64) | 5.49 (0.68) | 4.28 (0.00) | 5.67 (0.61) | 0.06 (0.03) | 0,0,0
Unbound flex | 6.15 (0.62) | 5.29 (0.59) | 4.68 (0.33) | 5.88 (0.27) | 0.12 (0.06) | 4,0,0
DNA lib | 5.27 (0.62) | 4.63 (0.80) | 3.35 (0.13) | 5.55 (0.48) | 0.15 (0.04) | 8,0,0

Average all heavy atom r.m.s.d values from the reference structure (Å, standard deviation in parentheses) calculated over: the entire complex (Total); the interface (Interface); the DNA only for the 10 top ranking solutions (DNA); the protein only for the 10 top ranking solutions (Protein).

The r.m.s.d values are reported for: bound rigid-body docking (Bound rigid); unbound rigid-body docking (Unbound rigid); semi-flexible refinement starting from canonical B-DNA (Unbound flex); unbound semi-flexible docking using a library of custom-built DNA structural models as input (DNA lib).

Fnat is the fraction of native contacts.

CAPRI: number of one-, two- and three-star CAPRI ranked solutions obtained in the top 10 solutions.

† One or more values are missing from this row; the surviving protein r.m.s.d and Fnat values are listed in their original order without column assignment.


available data in terms of their accuracy and information

content. We started from an ideal situation in which

the restraints were derived from the intermolecular

contacts in the reference complex. Bound docking

resulted for 75% of cases in three-star (high quality)

predictions among the top 10 solutions based on the

HADDOCK score (Figure 1). The percentage of

generated high-quality solutions and the total number of

star-ranked solutions, however, declined for the ‘inter-

mediate’ and ‘diﬃcult’ cases due to interface topology

features such as segmentation and rearrangement of struc-

ture elements. Such rearrangements occur in protein

domains, loops and secondary-structure elements at the

interfaces during the process of complex formation; they

are required to form a well-packed complex. The diﬀer-

ence between the bound and unbound conformation of

the protein and DNA interfaces in the benchmark (30)

further illustrates this. Consequently, in a bound–bound

docking setting, the docking eﬃciency is hindered by non-

bonded repulsions associated with interface penetration

and by the correct alignment of the segmented interfaces

during the rotation and translation stages of the rigid

body reﬁnement. The increase in the total number of

star-ranked solutions for many of the ‘diﬃcult’ test cases

in unbound–unbound docking relative to bound–bound

docking further illustrates this process as rearrangements

Figure 4. Best solutions from unbound flexible docking using an ensemble of custom-built DNA structural models (blue) superimposed on to the reference structure (yellow). The complexes are grouped according to their docking difficulty (‘easy’, ‘intermediate’ and ‘difficult’) as indicated in the benchmark. The CAPRI score for each solution is indicated as one or two stars after the PDB code as well as the fraction of native contacts (a), the interface (b) and DNA r.m.s.d (c) from the reference structure. r.m.s.d values (Å) were calculated after superimposition on all heavy atoms of the selected regions of the reference complex. The figures were generated using Pymol (DeLano Scientific LLC, www.pymol.org).


are allowed to take place. Still there are a number of test

cases such as 1tro and 1f4k in which non-bonded repul-

sions hamper the docking. Given that these cases can be

identiﬁed beforehand, the docking eﬃciency could be

improved by scaling down the non-bonded energy terms

(inter_rigid term to 0.001 or lower in HADDOCK); this

allows penetration to occur during the docking. An initial

test with a scaled down non-bonded energy term for the

above-mentioned two test cases resulted in a signiﬁcant

increase in the number of one- and two-star solutions

(Supplementary Table S4, Supplementary Data). This

shows that the AIRs are not the limiting factor but also

raises the question whether a change in the non-bonded

energy term scaling factor could be beneﬁcial throughout

the benchmark. Our experience in protein–protein

docking, however, indicates that the scoring becomes

more challenging, which might be detrimental at the end.
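The effect of scaling down a non-bonded term can be illustrated with a toy calculation. The sketch below is not the HADDOCK energy function: it uses a generic Lennard-Jones 12-6 potential with hypothetical `epsilon` and `sigma` values and applies a single scaling factor, in the spirit of setting the inter_rigid term to 0.001.

```python
def lennard_jones(r, epsilon=0.1, sigma=3.5):
    """Generic 12-6 Lennard-Jones energy for one atom pair at distance r (angstrom)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def scaled_nonbonded(distances, scale=1.0):
    """Sum of pair energies multiplied by a global scaling factor."""
    return scale * sum(lennard_jones(r) for r in distances)


# Toy interatomic distances; the first pair is a steric clash.
pairs = [2.5, 3.0, 4.0]
full = scaled_nonbonded(pairs, scale=1.0)
soft = scaled_nonbonded(pairs, scale=0.001)
print(f"unscaled: {full:.3f}  scaled: {soft:.5f}")
```

With the scaled term, the clash contributes almost nothing to the total energy, so overlapping poses are no longer rejected during rigid-body minimization. The same softening also weakens the discrimination between well-packed and clashing solutions, which is consistent with the scoring concern raised above.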

The unbound two-stage ﬂexible docking using the same

restraints (Figures 2 and 3) resulted in the prediction of

one- to two-star solutions depending on the level of diﬃ-

culty of the test cases. Although these results are signiﬁ-

cantly better than unbound rigid-body docking only, they

still indicate that conformational changes are the limiting

factor in protein–DNA docking.

The same series of docking experiments were performed

with a representative selection of six test cases using AIRs

deﬁned based on experimental information (Table 3,

Figure 4). The results were comparable to the use of

ideal restraints in terms of the CAPRI quality criteria.

This clearly illustrates that readily available non-structural

experimental data are suﬃcient to assemble the correct

interaction interface(s) in these challenging, often multi-

component, protein–DNA systems. Still, the quality of

the generated solutions is directly related to the quality

of the used experimental information. Sparse- and/or

low-quality information will likely result in poor-quality

docking solutions, especially for multi-component

systems. The AIRs can, however, be deﬁned based on a

wider variety of information sources than used in the

current work. For instance, NMR data or even statistical

protein–DNA interaction potentials are promising means

of improving the results either by driving the docking or

ﬁltering solutions afterwards. With respect to the latter we

should note that the many diﬀerent solutions generated in

this benchmark docking effort provide a compelling set of

decoy structures that can be useful for the development

and validation of scoring functions.

How successful is the method in dealing with

conformational changes upon complex formation?

The correct treatment of conformational changes upon

complex formation is likely the most challenging aspect

of protein–DNA docking. Both protein(s) and DNA

readily change their conformation upon complex forma-

tion. The extent of this change forms the basis of the

protein–DNA benchmark categorization. Our two-stage

protein–DNA docking method was designed to deal

with this challenge and its performance is best illustrated

in the docking of unbound proteins with canonical

B-DNA using ideal AIRs. While a single docking run

was suﬃcient to generate two-star solutions for the

‘easy’ cases, the two-stage protocol was often required to

generate one- to two-star solutions for the ‘intermediate’ and

‘diﬃcult’ cases. Altogether, this approach was successful

in generating at least one-star solutions for 96% of the

complete benchmark. This illustrates that the explicit ﬂexi-

bility implemented in HADDOCK is suﬃcient to generate

two-star solutions in the ‘easy’ cases where conformational

changes are limited but that this approach fails for cases

where such changes are more pronounced such as in the

‘intermediate’ and ‘diﬃcult’ cases. For the latter, our

DNA analysis and modelling procedure is capable of ex-

tracting the main bend and twist motions that occur in the

DNA upon complex formation and using these for the

benefit of DNA modelling. In that way, a larger part of

the relevant DNA conformational space can be sampled

than what is feasible within a single round of semi-ﬂexible

reﬁnement. Even results of the ‘easy’ test cases with

limited conformational changes are improved by this

two-stage procedure. Finally, the use of experimentally-

derived AIRs on a subset of six test cases showed that

our method also signiﬁcantly improved the docking

results under real-life conditions when less ideal AIR

restraints are available.

Although the semi-ﬂexible reﬁnement stage of

HADDOCK is able to introduce many of the DNA con-

formational changes required for correct complex forma-

tion it has diﬃculties predicting DNA groove expansion

facilitated by negative base pair step sliding (for example

in 1a74 and 1g9z). Consequently, this mode of conform-

ational change is not detected by our DNA analysis pro-

cedure and not introduced in the custom-built DNA

ensemble. Although the improvements in r.m.s.d to the

reference complex and fraction of native contacts clearly

illustrate that our method outperforms rigid-body docking

it does raise questions on the quality of the DNA in the

generated solutions. This, however, remains a difficult issue

due to the lack of DNA structure validation procedures.

Furthermore, our method predominantly focuses on the conformational changes in the DNA, but proteins often change their conformation upon complex formation as well, sometimes quite drastically as, for example, in the restriction endonuclease MvaI (2oaa). While small conformational changes can be accounted for by means of flexible refinement and the use of protein ensembles that sample different interface conformations, large conformational changes such as loop and domain rearrangements or disorder-to-order transitions remain a challenge. Such events are

present in some of the test cases where the use of an

ensemble of custom-built DNA structural models did

not improve the results signiﬁcantly. This still leaves

plenty of opportunities for improvements, for instance in

those cases where protein domain rearrangements are

facilitated by ﬂexible ‘hinges’ connecting them. Such

domains can be docked as separate bodies, enabling

them to sample conformational space individually. This

procedure has been successfully used for the proline util-

ization transcription activator PUT3 (1zme) in this study.

The ﬂexible protein–DNA docking approach described

in this article can beneﬁt protein–DNA interaction studies

at several levels. It can be used to generate models of


protein–DNA complexes from the structures of the

unbound proteins and a canonical B-DNA in the

presence of suitable experimental data without any prior

knowledge of the DNA conformational changes required

to establish the complex. It should also be useful for

studying the eﬀects of mutations or diﬀerent operator se-

quences on complex formation. In addition, it can assist in

experimental structural studies by, for instance, providing

initial DNA structural models to guide and speed up the

NMR analysis and assignment process.

In summary, by allowing the inclusion of a large variety

of experimental and/or prediction data, together with a

ﬂexible description of the DNA, the proposed docking

approach should be a useful tool in structural studies of

protein–DNA complexes.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community (FP6 I3 project ‘EU-NMR’,

contract no. RII3-026145 and FP7 I3 project ‘eNMR’,

contract no. 213010-e-NMR) and VICI grant from the

Netherlands Organization for Scientiﬁc Research (NWO)

to A.M.J.J.B. (grant no. 700.96.442). Funding for open

access charge: VICI grant from the Netherlands

Organization for Scientiﬁc Research (NWO) (grant no.

700.96.442 to A.M.J.J.B.).

Conﬂict of interest statement. None declared.


Supplementary Data

Data

May 2010

Marc van Dijk · Alexandre M J J Bonvin

In silico molecular docking in DNA aptamer development

Article

Jan 2021
BIOCHIMIE

Aptamers are single-stranded DNA or RNA oligonucleotides generated by SELEX that exhibit binding affinity and specificity against a wide variety of target molecules. Compared to RNA aptamers, DNA aptamers are much more stable and therefore are widely adopted in a number of applications especially in diagnostics. The tediousness and rigor associated with certain steps of the SELEX intensify the efforts to adopt in silico molecular docking approaches together with in vitro SELEX procedures in developing DNA aptamers. Inspired by these endeavors, we carry out an overview of the in silico molecular docking approaches in DNA aptamer generation, by detailing the stepwise procedures as well as shedding some light on the various softwares used. The in silico maturation strategy and the limitations of the in silico approaches are also underscored.

MARTINI-Based Protein-DNA Coarse-Grained HADDOCKing

Article

Full-text available

Oct 2019

Modeling biomolecular assemblies is an important field in computational structural biology. The inherent complexity of their energy landscape and the computational cost associated with modeling large and complex assemblies are major drawbacks for integrative modeling approaches. The so-called coarse-graining approaches, which reduce the degrees of freedom of the system by grouping several atoms into larger “pseudo-atoms,” have been shown to alleviate some of those limitations, facilitating the identification of the global energy minima assumed to correspond to the native state of the complex, while making the calculations more efficient. Here, we describe and assess the implementation of the MARTINI force field for DNA into HADDOCK, our integrative modeling platform. We combine it with our previous implementation for protein-protein coarse-grained docking, enabling coarse-grained modeling of protein-nucleic acid complexes. The system is modeled using MARTINI topologies and interaction parameters during the rigid body docking and semi-flexible refinement stages of HADDOCK, and the resulting models are then converted back to atomistic resolution by an atom-to-bead distance restraints-guided protocol. We first demonstrate the performance of this protocol using 44 complexes from the protein-DNA docking benchmark, which shows an overall ~6-fold speed increase and maintains similar accuracy as compared to standard atomistic calculations. As a proof of concept, we then model the interaction between the PRC1 and the nucleosome (a former CAPRI target in round 31), using the same information available at the time the target was offered, and compare all-atom and coarse-grained models.

Accurate structure prediction of biomolecular interactions with AlphaFold 3

Article

Full-text available

May 2024
NATURE

The introduction of AlphaFold 2¹ has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design2–6. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein–ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein–nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody–antigen prediction accuracy compared with AlphaFold-Multimer v.2.37,8. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.

Structural predictions of protein-DNA binding: MELD-DNA

Article

Full-text available

Feb 2023
NUCLEIC ACIDS RES

Structural, regulatory and enzymatic proteins interact with DNA to maintain a healthy and functional genome. Yet, our structural understanding of how proteins interact with DNA is limited. We present MELD-DNA, a novel computational approach to predict the structures of protein-DNA complexes. The method combines molecular dynamics simulations with general knowledge or experimental information through Bayesian inference. The physical model is sensitive to sequence-dependent properties and conformational changes required for binding, while information accelerates sampling of bound conformations. MELD-DNA can: (i) sample multiple binding modes; (ii) identify the preferred binding mode from the ensembles; and (iii) provide qualitative binding preferences between DNA sequences. We first assess performance on a dataset of 15 protein-DNA complexes and compare it with state-of-the-art methodologies. Furthermore, for three selected complexes, we show sequence dependence effects of binding in MELD predictions. We expect that the results presented herein, together with the freely available software, will impact structural biology (by complementing DNA structural databases) and molecular recognition (by bringing new insights into aspects governing protein-DNA interactions).

Searching for Low Probability Opening Events in a DNA Sliding Clamp

Article

Full-text available

Feb 2022

The β subunit of E. coli DNA polymererase III is a DNA sliding clamp associated with increasing the processivity of DNA synthesis. In its free form, it is a circular homodimer structure that can accomodate double-stranded DNA in a nonspecific manner. An open state of the clamp must be accessible before loading the DNA. The opening mechanism is still a matter of debate, as is the effect of bound DNA on opening/closing kinetics. We use a combination of atomistic, coarse-grained, and enhanced sampling strategies in both explicit and implicit solvents to identify opening events in the sliding clamp. Such simulations of large nucleic acid and their complexes are becoming available and are being driven by improvements in force fields and the creation of faster computers. Different models support alternative opening mechanisms, either through an in-plane or out-of-plane opening event. We further note some of the current limitations, despite advances, in modeling these highly charged systems with implicit solvent.

pyDockDNA: A new web server for energy-based protein-DNA docking and scoring

Article

Full-text available

Oct 2022

Proteins and nucleic acids are essential biological macromolecules for cell life. Indeed, interactions between proteins and DNA regulate many biological processes such as protein synthesis, signal transduction, DNA storage, or DNA replication and repair. Despite their importance, less than 4% of total structures deposited in the Protein Data Bank (PDB) correspond to protein-DNA complexes, and very few computational methods are available to model their structure. We present here the pyDockDNA web server, which can successfully model a protein-DNA complex with a reasonable predictive success rate (as benchmarked on a standard dataset of protein-DNA complex structures, where DNA is in B-DNA conformation). The server implements the pyDockDNA program, as a module of pyDock suite, thus including third-party programs, modules, and previously developed tools, as well as new modules and parameters to handle the DNA properly. The user is asked to enter Protein Data Bank files for protein and DNA input structures (or suitable models) and select the chains to be docked. The server calculations are mainly divided into three steps: sampling by FTDOCK, scoring with new energy-based parameters and the possibility of applying external restraints. The user can select different options for these steps. The final output screen shows a 3D representation of the top 10 models and a table sorting the model according to the scoring function selected previously. All these output files can be downloaded, including the top 100 models predicted by pyDockDNA. The server can be freely accessed for academic use (https://model3dbio.csic.es/pydockdna).

Identifying potential ligands specifically binding to beta1-adrenoceptor from Radix Aconiti Lateralis Praeparata extract by affinity chromatographic method

Article

Aug 2022
J PHARMACEUT BIOMED

As expressed predominantly in cardiac tissue, beta1-adrenoceptor (β1-AR) is broadly accepted as one of the main targets for drugs against cardiovascular ailments. However, the discovery of β1-AR ligand is gravely challenged due to the lack of efficient screening method. This work developed a general strategy for pursuing β1-AR ligands from the herbal extract by immobilizing haloalkane dehalogenase (Halo)-tagged β1-AR onto microspheres coated with 6-chlorohexanoic acid, and applying the immobilized β1-AR in the analysis of ligand-receptor interaction. The morphology was characterized by scanning electron microscope (SEM) and X-ray photoelectron spectroscopy (XPS). The chromatographic specificity of the immobilized receptor column was evaluated by determining the association constants of atenolol, esmolol and metoprolol using stepwise frontal analysis plus injection amount-dependent method. The potential ligands binding to β1-AR was screened by collecting the peak with retention time longer than the void time, and identified the collection by reverse phase liquid chromatography coupled with tandem mass spectrometry. The association constants of the three drugs to β1-AR were (3.33±0.29)×106 M⁻¹, (2.33±0.23)×106 M⁻¹ and (2.06±0.03)×106 M⁻¹, indicating a desired specificity of the immobilized receptor for recognizing its ligands. Molecular docking showed that van der Waals, hydrogen bonds, and hydrophobic interactions were the principal interaction forces for the receptor-drug complexes. Benzoylmesaconine was screened as the potential ligand of β1-AR in Radix Aconiti Lateralis Praeparata extract. The association constant of the ligand was (1.06±0.02)×105 M⁻¹, hinting structural modification may be required before clinical application. The immobilized β1-AR is possible to provide a rapid method for screening potential ligands in herbal extract.

NLDock: a Fast Nucleic Acid–Ligand Docking Algorithm for Modeling RNA/DNA–Ligand Complexes

Article

Sep 2021

ITScore-NL: An Iterative Knowledge-Based Scoring Function for Nucleic Acid–Ligand Interactions

Article

Dec 2020

Nucleic acid-ligand complexes underlie numerous cellular processes, such as gene function expression and regulation, in which their three-dimensional structures are important to understand their functions and thus to develop therapeutic interventions. Given the high cost and technical difficulties in experimental methods, computational methods such as molecular docking have been actively used to investigate nucleic acid-ligand interactions in which an accurate scoring function is crucial. However, because of the limited number of experimental nucleic acid-ligand binding data and structures, the scoring function development for nucleic acid-ligand interactions falls far behind that for protein-protein and protein-ligand interactions. Here, based on our statistical mechanics-based iterative approach, we have developed an iterative knowledge-based scoring function for nucleic acid-ligand interactions, named as ITScore-NL, by explicitly including stacking and electrostatic potentials. Our ITScore-NL scoring function was extensively evaluated for its ability in the binding mode and binding affinity predictions on three diverse test sets and compared with state-of-the-art scoring functions. Overall, ITScore-NL obtained significantly better performance than the other 12 scoring functions and predicted near-native poses with rmsd ≤ 1.5 Å for 71.43% of the cases when the top three binding modes were considered and a good correlation of R = 0.64 in binding affinity prediction on the large test set of 77 nucleic acid-ligand complexes. These results suggested the accuracy of ITScore-NL and the necessity of explicitly including stacking and electrostatic potentials.

Docking Methodologies and Recent Advances

Chapter

Jan 2016

Docking, a molecular modelling method, has wide applications in identification and optimization in modern drug discovery. This chapter addresses the recent advances in the docking methodologies like fragment docking, covalent docking, inverse docking, post processing, hybrid techniques, homology modeling etc. and its protocol like searching and scoring functions. Advances in scoring functions for e.g. consensus scoring, quantum mechanics methods, clustering and entropy based methods, fingerprinting, etc. are used to overcome the limitations of the commonly used force-field, empirical and knowledge based scoring functions. It will cover crucial necessities and different algorithms of docking and scoring. Further different aspects like protein flexibility, ligand sampling and flexibility, and the performance of scoring function will be discussed. Full Text Preview Fundamental Necessities Molecular docking program emphasize on the following basic requirements (Mahajan, & Gill, 2014; Krovat, Steindl, & Langer, 2005): 1. A target protein structure with or without a bound ligand is detected by various experimental techniques like NMR or X-Ray crystallography, but if protein structure is not present then protein prediction is done by any technique like threading modelling, homology modelling. 2. Database containing existing or virtual compounds for the docking process 3. Sampling and scoring method, desired scoring and searching algorithms require a computational framework for its efficient working 4. The three-dimensional structure of the protein ligand complex has to be studied in depth of atomic resolution. Continue Reading

3D-DART: a DNA structure modelling server

Article

Full-text available

Jun 2009
NUCLEIC ACIDS RES

There is a growing interest in structural studies of DNA by both experimental and computational approaches. Often, 3D-structural models of DNA are required, for instance, to serve as templates for homology modeling, as starting structures for macro-molecular docking or as scaffold for NMR structure calculations. The conformational adaptability of DNA when binding to a protein is often an important factor and at the same time a limitation in such studies. As a response to the demand for 3D-structural models reflecting the intrinsic plasticity of DNA we present the 3D-DART server (3DNA-Driven DNA Analysis and Rebuilding Tool). The server provides an easy interface to a powerful collection of tools for the generation of DNA-structural models in custom conformations. The computational engine beyond the server makes use of the 3DNA software suite together with a collection of home-written python scripts. The server is freely available at http://haddock.chem.uu.nl/dna without any login requirement.

Interaction of the Intron-Encoded Mobility Endonuclease I- PpoI with its Target Site

Article

Dec 1993

Endonucleases encoded by mobile group I introns are highly specific DNases that induce a double-strand break near the site to which the intron moves. I-PpoI from the acellular slime mold Physarum polycephalum mediates the mobility of intron 3 (Pp LSU 3) in the extrachromosomal nuclear ribosomal DNA of this organism. We showed previously that cleavage by I-PpoI creates a four-base staggered cut near the point of intron insertion. We have now characterized several further properties of the endonuclease. As determined by deletion analysis, the minimal target site recognized by I-PopI was a sequence of 13 to 15 bp spanning the cleavage site. The purified protein behaved as a globular dimer in sedimentation and gel filtration. In gel mobility shift assays in the presence of EDTA, I-PpoI formed a stable and specific complex with DNA, dissociating with a half-life of 45 min. By footprinting and interference assays with methidiumpropyl-EDTA-iron(II), I-PpoI contacted a 22- to 24-bp stretch of DNA. The endonuclease protected most of the purines found in both the major and minor grooves of the DNA helix from modification by dimethyl sulfate (DMS). However, the reactivity to DMS was enhanced at some purines, suggesting that binding leads to a conformational change in the DNA. The pattern of DMS protection differed fundamentally in the two partially symmetrical halves of the recognition sequence.

Evidence for Positive Regulation of the Proline Utilization Pathway in Saccharomyces cerevisiae

Article

Nov 1987
GENETICS

M C Brandriss

A mutation has been identified that prevents Saccharomyces cerevisiae cells from growing on proline as the sole source of nitrogen, causes noninducible expression of the PUT1 and PUT2 genes, and is completely recessive. In the put3-75 mutant, the basal level of expression (ammonia as nitrogen source) of PUT1-lacZ and PUT2-lacZ gene fusions as measured by β-galactosidase activity is reduced 4- and 7-fold, respectively, compared with the wild-type strain. Normal regulation is not restored when the cells are grown on arginine as the sole nitrogen source and put3-75 cells remain sensitive to the proline analog, l-azetidine-2-carboxylic acid, indicating that the block is not at the level of transport of the inducer, proline. In a cross between the put3-75 strain and the semidominant, constitutive mutation PUT3c-68, only parental ditype tetrads were found, indicating allelism of the two mutations. Further support for allelism derives from the comparison of enzyme levels in heteroallelic and heterozygous diploid strains. The constitutive allele appears to be fully dominant to the noninducible allele but only partially dominant to the wild type, suggesting an interaction between the wild-type and PUT3c-68 gene products. The PUT3 gene maps on chromosome XI, about 5.7 cM from the centromere. The phenotypes of alleles of the PUT3 gene, either recessive and noninducible (the put3-75 phenotype) or semidominant and constitutive (the PUT3c-68 phenotype), and their pleiotropy suggest that the PUT3 gene product is a positive activator of the proline utilization pathway.

Modelling repressor proteins docking to DNA

Article

Dec 1998
PROTEINS

The docking of repressor proteins to DNA starting from the unbound protein and model-built DNA coordinates is modeled computationally. The approach was evaluated on eight repressor/DNA complexes that employed different modes for protein/ DNA recognition. The global search is based on a protein-protein docking algorithm that evaluates shape and electrostatic complementarity, which was modified to consider the importance of electrostatic features in DNA-protein recognition. Complexes were then ranked by an empirical score for the observed amino acid /nucleotide pairings (i.e., protein-DNA pair potentials) derived from a database of 20 protein/DNA complexes. A good prediction had at least 65% of the correct contacts modeled. This approach was able to identify a good solution at rank four or better for three out of the eight complexes. Predicted complexes were filtered by a distance constraint based on experimental data defining the DNA footprint. This improved coverage to four out of eight complexes having a good model at rank four or better. The additional use of amino acid mutagenesis and phylogenetic data defining residues on the repressor resulted in between 2 and 27 models that would have to be examined to find a good solution for seven of the eight test systems. This study shows that starting with unbound coordinates one can predict three-dimensional models for protein/DNA complexes that do not involve gross conformational changes on association. Proteins 33:535–549, 1998. © 1998 Wiley-Liss, Inc.

Crystal structure of a PUT3–DNA complex reveals a novel mechanism for DNA recognition by a protein containing a Zn2Cys6 binuclear cluster

Article

Sep 1997
Nat Struct Biol

PUT3 is a member of a family of at least 79 fungal transcription factors that contain a six-cysteine, two-zinc domain called a 'Zn2Cys6 binuclear cluster'. We have determined the crystal structure of the DNA binding region from the PUT3 protein bound to its cognate DNA target. The structure reveals that the PUT3 homodimer is bound asymmetrically to the DNA site. This asymmetry orients a β-strand from one protein subunit into the minor groove of the DNA resulting in a partial amino acid-base pair intercalation and extensive direct and water-mediated protein interactions with the minor groove of the DNA. These interactions facilitate a sequence dependent kink at the centre of the DNA site and specify the intervening base pairs separating two DNA half-sites that are contacted in the DNA major groove. A comparison with the GAL4–DNA and PPR1–DNA complexes shows how a family of related DNA binding proteins can use a diverse set of mechanisms to discriminate between the base pairs separating conserved DNA half-sites.

Department of Biochemistry and Molecular Biology

Article

Jan 1993

3DNA: A Software Package for the Analysis, Rebuilding and Visualization of Three-Dimensional Nucleic Acid Structures

Article

Sep 2003
NUCLEIC ACIDS RES

Xiang-Jun Lu

We present a comprehensive software package, 3DNA, for the analysis, reconstruction and visualization of three‐dimensional nucleic acid structures. Starting from a coordinate file in Protein Data Bank (PDB) format, 3DNA can handle antiparallel and parallel double helices, single‐stranded structures, triplexes, quadruplexes and other complex tertiary folding motifs found in both DNA and RNA structures. The analysis routines identify and categorize all base interactions and classify the double helical character of appropriate base pair steps. The program makes use of a recently recommended reference frame for the description of nucleic acid base pair geometry and a rigorous matrix‐based scheme to calculate local conformational parameters and rebuild the structure from these parameters. The rebuilding routines produce rectangular block representations of nucleic acids as well as full atomic models with the sugar–phosphate backbone and publication quality ‘standardized’ base stacking diagrams. Utilities are provided to locate the base pairs and helical regions in a structure and to reorient structures for effective visualization. Regular helical models based on X‐ray diffraction measurements of various repeating sequences can also be generated within the program.

The hyperthermophile chromosomal protein Sac7d sharply kinks DNA

Article

Mar 1998

The proteins Sac7d and Sso7d belong to a class of small chromosomal proteins from the hyperthermophilic archaeon Sulfolobus acidocaldarius and S. solfataricus, respectively. These proteins are extremely stable to heat, acid and chemical agents. Sac7d binds to DNA without any particular sequence preference and thereby increases its melting temperature by approximately 40 °C. We have now solved and refined the crystal structure of Sac7d in complex with two DNA sequences to high resolution. The structures are examples of a nonspecific DNA-binding protein bound to DNA, and reveal that Sac7d binds in the minor groove, causing a sharp kinking of the DNA helix that is more marked than that induced by any sequence-specific DNA-binding proteins. The kink results from the intercalation of specific hydrophobic side chains of Sac7d into the DNA structure, but without causing any significant distortion of the protein structure relative to the uncomplexed protein in solution.

Insights on protein-DNA recognition by coarse grain modelling

Article

Nov 2008
J COMPUT CHEM

Coarse grain modelling of macromolecules is a new approach, potentially well suited to addressing numerous questions, ranging from physics to biology. We propose here an original DNA coarse grain model specifically dedicated to protein-DNA docking, a crucial, but still largely unresolved, question in molecular biology. Using a representative set of protein-DNA complexes, we first show that our model is able to predict the interaction surface between the macromolecular partners taken in their bound form. In a second part, the impact of the DNA sequence and electrostatics, together with the DNA and protein conformations, on docking is investigated. Our results strongly suggest that the overall DNA structure mainly contributes to discriminating the interaction site on cognate proteins. Direct electrostatic interactions between phosphate groups and amino acid side chains strengthen the binding. Overall, this work demonstrates that coarse grain modelling can be a valuable aid to a general and complete description and understanding of protein-DNA association mechanisms.

Structure-Based Strategies for Drug Design and Discovery

Article

Sep 1992

Irwin D. Kuntz

Most drugs have been discovered in random screens or by exploiting information about macromolecular receptors. One source of this information is in the structures of critical proteins and nucleic acids. The structure-based approach to design couples this information with specialized computer programs to propose novel enzyme inhibitors and other therapeutic agents. Iterated design cycles have produced compounds now in clinical trials. The combination of molecular structure determination and computation is emerging as an important tool for drug development. These ideas will be applied to acquired immunodeficiency syndrome (AIDS) and bacterial drug resistance.

