
Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics

PLOS
PLOS Computational Biology
Abstract and Figures

Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Probing of coupled motions in the Eg5 kinesin motor domains in [91] through accelerated MD simulations. The top panel shows the structure and catalytic cycle of the kinesin motor domain. The ATPase catalytic site sits at the top of the β-sheet, flanked by three highly-conserved loops (P-loop, SI, and SII) connected to helices (also annotated) on either side of the sheet. The secondary structure topology is drawn, with β-strands drawn as triangles and α-helices as circles. The kinesin catalytic cycle is shown: Kinesin (K) has a weak affinity for the microtubule in the ADP-state. ADP release is followed by strong microtubule-binding. ATP binding may occur, followed by hydrolysis and product release to regenerate the weakly-bound ADP state. The bottom panel projects conformations, sampled every 20 picoseconds from 200 nanosecond-long accelerated MD simulations, onto the two principal modes of motion. The latter are obtained through principal component analysis of collected X-ray structures for wildtype and variant Eg5. Three simulations are highlighted: the nucleotide-free (APO) one in (A), the ADP-bound one in (B), and the ATP-bound one in (C). The nucleotide-free simulation covers more of the conformation space, whereas restricted sampling is observed when Eg5 is bound to ATP or ADP. One of the conclusions in [91] is that structural changes from the ADP- to ATP-bound states, which are evident in the collection of X-ray structures, are encoded in the intrinsic dynamics of the nucleotide-free motor domain; the nucleotides effectively rigidify the motor domain by narrowing the conformation space accessible to it, as evident in the restricted sampling observed through accelerated MD. This figure is reused from Scarabelli et al., 2013. CC-BY PLOS ONE [91].
Sampling of the ensemble of closed-to-open and open-to-closed transition trajectories in AdK through the DIMS method [334]. An ensemble of 330 DIMS trajectories is compared to 45 Escherichia coli AdK X-ray structures. The conformations in each trajectory are projected onto a progress variable δRMSD measured as the RMSD of the conformation from the closed AdK structure (PDB ID 1ake:A) minus the RMSD of the conformation from the open AdK structure (PDB ID 4ake:A). For each of the 45 collected X-ray structures and each trajectory, the conformation in the trajectory closest in backbone RMSD to an X-ray structure is recorded, and the δRMSD value of the conformation along a trajectory is recorded. A probability distribution is then constructed for each X-ray structure over all DIMS trajectories to indicate where an X-ray structure is located along the simulated trajectories. The color bar indicates the probability density. The median of each distribution is marked by a white circle. The X-ray structures whose PDB IDs are listed on the y-axis are rank ordered based on the median. The second white line traces the location of the median when the simulations are repeated to sample open-to-closed transition trajectories. Out of 45 structures sorted by δRMSD, about 24 are closed-state structures, four are open, and 17 are intermediates. This work is an example of the capability of computational methods to elucidate transitions in detail and accurately map the location of experimentally determined structures in the transitions. This figure is adapted from Beckstein et al., 2009 [334]. The image was created by O. Beckstein.
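The progress variable described in this caption can be sketched in a few lines. This is a minimal illustration only, assuming conformations are given as lists of (x, y, z) tuples that have already been superimposed on the reference structures; the function names `rmsd`, `delta_rmsd`, and `closest_frame` are hypothetical and are not taken from the DIMS implementation.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equally sized coordinate lists.
    Assumes the two structures are already optimally superimposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))

def delta_rmsd(conf, closed_ref, open_ref):
    """RMSD from the closed reference minus RMSD from the open reference.
    Negative values lie nearer the closed state, positive nearer the open."""
    return rmsd(conf, closed_ref) - rmsd(conf, open_ref)

def closest_frame(trajectory, xray):
    """Index of the trajectory frame with the lowest RMSD to an X-ray structure."""
    return min(range(len(trajectory)), key=lambda i: rmsd(trajectory[i], xray))
```

Applying `closest_frame` to every trajectory for a given X-ray structure, and then `delta_rmsd` to each matched frame, would yield the per-structure distribution that the caption describes.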
Tatiana Maximova1, Ryan Moffatt1, Buyong Ma2,
Ruth Nussinov2,3,*, and Amarda Shehu1,4,5,*
1 Department of Computer Science, George Mason University, Fairfax, VA,
USA
2 Basic Science Program, Leidos Biomedical Research, Inc. Cancer and
Inflammation Program, National Cancer Institute, Frederick, MD, USA
3 Sackler Institute of Molecular Medicine, Department of Human Genetics
and Molecular Medicine, Sackler School of Medicine, Tel Aviv University,
Tel Aviv, Israel
4 Department of Bioengineering, George Mason University, Fairfax, VA,
USA
5 School of Systems Biology, George Mason University, Manassas, VA,
USA
* nussinor@helix.nih.gov, amarda@gmu.edu
Author Summary
This paper provides an overview of recent advancements in computational methods for
modeling macromolecular structure and dynamics. The focus is on methods aimed at
providing efficient representations of macromolecular structure spaces for the purpose of
characterizing equilibrium dynamics. The overview is meant to provide a summary of
state-of-the-art capabilities of these methods from an application point of view, as well
as highlight important algorithmic contributions responsible for recent advances in
macromolecular structure and dynamics modeling.
Introduction
A detailed understanding of how fundamental biological macromolecules, such as
proteins and nucleic acids, carry out their biological functions is central to obtaining a
detailed and complete picture of molecular mechanisms in the healthy and diseased cell.
Furthering our understanding of macromolecules is central to understanding our own
biology, as proteins and nucleic acids are central components of cellular organization
and function. Many abnormalities involve macromolecules incapable of performing their
biological function [1–4], either due to external perturbations, such as environmental
changes, or internal perturbations, such as mutations [5–10], affecting their ability to
assume specific function-carrying structures.
It has long been known that the ability of a macromolecule to carry out its biological
function is dependent on its ability to assume a specific three-dimensional structure (in
other words, structure carries function) [11,12]. However, an increasing number of
experimental, theoretical, and computational studies have demonstrated that function is
the result of a complex yet precise relationship between macromolecular structure and
dynamics [13–21]. Most notably, in proteins, the ability to access and switch between
different structural states is key to biomolecular recognition and function
modulation [22, 23].
The intrinsic dynamic personality of macromolecules [18] is not surprising and can
indeed be derived from first principles. Feynman highlighted the jiggling and wiggling of
atoms well before wet-laboratory techniques provided evidence of macromolecular
dynamics [24]. In the late 1970s and early 1980s, it became clear that treating
macromolecules as thermodynamic systems and employing basic principles allowed
anticipating and simulating their intrinsic state of perpetual motion [25, 26]. The term
thermodynamic uncertainty principle was coined by Cooper in [26] to refer to the
inherent uncertainty about the particular state a macromolecule is in or will evolve to at
any given time. Cooper was among the first to employ tools from statistical
thermodynamics to show that macromolecular fluctuations are a direct result of thermal
interaction with the environment, and that any detailed description of macromolecular
structure and dynamics entails employing probability distributions. Further work by
Wolynes and colleagues continued in this spirit, popularizing a statistical treatment of
macromolecules with tools borrowed from statistical mechanics and culminating in the
energy landscape view [5,13, 27, 28].
Great advances have been made in the wet laboratory to elucidate macromolecular
structure and dynamics. Nowadays, techniques such as X-ray crystallography, Nuclear
Magnetic Resonance (NMR), and cryo-Electron Microscopy (cryo-EM) can resolve
equilibrium structures and quantify equilibrium dynamics. Macroscopic measurements
obtained in the wet laboratory are Boltzmann-weighted averages over
microstates/structures populated by a macromolecule at equilibrium. Though in
principle wet-laboratory techniques are limited in their description of equilibrium
structures and dynamics to the time scales probed in the wet laboratory (a problem also
known as ensemble-averaging), much progress has been made [29–31]. The ensemble of
structures contributing to macroscopic measurements obtained in the wet laboratory
can be unraveled with complementary computational techniques [32–36]. In addition,
wet-laboratory techniques, such as NMR spectroscopy, can on their own directly
elucidate picosecond-millisecond long relaxation phenomena [37, 38]. Indeed, recent
single-molecule techniques have achieved great success at bypassing the ensemble
averaging problem and elucidating equilibrium dynamics [31,39–47].
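The notion of a Boltzmann-weighted average over microstates can be made concrete with a small sketch. It assumes a finite list of microstates with known energies in kcal/mol and an observable value per microstate; the function names and the default temperature of 300 K are illustrative choices, not part of the reviewed work.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_weights(energies, temperature=300.0):
    """Normalized Boltzmann populations for a finite set of microstate energies."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift energies so the exponentials cannot overflow
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(factors)
    return [f / partition for f in factors]

def ensemble_average(observables, energies, temperature=300.0):
    """Population-weighted average of an observable over the microstates."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * a for w, a in zip(weights, observables))
```

For example, `ensemble_average([1.0, 3.0], [0.0, 0.0])` weights both microstates equally, while raising the energy of the second microstate shifts the average toward the first.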
Transitions of a macromolecule between successive structural states can be captured
in the wet laboratory [31, 46, 48–53]. Wet-laboratory techniques can resolve key
well-populated intermediate structures along a transition [52,54], but they are generally
unable to span all the time scales involved in a transition and so fully account for a
macromolecule’s equilibrium dynamics. A complete characterization of macromolecular
dynamics remains elusive in the wet laboratory due to the disparate time scales that
may be involved. Dwell times at successive states along a reaction may be too short to
be detected in the wet laboratory. The actual time a macromolecule spends during a
transition event can be short compared to its dwell time in any particular
thermodynamically-stable or meta-stable structural state. Indeed, neither wet- nor
dry-laboratory techniques can on their own span all spatial and time scales involved in
dynamic macromolecular processes [55].
Macromolecular modeling research in silico is driven by the need to complement
wet-laboratory techniques and obtain a comprehensive and detailed characterization of
equilibrium dynamics. Such a characterization poses outstanding challenges in silico. In
principle, a full account of macromolecular dynamics requires a comprehensive
characterization of both the structure space available to a macromolecule at equilibrium
and the underlying free energy surface that governs accessibility of structures and
transitions between structures. Early work on protein modeling focused on short protein
chains and simplified models that laid out amino-acid chains on lattices.
These choices made it possible to perform interesting calculations revealing key
properties of protein folding and unfolding [56], as well as predict quantities of
importance in protein stability and function, such as pKas of ionizable groups [57].
On-lattice models incidentally also allowed key theoretical findings on the
computational complexity associated with computing lowest free-energy states in the
context of ab initio (now also known as de novo) protein structure prediction [58–60].
The problem of finding the global minimum energy conformation was
shown to be NP-hard. These findings made the case that sophisticated algorithms
would be needed to complement wet-laboratory characterizations of macromolecular
structure and dynamics for the purpose of elucidating biological function.
The advent of Molecular Dynamics (MD) simulations and the concept of an energy
function promised to revolutionize macromolecular modeling, as in principle the entire
equilibrium dynamics could be simulated by simply following the motions of the atoms
constituting a macromolecule down the slope of the energy function. Research in this
direction was made possible by a growing set of equilibrium structures resolved in the
wet laboratory, from myoglobin [61,62] and lysozyme [63] by 1967 to more than a
hundred thousand structures now freely available to anyone in the Protein Data Bank
(PDB) [64]. Seminal work in the Karplus laboratory on the MD method and in the
Lifson laboratory on the design of consistent energy functions and simplified molecular
models set the stage for a computational revolution in structural biology.
Commercialization of computers was critical to this revolution.
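The idea of following atomic motions under the forces derived from an energy function can be illustrated with a toy integrator. The sketch below uses the velocity Verlet scheme, chosen here purely for illustration (the text above does not name an integrator), on a single particle in a one-dimensional harmonic well; `velocity_verlet` and `harmonic_force` are hypothetical names and the units are arbitrary.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations for one particle in one dimension.
    Returns the list of (position, velocity) pairs, including the start."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory

def harmonic_force(x, k=1.0):
    """Force is the negative gradient of the toy energy E(x) = 0.5 * k * x**2."""
    return -k * x
```

Calling `velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)` produces an oscillating trajectory whose total energy stays close to its initial value, which is the property that makes this family of integrators attractive for long simulations.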
MD simulations had been shown to be successful in reproducing equilibrium properties of
argon [65], but it was McCammon and Karplus who provided the earliest demonstration
in 1977 of the power of MD-based modeling to simulate protein dynamics [25]: a short
9.2 picosecond-long trajectory was obtained showing in-vacuum, atomistic fluctuations
of the bovine pancreatic trypsin inhibitor around its native, folded structure. Realizing
the power of MD simulations in extracting precious information on macromolecular
structure and dynamics, the Karplus laboratory democratized modeling by offering the
CHARMM program to the computational community [66]. Further work by Karplus
and McCammon showed that significant features of protein dynamics would only
emerge over longer time scales. The simulation in [67] reached 100 picoseconds, but it
would soon become clear that MD-based probings of macromolecular structure and
dynamics were in practice limited by both macromolecular size (spatial scale) and the
duration of the phenomenon under investigation (time scale). A significant body of complementary
work in macromolecular structure and modeling investigated non-MD based methods.
In fact, two years prior to the 1977 MD simulation by Karplus of equilibrium
fluctuations of the bovine pancreatic trypsin inhibitor, Levitt and Warshel had
presented a computer simulation of the folding of the same inhibitor through a
simplified (now known as coarse-grained) model, where each residue was reduced to one
pseudo-atom, and an algorithm based on steepest descent [68]. Reproducibility of this
work has so far remained elusive.
Further work by Levitt and Warshel, prompted by the visionary Lifson at the
Weizmann Institute of Science, focused on the design of a consistent energy function for
proteins [69]. The idea was to come up with a small number of consistent parameters
that could be transferable from molecule to molecule and not depend on the local
environment of an atom. Once such an energy function was implemented, simple
algorithms could then be put together by making use of the function, its first derivative
(which yields the force vector), and the second derivative (the curvature of the energy surface). It is
interesting to note that, though Lifson and Warshel were the first to introduce a
consistent energy function, they did so for small organic hydrocarbon molecules. It was
Levitt who realized that their parameters could be used to carry out calculations on
proteins. In 1969, Levitt published the first non-MD, steepest descent algorithm on a
simplified model encoding only heavy atoms of the X-ray structures of hemoglobin and
lysozyme [70]. This work paved the way for Levitt and Warshel to claim the first
simulation of protein folding [68]. The algorithm used in these simulations was quite
sophisticated, changing torsion angles, as proposed by Scheraga [71], and using normal
modes to rapidly compute low-energy paths out of local minima [72].
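To illustrate how an energy function and its first derivative combine into a simple minimization algorithm, the following sketch applies fixed-step steepest descent to a toy two-variable quadratic energy. The energy, its gradient, the step size, and the function names are all assumptions made for this example; they are not the force field or the procedure used in the cited studies.

```python
def steepest_descent(gradient, x0, step=0.1, tol=1e-8, max_iter=10000):
    """Move against the gradient until its norm falls below tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = gradient(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break  # the force (negative gradient) has effectively vanished
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def toy_energy(x):
    """Quadratic bowl with its minimum at (1.0, -0.5)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

def toy_gradient(x):
    """Analytical first derivative of toy_energy."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]
```

Starting from `[5.0, 5.0]`, `steepest_descent(toy_gradient, [5.0, 5.0])` converges to the single minimum of the bowl. On a rugged macromolecular energy surface the same procedure would stop at the nearest local minimum, which is why the text mentions additional machinery for escaping local minima.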
Further work on coarse-grained and multiscale models built with the quantum
mechanics (QM)/molecular mechanics (MM) method proposed by Warshel [73] was
seminal in allowing simulation to reach larger spatial and longer time scales. Warshel, who
had a background in quantum mechanics, realized that large molecular systems could be
spatially divided into a region demanding quantum mechanical calculations (e.g., due to
bonds being broken) with the rest sufficiently represented by empirical force fields. This
method remains the cornerstone of modern multiscale modeling [74–80] and, together
with the idea of representing complex systems in different resolutions at different time
and length scales [76], has allowed simulations to elucidate structures, dynamics, and
the biological activity of systems of increasing complexity, from enzymes [74, 77, 81] to
complex molecular machines [82–91].
In tandem with these developments, a new method, Metropolis Monte Carlo
(MC) [92, 93], made its debut in computational structural biology. In 1987, important
work in the Scheraga laboratory introduced an MC-based minimization method to
simulate protein folding [94]. In 1996, the Karplus laboratory demonstrated the ability
of MC simulations on a cubic lattice to simulate the folding mechanism of a protein-like
heteropolymer of 125 beads [95]. Subsequent work in the Scheraga laboratory further
made the case for the utility of MC-based methods in studies of macromolecular
structure and dynamics [96–98]. Kinetic MC methods were designed to address the lack
of kinetics in the classic MC framework [99]. In light of contributions that gave birth to
computational structural biology [100], it is no surprise that the 2013 Nobel Prize in
Chemistry recognized computational scientists, namely Karplus, Warshel, and Levitt, for
their seminal work in the development of multiscale models for complex chemical
systems [101–103].
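For readers unfamiliar with the Metropolis criterion, a minimal sketch on a one-dimensional double-well energy is given below. The energy function, the uniform move set, the reduced temperature `kT`, and the function names are assumptions for illustration; the sketch is not one of the lattice or minimization protocols cited above.

```python
import math
import random

def metropolis(energy, x0, n_steps, step_size=0.5, kT=1.0, seed=0):
    """Sample a one-dimensional coordinate with the Metropolis criterion.
    Returns the visited positions and the fraction of accepted moves."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)  # symmetric trial move
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-dE / kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_steps

def double_well(x):
    """Toy energy with minima at x = -1 and x = +1 separated by a barrier."""
    return (x * x - 1.0) ** 2
```

Because the sequence of moves carries no physical time, such a sampler yields equilibrium populations but not kinetics, which is the gap that kinetic MC methods aim to fill.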
Improvements in hardware over the last forty years have been critical to extending
the reach of MD- and MC-based modeling. For example, MD-based studies have
expanded their scope, scale, and thus applicability due to specialized architectures, such
as Anton [104,105], GPUs [106–109], and petascale national supercomputers, such as
BlueWaters, Titan, Mira, and Stampede [110, 111]. The pervasiveness of supercomputing
has spurred great advances in algorithmic techniques to effectively parallelize MD.
Typically, in parallel MD, the interacting particles are spatially divided into subdomains
that are assigned to different processors. In this framework, load balancing becomes an
issue for large-scale MD simulations now performed on thousands of processors and
involving billions of particles [112]. Many techniques now exist for dynamic load
balancing [113]. In addition, while each processor is responsible for advancing its own
particles in time, processors need to exchange information; accurate force calculations
require knowledge of neighbor particle positions. Work in [114] describes recent
strategies for efficient neighbor searches in parallel MD. Other techniques that permit
parallelization of MD address and optimize force splitting in the context of the
particle-mesh Ewald algorithm [115]. It is worth noting that many of these techniques
are now integrated in publicly-available parallel MD code, such as NAMD [116].
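The spatial binning that underlies both domain decomposition and neighbor searches can be sketched with a serial cell list. This is an illustrative single-process version without periodic boundaries, assuming cubic cells whose edge equals the interaction cutoff; it is not the strategy of [114] nor the NAMD implementation, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def build_cells(positions, cutoff):
    """Bin particle indices into cubic cells whose edge equals the cutoff."""
    cells = defaultdict(list)
    for index, point in enumerate(positions):
        key = tuple(int(c // cutoff) for c in point)
        cells[key].append(index)
    return cells

def neighbor_pairs(positions, cutoff):
    """All index pairs (i, j), i < j, separated by at most the cutoff.
    Only the 27 surrounding cells of each particle are searched."""
    cells = build_cells(positions, cutoff)
    cutoff_sq = cutoff * cutoff
    pairs = set()
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j:
                        d_sq = sum((a - b) ** 2
                                   for a, b in zip(positions[i], positions[j]))
                        if d_sq <= cutoff_sq:
                            pairs.add((i, j))
    return pairs
```

In a parallel setting, groups of such cells would be assigned to different processors, and only particles in boundary cells would need to be communicated to neighboring processors.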
Important contributions in enhancing exploration capability have also come not from
MD or MC frameworks but from adaptations of stochastic optimization
frameworks often designed for modeling other complex, non-biological systems. These
frameworks, though less mature than MD and MC, are summarized here in the interest
of introducing readers to interesting complementary ideas. Algorithmic advances,
whether to extend the applicability of MD- and MC-based frameworks or adapt other
frameworks for macromolecular modeling, now allow predicting native structures of
given protein amino-acid sequences [117–120], mapping equilibrium ensembles,
structure spaces, and underlying energy landscapes of macromolecules [6, 8,121–126],
revealing detailed transitions between stable and meta-stable structures [127–134],
modeling binding and docking reactions [135–137], revealing not only equilibrium
structures of bound protein-ligand or protein-protein assemblies but also calculating
association and dissociation rates [138, 139], and more.
This review aims to provide an overview of such advances. Given the rapidly
growing body of research in macromolecular modeling, aiming to provide an exhaustive
review would be an exercise in futility. For instance, while the development of molecular
force fields is recognized as crucial to accurate modeling [140, 141], this review does not
focus on force field development. Other important contributions due to the development
of ever more accurate coarse-grained representations of macromolecules, solvent models, and
multiscaling techniques are acknowledged, but the reader is referred to existing
comprehensive reviews on these topics [76,142–144]. Instead, this review focuses on
sampling methods for the exploration of macromolecular structure spaces and
underlying energy surfaces for the purpose of characterizing equilibrium structure and
dynamics. This focus is warranted due to the recognition that sampling remains a
problem [102, 128, 145]. The goal is both to introduce a broad audience of researchers to
the most recent and exciting research from an application point of view and to highlight
important algorithmic contributions responsible for recent advancements in modeling
macromolecular structure and dynamics.
Recent Applications Made Possible by Hardware and
Algorithmic Advancements
There is by now a wealth of computational studies aimed at extracting information on
equilibrium structures and dynamics of macromolecules in molecular assemblies or
isolation. Non-MD based studies can extract information about
thermodynamically-stable or meta-stable structures while forgoing simulations of a
system’s dynamics. On the other hand, MD-based studies readily provide information
on the dynamics but can only elucidate structures accessible within the time of the
simulation. While non-MD based methods have made it possible to predict, for instance,
biologically-active structures of proteins given their amino-acid sequences, a problem
known as de novo structure prediction, only MD-based methods can provide detailed
information on protein folding and unfolding. Different aspects of protein-ligand
binding, protein-DNA and protein-protein docking, equilibrium fluctuations, structure
prediction, folding, and unfolding can be modeled with MD and non-MD methods.
Disparate time scales are involved in macromolecular dynamics, and they constitute
the main challenge in describing macromolecular dynamics in fullness and detail via
MD-based simulations. For instance, bond vibrations occur on the femtosecond time
scale, solvent effects take anywhere from a few picoseconds up to a few nanoseconds,
transitions in side-chain rotation and secondary structure occur on the 10–100
nanosecond time scale, large global structural transitions can occur on the microsecond
time scale, ligand binding and allosteric regulation are usually on the millisecond time
scale, and protein folding takes anywhere from a few microseconds to a few seconds,
depending on protein size. In extreme cases, natural ligand and drug binding is a much
longer event that can occur on the hours scale [146].
Despite such challenges, much progress has been made. Equilibrium, atomistic MD
simulations can reproduce in detail microsecond-long folding events for small proteins
on specially-designed supercomputers [104, 105, 147, 148]. Protein-ligand binding with
full ligand flexibility and protein flexibility limited to the binding site can be simulated
up to 100 microseconds [146, 149]. Brownian dynamics simulations can capture events
that occur in the microsecond time scale; when coupled with enhanced sampling
techniques, these simulations have been reported to capture slow events of large proteins
binding and sliding on DNA at 25 microseconds at a coarse resolution [150]. Longer
simulations of an estimated time scale of more than 48 milliseconds of the lac repressor
sliding on DNA have been reported via atomistic MD in explicit solvent [151].
Coarse-grained modeling and longer time steps can further increase time scales
but often at the cost of essential details [152]. However, multiscale MC simulations have
been reported to allow studying in detail processes that occur in the range of
milliseconds [76, 78]. Organizing short MD or MC trajectories into Markov state
models (MSMs) can extract precious information on structure and dynamics for events
that occur on longer time scales, from a few milliseconds to a few seconds [146, 153].
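The way short trajectories are combined into a Markov state model can be sketched as follows. The example assumes that each trajectory has already been discretized into integer state labels; it uses plain transition counting with row normalization and power iteration for the stationary distribution, without the detailed-balance constraints or validation steps that practical MSM construction typically involves. The function names are hypothetical.

```python
def transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts at a given lag (lag >= 1),
    pooled over many short discretized trajectories."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1.0
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # A state with no observed exits is treated as absorbing here.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix

def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Equilibrium state populations obtained by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

The key point is that no single trajectory needs to span the slow process: the pooled transition statistics of many short runs determine the long-time behavior of the model.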
In the following we provide a short overview of the current applications pursued by
MD and non-MD methods without describing in detail the algorithmic ingredients of
such methods. We highlight key examples where recent advances in MD and non-MD
methods have made it possible to address problems and systems previously out of reach due
to the large spatial and time scales involved. Descriptions of the algorithmic ingredients
responsible for such computational advancements follow.
Simulation and Modeling of Macromolecular Interactions
Simulating interactions of macromolecules with other macromolecules or small
molecules is important to understand the molecular basis of mechanisms in the healthy
and diseased cell. Typically, three categories of interactions are of interest to
researchers: those of a protein with a small ligand, those of a protein with another
protein, and those of a protein with other molecular systems that include DNA, RNA,
and membranes. These specific applications can be approached in two different ways.
One considers simply the problem of predicting the three-dimensional native structure
of the complexed system from knowledge of the structures of the unbound units,
whereas the other additionally simulates the process of the units diffusing towards and
then binding with one another. For the problem of structure prediction, non-MD based
methods are currently the norm. They include algorithms enhancing MC or adapting
other stochastic optimization frameworks under the umbrella of evolutionary
computation. For the problem of actually simulating the dynamics of interacting units,
MD-based studies provide more detail but typically require more computational
resources or algorithmic enhancements in order to reach the long time scale often
needed for a complexation (binding) event to occur.
One of the challenges with modeling and simulating macromolecular interactions
with other small molecules or macromolecules is the possibility of induced fit. Induced
fit, introduced by Koshland in [154], refers to the mechanism of an initially loose
complex that induces a conformational change in one or all loosely-bound units,
which then triggers a cascade of rearrangements ultimately resulting in a tighter-bound
complex. The induced fit mechanism seems to question the idea that structure-guided
studies can focus on shape complementarity first, but many wet-laboratory studies, as
well as the success of complementarity-driven methods, have demonstrated that induced
fit cannot describe all binding events [155].
In response, inspired by the free energy landscape view presented by Frauenfelder
and Wolynes [13, 27], Nussinov and colleagues proposed a new concept to explain
binding events, that of conformational selection, also known as population
shift [156–158]. Conformational selection refers to the idea that all conformational
states of a unit, including the bound state, are already present and accessible in the
unbound ensemble. The binding or
docking event causes a shift in the populations observed in the unbound ensembles
towards the specific bound conformational state. Though Nussinov and colleagues were
inspired by the free energy landscape view of Frauenfelder and Wolynes, it is worth
noting that the conformational selection model is a generalization of a much earlier
model, the Monod-Wyman-Changeux (MWC) model [159]. The MWC model, also
known as the concerted or symmetry model, proposed the idea that regulated proteins
exist in different interconvertible states in the absence of any regulator, and that the
ratio of the different states is determined by the thermal equilibrium. The MWC model
has been credited with introducing the concept of conformational equilibrium and
selection by ligand binding, though in its original formulation the model was restricted
to two distinct symmetric states and to proteins made up of identical subunits.
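The population shift at the heart of the two-state picture can be made quantitative with the textbook form of the MWC state function. The sketch below is an illustration under stated assumptions: the two states are labeled R (high ligand affinity) and T (low affinity), `l0` is the T-to-R population ratio in the absence of ligand, and `k_r` and `k_t` are the ligand dissociation constants of the two states. These symbols and the function name are conventional or hypothetical choices and do not appear in the text above.

```python
def fraction_r_state(ligand, k_r, k_t, l0, n_sites):
    """Fraction of protein in the R state at a given free ligand concentration.
    Textbook MWC state function for n identical, independent binding sites."""
    alpha = ligand / k_r                    # ligand concentration scaled by R affinity
    c = k_r / k_t                           # c < 1 when the ligand prefers R
    r_weight = (1.0 + alpha) ** n_sites     # statistical weight of all R species
    t_weight = l0 * (1.0 + c * alpha) ** n_sites  # statistical weight of all T species
    return r_weight / (r_weight + t_weight)
```

With `l0 = 1000`, the R state is rare but present without ligand (a fraction of about 0.001); adding a ligand that binds R more tightly than T raises that fraction toward one. Both states preexist, and binding only redistributes their populations.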
The review in [23] summarizes many studies that observe conformational selection
for protein-ligand, protein-protein, protein-DNA, protein-RNA and RNA-ligand
interactions. We highlight work in [160], where unfolded structures of uncomplexed
ubiquitin in explicit solvent were subjected simultaneously to restraints from NMR
Nuclear Overhauser Effect (NOE) and Residual Dipolar Coupling (RDC) data
comprising solution dynamics up to microseconds. The obtained ensemble of structures
covered the structural heterogeneity observed in 46 crystal structures of ubiquitin at the
time; the majority of the crystal structures were in complex with other proteins. These
results suggest that conformational selection rather than induced fit suffices to explain
the molecular recognition dynamics of ubiquitin.
While at face value the concepts of induced fit and conformational selection appear
mutually exclusive, studies have shown that versions of each are indeed observed; for
instance, conformational selection is usually followed by slight conformational
adjustments. In 2010, Nussinov and colleagues presented an extended view of binding
events where conformational selection and induced fit were seen as complementary to
each other [161]. In many cases, following conformational selection, minor adjustments
of side chains and backbone are observed to take place to optimize interactions [161].
Based on such observations, extended models have been proposed that combine
conformational selection, induced fit, and the classical lock-and-key mechanisms [162].
A better understanding of the contributions of each of these three mechanisms has
led over the years to several effective methods for modeling and simulating
binding and docking events. A detailed review in the context of protein-ligand binding
for structure-based drug discovery is presented in [163].
The overview below summarizes methods based on the lock-and-key mechanism, as
well as methods based on the induced-fit and conformational selection mechanisms.
While the lock-and-key mechanism allows disregarding flexibility, the other mechanisms
clearly make the case for modeling the flexibility of the units participating in the
complexation event. While the induced-fit mechanism seems to suggest that only
MD-based methods can describe a complexation event, the conformational selection
mechanism has inspired many non-MD methods to integrate flexibility during or prior
to complexation, thus contributing to a rich and still growing literature. In the following
we provide an overview of this work, guided by applications on protein-ligand binding,
protein-protein docking, and protein-DNA docking.
Protein-Ligand Binding
In protein-ligand binding, the structure prediction problem involves predicting the
binding site (unless it is known), the pose of the ligand, and its configuration.
Established and widely-adopted software packages now exist and include DOCK [164],
FlexX [165, 166], GOLD [167, 168], Autodock [169–171], Glide [172],
RosettaLigand [173, 174], SwissDock [175], Surflex-Dock [176], DOCKLASP [177],
rDock [178], istar [179] and more. The majority of existing software packages employ
evolutionary algorithms, which treat protein-ligand binding as a stochastic
optimization problem where the goal is to find the lowest-energy structure of the
complex of bound units. Evolutionary algorithms have been demonstrated to be more
effective than MD- or MC-based algorithms at finding the lowest-energy binding
pose (position and orientation) and configuration of a ligand on a macromolecule. For
instance, while earlier versions of the well-known Autodock software employed MC
simulated annealing (MC-SA), Autodock 3.0.5 and onwards switched to the Lamarckian
Genetic Algorithm (GA) due to its higher efficiency and robustness over the MC-SA of
earlier versions for binding flexible ligands onto rigid receptors [180].
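To make the idea of a Lamarckian GA concrete, the following is a minimal toy sketch rather than the Autodock implementation. It assumes a hypothetical scoring function, `toy_score`, that stands in for a docking energy, and a pose encoded as a short list of real numbers. The defining Lamarckian feature is that each individual is improved by a local search and the improved pose is written back into the population before selection, crossover, and mutation. All names and parameter values are illustrative.

```python
import random


def toy_score(pose):
    """Hypothetical stand-in for a docking energy (lower is better)."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))


def local_search(pose, score, rng, steps=20, step_size=0.1):
    """Greedy random-perturbation search; keeps a trial only if it scores lower."""
    best, best_s = list(pose), score(pose)
    for _ in range(steps):
        trial = [x + rng.gauss(0.0, step_size) for x in best]
        s = score(trial)
        if s < best_s:
            best, best_s = trial, s
    return best, best_s


def lamarckian_ga(score, dim=3, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    best_pose, best_score = None, float("inf")
    for _ in range(generations):
        # Lamarckian step: locally refined poses replace the originals.
        refined = sorted((local_search(ind, score, rng) for ind in pop),
                         key=lambda pair: pair[1])
        if refined[0][1] < best_score:
            best_pose, best_score = list(refined[0][0]), refined[0][1]
        parents = [pose for pose, _ in refined[: pop_size // 2]]
        children = [list(parents[0])]  # elitism: carry the best pose over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.2:  # occasional mutation of one coordinate
                child[rng.randrange(dim)] += rng.gauss(0.0, 0.5)
            children.append(child)
        pop = children
    return best_pose, best_score


pose, energy = lamarckian_ga(toy_score)
print(pose, energy)
```

A real docking code would replace `toy_score` with an interaction energy and encode translation, orientation, and ligand torsions in the pose vector.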
The superiority of evolutionary algorithms for binding flexible ligands onto rigid
receptors is additionally demonstrated in a high-throughput screening setting. In this
context, we note representative work in the Caflisch laboratory [181], where a set of
publicly-available tools has been developed for high-throughput screening of large sets
of small ligand molecules by fragment-based docking for the purpose of
computer-assisted drug discovery (CADD). The high-throughput setting is made
possible by a fast decomposition of a flexible ligand into rigid fragments, fast
docking and evaluation of binding free energy of docked fragments, and efficient docking
of a full flexible ligand through a GA rapidly searching over poses of fragment triplets
and evaluating poses with an efficient scoring function. Fragment-based docking can be
traced back to Karplus, whose work with Miranker on the minimization of multiple
copies of functional groups in the MCSS force field is considered the first
fragment-based procedure for drug discovery [182].
Fragment-based high-throughput binding is leading to significant advances in CADD.
For instance, recent work in [183] identifies inhibitor chemotypes for the EphA3 tyrosine
kinase, a transmembrane protein belonging to the class of erythropoietin-producing
hepatocellular receptors with deregulations implicated in severe human pathologies such
as atherosclerosis, diabetes, and Alzheimer’s disease.
While the majority of protein-ligand binding software can handle flexible ligands,
the computational costs that would be incurred by fully flexible receptors remain
impractical in most settings. Fortunately, a significant number of binding modes fall
under the lock-and-key mechanism, which has been demonstrated to be effective for
predicting structures of enzyme-inhibitor complexes with largely static binding
interfaces [184–188]. As expected, however, rigid receptor docking algorithms are
ineffective in cases of induced fit, where structural flexibility during binding is not
limited to the ligand.
To take into account ligand and receptor flexibility without incurring impractical
computational costs, many protein-ligand binding algorithms implement soft docking,
where some overlap between the flexible, bound ligand and the rigid receptor is allowed
during docking. Unfavorable interactions due to the overlap are resolved in a
post-processing stage on selected bound complexes, effectively providing some localized
flexibility to the bound receptor. This approach is practical and warranted in settings
where the goal is to screen large libraries of potential drug compounds [189–191]. An
extensive review of the unique challenges in these settings can be found in [163, 192].
One way to control computational cost while taking into account both ligand and
receptor flexibility is by limiting flexibility to specific dihedral angles [193–197].
Typically, existing approaches limit receptor flexibility to side-chain and/or backbone
bonds of receptor amino acids on or near the binding site.
Other methods attempt to take into account full receptor flexibility without
explicitly modeling it during binding. These methods, known as ensemble or conformer
docking, obtain an ensemble of low-energy conformations/conformers of the receptor
prior to the binding simulation [198]. The ensemble is obtained via any conformational
sampling method, whether MD- or non-MD based (reviewed below). The ligand or a
library of ligands is then bound to each of the receptor conformers [199]. While
effective at controlling computational cost, these methods are limited in what aspects of
flexibility they model [200]. It is worth noting that they make use of the conformational
selection principle, for which there is now increasing evidence [201].
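The ensemble docking workflow described above can be sketched in a few lines. The sketch assumes that the receptor conformers have already been generated by some sampling method and that a scoring function is available; both are replaced here by hypothetical toy stand-ins (a pocket width per conformer and a size-mismatch score), so only the control flow is meaningful.

```python
def ensemble_dock(ligands, receptor_conformers, score):
    """Dock each ligand to every precomputed receptor conformer.

    Returns, for each ligand, the name of the best-scoring conformer and
    its score (lower is better).
    """
    results = {}
    for lig_name, ligand in ligands.items():
        scored = [(score(ligand, conf), conf_name)
                  for conf_name, conf in receptor_conformers.items()]
        best_score, best_conf = min(scored)
        results[lig_name] = (best_conf, best_score)
    return results


def toy_mismatch_score(ligand_size, pocket_width):
    """Hypothetical score: penalizes size mismatch between ligand and pocket."""
    return abs(ligand_size - pocket_width)


# Hypothetical data: two receptor conformers and two ligands.
conformers = {"closed": 3.0, "open": 6.0}
ligands = {"small": 2.5, "large": 6.5}
print(ensemble_dock(ligands, conformers, toy_mismatch_score))
```

Because each ligand is scored against a fixed set of conformers, the cost grows only linearly with ensemble size, but any receptor conformation absent from the ensemble is never considered.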
Methods that consider full receptor flexibility and go beyond ensemble docking exist,
and are based on MC or MD. MC-based methods are represented by the RosettaLigand
software [173, 174]. Work in [202] employs long, unbiased MD simulations to simulate
the physical process by which a ligand diffuses and then binds a protein target. Studies
on specific protein-ligand complexes provide an opportunity for MD-based methods to
reveal the kinetics of ligand-receptor interactions and estimate binding affinities from a
large number of MD simulations of the binding process. Yet, even in such studies
computational cost needs to be controlled, as binding can be too slow to observe on the
time scales routinely accessible via MD [203].
Given the time scale challenge, many enhanced sampling strategies have been
proposed for MD simulations. These include accelerated MD, replica-exchange MD,
umbrella sampling MD, and metadynamics methods [8, 149, 203–206]. Replica-exchange
MD and metadynamics methods are among the most popular to simulate binding. To
control computational cost, the simulation is limited to the immediate binding and
unbinding events. To discourage spending computational resources on the diffusion
process, the ligand is either tethered (through distance restraints) to the receptor, or
many short MD simulations are conducted at various placements of the ligand relative
to the receptor. In the former, explicit geometric restraints are enforced on the ligand to
keep it within the binding volume and save the MD simulation from wasting precious
computational time on simulating the diffusion process [149]. In the latter, the sampled
receptor and ligand configurations are organized in an MSM, which allows obtaining
estimates of association and dissociation rates [139]. Other approaches include the
powerful self-guided Langevin dynamics method and the accelerated adaptive
integration method, among others. A description of these methods and others is
provided later in this review. In summary, the goal of all these methods is to enhance
sampling of the receptor and ligand poses so that the binding event can be observed
within a reasonable computational budget.
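One common way to tether a ligand to the binding volume is a flat-bottom distance restraint, sketched below under stated assumptions. The functional form (zero inside a sphere of a given radius, harmonic outside) is a generic choice and is not claimed to be the specific restraint used in [149]; the function name, the coordinates, and the force constant are hypothetical.

```python
import math


def flat_bottom_restraint(ligand_xyz, site_xyz, radius, k):
    """Restraint energy that is zero while the ligand center stays within
    `radius` of the binding-site center and grows harmonically beyond it."""
    distance = math.dist(ligand_xyz, site_xyz)
    excess = max(0.0, distance - radius)
    return 0.5 * k * excess ** 2


# Hypothetical values: site at the origin, 10 A restraint radius.
print(flat_bottom_restraint((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), radius=10.0, k=2.0))
print(flat_bottom_restraint((12.0, 0.0, 0.0), (0.0, 0.0, 0.0), radius=10.0, k=2.0))
```

Inside the sphere the ligand feels no bias, so binding and unbinding events are sampled with the unmodified force field; the restraint only prevents the ligand from diffusing away into bulk solvent.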
Here we highlight some successful protein-ligand binding simulations. One concerns
the GTP and GDP nucleotide binding that is accompanied by a conformational
switch in the Ras and Rho proteins, which was studied in [207] due to the central role of
these proteins in cell growth regulation and a variety of human cancers [122]. In [207],
MD is used to simulate the ligand-free Ras and Rho proteins. In the absence of the
ligand, these proteins show intrinsic flexibility and are able to convert between different
conformations. The presence of the nucleotide restricts the conformation space
accessible by the GTP-bound structure. Significant coupling is observed in the bound
state between motions on the nucleotide-binding site and motions of the
membrane-interacting C-terminus via the highly flexible loop 3. The importance of this
loop was originally suggested in [208]. Classic MD simulations show that a double loop 3
mutant of Ras exhibits greater flexibility during conformational switching. This provides
evidence that loop 3 may represent a potential allosteric site in Ras and other
monomeric G proteins. This information, pieced together from various
studies, is valuable for structure-based drug design, because it highlights relevant
receptor structures for CADD [163].
Another successful example of the utility of computational methods for
protein-ligand binding concerns drug prediction for the influenza virus. Several
inhibitors have been widely used as anti-influenza drugs. However, due to
naturally-occurring drug-resistant mutations [209], their inhibition ability has gradually
decreased. The influenza virus, whose family of proteins includes M2 and H1-H9, attaches
itself to sialic acids on the surface of epithelial cells of the upper respiratory tract of the
host using the proteins that cover its own surface, hemagglutinin and
neuraminidase [210, 211]. Inhibitors bind to the active sites of hemagglutinin and
neuraminidase, preventing linkage of the virus to epithelial cells.
Protein-ligand docking via MD simulations is being used to model inhibitor binding
to the influenza virus (or only the surface proteins hemagglutinin and neuraminidase).
One group of methods focuses on finding new inhibitors (ligands) that can bind to the
continuously mutating hemagglutinin and neuraminidase active sites [210, 211].
Representative findings are illustrated in Fig. 1.
In particular, work in [211] focuses on finding new inhibitors for hemagglutinin.
Several ligands are considered to bind to the hemagglutinin H5 and H7 trimers. The
exposed position of the binding site is used to guide the development of a trimeric
ligand with a centrally positioned core structure with radial topology. The core
structure of the ligands mimics the C3 symmetry of the trimers. A specific ligand,
referred to as ligand 1, is found to bind to all three binding sites on H5 (deposited in
the PDB under PDB id 3M5G) at two different times of an MD simulation. Motion is
predominantly found at the core structure, while all three sialic acid residues remain in
their binding site during the simulation, indicating that ligand 1 is also a good ligand
for H7. Ligand 1 also has a K_D in the high nanomolar range and is therefore a compound
with one of the best reported affinities.
Another group of methods aims to modify already known inhibitors (by adding new
residues or suggesting mutations) in order to increase their binding ability [212, 213].
Finally, some methods focus on calculating binding free energies by quantum
mechanics/molecular mechanics simulations to predict binding abilities of possible
inhibitors [214]. The combined result of all these methods has been to suggest a
mechanism through which the inhibitor-virus binding can significantly influence viral
neutralization.
In addition to MD simulation methods, we draw attention to Brownian Dynamics
methods [215], which have been employed to simulate protein-ligand [216] and
protein-protein [217, 218] binding. In these methods, the net force experienced by a
modeled particle contains a random element, which models the implicit interactions
with solvent molecules. The norm of the random element is chosen from a probability
distribution function that is a solution to the Einstein diffusion equation (a list of
already built probability distribution functions can be found in [219]). By
coarse-graining out the fast motions, Brownian dynamics methods can simulate longer
time scales than can be typically approached in a classic MD simulation [220]. However,
the particle-based part still necessitates using relatively small time steps for an accurate
description of the particle interactions. The Reaction Before Move method determines
reaction probability functions that extend time steps and further speed up such
simulations [219].
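A minimal one-dimensional sketch of a Brownian dynamics update illustrates how the random element enters. It assumes the standard overdamped form in which a particle moves by a deterministic drift proportional to the force plus a Gaussian random displacement with variance 2 D Δt, the free-diffusion solution of the diffusion equation. The function names and parameter values are illustrative, and the sketch does not include the reaction probability functions of [219].

```python
import math
import random


def brownian_step(x, force, diffusion, kT, dt, rng):
    """One overdamped (Brownian) update in one dimension.

    Drift:  D * F / kT * dt
    Noise:  Gaussian with zero mean and variance 2 * D * dt
    """
    drift = diffusion * force / kT * dt
    noise = math.sqrt(2.0 * diffusion * dt) * rng.gauss(0.0, 1.0)
    return x + drift + noise


def mean_squared_displacement(n_particles, n_steps, diffusion, kT, dt, seed=0):
    """Free diffusion (zero force) of independent particles started at x = 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            x = brownian_step(x, 0.0, diffusion, kT, dt, rng)
        total += x * x
    return total / n_particles


# For free diffusion the result should be close to 2 * D * t = 1.0 here.
print(mean_squared_displacement(4000, 50, diffusion=1.0, kT=1.0, dt=0.01))
```

Solvent never appears explicitly: its effect is carried entirely by the diffusion coefficient and the random displacement, which is why longer time scales become reachable.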
The importance of accounting for receptor flexibility in protein-ligand binding is
further appreciated in light of allosteric effects. Allostery refers to couplings between
the active site and a regulatory, allosteric site, which is typically far away from the
active site, but causes chemical and/or physical changes in the active site that affect
binding. A detailed review of all observed interactions between allosteric and binding
sites is presented in [221]. The structural view of allostery considers interactions among
residues responsible for the allosteric coupling between allosteric and binding sites.
Uncovering allosteric communication among residues is becoming increasingly important
in CADD, as residues that mediate the allosteric communication may make for
druggable binding sites. Many methods are devoted to uncovering allosteric
communication, and a review of such methods is presented in [137]. Successful methods
include early ones based mainly on topological analyses of structures resolved in the wet
laboratory, such as graph theory, statistical coupling analysis, and perturbation
algorithms [222–227], and methods based on analyses of simulation trajectories. While
MD and enhanced versions of MD-based methods are used for the simulations, the
analysis is conducted with normal mode analysis (NMA) [228–230], correlation
matrices [231–233], community-network analysis [234], mutual information [235], and
dynamical network analysis [236–238]. MC-based methods have also been applied. The
MCPath method introduced in [239] models a receptor as a weighted network of
interacting residues and builds an MC trajectory by repeatedly applying MC moves
that directly propagate a signal between two interacting residues. MCPath is able to
uncover allostery pathways as well as allostery sites.
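The general idea of propagating a signal over a weighted residue network by MC moves can be sketched as a weighted random walk. This is a simplified illustration and not the published MCPath algorithm [239]: the residue labels, the interaction weights, and the rule that a neighbor is chosen with probability proportional to its edge weight are all assumptions made for the example.

```python
import random
from collections import Counter


def mc_signal_path(network, start, n_steps, seed=0):
    """Build a trajectory on a weighted residue network.

    network : dict mapping residue -> {neighbor: interaction weight}
    Each move passes the signal to a neighbor chosen with probability
    proportional to the interaction weight.
    """
    rng = random.Random(seed)
    path = [start]
    current = start
    for _ in range(n_steps):
        neighbors = list(network[current])
        weights = [network[current][n] for n in neighbors]
        current = rng.choices(neighbors, weights=weights, k=1)[0]
        path.append(current)
    return path


# Hypothetical four-residue network with symmetric interaction weights.
toy_network = {
    "R10": {"R25": 1.0, "R40": 0.2},
    "R25": {"R10": 1.0, "R77": 3.0},
    "R40": {"R10": 0.2, "R77": 0.5},
    "R77": {"R25": 3.0, "R40": 0.5},
}
path = mc_signal_path(toy_network, "R10", n_steps=1000)
print(Counter(path).most_common())
```

Residues that are visited often along many such trajectories are candidates for lying on a communication pathway, which is the kind of information a pathway analysis would then examine further.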
Protein-Nucleic Acid and Protein-Protein Docking
The computational challenges incurred when modeling protein-ligand binding grow
more severe when modeling interactions between macromolecules due to the much larger
spatial scales involved. Most current research addresses only the dimeric setting, where
the number of bound units is limited to two. In addition, the majority of methods
applied to the pairwise docking setting are non-MD based methods focused on obtaining
the native structure of the complex without information on the kinetics of the docking
process. Methods implementing MC or evolutionary algorithms are by now the most
popular. This is not surprising, given the overwhelming number of atoms whose motions
would have to be followed in an MD simulation. Specific MD-based studies on dimeric
systems of known proteins exist, and typically some information is employed from
wet-laboratory studies on the docking site to orient the units favorably and additionally
tether them to each other so as to steer the simulation towards the docking
event [240, 241]. In general, however, even when forgoing kinetics, predicting the
correct native structure of the bound units remains challenging.
Computational research in structure prediction for macromolecular pairwise docking
is active, and there are now many methods [242–255] driven by the community-wide
CAPRI experiment [256, 257]. The focused computational setting of a protein dimer has
allowed the application of demanding energy-driven optimization methods and even
modeling of structural flexibility for high-accuracy docking [243, 251, 258]. In the light of
variable interfaces, such as antibody-antigen interfaces [259], accounting for flexibility is
key but exceptionally expensive. Methods such as RosettaDock [260] allow full
flexibility and employ various models of increasing detail (from low-resolution, to
centroid-mode, coarse-grained, and then all-atom). RosettaDock has been reported to
achieve docking funnels for 63% of antibody-antigen targets, 62% of enzyme-inhibitor
targets, and 35% of other targets; funnels are achieved on only 14% of targets deemed
difficult, where substantial conformational changes are expected to accompany
docking [261]. Other methods that consider ensemble docking have also been applied,
though with limited success due to the difficulty of obtaining a conformational ensemble
representative of the intrinsic structural flexibility of a macromolecule [262].
Several CAPRI summaries make the case that high-accuracy pairwise docking will
remain challenging for the near future [257, 263, 264]. There is great difficulty, for
instance, in locating the native interaction interface or even part of it, with top methods
shown to predict only 30-58% of the correct interface in any given target [257]. An
energy-based treatment is not guaranteed to drive the optimization process towards the
right interface. Much research is invested in this direction. Machine learning methods,
though not the focus of this review, are showing promise in elucidating features of native
interaction interfaces so as to bypass the employment of interaction energy functions at
a global layer [265–268]. For instance, work in [269] proposes a learned model to be
used as a top filter to label sampled protein-protein dimers before attempting to refine
them with more accurate and computationally costly interaction energy functions.
Rather than employing information from machine learning models, methods such as
HADDOCK [243], the Integrative Modeling Platform (IMP) [270] and others [271, 272]
employ wet-laboratory data to restrict sampling of bound conformations to those that
reproduce the wet-laboratory data. Work in [273] uses chemical shifts from NMR to
predict conformational changes upon complex formation in a class of engineered binding
proteins known as affibodies. Similarly, HADDOCK also restricts sampling through NMR
chemical shifts [243], whereas the IMP software provides more versatility by allowing the
integration of different types of wet-laboratory, biochemical and biophysical data and
the employment of models of various resolutions [270]. It is worth noting that, while the
majority of protein-protein docking algorithms are restricted to the dimeric setting, the
IMP software allows modeling multimeric assemblies of an arbitrary number of units.
Work in [274], for instance, reveals the native structure of the nuclear pore complex
(NPC), a 50 MDa complex composed of 456 proteins. Work in [275] reveals a
higher-resolution structure of a heptameric module in the yeast NPC by satisfying spatial
restraints derived from negative-stain electron microscopy and protein domain-mapping data.
While wet-laboratory techniques such as X-ray crystallography can provide
high-resolution structures for protein-protein dimers and even multimers, protein-DNA
dimers are typically difficult to crystallize. There is great need for docking methods to
reveal both binding mechanisms and final bound structures of protein-DNA complexes.
In contrast to the diversity of protein-protein interaction interfaces, protein-DNA
interaction interfaces often exhibit conserved sequence motifs and are thus accurately
detected with machine learning techniques [276, 277]. Knowledge, even if partial, of the
interaction interface has greatly helped the applicability of docking methods for
protein-DNA binding [278, 279]. HADDOCK, for instance, already a top protein-protein
docking method, has been demonstrated to be effective for protein-DNA docking [280]. By
now, comprehensive maps of protein-DNA binding landscapes have been put together
for the largest class of metazoan DNA-binding domains, known as zinc fingers [281].
These landscapes are essential to support efforts to determine, predict, and engineer
DNA-binding specificities. For instance, work in [282] studying interactions that
proteins make with nucleic acids, small molecules, ions, and peptides reveals genes that
are rich in mutations in the binding sites of the proteins they encode and are thus
functionally important in cancer.
The setting of modeling macromolecular interactions naturally suggests expanding
the focus beyond dimeric docking to multimeric docking. Elucidating structural details
of oligomers suggested by wet-laboratory studies is indeed key to advancing further
research on the role of oligomerization in the healthy and diseased cell [283, 284] and is
expected to keep motivating the design of algorithms for multimeric docking.
Computationally-demanding optimization and willingness to spend significant
computational resources on a dimeric assembly make application of current pairwise
docking methods to protein assemblies of an arbitrary number of units impractical.
Adaptations of these methods to extend their applicability to the multimeric setting are
neither trivial nor obvious.
Early work by Nussinov and colleagues introduced a greedy, systematic algorithm,
CombDock, for the problem of multimeric docking [285, 286]. The algorithm is general
and can handle heteromeric and asymmetric complexes but is challenged by the
combinatorial explosion in the number of dimensions of the space of configurations with
increasing number of units. Subsequent work narrows the focus to symmetric
complexes and applies search and bound techniques from AI with additional
information of distance-based constraints from NMR to control the size of the search
space [287–291]. Work in the Sali lab, culminating in the IMP software [270], focuses
exclusively on the setting where integration of wet-laboratory data is key to narrow the
search space and model assemblies of hundreds of units at a low resolution. Research on
multimeric docking in the absence of wet-laboratory data is sparse.
In [292], an evolutionary algorithm, Multi-LZerD, is proposed that operates in the
absence of wet-laboratory data but is guided by interaction energy. Its success varies
with complex size. The mixed results obtained by Multi-LZerD reflect the mixed state
of the art in multimeric docking. In addition to successful cases, where the native
multimeric structure is reproduced, Multi-LZerD reports in various cases decoys that do
not reproduce the known native structures. While the decoys can be as far as 23.59 Å
away from a particular native structure, typically, the decoys contain correct
subcomplexes within 4.0 Å. It is worth noting that the evolutionary algorithm is also
computationally demanding. Time concerns as well as the quality of current predictions
suggest that there is much room for improvement in multimeric docking.
Modeling of Macromolecular Structural Flexibility
Modeling the structural flexibility of uncomplexed proteins is key not only to allow
application of methods such as ensemble docking to the protein-ligand and
protein-protein docking problems, but also to obtain detailed information on the role of
protein sequence in structure, dynamics, and function. While it is in principle very
difficult to map the entire conformation space and underlying energy landscape of a
protein sequence, many methods are dedicated to specialized sub-problems. For
instance, the literature is rich in methods that obtain a sample-based representation of the
equilibrium conformation ensemble of a protein. Other methods extend this
characterization to proteins that exhibit not only local fluctuations around an average,
wet-laboratory, equilibrium structure but indeed are characterized by multi-basin
landscapes where distinct structural states have comparable Boltzmann probabilities.
Many methods focus on such proteins and particularly on modeling transitions between
similarly stable structural states as a way to obtain information on function modulation
and changes to function upon sequence mutations. Other methods are dedicated to
capturing allosteric regulation and identifying coupled motions not in the vicinity of
binding sites. Yet others focus on obtaining detailed structural characterizations of
meta-stable states and other states present at low populations, even in natively unfolded
proteins, as a way to understand aggregation, misfunction, and other disorders. In the
following we provide an overview of these applications, highlighting selected ones to
showcase current capabilities.
Sampling of Equilibrium Conformation Ensembles
In principle, complete information about structure and dynamics can be obtained from
mapping the energy landscape of a given macromolecular sequence. Despite advances in
atomistic MD simulations, this remains an insurmountable computational task but for
the smallest peptides. As such, we separate here the discussion of work on sampling the
ensemble of folded conformations from work that focuses on protein folding and/or
structure prediction. Methods that initiate their search for other conformations of the
equilibrium ensemble from one or a few given conformations or wet-laboratory data are
in practice more efficient and have been employed to characterize both local fluctuations
and large-scale motions connecting conformations of the equilibrium or native state in
proteins.
We highlight here work that builds over the MD or MC frameworks but restricts
sampling in conformation space to regions that reproduce wet-laboratory data. In
particular, chemical shifts, which are NMR observables measured under a wide range of
conditions and with great accuracy, are proving very useful for generating
conformation ensembles that capture macromolecular dynamics in solution. For
instance, work in [293, 294] uses chemical shifts for backbone atoms as restraints in a
replica-averaged MD simulation. Work in [295] additionally incorporates NMR chemical
shifts for side chains and demonstrates as a result great agreement between
reconstructed conformation ensembles and wet-laboratory data, thus improving the
accuracy of computational methods and their ability to make useful predictions on
macromolecular structure and dynamics. Work in [296] characterizes in detail the native
conformation ensemble of the src-SH3 domain and the role of water. Work in [297]
incorporates diffuse X-ray scattering data to characterize the conformational dynamics
of a crystalline protein at the µs time scale. In other works [129, 298–301], restraints
from wet-laboratory data are employed to improve the quality and thus accuracy of
simulation methods.
In the above works, the main idea is to incorporate the wet-laboratory data into a
restraint potential that is added to a molecular mechanics force field. In [302], the free
energy landscapes of small-size proteins are characterized by using the NMR chemical
shifts as collective variables (also known as reaction coordinates, in slight abuse of
terminology) in metadynamics simulations. Doing so enhances sampling and allows
visiting multiple free energy minima not typically reached by classic MD
simulations [302]. The free-energy landscape reconstructed for the third Ig-binding
domain of protein G from streptococcal bacteria (GB3) in [302] is shown in Fig. 1.
In [34], the interdomain motions of hen lysozyme are characterized using RDC data
to restrain MD simulations.
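The restraint idea above can be sketched as a pseudo-energy that adds a penalty on the mismatch between calculated and measured observables to the force-field energy. The sketch below assumes a harmonic penalty and, following the replica-averaged variant mentioned earlier, compares the measured value to the average of the calculated values over replicas rather than to each replica separately. The function names, the force constant, and the numerical values are hypothetical, and the calculation of an observable (for example, a chemical shift) from a structure is taken as given.

```python
def replica_averaged_penalty(calc_by_replica, experimental, k):
    """Harmonic penalty on the difference between replica-averaged
    calculated observables and their measured values.

    calc_by_replica : list of lists, one list of observables per replica
    experimental    : list of measured values, in the same order
    k               : force constant of the restraint (assumed value)
    """
    n_replicas = len(calc_by_replica)
    penalty = 0.0
    for j, measured in enumerate(experimental):
        average = sum(replica[j] for replica in calc_by_replica) / n_replicas
        penalty += k * (average - measured) ** 2
    return penalty


def pseudo_energy(force_field_energy, calc_by_replica, experimental, k):
    """Force-field energy plus the data-derived restraint term."""
    return force_field_energy + replica_averaged_penalty(
        calc_by_replica, experimental, k)


# Two replicas that individually deviate from the data but agree on average.
calc = [[1.0, 4.0], [3.0, 4.0]]
measured = [2.0, 4.0]
print(replica_averaged_penalty(calc, measured, k=5.0))
print(pseudo_energy(-10.0, [[1.0, 4.0]], measured, k=5.0))
```

Restraining the average rather than each replica lets individual conformations differ from one another, so the ensemble as a whole, not any single structure, is required to reproduce the data.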
Fig. 1
Free-energy landscape of GB3 obtained with work in [302] using chemical
shifts as collective variables
Panel A shows a two-dimensional projection of sampled conformations. The x axis
shows the value of the CamShift collective variable for each conformation, which measures
the difference between the wet-laboratory and calculated chemical shifts for the
backbone. The y axis shows the backbone RMSD between each conformation and the
reference structure (PDB id 2oed). Some selected conformations, from extended to
compact, are highlighted, drawn with the Visual Molecular Dynamics (VMD)
software [303]. Panel B shows a conformation with the lowest backbone RMSD (0.5 Å)
from the reference structure. Such native-like conformations are visited multiple times
by the method. Panel B draws hydrophobic side chains to illustrate that the internal
packing of these side chains is practically identical to that observed in the reference
structure. This figure is reproduced with permission of the Executive Editor PNAS
from article Granata et al., 2013 [302].
The idea of incorporating wet-laboratory data in energy functions, thus resulting in pseudo-energy functions, has been popular for over a decade and has proven effective not only in the context of MD sampling but also of MC sampling for reconstructing equilibrium conformation ensembles (and even for structure prediction, as we review below). For instance, work in [304] demonstrates that the use of replica-averaged structural restraints in MD simulations with a particular force field and a set of wet-laboratory data can provide an accurate approximation of the Boltzmann distribution of a macromolecule. Though NMR chemical shifts are proving more general at capturing the extensive equilibrium dynamics, NOE, RDCs, S² order parameters, J-couplings, and hydrogen exchange data have been used to restrain both MD and MC sampling and obtain detailed information on the structure and dynamics of equilibrium states and transition states in proteins [32, 35, 36, 305–313]. The main advantage of incorporating wet-laboratory data is to remedy inherent biases in force fields and guide the sampling of the conformation space to relevant regions. Concerns of accuracy then shift entirely to the breadth of sampling and the generality of the wet-laboratory data to capture the equilibrium dynamics. Recent work affirms that NMR chemical shifts are very powerful in this regard and, combined with enhanced sampling techniques for MD and MC, allow sampling equilibrium conformation ensembles and thus faithfully capturing equilibrium dynamics [273, 293–295, 314]. It is worth noting that there is great difficulty in the wet laboratory in calculating chemical shifts, J-couplings, and other measurements from structures. A central issue is the large uncertainty inherent in such calculations. One way in which computational methods address this issue is by integrating different types of experimental data [315, 316].
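The notion of a pseudo-energy function with replica-averaged restraints can be illustrated with a minimal sketch. It assumes a harmonic penalty on the deviation between the measured value of each observable and its back-calculated value averaged over replicas; the function names, the force constant k, and the harmonic form are illustrative assumptions and are not taken from [304].

```python
def restraint_energy(calc_per_replica, measured, k=1.0):
    """Harmonic penalty on replica-averaged observables.

    calc_per_replica: one list per replica of back-calculated values
    (e.g., chemical shifts), each with the same length as `measured`.
    """
    n_replicas = len(calc_per_replica)
    energy = 0.0
    for i, target in enumerate(measured):
        # Average the back-calculated observable over all replicas.
        avg = sum(rep[i] for rep in calc_per_replica) / n_replicas
        energy += 0.5 * k * (avg - target) ** 2
    return energy


def pseudo_energy(force_field_energies, calc_per_replica, measured, k=1.0):
    """Sum of per-replica force-field energies plus the restraint term."""
    return sum(force_field_energies) + restraint_energy(
        calc_per_replica, measured, k
    )
```

Because the penalty acts on the average rather than on each replica, individual replicas remain free to deviate from the measurement as long as the ensemble as a whole agrees with it.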
Other non-MD-based methods have also been applied, particularly to model internal, equilibrium structural fluctuations of uncomplexed proteins. These methods, such as CONCOORD [317], FIRST/FRODA [318, 319], and PEM [320–322], are designed to rapidly populate the conformation space in a neighborhood around a given structure. They typically restrict an underlying stochastic optimization process based on MC or other non-MD algorithms with geometric constraints. The constraints are obtained from analysis of a given structure resolved in the wet laboratory and considered representative of the equilibrium conformation ensemble. For instance, work in [317] repeatedly generates and then corrects random conformations until a set of upper and lower geometric bounds obtained from the given structure is satisfied. Work in [318, 319, 323] is based on constraint theory and models a given structure as a bar-and-joint framework. This model allows employing rigidity analysis to reveal underconstrained backbone angles on which sampling focuses to obtain inherent internal fluctuations. Work in [320–322] is based on the treatment of inverse kinematics in robotics and computes local fluctuations by restricting ends of consecutive overlapping segments of the protein chain to positions in the given structure.
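The generate-and-correct idea described for [317] can be sketched as follows. This is a toy illustration under stated assumptions, not the CONCOORD implementation: bounds are taken as the reference pairwise distance plus or minus a hypothetical tolerance tol, and violating pairs are corrected by moving both points along their connecting vector.

```python
import math
import random


def distance_bounds(ref, tol=0.5):
    """Lower/upper bounds on every pairwise distance in the reference."""
    bounds = {}
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d = math.dist(ref[i], ref[j])
            bounds[(i, j)] = (max(0.0, d - tol), d + tol)
    return bounds


def violations(coords, bounds, eps=1e-9):
    """Pairs whose current distance falls outside its bounds."""
    return [(i, j) for (i, j), (lo, hi) in bounds.items()
            if not lo - eps <= math.dist(coords[i], coords[j]) <= hi + eps]


def correct(coords, bounds, max_sweeps=1000):
    """Iteratively pull or push violating pairs until all bounds hold."""
    coords = [list(p) for p in coords]
    for _ in range(max_sweeps):
        bad = violations(coords, bounds)
        if not bad:
            return coords
        for i, j in bad:
            lo, hi = bounds[(i, j)]
            d = math.dist(coords[i], coords[j]) or 1e-12
            target = min(max(d, lo), hi)
            shift = 0.5 * (target - d) / d
            for k in range(3):
                delta = (coords[j][k] - coords[i][k]) * shift
                coords[i][k] -= delta
                coords[j][k] += delta
    return None  # no bound-satisfying conformation found


def sample(ref, n_samples, tol=0.5, box=10.0, max_attempts=100, seed=0):
    """Generate random conformations and keep those that can be corrected."""
    rng = random.Random(seed)
    bounds = distance_bounds(ref, tol)
    accepted = []
    for _ in range(max_attempts):
        if len(accepted) == n_samples:
            break
        start = [[rng.uniform(-box, box) for _ in range(3)] for _ in ref]
        fixed = correct(start, bounds)
        if fixed is not None:
            accepted.append(fixed)
    return accepted
```

The correction step may fail to converge for some random starts, in which case the conformation is discarded and a new one is generated.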
Structure-guided methods, while useful at probing regions of a conformation space around a given structure, are not readily useful when the goal is to populate a highly heterogeneous equilibrium ensemble for which there may not be sufficient representative structures. In such proteins, often referred to as multi-basin proteins due to the existence of potentially comparably deep basins in the free-energy landscape, large conformational changes are observed between basins. Detailed reconstruction of the energy landscape of a protein is at this point challenging. Non-MD methods have been devised and applied to capture thermodynamically-stable and semi-stable structural states in multi-basin proteins [125, 126]. In [126], an MC-SA method is devised that employs multiple scales of representational detail and the fragment replacement technique popular in de novo structure prediction to map the energy landscape of the uncomplexed adenylate kinase (AdK) protein. However, only a subset of the known states is captured, pointing to the general challenge of devising enhanced sampling techniques capable of reconstructing energy landscapes of proteins in the absence of any a priori information. Fortunately, significant, even if partial, information now exists from wet-laboratory techniques on stable or semi-stable states of wildtype and variant sequences of proteins. The method in [324] exploits this information to define a lower-dimensional search space on which extensive sampling can be afforded to reveal diverse thermodynamically-stable and semi-stable structural states. We note that such states are stable in the lower-dimensional space, as no information is available on the true potential energy surface.
While MD-based methods are challenged in a de novo setting, they are particularly suitable to reveal the detailed structural transitions connecting two known structural states. Providing detailed transitions is key to understanding the mechanistic basis of several disorders linked to transition-modifying mutations. This promise has also attracted non-MD methods that can sample conformational paths connecting two structural states of interest without direct time-scale information on the transition. In the following we provide an overview of work in modeling and simulating structural transitions.
Modeling of Structural Transitions
Many proteins undergo large conformational changes that allow them to tune their biological function by transitioning between different structural states, effectively acting as dynamic molecular machines [325]. Since it is generally difficult for wet-laboratory techniques to elucidate a transition in terms of intermediate conformations (though successful examples exist [326]), computational techniques provide an alternative approach [327]. However, transition trajectories may span multiple length and time scales, connecting structural states more than 100 Å apart. This length scale is up to 2 orders of magnitude larger than a typical interatomic distance of 2 Å. Transitions can also demand microsecond-to-millisecond time scales, which are 6–12 orders of magnitude larger than the femtosecond-to-picosecond time scale of typical atomic oscillations.
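The scale gaps quoted above follow from simple ratios, as the short check below shows. The specific endpoints (100 Å versus 2 Å, and microseconds to milliseconds versus femtoseconds to picoseconds) are read from the text; the helper name is illustrative.

```python
import math


def orders_of_magnitude(large, small):
    """Base-10 logarithm of the ratio between two scales."""
    return math.log10(large / small)


# Length: a 100 Å transition versus a 2 Å interatomic distance.
length_gap = orders_of_magnitude(100.0, 2.0)      # a little under 2

# Time: the smallest gap pairs microseconds with picoseconds,
# the largest pairs milliseconds with femtoseconds.
time_gap_min = orders_of_magnitude(1e-6, 1e-12)   # about 6
time_gap_max = orders_of_magnitude(1e-3, 1e-15)   # about 12
```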
Typically, three types of methods are applied to model structural transitions: MD-based methods, morphing-based methods, and robotics-inspired methods.
MD-based methods typically have to employ powerful algorithmic enhancements to surpass high-energy barriers in structural transitions. However, cases exist where classic MD methods have been able to capture spontaneous transitions of allosteric proteins by monitoring the structural relaxation upon removal of the bound molecule from the binding pocket [328, 329]. These works further highlight the utility of the conformational selection or population shift principle, as removal of the bound molecule prompts spontaneous movement towards a new equilibrium state.
In cases of high-energy barriers, biased or targeted MD methods are useful to expedite transitions between given structures [127, 330], but the concern with such methods is that the transition trajectory may not correspond to the true one, as these methods modify the underlying energy landscape; the order of events in transition paths computed via targeted MD methods depends on the direction in which the MD simulations are performed. For example, an application of biased MD to capture transitions of Ras between its active and inactive structures resulted in unrealistic, high-energy structures [330]. It is worth noting, however, that recent work in [331] has proposed a technique to remove the length-scale bias from targeted MD simulations. Essentially, the technique formulates local restraints, each acting on a small connected portion of the protein sequence, resulting in a number of potentials that are then used in targeted MD simulations. The technique has been demonstrated to be effective in an application to the open–closed transition in the protein calmodulin. The free-energy barriers associated with the computed paths have been shown to be comparable to those obtained with a finite-temperature string method.
In contrast to biased MD methods, accelerated MD methods do not change the entire landscape but only the relative height of the basins corresponding to the structures that need connecting with intermediate conformations [332]. Accelerated MD has been applied to several proteins, for instance to capture the transition of H-Ras between the inactive and active structural states [10], map the structural and dynamical features of kinesin motor domains [91], and compute domain opening and dynamic coupling in the alpha subunit of heterotrimeric G proteins [333]. Representative results on an application of accelerated MD for capturing the dynamics of the Eg5 kinesin motor domain are shown in Fig. 2.
Fig. 2. Probing of coupled motions in the Eg5 kinesin motor domains in [91] through accelerated MD simulations.
The top panel shows the structure and catalytic cycle of the kinesin motor domain. The ATPase catalytic site sits at the top of the β-sheet, flanked by three highly-conserved loops (P-loop, SI, and SII) connected to helices (also annotated) on either side of the sheet. The secondary structure topology is drawn, with β-strands drawn as triangles and α-helices as circles. The kinesin catalytic cycle is shown: Kinesin (K) has a weak affinity for the microtubule in the ADP-state. ADP release is followed by strong microtubule-binding. ATP binding may occur followed by hydrolysis and product release to regenerate the weakly-bound ADP state. The bottom panel projects conformations sampled by 200 nanosecond-long accelerated MD every 20 picoseconds on the two principal modes of motion. The latter are obtained through principal component analysis of collected X-ray structures for wildtype and variant Eg5. Three simulations are highlighted, the nucleotide-free (APO) one in (A), ADP-bound one in (B), and ATP-bound one in (C). The nucleotide-free simulation covers more of the conformation space, whereas restricted sampling is observed when Eg5 is bound to ATP or ADP. One of the conclusions in [91] is that structural changes from the ADP- to ATP-bound states which are evident in the collection of X-ray structures, are encoded in the intrinsic dynamics of the nucleotide-free motor domain; the nucleotides effectively rigidify the motor domain by narrowing the conformation space accessible by it, as evident in the restricted sampling observed through accelerated MD. This figure is reused from Scarabelli et al., 2013. CC-BY PLOS ONE [91].
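How accelerated MD raises basins while leaving high-energy regions untouched can be illustrated with a commonly used boost potential. The specific functional form below, with threshold energy E and tuning parameter alpha, is assumed here for illustration; the text above does not state which form is used in [332].

```python
def boosted_energy(v, threshold, alpha):
    """Modified potential V* = V + dV for a single conformation.

    The boost dV = (E - V)^2 / (alpha + E - V) is applied only where the
    potential V lies below the threshold E. At or above E the landscape
    is left unchanged.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if v >= threshold:
        return v
    gap = threshold - v
    return v + gap * gap / (alpha + gap)


# Toy example: a basin at -6 and a barrier top at 5, with E = 4, alpha = 10.
basin = boosted_energy(-6.0, 4.0, 10.0)    # basin floor is raised
barrier = boosted_energy(5.0, 4.0, 10.0)   # barrier top is unchanged
```

In this toy example the basin floor rises while the barrier top stays in place, so the effective barrier between them shrinks. Smaller values of alpha flatten the basins more aggressively.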
Even accelerated MD methods are limited in their ability to elucidate transition trajectories that cross high energy barriers [10]. In contrast, the dynamic importance sampling (DIMS) MD method [334, 335] is more effective at simulating macromolecular transitions with energy barriers. In DIMS, the next conformational state sampled to obtain a transition from a state A to a state B is chosen to make the most productive movement toward B and cross the energy barrier. Productive movement is indicated by a robust progress variable, the instantaneous RMSD over heavy atoms between a conformation and the target structure. DIMS is integrated in CHARMM and has been tested on several systems [336], including modeling of slow transitions in AdK [334], folding of protein A and protein G, and conformational changes in the calcium sensor S100A6, the glucose–galactose-binding protein, maltodextrin, and lactoferrin, showing good agreement between sampled intermediates and experimental data [336].
In particular, in [334], DIMS is applied to sample the ensemble of open-to-closed transitions for AdK. AdK is an enzyme that regulates the concentration of free adenylate nucleotides in the cell by catalyzing the conversion of ATP and AMP into two ADP molecules. The enzyme undergoes a large conformational change in its transition between an open and a closed structural state, and this change has been observed even in the absence of a substrate. As a result, AdK is one of the few proteins for which wet-laboratory studies have been able to capture a great number of intermediate structures populated during the open-to-closed transition. For this reason, AdK is a poster system to measure the capability of computational methods to reproduce transitions in great structural detail. Work in [334] is one of the few to provide atomistic detail, as well as reproduce and map with great accuracy the location of known intermediate structures along the transition. Representative results are shown in Fig. 3.
Fig. 3. Sampling of the ensemble of closed-to-open and open-to-closed transition trajectories in AdK through the DIMS method [334].
An ensemble of 330 DIMS trajectories is compared to 45 E. coli AdK X-ray structures. The conformations in each trajectory are projected onto a progress variable δRMSD, measured as the RMSD of the conformation from the closed AdK structure (PDB id 1ake:A) minus the RMSD of the conformation from the open AdK structure (PDB id 4ake:A). For each of the 45 collected X-ray structures and each trajectory, the conformation in the trajectory closest in backbone RMSD to the X-ray structure is identified, and its δRMSD value is recorded. A probability distribution is then constructed for each X-ray structure over all DIMS trajectories to indicate where an X-ray structure is located along the simulated trajectories. The color bar indicates the probability density. The median of each distribution is marked by a white circle. The X-ray structures whose PDB ids are listed on the y axis are rank ordered based on the median. The second white line traces the location of the median when the simulations are repeated to sample open-to-closed transition trajectories. Out of 45 structures sorted by δRMSD, about 24 are closed-state structures, 4 are open, and 17 are intermediates. This work is an example of the capability of computational methods to elucidate transitions in detail and accurately map the location of experimentally-determined structures in the transitions. This figure is adapted from Beckstein et al., 2009 [334]. The image was created by O. Beckstein.
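The δRMSD progress variable used in Fig. 3 can be written in a few lines. The sketch below assumes that all structures share the same atom ordering and have already been superimposed; optimal superposition, which a real analysis would require, is omitted, and the function names are illustrative.

```python
import math


def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal size")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def delta_rmsd(conf, closed, opened):
    """RMSD from the closed structure minus RMSD from the open structure."""
    return rmsd(conf, closed) - rmsd(conf, opened)
```

With this sign convention, conformations near the closed structure take negative values and conformations near the open structure take positive values.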
Morphing- and string-based methods provide an alternative way to compute transition trajectories. Morphing-based methods include MolMov [337], FATCAT [338], NOMAD-Ref [339], MinAction [130], Climber [340], and more. In Climber, the interresidue distances in a given start structure are pulled towards distances in the goal structure, using harmonic restraints incorporated in a pseudo-energy function. MolMov and FATCAT interpolate linearly in Cartesian space or over rigid-body motions. NOMAD-Ref uses elastic normal modes and interpolates interresidue distances per the elastic network algorithm in [341]. MinAction solves action minimization equations at each of the provided structures, assuming a harmonic potential at them. Other methods include those based on elastic network models (ENMs) [131, 341], the nudged elastic band, and zero- and finite-temperature string methods [340, 342–347]. In particular, the string-based methods make use of the committor function to account for not generally knowing the collective variables underlying the transition [343], whereas methods based on ENMs show the ability of coarse-grained models to capture allosteric transitions in supramolecular systems on the order of megadaltons [131]. In general, while efficient, all these methods tend to reproduce similar conformational paths in independent runs rather than provide a possibly heterogeneous ensemble of conformational paths realizing the transition.
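The simplest form of morphing, linear interpolation in Cartesian space, can be sketched as below. This is a bare illustration with hypothetical names: it assumes matching atom order in both structures and omits any further processing of intermediates that morphing tools may apply.

```python
def linear_morph(start, goal, n_intermediates):
    """Linearly interpolate Cartesian coordinates between two conformations.

    Returns a path of n_intermediates + 2 conformations that begins at
    `start` and ends at `goal`.
    """
    if len(start) != len(goal):
        raise ValueError("conformations must have the same number of atoms")
    steps = n_intermediates + 1
    path = []
    for s in range(steps + 1):
        t = s / steps
        path.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, goal)
        ])
    return path
```

The procedure is deterministic, so repeated runs yield the same path, which illustrates why such methods do not by themselves produce a heterogeneous ensemble of paths.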
Work in [348, 349] tackles this issue of possibly high inter-run path correlations with the weighted ensemble method (WEM). WEM, originally proposed in [350], has been shown to be a useful enhanced sampling method for off-equilibrium and equilibrium processes. WEM uses a multiple-trajectory strategy where MC trajectories spawn new ones upon reaching new regions of the conformation space. One of the first applications of WEM to path sampling was on a 72-residue domain of the calmodulin protein. Coupled with a united-residue model, WEM was able to capture the transition between the calcium-bound and calcium-free structural states and compare well with brute-force simulations in a fraction of brute-force simulation time. In [349], WEM is used to investigate the mechanism of the conformational change that the 5HIR benzylhydantoin transporter Mhp1 undergoes from a state poised to bind extracellular substrates to a state that is competent to deliver substrate to the cytoplasm. WEM reveals a heterogeneous ensemble of outward-to-inward conformational paths and identifies two distinct modes of transport.
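The multiple-trajectory strategy of WEM can be illustrated on a one-dimensional toy system. In this sketch, walkers carry statistical weights, the coordinate is divided into bins, and each occupied bin is resampled to a fixed number of walkers while conserving its total weight. The random-walk dynamics, the bin width, and the proportional resampling rule are simplifying assumptions and not the procedure of [348–350].

```python
import math
import random


def resample_bin(walkers, target, rng):
    """Resample (position, weight) walkers in one bin to `target` walkers.

    Walkers are drawn in proportion to their weights, and every copy
    receives an equal share of the bin's total weight.
    """
    total = sum(w for _, w in walkers)
    positions = [x for x, _ in walkers]
    weights = [w for _, w in walkers]
    chosen = rng.choices(positions, weights=weights, k=target)
    return [(x, total / target) for x in chosen]


def weighted_ensemble(n_iters=50, target=4, step=0.3, bin_width=1.0, seed=0):
    rng = random.Random(seed)
    walkers = [(0.0, 1.0 / target)] * target
    for _ in range(n_iters):
        # Propagate every walker with a short stochastic move.
        moved = [(x + rng.gauss(0.0, step), w) for x, w in walkers]
        # Group walkers by bin, then resample inside each bin.
        bins = {}
        for x, w in moved:
            bins.setdefault(math.floor(x / bin_width), []).append((x, w))
        walkers = [wk for group in bins.values()
                   for wk in resample_bin(group, target, rng)]
    return walkers
```

A walker that enters a previously empty bin is replicated up to the target count, which is how rarely visited regions receive sampling effort without distorting the overall weight distribution.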
Robotics-inspired methods have also been applied to model structural transitions. They rely on deep analogies between robot motion planning and macromolecular motion simulation. In particular, the T-RRT [351] and PDST [352] methods, adapted from tree-based robot motion planning frameworks, have focused on the problem of computing conformational changes connecting two given structures in small and large proteins. While T-RRT has been shown to connect known low-energy states of the dialanine peptide (2 amino acids long) [351], the PDST method has been shown to produce credible information on the order of conformational changes connecting stable structural states of large proteins (200–500 amino acids long) [352]. Both methods control the dimensionality of the conformation space by either focusing on systems with few amino acids [351] or by employing very coarse-grained representations to limit the number of modeled parameters in large proteins [352]. Work in [353] extends the capability of these frameworks to address large conformational changes in proteins, such as calmodulin and AdK, while providing high-resolution intermediate conformations by employing fragment-based moves. Other work detaches the sampling of the structure space from analysis of motions [354]. MSM-based analysis of sampled conformations is conducted to compute average properties of interest, such as the expected number of transitions connecting two given structural states, in lieu of direct time-scale information.
Protein Folding and Structure Prediction
Protein folding and structure prediction are often treated as two sides of the same coin. Protein folding, however, focuses on uncovering the detailed series of conformational changes that a protein goes through from a denatured, unfolded state to its long-lived, equilibrium, folded state. The folded or native structure is the end result of this process, but not the only goal. Indeed, there are many protein folding algorithms that employ information about the native structure in order to expedite the search for the folding mechanism. Structure prediction algorithms focus more on the end result; that is, the goal is to uncover the native, folded structure even if the process by which these methods do so does not resemble the physical folding one. In its broadest context, the protein folding problem aims to shed light on the physical code by which a protein amino-acid sequence determines the native structure, the speed with which proteins fold, and the design of effective algorithms for predicting the native structure from sequence.
An extensive review of protein folding is presented in [355]. Credit for introducing the problem to the computational biology community goes to Kendrew and co-workers, who published the first structure of a globular protein, myoglobin, and showed the complexity and lack of symmetry or regularity in protein native structures [61]. Since then, a general mechanism for folding has been elusive. Various paradigms have been proposed, evolving from the early days when folding was thought to proceed deterministically, through a unique series of conformations for a protein at hand, to the free-energy landscape view founded upon the description of an inherently stochastic but biased process. The latter emerged from polymer statistical thermodynamics and built evidence that protein folding energy landscapes are funnel-like, narrower at the bottom, as the freedom of the protein to populate low-energy regions is gradually restricted [5, 28, 356]. While the energy landscape view has inspired many folding and structure prediction algorithms, in itself it does not suggest a mechanism that can be followed to efficiently fold proteins in silico.
Applications of MD simulations to observe the rare transition of a protein from an unfolded state to a folded state have come a long way in both the size of the proteins that can be handled and the time scales that can be modeled. Hardware advances, improvements in force fields, coarse-grained models, multiscaling techniques, and novel enhanced sampling techniques for MD have been crucial to surpassing spatial and time scales. Atomistic MD simulations can now be afforded [357], with supercomputers such as ANTON allowing folding simulations of proteins of 50–100 amino acids to run for milliseconds [358], and software such as GROMACS [359], NAMD [116], and AMBER [360] becoming more accessible and easier to use for many researchers. In the following we elect to highlight recent work that showcases the state of protein folding. We then proceed with an overview of complementary work in de novo structure prediction.
Protein Folding
Some of the most striking advances in protein folding with atomistic, equilibrium MD simulations in the presence of water molecules have come from the Pande group, particularly through the Folding@Home project [148, 361–364]. In 2005, van der Spoel and colleagues provided the first folding simulation that also predicted the native structure of a peptide based on the Gibbs energy landscape [365]. In 2010, Shaw and colleagues successfully modeled the folding of a 35-residue protein in explicit solvent [147]. Soon afterward, Lindorff-Larsen and colleagues in the Shaw group managed to fold 12 fast-folding proteins of length up to 80 amino acids and diverse native topologies with atomistic detail and in explicit solvent [105]. Some striking observations were made from analysis of the folding trajectories of these small proteins, which generated much discussion in the protein folding community [366]. In addition to matching folding rates measured in the wet laboratory, work in [105] demonstrated that the folding trajectories contained discrete transitions between native and unfolded states, in agreement with barrier-limited cooperative folding. Pathway heterogeneity was shown to be minimal for 9 of the 12 proteins, with pathways sharing more than 60% of the native contacts. These results naturally suggested that the pathways observed in simulation were variations of a single underlying folding pathway.
The conclusions in [105] were also supported by wet-laboratory work in [367], which detected a limited set of pathways and only four intermediates for the folding of calmodulin. Moreover, in [105] it was observed that long-range contacts locking in place the native fold formed early on, together with a significant amount of secondary structure and surface burial. This was confirmed in other folding simulations as well [368]. While the amount of residual structure is questioned by wet-laboratory studies and may possibly be the result of the bias of current force fields [366], the observations in [105] build the case for sequential stabilization as a mechanism for the folding of small, fast-folding proteins. The term sequential stabilization, coined in [369], refers to the fact that folding may not be completely cooperative but is characterized by small-scale events that add secondary structure elements named foldons [370] in a stepwise manner. Because foldons are intrinsically unstable, low-energy paths are likely to involve foldons building on top of existing structures, thus resulting in sequential stabilization.
Demonstration of the contribution and role of long-range native contacts early on in folding provided further justification for the use of Gō-models and other coarse-grained models that assume native contacts are the only ones that are kinetically relevant [143]. However, the wet-laboratory study of the folding of calmodulin in [367] demonstrated the presence of non-native intermediates in larger, more complex proteins, which is certainly observed in de novo structure prediction algorithms in the richness of non-native local minima. It is worth noting that a growing body of wet-laboratory studies is adding to the list of proteins known to fold through distinct native-like intermediates [371].
From a methodological point of view, a significant body of recent work in protein folding employs long, equilibrium, atomistic MD simulations in explicit solvent to observe multiple, spontaneous folding and unfolding events and reliably measure thermodynamic and kinetic quantities, such as folding rates, free energies, folding enthalpies, heat capacities, φ-values, and temperature-jump relaxation profiles [104, 105, 368]. While short, off-equilibrium MD simulations can generally capture at best a single folding event, recent work that embeds many short off-equilibrium runs in coarse-grained kinetic models, such as MSMs, is able to approximate well the underlying folding dynamics [123, 133, 372]. Methods that embed many short simulations (MD or other stochastic optimization methods) in MSMs for the calculation of system dynamics are gaining ground in diverse applications, from folding, to structural transitions, to binding [128, 132, 354, 373–375].
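The way many short simulations are embedded in an MSM can be sketched as follows. The example assumes that conformations have already been clustered into discrete states, so that each trajectory is a list of state indices. It then estimates a row-normalized transition matrix at a chosen lag and approximates the stationary distribution by power iteration. The function names and the lag parameter are illustrative, and refinements used in practice, such as enforcing detailed balance, are omitted.

```python
def transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition probabilities counted at the given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Pair each state with the state observed `lag` steps later.
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def stationary_distribution(matrix, n_iters=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iters):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi
```

Because transitions are pooled across trajectories, no single run needs to observe a complete folding event for the model to describe the long-time dynamics.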
De Novo Protein Structure Prediction
The de novo structure prediction problem is perhaps one of the most popular and recognized ones in computational biology. The goal is to compute a structure that is representative of the protein native state given the amino-acid sequence of a protein with no known sequence homologs. This problem sprang from Anfinsen’s findings that the amino-acid sequence determines to a great extent the native state of a protein [11]. Knowing the native structure of a protein is central to protein-ligand binding studies, particularly in the context of CADD. The significant technological advances that have made high-throughput sequencing possible have also resulted in 1000-fold more sequences than structures known for proteins.
Advances in in silico structure prediction can be attributed to Moult and colleagues, who founded the important Critical Assessment of protein Structure Prediction (CASP) competition to spur research in the structure prediction community in a competitive setting. At CASP gatherings, structures resolved in the wet laboratory and withheld from computational competitors are later revealed and compared with predictions. Community evaluations are then published and serve as a good measure of the progress in structure prediction. For instance, the latest review of structure prediction methods in [376] demonstrates that overall performance in CASP 10 improved substantially compared to previous competitions.
An exponential growth in the number of structures solved in the wet laboratory has had a dramatic effect on the utility of comparative modeling methods, which model structures of a target protein sequence after known structures/templates of proteins with similar sequences to the target; homologous structures can now be detected for most proteins [376]. HHPred is one of the most successful template-based predictors in CASP [377]. Nevertheless, de novo (or template-free, free, ab initio) modeling remains of great interest for two reasons. First, techniques used in de novo algorithms to model conformations of variable regions, such as loops, are also employed in template-based methods to fill in incomplete models [378]. Second, the goal of obtaining information on the equilibrium structure(s) of a protein from its amino-acid sequence is key to understanding function and changes to function upon perturbations.
Currently, state-of-the-art methods for de novo structure prediction rely on the fragment replacement technique, also known as fragment assembly. The technique allows simplifying and discretizing the conformation space explored by algorithms by essentially modifying a bundle of consecutive parameters, typically backbone angles of consecutive amino acids, simultaneously, as opposed to modifying individual backbone angles separately. A stretch of consecutive backbone angles is known as a fragment, and any protein conformation can yield a new one if a fragment is selected in it and its configuration replaced with a new one. In the technique, originally introduced by Baker [379], the new configurations are obtained from a pre-compiled library of configurations built over known protein structures in the PDB. Essentially, known protein structures are excised into consecutive overlapping fragments, and their configurations are recorded in a library indexed by the amino-acid sequence of a fragment. Replacement of fragment configurations naturally makes for a move or step in the context of an MC search, and most methods that use fragment replacement essentially implement enhanced sampling algorithms over baseline MC. For instance, the most recognized de novo structure prediction method, Rosetta [118], implements a multiscale MC method, which carefully switches from coarse-grained to atomistic representations in the growing MC trajectory, employing specifically designed energy functions and even switching between two effective temperatures to cross energy barriers and so allow the MC search to escape shallow local minima.
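A fragment replacement move inside a baseline Metropolis MC search can be sketched as follows. This is a toy illustration and not the Rosetta protocol: the library format, the fragment length, the extended starting conformation, and the caller-supplied energy function are all assumptions made for the example.

```python
import math
import random


def fragment_mc(sequence, library, energy, n_steps,
                frag_len=3, temperature=1.0, seed=0):
    """Metropolis MC in which each move replaces a fragment of angles.

    library maps a sequence fragment (string of length frag_len) to a list
    of candidate configurations, each a list of frag_len (phi, psi) tuples.
    energy is any callable that scores a list of (phi, psi) tuples.
    """
    rng = random.Random(seed)
    angles = [(-120.0, 120.0)] * len(sequence)  # arbitrary extended start
    current = energy(angles)
    for _ in range(n_steps):
        # Pick a fragment position and look up configurations by sequence.
        start = rng.randrange(len(sequence) - frag_len + 1)
        candidates = library.get(sequence[start:start + frag_len])
        if not candidates:
            continue
        trial = list(angles)
        trial[start:start + frag_len] = rng.choice(candidates)
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_trial <= current or rng.random() < math.exp(
                -(e_trial - current) / temperature):
            angles, current = trial, e_trial
    return angles, current
```

Raising `temperature` makes uphill moves more likely, which is the lever behind the switching between effective temperatures mentioned above.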
It is worth noting that careful construction of energy functions and representations of various granularity can be credited as much as the fragment replacement technique with advances in de novo structure prediction [119]. However, at the moment, a saturation point has been reached [380], and current research is focusing either on specialized moves for MC-based methods or other, higher-level mechanisms by which to enhance MC sampling. In current top CASP performers, secondary structures are built and packed relatively easily, and the difficulty in correct predictions is localized to variable regions such as loops. For this reason, efforts are devoted to rethinking the moves in an MC-based setting beyond fragment replacement.
Work in [119, 120], which has resulted in the highly successful Quark method, shows
the utility of designing different types of moves and employing them at various stages
during the MC search. As reported in [120], Quark performs very well in the free
modeling category. Performance on 34 free modeling targets is measured by calculating
the TM-score between the best prediction and the known native structure for each
target versus target length (TM-score is a metric for measuring structural similarity and
is considered superior to RMSD [381]; the reader is directed to Ref. [382] for details).
Performance is unusually high (> 0.5) for targets (R0006-D1, R0007-D1, and R0012-D1)
that are longer than 150 amino acids. In particular, two of the targets, R0006-D1 and
R0007-D1, were considered difficult targets in the CASP10-ROLL experiment. On
R0006-D1, which is a β-barrel protein 169 amino acids long, Quark generates five
models with the highest TM-score of 0.32. Structural superposition extracts a model
with TM-score 0.5, which improves to a TM-score of 0.622 after energetic refinement via
I-TASSER [383]. On R0007-D1, which is an α protein 161 amino acids long, Quark
generates a best model with TM-score 0.43. Structural superposition extracts a model
with TM-score 0.48 from the LOMETS template pool, which then improves to a
TM-score of 0.62 after energetic refinement via I-TASSER. These results suggest that
the focus on designing specialized moves is well placed.
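
For readers unfamiliar with the TM-score values quoted above, the sketch below evaluates
the standard TM-score expression for a single, already given superposition and residue
alignment. The full metric additionally maximizes over superpositions, which is omitted
here; the function names and the handling of very short chains are assumptions.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstroms) of the TM-score."""
    if target_length <= 15:
        return 0.5  # assumption: floor used here for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(distances, target_length):
    """TM-score for one given superposition and residue alignment.

    `distances` holds the C-alpha distance (Angstroms) of each aligned
    residue pair; unaligned target residues contribute nothing.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because each aligned pair contributes a value between 0 and 1 and the sum is normalized
by the target length, the score stays in (0, 1] regardless of protein size, which is the
property that makes it easier to compare across targets than RMSD.
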
Other work is focusing on enhancing the sampling capability beyond a simple
MC-based search or even an MC-SA, though there is a growing consensus that
improving accuracy in scoring functions may be more important than enhancing
sampling to advance the state of de novo structure prediction. Progress in enhancing
sampling comes from different communities of computational biologists and computer
scientists. One direction focuses on gradually narrowing the search space, either by
iteratively fixing segments of the chain exhibiting low diversity among sampled
low-energy conformations [384] or by indirectly achieving the same effect through changes
to the probability distribution function over the fragment configuration library [385].
Other work builds on model-based search and uses information gathered during the
search to guide exploration towards promising regions of the conformation
space [386, 387]. In [386], gathered information is used to identify near-optimal minima
worth exploring in greater detail with all-atom energy functions. In [387], a
robotics-inspired algorithm adapts the search towards under-sampled but low-energy
regions of the conformation space to balance breadth versus depth.
The issue of how to balance computational resources between exploring the breadth
of conformational space while going deep down in local minima is a core one in
stochastic optimization. Progress has been made over the years, particularly by
evolutionary algorithms that are now competitive with MC-based methods such as
Rosetta [388–390]. Pursuing evolutionary algorithms for conformation sampling in de
novo structure prediction has opened up novel directions on the design of effective
moves [391] and multi-objective optimization [392], where the goal is not to minimize an
aggregate energy score but instead improve on several orthogonal categories.
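
The multi-objective idea can be illustrated with Pareto dominance, a common way to
compare conformations scored on several categories without aggregating them. The sketch
below is generic and not taken from [392]; the score tuples are hypothetical, and all
objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if score vector `a` is no worse than `b` in every objective
    and strictly better in at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(population):
    """Keep the score vectors that no other member of the population dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


# Hypothetical (term_1, term_2) energy-term scores for four conformations.
scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(scores)
```

Here (3.0, 3.0) is dropped because (2.0, 2.0) is better in both terms, while the other
three conformations represent different trade-offs and are all retained.
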
Currently, de novo structure prediction methods are focused on proteins with one
well-defined native structure. Multi-basin proteins present a challenge, as they demand
that far more computational resources be spent on exploring the breadth of the energy
landscape. In addition, conformation sampling (also known as decoy sampling) is not
the only challenge with de novo structure prediction. Analysis of sampled
conformations to identify the native structure and offer it as the prediction presents its
own challenges. This problem in itself is known as decoy selection, and a review of
challenges and the state of the art is presented in [393].
Modeling Structure and Dynamics of Intrinsically-Disordered Proteins and Intrinsically-Disordered Protein Regions
Lately, increasing attention is paid to the problem of characterizing the structure and
dynamics of intrinsically-disordered proteins (IDPs) [394–396]. There are now growing
databases of IDPs and intrinsically-disordered protein regions (IDPRs), such as pE-DB,
DisProt, and IDEAL [397–399]. CECAM now regularly includes a workshop dedicated
to promoting the development of new modeling methods and better understanding
IDPs [400]. Since 2002, even CASP provides an independent assessment of methods for
IDPs [396]. Several reviews discuss the fundamental principles of disorder in the
biological functions of IDPs/IDPRs, including the role of disorder
in cancer, neurodegeneration, genetic forms of Parkinson’s disease, and cardiovascular
diseases [395, 401–405].
IDPs/IDPRs pose unique challenges in silico. They do not have stable tertiary
structures but still demonstrate biological activity. This phenomenon challenges the
fundamental structure-function relationship and is an extreme case of the exception to
the lock-and-key model [395]. IDPs/IDPRs are not random coils. They exhibit different
degrees of disorder, from molten globules to coils, but even coil-like structures exhibit
residual structure [402, 405]. A recent replica exchange MD simulation study revealed
the structural contents of intrinsically disordered tau proteins. Tau proteins were
discovered to be able to catalyze self-acetylation, which may promote pathological
aggregation. The work characterized the atomic structures of two truncated tau
constructs, K18 and K19, providing structural insights into tau’s paradox [406].
IDPR sequences are very different from those of ordered proteins: they are poor in
hydrophobic amino acids and rich in charged amino acids. Disorder-promoting amino
acids have now been identified, and they include Ala, Arg, Gly, Gln, Ser, Glu, Lys, and
Pro [404, 405]. Based on sequence information alone, tools now exist to estimate the
propensity of a sequence for disorder [407]. There are many methods for disorder
analysis and prediction of the location of disordered regions [124, 408–411].
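
As a deliberately crude illustration of sequence-based reasoning about disorder, the
sketch below computes the fraction of the disorder-promoting residues listed above in a
sequence and along a sliding window. It is not one of the predictors cited in [407–411],
which are far more sophisticated; the function names and the window length are
hypothetical.

```python
# One-letter codes for Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro.
DISORDER_PROMOTING = frozenset("ARGQSEKP")


def disorder_promoting_fraction(sequence):
    """Fraction of residues that belong to the disorder-promoting set."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return sum(aa in DISORDER_PROMOTING for aa in sequence) / len(sequence)


def windowed_profile(sequence, window=5):
    """Disorder-promoting fraction in each sliding window along the chain."""
    return [disorder_promoting_fraction(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]
```

A composition profile of this kind only flags regions enriched in the listed residues;
it says nothing definitive about whether a region is actually disordered.
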
Computational methods are being designed to characterize structures and dynamics
of IDPs/IDPRs. With specifically designed force fields, some methods have shown
promise in this regard [412, 413]. Treatment of IDPRs is now included in Rosetta [414].
Two main groups of methods focus on IDPs/IDPRs. The first group consists of
wet-laboratory techniques based on NMR chemical shifts and RDCs [415]. The second
consists of MD-based methods [152, 153, 408, 416–418].
Both unrestrained MD [416] and long-range correlated MD [417] for
well-characterized disordered proteins demonstrate good agreement with wet-laboratory
data. The replica exchange with guided annealing method has also been shown suitable
for IDPs [418]. The method escapes nonspecific compact states more efficiently and
speeds up the generation of correct ensembles compared to classic replica exchange
simulations. Work in [153] additionally shows the effectiveness of MD and MSMs for
IDP modeling.
Other methods combine NMR-based knowledge and MD simulations [6, 302, 314, 413].
While NMR ensembles are better suited to characterize local conformational states of
IDPs [415], MD simulations allow calculating kinetics and elucidating meta-stable states
and barriers between states [314]. Given the unique characteristics of IDPs,
computational methods are expected to continue targeting them to better understand the
connection between disorder and biological function and misfunction.
Protein Design
The protein design problem is that of finding an amino-acid sequence whose global free
energy minimum state corresponds to a desired, target structure or contains a structural
motif associated with a desired function [419]. Also known as inverse folding or inverse
structure prediction, this problem is now at the crux of protein engineering, with
applications in medicine, biotechnology, synthetic biology, nanotechnology, biomimetics,
and more [420]. Stated as an optimization problem, protein design is amenable to
algorithmic frameworks employed for structure prediction.
Computational approaches to protein design can be categorized into forward design,
explicit negative design, and heuristic negative design [419]. In forward design, the
sequence and target fold/structure of a protein are known, and the goal is to optimize
the sequence so that the target structure reaches an energy low enough to make any
non-target structure energetically less favored. No explicit non-target structures
are considered. A successful application of forward design has yielded a very stable
protein, Top7 [421], whose native structure was later shown to be identical to the
determined X-ray structure. In explicit negative design, alternative structures are
explicitly considered. The sequence is optimized so that the target native structure is
lower in energy than all the alternative structures. Explicit negative design has been
used to design specific coiled coils and DNA-binding and -cleaving enzymes [422–425].
The limitation of explicit negative design regarding prior knowledge and
enumeration of non-favored alternative states has motivated heuristic negative design.
In heuristic negative design, the goal is not to disfavor specific alternative structures;
instead, the sequence is optimized through features that are likely to increase the energy
of most undesired structures. These features closely follow strategies employed by nature
to achieve the energy gap between the native structure and other structures that seems to
be required for thermodynamic stability and function [419]. It is worth noting that
conclusions regarding energy gaps between native and non-native structures when
employing scoring functions need to be taken with a grain of salt. Work in [365] relates
gaps in Gibbs free energy to structure deviations (from NMR data).
Compared to the other two strategies summarized above, heuristic negative design
seems particularly important for biomolecular interactions [426, 427]. Heuristic negative
design also seems to be employed by nature for IDPs and by pathogens to fend off the
host immune system [419].
Successful cases of designing proteins with novel functions abound [428–430] and are
made possible by considerable advances in methods for de novo protein design. The
current predominant computational approach is based on the (inverse folding) paradigm
proposed in [431], which assumes a fixed backbone and searches over discrete low-energy
configurations/rotamers of side chains for rotameric combinations that result in a
lowest-energy all-atom tertiary structure [432]. In the interest of tractability, energy
models are limited to pairwise energy functions. State-of-the-art functions for protein
design are knowledge-based, relying on statistical parameters derived from databases of
known protein properties [433–437]. Even with such energy models, the design problem
with a rigid backbone and a discrete set of rotamers has been proven to be
NP-hard [438].
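
The fixed-backbone, discrete-rotamer formulation with a pairwise energy model can be
written down compactly. The sketch below evaluates such an energy and finds the
lowest-energy rotamer combination by exhaustive enumeration, which is only feasible for
toy instances and serves to show why the combinatorial search is the bottleneck. The
data layout (self_e, pair_e) and all numerical values are hypothetical.

```python
import itertools


def total_energy(assignment, self_e, pair_e):
    """Pairwise-decomposable energy of one rotamer assignment.

    assignment[i] is the rotamer index chosen at design position i,
    self_e[i][r] the one-body (rotamer/backbone) energy, and
    pair_e[(i, j)][r][s] the two-body energy between positions i < j.
    """
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy


def brute_force_gmec(self_e, pair_e):
    """Enumerate every rotamer combination; cost grows exponentially."""
    choices = [range(len(e)) for e in self_e]
    return min(itertools.product(*choices),
               key=lambda a: total_energy(a, self_e, pair_e))


# Toy instance: three positions with two rotamers each (made-up energies).
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {
    (0, 1): [[2.0, 0.0], [0.0, 2.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 1.0], [1.0, 0.0]],
}
gmec = brute_force_gmec(self_e, pair_e)
```

With n positions and r rotamers per position the enumeration visits r^n assignments,
which is why practical methods prune or sample the space instead.
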
Two types of methods have been proposed to address the combinatorial optimization
problem of finding rotameric combinations. The first are based on exact optimization
and seek completeness; that is, finding the global minimum energy conformation. The
second forgo completeness and are based on heuristic optimization.
Exact optimization methods include dead-end elimination [439], branch-and-bound
algorithms [440–442], integer linear programming [443, 444], dynamic
programming [445], and cost function networks [446]. These exact methods are efficient,
and they limit inaccuracies to the inadequacy of the energy model, but their focus on
a single assignment makes them highly susceptible to possible artifacts in the energy
function, known and lamented in [447]. Moreover, the solution provided by such methods
may be so overly stabilized (effectively residing in a narrow basin) that it lacks the
structural flexibility the protein needs to carry out the sought biological function under
physiological conditions [448]. It is worth noting that, unlike methods restricted to
discrete rotamer assignments, work by Donald and colleagues pursues continuous rotamers
and is able to reach lower-energy conformations [449]. This functionality is integrated
in the popular OSPREY software [450]. It is expected that the design of a smoothed
backbone-dependent rotamer library in [451], which allows evaluating rotamer
characteristics as smooth and continuous functions of the φ, ψ angles, will lead to more
advances in taking into account side-chain flexibility in de novo design. An illustration
of the capability of protein design algorithms is provided in Fig. 4.
Fig. 4
Predicting a pathogen’s resistance mutations [452]. (A) Pictured is an
illustration of a game between scientists and bacteria. For every drug that scientists
develop against bacteria (a "move"), bacteria respond with mutations that confer
resistance to the drug. This paper shows that these "moves" by bacteria can be
predicted in silico ahead of time by the Osprey protein design algorithm. Donald,
Anderson, and co-workers used Osprey to prospectively predict in silico mutations in
Staphylococcus aureus against a novel preclinical antibiotic, and validated their
predictions in vitro and in resistance selection experiments. Image (A) was created for
this paper by Lei Chen and Yan Liang (L2Molecule.com <http://l2molecule.com/>).
(B-C) Computationally predicting drug resistance mutations early in the discovery
phase would be an important breakthrough in drug development. The most meaningful
predictions of target mutations will show reduced affinity for the drug (C) while
maintaining viability in the complex context of a cell (B). The protein design
algorithm, K* in Osprey, was used to predict a single nucleotide polymorphism in the
target DHFR that confers resistance to an experimental antifolate (Compound 1) in the
preclinical discovery phase. Excitingly, the mutation was also selected in bacteria under
antifolate pressure, confirming the prediction of a viable molecular response to external
stress. Images (B-C) were created by Adegoke Ojewole in the Bruce Donald Lab, Duke
University.
Heuristic optimization methods for de novo design build on stochastic optimization
or meta-heuristics, such as MC-SA [433, 453], Genetic Algorithms [454, 455], and other
stochastic optimization methods [442, 456, 457]. Methods based on stochastic
optimization, best represented by RosettaDesign [458], currently dominate, mainly due
to the ability to provide an ensemble of near-optimal solutions through their
sampling-based approach. The backbone is kept fixed, and rotameric states are enumerated
systematically or sampled stochastically [433, 453] over pre-built rotamer
libraries [435, 459]. Energy minimization of the entire resulting all-atom
conformation is often carried out [460, 461]. It is here, in the minimization stage to
which all constructed conformations and sequences are subjected, that localized
backbone fluctuations are allowed. The extent of these fluctuations is small, limited to
backrub motions [462–465]. Larger motions are allowed, but only on loop regions, made
possible by efficient inverse kinematics techniques like Cyclic Coordinate
Descent [320, 466].
The importance of allowing backbone flexibility in the design process cannot be
overstated. The simple model of the backrub motion consists of a small dipeptide
rotation about the C-Cα axis. Recent studies suggest that integrating backrub motions
in the design process leads to improved designs of protein-protein interaction interfaces
and more realistic templates with improved fit between simulated side-chain dynamics
and NMR data [462, 464, 467]. Additionally, work in [468] has demonstrated that taking
into account backrub motions expands sequence diversity during search and allows new
residue interactions that rigid-backbone approaches cannot accommodate. This leads to
better designs with lower energies and has been confirmed in other studies as
well [469, 470].
Finally, an important highlight in protein design is the fact that, despite the
absence of evolutionary history in newly designed proteins, evolutionary information can
be accommodated in the design process. Work in [470] reveals strong correlations
between residue covariance in naturally occurring protein sequences and sequences
optimized for the same structures by computational protein design. Covariance has
been demonstrated for complementary changes in residue size, residue charge, and
hydrogen bonding [471–475]. These findings suggest that structural restraints on
co-evolving residues in contact can lead to further improvements both in de novo
protein design and structure prediction.
Categorization by Algorithmic Frameworks
In the following, we categorize methods by the algorithmic frameworks they modify and
adapt for investigating macromolecular structure and dynamics.
MD-based Methods and Enhancements
In the classic MD setting, Newton’s equation of motion is iteratively solved on a finely
discretized time scale to observe collective movements of the atoms comprising a
molecular system through successive conformations terminating at a local minimum
conformation in the system’s energy surface. The ensemble of conformations obtained at
equilibrium conditions observes the Boltzmann distribution. A distinct advantage of
employing MD to simulate the equilibrium dynamics of a macromolecule is the ability
to obtain great detail on individual and correlated motions of specific atoms and specific
sites on a macromolecule, as well as correlated motions between macromolecular units of
a complex. A disadvantage of the classic MD simulation setting is the inability to
sample rare events that occur on long time scales. In particular, in the presence of high
energetic barriers separating local minima in the energy surface, a classic MD
simulation may be trapped and never escape within the time scale of the simulation.
Limited sampling of the conformation space is a fundamental issue in classical MD,
and algorithmic enhancements are proposed on a regular basis to enhance sampling
capability. These include replica exchange, accelerated MD, umbrella sampling, biased
or steered MD, importance sampling, activation relaxation, local elevation,
conformational flooding, jump walking, multicanonical ensemble, MSM-driven MD,
discrete timestep MD, swarm methods, and others [8, 149, 203–206, 334, 476–489].
Recent reviews of advanced MD-based methods and outstanding issues can be found
in [124, 490–495]. A comprehensive list of commonly used MD packages for biomolecular
simulation is presented in [493]. Examples of MD applications on proteins with large
conformational changes that occur on long time scales, such as G-proteins, Ras-proteins,
kinases, signaling proteins, and others can be found in [121, 122]. In the following, we
highlight some of the algorithmic enhancements to the classical MD setting that are
responsible for surpassing traditional MD time scales and characterizing the dynamics
of complex systems.
Accelerated MD and Adaptations
The accelerated MD method [496, 497] locally flattens the potential energy surface to
decrease the free energy barriers between two conformational states. When the system’s
potential energy falls below some predefined threshold energy E, a bias potential is
added. The level of flattening is regulated by two parameters that are typically specified
by the user: the threshold energy E, which controls the portion of the potential surface
affected by the bias, and the acceleration factor α, which determines the shape of the
bias potential and thus how flattened the energy surface becomes. The bias potential
allows escaping deep minima separated by high energy barriers, thus accelerating the
transition between two conformational states of interest and extending the time scale of
events that can be observed in simulation. Recent accelerated MD simulations with
nanosecond steps [498] can explore more conformational dynamic events [499, 500].
However, Boltzmann statistics need to be recovered from the simulations, and the effect
of the bias potential must be removed. A reweighting procedure is typically used,
which attempts to convert an accelerated MD trajectory to the canonical ensemble at a
given temperature [8, 501].
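
The roles of the threshold energy E and the acceleration factor α can be seen in the
following sketch, which assumes the boost form commonly associated with accelerated MD,
ΔV = (E − V)² / (α + E − V) for V < E, and recovers canonical averages by weighting each
frame with exp(ΔV / kT). The function names are hypothetical, and practical reweighting
schemes are more elaborate than this direct exponential average.

```python
import math


def amd_boost(v, e_threshold, alpha):
    """Bias added to the potential energy v; zero at or above the threshold."""
    if v >= e_threshold:
        return 0.0
    gap = e_threshold - v
    return gap * gap / (alpha + gap)


def reweighted_average(observable, boosts, kT):
    """Canonical-ensemble estimate of an observable from biased frames.

    Each frame is weighted by exp(boost / kT); the largest boost is
    subtracted first for numerical stability (it cancels in the ratio).
    """
    shift = max(boosts)
    weights = [math.exp((dv - shift) / kT) for dv in boosts]
    return sum(w * a for w, a in zip(weights, observable)) / sum(weights)
```

As α decreases toward zero the modified potential V + ΔV approaches the constant E below
the threshold, that is, a flatter surface; a larger α leaves more of the original
landscape intact.
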
Enhancements and adaptations of the baseline accelerated MD method are being
proposed. We note here first the self-learning, reconnaissance metadynamics
method [502], which combines principles of accelerated MD and the concept of collective
variables that is the foundation of the metadynamics strategy. Similar to the baseline
method, a bias potential is added to the true potential to locally flatten the energy
surface. However, the bias potential is constructed over the low free energy region
defined over a large number of locally valid collective variables. The accelerated
adaptive integration method [203] can be considered another adaptation of the baseline
accelerated MD method for the problem of modeling ligand-binding processes. A ligand
coupling parameter λ is introduced to keep track of the end points of the
receptor-ligand coupling and decoupling process; λ takes values from 0 to 1. The
method assumes that some transitions can be more accessible if a certain stage of
coupling/decoupling (λ) is reached; the potential energy function is flattened at
intermediate values of λ instead of at some threshold energy value E.
Replica Exchange MD Methods
Replica exchange is a popular enhancement of the classical MD method; it is also
known as parallel tempering. Originally, replica exchange was introduced to improve
properties of the MC framework [503], but it has since been adapted to enhance MD
sampling [504]. The usual continuous MD trajectory is broken into several replica
simulations randomly initialized and conducted at different temperatures. The number
of replica simulations is typically determined by the user. So is the decision on
temperatures assigned to the replica simulations. The simulations exchange information
with one another by exchanging conformations at regular intervals. At each exchange
attempt, two simulations are selected, and their instantaneous conformations are
exchanged according to the Metropolis criterion. The exchange often allows a particular
simulation to escape a local minimum by making conformations accessed at higher
temperatures available to those at lower temperatures, thus enhancing sampling
capability. In addition, the setting of multiple simulations encourages parallel
implementation and employment of distributed architectures with message passing. This
gives replica exchange high exploration capability. Many adaptations and applications of
replica exchange exist [149, 478, 505]. Work in [506] proposes a technique to deduce
kinetics data from a heterogeneous ensemble of simulation trajectories. A detailed review
of methods based on replica exchange can be found in [478].
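
The Metropolis criterion for exchanging conformations between two temperatures can be
sketched as follows. The acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]) is
the standard parallel-tempering expression; restricting attempts to neighboring
temperatures, the function names, and the energy units (kcal/mol) are assumptions of
this illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(e_i, t_i, e_j, t_j):
    """Probability of exchanging the conformations held at t_i and t_j."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(conformations, energies, temperatures, rng, offset=0):
    """Try to swap each pair of neighboring temperature slots once.

    Index k of every list refers to temperature slot k; `offset` (0 or 1)
    alternates which neighbor pairs are tried on successive calls.
    """
    confs, ens = list(conformations), list(energies)
    for i in range(offset, len(temperatures) - 1, 2):
        p = swap_probability(ens[i], temperatures[i], ens[i + 1], temperatures[i + 1])
        if rng.random() < p:
            confs[i], confs[i + 1] = confs[i + 1], confs[i]
            ens[i], ens[i + 1] = ens[i + 1], ens[i]
    return confs, ens
```

A swap that hands the lower-energy conformation to the colder replica is always
accepted; the reverse is accepted only with a Boltzmann-like probability, which is what
preserves the canonical distribution at each temperature.
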
Restrained Ensemble MD Methods
We note here two methods to illustrate the employment of experimental data as
restraints in MD-based simulations, the replica-averaged MD method and the
replica-averaged metadynamics method. The employment of experimental data to
correct a molecular force field and thus steer the sampled conformation ensemble
towards the Boltzmann distribution has a rich history in macromolecular modeling. The
idea of using experimental measurements as averaged structural restraints in MD
simulations was first implemented for distances derived from NOE [35]. A penalty term
was added to the force field if the time-average of an NMR observable calculated from
an MD trajectory differed from that provided by experiment. A variation of this idea is
to measure not a time-average but an ensemble-average observable. The latter is
referred to as the replica-averaged approach, and a variety of restraining algorithms,
including those that conduct both time and ensemble averaging, have been developed
and applied to sample and characterize native, transition, intermediate, and unfolded
states of proteins [17, 32, 34, 312, 316, 507–512].
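
In generic form, a replica-averaged restraint can be written as a harmonic penalty on
the deviation between the experimental value and the average of the back-calculated
observable over replicas. The expression below is a schematic sketch rather than the
exact functional form of any cited method; the symbols M (number of replicas), f (the
model that back-calculates the observable from a conformation x_m), f^exp (the
experimental value), and k (the force constant) are introduced here for illustration.

```latex
E_{\mathrm{restr}}(x_1,\dots,x_M)
  = \frac{k}{2}\left(\frac{1}{M}\sum_{m=1}^{M} f(x_m) - f^{\mathrm{exp}}\right)^{2},
\qquad
E_{\mathrm{total}} = \sum_{m=1}^{M} E_{\mathrm{ff}}(x_m) + E_{\mathrm{restr}}(x_1,\dots,x_M)
```

The time-averaged variant replaces the average over replicas with an average of f over
a window of the trajectory of a single simulation.
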
Vendruscolo and colleagues [304] have demonstrated that MD simulations with
replica-averaged structural restraints allow generating structural ensembles according to
the maximum entropy principle introduced by Jaynes [513]. Jaynes addressed the
problem of incorporating information from experiments into a structural model while
avoiding corrupting the model with spurious and arbitrary biases. His maximum
entropy method, however, proved too cumbersome. The restrained ensemble methods of
Vendruscolo and others provide an alternative practical approach, but, until recently, it
was not known whether these methods obey the maximum entropy principle. In
addition to work in [304], Roux and collaborators demonstrate in [514] that
restrained-ensemble MD simulations produce statistical distributions that are formally
consistent with the maximum entropy principle.
Distance restraints from NOE data, if available, can be integrated in ALMOST, an
all-atom molecular simulation open-source package for macromolecular structure
determination and analysis [515]. In the replica-averaged metadynamics method [516],
in addition to making use of replica-averaged restraints in the force field, the
metadynamics framework is exploited to enhance sampling. Application on the
α-conotoxin SI, a 13-residue peptide that has been characterized extensively in the wet
laboratory, shows that the method enables accurate reconstruction of the free energy
landscape.
Umbrella Sampling. Umbrella sampling [517–519] is another method that employs
collective variables. Umbrella sampling is related to importance sampling in statistics.
Umbrella sampling addresses systems with energy landscapes where a high energy
barrier separates two regions of the conformation space. The relevant system
coordinates are grouped into sets of collective variables, with each set determining a
separate umbrella window. A restraint bias potential forces the collective variables in a
window to remain close to the center of mass. The restraint potential often takes a
quadratic or harmonic form, determining the weighting function of a given window. If
the configurations in a window are far from the equilibrium state, the weighting
function will be large, and the simulation will be biased away from the initial
configuration. The sets of collective variables must allow for slight overlap of their
windows for proper reconstruction of the transitions between them. Extracting
corresponding Boltzmann averages and handling overlapping weighting functions are
key issues. The information from each window-biased simulation is converted into local
probability histograms. The weighted histogram analysis method (WHAM) [520] is now
the standard method to combine results from a set of umbrella sampling simulations.
Work in [521] introduces superlinear numerical optimization algorithms to diagnose and
quantify systematic errors due to limited sampling and to obtain fast and accurate
solutions of coupled nonlinear WHAM equations. Work in [522] introduces a bootstrap
method to accurately estimate error due to insufficient sampling and incorporates
autocorrelations to reduce such errors. The method, g_wham, has been incorporated in
the popular GROMACS molecular simulation suite [359]. The umbrella sampling
scheme can be integrated into other enhanced MD or MC strategies. We highlight here
the self-learning umbrella sampling method in [523], which learns, through a feedback
mechanism, which regions of a multidimensional space are worth exploring and
automatically generates a set of windows. This method needs a significantly smaller
number of umbrella windows to characterize the free energy landscape over the most
relevant regions without any loss in accuracy. Umbrella sampling has been employed to
study processes with large conformational changes or rare events, such as ligand binding
and ion-induced diffusion in membrane proteins [523, 524].
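
How WHAM combines overlapping window histograms can be sketched for a single collective
variable with harmonic window restraints. The code below iterates the two coupled WHAM
equations by simple self-consistent substitution; it is not g_wham and not the
superlinear solver of [521], and the function names, binning, and convergence settings
are assumptions made for illustration.

```python
import math


def harmonic_bias(x, center, k_spring):
    """Harmonic restraint energy of one umbrella window."""
    return 0.5 * k_spring * (x - center) ** 2


def wham(histograms, centers, k_spring, bin_centers, kT=1.0,
         n_iter=2000, tol=1e-10):
    """Self-consistent WHAM iteration.

    histograms[w][b] is the count of window w in bin b. Returns the
    normalized unbiased bin probabilities and the window free energies
    f_w, shifted so that f_0 = 0.
    """
    n_win = len(histograms)
    totals = [sum(h) for h in histograms]
    bias = [[harmonic_bias(x, c, k_spring) for x in bin_centers] for c in centers]
    f = [0.0] * n_win
    for _ in range(n_iter):
        # Unbiased probability estimate given the current free energies.
        p = []
        for b in range(len(bin_centers)):
            numerator = sum(h[b] for h in histograms)
            denominator = sum(totals[w] * math.exp((f[w] - bias[w][b]) / kT)
                              for w in range(n_win))
            p.append(numerator / denominator)
        # Free energy of each window given the current probabilities.
        new_f = [-kT * math.log(sum(p[b] * math.exp(-bias[w][b] / kT)
                                    for b in range(len(p))))
                 for w in range(n_win)]
        new_f = [value - new_f[0] for value in new_f]
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    norm = sum(p)
    return [value / norm for value in p], f
```

A free energy profile along the collective variable then follows, up to an additive
constant, as −kT ln p for each populated bin. Bins that no window visits remain at zero
probability, which is why adjacent windows need to overlap.
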
Adaptive MD Sampling Methods
Guiding MD sampling via on-the-fly analysis of obtained conformations to determine
undersampled regions of the conformation space is gaining ground in macromolecular
modeling. The principal difficulty with adaptive sampling is the identification of
meaningful collective variables over which to project conformations and obtain
lower-dimensional embeddings of the conformation space for the identification of
under-sampled regions and calculation of interesting statistics. While collective
variables, such as number of native and non-native contacts, hydrogen bonds, dihedral
angles, RMSD, and radius of gyration, remain popular, these variables have been shown to
result in overly smooth landscapes [525] and mask interesting transitions. Recent work
by Clementi and colleagues has reintroduced diffusion-based dimensionality reduction
methods for extracting collective variables and has demonstrated the power of such
methods for characterizing complex energy landscapes [526, 527]. Further work by the
same authors in [528, 529] employs the identified collective variables to guide and
expedite sampling of rare events via MD.
In contrast to methods that rely on the identification of collective variables, a
different line of work in the early 2000s introduced the concept of kinetic clustering and
conformation space network. Both were precursors of the MSM. The main idea was to
organize conformations in discrete, graph-based models of connectivity to both visualize
the free energy surface and carry out interesting calculations on such models.
The concept of kinetic clustering evolved from the disconnectivity graphs put forth
separately by Karplus and Wales [530–532]. Work by Rao and Caflisch took this idea
further by proposing complex network analysis both to visualize and study the
conformation space and folding of peptides [533]. In lieu of geometric clustering,
conformations in [533] were grouped together by secondary structure, and the different
emerging groups were abstracted as nodes of a network, with links between nodes
recording observed transitions between groups. Interesting observations were made
regarding network topology and peptide folding kinetics in [533] and in later
applications investigating the impact of single-point mutations on peptide folding [534]
(a detailed review of the conformation network idea can be found in [535]), but the
broader analogy (and generalization) between conformation space networks and MSMs
would emerge later. In tandem with the conformation space network proposed by
Caflisch, related work by Karplus further propelled the disconnectivity graphs to
additionally employ max-flow/min-cut algorithms to lay bare the hidden complexity of
free energy surfaces of peptides and proteins [525, 536]. It is worth noting in this context
that the free energy surface generated by implicit solvent is often very different from,
and more complex than, that generated by explicit solvent [537]. Early work in [538]
demonstrates that explicit solvent smooths the energy surface.
Kinetic clustering continues to be useful and has been used successfully to characterize protein folding through very long MD simulations [147]. In [147], conformations are assigned to clusters so that the long time scale behavior in cluster space mimics that in the MD simulation. Autocorrelation functions of the time series of a large number of atomic distances are calculated, and their long time scale behavior is matched to that of the corresponding correlation functions calculated over dynamics in cluster space. The cluster assignment, followed by the construction of transitions between distinct long-lived states, identifies the slower transitions [147].
It was only around 2005 that the analogy between the conformation space network and the MSM would be made by Pande and coworkers [363, 539]. The notion of kinetic clustering was generalized, and the conformation space networks evolved into kinetic networks connecting meta-stable states, effectively MSMs [540]. The integration of MSMs [146, 153, 541] into MD simulations allows investigating macromolecular dynamics even beyond the second time scale [123]. Originally, MSMs were only employed to analyze the connectivity of conformational states sampled through multiple, long MD simulations and to derive kinetic measurements from calculations over the MSM [363]. In [123], MSMs were employed to reconstruct folding pathways from short off-equilibrium, all-atom simulations in explicit solvent. MSM and MD methods have been applied to model folding [542–545], protein–ligand binding [136, 138, 546], protein switches in kinases and GPCRs [547, 548], allostery [549], and IDPs [541, 550], revealing extensive statistical details about intermediate states [136, 542, 551] and molecular interaction mechanisms. The employment of MSMs to focus computational resources on under-sampled regions of the conformation space in an adaptive manner is a rather recent development in macromolecular modeling. A semi-automatic protocol has been proposed in [552] to simulate the folding and unfolding of the villin headpiece in a very efficient manner. Work in [128] also proposes a semi-automatic protocol that analyzes MD trajectories with a constructed MSM to pinpoint where more sampling needs to be conducted. As of now, a fully automatic protocol remains elusive [553].
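As a minimal sketch of how an MSM can be estimated from discretized trajectories and then used to flag under-sampled states, consider the following. It assumes conformations have already been assigned to discrete states (for instance, by clustering). The function names, the `lag` parameter, and the count-based notion of "least sampled" are illustrative assumptions and do not reproduce the protocols of [552] or [128].

```python
def count_transitions(dtrajs, n_states, lag=1):
    """Count observed transitions i -> j separated by `lag` frames."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i][j] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into probabilities T[i][j] = P(next = j | current = i)."""
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state with no observed exits is treated as absorbing.
            matrix.append([1.0 if j == i else 0.0 for j in range(len(row))])
        else:
            matrix.append([c / total for c in row])
    return matrix


def least_sampled_state(counts):
    """Index of the state with the fewest observed outgoing transitions."""
    totals = [sum(row) for row in counts]
    return totals.index(min(totals))
```

In an adaptive scheme, new short simulations would be seeded from conformations belonging to the state returned by `least_sampled_state`, and the matrix would then be re-estimated from the enlarged data set.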
While MSM-guided MD sampling relies on obtaining a discrete model of the connectivity of the sampled conformation space to guide further sampling, other methods rely on modifying the energy function itself to bias the simulation away from already-sampled conformations. One of the earliest methods to do so was local elevation [481]. In local elevation, the actual potential energy surface is modified in order to drive conformational sampling away from visited conformations (a bias term that is the sum of repulsive functions is added to the potential energy function).
Metadynamics methods follow a similar approach [554, 555]. The assumption in these methods is that the system can be described in terms of a few collective variables. During the MD simulation, the location of the system is calculated in terms of the collective variables. A positive Gaussian potential is then added to the energy landscape at that location so that the simulation is discouraged from returning to it. During the simulation, more and more Gaussians add up, to the point that the system is discouraged from going back to previous locations in the energy landscape and is thus driven to explore the full landscape. The time interval between the addition of two Gaussians and the height and width of a Gaussian are all tunable parameters that set the trade-off between accuracy and computational cost. The crucial issue in metadynamics, as in other techniques based on collective variables, is to identify the right collective variables. Strategies to do so are reviewed in [555]. The metadynamics strategy is available as a portable plugin for MD simulation platforms in PLUMED [556]. Metadynamics MD has been applied to study the folding process of small proteins [557, 558], protein switches [559–561], and ion-induced diffusion of small molecules in cavities and channels [562, 563]. Metadynamics methods have also allowed modeling the docking process with full protein flexibility [135, 564–567].
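The deposition of repulsive Gaussians along a collective variable can be illustrated with a toy one-dimensional example. The sketch below is not the PLUMED implementation. It assumes a single collective variable `s`, a hypothetical double-well potential, and overdamped Langevin dynamics standing in for a full MD engine; the parameter names (`height`, `width`, `stride`) are chosen for illustration.

```python
import math
import random


class GaussianBias:
    """History-dependent bias: a sum of Gaussians deposited along one collective variable."""

    def __init__(self, height, width):
        self.height = height
        self.width = width
        self.centers = []

    def deposit(self, s):
        self.centers.append(s)

    def value(self, s):
        return sum(self.height * math.exp(-((s - c) ** 2) / (2.0 * self.width ** 2))
                   for c in self.centers)

    def gradient(self, s):
        return sum(-self.height * (s - c) / self.width ** 2
                   * math.exp(-((s - c) ** 2) / (2.0 * self.width ** 2))
                   for c in self.centers)


def double_well_gradient(s):
    """Gradient of the toy potential U(s) = (s^2 - 1)^2, with minima at s = -1 and s = +1."""
    return 4.0 * s * (s * s - 1.0)


def run_metadynamics(n_steps, stride, height=0.1, width=0.2, kT=0.1, dt=0.01, seed=0):
    """Overdamped Langevin walker on U(s) plus the growing bias; one hill every `stride` steps."""
    rng = random.Random(seed)
    bias = GaussianBias(height, width)
    s = -1.0  # start in the left well
    trajectory = []
    for step in range(1, n_steps + 1):
        force = -(double_well_gradient(s) + bias.gradient(s))
        s += force * dt + math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
        if step % stride == 0:
            bias.deposit(s)  # penalize the currently visited location
        trajectory.append(s)
    return trajectory, bias
```

The three tunable quantities named in the text appear here as `stride` (time between hills), `height`, and `width`; larger or more frequent hills fill basins faster at the price of a coarser bias.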
MC-based Methods and Enhancements
While a significant portion of research on macromolecular structure and dynamics employs MD-based methods, a just as significant portion employs MC sampling. In MC, the evolution of a conformation into another is not guided by Newton’s equations of motion but instead by a programmed move or step designed to introduce a small or large conformational change. The result of a move is accepted or rejected according to the Metropolis criterion, which biases the trajectory of consecutive conformations toward low-energy regions while retaining a non-zero probability of escaping the current minimum. MC-based methods employ the notion of effective temperature to regulate the height of energy barriers that can be crossed. While generally regarded to have higher sampling capability than MD, MC methods are also prone to convergence to local minima and forgo any direct information on time scales and kinetics. Many of the enhancement strategies for MD can be applied to MC-based methods. In the following we highlight two such enhancements.
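The Metropolis criterion and the role of the effective temperature can be made concrete with a short sketch. It assumes energies and temperature are expressed in the same units (that is, the Boltzmann constant is absorbed into `temperature`); the one-dimensional random-displacement move in `mc_walk` is a placeholder for the problem-specific moves used in practice.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept a move with probability min(1, exp(-delta_e / temperature))."""
    if delta_e <= 0.0:
        return True  # downhill moves are always accepted
    return rng.random() < math.exp(-delta_e / temperature)


def mc_walk(energy, x0, n_steps, temperature, max_move, rng):
    """Metropolis MC over a 1D coordinate with uniform random displacements."""
    x, e = x0, energy(x0)
    trajectory = [x]
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, temperature, rng):
            x, e = trial, e_trial
        trajectory.append(x)
    return trajectory
```

For a fixed uphill step `delta_e`, raising `temperature` raises the acceptance probability, which is how the effective temperature regulates the barriers that can be crossed.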
Collective Motions Molecular Dynamics and Monte Carlo
Collective MD [568] belongs to the family of enhanced MD sampling methods that simplify sampling by considering only the most dominant, low-frequency, low-resolution, collective motions. The latter are identified by modeling a structure through the anisotropic network model (ANM) [569]. The basic approach is to deform the structure collectively along the modes predicted by the ANM. A Metropolis-based MC scheme is employed to select the ANM modes; the stochasticity permits the system to occasionally circumvent energy barriers. The ANMPathway is a related sampling method that uses modes extracted from two ENMs representative of the experimental structures that constitute the end points of the transition under investigation [570]. Both methods have been tested on modeling open-close transitions in AdK [568, 570] and several transporting membrane proteins [570]; the transition pathways were captured in great detail and at significantly lower computational cost than other methods [571].
Weighted Ensemble Method
The weighted ensemble method (WEM) [572] is an enhanced sampling method with simplified sampling. WEM uses a multiple-trajectory strategy in which individual trajectories can spawn multiple daughter trajectories upon reaching new regions, called bins, of configuration space. The daughters are suitably weighted to ensure statistical rigor. WEM can yield rigorous estimates for time scales that are much longer than the simulations themselves. The idea to split and propagate re-weighted trajectories was initially introduced in MC simulations, but WEM can be used as a sampling method for MD simulations as well [572]. WEM has been employed to model folding [573], non-equilibrium [574] and equilibrium processes [572], and conformational transitions between end-points separated by high energy barriers [575].
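The weighting of daughter trajectories can be sketched as a per-bin resampling step that splits and merges walkers while conserving total probability. The code below is an illustration under simplifying assumptions: walkers are `(state, weight)` pairs with positive weights, every occupied bin is brought to the same hypothetical `target` count, and the split-heaviest/merge-lightest policy is one common choice rather than the specific scheme of [572].

```python
import random


def resample_bin(walkers, target, rng):
    """Split or merge walkers in one bin to reach `target` walkers, preserving total weight."""
    walkers = list(walkers)
    if not walkers:
        return walkers
    while len(walkers) < target:
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()  # heaviest walker spawns two daughters
        walkers += [(state, weight / 2.0), (state, weight / 2.0)]
    while len(walkers) > target:
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]  # two lightest walkers
        # The survivor is chosen with probability proportional to its weight.
        keep = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(keep, w1 + w2)] + walkers[2:]
    return walkers


def resample(walkers, bin_of, target, rng):
    """Group walkers by bin (via the user-supplied `bin_of`) and resample each bin."""
    bins = {}
    for state, weight in walkers:
        bins.setdefault(bin_of(state), []).append((state, weight))
    resampled = []
    for key in sorted(bins):
        resampled.extend(resample_bin(bins[key], target, rng))
    return resampled
```

A full weighted ensemble iteration would alternate this resampling with a short propagation of every walker by MD or MC; a walker entering a previously empty bin is split there, which is what concentrates effort on newly reached regions.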
Other Algorithmic Frameworks
Morphing Methods
Geometric morphing linearly interpolates the position of each atom to construct a path between conformations. MolMovDB [337, 576] was the first online tool to allow obtaining and visualizing such paths. After each linear interpolation, the morphing algorithm in MolMovDB conducts an energy minimization to fix possible distortions and restore the stereochemistry of the intermediate points in the interpolated trajectory. The created morphs are stored in the database of motions and can be found by protein name, PDB ID, or motion type [577].
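The interpolation step of geometric morphing can be sketched in a few lines. The example assumes that the two conformations list the same atoms in the same order and have already been superimposed; the energy minimization that MolMovDB applies to each intermediate is deliberately omitted, and the function name is hypothetical.

```python
def linear_morph(start, goal, n_intermediates):
    """Return frames (start, intermediates, goal) by per-atom linear interpolation.

    `start` and `goal` are equal-length lists of (x, y, z) coordinates.
    """
    if len(start) != len(goal):
        raise ValueError("start and goal must contain the same atoms in the same order")
    frames = []
    for k in range(n_intermediates + 2):
        t = k / (n_intermediates + 1)  # interpolation parameter in [0, 1]
        frames.append([tuple(a + t * (b - a) for a, b in zip(p, q))
                       for p, q in zip(start, goal)])
    return frames
```

Because each atom moves along a straight line, bond lengths and angles in intermediate frames can be distorted, which is why a minimization step follows in practice.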
Conformational trajectories based on linear interpolation do not necessarily represent actual conformational pathways. Several morphing-based methods have been developed that provide non-linear interpolations between the start and goal structures to be connected through intermediate conformations [130, 338, 341, 578–580]. Non-linear morphing methods rely on normal mode analysis (NMA) of harmonic-type models, such as the ENM and its variants, to obtain principal motions of a macromolecule about a local minimum. Such models are based on early concepts by Go, Scheraga, and Flory [581–583], and they rely on the assumption that macromolecules can be treated as deformable elastic bodies, where the interatomic potential function can be represented by a harmonic model [584, 585], and interactions depend only on the density of neighbors [586, 587]. The earliest application of NMA to elucidate equilibrium dynamics was conducted in the Karplus laboratory [228], though the usage of normal modes predates this by 7 years; Levitt and Warshel used normal modes to jump out of local minima in pioneering folding simulations [68, 72]. Further work demonstrated the effectiveness of such models for capturing thermal vibrations and predicting experimental B-factors [584, 585, 588–590]. Other work employed normal modes extracted via NMA from a single structure to model equilibrium fluctuations and in some cases even capture simple conformational switching [591–598]. The NOMAD-Ref server [339] provides tools for online NMA of large molecules (of up to 100,000 atoms, maintaining atomistic detail of their structures) and access to a number of programs that use the normal modes to model deformations and conduct refinements of experimental structures.
The earliest employment of NMA in the non-linear morphing setting, to extract information on intermediate conformations mediating the transition between a start and a goal structure, appeared in [341, 599]. In [599], a geometric morphing technique is proposed to bridge two ENMs corresponding to given start and goal structures. Related ideas appeared in [600, 601], moving along a few normal modes from the start structure pointing to the target structure and then parameterizing the elastic network along the pathway. In [578], the start and goal structures are interpolated upon optimal superposition of the CA atoms, but, in contrast to linear morphing methods, the resulting displacement vector is expanded as a linear combination of the normal modes calculated on the start structure.
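Expanding a displacement vector in a basis of normal modes amounts to projecting it onto each mode. The following generic sketch assumes the modes are supplied as orthonormal vectors of length 3N (flattened coordinates); it is not the specific procedure of [578], and all function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def expand_in_modes(displacement, modes):
    """Coefficients c_k = u_k . d of the displacement d in orthonormal modes u_k."""
    return [dot(mode, displacement) for mode in modes]


def reconstruct(coefficients, modes):
    """Displacement rebuilt from the chosen modes: sum_k c_k u_k."""
    n = len(modes[0])
    return [sum(c * mode[i] for c, mode in zip(coefficients, modes)) for i in range(n)]


def captured_fraction(displacement, modes):
    """Fraction of the squared displacement accounted for by the chosen modes."""
    coefficients = expand_in_modes(displacement, modes)
    return sum(c * c for c in coefficients) / dot(displacement, displacement)
```

When only a few low-frequency modes are kept, `captured_fraction` gives a quick indication of how much of the start-to-goal displacement those modes can describe.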
Since ENMs typically involve only a single energy minimum and are not immediately applicable to model transitions between multiple stable and semi-stable structural states of a macromolecule, mixed ENMs [579, 602] and other, related, ENM-based models have been developed [130, 603–606]. The fundamental issue addressed in different ways in these works is how to interpolate the ENMs at the start and goal structures so that the resulting potential retains these structures as local minima [602]. The plastic network model (PNM) introduced in [603] can include additional known intermediate structures and is parameterized to account for known fluctuations available as experimental B-factors.
A group of non-linear morphing methods based on ENMs, mixed ENMs, and variants such as PNM compute transitions that are minimum-energy paths (MEP) in the energy landscape. In [603], the conjugate peak refinement (CPR) algorithm [607] is used to compute a series of steepest descent paths from saddle points to nearest minima to connect two structures of interest with a continuous curve in the conformation space. Similarly, in the Climber method [340, 608], a restraining energy depends linearly on the distance deviation between the current conformation and the target conformation in a way that allows full flexibility and enables the protein to move around high-energy barriers, rather than over them, resulting in the MEP. KOSMOS [609] is another online morph server that, in addition to offering NMA for nucleic acids, proteins, and their complexes, also generates plausible transition pathways by optimizing a topology-oriented cost function that guarantees a smooth transition without steric clashes.
Transition Path Sampling and Chain-of-States Methods
The main challenge with computing transitions of a macromolecule between meta-stable states or basins is that a macromolecule may spend a very long time in one basin before transitioning to another. The disparity between the effective thermal energy and the typical energy barrier is manifested in long waiting periods where the macromolecule diffuses in a basin, followed by a sudden jump to another basin. Such sudden jumps are rare events, and a significant body of work in macromolecular modeling is dedicated to enhancing conventional MC or MD simulation frameworks to capture such events in a reasonable time frame. These methods operationalize seminal ideas put forth by Pratt on transition path sampling (TPS) [610]. Even though the energy landscape of a complex system is typically dense in saddle points, only a few saddle points are relevant for transitions between basins. TPS methods do not rely on identifying saddle points in the potential energy surface. Instead, they implement importance sampling over a reduced set of collective variables that span the important regions of the high-dimensional search space [611–616]. TPS methods are numerical techniques that effectively conduct MC sampling of the ensemble of transition paths [617]. Detailed reviews of these methods can be found in [617, 618].
Transition paths obtained via TPS methods can be quite complicated for systems with high-dimensional conformation spaces and rugged energy landscapes; a statistical mechanics framework, known as the transition path theory (TPT) [619], is needed to organize and analyze the transition path ensemble. Moreover, the success of TPS methods depends on the particular progress coordinate defined to distinguish the transition path in the search space, but finding an effective coordinate is non-trivial. Indeed, multiple progress coordinates may need to be defined to describe the transition. Therefore, a second group of methods founded on TPT implements the chain-of-states approach, which assumes that the transition path can be meaningfully encoded as a series or chain of structures (also referred to as images) [342, 607, 620–623]. These methods can track an arbitrary number of progress coordinates while restraining sampling to effectively one dimension. In chain-of-states methods, a string of images is created between the given meta-stable states, and the images are relaxed to the transition pathway. Similar ideas had already appeared in [607, 620]. Two types of chain-of-states methods were proposed afterwards, the nudged elastic band (NEB) methods and the string methods.
The NEB method [624] addresses a key issue that arises when an artificial spring force is introduced to maintain even spacing between images. The problem is that when minimizing the elastic band, the component of the spring force that is perpendicular to the elastic band tends to pull the images off the MEP. To address this problem, in NEB, a minimization of the elastic band is carried out where the perpendicular component of the spring force and the parallel component of the true force are projected out. In this way, the spring force does not interfere with the relaxation of the images perpendicular to the path. The result is that the series of relaxed configurations is an approximation to the MEP, converging to the MEP when there is sufficient resolution in the discrete representation of the path (when enough images are included in the chain). It is worth noting that the MEP is just one special path selected from the curves connecting two given conformations. Work in [625] explains that this special path minimizes the absolute value of the mechanical work and so is the most probable path for an overdamped Brownian particle at 0 K (in other words, the most probable Brownian trajectory in the absence of kinetic energy). Improvements to the NEB method introduced in [624] have been proposed, particularly regarding the tangent estimate [621] and the computational cost of minimizations [342].
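The force projection at the heart of NEB can be written compactly for a single interior image. The sketch below assumes the simplest tangent estimate (the normalized vector between the two neighboring images) rather than the improved estimate of [621], a uniform hypothetical spring constant `k_spring`, and a user-supplied `gradient` function for the potential.

```python
import math


def neb_force(prev_img, img, next_img, gradient, k_spring):
    """NEB force on one interior image: perpendicular true force plus parallel spring force."""
    # Simple tangent estimate from the neighboring images.
    tau = [b - a for a, b in zip(prev_img, next_img)]
    norm = math.sqrt(sum(t * t for t in tau))
    tau = [t / norm for t in tau]

    # True force with its component along the path projected out.
    true_force = [-g for g in gradient(img)]
    parallel = sum(f * t for f, t in zip(true_force, tau))
    perp_force = [f - parallel * t for f, t in zip(true_force, tau)]

    # Spring force kept only along the path, so it cannot drag the image off the MEP.
    spring = k_spring * (math.dist(next_img, img) - math.dist(img, prev_img))
    return [fp + spring * t for fp, t in zip(perp_force, tau)]
```

Relaxing the band then consists of repeatedly moving every interior image a small step along its `neb_force` while the two end points stay fixed at the meta-stable states.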
Generally, NEB methods require that the energy landscape be relatively smooth and are not effective on rugged energy landscapes [619]. Remedies have been proposed by having NEB methods operate on the free energy landscape [623], which is expected to be smoother, or by introducing temperature corrections to the MEP [626]. Caution must be exercised not to double count entropy when operating on free energy landscapes. One implication is that implicit solvent potentials cannot be employed to model dynamics on free energy landscapes.
In string methods, splines are used instead to calculate tangents. In addition, image spacing is maintained via reparameterization. The first string method proposed in [622] belongs to the sub-category of zero-temperature string methods [344]. Extensions to operate on the space of collective variables and compute the minimum free energy path (MFEP) rather than the MEP have also been proposed [343, 345]. Finite-temperature string methods were later proposed [347, 627] to better deal with overly rugged energy landscapes.
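The reparameterization step that keeps images evenly spaced can be sketched as follows. For brevity the sketch interpolates piecewise-linearly along the current string instead of using splines, and `string_step` follows the simplified zero-temperature scheme in which every image takes a steepest-descent step before being redistributed; the function names and the step size `dt` are illustrative assumptions.

```python
import math


def reparameterize(images):
    """Redistribute images at equal arc length along the piecewise-linear string."""
    n = len(images)
    lengths = [0.0]
    for a, b in zip(images[:-1], images[1:]):
        lengths.append(lengths[-1] + math.dist(a, b))
    total = lengths[-1]
    if total == 0.0:
        return [tuple(img) for img in images]
    new_images = [tuple(images[0])]
    seg = 0
    for k in range(1, n - 1):
        target = total * k / (n - 1)
        while lengths[seg + 1] < target:
            seg += 1
        t = (target - lengths[seg]) / (lengths[seg + 1] - lengths[seg])
        a, b = images[seg], images[seg + 1]
        new_images.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    new_images.append(tuple(images[-1]))
    return new_images


def string_step(images, gradient, dt):
    """One iteration: steepest-descent move of every image, then reparameterization."""
    moved = [tuple(x - dt * g for x, g in zip(img, gradient(img))) for img in images]
    return reparameterize(moved)
```

Unlike NEB, no spring force is needed here; even spacing is restored geometrically after each move.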
String methods do not assume the energy landscape is smooth. They can also handle a large number of collective variables. Effective choices of collective variables have been discussed and tested in [628]. Work in [619] draws a difference between string methods and chain-of-states methods, as string methods start with an intrinsic formulation of the dynamics of curves/strings in configuration space and only resemble chain-of-states methods after discretization of the curves. String methods sample the configuration space with strings, which are smooth curves with intrinsic parameterization. The mean force and other conditional expectations are computed locally over the discretization points along the string. The string satisfies a differential equation that by construction guarantees that the string evolves to the most probable transition path connecting two meta-stable states.
In particular, the finite-temperature string method has been applied recently to model the complex α-helix to β-sheet transition in a β-hairpin mini protein in implicit solvent [629]. Transition pathways constructed by string methods have been reported in [630–634]. To fully appreciate the scope of the string method proposed in [343], we additionally note here its application to model in detail the transition of the converter of myosin VI between the PPS and R conformations by computing the associated MFEP for this isomerization, the free-energy profile along the transition pathway, and estimating the interconversion rate [635].
String methods make use of the approximation that, with high probability, the flux associated with transition paths is concentrated inside one or a few thin (reaction) tubes. This may not be a reasonable assumption, particularly for complex systems. The WEM is combined with a string method in [636] to address this issue. Another method, proposed in [637] and tested in [638, 639], combines a string method with swarms of trajectories.
Another drawback of string methods is their computational cost due to the multiple gradient calculations performed on images located far away from the transition state. Many methods have been proposed to reduce this computational burden. We note here the growing and the freezing string methods [640–645]. The growing string method attempts to reduce the number of calculations in the iterative steps of string methods. Essentially, two string segments are grown independently from the start and goal structures until they join each other. The freezing string method additionally reduces costs related to the parameterization in string methods. The images are optimized in a direction perpendicular to the progress coordinate with a few conjugate gradient steps and are then frozen in place, effectively constructing an approximate Hessian. Work in [646] demonstrates that this approximation performs as well as growing string methods that use the exact Hessian. As evidenced by the large number of works cited, work on methods for computing transition paths, rates, and transition states is very active.
Evolutionary Algorithms
An important group of methods to address optimization-related problems in macromolecular modeling consists of evolutionary algorithms (EAs). EAs approach stochastic optimization under the umbrella of evolutionary computation, where the main idea is for computation to mimic the process of evolution and natural selection to find local optima of a complex objective/fitness function. The realization that the potential energy landscape of a macromolecule can be non-linear and multimodal, and that many structure-centric macromolecular modeling problems can be cast as optimization problems, makes EAs highly appealing for macromolecular modeling.
Though EAs are highly customizable algorithms, they all follow a simple template. A population of samples of a configuration space (generally referred to as individuals) is evolved over a number of generations. An initialization mechanism specifies the initial population, which can consist of random samples or include configurations known to be local optima (for instance, experimentally-available structures may play this role). The population evolves either over a fixed, user-defined number of generations or until a different termination criterion is reached. In each generation, individuals with high fitness are repeatedly selected and varied upon. The selection mechanism specifies which individuals to select as parents for reproduction. The improvement mechanism consists of reproductive or variation operators, which can be asexual, introducing a mutation on a parent, or sexual, combining the material of two parents at one or more crossover points to generate offspring. A survival mechanism determines which individuals survive to the next generation. In non-overlapping or generational survival mechanisms, the offspring replace the parents. In overlapping ones, a subset of individuals from the combined parent and offspring pool is selected to survive into the next generation. A comprehensive review of EAs can be found in [647].
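The template above can be sketched as a small, generic EA that minimizes an energy function over real-valued vectors. The specific design choices in this sketch (binary tournament selection, one-point crossover, Gaussian mutation with a hypothetical `sigma`, and overlapping truncation survival) are assumptions made for illustration and are not taken from any of the cited methods.

```python
import random


def evolve(energy, init_population, n_generations, rng, sigma=0.1):
    """Generic EA minimizing `energy`; individuals are lists of floats."""
    population = [list(ind) for ind in init_population]
    size = len(population)
    history = []  # best energy after each generation
    for _ in range(n_generations):
        offspring = []
        while len(offspring) < size:
            # Selection: binary tournaments pick two parents.
            p1 = min(rng.sample(population, 2), key=energy)
            p2 = min(rng.sample(population, 2), key=energy)
            # Sexual variation: one-point crossover.
            cut = rng.randrange(1, len(p1)) if len(p1) > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Asexual variation: Gaussian mutation of every coordinate.
            child = [x + rng.gauss(0.0, sigma) for x in child]
            offspring.append(child)
        # Overlapping survival: keep the best of parents plus offspring.
        population = sorted(population + offspring, key=energy)[:size]
        history.append(energy(population[0]))
    return population, history
```

With overlapping truncation survival the best energy never worsens from one generation to the next; a generational (non-overlapping) variant would instead set `population = offspring`.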
EAs are very rich algorithmic frameworks, as different design decisions in the initialization, variation, selection, and survival mechanisms can lead to very different behaviors. The decision on how to represent individuals is key both to the effectiveness and ease with which variation operators can be designed to produce good-quality individuals. EAs that employ crossover in addition to the asexual (mutation) operator are referred to as genetic algorithms (GAs). EAs that additionally incorporate a meme, which is a local improvement operator to improve an offspring and effectively map it to a nearby optimum, are referred to as hybrid or memetic EAs (MAs). The employment of multiple, independent objective functions as opposed to a single fitness function results in multi-objective EAs (MO-EAs). Specific variants that build over GA are respectively referred to as MGAs and MO-GAs.
One of the first EAs for macromolecular structure modeling was a GA, proposed in [648] for the de novo protein structure prediction problem. Work in [648] also demonstrated that EAs are better able to escape local minima of a protein energy function than MC. This result is not surprising, considering that the algorithm able to compute Lennard-Jones optima of atomic clusters in [649] was in fact an EA. Referred to as Basin Hopping, the algorithm was a 1+1 MA, which refers to an MA that has only one parent and one offspring. In a 1+1 MA, the population evolving over generations has size 1, and the offspring competes with the parent. We recall that MA refers to an EA where the offspring is subjected to a local improvement operator (energetic minimization). In Basin Hopping, the offspring replaces the parent with a probability resembling the Metropolis criterion. An MC search can also be viewed as an EA, specifically, a 1+1 EA, and all MC-based methods can be conceptualized as EAs employing highly specific insight about the optimization problem at hand.
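Viewed as a 1+1 MA, Basin Hopping reduces to a loop of perturbation, local improvement, and Metropolis-like replacement. The following one-dimensional sketch uses a crude derivative-free local minimizer as the meme; the minimizer, the uniform perturbation of size `jump`, and the parameter names are illustrative assumptions, not the implementation of [649].

```python
import math
import random


def local_minimize(energy, x, step=0.1, tol=1e-6):
    """Crude 1D local improvement: move downhill, halving the step when stuck."""
    while step > tol:
        for trial in (x - step, x + step):
            if energy(trial) < energy(x):
                x = trial
                break
        else:
            step /= 2.0
    return x


def basin_hopping(energy, x0, n_steps, temperature, jump, rng):
    """1+1 memetic search: one parent, one locally minimized offspring per generation."""
    parent = local_minimize(energy, x0)
    best = parent
    for _ in range(n_steps):
        # Variation (mutation) followed by the local improvement operator.
        child = local_minimize(energy, parent + rng.uniform(-jump, jump))
        delta = energy(child) - energy(parent)
        # Survival: the offspring replaces the parent with Metropolis-like probability.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            parent = child
        if energy(parent) < energy(best):
            best = parent
    return best
```

Because the minimized coordinates of the offspring (not just its improved energy) are carried forward, this loop keeps the result of the local improvement operator in the population.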
Given the early work in [648], EAs have a long history in de novo protein structure prediction. Customized EAs for this problem contain many evolutionary strategies and meta-heuristics, including the employment of a hall of fame to preserve “good” individuals (decoys), tabu search to improve the performance of a meme, co-evolving memes, niching, crowding, twin removal for population diversification, structuring of the solution space to facilitate distributed implementations capable of exploiting parallel computing architectures, and more. The main focus of algorithmic research on EAs is which mechanisms avoid premature convergence and allow finding the global optimum in overly rugged fitness landscapes. This is of particular interest in applications of EAs to different structure-centric problems in macromolecular modeling [650]. A comprehensive review of EAs for de novo protein structure prediction can be found in [651].
Though they have a long history in de novo structure prediction, EAs are not considered among the top performers in this problem for proteins no longer than 200 amino acids. On long protein chains, where off-lattice models result in impractical computational demands, on-lattice EAs are by now the only viable algorithms [652, 653]. However, on shorter chains, where off-lattice models can be afforded, the injection of specialized operators (moves), such as molecular fragment replacement, and sophisticated hybrid potential energy functions have allowed rather simple MC-based algorithms to outperform non-customized EAs. Of note here are the Rosetta and Quark methods that often dominate the leaderboard in the CASP competition [118–120].
Even though EAs have yet to become state of the art in the de novo structure prediction setting, much progress has been made in recent years [390, 391, 654]. Recently, EAs have incorporated state-of-the-art, off-lattice representations and energy functions to become competitive with MC-based methods such as Rosetta [390, 391]. The additional recasting of the structure prediction problem as a multi-objective optimization one has resulted in higher exploration capability and conformation quality over single-objective optimization approaches such as Rosetta [392, 655]. EAs are also employed to address protein folding [656].
While there is still much work to be done to demonstrate EAs as the state-of-the-art approaches for de novo structure prediction, there are three domains in macromolecular structure modeling where EAs are by now the best performers: protein-ligand binding, multimeric protein-protein docking, and cryo-EM reconstruction.
In protein-ligand binding, some of the top algorithms are EAs. For instance, AutoDock now employs a Lamarckian GA, which has been demonstrated to result in better-quality receptor-ligand bound configurations than the MC-SA algorithm employed in earlier releases [180]. In particular, work in [180] demonstrates that both the Lamarckian GA and a traditional GA can handle ligands with more degrees of freedom than MC-SA, and that the Lamarckian GA outperforms the traditional GA. The latter is due to the fact that in a Lamarckian GA, contrary to the Darwinian model of evolution, where only genetic traits are inheritable, an offspring is replaced with the result of the local improvement operator to which it is subjected. This essentially introduces phenotypic traits into the genotypic pool (improvements are passed on to the next generation), per Jean-Baptiste Lamarck’s now discredited claim that phenotypic characteristics acquired during an individual’s lifetime can become inheritable traits (epigenetics is bringing more credibility, however, to Lamarck’s claims). It is worth pointing out that many MAs (for instance, even Basin Hopping) are Lamarckian EAs. MAs that are not Lamarckian choose not to replace the offspring with the result of the local improvement operator to which it is subjected but use the improved fitness in the survival mechanism; this is known as the Baldwin effect [657].
A domain where EAs are showing promise is in structure prediction for asymmetric, heteromeric assemblies. Currently, the only algorithm that has been shown capable of producing native asymmetric structures of heteromeric assemblies in the absence of wet-laboratory data is Multi-LZerD [292]. Multi-LZerD is a GA that represents multimeric conformations through spanning trees. The nodes in the tree represent the units, and the edges encode the presence of a direct interaction. As presented, Multi-LZerD proceeds over 3000 generations. While promising, the algorithm incurs too high a computational cost to be practical in its current form for multimeric assemblies of more than 6 units.
Another domain where EAs are shown to be highly successful is the simultaneous registration problem in cryo-EM reconstruction. One issue with cryo-EM is that low-resolution maps are often obtained for large asymmetric and/or dynamic macromolecular assemblies. In such cases, an important problem is how to simultaneously fit known structures of the units in the given map. A GA with specialized variation operators and tabu search has been proposed in [658] to successfully address this problem. This GA has also been used in later work in [659] to trace α-helices in low- to mid-resolution cryo-EM maps.
While most of the work on EAs in the evolutionary computation community is
driven by algorithmic design and analysis of the exploration capability rather than data
quality, key ideas and strategies on evolutionary search are proving powerful in
enhancing exploration capability in macromolecular structure modeling problems. For
instance, several algorithmic decisions on how to select parents for reproduction,
generate offspring, and set up the competition for survival are key for balancing
breadth (exploration) and depth (exploitation) in the search [647]. Lately,
interesting ideas from multi-objective optimization are being incorporated in EAs for
conformation sampling in de novo protein structure prediction. Namely, instead of
pursuing the global minimum of an aggregate energy score, EA-based methods are
proposed to obtain conformations that optimize specific sub-groupings of interatomic
interactions [392]. EA-based methods are also showing promise in mapping energy
landscapes of proteins with large conformational changes [324, 660]. Due to the ongoing
work in the evolutionary computation community on powerful and effective algorithmic
strategies for obtaining solutions of complex objective functions and the realization of
outstanding sampling bottlenecks in de novo structure prediction [661], adoption of EAs
holds great promise for macromolecular structure modeling.
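The multi-objective idea mentioned above, comparing conformations on separate groups of energy terms rather than on one aggregate score, is commonly expressed through Pareto dominance. The sketch below is a generic illustration and not the specific method of [392]; the two-term score vectors and the function names are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores):
    """Return the indices of the non-dominated score vectors."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]

# Hypothetical (short-range, long-range) energy terms for four conformations.
scores = [(-10.0, -2.0), (-8.0, -5.0), (-7.0, -1.0), (-3.0, -6.0)]
print(pareto_front(scores))  # [0, 1, 3]
```

Conformation 2 is dropped because conformation 0 is better on both terms; the remaining three represent different trade-offs and would all be retained by a Pareto-based survival step.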
Robotics-inspired Methods
Since simulation of dynamics is the limiting factor in dynamics-based methods,
efficiency concerns can be addressed by foregoing or at least delaying dynamics until
credible conformational paths have been obtained. A different class of methods focuses
not on producing transition trajectories but rather computing a sequence of
conformations (a conformational path) with a credible energy profile. The working
assumption is that, once obtained, credible conformational paths can then be locally
deformed with techniques that consider dynamics to obtain actual transition
trajectories. Such methods adapt sampling-based algorithms developed to address the
robot motion-planning problem and are thus known as robotics-inspired methods.
The objective in robot motion planning is to obtain paths that take a robot from a
start to a goal configuration. The robot motion planning problem bears mechanistic
analogies to the problem of computing conformations along a transition trajectory; in
both problems the goal is to uncover what part of the underlying conformation or
configuration space is employed in motions of a mechanical or biological system from a
start to a goal conformation or configuration. Analogies between molecular bonds and
robot links, and between atoms and robot joints, are made to perform fast molecular
kinematics.
Robotics-inspired methods are tree-based or roadmap-based [662]. Tree-based
methods grow a tree in conformation space from a given start conformation to a given
goal conformation, which represent the structures bridged by the sought transition. The
growth of the tree is biased so the goal conformation can be reached in reasonable
computational time. As a result, tree-based methods are efficient but limited in their
sampling. They are known as single-query methods, as they can only answer one
start-to-goal query at a time; that is, only one path of consecutive conformations that
connect the start to the goal can be extracted from the tree. Running them multiple
times to sample an ensemble of conformational paths for the same query results in an
ensemble with high inter-path correlations due to the biasing of the conformation tree.
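A goal-biased tree search of the kind described above can be sketched in a few lines of Python. This is a generic RRT-style illustration under strong simplifying assumptions: a two-dimensional unit square stands in for the conformation space, no energy function or collision check is applied, and the function name `grow_tree` and its parameters are hypothetical.

```python
import math
import random

def grow_tree(start, goal, goal_bias=0.2, step=0.1, tol=0.1, max_iters=5000, seed=0):
    """Grow a tree from start toward goal in a toy 2D space.
    Returns the single start-to-goal path as a list of points, or None."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # With probability goal_bias steer toward the goal; otherwise sample uniformly.
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
        # Find the tree node nearest to the target.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        # Extend the tree by at most `step` from the nearest node toward the target.
        t = min(step, d) / d
        new = (near[0] + t * (target[0] - near[0]), near[1] + t * (target[1] - near[1]))
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= tol:
            # Walk parent pointers back to the root to extract the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

path = grow_tree((0.05, 0.05), (0.95, 0.95))
print(len(path), path[0], path[-1])
```

Because every node has exactly one parent, only one start-to-goal path can be read off the tree, which is the single-query limitation noted in the text. A method for molecules would additionally evaluate the energy of each new conformation before adding it.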
Roadmap-based methods adapt the Probabilistic Road Map (PRM) framework [663].
These methods support multiple queries. Rather than grow a tree in conformation
space, these methods detach the sampling of conformations from the structure that
encodes neighborhood relationships among conformations in the conformation space.
Typically, a sampling stage first provides a discrete representation of the conformation
space of interest, and then a roadmap building stage embeds sampled conformations in
a graph/roadmap by connecting each one to its nearest neighbors.
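The two stages described above, sampling followed by roadmap construction, can be sketched as follows. This is a simplified illustration and not the PRM implementation of [663]: points in a unit square stand in for sampled conformations, Euclidean distance stands in for a conformational distance, no local planner validates the edges, and the function names are hypothetical.

```python
import math
import random
from collections import deque

def build_roadmap(samples, k=5):
    """Roadmap stage: connect each sample to its k nearest neighbors (undirected graph)."""
    graph = {i: set() for i in range(len(samples))}
    for i, p in enumerate(samples):
        order = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(p, samples[j]),
        )
        for j in order[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def query(graph, src, dst):
    """Breadth-first search over the roadmap; returns a list of node indices or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

# Sampling stage: uniform random points stand in for sampled conformations.
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(200)]
roadmap = build_roadmap(samples, k=6)

# The same roadmap answers several start-to-goal queries without resampling.
print(query(roadmap, 0, 1))
print(query(roadmap, 2, 3))
```

Once built, the roadmap is reused for every query, which is what distinguishes this family from single-query tree methods.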
Roadmap-based methods bring their own unique set of challenges.
Randomly-sampled conformations have very low probability of being in the region of
interest for the transition. In particular, for long chains with many degrees of freedom
(hundreds of backbone angles in small-to-medium protein chains), a protein
conformation sampled at random is very unlikely to be physically realistic. Biased
sampling techniques can be used to remedy this issue [664, 665], but it is hard to know
which ones will focus sampling on regions of interest for the transition. In addition, both
roadmap- and tree-based methods rely on local planners or local deformation techniques
to connect two neighboring conformations. It is hard to find reasonable local planners
for protein conformations. A linear interpolation is often carried out over the employed
parameters, typically backbone angles, but this can produce unrealistic conformations,
and a lot of time can be spent energetically refining these conformations. Recent work
is considering complex local planners that are not based on interpolation but are instead
re-formulations of the motion computation problem. Work in [666] introduces a
prioritized path sampling scheme to address the computational demands of complex
local planners in roadmap-based methods for protein motion computation.
Roadmap-based methods have been employed to model unfolding of small
proteins [665, 667]. Tree-based methods have been employed to model conformational
changes and flexibility, predict the native structure, and compute conformational paths
connecting given structural states [351, 352, 387, 668–670]. In particular, the T-RRT
method described in [351] and the PDST method described in [352] have focused on the
problem of computing conformational paths connecting two given structures. While
T-RRT has been shown to connect known low-energy states of the dialanine peptide (2
amino acids long) [351], the PDST method has been shown to produce credible
information on the order of conformational changes connecting stable states of large
proteins (200–500 amino acids long) [352]. Both methods control the dimensionality of
the conformation space by either focusing on systems with few amino acids [351] or by
employing coarse-grained representations to reduce the number of modeled parameters
in large proteins [352]. The tree-based method in [353] employs the fragment
replacement technique to reduce the dimensionality of the conformation space and
sample conformational paths connecting two given structural states of proteins ranging
from a few dozen to a few hundred amino acids. At each iteration, a conformation
in the tree is selected for expansion. The expansion employs molecular fragment
replacement and the Metropolis criterion to bias the tree towards low-energy
conformations over time. The selection penalizes growth of the tree towards regions
of the conformation space that have been oversampled, thus resulting in enhanced
sampling of the conformation space.
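The select-then-expand loop described for the tree-based method can be sketched as below. This is only a schematic under stated assumptions and not the algorithm of [353]: a one-dimensional coordinate in [0, 1) replaces the conformation, a random perturbation replaces fragment replacement, a ten-cell grid with inverse-count weights is one hypothetical way to penalize oversampled regions, and the energy function is a toy.

```python
import math
import random

def energy(x):
    """Toy energy with a single minimum at x = 0.7 (illustrative only)."""
    return (x - 0.7) ** 2

def metropolis_accept(e_old, e_new, kT, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)

def select_cell(counts, rng):
    """Pick a cell with weight 1/(1 + count), so crowded cells are chosen less often."""
    cells = list(counts)
    weights = [1.0 / (1.0 + counts[c]) for c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]

def expand_tree(root, n_iters=500, kT=0.05, seed=0):
    """Grow a set of conformations from root; returns {cell index: [conformations]}."""
    rng = random.Random(seed)
    cells = {}

    def add(x):
        cells.setdefault(int(x * 10), []).append(x)

    add(root)
    for _ in range(n_iters):
        # Selection: favor cells of the grid that hold few conformations so far.
        counts = {c: len(v) for c, v in cells.items()}
        parent = rng.choice(cells[select_cell(counts, rng)])
        # Expansion: a small random move stands in for fragment replacement.
        child = min(max(parent + rng.uniform(-0.05, 0.05), 0.0), 0.999)
        if metropolis_accept(energy(parent), energy(child), kT, rng):
            add(child)
    return cells

cells = expand_tree(0.1)
print({c: len(v) for c, v in sorted(cells.items())})
```

The Metropolis test pulls accepted conformations toward low energy, while the inverse-count selection keeps pushing expansion into sparsely populated cells; the balance between the two is controlled here by `kT` and the grid resolution.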
Conclusions
This review has highlighted the breadth and depth of research in macromolecular
modeling and simulation. A plethora of computational methods have been developed to
study a wide spectrum of molecular events. QM methods are used to study molecular
electronic structures and obtain detailed and accurate electronic structure calculations.
Work in [671] employs such calculations to correlate quantum descriptors and the
biological activity of 13 quinoxaline drug compounds and then suggest effective
compounds against drug-resistant Mycobacterium tuberculosis. Recent efforts in
quantum chemistry are devoted to circumventing computational bottlenecks of
large-scale electronic structure calculations and extending applicability to molecular
systems composed of hundreds of atoms [672]. At present, QM methods have too high a
computational cost to be a competitive alternative to MD or MC methods and their
variants. For this reason, the focus of this review has been on MM methods, such as
MD and variations, which are the methods of choice to study macromolecular structure
and dynamics. It should be noted that hybrid QM/MM methods exist and are the
methods of choice for modeling reactions in biomolecular systems [673].
One of the major themes in MM-based macromolecular modeling is the choice of
resolution or detail. As this review has summarized, atomistic, explicit solvent MD
simulations are becoming more affordable, both due to improvements in hardware and
techniques that allow aggressive parallelization. Despite the challenges posed by the
disparate spatial and time scales employed by macromolecules flexing their structures
and interacting with their environment, significant algorithmic and hardware advances
have allowed breaking the millisecond barrier [147]. Dynamical processes that involve
millions of atoms can now be characterized. For example, work in [674] tracks via MD
simulations the microsecond-long atomic motions of 1.2 million particles to study the
dissolution of the capsid of the satellite tobacco necrosis virus.
MD and non-MD methods that employ reduced, coarse-grained macromolecular
models are often regarded as “cheaper” albeit less accurate alternatives to atomistic
MD methods. Such cheaper methods currently complement or facilitate atomistic
MD-based studies. For example, protein docking methods are routinely employed to
assist cryo-EM in resolving structures of molecular assemblies. Once such methods
narrow down the possible conformation space, subsequent atomistic MD simulations are
employed to make final predictions by examining stability and dynamics [111].
In some settings, these cheaper methods provide the only practical approach. Even
with various accelerated MD simulations, mapping of protein energy landscapes remains
challenging. For example, work in [10] shows that the sampling capability of accelerated
MD greatly depends on the structure used to initiate a trajectory. In our own
laboratories, we have been able to compare the cheaper methods to published atomistic
MD simulations of H-Ras [660]. In particular, the evolutionary algorithm
in [660] is able to map the energy landscape of H-Ras wildtype and selected variants in
atomistic detail better than what can currently be achieved via known MD methods. In
addition, in a similar comparison on known TIR domains, MD simulations are found to
cover only a small portion of the known conformation space (unpublished data: Qi,
Chen, Wei, Nussinov, and Ma, “L265P mutation changes the energy landscape of
MyD88 protein”).
In MD-based research, two different directions seem to be pursued by researchers at
the moment. The first involves the employment of very long MD simulations, made
possible by complex MD-customized architectures, like Anton. Thermodynamic and
kinetic quantities can be readily extracted from such simulations. The second involves
the employment of several short, off-equilibrium MD simulations, which allows the
use of parallel architectures but necessitates statistical
models, such as Markov state models, to collect and organize the simulations to describe
the long-time behavior of a system. Both directions are exciting and complementary. In
particular, the second direction is leading to advances in the combination of continuous
and discrete models for expediting modeling of long-time scale phenomena and is likely
to lead to further algorithmic advancements. Within each of these directions, several
open questions remain for researchers to pursue. A combination of both directions,
that is, of dedicated architectures with continuous and discrete models, promises to push
the spatial and time scales that can be observed in silico even further.
As summarized in this review, many non-MD algorithmic frameworks are being
pursued to model different aspects of macromolecular structure and dynamics. Often,
these frameworks are inspired or initiated from diverse communities of researchers. Of
note here are evolutionary algorithms and robotics-inspired algorithms. While
components of these algorithms are often investigated in detail in each of the
corresponding communities, the focus in these communities has traditionally been
on computational performance rather than quality of findings. Broad employment
of these algorithms as tools complementary to MD is currently challenged by an
inability to demonstrate utility on a broad class of macromolecular systems and validate
findings with existing wet-laboratory or MD-based studies. Nonetheless, a growing body
of researchers within each of these communities is introducing treatments focused on
both computational performance and data quality.
This review has summarized the current state of the art in diverse application areas.
An emerging theme is the need to characterize in detail the structural flexibility of a
macromolecular system under specific conditions. While great progress is being made,
computing a conformation ensemble consistent with explicit or implicit constraints is
likely to motivate the development of novel algorithms for years to come.
Many other directions of research in macromolecular modeling and simulation could
not be described in detail here. These include the development of accurate and sensitive
molecular force fields [140, 141] for macromolecular simulation, the development of
increasingly accurate coarse-grained representations of macromolecules, solvent models,
and multiscaling techniques [76, 142–144], decoy/model selection algorithms [675] in de
novo structure prediction, as well as the development of algorithmic tools to assist
structure resolution in the wet laboratory [676, 677]. Additionally, while this review
highlights some of the unique challenges posed by intrinsically disordered proteins and
regions, it does not provide an overview of similar challenges posed by membrane
proteins. The reader is referred to work in [678] for a review of such challenges and
algorithmic advancements.
Expected advances in each of the reviewed application areas promise to provide us
with a more comprehensive and detailed understanding of our biology. In particular,
unraveling the behavior of macromolecules in isolation and assembly will help us
understand the molecular basis of mechanisms in the healthy and diseased cell. A truly
synergistic employment of in-silico and wet-lab research to unravel molecular
mechanisms also promises to lead to better therapeutics for combating cancer,
neurodegenerative disorders, infections, and other important human disorders of our
time. The journey into the future of computational structural biology promises to be
exciting, and we hope that this review has inspired a few more researchers to join us on
this journey.
Acknowledgments
Funding for this work is provided in part by the National Science Foundation (Grant
No. 1421001, Grant No. 1440581, and CAREER Award No. 1144106 to AS) and
the Thomas F. and Kate Miller Jeffress Memorial Trust Award. This work has also
been funded in whole or in part with Federal funds from the NCI, NIH, under contract
number HHSN261200800001E to BM and RN. The content of this publication does not
necessarily reflect the views or policies of the National Science Foundation or
Department of Health and Human Services, nor does mention of trade names,
commercial products, or organizations imply endorsement by the U.S. Government.
This study was supported (in part) by the Intramural Research Program of the NIH,
NCI, Center for Cancer Research.
References
1. Soto C. Protein misfolding and neurodegeneration. JAMA Neurology.
2008;65(2):184–189.
2. Uversky VN. Intrinsic disorder in proteins associated with neurodegenerative
diseases. Front Biosci. 2009;14:5188–5238.
3. Fernández-Medarde A, Santos E. Ras in cancer and developmental diseases.
Genes Cancer. 2011;2(3):344–358.
4. Neudecker P, Robustelli P, Cavalli A, Walsh P, Lundström P, Zarrine-Afsar A,
et al. Structure of an intermediate state in protein folding and aggregation.
Science. 2012;336(6079):362–366.
5. Onuchic JN, Luthey-Schulten Z, Wolynes PG. Theory of protein folding: the
energy landscape perspective. Annu Rev Phys Chem. 1997;48:545–600.
6. Ozenne V, Schneider R, Yao M, Huang JR, Salmon L, Zweckstetter M, et al.
Mapping the potential energy landscape of intrinsically disordered proteins at
amino acid resolution. J Am Chem Soc. 2012;134(36):15138–15148.
7. Levy Y, Jortner J, Becker OM. Solvent effects on the energy landscapes and
folding kinetics of polyalanine. Proc Natl Acad Sci USA. 2001;98(5):2188–2193.
8. Miao Y, Nichols SE, McCammon JA. Free energy landscapes of
G-protein-coupled receptors, explored by accelerated molecular dynamics. Phys
Chem Chem Phys. 2014;16(14):6398–6406.
9. Gorfe AA, Grant BJ, McCammon JA. Mapping the nucleotide and
isoform-dependent structural and dynamical features of Ras proteins. Structure.
2008;16(6):885–896.
10. Grant BJ, Gorfe AA, McCammon JA. Ras Conformational Switching:
Simulating Nucleotide-Dependent Conformational Transitions with Accelerated
Molecular Dynamics. PLoS Comput Biol. 2009;5(3):e1000325.
11. Anfinsen CB. Principles that govern the folding of protein chains. Science.
1973;181(4096):223–230.
12. Fersht AR. Structure and Mechanism in Protein Science. A Guide to Enzyme
Catalysis and Protein Folding. 3rd ed. New York, NY: W. H. Freeman and Co.;
1999.
13. Frauenfelder H, Sligar SG, Wolynes PG. The energy landscapes and motions of
proteins. Science. 1991;254(5038):1598–1603.
14. Sawaya MR, Kraut J. Loop and Domain Movements in the Mechanism of E.
Coli Dihydrofolate Reductase: Crystallographic Evidence. Biochemistry.
1997;36(3):586–603.
15. Radkiewicz JL, Brooks CL. Protein dynamics in enzymatic catalysis:
Exploration of dihydrofolate reductase. J Am Chem Soc. 2000;122(2):225–231.
16. Vendruscolo M, Dobson CM. Dynamic visions of enzymatic reactions. Science.
2006;313(5793):1586–1587.
17. Clore GM, Schwieters CD. How much backbone motion in ubiquitin is required
to account for dipolar coupling data measured in multiple alignment media as
assessed by independent cross-validation? J Am Chem Soc.
2004;126(9):2923–2938.
18. Henzler-Wildman K, Kern D. Dynamic personalities of proteins. Nature.
2007;450:964–972.
19. Okazaki K, Koga N, Takada S, Onuchic JN, Wolynes PG. Multiple-basin energy
landscapes for large-amplitude conformational motions of proteins:
Structure-based molecular dynamics simulations. Proc Natl Acad Sci USA.
2006;103(32):11844–11849.
20. Hub JS, de Groot BL. Detection of Functional Modes in Protein Dynamics.
PLoS Comput Biol. 2009;5(8):e1000480.
21. Bahar I, Lezon TR, Yang LW, Eyal E. Global dynamics of proteins: bridging
between structure and function. Annu Rev Biophys. 2010;39:23–42.
22. Boehr DD, Wright PE. How do proteins interact? Science.
2008;320(5882):1429–1430.
23. Boehr DD, Nussinov R, Wright PE. The role of dynamic conformational
ensembles in biomolecular recognition. Nature Chem Biol. 2009;5(11):789–96.
24. Feynman RP, Leighton RB, Sands M. The Feynman Lectures on Physics.
Reading, MA: Addison-Wesley; 1963.
25. McCammon JA, Gelin BR, Karplus M. Dynamics of folded proteins. Nature.
1977;267:585–590.
26. Cooper A. Protein fluctuations and the thermodynamic uncertainty principle.
Prog Biophys Mol Biol. 1984;44(3):181–214.
27. Frauenfelder H, Wolynes PG. Biomolecules: Where the Physics of Complexity
and Simplicity Meet. Physics Today. 1994;47(2):58–64.
28. Dill KA, Chan HS. From Levinthal to pathways to funnels. Nat Struct Biol.
1997;4(1):10–19.
29. Heymann JB, Conway JF, Steven AC. Molecular dynamics of protein complexes
from four-dimensional cryo-electron microscopy. J Struct Biol.
2004;147(3):291–301.
30. Kleckner IR, Foster MP. An introduction to NMR-based approaches for
measuring protein dynamics. Biochim Biophys Acta. 2011;14(8):942–968.
31. Fenwick RB, van den Bedem H, Fraser JS, Wright PE. Integrated description of
protein dynamics from room-temperature X-ray crystallography and NMR. Proc
Natl Acad Sci USA. 2014;111(4):E445–E454.
32. Best RB, Vendruscolo M. Determination of ensembles of structures consistent
with NMR order parameters. J Am Chem Soc. 2004;126(26):8090–8091.
33. Berlin K, Castañeda CA, Schneidman-Duhovny D, Sali A, Nava-Tudela A,
Fushman D. Recovering a representative conformational ensemble from
underdetermined macromolecular structural data. J Am Chem Soc.
2013;135(44):16595–16609.
34. De Simone A, Montalvao RW, Dobson CM, Vendruscolo M. Characterization of
the Interdomain Motions in Hen Lysozyme Using Residual Dipolar Couplings as
Replica-Averaged Structural Restraints in Molecular Dynamics Simulations.
Biochemistry. 2013;52(37):6480–6486.
35. Lindorff-Larsen K, Best RB, DePristo MA, Dobson CM, Vendruscolo M.
Simultaneous determination of protein structure and dynamics. Nature.
2005;433(7022):128–132.
36. Vendruscolo M, Pacci E, Dobson C, Karplus M. Rare Fluctuations of Native
Proteins Sampled by Equilibrium Hydrogen Exchange. J Am Chem Soc.
2003;125(51):15686–15687.
37. Kay LE. Protein Dynamics from NMR. Nat Struct Biol. 1998;5(2-3):513–517.
38. Kay LE. NMR studies of protein structure and dynamics. J Magn Reson.
2005;173(2):193–207.
39. Torella JP, Holden SJ, Santoso Y, Hohlbein J, Kapanidis AN. Identifying
Molecular Dynamics in Single-Molecule FRET Experiments with Burst Variance
Analysis. Biophys J. 2011;100(6):1568–1577.
40. Zhu G, editor. NMR of proteins and small biomolecules. vol. 326 of Topics in
Current Chemistry. Springer-Verlag; 2012.
41. Karam P, Powdrill MH, Liu HW, Vasquez C, Mah W, Bernatchez J, et al.
Dynamics of hepatitis C Virus (HCV) RNA-dependent RNA Polymerase NS5B
in Complex with RNA. J Biol Chem. 2014;289(20):14399–14411.
42. Moerner WE, Fromm DP. Methods of single-molecule fluorescence spectroscopy.
Rev Scientific Instruments. 2003;74(8):3597–3619.
43. Greenleaf WJ, Woodside MT, Block SM. High-Resolution, Single-Molecule
Measurements of Biomolecular Motion. Annu Rev Biophys Biomol Struct.
2007;36:171–190.
44. Michalet X, Weiss S, Jäger M. Single-Molecule Fluorescence Studies of Protein
Folding and Conformational Dynamics. Chem Rev. 2006;106(5):1785–1813.
45. Diekmann S, Hoischen C. Biomolecular dynamics and binding studies in the
living cell. Physics of Life Reviews. 2014;11(1):1–30.
46. Hohlbein J, Craggs TD, Cordes T. Alternating-laser excitation: single-molecule
FRET and beyond. Chem Soc Rev. 2014;43:1156–1171.
47. Schlau-Cohen GS, Wang Q, Southall J, Cogdell RJ, Moerner WE.
Single-molecule spectroscopy reveals photosynthetic LH2 complexes switch
between emissive states. Proc Natl Acad Sci USA. 2013;110(27):10899–10903.
48. Moffat K. The frontiers of time-resolved macromolecular crystallography:
movies and chirped X-ray pulses. Faraday Discuss. 2003;122(79-88):65–77.
49. Schotte F, Lim M, Jackson TA, Smirnov AV, Soman J, Olson JS, et al.
Watching a protein as it functions with 150-ps time-resolved X-ray
crystallography. Science. 2003;300(5627):1944–1947.
50. Roy R, Hohng S, Ha T. A practical guide to single-molecule FRET. Nature
Methods. 2008;5(6):507–516.
51. Lee HM, M KS, Kim HM, Suh YD. Single-molecule surface-enhanced Raman
spectroscopy: a perspective on the current status. Phys Chem Chem Phys.
2013;15:5276–5287.
52. Socher E, Imperiali B. FRET-CAPTURE: A sensitive method for the detection
of dynamic protein interactions. Chem Biochem. 2013;14(1):53–57.
53. Gall A, Ilioaia C, Krüger TP, Novoderezhkin VI, Robert B, van Grondelle R.
Conformational Switching in a Light-Harvesting Protein as Followed by
Single-Molecule Spectroscopy. Biophys J. 2015;108(11):2713–2720.
54. Ådén J, Wolf-Watz M. NMR Identification of Transient Complexes Critical to
Adenylate Kinase Catalysis. J Am Chem Soc. 2007;129(45):14003–14012.
55. Russel D, Lasker K, Phillips J, Schneidman-Duhovny D, Velázquez-Muriel JA,
Sali A. The structural dynamics of macromolecular processes. Curr Opin Cell
Biol. 2009;21(1):97–108.
56. Taketomi H, Ueda Y, Go N. Studies on protein folding, unfolding and
fluctuations by computer simulation: The effect of specific amino acid sequence
represented by specific inter-unit interactions. Int J Peptide Prot Res.
1975;7(6):445–459.
57. Bashford D, Karplus M. pKa’s of ionizable groups in proteins: atomic detail
from a continuum electrostatic model. Biochemistry. 1990;29(44):10219–10225.
58. Lau KF, Dill AK. A lattice statistical mechanics model of the conformational
and sequence spaces of proteins. Macromolecules. 1989;22(10):3986–3997.
59. Unger R, Moult J. Finding lowest free energy conformation of a protein is an
NP-hard problem: Proof and implications. Bull Math Biol.
1993;55(6):1183–1198.
60. Hart WE, Istrail S. Robust Proofs of NP-Hardness for Protein Folding: General
Lattices and Energy Potentials. J Comp Biol. 1997;4(1):1–22.
61. Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC. A
three-dimensional model of the myoglobin molecule obtained by X-ray analysis.
Nature. 1958;181(4610):662–666.
62. Kendrew JC, Dickerson RE, Strandberg BE, Hart RG, Davies DR, Phillips DC,
et al. Structure of myoglobin: A three-dimensional fourier synthesis at 2 Å
resolution. Nature. 1960;185(4711):422–427.
63. Phillips DC. The Hen Egg-White Lysozyme Molecule. Proc Natl Acad Sci USA.
1967;57(3):483–495.
64. Berman HM, Henrick K, Nakamura H. Announcing the worldwide Protein Data
Bank. Nat Struct Biol. 2003;10(12):980–980.
65. Verlet L. Computer "experiments" on Classical Fluids. I. Thermodynamical
Properties of Lennard-Jones Molecules. Phys Rev Lett. 1967;159:98–103.
66. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M.
CHARMM: a program for macromolecular energy, minimization, and dynamics
calculations. J Comput Chem. 1983;4(2):187–217.
67. Karplus M, McCammon JA. Protein structural fluctuations during a period of
100 ps. Nature. 1979;277(5697):578.
68. Levitt M, Warshel A. Computer simulation of protein folding. Nature.
1975;253(5494):94–96.
69. Lifson S, Warshel A. A Consistent Force Field for Calculation on
Conformations, Vibrational Spectra and Enthalpies of Cycloalkanes and
n-Alkane Molecules. J Phys Chem. 1968;49:5116–5129.
70. Levitt M, Lifson S. Refinement of Protein Conformations Using a
Macromolecular Energy Minimization Procedure. J Mol Biol. 1969;46:269–279.
71. Gibson KD, Scheraga A. Minimization of Polypeptide Energy. I. Preliminary
Structures of Bovine Pancreatic Ribonuclease S-peptide. Proc Natl Acad Sci
USA. 1967;58:420–427.
72. Levitt M. A Simplified Representation of Protein Conformations for Rapid
Simulation of Protein Folding. J Mol Biol. 1976;104:59–107.
73. Warshel A, Levitt M. Theoretical Studies of Enzymatic Reactions: Dielectric,
Electrostatic and Steric Stabilization of the Carbonium Ion in the Reaction of
Lysozyme. J Mol Biol. 1976;103:227–249.
74. Warshel A. Computer simulations of enzyme catalysis: Methods, progress, and
insights. Annu Rev Biophys Biomol Struct. 2003;32:425–443.
75. Donchev AG, Ozrin VD, Subbotin MV, Tarasov OV, Tarasov VI. A Quantum
Mechanical Polarizable Force Field for Biomolecular Interactions. Proc Natl
Acad Sci USA. 2005;102(22):7829–7834.
76. Zhou H. Theoretical frameworks for multiscale modeling and simulation. Curr
Opinion Struct Biol. 2014;25:67–76.
77. Kamerlin SC, Haranczyk M, Warshel A. Progresses in Ab Initio QM/MM Free
Energy Simulations of Electrostatic Energies in Proteins: Accelerated QM/MM
Studies of pKa, Redox Reactions and Solvation Free Energies. J Phys Chem B.
2009;113(5):1253–1272.
78. Kamerlin SCL, Vicatos S, Dryga A, Warshel A. Coarse-Grained (Multiscale)
Simulations in Studies of Biophysical and Chemical Systems. Ann Rev Phys
Chem. 2011;62(1):41–64.
79. Plotnikov NV, Warshel A. Exploring, Refining, and Validating the
Paradynamics QM/MM Sampling. J Phys Chem B. 2012;116(34):10342–10356.
80. Vicatos S, Rychkova A, Mukherjee S, Warshel A. An effective Coarse-grained
model for biological simulations: Recent refinements and validations. Proteins:
Structure, Function, and Bioinformatics. 2014;82(7):1168–1185.
81. Warshel A. Energetics of enzyme catalysis. Proc Natl Acad Sci USA.
1978;75(11):5250–5254.
82. Mukherjee S, Warshel A. Electrostatic origin of the mechanochemical rotary
mechanism and the catalytic dwell of F1-ATPase. Proc Natl Acad Sci USA.
2011;108(51):20550–20555.
83. Mukherjee S, Warshel A. Realistic simulations of the coupling between the
protomotive force and the mechanical rotation of the F0-ATPase. Proc Natl
Acad Sci USA. 2012;109(3):14876–14881.
84. Dryga A, Chakrabarty S, Vicatos S, Warshel A. Realistic simulation of the
activation of voltage-gated ion channels. Proc Natl Acad Sci USA.
2011;109(9):3335–3340.
85. Rychkova A, Mukherjee S, Bora RP, Warshel A. Simulating the pulling of
stalled elongated peptide from the ribosome by the translocon. Proc Natl Acad
Sci USA. 2013;110(25):10195–10200.
86. Mukherjee S, Warshel A. Electrostatic origin of the unidirectionality of walking
myosin V motors. Proc Natl Acad Sci USA. 2013;110(43):17326–17331.
87. Ma J, Sigler PB, Xu Z, Karplus M. A Dynamic Model for the Allosteric
Mechanism of GroEL. J Mol Biol. 2000;302:303–313.
88. Henzler-Wildman KA, Thai V, Lei M, Ott M, Wolf-Watz M, Fenn T, et al.
Intrinsic motions along an enzymatic reaction trajectory. Nature.
2007;450(7171):838–844.
89. Gao YQ, Yang W, Karplus M. A structure-based model for the synthesis and
hydrolysis of ATP by F1-ATPase. Cell. 2005;123(2):195–205.
90. Pu J, Karplus M. How subunit coupling produces the γ-subunit rotary motion
in F1-ATPase. Proc Natl Acad Sci USA. 2008;105(4):1192–1197.
91. Scarabelli G, Grant BJ. Mapping the Structural and Dynamical Features of
Kinesin Motor Domains. PLoS Comput Biol. 2013;9(11):e1003329.
92. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation
of state calculations by fast computing machines. J Chem Phys.
1953;21(6):1087–1092.
93. Torrie GM, Valleau JP. Nonphysical sampling distributions in Monte Carlo
free-energy estimation: umbrella sampling. J Comput Phys. 1977;23(2):187–199.
94. Li Z, Scheraga HA. Monte Carlo-minimization approach to the multiple-minima
problem in protein folding. Proc Natl Acad Sci USA. 1987;84(19):6611–6615.
95. Dinner AR, Sali A, Karplus M. The folding mechanism of larger model proteins:
role of native structure. Proc Natl Acad Sci USA. 1996;93(16):8356–8361.
96. Lee J, Scheraga HA, Rackovsky S. New optimization method for conformational
energy calculations on polypeptides: Conformational space annealing. J Comput
Chem. 1997;18(9):1222–1232.
97. Lee J, Scheraga HA, Rackovsky S. Conformational analysis of the 20-residue
membrane-bound portion of melittin by conformational space annealing.
Biopolymers. 1998;46(2):103–115.
98. Lee J, Scheraga HA. Conformational space annealing by parallel computations:
Extensive conformational search of Met-enkephalin and of the 20-residue
membrane-bound portion of melittin. Int J Quantum Chem. 1999;75(3):255–265.
99. Voter AF. Introduction to the Kinetic Monte Carlo Method. In: Sickafus KE,
Kotomin EA, Uberuaga BP, editors. Radiation Effects in Solids. vol. 235 of
NATO Science Series. Springer Verlag; 2007. p. 1–23.
100. Levitt M. The birth of computational structural biology. Nat Struct Biol.
2001;8:392–393.
101. Karplus M. Development of multiscale models for complex chemical systems
from H+H2 to Biomolecules. Nobel Lecture. 2013;p. 1–33. Available from:
http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2013/karplus-lecture.pdf.
102. Warshel A. Multiscale modeling of biological functions: from enzymes to
molecular machines. Nobel Lecture. 2013;p. 1–25. Available from:
http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2013/warshel-lecture.pdf.
103. Levitt M. Birth and future of multiscale modeling for macromolecular systems.
Nobel Lecture. 2013;p. 1–31. Available from:
http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2013/levitt-lecture.pdf.
104. Piana S, Lindorff-Larsen K, Shaw DE. Protein folding kinetics and
thermodynamics from atomistic simulation. Proc Natl Acad Sci USA.
2012;109(44):17845–17850.
105. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE. How fast-folding proteins fold.
Science. 2011;334(6055):517–520.
106. Stone JE, Phillips JC, Freddolino PL, Hardy DJ, Trabuco LG, Schulten K.
Accelerating molecular modeling applications with graphics processors. J
Comput Chem. 2007;28(16):2618–2640.
107. Harvey MJ, Giupponi G, de Fabritiis G. ACEMD: Accelerating Biomolecular
Dynamics in the microsecond timescale. J Chem Theory Comput.
2009;5(6):1632–1639.
108. Tanner DE, Phillips JC, Schulten K. GPU/CPU Algorithm for Generalized
Born/Solvent-Accessible Surface Area Implicit Solvent Calculations. J Chem
Theory Comput. 2012;8(7):2521–2530.
109. Götz AW, Williamson MJ, Xu D, Poole D, Le Grand S, Walker RC. Routine
Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1.
Generalized Born. J Chem Theory Comput. 2012;8(5):1542–1555.
110. Dubrow A. What Got Done in One Year at NSF’s Stampede Supercomputer.
Comput Sci Eng. 2015;17(2):83–88.
111. Zhao G, Perilla JR, Yufenyuy EL, Meng X, Chen B, Ning J, et al. Mature
HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular
dynamics. Nature. 2013;497(7451):643–646.
112. Perilla JR, Goh BC, Cassidy CK, Liu B, Bernardi RC, Rudack T, et al.
Molecular dynamics simulations of large macromolecular complexes. Curr Opin
Struct Biol. 2015;31:64–74.
113. Fattebert JL, Richards DF, Glosli JN. Dynamic load balancing algorithm for
molecular dynamics based on Voronoi cells domain decompositions. Comput
Phys Communic. 2012;183(12):2608–2615.
114. Proctor AJ, Lipscomb TJ, Zou A, Anderson JA, Cho SS. Performance Analyses
of a Parallel Verlet Neighbor List Algorithm for GPU-Optimized MD
Simulations; 2012.
115. Batcho P, Case DA, Schlick T. Optimized particle-mesh Ewald/multiple-time
step integration for molecular dynamics simulations. J Chem Phys.
2001;115(9):4003–4018.
116. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, et al.
Scalable molecular dynamics with NAMD. J Comput Chem.
2005;26(16):1781–1802.
117. Bradley P, Misura KMS, Baker D. Toward High-Resolution de Novo Structure
Prediction for Small Proteins. Science. 2005;309(5742):1868–1871.
118. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, et al.
ROSETTA3: an object-oriented software suite for the simulation and design of
macromolecules. Methods Enzymol. 2011;487:545–574.
119. Xu D, Zhang Y. Ab initio protein structure assembly using continuous structure
fragments and optimized knowledge-based force field. Proteins: Struct Funct
Bioinf. 2012;80(7):1715–1735.
120. Zhang Y. Interplay of I-TASSER and QUARK for template-based and ab initio
protein structure prediction in CASP10. Proteins. 2014;82(Suppl 2):175–187.
121. Grant BJ, Gorfe AA, McCammon JA. Large conformational changes in proteins:
signaling and other functions. Curr Opinion Struct Biol. 2010;20(2):142–147.
122. Prakash P, Gorfe AA. Lessons from computer simulations of Ras proteins in
solution and in membrane. Biochim Biophys Acta. 2013;1830(11):5211–5218.
123. Noé F, Schütte C, Vanden-Eijnden E, Reich L, Weikl TR. Constructing the
equilibrium ensemble of folding pathways from short off-equilibrium simulations.
Proc Natl Acad Sci USA. 2009;106(45):19011–19016.
124. Whitford PC, Sanbonmatsu KY, Onuchic JN. Biomolecular dynamics:
order-disorder transitions and energy landscapes. Reports on Progress in
Physics. 2012;75(7):076601.
125. Shehu A, Kavraki LE, Clementi C. Unfolding the Fold of Cyclic Cysteine-rich
Peptides. Protein Sci. 2008;17(3):482–493.
126. Shehu A, Kavraki LE, Clementi C. Multiscale Characterization of Protein
Conformational Ensembles. Proteins: Struct Funct Bioinf. 2009;76(4):837–851.
127. Diaz JF, Wroblowski B, Schlitter J, Engelborghs Y. Calculation of pathways for
the conformational transition between the GTP- and GDP-bound states of the
Ha-ras-p21 protein: calculations with explicit solvent simulations and
comparison with calculations in vacuum. Proteins. 1997;28(3):434–451.
128. Malmstrom RD, Lee CT, Van Wart AT, Amaro RE. Application of
Molecular-Dynamics Based Markov State Models to Functional Proteins. J
Chem Theory Comput. 2014;10(7):2648–2657.
129. Maragliano L, Vanden-Eijnden E, Roux B. Free Energy and Kinetics of
Conformational Transitions from Voronoi Tessellated Milestoning with
Restraining Potentials. J Chem Theory Comput. 2009;5(10):2589–2594.
130. Franklin J, Koehl P, Doniach S, Delarue M. MinActionPath: maximum
likelihood trajectory for large-scale structural transitions in a coarse-grained
locally harmonic energy landscape. Nucleic Acids Res. 2007;35(Web Server
issue):W477–W482.
131. Yang Z, Májek P, Bahar I. Allosteric Transitions of Supramolecular Systems
Explored by Network Models: Application to Chaperonin GroEL. PLoS Comput
Biol. 2009;5(4):e1000360.
132. Prinz JH, Keller B, Noé F. Probing molecular kinetics with Markov models:
metastable states, transition pathways and spectroscopic observables. Phys
Chem Chem Phys. 2011;13(38):16912–16927.
133. Beauchamp KA, Bowman GR, Lane TJ, Maibaum L, Haque IS, Pande VS.
MSMBuilder2: Modeling Conformational Dynamics at the Picosecond to
Millisecond Scale. J Chem Theory Comput. 2011;7(10):3412–3419.
134. Ravindranathan KP, Gallicchio E, Levy RM. Conformational equilibria and free
energy profiles for the allosteric transition of the ribose-binding protein. J Mol
Biol. 2005;353(1):196–210.
135. Pietrucci F, Marinelli F, Carloni P, Laio A. Substrate binding mechanism of
HIV-1 protease from explicit-solvent atomistic simulations. J Amer Chem Soc.
2009;131(33):11811–11818.
136. Buch I, Giorgino T, De Fabritiis G. Complete reconstruction of an
enzyme-inhibitor binding process by molecular dynamics simulations. Proc Natl
Acad Sci USA. 2011;108(25):10184–10189.
137. Feher VA, Durrant JD, Van Wart AT, Amaro RE. Computational approaches to
mapping allosteric pathways. Curr Opinion Struct Biol. 2014;25:98–103.
138. Held M, Metzner P, Prinz JH, Noé F. Mechanisms of protein-ligand association
and its modulation by protein mutations. Biophys J. 2011;100(3):701–710.
139. Held M, Noé F. Calculating kinetics and pathways of protein-ligand association.
Eur J Cell Biol. 2012;91(4):357–364.
140. Freddolino PL, Park S, Roux B, Schulten K. Force field bias in protein folding
simulations. Biophys J. 2009;96(9):3772–3780.
141. Vitalini F, Mey AS, Noé F, Keller BG. Dynamic properties of force fields. J
Chem Phys. 2015;142:084101.
142. Sakae Y, Okamoto Y. Optimizations of protein force fields. In: Liwo A, editor.
Computational Methods to Study the Structure and Dynamics of Biomolecules
and Biomolecular Processes. Berlin, Heidelberg: Springer-Verlag; 2014. p.
195–247.
143. Clementi C. Coarse-grained models of protein folding: Toy-models or predictive
tools? Curr Opinion Struct Biol. 2008;18:10–15.
144. Kleinjung J, Fraternali F. Design and application of implicit solvent models in
biomolecular simulations. Curr Opinion Struct Biol. 2014;25(100):126–134.
145. Dryga A, Warshel A. Renormalizing SMD: The Renormalization Approach and
Its Use in Long Time Simulations and Accelerated PMF Calculations of
Macromolecules. J Phys Chem B. 2010;114(39):12720–12728.
146. Chodera JD, Noé F. Markov state models of biomolecular conformational
dynamics. Curr Opinion Struct Biol. 2014;25:135–144.
147. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP,
et al. Atomic-Level Characterization of the Structural Dynamics of Proteins.
Science. 2010;330(6002):341–346.
148. Zagrovic B, Snow CD, Shirts MR, Pande VS. Simulation of folding of a small
alpha-helical protein in atomistic detail using worldwide-distributed computing.
J Mol Biol. 2002;323(5):927–937.
149. Wang K, Chodera JD, Yang Y, Shirts MR. Identifying ligand binding sites and
poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics.
J Computer-Aided Mol Des. 2013;27(12):989–1007.
150. Ando T, Skolnick J. Sliding of Proteins Non-specifically Bound to DNA:
Brownian Dynamics Studies with Coarse-Grained Protein and DNA Models.
PLoS Comput Biol. 2014;10(12):e1003990.
151. Marklund EG, Mahmutovic A, Berg OG, Hammar P, van der Spoel D, Fange D,
et al. Transcription-factor binding and sliding on DNA studied using micro- and
macroscopic models. Proc Natl Acad Sci USA. 2013;110(49):19796–19801.
152. Szöllősi D, Horváth T, Han K, Dokholyan NV, Tompa P, Kalmár L, et al.
Discrete molecular dynamics can predict helical prestructured motifs in
disordered proteins. PLoS ONE. 2014;9(4):e95795.
153. Shukla D, Hernández CX, Weber JK, Pande VS. Markov State Models Provide
Insights into Dynamic Modulation of Protein Function. Acc Chem Res.
2015;48(2):414–422.
154. Koshland D. Application of a theory of enzyme specificity to protein synthesis.
Proc Natl Acad Sci USA. 1958;44(2):98–104.
155. Bosshard HR. Molecular recognition by induced fit: how fit is the concept?
Physiology. 2001;16:171–173.
156. Ma B, Kumar S, Tsai C, Nussinov R. Folding funnels and binding mechanisms.
Protein Eng. 1999;12(9):713–720.
157. Tsai C, Ma B, Nussinov R. Folding and binding cascades: shifts in energy
landscapes. Proc Natl Acad Sci USA. 1999;96(18):9970–9972.
158. Tsai C, Kumar S, Ma B, Nussinov R. Folding funnels, binding funnels, and
protein function. Protein Sci. 1999;8(6):1181–1190.
159. Monod J, Wyman J, Changeux JP. On the nature of allosteric transitions: a
plausible model. J Mol Biol. 1965;12:88–118.
160. Lange OF, Lakomek NA, Farés C, Schröder GF, Walter KF, Becker S, et al.
Recognition Dynamics Up to Microseconds Revealed from an RDC-Derived
Ubiquitin Ensemble in Solution. Science. 2008;320(5882):1471–1475.
161. Csermely P, Palotai R, Nussinov R. Induced fit, conformational selection and
independent dynamic segments: an extended view of binding events. Trends
Biochem Sci. 2010;35(10):539–546.
162. Cui Q, Karplus M. Allostery and cooperativity revisited. Protein Sci.
2008;17(8):1295–1307.
163. Feixas F, Lindert S, Sinko W, McCammon JA. Exploring the role of receptor
flexibility in structure-based drug discovery. Biophys Chem. 2014;186:31–45.
164. Ewing TJ, Makino S, Skillman AG, Kuntz ID. DOCK 4.0: search strategies for
automated molecular docking of flexible molecule databases. J Comput Aided
Mol Des. 2001;15(5):411–428.
165. Kramer B, Rarey M, Lengauer T. Evaluation of the FLEXX incremental
construction algorithm for protein-ligand docking. Proteins: Struct Funct Bioinf.
1999;37(2):228–241.
166. Wagener M, Vlieg J, Nabuurs SB. Flexible protein-ligand docking using the
Fleksy protocol. J Comput Chem. 2012;33(12):1215–1217.
167. Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD. Improved
protein-ligand docking using GOLD. Proteins: Struct Funct Bioinf.
2003;52(4):609–623.
168. Verdonk ML, Chessari G, Cole JC, Hartshorn MJ, Murray CW, Nissink JW,
et al. Modeling water molecules in protein-ligand docking using GOLD. J Med
Chem. 2005;48(20):6504–6515.
169. Goodsell DS, Morris GM, Olson AJ. Automated docking of flexible ligands:
applications of AutoDock. J Mol Recogn. 1996;9(1):1–5.
170. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al.
AutoDock4 and AutoDockTools4: Automated docking with selective receptor
flexibility. J Comput Chem. 2009;30(16):2785–2791.
171. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of
docking with a new scoring function, efficient optimization, and multithreading.
J Comput Chem. 2010;31(2):455–461.
172. Vass M, Tarcsay A, Keserü GM. Multiple ligand docking by Glide: implications
for virtual second-site screening. J Comput Aided Mol Des. 2012;26(7):821–834.
173. Davis IW, Baker D. RosettaLigand docking with full ligand and receptor
flexibility. J Mol Biol. 2009;385(2):381–392.
174. Meiler J, Baker D. ROSETTALIGAND: Protein-small molecule docking with
full side-chain flexibility. Proteins: Struct Funct Bioinf. 2006;65(3):538–548.
175. Grosdidier A, Zoete V, Michielin O. SwissDock, a protein-small molecule
docking web service based on EADock DSS. Nucleic Acids Res. 2011;39(Suppl
2):W270–W277.
176. Spitzer R, Jain AN. Surflex-Dock: Docking benchmarks and real-world
application. J Comput Aided Mol Des. 2012;26(6):687–699.
177. Chakraborty S. DOCLASP-Docking ligands to target proteins using spatial and
electrostatic congruence extracted from a known holoenzyme and applying
simple geometrical transformations. F1000Research. 2014;3.
178. Ruiz-Carmona S, Alvarez-Garcia D, Foloppe N, Garmendia-Doval AB, Juhos S,
et al. rDock: A Fast, Versatile and Open Source Program for Docking Ligands
to Proteins and Nucleic Acids. PLoS Comput Biol. 2014;10(4):e1003571.
179. Li H, Leung KS, Ballester PJ, Wong MH. istar: A web platform for large-scale
protein-ligand docking. PLoS ONE. 2014;9(1):e85678.
180. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, et al.
Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical
Binding Free Energy Function. J Comput Chem. 1998;19(14):1639–1662.
181. Huang D, Caflisch A. Library screening by fragment-based docking. J Mol
Recogn. 2010;23(2):183–193.
182. Miranker A, Karplus M. Functionality maps of binding sites: a multiple copy
simultaneous search method. Proteins. 1991;11(1):29–34.
183. Dong J, Zhao H, Zhou T, Spiliotopoulos D, Rajendran C, Li XD, et al.
Structural Analysis of the Binding of Type I, I1/2, and II Inhibitors to Eph
Tyrosine Kinases. ACS Med Chem Lett. 2015;6(1):79–83.
184. Jones S, Thornton JM. Principles of protein-protein interactions. Proc Natl
Acad Sci USA. 1996;93(1):13–20.
185. Conte LL, Chothia C, Janin J. The atomic structure of protein-protein
recognition sites. J Mol Biol. 1999;285(5):2177–2198.
186. Norel R, Petrey D, Wolfson HJ, Nussinov R. Examination of shape
complementarity in docking of unbound proteins. Proteins. 1999;36(3):307–317.
187. Betts MJ, Sternberg MJ. An analysis of conformational changes on
protein-protein association: implications for predictive docking. Protein Eng.
1999;12(4):271–283.
188. Decanniere K, Transue TR, Desmyter A, Maes D, Muyldermans S, Wyns L.
Degenerate interfaces in antigen-antibody complexes. J Mol Biol.
2001;313(3):473–478.
189. Ferrari AM, Wei BQ, Costantino L, Shoichet BK. Soft Docking and Multiple
Receptor Conformations in Virtual Screening. J Med Chem.
2004;47(21):5076–5084.
190. Sherman W, Beard HS, Farid R. Use of an induced fit receptor structure in virtual
screening. Chem Biol Drug Des. 2006;67(1):83–84.
191. Nabuurs SB, Wagener M, De Vlieg J. A flexible approach to induced fit docking.
J Med Chem. 2007;50(26):6507–6518.
192. Ieong PU, Sorensen J, Vemu PL, Wong CW, Demir O, Williams NP, et al.
Progress towards automated Kepler scientific workflows for computer-aided drug
discovery and molecular simulations. Procedia Computer Science.
2014;29:1745–1755.
193. Amaro RE, Baron R, McCammon JA. An improved relaxed complex scheme for
receptor flexibility in computer-aided drug design. J Comput Aided Mol Des.
2008;22(9):693–705.
194. B-Rao C, Subramanian J, Sharma SD. Managing protein flexibility in docking
and its applications. Drug Discov today. 2009;14(7-8):394–400.
195. Lexa KW, Carlson HA. Protein flexibility in docking and surface mapping. Q
Rev Biophys. 2012;45(3):301–343.
196. Kokh DB, Wade RC, Wenzel W. Receptor flexibility in small-molecule docking
calculations. WIREs Comput Mol Sci. 2011;1(2):298–314.
197. Leach AR. Ligand docking to proteins with discrete side-chain flexibility. J Mol
Biol. 1994;235(1):345–356.
198. Tian S, Sun H, Pan P, Li D, Zhen X, Li Y, et al. Assessing an ensemble
docking-based virtual screening strategy for kinase targets by considering
protein flexibility. J Chem Inf Model. 2014;54(10):2664–2679.
199. Sorensen J, Demir O, Swift RV, Feher VA, Amaro RE. Molecular docking to
flexible targets. Method Mol Biol. 2015;1215:445–469.
200. Korb O, Olsson TS, Bowden SJ, Hall RJ, Verdonk ML, Liebeschuetz JW, et al.
Potential and limitations of ensemble docking. J Chem Inf Model.
2012;52(5):1262–1274.
201. Bohnuud T, Kozakov D, Vajda S. Evidence of conformational selection driving
the formation of ligand binding sites in protein-protein interfaces. PLoS Comput
Biol. 2014;10(10):e1003872.
202. Shan Y, Kim ET, Eastwood MP, Dror RO, Seeliger MA, Shaw DE. How does a
drug molecule find its target binding site? J Am Chem Soc.
2011;133(24):9181–9183.
203. Kaus JW, Arrar M, McCammon JA. Accelerated Adaptive Integration Method.
J Phys Chem B. 2014;118(19):5109–5118.
204. Wu X, Brooks BR. Toward canonical ensemble distribution from self-guided
Langevin dynamics simulation. J Chem Phys. 2011;134(13):134108.
205. Wu X, Hodoscek M, Brooks BR. Replica exchanging self-guided Langevin
dynamics for efficient and accurate conformational sampling. J Chem Phys.
2012;137(4):044106.
206. Kaus JW, Pierce LT, Walker RC, McCammon JA. Improving the Efficiency of
Free Energy Calculations in the Amber Molecular Dynamics Package. J Chem
Theory Comput. 2013;9(9):4131–4139.
207. Grant BJ, McCammon JA, Gorfe AA. Conformational Selection in G-Proteins:
Lessons from Ras and Rho. Biophys J. 2010;99(11):L87–L89.
208. Abankwa D, Hanzal-Bayer M, Ariotti N, Plowman SJ, Gorfe AA, Parton RG,
et al. A novel switch region regulates H-Ras membrane orientation and signal
output. EMBO J. 2008;27(5):727–735.
209. Gu RX, Liu LA, Wang YH, Xu Q, Wei DQ. Structural comparison of the
wild-type and drug-resistant mutants of the influenza A M2 proton channel by
molecular dynamics simulations. J Phys Chem B. 2013;117(20):6042–6051.
210. Bozdaganyan ME, Orekhov PS, Bragazzi NL, Panatto D, Amicizia D, Pechkova
E, et al. Docking and Molecular Dynamics (MD) Simulations in Potential Drugs
Discovery: An Application to Influenza Virus M2 Protein. American J Biochem
Biotech. 2014;10(3):180–188.
211. Waldmann M, Jirmann R, Hoelscher K, Wienke M, Niemeyer FC, Rehders D,
et al. A Nanomolar Multivalent Ligand as Entry Inhibitor of the Hemagglutinin
of Avian Influenza. J Am Chem Soc. 2014;136(2):783–788.
212. Greenway KT, LeGresley EB, Pinto BM. The influence of 150-cavity binders on
the dynamics of influenza A neuraminidases as revealed by molecular dynamics
simulations and combined clustering. PLoS ONE. 2013;8(3):e59873.
213. Goh BC, Rynkiewicz MJ, Cafarella TR, White MR, Hartshorn KL, Allen K,
et al. Molecular mechanisms of inhibition of influenza by surfactant protein D
revealed by large-scale molecular dynamics simulation. Biochemistry.
2013;52(47):8527–8538.
214. Woods CJ, Shaw KE, Mulholland AJ. Combined Quantum
Mechanics/Molecular Mechanics (QM/MM) Simulations for Protein-Ligand
Complexes: Free Energies of Binding of Water Molecules in Influenza
Neuraminidase. J Phys Chem B. 2014;119(3).
215. Ermak DL, McCammon J. Brownian dynamics with hydrodynamic interactions.
J Chem Phys. 1978;69(4):1352–1360.
216. ElSawy KM, Twarock R, Lane DP, Verma CS, Caves LS. Characterization of
the ligand receptor encounter complex and its potential for in silico
kinetics-based drug development. J Chem Theory Comput. 2011;8(1):314–321.
217. Mereghetti P, Wade RC. Atomic detail Brownian dynamics simulations of
concentrated protein solutions with a mean field treatment of hydrodynamic
interactions. J Phys Chem B. 2012;116(29):8523–8533.
218. ElSawy K, Verma CS, Joseph TL, Lane DP, Twarock R, Caves L. On the
interaction mechanisms of a p53 peptide and nutlin with the MDM2 and
MDMX proteins: a Brownian dynamics study. Cell Cycle. 2013;12(3):394–404.
219. Frazier Z, Alber F. A Computational Approach to Increase Time Scales in
Brownian Dynamics–Based Reaction-Diffusion Modeling. J Comput Biol.
2012;19(6):606–618.
220. Beck M, Topf M, Frazier Z, Tjong H, Xu M, Zhang S, et al. Exploring the
spatial and temporal organization of a cell’s proteome. J Struct Biol.
2011;173(3):483–496.
221. Tsai C, Nussinov R. A Unified View of "How Allostery Works". PLoS Comput
Biol. 2014;10(2):e1003394.
222. Lockless SW, Ranganathan R. Evolutionarily conserved pathways of energetic
connectivity in protein families. Science. 1999;286(5438):295–299.
223. Daily MD, Upadhyaya TJ, Gray JJ. Contact rearrangements form coupled
networks from local motions in allosteric proteins. Proteins. 2008;71(1):455–466.
224. Kannan N, Vishveshwara S. Identification of side-chain clusters in protein
structures by a graph spectral method. J Mol Biol. 1999;292(2):441–464.
225. van den Bedem H, Bhabha G, Yang K, Wright PE, Fraser JS. Automated
identification of functional dynamic contact networks from X-ray
crystallography. Nat Methods. 2013;10(9):896–902.
226. Boehr DD, Schnell JR, McElheny D, Bae SH, Duggan BM, Benkovic SJ, et al.
A distal mutation perturbs dynamic amino acid networks in dihydrofolate
reductase. Biochemistry. 2013;52(27):4605–4619.
227. Ferreiro DU, Hegler JA, Komives EA, Wolynes PG. Localizing frustration in
native proteins and protein assemblies. Proc Natl Acad Sci USA.
2007;104(50):19819–19824.
228. Brooks B, Karplus M. Harmonic dynamics of proteins: normal modes and
fluctuations in bovine pancreatic trypsin inhibitor. Proc Natl Acad Sci USA.
1983;80:6571–6575.
229. Go N, Noguti T, Nishikawa T. Dynamics of a small globular protein in terms of
low-frequency vibrational modes. Proc Natl Acad Sci USA.
1983;80(12):3696–3700.
230. Levitt M, Sander C, Stern PS. The normal-modes of a protein-native bovine
pancreatic trypsin-inhibitor. Intl J Quant Chem. 1983;Suppl 10:181–199.
231. Garcia AE. Large-amplitude nonlinear motions in proteins. Phys Rev Lett.
1992;68(17):2696–2699.
232. Amadei A, Linssen AB, Berendsen HJ. Essential dynamics of proteins. Proteins.
1993;17(4):412–425.
233. Lange OF, Grubmüller H. Full correlation analysis of conformational protein
dynamics. Proteins. 2008;70(4):1294–1312.
234. Girvan M, Newman MEJ. Community structure in social and biological
networks. Proc Natl Acad Sci USA. 2002;99(12):7821–7826.
235. McClendon CL, Friedland G, Mobley DL, Amirkhani H, Jacobson MP.
Quantifying correlations between allosteric sites in thermodynamic ensembles. J
Chem Theory Comput. 2009;5(9):2486–2502.
236. Sethi A, Eargle J, Black AA, Luthey-Schulten Z. Dynamical networks in
tRNA:protein complexes. Proc Natl Acad Sci USA. 2009;106(16):6620–6625.
237. Eargle J, Luthey-Schulten Z. NetworkView: 3D display and analysis of protein
RNA interaction networks. Bioinformatics. 2012;28(22):3000–3001.
238. Vanwart AT, Eargle J, Luthey-Schulten Z, Amaro RE. Exploring residue
component contributions to dynamical network models of allostery. J Chem
Theory Comput. 2012;8(8):2949–2961.
239. Kaya C, Armutlulu A, Ekesan S, Haliloglu T. MCPath: Monte Carlo path
generation approach to predict likely allosteric pathways and functional residues.
Nucleic Acids Res. 2013;41(Web Server Issue):W249–W255.
240. Johnston JM, Wang H, Provasi D, Filizola M. Assessing the relative stability of
dimer interfaces in G-protein coupled receptors. PLoS Comput Biol.
2012;8(8):e100264.
241. Filizola M, Wang SX, Weinstein H. Dynamic models of G-protein coupled
receptor dimers: indications of asymmetry in the rhodopsin dimer from
molecular dynamics simulations in a POPC bilayer. J Comput Aided Mol Des.
2006;20(7-8):405–416.
242. Chen R, Li L, Weng Z. ZDock: an initial-stage protein-docking algorithm.
Proteins: Struct Funct Bioinf. 2003;52(1):80–87.
243. Dominguez C, Boelens R, Bonvin AMJJ. HADDOCK: A protein-protein
docking approach based on biochemical or biophysical information. J Am Chem
Soc. 2003;125:1731–1737.
244. Comeau SR, Gatchell DW, Vajda S, Camacho CJ. ClusPro: a fully automated
algorithm for protein-protein docking. Nucl Acids Res. 2004;32(S1):W96–W99.
245. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. PatchDock and
SymmDock: servers for rigid and symmetric docking. Nucl Acids Res.
2005;33(S2):W363–W367.
246. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. Geometry based
flexible and symmetric protein docking. Proteins: Struct Funct Bioinf.
2005;60(2):224–231.
247. Zacharias M. ATTRACT: protein-protein docking in CAPRI using a reduced
protein model. Proteins: Struct Funct Bioinf. 2005;60(2):252–256.
248. Tovchigrechko A, Vakser IA. GRAMM-X public web server for protein-protein
docking. Nucl Acids Res. 2006;34(Web Server issue):W310–W314.
249. Cheng TM, Blundell TL, Fernandez-Recio J. pyDock: electrostatics and
desolvation for effective scoring of rigid-body protein-protein docking. Proteins.
2007;68(2):503–515.
250. Terashi G, Takeda-Shitaka M, Kanou K, Iwadate M, Takaya D, Umeyama H.
The SKE-DOCK server and human teams based on a combined method of shape
complementarity and free energy estimation. Proteins: Struct Funct Bioinf.
2007;69(4):866–887.
251. Lyskov S, Gray JJ. The RosettaDock server for local protein-protein docking.
Nucl Acids Res. 2008;36(S2):W233–W238.
252. Huang SY, Zou X. MDockPP: A hierarchical approach for protein-protein
docking and its application to CAPRI rounds 15-19. Proteins: Struct Funct
Bioinf. 2010;78(15):3096–3103.
253. Mukherjee S, Zhang Y. Protein-Protein Complex Structure Predictions by
Multimeric Threading and Template Recombination. Structure.
2011;19(7):955–966.
254. Guerler A, Govindarajoo B, Zhang Y. Mapping Monomeric Threading to
Protein-Protein Structure Prediction. J Chem Inf and Model.
2013;53(3):717–725.
255. Cavalli A, Salvatella X, Dobson CM, Vendruscolo M. Protein structure
determination from NMR chemical shifts. Proc Natl Acad Sci USA.
2007;104(23):9615–9620.
256. Lensink MF, Wodak SJ. Docking and scoring protein interactions: CAPRI 2009.
Proteins: Struct Funct Bioinf. 2009;78(15):3073–3084.
257. Lensink MF, Wodak SJ. Blind predictions of protein interfaces by docking
calculations in CAPRI. Proteins: Struct Funct Bioinf. 2010;78(15):3085–3095.
258. Mashiach E, Nussinov R, Wolfson HJ. FiberDock: Flexible induced-fit backbone
refinement in molecular docking. Proteins: Struct Funct Bioinf.
2010;78(6):1503–1519.
259. Pedotti M, Simonelli L, Livoti E, Varani L. Computational Docking of
Antibody-Antigen Complexes, Opportunities and Pitfalls Illustrated by
Influenza Hemagglutinin. Int J Mol Sci. 2011;12:226–251.
260. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, et al.
Protein-protein docking with simultaneous optimization of rigid-body
displacement and side-chain conformations. J Mol Biol. 2003;331(1):281–299.
261. Chaudhury S, Berrondo M, Weitzner BD, Muthu P, Bergman H, Gray JJ.
Benchmarking and Analysis of Protein Docking Performance in Rosetta v3.2.
PLoS ONE. 2011;6(8):e22477.
262. Ellingson SR, Miao Y, Baudry J, Smith JC. Multi-Conformer Ensemble
Docking to Difficult Protein Targets. J Phys Chem B. 2015;119(3):1026–1034.
263. Kozakov D, Beglov D, Bohnuud T, Mottarella SE, Xia B, Hall DR, et al. How
good is automated protein docking? Proteins: Struct Funct Bioinf.
2013;81(12):2159–2166.
264. Moitessier N, Englebienne P, Lee D, Lawandi J, Corbeil CR. Towards the
development of universal, fast and highly accurate docking/scoring methods: a
long way to go. British J Pharmacology. 2009;153(S1):S7–S27.
265. Zhu H, Domingues FS, Sommer I, Lengauer T. NOXclass: prediction of
protein-protein interaction types. BMC Bioinf. 2006;7:27.
266. Moreira IS, Fernandes PA, Ramos MJ. Hot spots-A review of the protein-protein
interface determinant amino-acid residues. Proteins. 2007;68(4):803–812.
267. Li N, Sun Z, Jiang F. Prediction of protein-protein binding site by using core
interface residue and support vector machine. BMC Bioinf. 2008;9:553.
268. Liu Q, Li J. Propensity vectors of low-ASA residue pairs in the distinction of
protein interactions. Proteins. 2009;78(3):589–602.
269. Hashmi I, Shehu A. idDock+: Integrating Machine Learning in Probabilistic
Search for Protein-protein Docking. J Comp Biol. 2015;22(9):1–18.
270. Russel D, Lasker K, Webb B, Velazquez-Muriel J, Tjioe E, Schneidman-Duhovny D, et al.
Putting the pieces together: integrative modeling platform software for structure
determination of macromolecular assemblies. PLoS Biology. 2012;10(1):e1001244.
271. Montalvao RW, Cavalli A, Salvatella X, Blundell TL, Vendruscolo M. Structure
determination of protein-protein complexes using NMR chemical shifts: case of
an endonuclease colicin-immunity protein complex. J Am Chem Soc.
2008;130(4):15990–1596.
272. Das R, André I, Shen Y, Wu Y, Lemak A, Bansal S, et al. Simultaneous
prediction of protein folding and docking at high resolution. Proc Natl Acad Sci
USA. 2009;106(45):18978–18983.
273. Cavalli A, Montalvao RW, Vendruscolo M. Using Chemical Shifts to Determine
Structural Changes in Proteins upon Complex Formation. J Phys Chem B.
2011;115(30):9491–9494.
274. Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, et al.
Determining the architectures of macromolecular assemblies. Nature.
2007;450(7170):683–694.
275. Fernandez-Martinez J, Phillips J, Sekedat MD, Diaz-Avalos R, Velazquez-Muriel
J, Franke JD, et al. Structure-function mapping of a heptameric module in the
nuclear pore complex. J Cell Biol. 2012;196(4):419–434.
276. Wang L, Yang MQ, Yang JY. Prediction of DNA-binding residues from protein
sequence information using random forests. BMC Genomics. 2009;10(Suppl1):S1.
277. Ofran Y, Mysore V, Rost B. Prediction of DNA-binding residues from sequence.
Bioinformatics. 2007;23(13):347–353.
278. Qin S, Zhou H. Structural Models of Protein-DNA Complexes Based on
Interface Prediction and Docking. Curr Protein Pept Sci. 2011;12(6):531–539.
279. Roberts VA, Pique ME, Ten Eyck LF, Li S. Predicting protein–DNA
interactions by full search computational docking. Proteins.
2013;8(12):2106–2118.
280. van Dijk M, van Dijk AD, Hsu V, Boelens R, Bonvin AM. Information-driven
protein-DNA docking using HADDOCK: it is a matter of flexibility. Nucleic
Acids Res. 2013;34(11):3317–3325.
281. Persikov AV, Wetzel JL, Rowland EF, Oakes BL, Xu DJ, Singh M, et al. A
systematic survey of the Cys2His2 zinc finger DNA-binding landscape. Nucleic
Acids Res. 2015;43(3):1965–1984.
282. Ghersi D, Singh M. Interaction-based discovery of functionally important genes in
cancers. Nucleic Acids Res. 2014;42(3):e18.
283. Ferré S, Navarro G, Casadó V, Cortés A, Mallol J, Canela EI, et al. G
protein-coupled receptor heteromers as new targets for drug development. Prog
Mol Biol Transl Sci. 2011;91:41–54.
284. Pietsch EC, Perchiniak E, Canutescu AA, Wang G, Dunbrack RL, Murphy ME.
Oligomerization of BAK by p53 utilizes conserved residues of the p53 DNA
binding domain. J Biol Chem. 2008;283(30):21294–21304.
285. Inbar Y, Benyamini H, Nussinov R, Wolfson HJ. Combinatorial docking
approach for structure prediction of large proteins and multi-molecular
assemblies. Phys Biol. 2005;2:S156–S165.
286. Inbar Y, Benyamini H, Nussinov R, Wolfson HJ. Prediction of multimolecular
assemblies by multiple docking. J Mol Biol. 2005;349(2):435–447.
287. Potluri S, Yan AK, Chou JJ, Donald BR, Bailey-Kellogg C. Structure
determination of symmetric homo-oligomers by a complete search of symmetry
configuration space, using NMR restraints and van der Waals packing. Proteins:
Struct Funct Bioinf. 2006;65(1):203–219.
288. Sgourakis NG, Lange OF, DiMaio F, Andre I, Fitzkee NC, Rossi P, et al.
Determination of the Structures of Symmetric Protein Oligomers from NMR
Chemical Shifts and Residual Dipolar Couplings. J Am Chem Soc.
2011;133(16):6288–6298.
289. Martin JW, Yan AK, Bailey-Kellogg C, Zhou P, Donald BR. A geometric
arrangement algorithm for structure determination of symmetric protein
homo-oligomers from NOEs and RDCs. J Comp Biol. 2011;18(11):1507–1523.
290. DiMaio F, Leaver-Fay A, Bradley P, Baker D, Andre I. Modeling Symmetric
Macromolecular Structures in Rosetta3. PLoS ONE. 2011;6(6):e20450.
291. Pierce B, Tong W, Weng Z. M-ZDOCK: a grid-based approach for Cn
symmetric multimer docking. Bioinformatics. 2004;21(8):1472–1478.
292. Esquivel-Rodriguez J, Yang YD, Kihara D. Multi-LZerD: Multiple protein
docking for asymmetric complexes. Proteins: Struct Funct Bioinf.
2012;80(7):1818–1833.
293. Robustelli P, Kohlhoff K, Cavalli A, Vendruscolo M. Using NMR Chemical Shifts as
Structural Restraints in Molecular Dynamics Simulations of Proteins. Structure.
2010;18(8):923–933.
294. Camilloni C, Cavalli A, Vendruscolo M. Assessment of the Use of NMR
Chemical Shifts as Replica-Averaged Structural Restraints in Molecular
Dynamics Simulations to Characterize the Dynamics of Proteins. J Phys Chem B.
2012;117(6):1838–1843.
295. Kannan A, Camilloni C, Sahakyan AB, Cavalli A, Vendruscolo M. A
Conformational Ensemble Derived Using NMR Methyl Chemical Shifts Reveals
a Mechanical Clamping Transition That Gates the Binding of the HU Protein to
DNA. J Am Chem Soc. 2014;136(6):2204–2207.
296. Pietrucci F, Mollica L, Blackledge M. Mapping the Native Conformational
Ensemble of Proteins from a Combination of Simulations and Experiments: New
Insight into the src-SH3 Domain. J Phys Chem Lett. 2013;4(11):1943–1948.
297. Wall ME, Van Benschoten AH, Sauter NK, Adams PD, Fraser JS, Terwilliger
TC. Conformational dynamics of a crystalline protein from microsecond-scale
molecular dynamics simulations and diffuse X-ray scattering. Proc Natl Acad
Sci USA. 2014;111(50):17887–17892.
298. König G, Brooks BR. Correcting for the free energy costs of bond or angle
constraints in molecular dynamics simulations. Biochim Biophys Acta.
2014;1850(5):932–942.
299. Mustoe AM, Brooks CL, Al-Hashimi HM. Topological constraints are major
determinants of tRNA tertiary structure and dynamics and provide basis for
tertiary folding cooperativity. Nucleic Acids Res. 2014;42(18):11792–11804.
300. Wu X, Subramaniam S, Case DA, Wu KW, Brooks BR. Targeted
conformational search with map-restrained self-guided Langevin dynamics:
Application to flexible fitting into electron microscopic density maps. J Struct
Biol. 2013;183(3):429–440.
301. Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K. Combining Experiments
and Simulations Using the Maximum Entropy Principle. PLoS Comput Biol.
2014;10(2):e1003406.
302. Granata D, Camilloni C, Vendruscolo M, Laio A. Characterization of the
free-energy landscapes of proteins by NMR-guided metadynamics. Proc Natl
Acad Sci USA. 2013;110(17):6817–6822.
303. Humphrey W, Dalke A, Schulten K. VMD - Visual Molecular Dynamics. J Mol
Graph Model. 1996;14(1):33–38. http://www.ks.uiuc.edu/Research/vmd/.
304. Cavalli A, Camilloni C, Vendruscolo M. Molecular dynamics simulations with
replica-averaged structural restraints generate structural ensembles according to
the maximum entropy principle. J Chem Phys. 2013;138(9):094112.
305. Bonvin AM, Boelens R, Kaptein R. Time- and ensemble-averaged direct NOE
restraints. J Biomol NMR. 1994;4(1):143–149.
306. Kessler H, Griesinger C, Lautz J, Mueller A, van Gunsteren WF, Berendsen
HJC. Conformational dynamics detected by nuclear magnetic resonance NOE
values and J coupling constants. J Am Chem Soc. 1988;110(11):3393–3396.
307. Loquet A, Sgourakis NG, Gupta R, Giller K, Riedel D, Goosmann C, et al.
Atomic model of the type III secretion system needle. Nature.
2012;486(7402):276–279.
308. Pieper U, Schlessinger A, Kloppmann E, Chang GA, Chou JJ, Dumont ME,
et al. Coordinating the impact of structural genomics on the human α-helical
transmembrane proteome. Nature Struct & Mol Biol. 2013;20(2):135–138.
309. Torda AE, Scheek RM, van Gunsteren WF. Time-dependent distance restraints
in molecular dynamics simulations. Chem Phys Lett. 1989;157(4):289–294.
310. Vendruscolo M, Paci E, Dobson CM, Karplus M. Three key residues form a
critical contact network in a protein folding transition state. Nature.
2001;409(6820):641–645.
311. Gong H, Shen Y, Rose GD. Building native protein conformation from NMR
backbone chemical shifts using Monte Carlo fragment assembly. Protein Sci.
2007;16(8):1515–1521.
312. Richter B, Gsponer J, Várnai P, Salvatella X, Vendruscolo M. The MUMO
(minimal under-restraining minimal over-restraining) method for the
determination of native state ensembles of proteins. J Biomol NMR.
2007;37(2):117–135.
313. Montalvao RW, De Simone A, Vendruscolo M. Determination of structural
fluctuations of proteins from structure-based calculations of residual dipolar
couplings. J Biomol NMR. 2012;53(4):281–292.
314. Fu B, Kukic P, Camilloni C, Vendruscolo M. MD Simulations of Intrinsically
Disordered Proteins with Replica-Averaged Chemical Shift Restraints. Biophys
J. 2014;106(2):481a.
315. Shen Y, Bax A. Homology modeling of larger proteins guided by chemical shifts.
Nature Methods. 2015;12(8):747–750.
316. Nasedkin A, Marcellini M, Religa TL, Freund SM, Menzel A, Fersht AR, et al.
Deconvoluting Protein (Un)folding Structural Ensembles Using X-Ray
Scattering, Nuclear Magnetic Resonance Spectroscopy and Molecular Dynamics
Simulation. PLoS One. 2015;10(5):e0125662.
317. de Groot BL, van Aalten DM, Scheek RM, Amadei A, Vriend G, Berendsen HJ.
Prediction of protein conformational freedom from distance constraints.
Proteins. 1997;29(2):240–251.
318. Wells SA. Geometric simulation of flexible motion in proteins. Methods Mol
Biol. 2014;1084:173–192.
319. Wells S, Menor S, Hespenheide B, Thorpe MF. Constrained geometric
simulation of diffusive motion in proteins. Phys Biol. 2005;2(4):127–136.
320. Shehu A, Clementi C, Kavraki LE. Modeling Protein Conformational
Ensembles: From Missing Loops to Equilibrium Fluctuations. Proteins: Struct
Funct Bioinf. 2006;65(1):164–179.
321. Shehu A, Clementi C, Kavraki LE. Sampling Conformation Space to Model
Equilibrium Fluctuations in Proteins. Algorithmica. 2007;48(4):303–327.
322. Shehu A, Kavraki LE, Clementi C. On the Characterization of Protein Native
State Ensembles. Biophys J. 2007;92(5):1503–1511.
323. Chubynsky M, Hespenheide B, Jacobs DJ, Kuhn LA, Lei M, Menor S, et al.
Constraint Theory Applied to Proteins. Nanotech Res J. 2008;2(1):61–72.
324. Clausen R, Shehu A. A Data-driven Evolutionary Algorithm for Mapping
Multi-basin Protein Energy Landscapes. J Comp Biol. 2015;22(9):844–860.
325. Huang YPJ, Montelione GT. Structural biology: Proteins flex to function.
Nature. 2005;438(7064):36–37.
326. Takala H, Björling A, Berntsson O, Lehtivuori H, Niebling S, Hoernke M, et al.
Signal amplification and transduction in phytochrome photosensors. Nature.
2014;509(7499):245–248.
327. Majek P, Weinstein H, Elber R. 13. In: Voth GA, editor. Pathways of
conformational transitions in proteins. Taylor and Francis group; 2008. p.
185–203.
328. Nury H, Poitevin F, Van Renterghem C, Changeux JP, Corringer PJ, Delarue
M, et al. One-microsecond molecular dynamics simulation of channel gating in a
nicotinic receptor homologue. Proc Natl Acad Sci USA. 2010;107(14):6275–6280.
329. Calimet N, Simoes M, Changeux JP, Karplus M, Taly A, Cecchini M. A gating
mechanism of pentameric ligand-gated ion channels. Proc Natl Acad Sci USA.
2013;110(42):E3987–E3996.
330. Ma J, Karplus M. Molecular switch in signal transduction: reaction paths of the
conformational changes in ras p21. Proc Natl Acad Sci USA.
1997;94(22):11905–11910.
331. Ovchinnikov V, Karplus M. Analysis and Elimination of a Bias in Targeted
Molecular Dynamics Simulations of Conformational Transitions: Application to
Calmodulin. J Phys Chem B. 2012;116(29):8584–8603.
332. Hamelberg D, Mongan J, McCammon JA. Accelerated molecular dynamics: a
promising and efficient simulation method for biomolecules. J Chem Phys.
2004;120(24):11919–11929.
333. Yao XQ, Grant BJ. Domain opening and dynamic coupling in the alpha subunit
of heterotrimeric G proteins. Biophys J. 2013;105(2):L09–L10.
334. Beckstein O, Denning EJ, Perilla JR, Woolf TB. Zipping and unzipping of
adenylate kinase: atomistic insights into the ensemble of open-closed transitions.
J Mol Biol. 2009;394(1):160–176.
335. Zuckerman DM, Woolf TB. Efficient dynamic importance sampling of rare
events in one dimension. Phys Rev E. 2000;63(1):016702.
336. Perilla JR, Beckstein O, Denning EJ, Woolf TB. Computing ensembles of
transitions from stable states: dynamic importance sampling. J Comput Chem.
2011;32(2):196–209.
337. Krebs WG, Gerstein M. The morph server: a standardized system for analyzing
and visualizing macromolecular motions in a database framework. Nucleic Acids
Res. 2000;28(8):1665–1675.
338. Ye YZ, Godzik A. FATCAT: a web server for flexible structure comparison and
structure similarity searching. Nucleic Acids Res. 2004;32(Web Server
Issue):W582–W585.
339. Lindahl E, Azuara C, Koehl P, Delarue M. NOMAD-Ref: visualization,
deformation and refinement of macromolecular structures based on all-atom
normal mode analysis. Nucleic Acids Res. 2006;34(Web Server Issue):W52–W56.
340. Weiss DR, Levitt M. Can morphing methods predict intermediate structures? J
Mol Biol. 2009;385(2):665–674.
341. Kim KM, Jernigan RL, Chirikjian GS. Efficient generation of feasible pathways
for protein conformational transitions. Biophys J. 2002;83(3):1620–1630.
342. Chu JW, Trout BL, Brooks CLI. A super-linear minimization scheme for the
nudged elastic band method. J Chem Phys. 2003;119(24):12708–12717.
343. Maragliano L, Fiser A, Vanden-Eijnden EJ, Ciccotti G. String method in
collective variables: minimum free energy paths and isocommittor surfaces. J
Chem Phys. 2006;125:024106.
344. Weinan E, Ren W, Vanden-Eijnden E. Simplified and improved string method
for computing the minimum energy paths in barrier-crossing events. J Chem
Phys. 2007;126:164103.
345. Maragliano L, Vanden-Eijnden E. On-the-fly string method for minimum free
energy paths calculation. Chem Phys Lett. 2007;446:182–190.
346. Weinan E, Ren W, Vanden-Eijnden E. Finite temperature string methods for
the study of rare events. J Phys Chem. 2005;109:6688–6693.
347. Ren W, Vanden-Eijnden E, Maragakis P, Weinan E. Transition pathways in
complex systems: application of the finite-temperature string method to the
alanine dipeptide. J Chem Phys. 2005;123:134109.
348. Zhang BW, Jasnow D, Zuckerman DM. Efficient and verified simulation of a
path ensemble for conformational change in a united-residue model of
calmodulin. Proc Natl Acad Sci USA. 2007;104(46):18043–18048.
349. Adelman JL, Dale AL, Zwier MC, Bhatt D, Chong LT, Zuckerman DM, et al.
Simulations of the alternating access mechanism of the sodium symporter mhp1.
Biophys J. 2011;101(10):2399–2407.
350. Huber GA, Kim S. Weighted-ensemble Brownian dynamics simulations for
protein association reactions. Biophys J. 1996;70(1):97–110.
351. Jaillet L, Corcho FJ, Perez JJ, Cortes J. Randomized tree construction
algorithm to explore energy landscapes. J Comput Chem.
2011;32(16):3464–3474.
352. Haspel N, Moll M, Baker ML, Chiu W, Kavraki LE. Tracing conformational changes
in proteins. BMC Struct Biol. 2010;10(Suppl1):S1.
353. Molloy K, Shehu A. Elucidating the Ensemble of Functionally-relevant
Transitions in Protein Systems with a Robotics-inspired Method. BMC Struct
Biol. 2013;13(Suppl 1):S8.
354. Molloy K, Clausen R, Shehu A. A Stochastic Roadmap Method to Model
Protein Structural Transitions. Robotica. 2014;In press.
355. Dill KA, MacCallum JL. The Protein-Folding Problem, 50 Years On. Science.
2012;338(6110):1042–1046.
356. Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opinion Struct Biol.
2004;14:70–75.
357. Best RB. Atomistic molecular simulations of protein folding. Curr Opinion
Struct Biol. 2012;22(1):52–61.
358. Shaw DE, et al. Millisecond-scale molecular dynamics simulations on Anton. In:
Conf on High Performance Computing, Networking, Storage and Analysis
(SC09). New York, NY: ACM; 2009. p. 39.
359. Hess B, Kutzner C, Van der Spoel D, Lindahl E. GROMACS4: algorithms for
highly efficient, load-balanced, and scalable molecular simulation. J Chem
Theory Comput. 2008;4(3):435–447.
360. Case DA, Darden TA, Cheatham TEI, Simmerling CL, Wang J, Duke RE, et al.
AMBER 14. University of California, San Francisco; 2014.
361. Shirts M, Pande VS. COMPUTING: Screen Savers of the World Unite! Science.
2000;290(5498):1903–1904.
362. Snow CD, Zagrovic B, Pande VS. The Trp-cage: folding kinetics and unfolded
state topology via molecular dynamics simulations. J Am Chem Soc.
2002;124(49):14548–14549.
363. Singhal N, Snow CD, Pande VS. Using path sampling to build better Markovian
state models: Predicting the folding rate and mechanism of a tryptophan zipper
beta hairpin. J Chem Phys. 2004;121(1):415–425.
364. Jayachandran G, Vishal V, Pande VS. Using massively parallel simulation and
Markovian models to study protein folding: Examining the dynamics of the
villin headpiece. J Chem Phys. 2006;124(16):164902–164914.
365. Seibert MM, Patriksson AP, Hess B, van der Spoel D. Reproducible Polypeptide
Folding and Structure Prediction using Molecular Dynamics Simulations. J Mol
Biol. 2005;354(1):173–183.
366. Sosnick TR, Hinshaw JR. How proteins fold. Science. 2011;334(6055):464–465.
367. Stigler J, Ziegler F, Gieseke A, Gebhardt JC, Rief M. The complex folding
network of single calmodulin molecules. Science. 2011;334(6055):512–516.
368. Best RB, Hummer G, Eaton WA. Native contacts determine protein folding
mechanisms in atomistic simulations. Proc Natl Acad Sci USA.
2013;110(44):17874–17879.
369. Maity H, Maity M, Krishna MG, Mayne L, Englander SW. Protein folding: the
stepwise assembly of foldon units. Proc Natl Acad Sci USA.
2005;102(13):4741–4746.
370. Bai Y, Sosnick TR, Mayne L, Englander SW. Protein folding intermediates:
native state hydrogen exchange. Science. 1995;269(5221):192–197.
371. Walters BT, Mayne L, Hinshaw JR, Sosnick TR, Englander SW. Folding of a
large protein at high structural resolution. Proc Natl Acad Sci USA.
2013;110(47):18898–18903.
372. Beauchamp KA, Ensign DL, Das R, Pande VS. Quantitative comparison of
villin headpiece subdomain simulations and triplet–triplet energy transfer
experiments. Proc Natl Acad Sci USA. 2011;108(31):12734–12739.
373. Pande VS, Beauchamp K, Bowman GR. Everything you wanted to know about
Markov state models but were afraid to ask. Methods. 2010;52(1):99–105.
374. Prinz JH, Wu H, Sarich M, Keller B, Senne M, Held M, et al. Markov models of
molecular kinetics: generation and validation. J Chem Phys.
2011;134(17):174105.
375. Da LT, Sheong FK, Silva DA, Huang X. Application of Markov State Models to
simulate long timescale dynamics of biological macromolecules. Adv Exp Med
Biol. 2014;805:29–66.
376. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical
assessment of methods of protein structure prediction (CASP) – round x.
Proteins: Struct Funct Bioinf. 2014;82(S2):1–6.
377. Söding J, Biegert A, Lupas AN. The HHpred interactive server for protein
homology detection and structure prediction. Nucleic Acids Res. 2005;33(Web
Server Issue):W244.
378. Ko J, Park H, Seok C. GalaxyTBM: template-based modeling by building a
reliable core and refining unreliable local regions. BMC Bioinf.
2012;13(1):198–206.
379. Han KF, Baker D. Global properties of the mapping between local amino acid
sequence and local structure in proteins. Proc Natl Acad Sci USA.
1996;93(12):5814–5818.
380. Zhang Y. Progress and Challenges in protein structure prediction. Curr Opinion
Struct Biol. 2008;18(3):342–348.
381. Xu J, Zhang Y. How significant is a protein structure similarity with TM-score
= 0.5? Bioinformatics. 2010;26(7):889–895.
382. Zhang Y, Skolnick J. Scoring function for automated assessment of protein
structure template quality. Proteins: Struct Funct Bioinf.
2004;57(4):702–710.
383. Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated
protein structure and function prediction. Nat Protoc. 2010;5(4):725–738.
384. DeBartolo J, Colubri A, Jha AK, Fitzgerald JE, Freed KF, Sosnick TR.
Mimicking the folding pathway to improve homology-free protein structure
prediction. Proc Natl Acad Sci USA. 2009;106(10):3734–3739.
385. Simoncini D, Berenger F, Shrestha R, Zhang KYJ. A Probabilistic
Fragment-Based Protein Structure Prediction Algorithm. PLoS ONE.
2012;7(7):e38799.
386. Brunette TJ, Brock O. Guiding conformation space search with an all-atom
energy potential. Proteins: Struct Funct Bioinf. 2009;73(4):958–972.
387. Shehu A, Olson B. Guiding the Search for Native-like Protein Conformations
with an Ab-initio Tree-based Exploration. Int J Robot Res.
2010;29(8):1106–1127.
388. Olson B, Shehu A. Evolutionary-inspired probabilistic search for enhancing
sampling of local minima in the protein energy surface. Proteome Sci.
2012;10(10):S5.
389. Olson B, Hashmi I, Molloy K, Shehu A. Basin Hopping as a General and
Versatile Optimization Framework for the Characterization of Biological
Macromolecules. Advances in AI J. 2012;2012(674832).
390. Olson B, Shehu A. Rapid Sampling of Local Minima in Protein Energy Surface
and Effective Reduction through a Multi-objective Filter. Proteome Sci.
2013;11(Suppl1):S12.
391. Olson B, Jong KAD, Shehu A. Off-Lattice Protein Structure Prediction with
Homologous Crossover. In: Conf on Genetic and Evolutionary Computation
(GECCO). New York, NY: ACM; 2013. p. 287–294.
392. Olson B, Shehu A. Multi-Objective Stochastic Search for Sampling Local
Minima in the Protein Energy Surface. In: ACM Conf on Bioinf and Comp Biol
(BCB). Washington, D. C.; 2013. p. 430–439.
393. Zhou J, Yan W, Hu G, Shen B. Amino acid network for the discrimination of
native protein structures from decoys. Curr Protein Pept Sci.
2014;15(6):522–528.
394. Uversky VN. Natively unfolded proteins: a point where biology waits for
physics. Protein Sci. 2002;11:739–756.
395. Uversky VN. A decade and a half of protein intrinsic disorder: biology still
waits for physics. Protein Sci. 2013;22:693–724.
396. Monastyrskyy B, Kryshtafovych A, Moult J, Tramontano A, Fidelis K.
Assessment of protein disorder region predictions in CASP10. Proteins: Struct
Funct Bioinf. 2014;82(S2):127–137.
397. Varadi M, Kosol S, Lebrun P, Valentini E, Blackledge M, Dunker AK, et al.
pE-DB: a database of structural ensembles of intrinsically disordered and of
unfolded proteins. Nucleic Acids Res. 2014;42(Database issue):D326–335.
398. Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, et al.
DisProt: the database of disordered proteins. Nucleic Acids Res.
2007;35(suppl 1):D786–D793.
399. Fukuchi S, Sakamoto S, Nobe Y, Murakami SD, Amemiya T, Hosoda K, et al.
IDEAL: intrinsically disordered proteins with extensive annotations and
literature. Nucleic Acids Res. 2012;40(D1):D507–D511.
400. Rösner H, Papaleo E, Haxholm GW, Best RB, Kragelund BB, Lindorff-Larsen
K. CECAM workshop on intrinsically disordered proteins: Connecting
computation, physics, and biology ETH Zürich September 2nd to 5th, 2013.
Intrinsically Disordered Proteins. 2014;p. 1–5.
401. Dunker AK, Babu MM, Barbar E, Blackledge M, Bondos SE, Dosztányi Z, et al.
What’s in a name? Why these proteins are intrinsically disordered. Intrinsically
Disordered Proteins. 2013;1(1):e24157.
402. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK,
et al. Classification of intrinsically disordered regions and proteins. Chem Rev.
2014;114(13):6589–6631.
403. Nussinov R, Wolynes PG. A second molecular biology revolution? The energy
landscapes of biomolecular function. Phys Chem Chem Phys.
2014;16(14):6321–6322.
404. Csermely P, Sandhu KS, Hazai E, Hoksza Z, Kiss HJM, Miozzo F, et al.
Disordered proteins and network disorder in network descriptions of protein
structure, dynamics and function. Hypotheses and a comprehensive review.
Curr Protein Pept Sci. 2012;13(1):19–33.
405. Uversky VN. Unusual biophysics of intrinsically disordered proteins. Biochim
Biophys Acta. 2013;1834(5):932–951.
406. Luo Y, Ma B, Nussinov R, Wei G. Structural Insight into Tau Protein’s
Paradox of Intrinsically Disordered Behavior, Self-Acetylation Activity, and
Aggregation. J Phys Chem Lett. 2014;5(17):3026–3031.
407. Campen A, Williams RM, Brown CJ, Meng J, Uversky VN, Dunker AK.
TOP-IDP-scale: a new amino acid scale measuring propensity for intrinsic
disorder. Protein Pept Lett. 2008;15:956–963.
408. Jensen MR, Zweckstetter M, Huang J, Blackledge M. Exploring Free-Energy
Landscapes of Intrinsically Disordered Proteins at Atomic Resolution Using
NMR Spectroscopy. Chem Rev. 2014;114(13):6632–6660.
409. Deng X, Eickholt J, Cheng J. A comprehensive overview of computational
protein disorder prediction methods. Mol Biosyst. 2012;8:114–121.
410. Dosztányi Z, Mészáros B, Simon I. Bioinformatical approaches to characterize
intrinsically disordered/unstructured proteins. Briefings in Bioinformatics.
2009;p. bbp061.
411. Zhou H, Pang X, Lu C. Rate constants and mechanisms of intrinsically
disordered proteins binding to structured targets. Phys Chem Chem Phys.
2012;14(30):10466–10476.
412. Zhu X, Lopes PEM, Shim J, MacKerell AD. Intrinsic energy landscapes of
amino acid side-chains. J Chem Inf Model. 2012;52(6):1559–1572.
413. Palazzesi F, Prakash MK, Bonomi M, Barducci A. Accuracy of Current
All-Atom Force-Fields in Modeling Protein Disordered States. J Chem Theory
Comput. 2015;11(1):2–7.
414. Wang RY, Han Y, Krassovsky K, Sheffler W, Tyka M, Baker D. Modeling
disordered regions in proteins using Rosetta. PLoS ONE. 2011;6(7):e22060.
415. Jensen MR, Blackledge M. Testing the validity of ensemble descriptions of
intrinsically disordered proteins. Proc Natl Acad Sci USA.
2014;111(16):E1557–1558.
416. Lindorff-Larsen K, Trbovic N, Maragakis P, Piana S, Shaw DE. Structure and
Dynamics of an Unfolded Protein Examined by Molecular Dynamics Simulation.
J Am Chem Soc. 2012;134(8):3787–3791.
417. Parigi G, Rezaei-Ghaleh N, Giachetti A, Becker S, Fernandez C, Blackledge M,
et al. Long-Range Correlated Dynamics in Intrinsically Disordered Proteins. J
Am Chem Soc. 2014;136(46):16201–16209.
418. Zhang W, Chen J. Replica exchange with guided annealing for accelerated
sampling of disordered protein conformations. J Comput Chem.
2014;35(23):1682–1689.
419. Fleishman SJ, Baker D. Role of the biomolecular energy gap in protein design,
structure, and evolution. Cell. 2012;149(2):262–273.
420. Donald BR. Algorithms in structural molecular biology. Cambridge, MA: MIT
Press; 2011.
421. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of
a novel globular protein fold with atomic-level accuracy. Science.
2003;302(5649):1364–1368.
422. Ashworth J, Havranek JJ, Duarte CM, Sussman D, Monnat RJ, Stoddard BL,
et al. Computational redesign of endonuclease DNA binding and cleavage
specificity. Nature. 2006;441(7093):656–659.
423. Grigoryan G, Reinke AW, Keating AE. Design of protein-interaction specificity
gives selective bZIP-binding peptides. Nature. 2009;458(7240):859–864.
424. Havranek JJ, Duarte CM, Baker D. A simple physical model for the prediction
and design of protein-DNA interactions. J Mol Biol. 2004;344(1):59–70.
425. Havranek JJ, Harbury PB. Automated design of specificity in molecular
recognition. Nat Struct Biol. 2003;10(1):45–52.
426. Fleishman SJ, Khare SD, Koga N, Baker D. Restricted sidechain plasticity in
the structures of native proteins and complexes. Protein Sci. 2011;20(4):753–757.
427. Fleishman SJ, et al. Community-wide assessment of protein-interface modeling
suggests improvements to design methodology. J Mol Biol. 2011;414(2):289–302.
428. Jha RK, Leaver-Fay A, Yin S, Wu Y, Butterfoss GL, Szyperski T, et al.
Computational design of a PAK1 binding protein. J Mol Biol.
2010;400(2):257–270.
429. Karanicolas J, Corn JE, Chen I, Joachimiak LA, Dym O, Peck SH, et al. A de
novo protein binding pair by computational design and directed evolution.
Molecular Cell. 2011;42(2):250–260.
430. Richter F, Leaver-Fay A, Khare SD, Bjelic S, Baker D. De novo enzyme design
using Rosetta3. PLoS ONE. 2011;6(5):e19230.
431. Pabo C. Molecular technology. Designing proteins and peptides. Nature.
1983;301(5897):200.
432. Janin J. Conformation of amino acid sidechains in proteins. J Mol Biol.
1978;125(3):357–386.
433. Kuhlman B, Baker D. Native protein sequences are close to optimal for their
structures. Proc Natl Acad Sci USA. 2000;97(19):10383–10388.
434. Dunbrack R. Rotamer libraries in the 21st century. Curr Opinion Struct Biol.
2002;12(4):431–440.
435. Dunbrack R, Cohen FE. Bayesian statistical analysis of protein side-chain
rotamer preferences. Protein Sci. 1997;6(8):1661–1681.
436. Dunbrack R, Karplus M. Backbone-dependent rotamer library for proteins.
Application to side-chain prediction. J Mol Biol. 1993;230(2):543–574.
437. Poole AM, Ranganathan R. Knowledge-based potentials in protein design. Curr
Opinion Struct Biol. 2006;16(4):508–513.
438. Pierce NA, Winfree E. Protein Design is NP-hard. Protein Eng Des Sel.
2002;15(10):779–782.
439. Desmet J, de Maeyer M, Hazes B, Lasters I. The dead-end elimination theorem
and its use in protein side-chain positioning. Nature. 1992;356:539–542.
440. Gordon DB, Mayo SL. Branch-and-terminate: a combinatorial optimization
algorithm for protein design. Structure. 1999;7(9):1089–1098.
441. Hong EJ, Lippow SM, Tidor B, Lozano-Pérez T. Rotamer optimization for protein design
through MAP estimation and problem-size reduction. J Comput Chem.
2009;30(12):1923–1945.
442. Wernisch L, Hery S, Wodak SJ. Automatic protein design with all atom
force-fields by exact and heuristic optimization. J Mol Biol. 2000;301(3):713–736.
443. Althaus E, Kohlbacher O, Lenhof HP, Müller P. A combinatorial approach to
protein docking with flexible side chains. J Comp Biol. 2002;9(4):597–612.
444. Kingsford CL, Chazelle B, Singh M. Solving and analyzing side-chain
positioning problems using linear and integer programming. Bioinformatics.
2005;21(7):1028–1039.
445. Leaver-Fay A, Kuhlman B, Snoeyink J. An adaptive dynamic programming
algorithm for the side chain placement problem. In: Pac Symp Biocomput; 2005.
p. 16–27.
446. Traoré S, Allouche D, André I, de Givry S, Katsirelos G, Schiex T, et al. A new
framework for computational protein design through cost function network
optimization. Bioinformatics. 2013;29(17):2129–2136.
447. Li Z, Yang Y, Zhan J, Dai L, Zhou Y. Energy Functions in De Novo Protein
Design: Current Challenges and Future Prospects. Annu Rev Biophys.
2013;42:315–335.
448. Arnold FH. Combinatorial and computational challenges for biocatalyst design.
Nature. 2001;409(6817):253–257.
449. Gainza P, Roberts KE, Donald BR. Protein Design Using Continuous Rotamers.
PLOS Comput Biol. 2012;8:1.
450. Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, et al.
OSPREY: protein design with ensembles, flexibility, and provable algorithms.
Methods Enzymol. 2013;523:87.
451. Shapovalov MV, Dunbrack RL. A smoothed backbone-dependent rotamer
library for proteins derived from adaptive kernel density estimates and
regressions. Structure. 2011;19(6):844–858.
452. Reeve SM, Gainza P, Frey KM, Georgiev I, Donald BR, Anderson AC. Protein
design algorithms predict viable resistance to an experimental antifolate. Proc
Natl Acad Sci USA. 2015;112(3):749–754.
453. Voigt CA, Gordon DB, Mayo SL. Trading accuracy for speed: A quantitative
comparison of search algorithms in protein sequence design. J Mol Biol.
2000;299(3):789–803.
454. Desjarlais JR, Handel TM. De novo design of the hydrophobic cores of proteins.
Protein Sci. 1995;4(10):2006–2018.
455. Raha K, Wollacott AM, Italia MJ, Desjarlais JR. Prediction of amino acid
sequence from structure. Protein Sci. 2000;9(6):1106–1119.
456. Allen BD, Mayo SL. Dramatic performance enhancements for the FASTER
optimization algorithm. J Comput Chem. 2006;27(10):1071–1075.
457. Desmet J, Spriet J, Lasters I. Fast and accurate side-chain topology and energy
refinement (FASTER) as a new method for protein structure optimization.
Proteins: Struct Funct Bioinf. 2002;48(1):31–43.
458. Liu Y, Kuhlman B. RosettaDesign Server for protein design. Nucleic Acids Res.
2006;34(Web Server Issue):W235–W238.
459. Canutescu AA, Shelenkov AA, Dunbrack Jr RL. A graph-theory algorithm for
rapid protein side chain prediction. Protein Sci. 2003;12(9):2001–2014.
460. Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D. Progress in
modeling of protein structures and interactions. Science.
2005;310(5748):638–642.
461. Skolnick J. In quest of an empirical potential for protein structure prediction.
Curr Opinion Struct Biol. 2006;16(2):166–171.
462. Humphris EL, Kortemme T. Prediction of protein-protein interface sequence
diversity using flexible backbone computational protein design. Structure.
2008;16(12):1777–1788.
463. Smith CA, Kortemme T. Backrub-like backbone simulation recapitulates
natural protein conformational variability and improves mutant side-chain
prediction. J Mol Biol. 2008;380(4):742–756.
464. Friedland GD, Linares AJ, Smith CA, Kortemme T. A simple model of
backbone flexibility improves modeling of side-chain conformational variability.
J Mol Biol. 2008;380(4):757–774.
465. Smith CA, Kortemme T. Predicting the tolerated sequences for proteins and
protein interfaces using RosettaBackrub flexible backbone design. PLoS One.
2011;6(7):e20451.
466. Canutescu AA, Dunbrack RL. Cyclic Coordinate Descent: A Robotics
Algorithm for Protein Loop Closure. Protein Sci. 2003;12(5):963–972.
467. Georgiev I, Keedy D, Richardson JS, Richardson DC, Donald BR. Algorithm for
backrub motions in protein design. Bioinformatics. 2008;24(13):i196–204.
468. Keedy DA, Georgiev I, Triplett EB, Donald BR, Richardson DC, Richardson JS.
The role of local backrub motions in evolved and designed mutations. PLoS
Comp Biol. 2012;8(8):e1002629.
469. Murphy GS, Mills JL, Miley MJ, Machius M, Szyperski T, Kuhlman B.
Increasing sequence diversity with flexible backbone protein design: the complete
redesign of a protein hydrophobic core. Structure. 2012;20(6):1086–1096.
470. Ollikainen N, Kortemme T. Computational protein design quantifies structural
constraints on amino acid covariation. PLoS Comp Biol. 2013;9(11):e1003313.
471. Jana B, Morcos F, Onuchic JN. From structure to function: the convergence of
structure based models and co-evolutionary information. Phys Chem Chem
Phys. 2014;16(14):6496–6507.
472. Sandler I, Zigdon N, Levy E, Aharoni A. The functional importance of
co-evolving residues in proteins. Cell Mol Life Sci. 2014;71(4):673–682.
473. Kajan L, Hopf TA, Kalaus M, Marks DS, Rost B. FreeContact: fast and free
software for protein contact prediction from residue co-evolution. BMC Bioinf.
2014;15:85.
474. Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of
residue-residue interactions across protein interfaces using evolutionary
information. eLife. 2014;3:e02030.
475. Kosciolek T, Jones DT. De Novo Structure Prediction of Globular Proteins
Aided by Sequence Variation-Derived Contacts. PLoS ONE. 2014;9(3):e92197.
476. Huang H, Ozkirimli E, Post CB. A comparison of three perturbation molecular
dynamics methods for modeling conformational transitions. J Chem Theory
Comput. 2009;5(5):1301–1314.
477. Malek R, Mousseau N. Dynamics of Lennard-Jones clusters: A characterization
of the activation-relaxation technique. Phys Rev E. 2000;62(6):7723–7728.
478. Earl DJ, Deem MW. Parallel tempering: theory, applications, and new
perspectives. Phys Chem Chem Phys. 2005;7:3910–3916.
479. Arora K, Brooks CL III. Large-scale allosteric conformational transitions of
adenylate kinase appear to involve a population-shift mechanism. Proc Natl
Acad Sci USA. 2007;104(47):18496–18501.
480. Zhang Y, Kihara D, Skolnick J. Local energy landscape flattening: parallel
hyperbolic Monte Carlo sampling of protein folding. Proteins: Struct Funct
Bioinf. 2002;48(2):192–201.
481. Huber T, Torda AE, van Gunsteren WF. Local elevation: a method for
improving the searching properties of molecular dynamics simulation. J Comput
Aided Mol Design. 1994;8(6):695–708.
482. Schulze BG, Grubmueller H, Evanseck JD. Functional significance of
hierarchical tiers in carbonmonoxy myoglobin: conformational substates and
transitions studied by conformational flooding simulations. J Am Chem Soc.
2000;122(36):8700–8711.
483. Krueger P, Verheyden S, Declerck PJ, Engelborghs Y. Extending the
capabilities of targeted molecular dynamics: simulation of a large conformational
transition in plasminogen activator inhibitor 1. Protein Sci. 2001;10(4):798–808.
484. Schlitter J, Engels M, Krueger P. Targeted molecular dynamics - a new
approach for searching pathways of conformational transitions. Proteins: Struct
Funct Bioinf. 1994;12(2):84–89.
485. Mashl RJ, Jakobsson E. End-point targeted molecular dynamics: large-scale
conformational changes in potassium channels. Biophys J.
2008;94(11):4307–4319.
486. van der Vaart A, Karplus M. Minimum free energy pathways and free energy
profiles for conformational transitions based on atomistic molecular dynamics
simulations. J Chem Phys. 2007;126:164106.
487. Ding F, Tsao D, Nie H, Dokholyan NV. Ab initio folding of proteins with
all-atom discrete molecular dynamics. Structure. 2008;16(7):1010–1018.
488. Pan AC, Sezer D, Roux B. Finding transition pathways using the string method
with swarms of trajectories. J Phys Chem B. 2008;112(11):3432–3440.
489. Noé F, Doose S, Daidone I, Löllmann M, Sauer M, Chodera JD, et al.
Dynamical fingerprints for probing individual relaxation processes in
biomolecular dynamics with simulations and kinetic experiments. Proc Natl
Acad Sci USA. 2011;108(12):4822–4827.
490. Sim AYL, Minary P, Levitt M. Modeling nucleic acids. Curr Opinion Struct
Biol. 2012;22(3):273–278.
491. Schneidman-Duhovny D, Pellarin R, Sali A. Uncertainty in integrative
structural modeling. Curr Opinion Struct Biol. 2014;28:96–104.
492. Rohrdanz MA, Zheng W, Clementi C. Discovering mountain passes via
torchlight: methods for the definition of reaction coordinates and pathways in
complex macromolecular reactions. Annu Rev Phys Chem.
2013;64:295–316.
493. Kalyaanamoorthy S, Chen YPP. Modelling and enhanced molecular dynamics
to steer structure-based drug discovery. Prog Biophys Mol Biol.
2014;114(3):123–136.
494. Sponer J, Banas P, Jurecka P, Zgarbova M, Kuhrova P, Havrila M, et al.
Molecular Dynamics Simulations of Nucleic Acids. From Tetranucleotides to the
Ribosome. J Phys Chem Lett. 2014;5(10):1771–1782.
495. Biedermann J, Ullrich A, Schöneberg J, Noé F. ReaDDyMM: Fast Interacting
Particle Reaction-Diffusion Simulations Using Graphical Processing Units.
Biophys J. 2015;108(3):457–461.
496. Hamelberg D, Mongan J, McCammon JA. Accelerated molecular dynamics: a
promising and efficient simulation method for biomolecules. J Chem Phys.
2004;120(24):11919–11929.
497. Wang Y, Harrison CB, Schulten K, McCammon JA. Implementation of
accelerated molecular dynamics in NAMD. Computational Science & Discovery.
2011;4(1):015002.
498. Pierce LC, Salomon-Ferrer R, de Oliveira CAF, McCammon JA, Walker
RC. Routine access to millisecond time scale events with accelerated molecular
dynamics. J Chem Theory Comput. 2012;8(9):2997–3002.
499. Miao Y, Nichols SE, Gasper PM, Metzger VT, McCammon JA. Activation and
dynamic network of the M2 muscarinic receptor. Proc Natl Acad Sci USA.
2013;110(27):10982–10987.
500. Miao Y, Nichols SE, McCammon JA. Mapping of Allosteric Druggable Sites in
Activation-Associated Conformers of the M2 Muscarinic Receptor. Chem Biol &
Drug Design. 2014;83(2):237–246.
501. Sinko W, Miao Y, de Oliveira CAF, McCammon JA. Population Based
Reweighting of Scaled Molecular Dynamics. J Phys Chem B.
2013;117(42):12759–12768.
502. Tribello GA, Ceriotti M, Parrinello M. A self-learning algorithm for biased
molecular dynamics. Proc Natl Acad Sci USA. 2010;107(41):17509–17514.
503. Swendsen RH, Wang JS. Replica Monte Carlo simulation of spin glasses. Phys
Rev Lett. 1986;57:2607–2609.
504. Sugita Y, Okamoto Y. Replica-exchange molecular dynamics method for protein
folding. Chem Phys Lett. 1999;314(1):141–151.
505. Wang L, Friesner RA, Berne B. Replica exchange with solute scaling: A more
efficient version of replica exchange with solute tempering (REST2). J Phys
Chem B. 2011;115(30):9431–9438.
506. van der Spoel D, Seibert MM. Protein Folding Kinetics and Thermodynamics
from Atomistic Simulations. Phys Rev Lett. 2006;96(23):238102.
507. Hess B, Scheek RM. Orientation restraints in molecular dynamics simulations
using time and ensemble averaging. J Magn Reson. 2003;164(1):19–27.
508. De Simone A, Richter B, Salvatella X, Vendruscolo M. Toward an Accurate
Determination of Free Energy Landscapes in Solution States of Proteins. J Am
Chem Soc. 2009;131(11):3810–3811.
509. De Simone A, Montalvao RW, Vendruscolo M. Determination of Conformational
Equilibria in Proteins Using Residual Dipolar Couplings. J Chem Theory
Comput. 2011;7(12):4189–4195.
510. Allison JR, Hertig S, Missimer JH, Smith LJ, Steinmetz MO, Dolenc J. Probing
the Structure and Dynamics of Proteins by Combining Molecular Dynamics
Simulations and Experimental NMR Data. J Chem Theory Comput.
2012;8(10):3430–3444.
511. Markwick PRL, Nilges M. Computational approaches to the interpretation of
NMR data for studying protein dynamics. Chem Phys. 2012;396(2):124–134.
512. Salmon L, Pierce L, Grimm A, Roldan JO, Mollica L, Jensen MR, et al.
Multi-Timescale Conformational Dynamics of the SH3 Domain of
CD2-Associated Protein using NMR Spectroscopy and Accelerated Molecular
Dynamics. Angew Chem Int Ed Engl. 2012;51(25):6103–6106.
513. Jaynes ET. Information Theory and Statistical Mechanics. Phys Rev.
1957;106(4):620–630.
514. Roux B, Weare J. On the statistical equivalence of restrained-ensemble
simulations with the maximum entropy method. J Chem Phys.
2013;138(8):084107.
515. Fu B, Sahakyan AB, Camilloni C, Tartaglia GG, Paci E, Caflisch A, et al.
ALMOST: An all atom molecular simulation toolkit for protein structure
determination. J Comput Chem. 2014;35(14):1101–1105.
516. Camilloni C, Cavalli A, Vendruscolo M. Replica-Averaged Metadynamics. J
Chem Theory Comput. 2013;9(12):5610–5617.
517. Torrie GM, Valleau JP. Monte Carlo free energy estimates using non-Boltzmann
sampling: application to the sub-critical Lennard-Jones fluid. Chem Phys Lett.
1974;28(4):578–581.
518. Roux B. The calculation of the potential of mean force using computer
simulations. Computer Physics Communications. 1995;91(1):275–282.
519. Bartels C, Karplus M. Multidimensional adaptive umbrella sampling:
applications to main chain and side chain peptide conformations. J Comput
Chem. 1997;18(12):1450–1462.
520. Kumar S, Rosenberg JM, Bouzida D, Swendsen RH, Kollman PA. The weighted
histogram analysis method for free-energy calculations on biomolecules. I. The
method. J Comput Chem. 1992;13(8):1011–1021.
521. Zhu F, Hummer G. Convergence and error estimation in free energy calculations
using the weighted histogram analysis method. J Comput Chem.
2012;33(4):453–465.
522. Hub JS, de Groot BL, van der Spoel D. g_wham - A Free Weighted Histogram
Analysis Implementation Including Robust Error and Autocorrelation Estimates.
J Chem Theory Comput. 2010;6(12):3713–3720.
523. Wojtas-Niziurski W, Meng Y, Roux B, Berneche S. Self-learning adaptive
umbrella sampling method for the determination of free energy landscapes in
multiple dimensions. J Chem Theory Comput. 2013;9(4):1885–1895.
524. Snyder R, Wang B, Roark M, Feller SE. Replica Exchange Umbrella Sampling
Simulations Provide Insight into the Role of Docosahexaenoic Acid in
Modulating the Stability of Transmembrane Proteins. Biophys J.
2014;106(2):16a.
525. Krivov SV, Karplus M. Hidden complexity of free energy surfaces for peptide
(protein) folding. Proc Natl Acad Sci USA. 2004;101(41):14766–14770.
526. Zheng W, Rohrdanz MA, Maggioni M, Clementi C. Polymer reversal rate
calculated via locally scaled diffusion map. J Chem Phys. 2011;134(14):144109.
527. Rohrdanz MA, Zheng W, Maggioni M, Clementi C. Determination of reaction
coordinates via locally scaled diffusion map. J Chem Phys. 2011;134(12):124116.
528. Zheng W, Rohrdanz MA, Clementi C. Rapid Exploration of Configuration
Space with Diffusion-Map-Directed Molecular Dynamics. J Phys Chem B.
2013;117(42):12769–12776.
529. Preto J, Clementi C. Fast recovery of free energy landscapes via
diffusion-map-directed molecular dynamics. Phys Chem Chem Phys.
2014;16(36):19181–19191.
530. Becker OM, Karplus M. The topology of multidimensional potential energy
surfaces: Theory and application to peptide structure and kinetics. J Chem
Phys. 1997;106(4):1495–1517.
531. Doye J, Miller M, Wales D. Evolution of the Potential Energy Surface with Size
for Lennard-Jones Clusters. J Chem Phys. 1999;111(18):8417–8428.
532. Krivov SV, Karplus M. Free energy disconnectivity graphs: Application to
peptide models. J Chem Phys. 2002;117(23):10894–10903.
533. Rao F, Caflisch A. The protein folding network. J Mol Biol.
2004;342(1):299–306.
534. Muff S, Caflisch A. Kinetic analysis of molecular dynamics simulations reveals
changes in the denatured state and switch of folding pathways upon single-point
mutation of a β-sheet miniprotein. Proteins: Struct Funct Bioinf.
2008;70(4):1185–1195.
535. Caflisch A. Network and graph analyses of folding free energy surfaces. Curr
Opinion Struct Biol. 2006;16(1):71–78.
536. Krivov SV, Karplus M. Diffusive reaction dynamics on invariant free energy
profiles. Proc Natl Acad Sci USA. 2008;105(37):13841–13846.
537. Zhou R. Free energy landscape of protein folding in water: explicit vs. implicit
solvent. Proteins: Struct, Funct, Bioinf. 2003;53(2):148–161.
538. Barron LD, Hecht L, Wilson G. The lubricant of life: A proposal that solvent
water promotes extremely fast conformational fluctuations in mobile
heteropolypeptide structure. Biochemistry. 1997;36(43):13143–13147.
539. Singhal N, Pande VS. Error analysis and efficient sampling in Markovian state
models for molecular dynamics. J Chem Phys. 2005;123(20):204909.
540. Noé F, Fischer S. Transition networks for modeling the kinetics of
conformational change in macromolecules. Curr Opinion Struct Biol.
2008;18:154–162.
541. Pérez-Hernández G, Paul F, Giorgino T, De Fabritiis G, Noé F. Identification of
slow molecular order parameters for Markov model construction. J Chem Phys.
2013;139(1):015102.
542. Piana S, Lindorff-Larsen K, Shaw DE. Atomic-level description of ubiquitin
folding. Proc Natl Acad Sci USA. 2013;110(15):5915–5920.
543. Weber JK, Jack RL, Pande VS. Emergence of glass-like behavior in Markov
state models of protein folding dynamics. J Amer Chem Soc.
2013;135(15):5501–5504.
544. Deng Nj, Dai W, Levy RM. How kinetics within the unfolded state affects
protein folding: An analysis based on Markov state models and an ultra-long
MD trajectory. J Phys Chem B. 2013;117(42):12787–12799.
545. Voelz VA, Jäger M, Yao S, Chen Y, Zhu L, Waldauer SA, et al. Slow
unfolded-state structuring in Acyl-CoA binding protein folding revealed by
simulation and experiment. J Amer Chem Soc. 2012;134(30):12565–12577.
546. Weber M, Bujotzek A, Haag R. Quantifying the rebinding effect in multivalent
chemical ligand-receptor systems. J Chem Phys. 2012;137(5):054111.
547. Shukla D, Meng Y, Roux B, Pande VS. Activation pathway of Src kinase reveals
intermediate states as targets for drug design. Nature Communications. 2014;5.
548. Kohlhoff KJ, Shukla D, Lawrenz M, Bowman GR, Konerding DE, Belov D,
et al. Cloud-based simulations on Google Exacycle reveal ligand modulation of
GPCR activation pathways. Nature Chem. 2014;6(1):15–21.
549. Bowman GR, Geissler PL. Equilibrium fluctuations of a single folded protein
reveal a multitude of potential cryptic allosteric sites. Proc Natl Acad Sci USA.
2012;109(29):11681–11686.
550. Lin YS, Bowman GR, Beauchamp KA, Pande VS. Investigating how peptide
length and a pathogenic mutation modify the structural ensemble of amyloid
beta monomer. Biophys J. 2012;102(2):315–324.
551. Qiao Q, Bowman GR, Huang X. Dynamics of an intrinsically disordered protein
reveal metastable conformations that potentially seed aggregation. J Amer
Chem Soc. 2013;135(43):16092–16101.
552. Du WN, Bolhuis PG. Adaptive single replica multiple state transition interface
sampling. J Chem Phys. 2013;139(4):044105.
553. Noé F. Beating the millisecond barrier in molecular dynamics simulations.
Biophys J. 2015;108:228–229.
554. Laio A, Rodriguez-Fortea A, Gervasio FL, Ceccarelli M, Parrinello M. Assessing
the accuracy of metadynamics. J Phys Chem B. 2005;109(14):6714–6721.
555. Barducci A, Bonomi M, Parrinello M. Metadynamics. Wiley Interdisciplinary
Reviews: Computational Molecular Science. 2011;1(5):826–843.
556. Bonomi M, Branduardi D, Bussi G, Camilloni C, Provasi D, Raiteri P, et al.
PLUMED: a portable plugin for free-energy calculations with molecular
dynamics. Comput Phys Communications. 2009;180(10):1961–1972.
557. Bonomi M, Branduardi D, Gervasio FL, Parrinello M. The unfolded ensemble
and folding mechanism of the C-terminal GB1 β-hairpin. J Am Chem Soc.
2008;130(42):13938–13944.
558. Piana S, Laio A, Marinelli F, Van Troys M, Bourry D, Ampe C, et al.
Predicting the effect of a point mutation on a protein fold: the villin and advillin
headpieces and their Pro62Ala mutants. J Mol Biol. 2008;375(2):460–470.
559. Berteotti A, Cavalli A, Branduardi D, Gervasio FL, Recanatini M, Parrinello M.
Protein conformational transitions: the closure mechanism of a kinase explored
by atomistic simulations. J Am Chem Soc. 2008;131(1):244–250.
560. Melis C, Bussi G, Lummis SC, Molteni C. Trans-cis Switching Mechanisms in
Proline Analogues and Their Relevance for the Gating of the 5-HT3 Receptor. J
Phys Chem B. 2009;113(35):12148–12153.
561. Prakash MK, Barducci A, Parrinello M. Probing the mechanism of pH-induced
large-scale conformational changes in dengue virus envelope protein using
atomistic simulations. Biophys J. 2010;99(2):588–594.
562. Bocahut A, Bernad S, Sebban P, Sacquin-Mora S. Relating the diffusion of
small ligands in human neuroglobin to its structural and mechanical properties.
J Phys Chem B. 2009;113(50):16257–16267.
563. Nishihara Y, Hayashi S, Kato S. A search for ligand diffusion pathway in
myoglobin using a metadynamics simulation. Chem Phys Lett.
2008;464(4):220–225.
564. Provasi D, Bortolato A, Filizola M. Exploring molecular mechanisms of ligand
recognition by opioid receptors with metadynamics. Biochemistry.
2009;48(42):10020–10029.
565. Limongelli V, Bonomi M, Marinelli L, Gervasio FL, Cavalli A, Novellino E, et al.
Molecular basis of cyclooxygenase enzymes (COXs) selective inhibition. Proc
Natl Acad Sci USA. 2010;107(12):5411–5416.
566. Masetti M, Cavalli A, Recanatini M, Gervasio FL. Exploring Complex
Protein-Ligand Recognition Mechanisms with Coarse Metadynamics. J Phys Chem B.
2009;113(14):4807–4816.
567. Cavalli A, Spitaleri A, Saladino G, Gervasio FL. Investigating Drug–Target
Association and Dissociation Mechanisms Using Metadynamics-Based
Algorithms. Accounts Chem Res. 2014.
568. Gur M, Madura JD, Bahar I. Global transitions of proteins explored by a
multiscale hybrid methodology: application to adenylate kinase. Biophys J.
2013;105(7):1643–1652.
569. Atilgan A, Durell S, Jernigan R, Demirel M, Keskin O, Bahar I. Anisotropy of
fluctuation dynamics of proteins with an elastic network model. Biophys J.
2001;80(1):505–515.
570. Das A, Gur M, Cheng MH, Jo S, Bahar I, Roux B. Exploring the
Conformational Transitions of Biomolecular Systems Using a Simple Two-State
Anisotropic Network Model. PLoS Comput Biol. 2014;10(4):e1003521.
571. Baron R. Fast Sampling of A-to-B Protein Global Conformational Transitions:
From Galileo Galilei to Monte Carlo Anisotropic Network Modeling. Biophys J.
2013;105(7):1545–1546.
572. Suarez E, Lettieri S, Zwier MC, Stringer CA, Subramanian SR, Chong LT, et al.
Simultaneous computation of dynamical and equilibrium information using a
weighted ensemble of trajectories. J Chem Theory Comput.
2014;10(7):2658–2667.
573. Rojnuckarin A, Kim S, Subramaniam S. Brownian dynamics simulations of
protein folding: access to milliseconds time scale and beyond. Proc Natl Acad
Sci USA. 1998;95(8):4288–4292.
574. Bhatt D, Zuckerman DM. Beyond microscopic reversibility: Are observable
nonequilibrium processes precisely reversible? J Chem Theory Comput.
2011;7(8):2520–2527.
575. Bhatt D, Zuckerman DM. Heterogeneous path ensembles for conformational
transitions in semiatomistic models of adenylate kinase. J Chem Theory
Comput. 2010;6(11):3527–3539.
576. Echols N, Milburn D, Gerstein M. MolMovDB: analysis and visualization of
conformational change and structural flexibility. Nucleic Acids Res.
2003;31(1):478–482.
577. Flores S, Echols N, Milburn D, Hespenheide B, Keating K, Lu J, et al. The
Database of Macromolecular Motions: new features added at the decade mark.
Nucleic Acids Res. 2006;34(suppl 1):D296–D301.
578. Cecchini M, Houdusse A, Karplus M. Allosteric communication in myosin V:
from small conformational changes to large directed movements. PLoS Comput
Biol. 2008;4(8):e1000129.
579. Zhu F, Hummer G. Gating transition of pentameric ligand-gated ion channels.
Biophys J. 2009;97(9):2456–2463.
580. Zimmermann MT, Kloczkowski A, Jernigan RL. MAVENs: motion analysis and
visualization of elastic networks and structural ensembles. BMC Bioinf.
2011;12(1):264.
581. Go N, Scheraga H. Analysis of contribution of internal vibrations to statistical
weights of equilibrium conformations of macromolecules. J Chem Phys.
1969;51(11):4751–4767.
582. Go N, Scheraga H. On the use of classical statistical-mechanics in treatment of
polymer-chain conformation. Macromolecules. 1976;9(4):535–542.
583. Flory PJ. Statistical thermodynamics of random networks. Proc Royal Soc.
1976;351(1666):351–380.
584. Tirion MM. Large amplitude elastic motions in proteins from a single
parameter, atomic analysis. Phys Rev Lett. 1996;77(9):1905–1908.
585. Bahar I, Atilgan A, Erman B. Direct evaluation of thermal fluctuations in
proteins using a single-parameter harmonic potential. Fold Des.
1997;2(3):173–181.
586. Micheletti C, Seno F, Banavar JR, Maritan A. Learning effective amino acid
interactions through iterative stochastic techniques. Proteins.
2001;42(3):422–431.
587. Halle B. Flexibility and packing in proteins. Proc Natl Acad Sci USA.
2002;99(3):1274–1279.
588. Haliloglu T, Bahar I, Erman B. Gaussian dynamics of folded proteins. Phys Rev
Lett. 1997;79(16):3090–3093.
589. Kundu S, Melton JS, Sorensen DC, Phillips GN. Dynamics of proteins in
crystals: comparison of experiment with simple models. Biophys J.
2002;83(2):723–732.
590. Li G, Cui Q. Analysis of functional motions in Brownian molecular machines
with an efficient block normal mode approach: myosin-II and Ca2+-ATPase.
Biophys J. 2004;86(2):743–763.
591. Ming D, Kong Y, Lambert MA, Huang Z, Ma J. How to describe protein
motion without amino acid sequence and atomic coordinates. Proc Natl Acad
Sci USA. 2002;99(13):8620–8625.
592. Delarue M, Sanejouand YH. Simplified normal mode analysis of conformational
transitions in DNA-dependent polymerases: the elastic network model. J Mol
Biol. 2002;320(5):1011–1024.
593. Tama F, Valle M, Frank J, Brooks CL. Dynamic reorganization of the
functionally active ribosome explored by normal mode analysis and cryo-electron
microscopy. Proc Natl Acad Sci USA. 2003;100(16):9319–9323.
594. Reuter N, Hinsen K, Lacapère JJ. Transconformations of the SERCA1
Ca-ATPase: a normal mode study. Biophys J. 2003;85(4):2186–2197.
595. Xu C, Tobi D, Bahar I. Allosteric changes in protein structure computed by a
simple mechanical model: hemoglobin T–R2 transition. J Mol Biol.
2003;333(1):153–158.
596. Tama F, Sanejouand YH. Conformational change of proteins arising from
normal mode calculations. Protein Eng. 2001;14(1):1–6.
597. Zheng W, Doniach S. A comparative study of motor-protein motions by using a
simple elastic-network model. Proc Natl Acad Sci USA.
2003;100(23):13253–13258.
598. Ikeguchi M, Ueno J, Sato M, Kidera A. Protein structural change upon ligand
binding: linear response theory. Phys Rev Lett. 2005;94(7):078102.
599. Kim MK, Chirikjian GS, Jernigan RL. Elastic models of conformational
transitions in macromolecules. J Mol Graph Model. 2002;21(2):151–160.
600. Tama F, Brooks CL. Diversity and identity of mechanical properties of
icosahedral viral capsids studied with elastic network normal mode analysis. J
Mol Biol. 2005;345(2):299–314.
601. Tama F, Feig M, Liu J, Brooks CL, Taylor KA. The requirement for mechanical
coupling between head and S2 domains in smooth muscle myosin ATPase
regulation and its implications for dimeric motor function. J Mol Biol.
2005;345(4):837–854.
602. Zheng W, Brooks BR, Hummer G. Protein
conformational transitions explored by mixed elastic network models. Proteins:
Struct Funct Bioinf. 2007;69(1):43–57.
603. Maragakis P, Karplus M. Large amplitude conformational change in proteins
explored with a plastic network model: adenylate kinase. J Mol Biol.
2005;352(4):807–822.
604. Miyashita O, Onuchic JN, Wolynes PG. Nonlinear elasticity, proteinquakes, and
the energy landscapes of functional transitions in proteins. Proc Natl Acad Sci
USA. 2003;100(22):12570–12575.
605. Miyashita O, Wolynes PG, Onuchic JN. Simple energy landscape model for the
kinetics of functional transitions in proteins. J Phys Chem B.
2005;109(5):1959–1969.
606. Chu JW, Voth GA. Coarse-grained free energy functions for studying protein
conformational changes: a double-well network model. Biophys J.
2007;93(11):3860–3871.
607. Fischer S, Karplus M. Conjugate peak refinement: an algorithm for finding
reaction paths and accurate transition states in systems with many degrees of
freedom. Chem Phys Lett. 1992 Jun;194(3):252–261. Available from:
http://dx.doi.org/10.1016/0009-2614(92)85543-j.
608. Weiss DR, Koehl P. Morphing Methods to Visualize Coarse-Grained Protein
Dynamics. In: Protein Dynamics. Springer; 2014. p. 271–282.
609. Seo S, Kim MK. KOSMOS: a universal morph server for nucleic acids, proteins
and their complexes. Nucleic Acids Res. 2012;40(Web Server issue):W531–W536.
610. Pratt L. A statistical method for identifying transition states in high
dimensional problems. J Chem Phys. 1986;85(9):5045–5048.
611. Dellago C, Bolhuis PG, Csajka FS, Chandler D. Transition path sampling and
the calculation of rate constants. J Chem Phys. 1998;108(5):1964–1977.
612. Woolf T. Path corrected functionals of stochastic trajectories: Towards relative
free energy and reaction coordinate calculations. Chem Phys Lett.
1998;289(5-6):433–441.
613. van Erp TS, Moroni D, Bolhuis PG. A novel path sampling method for the
calculation of rate constants. J Chem Phys. 2003;118(17):7762–7774.
614. Faradjian AK, Elber R. Computing time scales from reaction coordinates by
milestoning. J Chem Phys. 2004;120(23):10880–10889.
615. Allen RJ, Warren PB, Ten Wolde PR. Sampling rare switching events in
biochemical networks. Phys Rev Lett. 2005;94(1):018104.
616. Warmflash A, Bhimalapuram P, Dinner AR. Umbrella sampling for
nonequilibrium processes. J Chem Phys. 2007;127(15):154112.
617. Bolhuis PG, Chandler D, Dellago C, Geissler PL. Transition path sampling:
throwing ropes over mountain passes in the dark. Annu Rev Phys Chem.
2002;53:291–318.
618. Dellago C, Bolhuis PG. Transition path sampling and other advanced
simulation techniques for rare events. In: Holm C, Kremer K, editors. Advanced
Computer Simulation Approaches for Soft Matter Sciences III. vol. 221 of
Advances in Polymer Science. Springer Berlin Heidelberg; 2009. p. 167–233.
619. Weinan E, Vanden-Eijnden E. Towards a theory of transition paths. J Stat Phys.
2006;123(3):503–523.
620. Elber R, Karplus M. A method for determining reaction paths in large
molecules: Application to myoglobin. Chem Phys Lett. 1987;139(5):375–380.
621. Henkelman G, Jónsson H. Improved tangent estimate in the nudged elastic
band method for finding minimum energy paths and saddle points. J Chem
Phys. 2000;113:9978–9985.
622. Weinan E, Ren W, Vanden-Eijnden E. String method for the study of rare
events. Phys Rev B. 2002;66:052301.
623. Bohner MU, Zeman J, Smiatek J, Arnold A, Kästner J. Nudged-elastic band
used to find reaction coordinates based on the free energy. J Chem Phys.
2014;140(7):074109.
624. Jónsson H, Mills G, Jacobsen KW. Nudged Elastic Band Method for Finding
Minimum Energy Paths of Transitions. In: Berne BJ, Ciccotti G, Coker DF,
editors. Classical and Quantum Dynamics in Condensed Phase Simulations.
Singapore: World Scientific; 1998. p. 385–404.
625. Olender R, Elber R. Yet another look at the steepest descent path. J Mol Struct
THEOCHEM. 1997;398-399:63–71.
626. Crehuet R, Field MJ. A temperature-dependent nudged-elastic-band algorithm.
J Chem Phys. 2003;118(21):9563–9571.
627. Ren W, Vanden-Eijnden E. Finite temperature string method for the study of
rare events. J Phys Chem B. 2005;109(14):6688–6693.
628. Pan AC, Weinreich TM, Shan Y, Scarpazza DP, Shaw DE. Assessing the
accuracy of two enhanced sampling methods using EGFR kinase transition
pathways: the influence of collective variable choice. J Chem Theory
Comput. 2014;10(7):2860–2865.
629. Ovchinnikov V, Karplus M. Investigations of α-helix - β-sheet transition
pathways in a miniprotein using the finite-temperature string method. J Chem
Phys. 2014;140(17):175103.
630. Ovchinnikov V, Karplus M, Vanden-Eijnden E. Free energy of conformational
transition paths in biomolecules: The string method and its application to
myosin VI. J Chem Phys. 2011;134(8):085103.
631. Stober ST, Abrams CF. Energetics and mechanism of the
normal-to-amyloidogenic isomerization of β2-microglobulin: On-the-fly string
method calculations. J Phys Chem B. 2012;116(31):9371–9375.
632. Matsunaga Y, Fujisaki H, Terada T, Kidera A. Conformational Transition
Pathways of Adenylate Kinase Explored by the String Method. Biophys J.
2012;102(3):733a.
633. Kumari M, Kozmon S, Kulhanek P, Stepan J, Tvaroska I, Koča J. Exploring
Reaction Pathways for O-GlcNAc Transferase Catalysis. A String Method Study.
J Phys Chem B. 2015.
634. Fajer M, Meng Y, Roux B. Simulation of the Conformational Transition
Pathway for the Activation of Full-Length C-Src Kinase using the String
Method. Biophys J. 2014;106(2):639a–640a.
635. Ovchinnikov V, Cecchini M, Vanden-Eijnden E, Karplus M. A conformational
transition in the myosin VI converter contributes to the variable step size.
Biophys J. 2011;101(10):2436–2444.
636. Adelman JL, Grabe M. Simulating rare events using a weighted ensemble-based
string method. J Chem Phys. 2013;138(4):044105.
637. Gan W, Yang S, Roux B. Atomistic view of the conformational activation of Src
kinase using the string method with swarms-of-trajectories. Biophys J.
2009;97(4):L8–L10.
638. Maragliano L, Roux B, Vanden-Eijnden E. Comparison between Mean Forces
and Swarms-of-Trajectories String Methods. J Chem Theory Comput.
2014;10(2):524–533.
639. Sanchez-Martinez M, Field M, Crehuet R. Enzymatic Minimum Free Energy
Path Calculations Using Swarms of Trajectories. J Phys Chem B. 2014.
640. Peters B, Heyden A, Bell AT, Chakraborty A. A growing string method for
determining transition states: comparison to the nudged elastic band and string
methods. J Chem Phys. 2004;120(17):7877–7886.
641. Quapp W. A growing string method for the reaction pathway defined by a
Newton trajectory. J Chem Phys. 2005;122(17):174106/1–174106/11.
642. Goodrow A, Bell AT, Head-Gordon M. Development and application of a hybrid
method involving interpolation and ab initio calculations for the determination
of transition states. J Chem Phys. 2008;129(17):174109/1–174109/12.
643. Goodrow A, Bell AT, Head-Gordon M. Transition state-finding strategies for
use with the growing string method. J Chem Phys.
2009;130(24):244108/1–244108/14.
644. Goodrow A, Bell AT, Head-Gordon M. A strategy for obtaining a more accurate
transition state estimate using the growing string method. Chem Phys Lett.
2010;484(4-6):392–398.
645. Behn A, Zimmerman PM, Bell AT, Head-Gordon M. Efficient exploration of
reaction paths via a freezing string method. J Chem Phys.
2011;135(22):224108–224116.
646. Mallikarjun Sharada S, Zimmerman PM, Bell AT, Head-Gordon M. Automated
transition state searches without evaluating the Hessian. J Chem Theory
Comput. 2012;8(12):5166–5174.
647. De Jong KA. Evolutionary Computation: A Unified Approach. Cambridge, MA:
MIT Press; 2006.
648. Unger R. The Genetic Algorithm Approach to Protein Structure Prediction.
Structure and Bonding. 2004;110:153–175.
649. Wales DJ, Doye JPK. Global Optimization by Basin-Hopping and the Lowest
Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms. J
Phys Chem A. 1997;101(28):5111–5116.
650. Shehu A. Probabilistic Search and Optimization for Protein Energy Landscapes.
In: Aluru S, Singh A, editors. Handbook of Computational Molecular Biology.
Chapman & Hall/CRC Computer & Information Science Series; 2013.
651. Shehu A. Computer-Aided Drug Discovery. In: Zhang W, editor. Methods in
Pharmacology and Toxicology. Springer Verlag; 2015.
652. Hoque M, Chetty M, Sattar A. Genetic Algorithm in Ab Initio Protein
Structure Prediction Using Low Resolution Model: A Review. Biomed Data and
Applications. 2009;p. 317–342.
653. Dotu II, Cebrián MM, Van Hentenryck PP, Clote PP. On lattice protein
structure prediction revisited. IEEE/ACM Trans Comput Biol Bioinf. 2011
Nov;8(6):1620–1632.
654. Prentiss MC, Wales DJ, Wolynes PG. Protein structure prediction using
basin-hopping. J Chem Phys. 2008;128(22):225106.
655. Olson B, Shehu A. Multi-Objective Optimization Techniques for Conformational
Sampling in Template-Free Protein Structure Prediction. In: Intl Conf on Bioinf
and Comp Biol (BICoB). Las Vegas, NV; 2014.
656. Verma A, Schug A, Lee KH, Wenzel W. Basin hopping simulations for all-atom
protein folding. J Chem Phys. 2006;124(4):044515.
657. Baldwin JM. A new factor in evolution. American Naturalist. 1896;p. 441–451.
658. Rusu M, Birmanns S. Evolutionary tabu search strategies for the simultaneous
registration of multiple atomic structures in cryo-EM reconstructions. J Struct
Biol. 2010;170(1):164–171.
659. Rusu M, Wriggers W. Evolutionary bidirectional expansion for the tracing of
alpha helices in cryo-electron microscopy reconstructions. J Struct Biol.
2012;177(2):410–419.
660. Clausen R, Ma B, Nussinov R, Shehu A. Mapping the Conformation Space of
Wildtype and Mutant H-Ras with a Memetic, Cellular, and Multiscale
Evolutionary Algorithm. PLoS Comput Biol. 2015;11(9):e1004470.
661. Kim D, Blum B, Bradley P, Baker D. Sampling bottlenecks in de novo protein
structure prediction. J Mol Biol. 2009;393(1):249–60.
662. Choset H, et al. Principles of Robot Motion: Theory, Algorithms, and
Implementations. 1st ed. Cambridge, MA: MIT Press; 2005.
663. Kavraki LE, Svestka P, Latombe JC, Overmars M. Probabilistic roadmaps for
path planning in high-dimensional configuration spaces. IEEE Trans Robot
Autom. 1996;12(4):566–580.
664. Amato NM, Dill KA, Song G. Using motion planning to map protein folding
landscapes and analyze folding kinetics of known native structures. J Comp Biol.
2002;10(3-4):239–255.
665. Song G, Amato NM. A Motion Planning Approach to Folding: From Paper
Craft to Protein Folding. IEEE Trans Robot Autom. 2004;20(1):60–71.
666. Molloy K, Shehu A. Interleaving Global and Local Search for Protein Motion
Computation. In: Harrison R, Li Y, Mandoiu I, editors. LNCS: Bioinformatics
Research and Applications. vol. 9096. Norfolk, VA: Springer International
Publishing; 2015. p. 175–186.
667. Chiang TH, Apaydin MS, Brutlag DL, Hsu D, Latombe JC. Using stochastic
roadmap simulation to predict experimental quantities in protein folding
kinetics: folding rates and phi-values. J Comp Biol. 2007;14(5):578–593.
668. Cortes J, Simeon T, de Angulo R, Guieysse D, Remaud-Simeon M, Tran V. A
path planning approach for computing large-amplitude motions of flexible
molecules. Bioinformatics. 2005;21(S1):116–125.
669. Shehu A. An Ab-initio tree-based exploration to enhance sampling of
low-energy protein conformations. In: Trinkle J, Matsuoka Y, A CJ, editors.
Robotics: Science and Systems V. Seattle, WA, USA; 2009. p. 241–248.
670. Olson B, Molloy K, Shehu A. In Search of the Protein Native State with a
Probabilistic Sampling Approach. J Bioinf & Comp Biol. 2011;9(3):383–398.
671. Behzadi M, Roonasi P, Assle taghipoura K, van der Spoel D, Manzetti S.
Relationship between electronic properties and drug activity of seven
quinoxaline compounds: A DFT study. J Phys Chem. 2015;1091(5):196–202.
672. Khaliullin RZ, VandeVondele J, Hutter J. Efficient Linear-Scaling Density
Functional Theory for Molecular Systems. J Chem Theory Comput.
2013;9(10):4421–4427.
673. Senn HM, Thiel W. QM/MM methods for biomolecular systems. Angew Chem
Int Ed Engl. 2009;48(7):1198–1229.
674. Larsson DSD, Liljas L, van der Spoel D. Virus capsid dissolution studied by
microsecond molecular dynamics simulations. PLoS Comput Biol.
2012;8(5):e1002502.
675. Roy A, Zhang Y. Protein Structure Prediction. In: Encyclopedia of Life
Sciences. John Wiley & Sons, Ltd; 2012. p. a0003031.
676. Scheres SHW. A Bayesian View on Cryo-EM Structure Determination. J Mol
Biol. 2012;415(2):406–418.
677. Topf M, Lasker K, Webb B, Wolfson H, Chiu W, Sali A. Protein Structure
Fitting and Refinement Guided by cryoEM Density. Structure.
2008;16(2):295–307.
678. Engel A, Gaub HE. Structure and Mechanics of Membrane Proteins. Annu Rev
Biochem. 2008;77:127–148.


... This section explores recent advances in using equivariant diffusion models to design and generate small molecules, proteins, and protein-ligand interactions. In the past decade, protein structures have been widely studied [189], with the majority of the research focusing on optimization and aiming to enhance the sampling capabilities of Monte Carlo or molecular dynamics methods. These studies frequently strive to expand the structural space under physiological conditions [189]. Addressing this challenging task, most studies leverage prior knowledge, such as physical models or structural data, to guide search algorithms and sample relevant areas within the vast structure space [190,191,192]. ...
Article
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
... More specifically, cyanine dyes were effectively used i) in cell labeling, including live-cell imaging [6,7], labeling neural circuits for the visualization of the structure and function of the brain [8,19], and stem cell tracking in neurodegenerative medicine [10,11]; ii) for detection of oxidative stress and reactive oxygen species [12][13][14]; iii) for synthesis of the fluorescently labeled antibodies [15]; iv) in cancer research for tumor imaging in the fluorescence-guided surgery [16,17] and in photodynamic therapy [18,19]; v) for pathogen detection [20]; vi) in gene expression studies to measure the levels of specific mRNAs or miRNAs [20,21]; vii) in high-throughput screening assays to evaluate the effectiveness and toxicity of potential drug candidates [21]; viii) for the detection of biomolecules and their interactions [22][23][24][25], to name only a few. However, one of the greatest potentials of cyanine dyes lies in their application in DNA research [26][27][28][29][30][31][32]. The advantageous photophysical properties of cyanines, such as long-lasting photostability, high brightness, low cytotoxicity, and the sharp increase in emission upon their association with nucleic acids, gave impetus for their use in DNA bioanalytical assays [27,28], sizing and purification of DNA fragments [29], DNA damage detection [30], DNA sequencing [31], etc. ...
... In recent decades computational methods have been recommended as particularly useful for providing insights into interactions between potential ligands and their macromolecular targets, thereby significantly decreasing traditional resource requirements encountered in experimental testing [32]. Molecular docking and molecular dynamics simulation have been successfully applied to investigate the potential mode of binding of fluorescent dyes to DNA [33,34]. ...
Article
Among the various fluorescent probes currently used for biomedical and biochemical studies, cyanine dyes attract significant attention because of their advantageous properties upon complexation with biomolecules, particularly nucleic acids. Given the wide range of cyanine applications in DNA studies, a better understanding of their binding mode and of the intermolecular interactions governing dye-DNA complexation would facilitate the synthesis of new molecular probes of the cyanine family with optimized properties and would lead to the development of new cyanine-based strategies for nucleic acid detection and characterization. In the present study, molecular docking techniques have been employed to evaluate the mode of interaction between one monomethine (AK12-17), three trimethine (AK3-1, AK3-3, AK3-5), three pentamethine (AK5-1, AK5-3, AK5-9) and one heptamethine (AK7-6) cyanine dyes and the B-DNA dodecamer d(CGCGAATTCGCG)2 (PDB ID: 1BNA). The molecular docking studies indicate that: i) all cyanines under study (except AK5-9 and AK7-6) form the most stable dye-DNA complexes with the minor groove of double-stranded DNA; ii) cyanines AK5-9 and AK7-6 interact with the major groove of the DNA on the basis of their more extended structure and higher lipophilicity in comparison with the other dyes; iii) cyanine dye binding is governed by hydrophobic and van der Waals interactions, presumably with the nucleotide residues C9A, G10A (except AK3-1, AK3-5), A17B (except AK3-5, AK5-3) and A18B in the minor groove and with the major groove residues C16B, A17B, A18B, C3A, G4A, A5A, A6A (AK5-9 and AK7-6); iv) all dyes under study (except AK3-1, AK3-5 and AK5-3) possess an affinity to adenine and cytosine residues, whereas AK3-1, AK3-5 and AK5-3 also interact with thymine residues of the double-stranded DNA.
Article
Nuclear magnetic resonance (NMR) relaxation experiments shine light onto the dynamics of molecular systems in the picosecond to millisecond timescales. As these methods cannot provide an atomically resolved view of the motion of atoms, functional groups, or domains giving rise to such signals, relaxation techniques have been combined with molecular dynamics (MD) simulations to obtain mechanistic descriptions and gain insights into the functional role of side chain or domain motion. In this work, we present a comparison of five computational methods that permit the joint analysis of MD simulations and NMR relaxation experiments. We discuss their relative strengths and areas of applicability and demonstrate how they may be utilized to interpret the dynamics in MD simulations with the small protein ubiquitin as a test system. We focus on the aliphatic side chains given the rigidity of the backbone of this protein. We find encouraging agreement between experiment, Markov state models built in the χ1/χ2 rotamer space of isoleucine residues, explicit rotamer jump models, and a decomposition of the motion using ROMANCE. These methods allow us to ascribe the dynamics to specific rotamer jumps. Simulations with eight different combinations of force field and water model highlight how the different metrics may be employed to pinpoint force field deficiencies. Furthermore, the presented comparison offers a perspective on the utility of NMR relaxation to serve as validation data for the prediction of kinetics by state-of-the-art biomolecular force fields.
Article
In the recent decade, nanotechnology has advanced extensively in the production and application of nanomaterials. With this advancement, concern has risen about their biomedical and ecological safety, prompting detailed safety assessment. Numerous experimental and computational approaches have been developed to accomplish the goal of safety assessment of nanomaterials, leading to the origin of interdisciplinary fields like nanoinformatics. Nanoinformatics has made significant strides with the development of several modeling frameworks, data platforms, knowledge infrastructures, and in silico tools for risk assessment forecasts of nanomaterials. This review is an attempt to decipher and establish the bridge between two emerging scientific arenas, computational modeling and nanotoxicity. We have reviewed recent information to uncover the link between computational toxicology and nanotoxicology in terms of biomedical and ecological applications. In addition to the details about nanomaterial interactions with biological systems, this article offers a concise evaluation of recent developments in the various nanoinformatics domains. In detail, computational tools such as molecular docking and QSAR for the prediction of nanotoxicity are described. Moreover, techniques like molecular dynamics simulations used for experimental data collection and their translation to standard computational formats are explored.
Article
The design of protein interaction inhibitors is a promising approach to address aberrant protein interactions that cause disease. One strategy in designing inhibitors is to use peptidomimetic scaffolds that mimic the natural interaction interface. A central challenge in using peptidomimetics as protein interaction inhibitors, however, is determining how best the molecular scaffold aligns to the residues of the interface it is attempting to mimic. Here we present the Scaffold Matcher algorithm that aligns a given molecular scaffold onto hotspot residues from a protein interaction interface. To optimize the degrees of freedom of the molecular scaffold we implement the covariance matrix adaptation evolution strategy (CMA‐ES), a state‐of‐the‐art derivative‐free optimization algorithm in Rosetta. To evaluate the performance of the CMA‐ES, we used 26 peptides from the FlexPepDock Benchmark and compared with three other algorithms in Rosetta, specifically, Rosetta's default minimizer, a Monte Carlo protocol of small backbone perturbations, and a Genetic algorithm. We test the algorithms' performance on their ability to align a molecular scaffold to a series of hotspot residues (i.e., constraints) along native peptides. Of the 4 methods, CMA‐ES was able to find the lowest energy conformation for all 26 benchmark peptides. Additionally, as a proof of concept, we apply the Scaffold Match algorithm with CMA‐ES to align a peptidomimetic oligooxopiperazine scaffold to the hotspot residues of the substrate of the main protease of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2). Our implementation of CMA‐ES into Rosetta allows for an alternative optimization method to be used on macromolecular modeling problems with rough energy landscapes. Finally, our Scaffold Matcher algorithm allows for the identification of initial conformations of interaction inhibitors that can be further designed and optimized as high‐affinity reagents.
Article
During the last decade, network approaches became a powerful tool to describe protein structure and dynamics. Here we review the links between disordered proteins and the associated networks, and describe the consequences of local, mesoscopic and global network disorder on changes in protein structure and dynamics. We introduce a new classification of protein networks into ‘cumulus-type’, i.e., those similar to puffy (white) clouds, and ‘stratus-type’, i.e., those similar to flat, dense (dark) low-lying clouds, and relate these network types to protein disorder dynamics and to differences in energy transmission processes. In the first class, there is limited overlap between the modules, which implies higher rigidity of the individual units; there the conformational changes can be described by an ‘energy transfer’ mechanism. In the second class, the topology presents a compact structure with significant overlap between the modules; there the conformational changes can be described by ‘multi-trajectories’; that is, multiple highly populated pathways. We further propose that disordered protein regions evolved to help other protein segments reach ‘rarely visited’ but functionally-related states. We also show the role of disorder in ‘spatial games’ of amino acids; highlight the effects of intrinsically disordered proteins (IDPs) on cellular networks and list some possible studies linking protein disorder and protein structure networks.
Article
A method is presented that generates random protein structures that fulfil a set of upper and lower interatomic distance limits. These limits depend on distances measured in experimental structures and the strength of the interatomic interaction. Structural differences between generated structures are similar to those obtained from experiment and from MD simulation. Although detailed aspects of dynamical mechanisms are not covered and the extent of variations are only estimated in a relative sense, applications to an IgG-binding domain, an SH3 binding domain, HPr, calmodulin, and lysozyme are presented which illustrate the use of the method as a fast and simple way to predict structural variability in proteins. The method may be used to support the design of mutants, when structural fluctuations for a large number of mutants are to be screened. The results suggest that motional freedom in proteins is ruled largely by a set of simple geometric constraints. Proteins 29:240–251, 1997. © 1997 Wiley-Liss, Inc.
Article
A novel and robust automated docking method that predicts the bound conformations of flexible ligands to macromolecular targets has been developed and tested, in combination with a new scoring function that estimates the free energy change upon binding. Interestingly, this method applies a Lamarckian model of genetics, in which environmental adaptations of an individual's phenotype are reverse transcribed into its genotype and become heritable traits (sic). We consider three search methods, Monte Carlo simulated annealing, a traditional genetic algorithm, and the Lamarckian genetic algorithm, and compare their performance in dockings of seven protein–ligand test systems having known three-dimensional structure. We show that both the traditional and Lamarckian genetic algorithms can handle ligands with more degrees of freedom than the simulated annealing method used in earlier versions of AUTODOCK, and that the Lamarckian genetic algorithm is the most efficient, reliable, and successful of the three. The empirical free energy function was calibrated using a set of 30 structurally known protein–ligand complexes with experimentally determined binding constants. Linear regression analysis of the observed binding constants in terms of a wide variety of structure-derived molecular properties was performed. The final model had a residual standard error of 9.11 kJ mol⁻¹ (2.177 kcal mol⁻¹) and was chosen as the new energy function. The new search methods and empirical free energy function are available in AUTODOCK, version 3.0. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1639–1662, 1998
Article
With the increasing need to integrate different areas of science in the study of intrinsically disordered proteins we arranged a meeting entitled “Intrinsically Disordered Proteins: Connecting Computation, Physics and Biology” in Zürich in September 2013. The aim of the meeting was to bring together scientists from a range of disciplines to provide a snapshot of the field, as well as to promote future interdisciplinary studies that link the fundamental physical and chemical properties of intrinsically disordered proteins with their biological function. A range of important topics were covered at the meeting including studies linking structural studies of intrinsically disordered proteins with their function, the effect of post-translational modifications, studies of folding-upon-binding, as well as presentation of a number of systems in which intrinsically disordered proteins play a central role in important biological processes. A recurring theme was how computation, including various forms of molecular simulations, can be integrated with experimental and theoretical studies to help understand the complex properties of intrinsically disordered proteins. With this Meeting Report we hope to give a brief overview of the inspiration obtained from presentations, discussions and conversations held at the workshop and point out possible future directions within the field of intrinsically disordered proteins.
Article
We present a framework for studying folding problems from a motion planning perspective. Modeling foldable objects as tree-like multi-link objects allows one to apply motion planning techniques to folding problems. An important feature of this approach is that it not only allows one to study foldability questions, such as, can an object be folded (or unfolded) into another object, but also provides one with another tool for investigating the dynamic folding process itself. The framework proposed here has application to traditional motion planning areas such as automation and animation, and presents a novel approach for studying protein folding pathways. Preliminary experimental results with traditional paper crafts (e.g., box folding) and small proteins (approximately 60 residues) are quite encouraging.