
Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics

PLOS Computational Biology

April 2016
12(4):e1004619

DOI:10.1371/journal.pcbi.1004619

License
CC BY 4.0

Authors:

Tatiana Maximova

Ryan Moffatt

George Mason University

Buyong Ma

Leidos Biomedical Research, Inc. National Cancer Institute, NIH

Ruth Nussinov

Tel Aviv University


Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.

Free-energy landscape of GB3 obtained with work in [302] using chemical shifts as collective variables. Panel A shows a two-dimensional projection of sampled conformations. The x-axis shows values of the CamShift collective variables for each conformation, which measures the difference between the wet-laboratory and calculated chemical shifts for the backbone. The y-axis shows the backbone RMSD between each conformation and the reference structure (PDB ID 2oed). Some selected conformations, from extended to compact, are highlighted, drawn with the Visual Molecular Dynamics (VMD) software [303]. Panel B shows a conformation with the lowest backbone RMSD (0.5 Å) from the reference structure. Such native-like conformations are visited multiple times by the method. Panel C draws hydrophobic side chains to illustrate that the internal packing of these side chains is practically identical to that observed in the reference structure. This figure is reproduced with permission of the executive editor of PNAS from article Granata et al., 2013 [302].

…

Probing of coupled motions in the Eg5 kinesin motor domains in [91] through accelerated MD simulations. The top panel shows the structure and catalytic cycle of the kinesin motor domain. The ATPase catalytic site sits at the top of the β-sheet, flanked by three highly-conserved loops (P-loop, SI, and SII) connected to helices (also annotated) on either side of the sheet. The secondary structure topology is drawn, with β -strands drawn as triangles and α-helices as circles. The kinesin catalytic cycle is shown: Kinesin (K) has a weak affinity for the microtubule in the ADP-state. ADP release is followed by strong microtubule-binding. ATP binding may occur followed by hydrolysis and product release to regenerate the weakly-bound ADP state. The bottom panel projects conformations sampled by 200 nanosecond-long accelerated MD every 20 picoseconds on the two principal modes of motion. The latter are obtained through principal component analysis of collected X-ray structures for wildtype and variant Eg5. Three simulations are highlighted, the nucleotide-free (APO) one in (A), ADP-bound one in (B), and ATP-bound one in (C). The nucleotide-free simulation covers more of the conformation space, whereas restricted sampling is observed when Eg5 is bound to ATP or ADP. One of the conclusions in [91] is that structural changes from the ADP- to ATP-bound states which are evident in the collection of X-ray structures, are encoded in the intrinsic dynamics of the nucleotide-free motor domain; the nucleotides effectively rigidify the motor domain by narrowing the conformation space accessible by it, as evident in the restricted sampling observed through accelerated MD. This figure is reused from Scarabelli et al., 2013. CC-BY PLOS ONE [91].

…

Sampling of the ensemble of closed-to-open and open-to-closed transition trajectories in AdK through the DIMS method [334]. An ensemble of 330 DIMS trajectories is compared to 45 Escherichia coli AdK X-ray structures. The conformations in each trajectory are projected onto a progress variable δRMSD measured as the RMSD of the conformation from the closed AdK structure (PDB ID 1ake:A) minus the RMSD of the conformation from the open AdK structure (PDB ID 4ake:A). For each of the 45 collected X-ray structures and each trajectory, the conformation in the trajectory closest in backbone RMSD to an X-ray structure is recorded, and the δRMSD value of the conformation along a trajectory is recorded. A probability distribution is then constructed for each X-ray structure over all DIMS trajectories to indicate where an X-ray structure is located along the simulated trajectories. The color bar indicates the probability density. The median of each distribution is marked by a white circle. The X-ray structures whose PDB IDs are listed on the y-axis are rank ordered based on the median. The second white line traces the location of the median when the simulations are repeated to sample open-to-closed transition trajectories. Out of 45 structures sorted by δRMSD, about 24 are closed-state structures, four are open, and 17 are intermediates. This work is an example of the capability of computational methods to elucidate transitions in detail and accurately map the location of experimentally determined structures in the transitions. This figure is adapted from Beckstein et al., 2009 [334]. The image was created by O. Beckstein.

…

Predicting a pathogen’s resistance mutations [452]. (A) Pictured is an illustration of a game between scientists and bacteria. For every drug that scientists develop against bacteria (a “move”), bacteria respond with mutations that confer resistance to the drug. This paper shows that these “moves” by bacteria can be predicted in silico ahead of time by the Osprey protein design algorithm. Donald, Anderson, and coworkers used Osprey to prospectively predict in silico mutations in Staphylococcus aureus against a novel preclinical antibiotic, and validated their predictions in vitro and in resistance selection experiments. Image (A) was created for this paper by Lei Chen and Yan Liang (L2Molecule.com). (B–C) Computationally predicting drug resistance mutations early in the discovery phase would be an important breakthrough in drug development. The most meaningful predictions of target mutations will show reduced affinity for the drug (C) while maintaining viability in the complex context of a cell (B). The protein design algorithm, K* in Osprey, was used to predict a single nucleotide polymorphism in the target DHFR that confers resistance to an experimental antifolate (Compound 1) in the preclinical discovery phase. Excitingly, the mutation was also selected in bacteria under antifolate pressure, confirming the prediction of a viable molecular response to external stress. Images (B–C) were created by Adegoke Ojewole in the Bruce Donald Lab, Duke University.

…


Tatiana Maximova1, Ryan Moffatt1, Buyong Ma2, Ruth Nussinov2,3,*, and Amarda Shehu1,4,5,*

1 Department of Computer Science, George Mason University, Fairfax, VA, USA

2 Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, MD, USA

3 Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel

4 Department of Bioengineering, George Mason University, Fairfax, VA, USA

5 School of Systems Biology, George Mason University, Manassas, VA, USA

* nussinor@helix.nih.gov, amarda@gmu.edu


Author Summary

This paper provides an overview of recent advancements in computational methods for modeling macromolecular structure and dynamics. The focus is on methods aimed at providing efficient representations of macromolecular structure spaces for the purpose of characterizing equilibrium dynamics. The overview is meant to provide a summary of state-of-the-art capabilities of these methods from an application point of view, as well as highlight important algorithmic contributions responsible for recent advances in macromolecular structure and dynamics modeling.


Introduction

A detailed understanding of how fundamental biological macromolecules, such as proteins and nucleic acids, carry out their biological functions is central to obtaining a detailed and complete picture of molecular mechanisms in the healthy and diseased cell. Furthering our understanding of macromolecules is central to understanding our own biology, as proteins and nucleic acids are central components of cellular organization and function. Many abnormalities involve macromolecules incapable of performing their biological function [1–4], either due to external perturbations, such as environmental changes, or internal perturbations, such as mutations [5–10], affecting their ability to assume specific function-carrying structures.

It has long been known that the ability of a macromolecule to carry out its biological function is dependent on its ability to assume a specific three-dimensional structure (in other words, structure carries function) [11, 12]. However, an increasing number of experimental, theoretical, and computational studies have demonstrated that function is the result of a complex yet precise relationship between macromolecular structure and dynamics [13–21]. Most notably, in proteins, the ability to access and switch between different structural states is key to biomolecular recognition and function modulation [22, 23].

The intrinsic dynamic personality of macromolecules [18] is not surprising and can indeed be derived from first principles. Feynman highlighted the jiggling and wiggling of atoms well before wet-laboratory techniques provided evidence of macromolecular dynamics [24]. In the late 1970s and early 1980s, it became clear that treating macromolecules as thermodynamic systems and employing basic principles allowed anticipating and simulating their intrinsic state of perpetual motion [25, 26]. The thermodynamic uncertainty principle was coined by Cooper in [26] to refer to the inherent uncertainty about the particular state a macromolecule is in or will evolve to at any given time. Cooper was among the first to employ tools from statistical thermodynamics to show that macromolecular fluctuations are a direct result of thermal interaction with the environment, and that any detailed description of macromolecular structure and dynamics entails employing probability distributions. Further work by Wolynes and colleagues continued in this spirit, popularizing a statistical treatment of macromolecules with tools borrowed from statistical mechanics and culminating in the energy landscape view [5, 13, 27, 28].

Great advances have been made in the wet laboratory to elucidate macromolecular structure and dynamics. Nowadays, techniques, such as X-ray crystallography, Nuclear Magnetic Resonance (NMR), and cryo-Electron Microscopy (cryo-EM), can resolve equilibrium structures and quantify equilibrium dynamics. Macroscopic measurements obtained in the wet laboratory are Boltzmann-weighted averages over microstates/structures populated by a macromolecule at equilibrium. Though in principle wet-laboratory techniques are limited in their description of equilibrium structures and dynamics to the time scales probed in the wet laboratory (a problem also known as ensemble-averaging), much progress has been made [29–31]. The ensemble of structures contributing to macroscopic measurements obtained in the wet laboratory can be unraveled with complementary computational techniques [32–36]. In addition, wet-laboratory techniques, such as NMR spectroscopy, can on their own directly elucidate picosecond-millisecond long relaxation phenomena [37, 38]. Indeed, recent single-molecule techniques have achieved great success at bypassing the ensemble averaging problem and elucidating equilibrium dynamics [31, 39–47].
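The statement that macroscopic measurements are Boltzmann-weighted averages over populated microstates can be made concrete with a small numerical sketch. The example is illustrative only: the three microstate energies, the observable values, and the function names boltzmann_weights and ensemble_average are assumptions introduced here, not data or code from the cited studies.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=300.0):
    """Return normalized Boltzmann probabilities for microstate energies (kcal/mol)."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift by the minimum energy for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(factors)  # partition function of the shifted energies
    return [f / partition for f in factors]


def ensemble_average(observable, energies, temperature=300.0):
    """Boltzmann-weighted average of a per-microstate observable."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * a for w, a in zip(weights, observable))


# Hypothetical three-state system: energies in kcal/mol and a per-state
# observable (for example, an inter-residue distance in angstroms).
energies = [0.0, 0.5, 2.0]
distances = [10.0, 14.0, 25.0]
avg = ensemble_average(distances, energies)
print("weights:", boltzmann_weights(energies))
print("ensemble-averaged distance:", avg)
```

The average is dominated by the low-energy states, which is why a single measured value cannot by itself reveal the individual structures that contribute to it.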

Transitions of a macromolecule between successive structural states can be captured in the wet laboratory [31, 46, 48–53]. Wet-laboratory techniques can resolve key well-populated intermediate structures along a transition [52, 54], but they are generally unable to span all the time scales involved in a transition and thus cannot fully account for a macromolecule’s equilibrium dynamics. A complete characterization of macromolecular dynamics remains elusive in the wet laboratory due to the disparate time scales that may be involved. Dwell times at successive states along a reaction may be too short to be detected in the wet laboratory. The actual time a macromolecule spends during a transition event can be short compared to its dwell time in any particular thermodynamically-stable or meta-stable structural state. Indeed, neither wet- nor dry-laboratory techniques can on their own span all spatial and time scales involved in dynamic macromolecular processes [55].

Macromolecular modeling research in silico is driven by the need to complement wet-laboratory techniques and obtain a comprehensive and detailed characterization of equilibrium dynamics. Such a characterization poses outstanding challenges in silico. In principle, a full account of macromolecular dynamics requires a comprehensive characterization of both the structure space available to a macromolecule at equilibrium and the underlying free energy surface that governs accessibility of structures and transitions between structures. Early work on protein modeling focused on short protein chains and simplified models that laid out amino-acid chains on lattices. These distinct choices made it possible to perform interesting calculations revealing key properties of protein folding and unfolding [56], as well as predict quantities of importance in protein stability and function, such as pKas of ionizable groups [57]. On-lattice models incidentally also allowed key theoretical findings on the computational complexity associated with computing lowest free-energy states in the context of ab initio (now also known as de novo) protein structure prediction [58–60]. Finding the global minimum energy conformation was shown to be NP-hard. These findings made the case that sophisticated algorithms would be needed to complement wet-laboratory characterizations of macromolecular structure and dynamics for the purpose of elucidating biological function.

The advent of Molecular Dynamics (MD) simulations and the concept of an energy function promised to revolutionize macromolecular modeling, as in principle the entire equilibrium dynamics could be simulated by simply following the motions of the atoms constituting a macromolecule down the slope of the energy function. Research in this direction was made possible by a growing set of equilibrium structures resolved in the wet laboratory, from myoglobin [61, 62] and lysozyme [63] by 1967 to more than a hundred thousand structures now freely available for anyone in the Protein Data Bank (PDB) [64]. Seminal work in the Karplus laboratory on the MD method and in the Lifson laboratory on the design of consistent energy functions and simplified molecular models set the stage for a computational revolution in structural biology. Commercialization of computers was critical to this revolution.
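The idea of following atomic motions under an energy function can be illustrated with a minimal sketch of one common integration scheme, velocity Verlet, applied to a single particle in a one-dimensional harmonic well. This is a toy under stated assumptions: the harmonic energy function, unit mass, the time step, and all function names are chosen here for illustration and are not taken from the MD programs or studies cited above.

```python
import math


def force(x, k=1.0):
    """Force is the negative gradient of the energy U(x) = 0.5 * k * x**2."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the particle."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, dt, n_steps, mass=1.0, k=1.0):
    """Integrate Newton's equations of motion; return the list of (x, v) states."""
    trajectory = [(x, v)]
    f = force(x, k)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x, k)                    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


trajectory = velocity_verlet(x=1.0, v=0.0, dt=0.01, n_steps=1000)
e_start = total_energy(*trajectory[0])
e_end = total_energy(*trajectory[-1])
print("energy drift over the run:", abs(e_end - e_start))
print("final position vs. analytic cos(t):", trajectory[-1][0], math.cos(10.0))
```

In an atomistic simulation the same loop runs over thousands to millions of coupled coordinates, and the small time step needed for stability is what ties the reachable time scale to the number of force evaluations.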

MD simulations had been shown successful in reproducing equilibrium properties of argon [65], but it was McCammon and Karplus who provided the earliest demonstration in 1977 of the power of MD-based modeling to simulate protein dynamics [25]: a short 9.2 picosecond-long trajectory was obtained showing in-vacuum, atomistic fluctuations of the bovine pancreatic trypsin inhibitor around its native, folded structure. Realizing the power of MD simulations in extracting precious information on macromolecular structure and dynamics, the Karplus laboratory democratized modeling by offering the CHARMM program to the computational community [66]. Further work by Karplus and McCammon showed that significant features of protein dynamics would only emerge over longer time scales. The simulation in [67] reached 100 picoseconds, but it would soon become clear that MD-based probings of macromolecular structure and dynamics were in practice limited by both macromolecular size (spatial scale) and time of a phenomenon under investigation (time scale). A significant body of complementary work in macromolecular structure and modeling investigated non-MD based methods. In fact, two years before the 1977 MD simulation by Karplus of equilibrium fluctuations of the bovine pancreatic trypsin inhibitor, Levitt and Warshel had presented a computer simulation of the folding of the same inhibitor through a simplified (now known as coarse-grained) model, where each residue was reduced to one pseudo-atom, and an algorithm based on steepest descent [68]. Reproducibility of this work has so far remained elusive.

Further work by Levitt and Warshel, prompted by the visionary Lifson at the Weizmann Institute of Science, focused on the design of a consistent energy function for proteins [69]. The idea was to come up with a small number of consistent parameters that could be transferable from molecule to molecule and not depend on the local environment of an atom. Once such an energy function was implemented, simple algorithms could then be put together by making use of the function, its first derivative (the force vector), and the second derivative (the curvature of the energy surface). It is interesting to note that, though Lifson and Warshel were the first to introduce a consistent energy function, they did so for small organic hydrocarbon molecules. It was Levitt who realized that their parameters could be used to carry out calculations on proteins. In 1969, Levitt published the first non-MD, steepest descent algorithm on a simplified model encoding only heavy atoms of the X-ray structures of hemoglobin and lysozyme [70]. This work laid the groundwork for the claim by Levitt and Warshel of the first simulation of protein folding [68]. The algorithm used in these simulations was quite sophisticated, changing torsion angles, as proposed by Scheraga [71], and using normal modes to rapidly compute low-energy paths out of local minima [72].
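How an algorithm can be assembled from an energy function and its first derivative is easiest to see in a bare steepest descent loop. The sketch below uses a hypothetical two-variable quadratic energy with an analytic gradient; the function, step size, and tolerance are assumptions made for illustration and bear no relation to the protein energy functions or the torsion-angle and normal-mode machinery of the cited work.

```python
def energy(point):
    """Toy energy surface with a single minimum at (1.0, -0.5)."""
    x, y = point
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2


def gradient(point):
    """Analytic first derivative of the toy energy; the force is its negative."""
    x, y = point
    return (2.0 * (x - 1.0), 4.0 * (y + 0.5))


def steepest_descent(point, step=0.1, tol=1e-8, max_iter=10000):
    """Move against the gradient until the gradient norm falls below tol."""
    for _ in range(max_iter):
        g = gradient(point)
        if sum(c * c for c in g) ** 0.5 < tol:
            break
        point = tuple(c - step * gc for c, gc in zip(point, g))
    return point


minimum = steepest_descent((5.0, 5.0))
print("located minimum:", minimum, "energy:", energy(minimum))
```

On a rugged macromolecular energy surface the same loop stops at the nearest local minimum, which is one reason the methods surveyed in this review add mechanisms for escaping such minima.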

Further work on coarse-grained and multiscale models built with the quantum mechanics (QM)/molecular mechanics (MM) method proposed by Warshel [73] was seminal in allowing simulation to reach longer spatial and time scales. Warshel, who had a background in quantum mechanics, realized that large molecular systems could be spatially divided into a region demanding quantum mechanical calculations (e.g., due to bonds being broken) with the rest sufficiently represented by empirical force fields. This method remains the cornerstone of modern multiscale modeling [74–80] and, together with the idea of representing complex systems in different resolutions at different time and length scales [76], has allowed simulations to elucidate structures, dynamics, and the biological activity of systems of increasing complexity, from enzymes [74, 77, 81] to complex molecular machines [82–91].

In tandem with these developments, a new method, Metropolis Monte Carlo (MC) [92, 93], made its debut in computational structural biology. In 1987, important work in the Scheraga laboratory introduced an MC-based minimization method to simulate protein folding [94]. In 1996, the Karplus laboratory demonstrated the ability of MC simulations on a cubic lattice to simulate the folding mechanism of a protein-like heteropolymer of 125 beads [95]. Subsequent work in the Scheraga laboratory further made the case for the utility of MC-based methods in studies of macromolecular structure and dynamics [96–98]. Kinetic MC methods were designed to address the lack of kinetics in the classic MC framework [99]. In light of contributions that gave birth to computational structural biology [100], it is no surprise that the 2013 Nobel Prize in Chemistry recognized computational scientists, namely, Karplus, Warshel, and Levitt, for their seminal work in the development of multiscale models for complex chemical systems [101–103].
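The Metropolis MC method mentioned above rests on a simple acceptance rule, sketched here for a single coordinate in a harmonic well. The energy function, move size, temperature units (kT = 1), and the names metropolis and harmonic_energy are assumptions introduced for illustration; the cited MC studies operate on far richer molecular representations and move sets.

```python
import math
import random


def harmonic_energy(x):
    """Toy energy U(x) = 0.5 * x**2, in units where kT = 1 by default."""
    return 0.5 * x * x


def metropolis(energy, x0, n_steps, kT=1.0, max_move=1.0, seed=0):
    """Sample the Boltzmann distribution of `energy` with the Metropolis criterion."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_move, max_move)  # symmetric trial move
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(e_new - e) / kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_steps


samples, acceptance = metropolis(harmonic_energy, 0.0, 50000, max_move=2.0)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print("acceptance ratio:", acceptance)
print("sample mean and variance:", mean, variance)
```

For this energy the sampled variance should approach kT, and the sequence of states carries no physical time, which is the gap that kinetic MC methods were designed to address.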

Improvements in hardware over the last forty years have been critical to extending the reach of MD- and MC-based modeling. For example, MD-based studies have expanded their scope, scale, and thus applicability due to specialized architectures, such as Anton [104, 105], GPUs [106–109], and petascale national supercomputers, such as Blue Waters, Titan, Mira, and Stampede [110, 111]. The pervasiveness of supercomputing has spurred great advances in algorithmic techniques to effectively parallelize MD. Typically, in parallel MD, the interacting particles are spatially divided into subdomains that are assigned to different processors. In this framework, load balancing becomes an issue for large-scale MD simulations now performed on thousands of processors and involving billions of particles [112]. Many techniques now exist for dynamic load balancing [113]. In addition, while each processor is responsible for advancing its own particles in time, processors need to exchange information; accurate force calculations require knowledge of neighbor particle positions. Work in [114] describes recent strategies for efficient neighbor searches in parallel MD. Other techniques that permit parallelization of MD address and optimize force splitting in the context of the particle-mesh Ewald algorithm [115]. It is worth noting that many of these techniques are now integrated in publicly-available parallel MD code, such as NAMD [116].
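The spatial division of particles and the need to know neighbor positions can be illustrated with a serial cell-list neighbor search, a standard textbook technique that is not necessarily the strategy described in [114]. In this sketch, space is binned into cubic cells whose edge equals the interaction cutoff, so that each particle only needs to be compared against particles in its own and adjacent cells. The function names, the non-periodic box, and the random test coordinates are all assumptions made for illustration.

```python
import itertools
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def build_cells(positions, cutoff):
    """Bin particle indices into cubic cells with edge length `cutoff`."""
    cells = defaultdict(list)
    for index, pos in enumerate(positions):
        cells[tuple(int(c // cutoff) for c in pos)].append(index)
    return cells


def neighbor_pairs(positions, cutoff):
    """Return all index pairs (i, j), i < j, closer than `cutoff`, via cell lists."""
    cells = build_cells(positions, cutoff)
    cut2 = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Only the cell itself and its 26 adjacent cells can hold neighbors.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            others = cells.get((cx + dx, cy + dy, cz + dz))
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j and dist2(positions[i], positions[j]) <= cut2:
                        pairs.add((i, j))
    return pairs


def brute_force_pairs(positions, cutoff):
    """Reference all-pairs search used to check the cell-list result."""
    cut2 = cutoff * cutoff
    return {(i, j)
            for i, j in itertools.combinations(range(len(positions)), 2)
            if dist2(positions[i], positions[j]) <= cut2}


rng = random.Random(0)
points = [tuple(rng.uniform(0.0, 10.0) for _ in range(3)) for _ in range(200)]
print("pairs within cutoff:", len(neighbor_pairs(points, 1.5)))
```

In a parallel setting each processor would own a block of such cells and exchange only the particles in cells along its boundary, which is where the communication and load-balancing concerns noted above arise.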

Important contributions in enhancing exploration capability have also come not from MD or MC frameworks but from adaptations of stochastic optimization frameworks often designed for modeling other complex, non-biological systems. These frameworks, though less mature than MD and MC, are summarized here in the interest of introducing readers to interesting complementary ideas. Algorithmic advances, whether to extend the applicability of MD- and MC-based frameworks or adapt other frameworks for macromolecular modeling, now allow predicting native structures of given protein amino-acid sequences [117–120], mapping equilibrium ensembles, structure spaces and underlying energy landscapes of macromolecules [6, 8, 121–126], revealing detailed transitions between stable and meta-stable structures [127–134], modeling binding and docking reactions [135–137], revealing not only equilibrium structures of bound protein-ligand or protein-protein assemblies but also calculating association and dissociation rates [138, 139], and more.

This review aims to provide an overview of such advances. Given the rapidly growing body of research in macromolecular modeling, aiming to provide an exhaustive review would be an exercise in futility. For instance, while the development of molecular force fields is recognized as crucial to accurate modeling [140, 141], this review does not focus on force field development. Other important contributions due to the development of ever more accurate coarse-grained representations of macromolecules, solvent models, and multiscaling techniques are acknowledged, but the reader is referred to existing comprehensive reviews on these topics [76, 142–144]. Instead, this review focuses on sampling methods for the exploration of macromolecular structure spaces and underlying energy surfaces for the purpose of characterizing equilibrium structure and dynamics. This focus is warranted due to the recognition that sampling remains a problem [102, 128, 145]. The goal is both to introduce a broad audience of researchers to the most recent and exciting research from an application point of view and to highlight important algorithmic contributions responsible for recent advancements in modeling macromolecular structure and dynamics.

Recent Applications Made Possible by Hardware and Algorithmic Advancements

There is by now a wealth of computational studies aimed at extracting information on equilibrium structures and dynamics of macromolecules in molecular assemblies or isolation. Non-MD based studies can extract information about thermodynamically-stable or meta-stable structures while foregoing simulations of a system’s dynamics. On the other hand, MD-based studies readily provide information on the dynamics but can only elucidate structures accessible within the time of the simulation. While non-MD based methods have made it possible to predict, for instance, biologically-active structures of proteins given their amino-acid sequences, a problem known as de novo structure prediction, only MD-based methods can provide detailed information on protein folding and unfolding. Different aspects of protein-ligand binding, protein-DNA, protein-protein docking, equilibrium fluctuations, structure prediction, folding, and unfolding can be modeled with MD and non-MD methods.

Disparate time scales are involved in macromolecular dynamics, and they constitute the main challenge in describing macromolecular dynamics in fullness and detail via MD-based simulations. For instance, bond vibrations occur on the femtosecond time scale, solvent effects take anywhere from a few picoseconds up to a few nanoseconds, transitions in side-chain rotation and secondary structure occur on the 10–100 nanosecond time scale, large global structural transitions can occur on the microsecond time scale, ligand binding and allosteric regulation are usually on the millisecond time scale, and protein folding takes anywhere from a few microseconds to a few seconds, depending on protein size. In extreme cases, natural ligand and drug binding is a much longer event that can occur on the hours scale [146].
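A rough calculation shows why these disparate time scales are a challenge: because bond vibrations occur on the femtosecond scale, an atomistic integrator must advance in femtosecond-sized steps. The sketch below assumes a 2 femtosecond time step, a common choice that is an assumption here and not a value stated in this review, and counts the integration steps needed to reach representative durations taken from the paragraph above.

```python
FEMTOSECONDS_PER_SECOND = 1e15


def steps_required(duration_s, timestep_fs=2.0):
    """Number of integration steps needed to simulate duration_s seconds."""
    return round(duration_s * FEMTOSECONDS_PER_SECOND / timestep_fs)


# Representative durations, in seconds, for the processes listed in the text.
time_scales = {
    "side-chain rotation (100 ns)": 100e-9,
    "global structural transition (1 microsecond)": 1e-6,
    "ligand binding (1 ms)": 1e-3,
    "slow protein folding (1 s)": 1.0,
}

for process, duration in time_scales.items():
    print(f"{process}: {steps_required(duration):.1e} steps")
```

Under this assumption a single millisecond already requires on the order of 10^11 force evaluations over all atoms, which motivates both the specialized hardware and the enhanced sampling strategies discussed in this review.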

Despite such challenges, much progress has been made. Equilibrium, atomistic, MD simulations can reproduce in detail microsecond-long folding events for small proteins on specially-designed supercomputers [104, 105, 147, 148]. Protein-ligand binding with full ligand flexibility and protein flexibility limited to the binding site can be simulated up to 100 microseconds [146, 149]. Brownian dynamics simulations can capture events that occur in the microsecond time scale; when coupled with enhanced sampling techniques, these simulations have been reported to capture slow events of large proteins binding and sliding on DNA at 25 microseconds at a coarse resolution [150]. Longer simulations of an estimated time scale of more than 48 milliseconds of the lac repressor sliding on DNA have been reported via atomistic MD in explicit solvent [151]. Coarse-grained modeling and longer time steps can further increase time scales but often at the cost of essential details [152]. However, multiscale MC simulations have been reported to allow studying in detail processes that occur in the range of milliseconds [76, 78]. Organizations of short MD or MC trajectories in Markov state models (MSMs) can extract precious information on structure and dynamics for events that occur on longer time scales, from a few milliseconds to a few seconds [146, 153].
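The way many short trajectories can be organized into a Markov state model is sketched below in its most stripped-down form: transitions between discrete states are counted at a fixed lag, the counts are row-normalized into transition probabilities, and the long-time (stationary) populations are obtained by repeated propagation. The two-state trajectories, the lag of one step, and all function names are assumptions made for illustration; practical MSM construction also involves clustering structures into states, choosing and validating the lag time, and more careful estimators, none of which is shown.

```python
def count_matrix(trajectories, n_states, lag=1):
    """Count observed transitions i -> j separated by `lag` steps (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into transition probabilities."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no observed outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Propagate a uniform distribution until it stops changing."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical short discrete trajectories over two states (0 and 1).
trajectories = [[0, 0, 0, 1, 0, 0], [1, 1, 0, 0, 0, 1], [0, 0, 1, 1, 0, 0]]
T = transition_matrix(count_matrix(trajectories, n_states=2))
print("transition matrix:", T)
print("stationary populations:", stationary_distribution(T))
```

The appeal of the approach is that no single trajectory needs to span the slow process; the model stitches together local transition statistics to estimate populations and kinetics on longer time scales.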

In the following we provide a short overview of the current applications pursued by MD and non-MD methods without describing in detail the algorithmic ingredients of such methods. We highlight key examples where recent advances in MD and non-MD methods have made it possible to address problems and systems not possible before due to the large spatial and time scales involved. Descriptions of the algorithmic ingredients responsible for such computational advancements follow.

Simulation and Modeling of Macromolecular Interactions

Simulating interactions of macromolecules with other macromolecules or small molecules is important to understand the molecular basis of mechanisms in the healthy and diseased cell. Typically, three categories of interactions are of interest to researchers, those of a protein with a small ligand, those of a protein with another protein, and those of a protein with other molecular systems that include DNA, RNA, and membranes. These specific applications can be approached in two different ways. One considers simply the problem of predicting the three-dimensional native structure of the complexed system from knowledge of the structures of the unbound units, whereas the other additionally simulates the process of the units diffusing towards and then binding with one another. For the problem of structure prediction, non-MD based methods are currently the norm. They include algorithms enhancing MC or adapting other stochastic optimization frameworks under the umbrella of evolutionary computation. For the problem of actually simulating the dynamics of interacting units, MD-based studies provide more detail but typically require more computational resources or algorithmic enhancements in order to surpass the long time scale often needed for a complexation (binding) event to occur.

One of the challenges in modeling and simulating interactions of macromolecules with small molecules or other macromolecules is the possibility of induced fit. Induced fit, introduced by Koshland in [154], refers to the mechanism in which an initially loose complex induces a conformational change in one or all of the loosely-bound units, which then triggers a cascade of rearrangements ultimately resulting in a tighter-bound complex. The induced fit mechanism seems to question the idea that structure-guided studies can focus on shape complementarity first, but many wet-laboratory studies, as well as the success of complementarity-driven methods, have demonstrated that induced fit cannot describe all binding events [155].

In response, inspired by the free energy landscape view presented by Frauenfelder and Wolynes [13, 27], Nussinov and colleagues proposed a new concept to explain binding events, that of conformational selection, also known as population shift [156–158]. Conformational selection refers to the idea that all conformational states of an unbound unit are present and accessible to the bound unit. The binding or docking event causes a shift in the populations observed in the unbound ensembles towards the specific bound conformational state. Though Nussinov and colleagues were inspired by the free energy landscape view of Frauenfelder and Wolynes, it is worth noting that the conformational selection model is a generalization of a much earlier model, the Monod-Wyman-Changeux (MWC) model [159]. The MWC model, also known as the concerted or symmetry model, proposed the idea that regulated proteins exist in different interconvertible states in the absence of any regulator, and that the ratio of the different states is determined by the thermal equilibrium. The MWC model has been credited with introducing the concept of conformational equilibrium and selection by ligand binding, though in its original formulation the model was restricted to two distinct symmetric states and to proteins made up of identical subunits.
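
The population shift described above can be made concrete with a small thermodynamic sketch. It assumes a protein with only two conformations, A and B, a ligand that binds exclusively to B, and a fixed free ligand concentration; the function name, parameter names, and numerical values are illustrative assumptions and do not come from the cited works.

```python
import math


def bound_like_fraction(ligand_conc, kd, delta_g, kT=0.593):
    """Fraction of protein in the binding-competent conformation B (free or liganded).

    delta_g = G(B) - G(A) for the unbound protein, in the same units as kT
    (kcal/mol here). The ligand is assumed to bind only conformation B, with
    dissociation constant kd (same units as ligand_conc).
    """
    # Unbound equilibrium ratio [A]/[B] from the Boltzmann factor.
    ratio_a_to_b = math.exp(delta_g / kT)
    # Statistical weights relative to free B: A, free B, ligand-bound B.
    w_a = ratio_a_to_b
    w_b = 1.0
    w_bl = ligand_conc / kd
    return (w_b + w_bl) / (w_a + w_b + w_bl)


kT = 0.593  # kcal/mol near room temperature
delta_g = kT * math.log(9.0)  # B is a minor state: [A]/[B] = 9 without ligand
for conc in (0.0, 1.0, 9.0, 100.0):
    print(conc, round(bound_like_fraction(conc, kd=1.0, delta_g=delta_g, kT=kT), 3))
```

In this toy model the B conformation is populated even without ligand, and adding ligand only redistributes the existing populations towards it, which is the essence of conformational selection; induced-fit adjustments after binding are not represented.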

The review in [23] summarizes many studies that observe conformational selection for protein-ligand, protein-protein, protein-DNA, protein-RNA and RNA-ligand interactions. We highlight work in [160], where unfolded structures of uncomplexed ubiquitin in explicit solvent were subjected simultaneously to restraints from NMR Nuclear Overhauser Effect (NOE) and Residual Dipolar Coupling (RDC) data comprising solution dynamics up to microseconds. The obtained ensemble of structures covered the structural heterogeneity observed in 46 crystal structures of ubiquitin available at the time; the majority of the crystal structures were in complex with other proteins. These results suggest that conformational selection rather than induced fit suffices to explain the molecular recognition dynamics of ubiquitin.

While at face value the concepts of induced fit and conformational selection appear mutually exclusive, studies have shown that versions of each are indeed observed; for instance, conformational selection is usually followed by slight conformational adjustments. In 2010, Nussinov and colleagues presented an extended view of binding events where conformational selection and induced fit were seen as complementary to each other [161]. In many cases, following conformational selection, minor adjustments of side chains and backbone are observed to take place to optimize interactions [161]. Based on such observations, extended models have been proposed that combine the conformational selection, induced fit, and classical lock-and-key mechanisms [162]. A better understanding of the contributions of each of these three mechanisms has led over the years to several effective methods for modeling and simulating binding and docking events. A detailed review in the context of protein-ligand binding for structure-based drug discovery is presented in [163].

The overview below summarizes methods based on the lock-and-key mechanism, as well as methods based on the induced-fit and conformational selection mechanisms. While the lock-and-key mechanism allows disregarding flexibility, the other mechanisms clearly make the case for modeling the flexibility of the units participating in the complexation event. While the induced-fit mechanism seems to suggest that only MD-based methods can describe a complexation event, the conformational selection mechanism has inspired many non-MD methods to integrate flexibility during or prior to complexation, thus contributing to a rich and still growing literature. In the following we provide an overview of this work, guided by applications on protein-ligand binding, protein-protein docking, and protein-DNA docking.

Protein-Ligand Binding

In protein-ligand binding, the structure prediction problem involves predicting the binding site (unless it is known), the pose of the ligand, and its configuration. Established and widely-adopted software packages now exist and include DOCK [164], FlexX [165, 166], GOLD [167, 168], Autodock [169–171], Glide [172], RosettaLigand [173, 174], SwissDock [175], Surflex-Dock [176], DOCKLASP [177], rDock [178], istar [179] and more. The majority of existing software employs evolutionary algorithms that approach the problem of protein-ligand binding as stochastic optimization, where the goal is to find the lowest-energy structure of the complex of bound units. Evolutionary algorithms have been shown to be more effective than MD- or MC-based algorithms at finding the lowest-energy binding pose (position and orientation) and configuration of a ligand on a macromolecule. For instance, while earlier versions of the well-known Autodock software employed MC simulated annealing (MC-SA), Autodock 3.0.5 and onwards switched to the Lamarckian Genetic Algorithm (GA) due to its higher efficiency and robustness over the MC-SA of earlier versions for binding flexible ligands onto rigid receptors [180].
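
The distinguishing feature of a Lamarckian GA, namely that the result of a local search is written back into the individual's genotype, can be sketched on a toy problem. The sketch below is not the Autodock implementation: the quadratic `energy` function stands in for a docking score, a simple hill climb stands in for the local optimizer, and all parameter values are assumptions chosen for illustration.

```python
import random


def energy(x, target=(1.0, -2.0, 0.5)):
    """Toy stand-in for a docking score: squared distance to a known optimum."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def local_search(x, rng, steps=30, sigma=0.3):
    """Simple hill climbing; a random perturbation is kept only if it lowers the energy."""
    best, best_e = list(x), energy(x)
    for _ in range(steps):
        trial = [xi + rng.gauss(0.0, sigma) for xi in best]
        trial_e = energy(trial)
        if trial_e < best_e:
            best, best_e = trial, trial_e
    return best


def lamarckian_ga(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Lamarckian step: the locally optimized coordinates replace the genotype.
        pop = [local_search(ind, rng) for ind in pop]
        pop.sort(key=energy)
        history.append(energy(pop[0]))
        children = [pop[0]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=energy)
            p2 = min(rng.sample(pop, 2), key=energy)
            # Uniform crossover followed by occasional Gaussian mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [c + rng.gauss(0.0, 1.0) if rng.random() < 0.1 else c for c in child]
            children.append(child)
        pop = children
    return min(pop, key=energy), history


best, history = lamarckian_ga()
```

In a docking setting the three coordinates would be replaced by the ligand's translation, orientation, and torsion angles, and the toy energy by a scoring function; the control flow of selection, crossover, mutation, and inherited local refinement would stay the same.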

The superiority of evolutionary algorithms for binding flexible ligands onto rigid receptors is additionally demonstrated in a high-throughput screening setting. In this context, we note representative work in the Caflisch laboratory [181], where a set of publicly-available tools has been developed for high-throughput screening of large sets of small ligand molecules by fragment-based docking for the purpose of computer-assisted drug discovery (CADD). The high-throughput setting is made possible by a fast decomposition of a flexible ligand into rigid fragments, fast docking and evaluation of the binding free energy of docked fragments, and efficient docking of a full flexible ligand through a GA that rapidly searches over poses of fragment triplets and evaluates poses with an efficient scoring function. Fragment-based docking can be traced back to Karplus, whose work with Miranker on the minimization of multiple copies of functional groups in the MCSS force field is considered the first fragment-based procedure for drug discovery [182].

Fragment-based high-throughput binding is leading to significant advances in CADD. For instance, recent work in [183] identifies inhibitor chemotypes for the EphA3 tyrosine kinase, a transmembrane protein belonging to the class of erythropoietin-producing hepatocellular receptors with deregulations implicated in severe human pathologies such as atherosclerosis, diabetes, and Alzheimer’s disease.

While the majority of protein-ligand binding software can handle flexible ligands, the computational costs that would be incurred by fully flexible receptors remain impractical in most settings. Fortunately, a significant number of binding modes fall under the lock-and-key mechanism, which has been shown to be effective for predicting structures of enzyme-inhibitor complexes with largely static binding interfaces [184–188]. As expected, however, rigid receptor docking algorithms are ineffective in cases of induced fit, where structural flexibility during binding is not limited to the ligand.

To take into account ligand and receptor flexibility without incurring impractical computational costs, many protein-ligand binding algorithms implement soft docking, where some overlap between the flexible, bound ligand and the rigid receptor is allowed during docking. Unfavorable interactions due to the overlap are resolved in a post-processing stage on selected bound complexes, effectively providing some localized flexibility to the bound receptor. This approach is practical and warranted in settings where the goal is to screen large libraries of potential drug compounds [189–191]. An extensive review of the unique challenges in these settings can be found in [163, 192].

One way to control computational cost while taking into account both ligand and receptor flexibility is by limiting flexibility to specific dihedral angles [193–197]. Typically, existing approaches limit receptor flexibility to side-chain and/or backbone bonds of receptor amino acids on or near the binding site.

Other methods attempt to take into account full receptor flexibility without explicitly modeling it during binding. These methods, known as ensemble or conformer docking, obtain an ensemble of low-energy conformations/conformers of the receptor prior to the binding simulation [198]. The ensemble is obtained via any conformational sampling method, whether MD- or non-MD-based (reviewed below). The ligand or a library of ligands is then bound to each of the receptor conformers [199]. While effective at controlling computational cost, these methods are limited in what aspects of flexibility they model [200]. It is worth noting that they make use of the conformational selection principle, for which there is now increasing evidence [201].

Methods that consider full receptor flexibility and go beyond ensemble docking exist, and are based on MC or MD. MC-based methods are represented by the RosettaLigand software [173, 174]. Work in [202] employs long, unbiased MD simulations to simulate the physical process by which a ligand diffuses and then binds a protein target. Studies on specific protein-ligand complexes provide an opportunity for MD-based methods to reveal the kinetics of ligand-receptor interactions and estimate binding affinities from a large number of MD simulations of the binding process. Yet, even in such studies computational cost needs to be controlled, as binding can be too slow to observe on the time scales routinely accessible via MD [203].

Given the time scale challenge, many enhanced sampling strategies have been proposed for MD simulations. These include accelerated MD, replica-exchange MD, umbrella sampling MD, and metadynamics methods [8, 149, 203–206]. Replica-exchange MD and metadynamics methods are among the most popular to simulate binding. To control computational cost, the simulation is limited to the immediate binding and unbinding events. To discourage spending computational resources on the diffusion process, the ligand is either tethered (through distance restraints) to the receptor, or many short MD simulations are conducted at various placements of the ligand relative to the receptor. In the former, explicit geometric restraints are enforced on the ligand to keep it within the binding volume and save the MD simulation from wasting precious computational time on simulating the diffusion process [149]. In the latter, the sampled receptor and ligand configurations are organized in an MSM, which allows obtaining estimates of association and dissociation rates [139]. Other approaches include the powerful self-guided Langevin dynamics method and the accelerated adaptive integration method, among others. A description of these methods and others is provided later in this review. In summary, the goal of all these methods is to enhance sampling of the receptor and ligand poses so that the binding event can be observed within a reasonable computational budget.
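
The tethering strategy mentioned above is commonly realized with a flat-bottom restraint, sketched below under stated assumptions. The function name, the restraint radius `r0`, and the force constant `k` are hypothetical values chosen for illustration (distances could be read as Å and energies as kcal/mol); the works cited above may use different functional forms.

```python
import math


def flat_bottom_restraint(ligand_xyz, site_xyz, r0=10.0, k=2.0):
    """Return (energy, force_on_ligand) for a flat-bottom harmonic distance restraint.

    No penalty applies while the ligand is within r0 of the site center; beyond
    r0 the energy grows as 0.5 * k * (r - r0)**2 and the force pulls the ligand back.
    """
    delta = [l - s for l, s in zip(ligand_xyz, site_xyz)]
    r = math.sqrt(sum(d * d for d in delta))
    if r <= r0:
        return 0.0, [0.0, 0.0, 0.0]
    excess = r - r0
    energy = 0.5 * k * excess * excess
    # Force = -dE/dr along the unit vector from site to ligand (points toward the site).
    force = [-k * excess * d / r for d in delta]
    return energy, force


inside = flat_bottom_restraint([3.0, 0.0, 0.0], [0.0, 0.0, 0.0])
outside = flat_bottom_restraint([12.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

Because the restraint vanishes inside the binding volume, binding and unbinding events there are unperturbed, while excursions into bulk solvent are cut short.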

Here we highlight some successful protein-ligand binding simulations. One concerns the GTP and GDP nucleotide binding that is accompanied by a conformational switch in the Ras and Rho proteins, which was studied in [207] due to the central role of these proteins in cell growth regulation and a variety of human cancers [122]. In [207], MD is used to simulate the ligand-free Ras and Rho proteins. In the absence of the ligand, these proteins show intrinsic flexibility and are able to convert between different conformations. The presence of the nucleotide restricts the conformation space accessible to the GTP-bound structure. Significant coupling is observed in the bound state between motions on the nucleotide-binding site and motions of the membrane-interacting C-terminus via the highly flexible loop 3. The importance of this loop was originally suggested in [208]. In classic MD simulations, a double loop 3 mutant of Ras exhibits greater flexibility during conformational switching. This provides evidence that loop 3 may represent a potential allosteric site in Ras and other monomeric G proteins. This information, pieced together from various studies, is valuable for structure-based drug design, because it highlights relevant receptor structures for CADD [163].

Another successful example of the utility of computational methods for protein-ligand binding concerns drug prediction for the influenza virus. Several inhibitors have been widely used as anti-influenza drugs. However, due to naturally-occurring drug-resistant mutations [209], their inhibition ability has gradually decreased. The influenza virus, whose proteins include M2 and H1-H9, attaches itself to sialic acids on the surface of epithelial cells of the upper respiratory tract of the host using the proteins that cover its surface, hemagglutinin and neuraminidase [210, 211]. Inhibitors bind to the active sites of hemagglutinin and neuraminidase, preventing linkage of the virus to epithelial cells.

Protein-ligand docking via MD simulations is being used to model inhibitor binding to the influenza virus (or only the surface proteins hemagglutinin and neuraminidase). One group of methods focuses on finding new inhibitors (ligands) that can bind to the continuously mutating hemagglutinin and neuraminidase active sites [210, 211]. Representative findings are illustrated in Fig. 1.

In particular, work in [211] focuses on finding new inhibitors for hemagglutinin. Several ligands are considered to bind to the hemagglutinin H5 and H7 trimers. The exposed position of the binding site is used to guide the development of a trimeric ligand with a centrally positioned core structure with radial topology. The core structure of the ligands mimics the C3 symmetry of the trimers. A specific ligand, referred to as ligand 1, is found to bind to all three binding sites on H5 (deposited in the PDB under PDB ID 3M5G) at two different times of an MD simulation. Motion is predominantly found at the core structure, while all three sialic acid residues remain in their binding site during the simulation, indicating that ligand 1 is also a good ligand for H7. Ligand 1 also has a binding affinity in the high nanomolar range and is therefore among the compounds with the best reported affinities.

Another group of methods aims to modify already known inhibitors (by adding new residues or suggesting mutations) in order to increase their binding ability [212, 213]. Finally, some methods focus on calculating binding free energies by quantum mechanics/molecular mechanics simulations to predict binding abilities of possible inhibitors [214]. The combined result of all these methods has been to suggest a mechanism through which the inhibitor-virus binding can significantly influence viral neutralization.

In addition to MD simulation methods, we draw attention to Brownian dynamics methods [215], which have been employed to simulate protein-ligand [216] and protein-protein [217, 218] binding. In these methods, the net force experienced by a modeled particle contains a random element, which models the implicit interactions with solvent molecules. The norm of the random element is chosen from a probability distribution function that is a solution to the Einstein diffusion equation (a list of available probability distribution functions can be found in [219]). By coarse-graining out the fast motions, Brownian dynamics methods can simulate longer time scales than can typically be reached in a classic MD simulation [220]. However, the particle-based part still necessitates using relatively small time steps for an accurate description of the particle interactions. The Reaction Before Move method determines reaction probability functions that extend time steps and further speed up such simulations [219].
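
A minimal sketch of a Brownian dynamics update may help fix ideas. It assumes a single particle in one dimension with a constant diffusion coefficient and uses a Gaussian random displacement of variance 2DΔt, the free-space solution of the diffusion equation over one time step; the function names and parameter values are illustrative, and the Reaction Before Move scheme of [219] is not reproduced here.

```python
import math
import random


def brownian_step(x, force, diffusion, kT, dt, rng):
    """One Euler-Maruyama step of overdamped (Brownian) dynamics in 1D."""
    drift = diffusion / kT * force * dt
    # Gaussian displacement with variance 2*D*dt models the implicit solvent kicks.
    noise = math.sqrt(2.0 * diffusion * dt) * rng.gauss(0.0, 1.0)
    return x + drift + noise


def mean_squared_displacement(n_particles=2000, n_steps=100, diffusion=1.0,
                              kT=1.0, dt=0.01, seed=1):
    """Free diffusion (zero force): the MSD should approach 2*D*t in 1D."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            x = brownian_step(x, 0.0, diffusion, kT, dt, rng)
        total += x * x
    return total / n_particles


msd = mean_squared_displacement()
```

With zero force the estimate should lie close to 2Dt, which is 2.0 for the default parameters, up to sampling noise; replacing the zero force with an intermolecular force turns the same update into a simple binding simulation.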

The importance of accounting for receptor flexibility in protein-ligand binding is further appreciated in light of allosteric effects. Allostery refers to couplings between the active site and a regulatory, allosteric site, which is typically far away from the active site but causes chemical and/or physical changes in the active site that affect binding. A detailed review of all observed interactions between allosteric and binding sites is presented in [221]. The structural view of allostery considers interactions among residues responsible for the allosteric coupling between allosteric and binding sites. Uncovering allosteric communication among residues is becoming increasingly important in CADD, as residues that mediate the allosteric communication may make for druggable binding sites. Many methods are devoted to uncovering allosteric communication, and a review of such methods is presented in [137]. Successful methods include early ones based mainly on topological analyses of structures resolved in the wet laboratory, such as graph theory, statistical coupling analysis, and perturbation algorithms [222–227], and methods based on analyses of simulation trajectories. While MD and enhanced versions of MD-based methods are used for the simulations, the analysis is conducted with normal mode analysis (NMA) [228–230], correlation matrices [231–233], community-network analysis [234], mutual information [235], and dynamical network analysis [236–238]. MC-based methods have also been applied. The MCPath method introduced in [239] models a receptor as a weighted network of interacting residues and builds an MC trajectory by repeatedly applying MC moves that directly propagate a signal between two interacting residues. MCPath is able to uncover allosteric pathways as well as allosteric sites.

Protein-Nucleic Acid and Protein-Protein Docking

The computational challenges incurred when modeling protein-ligand binding grow more severe when modeling interactions between macromolecules due to the much larger spatial scales involved. Most current research addresses only the dimeric setting, where the number of bound units is limited to two. In addition, the majority of methods applied to the pairwise docking setting are non-MD-based methods focused on obtaining the native structure of the complex without information on the kinetics of the docking process. Methods implementing MC or evolutionary algorithms are by now the most popular. This is not surprising, given the overwhelming number of atoms whose motions would have to be followed in an MD simulation. Specific MD-based studies on dimeric systems of known proteins exist, and typically some information from wet-laboratory studies on the docking site is employed to orient the units favorably and additionally tether them to each other so as to steer the simulation towards the docking event [240, 241]. In general, however, even when foregoing kinetics, predicting the correct native structure of the bound units remains challenging.

Computational research in structure prediction for macromolecular pairwise docking is active, and there are now many methods [242–255] driven by the community-wide CAPRI experiment [256, 257]. The focused computational setting of a protein dimer has allowed the application of demanding energy-driven optimization methods and even modeling of structural flexibility for high-accuracy docking [243, 251, 258]. In light of variable interfaces, such as antibody-antigen interfaces [259], accounting for flexibility is key but exceptionally expensive. Methods such as RosettaDock [260] allow full flexibility and employ various models of increasing detail (from low-resolution, to centroid-mode, coarse-grained, and then all-atom). RosettaDock has been reported to achieve docking funnels for 63% of antibody-antigen targets, 62% of enzyme-inhibitor targets, and 35% of other targets; funnels are achieved on only 14% of targets deemed difficult, where substantial conformational changes are expected to accompany docking [261]. Other methods that consider ensemble docking have also been applied, though with limited success due to the difficulty of obtaining a conformational ensemble representative of the intrinsic structural flexibility of a macromolecule [262].

Several CAPRI summaries make the case that high-accuracy pairwise docking will remain challenging for the near future [257, 263, 264]. There is great difficulty, for instance, in locating the native interaction interface or even part of it, with top methods shown to predict only 30-58% of the correct interface in any given target [257]. An energy-based treatment is not guaranteed to drive the optimization process towards the right interface. Much research is invested in this direction. Machine learning methods, though not the focus of this review, are showing promise in elucidating features of native interaction interfaces so as to bypass the employment of interaction energy functions at a global layer [265–268]. For instance, work in [269] proposes a learned model to be used as a top filter to label sampled protein-protein dimers before attempting to refine them with more accurate and computationally costly interaction energy functions.

Rather than employing information from machine learning models, methods such as HADDOCK [243], the Integrative Modeling Platform (IMP) [270], and others [271, 272] employ wet-laboratory data to restrict sampling of bound conformations to those that reproduce the wet-laboratory data. Work in [273] uses chemical shifts from NMR to predict conformational changes upon complex formation in a class of engineered binding proteins known as affibodies. Similarly, HADDOCK also restricts sampling through NMR chemical shifts [243], whereas the IMP software provides more versatility by allowing the integration of different types of wet-laboratory, biochemical, and biophysical data and the employment of models of various resolutions [270]. It is worth noting that, while the majority of protein-protein docking algorithms are restricted to the dimeric setting, the IMP software allows modeling multimeric assemblies of an arbitrary number of units. Work in [274], for instance, reveals the native structure of the nuclear pore complex (NPC), a 50 MDa complex composed of 456 proteins. Work in [275] reveals a higher-resolution structure of a heptameric module in the yeast NPC by satisfying spatial restraints derived from negative-stain electron microscopy and protein domain-mapping data.

While wet-laboratory techniques such as X-ray crystallography can provide high-resolution structures for protein-protein dimers and even multimers, protein-DNA dimers are typically difficult to crystallize. There is great need for docking methods to reveal both binding mechanisms and final bound structures of protein-DNA complexes. In contrast to the diversity of protein-protein interaction interfaces, protein-DNA interaction interfaces often exhibit conserved sequence motifs and are thus accurately detected with machine learning techniques [276, 277]. Knowledge, even if partial, of the interaction interface has greatly helped the applicability of docking methods for protein-DNA binding [278, 279]. HADDOCK, for instance, already a top protein-protein docking method, has been shown to be effective for protein-DNA docking [280]. By now, comprehensive maps of protein-DNA binding landscapes have been put together for the largest class of metazoan DNA-binding domains, known as zinc fingers [281]. These landscapes are essential to support efforts to determine, predict, and engineer DNA-binding specificities. For instance, work in [282] studying interactions that proteins make with nucleic acids, small molecules, ions, and peptides reveals genes that are rich in mutations in the binding sites of the proteins they encode and are thus functionally important in cancer.

The setting of modeling macromolecular interactions naturally suggests expanding the focus beyond dimeric docking to multimeric docking. Elucidating structural details of oligomers suggested by wet-laboratory studies is indeed key to advancing further research on the role of oligomerization in the healthy and diseased cell [283, 284] and is expected to keep motivating the design of algorithms for multimeric docking. Computationally-demanding optimization and willingness to spend significant computational resources on a dimeric assembly make application of current pairwise docking methods to protein assemblies of an arbitrary number of units impractical. Adaptations of these methods to extend their applicability to the multimeric setting are neither trivial nor obvious.

Early work by Nussinov and colleagues introduced a greedy, systematic algorithm, CombDock, for the problem of multimeric docking [285, 286]. The algorithm is general and can handle heteromeric and asymmetric complexes but is challenged by the combinatorial explosion in the number of dimensions of the space of configurations with increasing number of units. Subsequent work narrows the focus to symmetric complexes and applies search and bound techniques from AI with additional information of distance-based constraints from NMR to control the size of the search space [287–291]. Work in the Sali lab, culminating in the IMP software [270], focuses exclusively on the setting where integration of wet-laboratory data is key to narrow the search space and model assemblies of hundreds of units at a low resolution. Research on multimeric docking in the absence of wet-laboratory data is sparse.

In [292], an evolutionary algorithm, Multi-LZerD, is proposed that operates in the absence of wet-laboratory data but is guided by interaction energy. Its success varies with complex size. The mixed results obtained by Multi-LZerD reflect the mixed state of the art in multimeric docking. In addition to successful cases, where the native multimeric structure is reproduced, Multi-LZerD reports in various cases decoys that do not reproduce the known native structures. While the decoys can be as far as 23.59 Å away from a particular native structure, typically, the decoys contain correct subcomplexes within 4.0 Å. It is worth noting that the evolutionary algorithm is also computationally demanding. Time concerns as well as the quality of current predictions suggest that there is much room for improvement in multimeric docking.

Modeling of Macromolecular Structural Flexibility

Modeling the structural flexibility of uncomplexed proteins is key not only to allow application of methods such as ensemble docking to the protein-ligand and protein-protein docking problems, but also to obtain detailed information on the role of protein sequence in structure, dynamics, and function. While it is in principle very difficult to map the entire conformation space and underlying energy landscape of a protein sequence, many methods are dedicated to specialized sub-problems. For instance, the literature is rich in methods that obtain a sample-based representation of the equilibrium conformation ensemble of a protein. Other methods extend this characterization to proteins that exhibit not only local fluctuations around an average, wet-laboratory, equilibrium structure but indeed are characterized by multi-basin landscapes where distinct structural states have comparable Boltzmann probabilities. Many methods focus on such proteins and particularly on modeling transitions between similarly stable structural states as a way to obtain information on function modulation and changes to function upon sequence mutations. Other methods are dedicated to capturing allosteric regulation and identifying coupled motions not in the vicinity of binding sites. Yet others focus on obtaining detailed structural characterizations of meta-stable states and other states present at low populations, even in natively unfolded proteins, as a way to understand aggregation, misfunction, and other disorders. In the following we provide an overview of these applications, highlighting selected ones to showcase current capabilities.

Sampling of Equilibrium Conformation Ensembles

In principle, complete information about structure and dynamics can be obtained from mapping the energy landscape of a given macromolecular sequence. Despite advances in atomistic MD simulations, this remains an insurmountable computational task for all but the smallest peptides. As such, we separate here the discussion of work on sampling the ensemble of folded conformations from work that focuses on protein folding and/or structure prediction. Methods that initiate their search for other conformations of the equilibrium ensemble from one or a few given conformations or wet-laboratory data are in practice more efficient and have been employed to characterize both local fluctuations and large-scale motions connecting conformations of the equilibrium or native state in proteins.

We highlight here work that builds on the MD or MC frameworks but restricts sampling in conformation space to regions that reproduce wet-laboratory data. In particular, chemical shifts, which are NMR observables measured under a wide range of conditions and with great accuracy, are proving very useful for generating conformation ensembles that capture macromolecular dynamics in solution. For instance, work in [293,294] uses chemical shifts for backbone atoms as restraints in a replica-averaged MD simulation. Work in [295] additionally incorporates NMR chemical shifts for side chains and demonstrates as a result great agreement between reconstructed conformation ensembles and wet-laboratory data, thus improving the accuracy of computational methods and their ability to make useful predictions on macromolecular structure and dynamics. Work in [296] characterizes in detail the native conformation ensemble of the src-SH3 domain and the role of water. Work in [297] incorporates diffuse X-ray scattering data to characterize the conformational dynamics of a crystalline protein at the µs time scale. In other works [129,298–301], restraints from wet-laboratory data are employed to improve the quality and thus accuracy of simulation methods.
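The replica-averaged restraint idea mentioned above can be illustrated with a minimal sketch. The harmonic functional form, the function names, and the `force_constant` parameter are assumptions made here for illustration; they are not taken from the cited works, and the back-calculation of shifts from structures is assumed to be done elsewhere.

```python
from typing import Sequence


def replica_averaged_restraint(
    replica_shifts: Sequence[Sequence[float]],
    experimental_shifts: Sequence[float],
    force_constant: float = 1.0,
) -> float:
    """Harmonic penalty on replica-averaged, back-calculated chemical shifts.

    replica_shifts[r][i] is the shift calculated for nucleus i in replica r.
    The harmonic form is an illustrative assumption.
    """
    n_replicas = len(replica_shifts)
    energy = 0.0
    for i, measured in enumerate(experimental_shifts):
        # Average the calculated observable over replicas before comparing
        # it to the measured value; individual replicas may deviate.
        mean_shift = sum(replica[i] for replica in replica_shifts) / n_replicas
        energy += force_constant * (mean_shift - measured) ** 2
    return energy


def pseudo_energy(force_field_energies, replica_shifts, experimental_shifts,
                  force_constant=1.0):
    """Force-field energy summed over replicas plus the restraint term."""
    return sum(force_field_energies) + replica_averaged_restraint(
        replica_shifts, experimental_shifts, force_constant
    )
```

Because the penalty acts on the average, two replicas whose shifts straddle the measured value incur no penalty, which is what allows the ensemble, rather than any single conformation, to reproduce the data.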

In the above works, the main idea is to incorporate the wet-laboratory data into a restraint potential that is added to a molecular mechanics force field. In [302], the free energy landscapes of small-size proteins are characterized by using the NMR chemical shifts as collective variables (also known as reaction coordinates, in a slight abuse of terminology) in metadynamics simulations. Doing so enhances sampling and allows visiting multiple free energy minima not typically reached by classic MD simulations [302]. The free-energy landscape reconstructed for the third Ig-binding domain of protein G from streptococcal bacteria (GB3) in [302] is shown in Fig. 1. In [34], the interdomain motions of hen lysozyme are characterized using RDC data to restrain MD simulations.
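To make the role of a collective variable in metadynamics concrete, the following toy sketch deposits Gaussian hills along a one-dimensional collective variable. Everything here is an assumption for illustration: the double-well potential, the parameter values, and the use of Metropolis MC moves in place of MD. It is not the protocol of [302].

```python
import math
import random


def metadynamics_bias(cv_value, hill_centers, height=0.1, width=0.2):
    """History-dependent bias: a sum of Gaussians centered on visited CV values."""
    return sum(
        height * math.exp(-((cv_value - c) ** 2) / (2.0 * width ** 2))
        for c in hill_centers
    )


def double_well(s):
    """Toy free-energy profile with minima at s = -1 and s = +1."""
    return (s * s - 1.0) ** 2


def run_toy_metadynamics(n_steps=2000, stride=50, kT=0.1, step=0.2, seed=0):
    """Metropolis walk on the biased potential; returns the deposited hill centers."""
    rng = random.Random(seed)
    s = -1.0
    centers = []
    for i in range(1, n_steps + 1):
        trial = s + rng.uniform(-step, step)
        e_old = double_well(s) + metadynamics_bias(s, centers)
        e_new = double_well(trial) + metadynamics_bias(trial, centers)
        if e_new <= e_old or rng.random() < math.exp(-(e_new - e_old) / kT):
            s = trial
        if i % stride == 0:
            # Deposit a hill where the walker currently sits, discouraging revisits.
            centers.append(s)
    return centers
```

As hills accumulate in the starting basin, the biased walker is pushed over the barrier, which is the sense in which the bias "enhances sampling" along the chosen variable.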

Fig. 1

Free-energy landscape of GB3 obtained with work in [302] using chemical shifts as collective variables

Panel A shows a two-dimensional projection of sampled conformations. The x axis shows values of the CamShift collective variable for each conformation, which measures the difference between the wet-laboratory and calculated chemical shifts for the backbone. The y axis shows the backbone RMSD between each conformation and the reference structure (PDB id 2oed). Some selected conformations, from extended to compact, are highlighted, drawn with the Visual Molecular Dynamics (VMD) software [303]. Panel B shows a conformation with the lowest backbone RMSD (0.5 Å) from the reference structure. Such native-like conformations are visited multiple times by the method. Panel C draws hydrophobic side chains to illustrate that the internal packing of these side chains is practically identical to that observed in the reference structure. This figure is reproduced with permission of the Executive Editor of PNAS from article Granata et al., 2013 [302].

The idea of incorporating wet-laboratory data in energy functions, thus resulting in pseudo-energy functions, has been popular for over a decade and has been demonstrated to be effective not only in the context of MD sampling but also of MC sampling for reconstructing equilibrium conformation ensembles (and even structure prediction, as we review below). For instance, work in [304] demonstrates that the use of replica-averaged structural restraints in MD simulations with a particular force field and a set of wet-laboratory data can provide an accurate approximation of the Boltzmann distribution of a macromolecule. Though NMR chemical shifts are proving more general at capturing the extensive equilibrium dynamics, NOEs, RDCs, S² order parameters, J-couplings, and hydrogen exchange data have been used to restrain both MD and MC sampling and obtain detailed information on structure and dynamics of equilibrium states and transition states in proteins [32, 35, 36, 305–313]. The main advantage of incorporating wet-laboratory data is to remedy inherent biases in force fields and guide the sampling of the conformation space to relevant regions. Concerns of accuracy then shift entirely to the breadth of sampling and the generality of the wet-laboratory data to capture the equilibrium dynamics. Recent work affirms that NMR chemical shifts are very powerful in this regard and, combined with enhanced sampling techniques for MD and MC, allow sampling equilibrium conformation ensembles and thus faithfully capturing equilibrium dynamics [273, 293–295, 314]. It is worth noting that there is great difficulty in calculating chemical shifts, J-couplings, and other wet-laboratory measurements from structures. A central issue is the large uncertainty inherent in such calculations. One way in which computational methods address this issue is by integrating different types of experimental data [315, 316].

Other non-MD based methods have also been applied, particularly to model internal, equilibrium structural fluctuations of uncomplexed proteins. These methods, such as CONCOORD [317], FIRST/FRODA [318,319], and PEM [320–322], are designed to rapidly populate the conformation space in a neighborhood around a given structure. They typically restrict an underlying stochastic optimization process based on MC or other non-MD algorithms with geometric constraints. The constraints are obtained from analysis of a given structure resolved in the wet laboratory and considered representative of the equilibrium conformation ensemble. For instance, work in [317] repeatedly generates and then corrects random conformations until a set of upper and lower geometric bounds obtained from the given structure is satisfied. Work in [318,319,323] is based on constraint theory and models a given structure as a bar-and-joint framework. This model allows employing rigidity analysis to reveal underconstrained backbone angles on which sampling focuses to obtain inherent internal fluctuations. Work in [320–322] is based on the treatment of inverse kinematics in robotics and computes local fluctuations by restricting ends of consecutive overlapping segments of the protein chain to positions in the given structure.
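The generate-and-correct strategy attributed above to [317] can be sketched on point coordinates as follows. This is a simplified illustration under stated assumptions: a uniform tolerance around every reference distance, a pairwise projection step as the correction, and hypothetical function names. The actual bound definitions and correction procedure of CONCOORD are not reproduced here.

```python
import math
import random


def distance_bounds(reference, tolerance=0.5):
    """Lower/upper bounds for every pair, derived from one reference structure."""
    bounds = {}
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            d = math.dist(reference[i], reference[j])
            bounds[(i, j)] = (max(0.0, d - tolerance), d + tolerance)
    return bounds


def count_violations(coords, bounds):
    """Number of pairs whose distance falls outside its bounds."""
    return sum(
        1 for (i, j), (lo, hi) in bounds.items()
        if not lo <= math.dist(coords[i], coords[j]) <= hi
    )


def random_conformation(n_points, box, rng):
    """Random starting coordinates inside a cube of side `box`."""
    return [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(n_points)]


def correct(coords, bounds, max_sweeps=500):
    """Move violating pairs along their connecting line until bounds hold
    or the sweep budget is exhausted (convergence is not guaranteed)."""
    coords = [list(p) for p in coords]
    for _ in range(max_sweeps):
        if count_violations(coords, bounds) == 0:
            break
        for (i, j), (lo, hi) in bounds.items():
            d = math.dist(coords[i], coords[j])
            target = min(max(d, lo), hi)
            if d < 1e-9 or target == d:
                continue
            shift = 0.5 * (target - d) / d
            for k in range(3):
                delta = (coords[j][k] - coords[i][k]) * shift
                coords[i][k] -= delta
                coords[j][k] += delta
    return coords
```

Repeating `correct(random_conformation(...), bounds)` with different random seeds and keeping only results with zero violations would yield a set of conformations in the neighborhood of the reference structure.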

Structure-guided methods, while useful at probing regions of a conformation space around a given structure, are not readily useful when the goal is to populate a highly heterogeneous equilibrium ensemble for which there may not be sufficient representative structures. In such proteins, often referred to as multi-basin proteins due to the existence of potentially comparably deep basins in the free-energy landscape, large conformational changes are observed between basins. Detailed reconstruction of the energy landscape of a protein is at this point challenging. Non-MD methods have been devised and applied to capture thermodynamically stable and semi-stable structural states in multi-basin proteins [125,126]. In [126], an MC-SA method is devised that employs multiple scales of representational detail and the fragment replacement technique popular in de novo structure prediction to map the energy landscape of the uncomplexed adenylate kinase (AdK) protein. However, only a subset of the known states is captured, pointing to the general challenge of devising enhanced sampling techniques capable of reconstructing energy landscapes of proteins in the absence of any a priori information. Fortunately, significant, even if partial, information now exists from wet-laboratory techniques on stable or semi-stable states of wildtype and variant sequences of proteins. The method in [324] exploits this information to define a lower-dimensional search space on which extensive sampling can be afforded to reveal diverse thermodynamically stable and semi-stable structural states. We note that such states are stable in the lower-dimensional space, as no information is available on the true potential energy surface.

While MD-based methods are challenged in a de novo setting, they are particularly suitable to reveal the detailed structural transitions connecting two known structural states. Providing detailed transitions is key to understanding the mechanistic basis of several disorders linked to transition-modifying mutations. This promise has attracted other non-MD methods that can sample conformational paths connecting two structural states of interest without direct time-scale information on the transition. In the following we provide an overview of work in modeling and simulating structural transitions.

Modeling of Structural Transitions

Many proteins undergo large conformational changes that allow them to tune their biological function by transitioning between different structural states, effectively acting as dynamic molecular machines [325]. Since it is generally difficult for wet-laboratory techniques to elucidate a transition in terms of intermediate conformations (though successful examples exist [326]), computational techniques provide an alternative approach [327]. However, transition trajectories may span multiple length and time scales, connecting structural states more than 100 Å apart. This length scale is up to 2 orders of magnitude larger than a typical interatomic distance of 2 Å. Transitions can also demand microsecond-to-millisecond time scales, which are 6−12 orders of magnitude larger than the femtosecond-to-picosecond time scale of typical atomic oscillations.
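The scale gaps quoted above follow from simple ratios. The short check below uses only the numbers given in the text; the helper name is ours.

```python
import math


def orders_of_magnitude(large, small):
    """Base-10 logarithm of the ratio between two quantities in the same unit."""
    return math.log10(large / small)


# Length: a 100 Å displacement versus a 2 Å interatomic distance.
length_gap = orders_of_magnitude(100.0, 2.0)

# Time (in seconds): the narrowest gap pairs microseconds with picoseconds,
# the widest pairs milliseconds with femtoseconds.
time_gap_min = orders_of_magnitude(1e-6, 1e-12)
time_gap_max = orders_of_magnitude(1e-3, 1e-15)
```

The length ratio is 50, or about 1.7 orders of magnitude, which is why the text says "up to 2"; the time ratios bracket the stated range of 6 to 12.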

Typically, three types of methods are applied to model structural transitions: MD-based methods, morphing-based methods, and robotics-inspired methods. MD-based methods typically have to employ powerful algorithmic enhancements to surpass high-energy barriers in structural transitions. However, cases exist when classic MD methods have been able to capture spontaneous transitions of allosteric proteins by monitoring the structural relaxation upon removal of the bound molecule from the binding pocket [328, 329]. These works further highlight the utility of the conformational selection or population shift principle, as removal of the bound molecule prompts spontaneous movement towards a new equilibrium state.

In cases of high-energy barriers, biased or targeted MD methods are useful to expedite transitions between given structures [127,330], but the concern with such methods is that the transition trajectory may not correspond to the true one, as these methods modify the underlying energy landscape; the order of events in transition paths computed via targeted MD methods depends on the direction in which the MD simulations are performed. For example, an application of biased MD to capture transitions of Ras between its active and inactive structures resulted in unrealistic, high-energy structures [330]. It is worth noting, however, that recent work in [331] has proposed a technique to remove the length-scale bias from targeted MD simulations. Essentially, the technique formulates local restraints, each acting on a small connected portion of the protein sequence, resulting in a number of potentials that are then used in targeted MD simulations. The technique has been demonstrated to be effective on an application to the open ↔ closed transition in the protein calmodulin. The free energy barriers associated with the computed paths have been shown to be comparable to those obtained with a finite-temperature string method.

In contrast to biased MD methods, accelerated MD methods do not change the entire landscape but only the relative height of the basins corresponding to the structures that need connecting with intermediate conformations [332]. Accelerated MD has been applied to several proteins: to capture the transition of H-Ras between the inactive and active structural states [10], map the structural and dynamical features of kinesin motor domains [91], compute domain opening and dynamic coupling in the alpha subunit of heterotrimeric G proteins [333], and more. Representative results on an application of accelerated MD for capturing the dynamics of the Eg5 kinesin motor domain are shown in Fig. 2.
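The statement that accelerated MD raises basins while leaving the rest of the landscape untouched can be illustrated with the boost potential commonly used in the accelerated MD literature, ΔV = (E − V)² / (α + E − V) for V < E and zero otherwise. We assume this form here; the text above does not specify it, and the names `threshold` (E) and `alpha` (α) are ours.

```python
def boost(potential, threshold, alpha):
    """Boost energy added when the potential lies below the threshold."""
    if potential >= threshold:
        return 0.0  # Regions at or above the threshold are left unchanged.
    gap = threshold - potential
    return gap * gap / (alpha + gap)


def modified_potential(potential, threshold, alpha):
    """Potential actually sampled in the accelerated simulation."""
    return potential + boost(potential, threshold, alpha)


# A basin bottom at 0 and a barrier top at 4 (arbitrary units),
# with the threshold placed at the barrier top.
basin = modified_potential(0.0, threshold=4.0, alpha=4.0)
barrier = modified_potential(4.0, threshold=4.0, alpha=4.0)
effective_barrier = barrier - basin
```

In this example the basin is lifted by 2 units while the barrier top stays at 4, so the effective barrier drops from 4 to 2. Smaller values of `alpha` flatten the basins more aggressively.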

Fig. 2

Probing of coupled motions in the Eg5 kinesin motor domains in [91] through accelerated MD simulations

The top panel shows the structure and catalytic cycle of the kinesin motor domain. The ATPase catalytic site sits at the top of the β-sheet, flanked by three highly conserved loops (P-loop, SI, and SII) connected to helices (also annotated) on either side of the sheet. The secondary structure topology is drawn, with β-strands drawn as triangles and α-helices as circles. The kinesin catalytic cycle is shown: Kinesin (K) has a weak affinity for the microtubule in the ADP state. ADP release is followed by strong microtubule binding. ATP binding may occur, followed by hydrolysis and product release to regenerate the weakly bound ADP state. The bottom panel projects conformations sampled every 20 picoseconds by 200-nanosecond-long accelerated MD simulations onto the two principal modes of motion. The latter are obtained through principal component analysis of collected X-ray structures for wildtype and variant Eg5. Three simulations are highlighted: the nucleotide-free (APO) one in (A), the ADP-bound one in (B), and the ATP-bound one in (C). The nucleotide-free simulation covers more of the conformation space, whereas restricted sampling is observed when Eg5 is bound to ATP or ADP. One of the conclusions in [91] is that structural changes from the ADP- to ATP-bound states, which are evident in the collection of X-ray structures, are encoded in the intrinsic dynamics of the nucleotide-free motor domain; the nucleotides effectively rigidify the motor domain by narrowing the conformation space accessible to it, as evident in the restricted sampling observed through accelerated MD. This figure is reused from Scarabelli et al., 2013. CC-BY PLOS ONE [91].

Even accelerated MD methods are limited in their ability to elucidate transition trajectories that cross high energy barriers [10]. In contrast, the dynamic importance sampling (DIMS) MD method [334, 335] is more effective at simulating macromolecular transitions with energy barriers. In DIMS, the next conformational state sampled to obtain a transition from a state A to a state B is chosen to favor the most productive movement towards B and cross the energy barrier. The productive movement is indicated by a robust progress variable, the instantaneous RMSD over heavy atoms between a conformation and the target structure. DIMS is integrated in CHARMM and has been tested on several systems [336], including modeling of slow transitions in AdK [334], folding of protein A and protein G, and conformational changes in the calcium sensor S100A6, the glucose–galactose-binding protein, maltodextrin, and lactoferrin, showing good agreement between sampled intermediates and experimental data [336].
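The idea of favoring productive movement along a progress variable can be sketched with a toy "soft ratchet". The Gaussian acceptance rule, the `softness` parameter, and the random-walk dynamics below are assumptions chosen for illustration; they should not be read as the DIMS implementation in CHARMM. Distance to the target point stands in for the RMSD to the target structure.

```python
import math
import random


def ratchet_accept_probability(old_progress, new_progress, softness):
    """Always accept steps that approach the target; accept steps that move
    away with a probability that decays with the size of the setback."""
    delta = new_progress - old_progress
    if delta <= 0.0:
        return 1.0
    return math.exp(-((delta / softness) ** 2))


def toy_ratchet_walk(start, target, n_steps=500, step=0.3, softness=0.05, seed=0):
    """Random walk filtered by the ratchet; returns the progress variable per step."""
    rng = random.Random(seed)
    position = list(start)
    progress = [math.dist(position, target)]
    for _ in range(n_steps):
        trial = [x + rng.uniform(-step, step) for x in position]
        trial_progress = math.dist(trial, target)
        p_accept = ratchet_accept_probability(progress[-1], trial_progress, softness)
        if rng.random() < p_accept:
            position = trial
            progress.append(trial_progress)
        else:
            progress.append(progress[-1])
    return progress
```

A small `softness` approaches a hard ratchet that never moves away from the target, while a large value recovers an unbiased walk; the softness lets the walker back out of dead ends.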

In particular, in [334], DIMS is applied to sample the ensemble of open-to-closed transitions for AdK. AdK is an enzyme that regulates the concentration of free adenylate nucleotides in the cell by catalyzing the conversion of ATP and AMP into two ADP molecules. The enzyme undergoes a large conformational change in its transition between open and closed structural states, and this change has been observed even in the absence of a substrate. As a result, AdK is one of the few proteins for which wet-laboratory studies have been able to capture a great number of intermediate structures populated during the open-to-closed transition. For this reason, AdK is a poster system to measure the capability of computational methods to reproduce transitions in great structural detail. Work in [334] is one of the few to provide atomistic detail, as well as reproduce and map with great accuracy the location of known intermediate structures along the transition. Representative results are shown in Fig. 3.

Fig. 3

Sampling of the ensemble of closed-to-open and open-to-closed transition trajectories in AdK through the DIMS method [334]

An ensemble of 330 DIMS trajectories is compared to 45 E. coli AdK X-ray structures. The conformations in each trajectory are projected onto a progress variable δRMSD, measured as the RMSD of the conformation from the closed AdK structure (PDB id 1ake:A) minus the RMSD of the conformation from the open AdK structure (PDB id 4ake:A). For each of the 45 collected X-ray structures and each trajectory, the conformation in the trajectory closest in backbone RMSD to the X-ray structure is identified, and its δRMSD value is recorded. A probability distribution is then constructed for each X-ray structure over all DIMS trajectories to indicate where an X-ray structure is located along the simulated trajectories. The color bar indicates the probability density. The median of each distribution is marked by a white circle. The X-ray structures whose PDB ids are listed on the y axis are rank ordered based on the median. The second white line traces the location of the median when the simulations are repeated to sample open-to-closed transition trajectories. Out of 45 structures sorted by δRMSD, about 24 are closed-state structures, 4 are open, and 17 are intermediates. This work is an example of the capability of computational methods to elucidate in detail transitions and accurately map the location of experimentally determined structures in the transitions. This figure is adapted from Beckstein et al., 2009 [334]. The image was created by O. Beckstein.
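The δRMSD progress variable and the per-trajectory mapping of an X-ray structure described in this caption can be sketched as follows. The sketch assumes that all coordinate sets are already superimposed and list the same atoms in the same order; optimal superposition, which a real analysis would need, is omitted, and the function names are ours.

```python
import math


def rmsd(conf_a, conf_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(conf_a, conf_b))
    return math.sqrt(squared / len(conf_a))


def delta_rmsd(conf, closed, open_):
    """RMSD from the closed structure minus RMSD from the open structure."""
    return rmsd(conf, closed) - rmsd(conf, open_)


def locate_on_trajectory(xray, trajectory, closed, open_):
    """δRMSD of the trajectory frame that is closest in RMSD to an X-ray structure."""
    closest = min(trajectory, key=lambda frame: rmsd(frame, xray))
    return delta_rmsd(closest, closed, open_)
```

With this sign convention, closed-like conformations have negative δRMSD and open-like conformations positive values. Collecting `locate_on_trajectory` over all trajectories for one X-ray structure gives the samples from which a distribution like those in the figure could be built.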

Morphing- and string-based methods provide an alternative way to compute transition trajectories. Morphing-based methods include MolMov [337], FATCAT [338], NOMAD-Ref [339], MinAction [130], Climber [340], and more. In Climber, the interresidue distances in a given start structure are pulled towards distances in the goal structure, using harmonic restraints incorporated in a pseudo-energy function. MolMov and FATCAT interpolate linearly in Cartesian space or over rigid-body motions. NOMAD-Ref uses elastic normal modes and interpolates interresidue distances per the elastic network algorithm in [341]. MinAction solves action minimization equations at each of the provided structures assuming a harmonic potential at them. Other methods include those based on elastic network models (ENMs) [131,341], the nudged elastic band, and zero- and finite-temperature string methods [340, 342–347]. In particular, the string-based methods make use of the committor function to account for not generally knowing the collective variables underlying the transition [343], whereas methods based on ENMs show the ability of coarse-grained models to capture allosteric transitions in supramolecular systems on the order of megadaltons [131]. In general, while efficient, all these methods tend to reproduce similar conformational paths in independent runs rather than provide a possibly heterogeneous ensemble of conformational paths realizing the transition.
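The simplest morphing strategy named above, linear interpolation in Cartesian space, fits in a few lines. This is a generic sketch that assumes pre-aligned start and goal structures with matching atom order; it is not the code of any of the cited tools, which add steps such as energy minimization of the intermediates.

```python
def linear_morph(start, goal, n_intermediates):
    """Return start, n_intermediates evenly spaced frames, and goal."""
    frames = []
    for step in range(n_intermediates + 2):
        t = step / (n_intermediates + 1)  # Interpolation parameter in [0, 1].
        frames.append([
            tuple((1.0 - t) * s + t * g for s, g in zip(atom_start, atom_goal))
            for atom_start, atom_goal in zip(start, goal)
        ])
    return frames
```

The function is deterministic, so repeated runs return the identical path. This is a direct illustration of why such methods do not by themselves provide a heterogeneous ensemble of transition paths.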

Work in [348, 349] tackles this issue of possibly high inter-run path correlations with the weighted ensemble method (WEM). WEM, originally proposed in [350], has been shown to be a useful enhanced sampling method for off-equilibrium and equilibrium processes. WEM uses a multiple-trajectory strategy where MC trajectories spawn new ones upon reaching new regions of the conformation space. One of the first applications of WEM to path sampling was on a 72-residue domain of the calmodulin protein. Coupled with a united residue model, WEM was able to capture the transition between the calcium-bound and calcium-free structural states and compare well with brute-force simulations in a fraction of brute-force simulation time. In [349], WEM is used to investigate the mechanism of the conformational change that the 5HIR benzylhydantoin transporter Mhp1 undergoes from a state poised to bind extracellular substrates to a state that is competent to deliver substrate to the cytoplasm. WEM reveals a heterogeneous ensemble of outward-to-inward conformational paths and identifies two distinct modes of transport.
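The spawning of trajectories in weighted ensemble sampling is usually realized by a resampling step that splits and merges weighted walkers inside each region (bin) of conformation space. The sketch below shows one common way to do this for a single bin; the split-heaviest and merge-lightest policy and all names are assumptions for illustration, not the scheme of [348–350].

```python
import random


def resample_bin(walkers, target_count, rng):
    """Split or merge (state, weight) walkers in one bin until exactly
    target_count remain, conserving the total probability weight.

    Assumes a non-empty bin, positive weights, and target_count >= 1.
    """
    walkers = list(walkers)
    while len(walkers) < target_count:
        # Split the heaviest walker into two copies carrying half the weight each.
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers.extend([(state, weight / 2.0), (state, weight / 2.0)])
    while len(walkers) > target_count:
        # Merge the two lightest walkers; the survivor is chosen with
        # probability proportional to weight and inherits the combined weight.
        walkers.sort(key=lambda w: w[1])
        (state_a, weight_a), (state_b, weight_b) = walkers[0], walkers[1]
        total = weight_a + weight_b
        survivor = state_a if rng.random() < weight_a / total else state_b
        walkers = [(survivor, total)] + walkers[2:]
    return walkers
```

A bin that has just been reached by a single low-weight walker is thereby populated with several copies, which is how rarely visited regions receive sampling effort without distorting the overall statistics.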

Robotics-inspired methods have also been applied to model structural transitions. They rely on deep analogies between robot motion planning and macromolecular motion simulation. In particular, the T-RRT [351] and PDST [352] methods, adapted from tree-based robot motion planning frameworks, have focused on the problem of computing conformational changes connecting two given structures in small and large proteins. While T-RRT has been shown to connect known low-energy states of the dialanine peptide (2 amino acids long) [351], the PDST method has been shown to produce credible information on the order of conformational changes connecting stable structural states of large proteins (200−500 amino acids long) [352]. Both methods control the dimensionality of the conformation space by either focusing on systems with few amino acids [351] or by employing very coarse-grained representations to limit the number of modeled parameters in large proteins [352]. Work in [353] extends the capability of these frameworks to address large conformational changes in proteins, such as calmodulin and AdK, while providing high-resolution intermediate conformations by employing fragment-based moves. Other work detaches the sampling of the structure space from analysis of motions [354]. MSM-based analysis of sampled conformations is conducted to compute average properties of interest, such as the expected number of transitions connecting two given structural states, in lieu of direct time-scale information.

Protein Folding and Structure Prediction

Protein folding and structure prediction are often treated as two sides of the same coin. Protein folding, however, focuses on uncovering the detailed series of conformational changes that a protein goes through from a denatured, unfolded state to its long-lived, equilibrium, folded state. The folded or native structure is the end result of this process, but not the only goal. Indeed, there are many protein folding algorithms that employ information about the native structure in order to expedite the search for the folding mechanism. Structure prediction algorithms focus more on the end result; that is, the goal is to uncover the native, folded structure even if the process by which these methods do so does not resemble the physical folding one. In its broadest context, the protein folding problem aims to shed light on the physical code by which a protein amino-acid sequence determines the native structure, the speed with which proteins fold, and the design of effective algorithms for predicting the native structure from sequence.

An extensive review of protein folding is presented in [355]. Credit for introducing the problem to the computational biology community goes to Kendrew and co-workers, who published the first structure of a globular protein, myoglobin, and showed the complexity and lack of symmetry or regularity in protein native structures [61]. Since then, a general mechanism for folding has been elusive. Various paradigms have been proposed, evolving from the early days when folding was thought to proceed deterministically, through a unique series of conformations for a protein at hand, to the free energy landscape view founded upon the description of an inherently stochastic but biased process. The latter emerged from polymer statistical thermodynamics and built evidence that protein folding energy landscapes are funnel-like, narrower at the bottom, as the freedom of the protein to populate low-energy regions is gradually restricted [5,28,356]. While the energy landscape view has inspired many folding and structure prediction algorithms, in itself it offers no suggestion of a mechanism that can be followed to efficiently fold proteins in silico.

The application of MD simulations to observe the rare transition of a protein from an unfolded state to a folded state has come a long way in both the size of the proteins that can be handled and the time scales that can be modeled. Hardware advances, improvements in force fields, coarse-grained models, multiscaling techniques, and novel enhanced sampling techniques for MD have been crucial to surpassing spatial and time scales. Atomistic MD simulations can now be afforded [357], with supercomputers such as ANTON allowing folding simulations of proteins of 50−100 amino acids to run for milliseconds [358], and software such as GROMACS [359], NAMD [116], and AMBER [360] becoming more accessible and easier to use for many researchers. In the following we elect to highlight recent work that showcases the state of protein folding. We then proceed with an overview of complementary work in de novo structure prediction.

Protein Folding

Some of the most striking advances in protein folding with atomistic, equilibrium MD simulations in the presence of water molecules have come from the Pande group, particularly through the Folding@Home project [148, 361–364]. In 2005, van der Spoel and colleagues provided the first folding simulation that also predicted the native structure of a peptide based on the Gibbs energy landscape [365]. In 2010, Shaw and colleagues successfully modeled the folding of a 35-residue protein in explicit solvent [147]. Soon afterward, Lindorff-Larsen and colleagues in the Shaw group managed to fold 12 fast-folding proteins of length up to 80 amino acids and diverse native topologies with atomistic detail and in explicit solvent [105]. Some striking observations were made from analysis of the folding trajectories of these small proteins, which generated much discussion in the protein folding community [366]. In addition to matching folding rates measured in the wet laboratory, work in [105] demonstrated that the folding trajectories contained discrete transitions between native and unfolded states, in agreement with barrier-limited cooperative folding. Pathway heterogeneity was shown to be minimal for 9 of the 12 proteins, with pathways sharing more than 60% of the native contacts. These results naturally suggested that the pathways observed in simulation were variations of a single underlying folding pathway.

The conclusions in [105] were also supported by wet-laboratory work in [367], which detected a limited set of pathways and only four intermediates for the folding of calmodulin. Moreover, in [105] it was observed that long-range contacts locking in place the native fold formed early on, together with a significant amount of secondary structure and surface burial. This was confirmed in other folding simulations as well [368]. While the amount of residual structure is questioned by wet-laboratory studies and may possibly be the result of the bias of current force fields [366], the observations in [105] build the case for sequential stabilization as a mechanism for the folding of small, fast-folding proteins. The term sequential stabilization, coined in [369], refers to the fact that folding may not be completely cooperative but is characterized by small-scale events that add secondary structure elements named foldons [370] in a stepwise manner. Because foldons are intrinsically unstable, low-energy paths are likely to involve foldons building on top of existing structures, thus resulting in sequential stabilization.

Demonstration of the contribution and role of long-range native contacts early on in folding provided further justification for the use of Gō-models and other coarse-grained models that assume native contacts are the only ones that are kinetically relevant [143]. However, the wet-laboratory study of the folding of calmodulin in [367] demonstrated the presence of non-native intermediates in larger, more complex proteins, which is certainly observed in de novo structure prediction algorithms in the richness of non-native local minima. It is worth noting that a growing body of wet-laboratory studies is adding to the list of proteins known to fold through distinct native-like intermediates [371].

From a methodological point of view, a significant body of recent work in protein folding employs long, equilibrium, atomistic MD simulations in explicit solvent to observe multiple, spontaneous folding and unfolding events and reliably measure thermodynamic and kinetic quantities, such as folding rates, free energies, folding enthalpies, heat capacities, φ-values, and temperature-jump relaxation profiles [104, 105, 368]. While short, off-equilibrium MD simulations can generally at best capture a single folding event, recent work that embeds many short off-equilibrium runs in coarse-grained kinetic models, such as MSMs, is able to approximate well the underlying folding dynamics [123,133,372]. Methods that embed many short simulations (MD or other stochastic optimization methods) in MSMs for the calculation of system dynamics are gaining ground in diverse applications, from folding, to structural transitions, to binding [128, 132, 354, 373–375].
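The way many short simulations are combined in an MSM can be sketched as follows: each trajectory is reduced to a sequence of discrete state labels, transitions at a fixed lag are counted across all trajectories, and the counts are row-normalized into a transition matrix. The state assignment (clustering), the choice of lag time, and the reversible estimators used in practice are outside this sketch; all names are ours.

```python
def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts pooled over many short trajectories.

    Each trajectory is a list of integer state labels; lag must be >= 1.
    """
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for origin, destination in zip(traj[:-lag], traj[lag:]):
            counts[origin][destination] += 1.0
    matrix = []
    for state, row in enumerate(counts):
        total = sum(row)
        if total == 0.0:
            # No observed exits: treat the state as absorbing (an assumption).
            matrix.append([1.0 if j == state else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iterations=1000):
    """Approximate equilibrium populations by repeated propagation."""
    n = len(matrix)
    populations = [1.0 / n] * n
    for _ in range(n_iterations):
        populations = [
            sum(populations[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
    return populations
```

No single trajectory here needs to connect the states of interest end to end; long-time behavior, such as equilibrium populations, is inferred from the pooled short-time transition statistics.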

De Novo Protein Structure Prediction

The de novo structure prediction problem is perhaps one of the most popular and recognized ones in computational biology. The goal is to compute a structure that is representative of the protein native state given the amino-acid sequence of a protein with no known sequence homologs. This problem sprang from Anfinsen’s findings that the amino-acid sequence determines to a great extent the native state of a protein [11]. Knowing the native structure of a protein is central to protein-ligand binding studies, particularly in the context of CADD. The significant technological advances that have made high-throughput sequencing possible have also resulted in 1000-fold more sequences than structures known for proteins.

Advances in in-silico structure prediction can be attributed to Moult and colleagues, who founded the important Critical Assessment of protein Structure Prediction (CASP) competition to spur research in the structure prediction community in a competitive setting. At CASP gatherings, structures resolved in the wet laboratory and withheld from computational competitors are later revealed and compared with predictions. Community evaluations are then published and serve as a good measure of the progress in structure prediction. For instance, the latest review of structure prediction methods in [376] demonstrates that overall performance in CASP 10 improved substantially compared to previous competitions.

An exponential growth in the number of structures solved in the wet laboratory has had a dramatic effect on the utility of comparative modeling methods, which model structures of a target protein sequence after known structures/templates of proteins with similar sequences to the target; homologous structures can now be detected for most proteins [376]. HHPred is one of the most successful template-based predictors in CASP [377]. Nevertheless, de novo (or template-free, free, ab initio) modeling remains of great interest, for two reasons. First, techniques used in de novo algorithms to model conformations of variable regions, such as loops, are also employed in template-based methods to fill in incomplete models [378]. Second, the goal of obtaining information on the equilibrium structure(s) of a protein from its amino-acid sequence is key to understanding function and changes to function upon perturbations.

Currently, state-of-the-art methods for de novo structure prediction rely on the fragment replacement technique, also known as fragment assembly. The technique simplifies and discretizes the conformation space explored by algorithms by modifying a bundle of consecutive parameters, typically backbone angles of consecutive amino acids, simultaneously, as opposed to modifying individual backbone angles separately. A stretch of consecutive backbone angles is known as a fragment, and any protein conformation can yield a new one if a fragment can be selected in it and its configuration replaced with a new one. In the technique as originally introduced by Baker [379], the new configurations were obtained from a pre-compiled library of configurations built over known protein structures in the PDB. Essentially, known protein structures are excised into consecutive overlapping fragments, and their configurations are recorded in a library indexed by the amino-acid sequence of a fragment. Replacement of fragment configurations naturally makes for a move or step in the context of an MC search, and most methods that use fragment replacement essentially implement enhanced sampling algorithms over baseline MC. For instance, the most recognized de novo structure prediction method, Rosetta [118], implements a multiscale MC method, which carefully switches from coarse-grained to atomistic representations in the growing MC trajectory, employing specifically-designed energy functions and even switching between two effective temperatures to cross energy barriers and so allow the MC search to escape shallow local minima.
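
The fragment replacement move and its use inside a baseline Metropolis MC search can be sketched as follows. This is a toy illustration under stated assumptions, not the Rosetta protocol: the fragment library is a flat list of (φ, ψ) fragments rather than one indexed by amino-acid sequence, the energy function is supplied by the caller, and all function names are hypothetical.

```python
import math
import random

def fragment_move(angles, library, frag_len, rng):
    """Replace a random stretch of consecutive (phi, psi) pairs with a library fragment."""
    start = rng.randrange(len(angles) - frag_len + 1)
    new_angles = list(angles)
    new_angles[start:start + frag_len] = rng.choice(library)
    return new_angles

def metropolis_accept(e_old, e_new, kT, rng):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    return e_new <= e_old or rng.random() < math.exp(-(e_new - e_old) / kT)

def fragment_mc(angles, library, energy, frag_len, kT, n_steps, seed=0):
    """Baseline MC search whose only move is fragment replacement."""
    rng = random.Random(seed)
    current, e_cur = list(angles), energy(angles)
    best, e_best = current, e_cur
    for _ in range(n_steps):
        candidate = fragment_move(current, library, frag_len, rng)
        e_cand = energy(candidate)
        if metropolis_accept(e_cur, e_cand, kT, rng):
            current, e_cur = candidate, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best
```

Because each move rewrites several backbone angles at once, the search takes larger, structurally plausible steps than it would by perturbing single angles.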

It is worth noting that careful construction of energy functions and representations of various granularity can be credited as much as the fragment replacement technique with advances in de novo structure prediction [119]. However, at the moment, a saturation point has been reached [380], and current research is focusing either on specialized moves for MC-based methods or other, higher-level mechanisms by which to enhance MC sampling. In current top CASP performers, secondary structures are built and packed relatively easily, and the difficulty in correct predictions is localized to variable regions such as loops. For this reason, efforts are devoted to rethinking the moves in an MC-based setting beyond fragment replacement.

Work in [119, 120], which has resulted in the highly-successful Quark method, shows the utility of designing different types of moves and employing them at various stages during the MC search. As reported in [120], Quark performs very well in the free modeling category. Performance on 34 free modeling targets is measured by calculating the TM-score between the best prediction and the known native structure for each target versus target length (TM-score is a metric for measuring structural similarity and is considered superior to RMSD [381]; the reader is directed to Ref. [382] for details).

Performance is unusually high (> 0.5) for targets (R0006-D1, R0007-D1, and R0012-D1) that are longer than 150 amino acids. In particular, two of the targets, R0006-D1 and R0007-D1, were considered difficult targets in the CASP10-ROLL experiment. On R0006-D1, which is a β-barrel protein 169 amino acids long, Quark generates five models with the highest TM-score of 0.32. Structural superposition extracts a model with TM-score 0.5, which improves to a TM-score of 0.622 after energetic refinement via I-TASSER [383]. On R0007-D1, which is an α protein 161 amino acids long, Quark generates a best model with TM-score 0.43. Structural superposition extracts a model with TM-score 0.48 from the LOMETS template pool, which then improves to a TM-score of 0.62 after energetic refinement via I-TASSER. These results suggest that the focus on designing specialized moves is well placed.
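
For readers unfamiliar with the metric, the TM-score is commonly defined as (1/L) Σ 1/(1 + (d_i/d_0)^2), maximized over superpositions, where L is the target length, d_i is the distance between the i-th pair of aligned residues, and d_0 = 1.24 (L − 15)^(1/3) − 1.8 Å. The sketch below evaluates this expression for one given superposition only; the search over superpositions is omitted, the function name is illustrative, and the code assumes a target long enough (roughly L > 21) for d_0 to be positive.

```python
import numpy as np

def tm_score(distances, target_length):
    """TM-score for one fixed superposition (no maximization over superpositions).

    distances: per-residue distances (in angstroms) between aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    d0 = 1.24 * np.cbrt(target_length - 15) - 1.8  # length-dependent distance scale
    d = np.asarray(distances, dtype=float)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / target_length)
```

Unlike RMSD, each residue pair contributes at most 1/L, so a few badly placed residues cannot dominate the score.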

Other work is focusing on enhancing the sampling capability beyond a simple MC-based search or even an MC-SA, though there is a growing consensus that improving accuracy in scoring functions may be more important than enhancing sampling to advance the state of de novo structure prediction. Progress in enhancing sampling comes from different communities of computational biologists and computer scientists. One direction focuses on gradually narrowing the search space, either by iteratively fixing segments of the chain exhibiting low diversity among sampled low-energy conformations [384] or by indirectly achieving the same effect through changes to the probability distribution function over the fragment configuration library [385]. Other work builds on model-based search and uses information gathered during the search to guide exploration towards promising regions of the conformation space [386, 387]. In [386], gathered information is used to identify near-optimal minima worth exploring in greater detail with all-atom energy functions. In [387], a robotics-inspired algorithm adapts the search towards under-sampled but low-energy regions of the conformation space to balance breadth versus depth.

The issue of how to balance computational resources between exploring the breadth of conformational space and going deep down in local minima is a core one in stochastic optimization. Progress has been made over the years, particularly by evolutionary algorithms that are now competitive with MC-based methods such as Rosetta [388–390]. Pursuing evolutionary algorithms for conformation sampling in de novo structure prediction has opened up novel directions on the design of effective moves [391] and multi-objective optimization [392], where the goal is not to minimize an aggregate energy score but instead to improve on several orthogonal categories.

Currently, de novo structure prediction methods are focused on proteins with one well-defined native structure. Multi-basin proteins present a challenge, as they demand that far more computational resources be spent on exploring the breadth of the energy landscape. In addition, conformation sampling (also known as decoy sampling) is not the only challenge with de novo structure prediction. Analysis of sampled conformations to identify the native structure and offer it as prediction presents its own challenges. This problem in itself is known as decoy selection, and a review of challenges and the state of the art is presented in [393].

Modeling Structure and Dynamics of Intrinsically-Disordered Proteins and Intrinsically-Disordered Protein Regions

Lately, increasing attention is paid to the problem of characterizing the structure and dynamics of intrinsically-disordered proteins (IDPs) [394–396]. There are now growing databases of IDPs and intrinsically-disordered protein regions (IDPRs), such as pE-DB, DisProt and IDEAL [397–399]. CECAM now regularly includes a workshop dedicated to promoting the development of new modeling methods and better understanding IDPs [400]. Since 2002, even CASP provides an independent assessment of methods for IDPs [396]. Several reviews discuss the fundamental principles of disorder in the biological functions of IDPs/IDPRs, including the role of disorder in cancer, neurodegeneration, genetic forms of Parkinson’s disease, and cardiovascular diseases [395, 401–405].

IDPs/IDPRs pose unique challenges in silico. They do not have stable tertiary structures but still demonstrate biological activity. This phenomenon challenges the fundamental structure-function relationship and is an extreme exception to the lock-and-key model [395]. IDPs/IDPRs are not random coils. They exhibit different degrees of disorder, from molten globules to coils, but even coil-like structures exhibit residual structure [402, 405]. A recent replica exchange MD simulation study revealed the structural contents of intrinsically disordered tau proteins. Tau proteins were discovered to be able to catalyze self-acetylation, which may promote pathological aggregation. The work characterized the atomic structures of two truncated tau constructs, K18 and K19, providing structural insights into tau’s paradox [406].


IDPR sequences are very different from those of ordered proteins, poor in hydrophobic amino acids and rich in charged amino acids. Disorder-promoting amino acids have now been identified, and they include Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro [404, 405]. Based on sequence information alone, tools now exist to estimate the propensity of a sequence for disorder [407]. There are many methods for disorder analysis and prediction of the location of disordered regions [124, 408–411].

Computational methods are being designed to characterize structures and dynamics of IDPs/IDPRs. With specifically-designed force fields, some methods have shown promise in this regard [412, 413]. Treatment of IDPRs is now included in Rosetta [414]. Two main groups of methods focus on IDPs/IDPRs. The first group consists of wet-laboratory techniques based on NMR chemical shifts and RDCs [415]. The second consists of MD-based methods [152, 153, 408, 416–418].

Both unrestrained MD [416] and long-range correlated MD [417] for well-characterized disordered proteins demonstrate good agreement with wet-laboratory data. The replica exchange with guided annealing method has also been shown to be suitable for IDPs [418]. The method escapes nonspecific compact states more efficiently and speeds up the generation of correct ensembles compared to classic replica exchange simulations. Work in [153] additionally shows the effectiveness of MD and MSMs for IDP modeling.

Other methods combine NMR-based knowledge and MD simulations [6, 302, 314, 413]. While NMR ensembles are better suited to characterize local conformational states of IDPs [415], MD simulations allow calculating kinetics and elucidating meta-stable states and barriers between states [314]. Given the unique characteristics of IDPs, computational methods are expected to continue to target them to better understand the connection between disorder and biological function and dysfunction.

Protein Design

The protein design problem is that of finding an amino-acid sequence whose global free energy minimum state corresponds to a desired, target structure or contains a structural motif associated with a desired function [419]. Also known as inverse folding or inverse structure prediction, this problem is now at the crux of protein engineering, with applications in medicine, biotechnology, synthetic biology, nanotechnology, biomimetics, and more [420]. Stated as an optimization problem, protein design is amenable to algorithmic frameworks employed for structure prediction.

Computational approaches to protein design can be categorized into forward design, explicit negative design, and heuristic negative design [419]. In forward design, the sequence and target fold/structure of a protein are known, and the goal is to optimize the sequence so that the target structure reaches an energy low enough to make any non-target structure less energetically favored. No explicit non-target structures are considered. A successful application of forward design has yielded a very stable protein, Top7 [421], whose native structure was later shown to be identical to the determined X-ray structure. In explicit negative design, alternative structures are explicitly considered. The sequence is optimized so that the target native structure is lower in energy than all the alternative structures. Explicit negative design has been used to design specific coiled coils and DNA-binding and -cleaving enzymes [422–425].

The limitation of explicit negative design regarding prior knowledge and enumeration of non-favored alternative states has motivated heuristic negative design. In heuristic negative design, the goal is not to disfavor specific alternative structures; instead, the sequence is optimized through features that are likely to increase the energy of most undesired structures. Features follow closely strategies employed by nature to achieve the energy gap between the native structure and other structures that seems to be required for thermodynamic stability and function [419]. It is worth noting that conclusions regarding energy gaps between native and non-native structures when employing scoring functions need to be taken with a grain of salt. Work in [365] relates gaps in Gibbs free energy to structure deviations (from NMR data).

Compared to the other two strategies summarized above, heuristic negative design seems particularly important for biomolecular interactions [426, 427]. Heuristic negative design also seems to be employed by nature for IDPs and by pathogens to fend off the host immune system [419].

Successful cases of designing proteins with novel functions abound [428–430] and are made possible by considerable advances in methods for de novo protein design. The current predominant computational approach is based on the (inverse folding) paradigm proposed in [431], which assumes a fixed backbone and searches over discrete low-energy configurations/rotamers of side chains for rotameric combinations that result in a lowest-energy all-atom tertiary structure [432]. In the interest of tractability, energy models are limited to pairwise energy functions. State-of-the-art functions for protein design are knowledge-based, relying on statistical parameters derived from databases of known protein properties [433–437]. Even with such energy models, the design problem with a rigid backbone and a discrete set of rotamers has been proven to be NP-hard [438].

Two types of methods have been proposed to address the combinatorial optimization problem of finding rotameric combinations. The first are based on exact optimization and seek completeness; that is, finding the global minimum energy conformation. The second forgo completeness and are based on heuristic optimization.

Exact optimization methods include dead-end elimination [439], branch-and-bound algorithms [440–442], integer linear programming [443, 444], dynamic programming [445], and cost function networks [446]. These exact methods are efficient and they limit inaccuracies to the inadequacy of the energy model, but their focus on one single assignment makes them highly susceptible to possible artifacts in the energy function, known and lamented in [447]. Moreover, the solution provided by such methods may be so overly stabilized (effectively residing in a narrow basin) that it lacks the structural flexibility the protein needs to perform the sought biological function under physiological conditions [448]. It is worth noting that, unlike methods limited to discrete rotamer assignments, work by Donald and colleagues pursues continuous rotamers and is able to reach lower-energy conformations [449]. This functionality is integrated in the popular OSPREY software [450]. It is expected that the design of a smoothed backbone-dependent rotamer library in [451], which allows evaluating rotamer characteristics as smooth and continuous functions of the φ, ψ angles, will lead to more advances in taking into account side-chain flexibility in de novo design. An illustration of the capability of protein design algorithms is provided in Fig. 4.
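
As a concrete instance of exact optimization over discrete rotamers with a pairwise energy model, the sketch below applies the classic dead-end elimination criterion: rotamer r at position i is pruned if some competing rotamer t at the same position satisfies E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s). The data layout (`self_energy[i][r]`, `pair_energy[(i, j)][r][s]` for i < j) and the function name are assumptions for illustration and do not reflect the interface of any cited software.

```python
def dead_end_elimination(self_energy, pair_energy):
    """Prune rotamers that provably cannot belong to the global minimum energy conformation."""
    n = len(self_energy)
    alive = [set(range(len(e))) for e in self_energy]

    def pair(i, r, j, s):
        # Each unordered position pair is stored once, keyed (low, high).
        return pair_energy[(i, j)][r][s] if i < j else pair_energy[(j, i)][s][r]

    def bound(i, r, pick):
        # Self energy plus best-case (min) or worst-case (max) pair interactions.
        return self_energy[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i)

    changed = True
    while changed:  # repeat until no further rotamer can be eliminated
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                if any(bound(i, r, min) > bound(i, t, max) for t in alive[i] - {r}):
                    alive[i].discard(r)
                    changed = True
    return alive
```

Pruning does not by itself return the optimal assignment; it shrinks the combinatorial space so that a subsequent exact search over the surviving rotamers becomes tractable.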

Fig. 4

Predicting a pathogen’s resistance mutations [452]. (A) Pictured is an illustration of a game between scientists and bacteria. For every drug that scientists develop against bacteria (a “move”), bacteria respond with mutations that confer resistance to the drug. This paper shows that these “moves” by bacteria can be predicted in silico ahead of time by the OSPREY protein design algorithm. Donald, Anderson, and co-workers used OSPREY to prospectively predict in-silico mutations in Staphylococcus aureus against a novel preclinical antibiotic, and validated their predictions in vitro and in resistance selection experiments. Image (A) was created for this paper by Lei Chen and Yan Liang (L2Molecule.com <http://l2molecule.com/>). (B-C) Computationally predicting drug resistance mutations early in the discovery phase would be an important breakthrough in drug development. The most meaningful predictions of target mutations will show reduced affinity for the drug (C) while maintaining viability in the complex context of a cell (B). The protein design algorithm, K* in OSPREY, was used to predict a single nucleotide polymorphism in the target DHFR that confers resistance to an experimental antifolate (Compound 1) in the preclinical discovery phase. Excitingly, the mutation was also selected in bacteria under antifolate pressure, confirming the prediction of a viable molecular response to external stress. Images (B-C) were created by Adegoke Ojewole in the Bruce Donald Lab, Duke University.

Heuristic optimization methods for de novo design build on stochastic optimization or meta-heuristics, such as MC-SA [433, 453], Genetic Algorithms [454, 455], and other stochastic optimization methods [442, 456, 457]. Methods based on stochastic optimization, best represented by RosettaDesign [458], currently dominate, mainly due to the ability to provide an ensemble of near-optimal solutions through their sampling-based approach. The backbone is kept fixed, and rotameric states are sampled systematically or stochastically [433, 453] over pre-built rotamer libraries [435, 459]. All-atom energy minimization of the entire resulting all-atom conformation is often carried out [460, 461]. It is here, in the minimization stage to which all constructed conformations and sequences are subjected, that localized backbone fluctuations are allowed. The extent of these fluctuations is small, limited to backrub motions [462–465]. Larger motions are allowed, but only on loop regions, made possible by efficient inverse kinematics techniques like Cyclic Coordinate Descent [320, 466].

The importance of allowing backbone flexibility in the design process cannot be overstated. The simple model of the backrub motion consists of a small dipeptide rotation about the Cα-Cα axis. Recent studies suggest that integrating backrub motions in the design process leads to improved designs of protein-protein interaction interfaces and more realistic templates with improved fit between simulated side-chain dynamics and NMR data [462, 464, 467]. Additionally, work in [468] has demonstrated that taking into account backrub motions expands sequence diversity during search and allows new residue interactions that rigid-backbone approaches cannot accommodate. This leads to better designs with lower energies and has been confirmed in other studies as well [469, 470].

Finally, an important highlight in protein design is the fact that, despite the absence of evolutionary history in newly-designed proteins, evolutionary information can be accommodated in the design process. Work in [470] reveals strong correlations between residue covariance in naturally-occurring protein sequences and sequences optimized for the same structures by computational protein design. Covariance has been demonstrated for complementary changes in residue size, residue charge, and hydrogen bonding [471–475]. These findings suggest that structural restraints on co-evolving residues in contact can lead to further improvements both in de novo protein design and structure prediction.

Categorization by Algorithmic Frameworks

In the following, we categorize methods by the algorithmic frameworks they modify and adapt for investigating macromolecular structure and dynamics.

MD-based Methods and Enhancements

In the classic MD setting, Newton’s equation of motion is iteratively solved on a finely discretized time scale to observe collective movements of the atoms comprising a molecular system through successive conformations terminating at a local minimum conformation in the system’s energy surface. The ensemble of conformations obtained at equilibrium conditions observes the Boltzmann distribution. A distinct advantage of employing MD to simulate the equilibrium dynamics of a macromolecule is the ability to obtain great detail on individual and correlated motions of specific atoms and specific sites on a macromolecule, as well as correlated motions between macromolecular units of a complex. A disadvantage of the classic MD simulation setting is the inability to sample rare events that occur on long time scales. In particular, in the presence of high energetic barriers separating local minima in the energy surface, a classic MD simulation may be trapped and never escape within the time scale of the simulation.
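
The iterative solution of Newton’s equation of motion on a discretized time scale can be illustrated with the velocity Verlet scheme, one widely used integrator. The sketch below is a bare-bones illustration, not the integrator of any particular MD package: the force is a user-supplied callable, there is no thermostat or constraint handling, and the function name and arguments are hypothetical.

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = force(x) with the velocity Verlet scheme.

    x, v: NumPy arrays of positions and velocities; force: callable returning forces.
    Returns final positions, final velocities, and the stored trajectory.
    """
    f = force(x)
    trajectory = [x.copy()]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new positions
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x.copy())
    return x, v, np.array(trajectory)
```

The time step dt must resolve the fastest motions in the system, which is why reaching long time scales requires so many iterations.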

Limited sampling of the conformation space is a fundamental issue in classical MD, and algorithmic enhancements are proposed on a regular basis to enhance sampling capability. These include replica exchange, accelerated MD, umbrella sampling, biased or steered MD, importance sampling, activation relaxation, local elevation, conformational flooding, jump walking, multicanonical ensemble, MSM-driven MD, discrete timestep MD, swarm methods, and others [8, 149, 203–206, 334, 476–489]. Recent reviews of advanced MD-based methods and outstanding issues can be found in [124, 490–495]. A comprehensive list of commonly used MD packages for biomolecular simulation is presented in [493]. Examples of MD applications on proteins with large conformational changes that occur on long time scales, such as G-proteins, Ras-proteins, kinases, signaling proteins, and others can be found in [121, 122]. In the following, we highlight some of the algorithmic enhancements to the classical MD setting that are responsible for surpassing traditional MD time scales and characterizing the dynamics of complex systems.

Accelerated MD and Adaptations

The accelerated MD method [496, 497] locally flattens the potential energy surface to decrease the free energy barriers between two conformational states. When the system’s potential energy falls below some predefined threshold energy E, a bias potential is added. The level of flattening is regulated by two parameters that are typically specified by the user: the threshold energy E, which controls the portion of the potential surface affected by the bias, and the acceleration factor α, which determines the shape of the bias potential and thus how flattened the energy surface becomes. The bias potential allows escaping deep minima separated by high energy barriers, thus accelerating the transition between two conformational states of interest and extending the time scale of events that can be observed in simulation. Recent accelerated MD simulations with nanosecond steps [498] can explore more conformational dynamic events [499, 500]. However, Boltzmann statistics need to be recovered from the simulations, and the effect of the bias potential must be removed. A reweighting procedure is typically used, which attempts to convert an accelerated MD trajectory to the canonical ensemble at a given temperature [8, 501].
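
A commonly used form of the accelerated MD bias is ΔV(r) = (E − V(r))^2 / (α + E − V(r)) when V(r) < E, and zero otherwise. The sketch below implements this form together with the simplest exponential reweighting of frame-wise observables; it assumes α > 0, and the function names are illustrative. Practical reweighting schemes are more elaborate because the exponential weights can be very noisy.

```python
import numpy as np

def amd_boost(V, E, alpha):
    """Bias potential added wherever the potential energy V lies below the threshold E."""
    V = np.asarray(V, dtype=float)
    gap = np.clip(E - V, 0.0, None)       # zero wherever V >= E, so no bias there
    return gap ** 2 / (alpha + gap)       # assumes alpha > 0

def reweighted_average(observable, boost, kT):
    """Canonical average of an observable from biased frames, weighting each by exp(boost / kT)."""
    boost = np.asarray(boost, dtype=float)
    weights = np.exp((boost - boost.max()) / kT)  # shift by the max for numerical stability
    return float(np.sum(weights * np.asarray(observable, dtype=float)) / np.sum(weights))
```

Small α flattens the surface below E almost completely, while large α leaves the original landscape nearly intact.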

Enhancements and adaptations of the baseline accelerated MD method are being proposed. We note here first the self-learning, reconnaissance metadynamics method [502], which combines principles of accelerated MD and the concept of collective variables that is the foundation of the metadynamics strategy. Similar to the baseline method, a bias potential is added to the true potential to locally flatten the energy surface. However, the bias potential is constructed over the low free energy region defined over a large number of locally-valid collective variables. The accelerated adaptive integration method [203] can be considered another adaptation of the baseline accelerated MD method for the problem of modeling ligand-binding processes. A ligand coupling parameter λ is introduced to keep track of the end points of the receptor-ligand coupling and decoupling process; λ takes values from 0 to 1. The method assumes that some transitions can be more accessible if a certain stage of coupling/decoupling (λ) is reached; the potential energy function is flattened at intermediate values of λ instead of at some threshold energy value E.

Replica Exchange MD Methods

Replica exchange is a popular enhancement of the classical MD method; it is also known as parallel tempering. Originally, replica exchange was introduced to improve properties of the MC framework [503], but it has since been adapted to enhance MD sampling [504]. The usual continuous MD trajectory is broken into several replica simulations randomly initialized and conducted at different temperatures. The number of replica simulations is typically determined by the user, as are the temperatures assigned to the replica simulations. The simulations exchange information with one another by exchanging conformations at regular intervals. At each exchange attempt, two simulations are selected, and their instantaneous conformations are exchanged according to the Metropolis criterion. The exchange often allows a particular simulation to escape a local minimum by making conformations accessed at higher temperatures available to those at lower temperatures, thus enhancing sampling capability. In addition, the setting of multiple simulations encourages parallel implementation and employment of distributed architectures with message passing. This gives replica exchange high exploration capability. Many adaptations and applications of replica exchange exist [149, 478, 505]. Work in [506] proposes a technique to deduce kinetics data from a heterogeneous ensemble of simulation trajectories. A detailed review of methods based on replica exchange can be found in [478].
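
The Metropolis criterion for exchanging two replicas at temperatures T_i and T_j with potential energies E_i and E_j is usually written as an acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]), with β = 1/(k_B T). The sketch below applies it to neighboring temperatures; the function names, the neighbor-only pairing, and the kcal/mol energy units are assumptions for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_neighbor_swaps(conformations, energies, temperatures, rng):
    """Attempt one exchange between each pair of neighboring temperatures, in place."""
    for k in range(len(temperatures) - 1):
        p = swap_probability(energies[k], energies[k + 1],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            # Conformations (and their energies) trade places on the temperature ladder.
            conformations[k], conformations[k + 1] = conformations[k + 1], conformations[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return conformations, energies
```

A swap that hands the lower-energy conformation to the colder replica is always accepted, which is how low-temperature simulations inherit conformations discovered at high temperature.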

Restrained Ensemble MD Methods

We note here two methods to illustrate the employment of experimental data as restraints in MD-based simulations, the replica-averaged MD method and the replica-averaged metadynamics method. The employment of experimental data to correct a molecular force field and thus steer the sampled conformation ensemble towards the Boltzmann distribution has a rich history in macromolecular modeling. The idea of using experimental measurements as averaged structural restraints in MD simulations was first implemented for distances derived from NOE [35]. A penalty term was added to the force field if the time-average of an NMR observable calculated from an MD trajectory differed from that provided by experiment. A variation of this idea is to measure not a time-average but an ensemble-average observable. The latter is referred to as the replica-averaged approach, and a variety of restraining algorithms, including those that conduct both time and ensemble averaging, have been developed and applied to sample and characterize native, transition, intermediate, and unfolded states of proteins [17, 32, 34, 312, 316, 507–512].

Vendruscolo and colleagues [304] have demonstrated that MD simulations with replica-averaged structural restraints allow generating structural ensembles according to the maximum entropy principle introduced by Jaynes [513]. Jaynes addressed the problem of incorporating information from experiments into a structural model while avoiding corrupting the model with spurious and arbitrary biases. His maximum entropy method, however, proved too cumbersome. The restrained ensemble methods of Vendruscolo and others provide an alternative practical approach, but, until recently, it was not known whether these methods obey the maximum entropy principle. In addition to work in [304], Roux and collaborators demonstrate in [514] that restrained-ensemble MD simulations produce statistical distributions that are formally consistent with the maximum entropy principle.

Distance restraints from NOE data, if available, can be integrated in ALMOST, an all-atom molecular simulation open-source package for macromolecular structure determination and analysis [515]. In the replica-averaged metadynamics method [516], in addition to making use of replica-averaged restraints in the force field, the metadynamics framework is exploited to enhance sampling. Application on the α-conotoxin SI, a 13-residue peptide that has been characterized extensively in the wet laboratory, shows that the method enables accurate reconstruction of the free energy landscape.

Umbrella Sampling

Umbrella sampling [517–519] is another method that employs collective variables and is related to importance sampling in statistics. It addresses systems with energy landscapes where a high energy barrier separates two regions of the conformation space. The relevant system coordinates are grouped into sets of collective variables, with each set determining a separate umbrella window. A restraint bias potential forces the collective variables in a window to remain close to the center of that window. The restraint potential often takes a quadratic (harmonic) form, determining the weighting function of a given window. If the configurations in a window are far from the equilibrium state, the weighting function will be large, and the simulation will be biased away from the initial configuration. The sets of collective variables must allow for slight overlap of their windows for proper reconstruction of the transitions between them. Extracting corresponding Boltzmann averages and handling overlapping weighting functions are key issues. The information from each window-biased simulation is converted into local probability histograms. The weighted histogram analysis method (WHAM) [520] is now the standard method to combine results from a set of umbrella sampling simulations. Work in [521] introduces superlinear numerical optimization algorithms to diagnose and quantify systematic errors due to limited sampling and to obtain fast and accurate solutions of the coupled nonlinear WHAM equations. Work in [522] introduces a bootstrap method to accurately estimate error due to insufficient sampling and incorporates autocorrelations to reduce such errors. The method, g_wham, has been incorporated in the popular GROMACS molecular simulation suite [359]. The umbrella sampling scheme can be integrated into other enhanced MD or MC strategies. We highlight here the self-learning umbrella sampling method in [523], which learns, through a feedback mechanism, which regions of a multidimensional space are worth exploring and automatically generates a set of windows. This method needs a significantly smaller number of umbrella windows to characterize the free energy landscape over the most relevant regions without any loss in accuracy. Umbrella sampling has been employed to study processes with large conformational changes or rare events, such as ligand binding and ion-induced diffusion in membrane proteins [523, 524].
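
To make the combination of overlapping windows concrete, the following is a minimal sketch of the WHAM self-consistency iteration on a one-dimensional grid. It is an illustration under stated assumptions, not the implementation of [520] or of g_wham: the double-well `potential`, the harmonic bias 0.5·k·(x − c)², the window centers, and the function name `wham` are all hypothetical, and the "histograms" are exact expected counts rather than simulation output, so that the recovered free energy can be compared against the known potential.

```python
import math

def wham(centers, k_spring, grid, counts, n_samples, beta=1.0,
         tol=1e-12, max_iter=20000):
    """Iterate the WHAM equations; counts[i][j] is window i's count in bin j.

    Returns the unbiased probability over the grid (normalized to 1).
    """
    n_win, n_bin = len(centers), len(grid)
    # Boltzmann factor of the harmonic bias w_i(x) = 0.5 * k * (x - c_i)^2
    bias = [[math.exp(-beta * 0.5 * k_spring * (x - c) ** 2) for x in grid]
            for c in centers]
    total = [sum(counts[i][j] for i in range(n_win)) for j in range(n_bin)]
    f = [0.0] * n_win  # per-window free energy offsets f_i
    p = [1.0 / n_bin] * n_bin
    for _ in range(max_iter):
        # p(x) = sum_i n_i(x) / sum_i N_i exp(-beta * (w_i(x) - f_i))
        p = [total[j] / sum(n_samples[i] * bias[i][j] * math.exp(beta * f[i])
                            for i in range(n_win))
             for j in range(n_bin)]
        norm = sum(p)
        p = [v / norm for v in p]
        # exp(-beta * f_i) = sum_x p(x) exp(-beta * w_i(x))
        new_f = [-math.log(sum(p[j] * bias[i][j] for j in range(n_bin))) / beta
                 for i in range(n_win)]
        shift = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if shift < tol:
            break
    return p

def potential(x):
    # Hypothetical double well (in units of kT) standing in for the real landscape.
    return 2.0 * (x * x - 1.0) ** 2

grid = [-2.0 + 0.1 * j for j in range(41)]
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]   # overlapping umbrella windows
k_spring = 2.0
n_samples = [1000.0] * len(centers)

# Synthetic "histograms": exact expected counts of each biased window.
counts = []
for c in centers:
    w = [math.exp(-potential(x) - 0.5 * k_spring * (x - c) ** 2) for x in grid]
    z = sum(w)
    counts.append([1000.0 * v / z for v in w])

p = wham(centers, k_spring, grid, counts, n_samples)
free_energy = [-math.log(v) for v in p]  # defined up to an additive constant
```

With real simulation data the counts are noisy integers, and the error analysis discussed in [521, 522] becomes essential; the sketch only shows how the window histograms and bias factors enter the coupled equations.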

Adaptive MD Sampling Methods

Guiding MD sampling via on-the-fly analysis of obtained conformations to determine under-sampled regions of the conformation space is gaining ground in macromolecular modeling. The principal difficulty with adaptive sampling is the identification of meaningful collective variables over which to project conformations and obtain lower-dimensional embeddings of the conformation space for the identification of under-sampled regions and calculation of interesting statistics. While collective variables such as the number of native and non-native contacts, hydrogen bonds, dihedral angles, RMSD, and radius of gyration remain popular, these variables have been shown to result in overly smooth landscapes [525] and to mask interesting transitions. Recent work by Clementi and colleagues has reintroduced diffusion-based dimensionality reduction methods for extracting collective variables and has demonstrated the power of such methods for characterizing complex energy landscapes [526, 527]. Further work by the same authors in [528, 529] employs the identified collective variables to guide and expedite sampling of rare events via MD.

In contrast to methods that rely on the identification of collective variables, a different line of work in the early 2000s introduced the concepts of kinetic clustering and the conformation space network. Both were precursors of the MSM. The main idea was to organize conformations in discrete, graph-based models of connectivity to both visualize the free energy surface and carry out interesting calculations on such models.

The concept of kinetic clustering evolved from the disconnectivity graphs put forth separately by Karplus and Wales [530–532]. Work by Rao and Caflisch took this idea further by proposing complex network analysis both to visualize and study the conformation space and folding of peptides [533]. In lieu of geometric clustering, conformations in [533] were grouped together by secondary structure, and the different emerging groups were abstracted as nodes of a network, with links between nodes recording observed transitions between groups. Interesting observations were made regarding network topology and peptide folding kinetics in [533] and in later applications investigating the impact of single-point mutations on peptide folding [534] (a detailed review of the conformation network idea can be found in [535]), but the broader analogy (and generalization) between conformation space networks and MSMs would emerge later. In tandem with the conformation space network proposed by Caflisch, related work by Karplus extended disconnectivity graphs with max-flow/min-cut algorithms to lay bare the hidden complexity of free energy surfaces of peptides and proteins [525, 536]. It is worth noting in this context that the free energy surface generated by implicit solvent is often very different from, and more complex than, that generated by explicit solvent [537]. Early work in [538] demonstrates that explicit solvent smooths the energy surface.

Kinetic clustering continues to be useful and has been used successfully to characterize protein folding through very long MD simulations [147]. In [147], conformations are assigned to clusters so that the long time scale behavior in cluster space mimics that in the MD simulation. Autocorrelation functions of the time series of a large number of atomic distances are calculated, and the long time scale behavior of these functions is matched with that of corresponding correlation functions calculated over dynamics in cluster space. The cluster assignments, followed by the construction of transitions between distinct long-lived states, identify the slower transitions [147].

It was only around 2005 that the analogy between the conformation space network and the MSM would be made by Pande and coworkers [363, 539]. The notion of kinetic clustering was generalized, and the conformation space networks evolved into kinetic networks connecting meta-stable states, effectively MSMs [540]. The integration of MSMs [146, 153, 541] into MD simulations allows investigating macromolecular dynamics even beyond the second time scale [123]. Originally, MSMs were only employed to analyze the connectivity of conformational states sampled through multiple, long MD simulations and to derive kinetic measurements from calculations over the MSM [363]. In [123], MSMs were employed to reconstruct folding pathways from short off-equilibrium, all-atom simulations in explicit solvent. MSM and MD methods have been applied to model folding [542–545], protein–ligand binding [136, 138, 546], protein switches in kinases and GPCRs [547, 548], allostery [549], and IDPs [541, 550], revealing extensive statistical details about intermediate states [136, 542, 551] and molecular interaction mechanisms. The employment of MSMs to focus computational resources on under-sampled regions of the conformation space in an adaptive manner is a rather recent development in macromolecular modeling. A semi-automatic protocol has been proposed in [552] to simulate the folding and unfolding of the villin headpiece in a very efficient manner. Work in [128] also proposes a semi-automatic protocol that analyzes MD trajectories with a constructed MSM to pinpoint where more sampling needs to be conducted. As of now, a fully automatic protocol remains elusive [553].
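
As a rough illustration of the bookkeeping behind an MSM and its use for adaptive sampling, the sketch below counts transitions in a discretized trajectory, row-normalizes the counts into a transition matrix, and flags the state with the fewest observed outgoing transitions as a candidate for further sampling. It assumes conformations have already been assigned to discrete states; the function names and the "fewest counts" selection rule are hypothetical simplifications, not the protocols of [128] or [552], and the simple row normalization does not enforce detailed balance as production MSM estimators typically do.

```python
def count_matrix(dtraj, n_states, lag=1):
    """Count observed transitions i -> j at the given lag time (in frames)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj, dtraj[lag:]):
        counts[a][b] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize counts into transition probabilities (naive estimator)."""
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total if total else 0.0 for c in row])
    return rows

def least_sampled_state(counts):
    """State with the fewest outgoing transitions; ties go to the lowest index."""
    totals = [sum(row) for row in counts]
    return totals.index(min(totals))

# Hypothetical discretized trajectory over three states.
dtraj = [0, 0, 1, 1, 0, 2, 2, 0]
counts = count_matrix(dtraj, n_states=3)
T = transition_matrix(counts)
seed_state = least_sampled_state(counts)  # where new short simulations would start
```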

While MSM-guided MD sampling relies on obtaining a discrete model of the connectivity of the sampled conformation space to guide further sampling, other methods rely on modifying the energy function itself to bias the simulation away from already-sampled conformations. One of the earliest methods to do so was local elevation [481]. In local elevation, the actual potential energy surface is modified in order to drive conformational sampling away from visited conformations (a bias term that is the sum of repulsive functions is added to the potential energy function).

Metadynamics methods follow a similar approach [554, 555]. The assumption in these methods is that the system can be described in terms of a few collective variables. During the MD simulation, the location of the system is calculated in terms of the collective variables. A positive Gaussian potential is then added to the energy landscape at that location so that the simulation is biased against returning to it. During the simulation, more and more Gaussians add up to the point that the system is discouraged from going back to previous locations in the energy landscape and is thus driven to explore the full landscape. The time interval between the addition of two Gaussians and the height and width of a Gaussian are all tunable parameters that trade off accuracy against computational cost. The crucial issue in metadynamics, as in other techniques based on collective variables, is to identify the right collective variables. Strategies to do so are reviewed in [555]. The metadynamics strategy is available as a portable plugin for MD simulation platforms in PLUMED [556]. Metadynamics MD has been applied to study the folding process of small proteins [557, 558], protein switches [559–561], and ion-induced diffusion of small molecules in cavities and channels [562, 563]. Metadynamics methods have also allowed modeling the docking process with full protein flexibility [135, 564–567].
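
The Gaussian deposition scheme can be sketched on a toy problem. The example below is a minimal illustration, not PLUMED code: it assumes a one-dimensional double-well potential whose single coordinate doubles as the collective variable, overdamped Langevin dynamics in place of MD, and hypothetical values for the three tunable parameters named above (deposition interval `stride`, Gaussian `height`, and `width`).

```python
import math
import random

def grad_potential(x):
    # dU/dx for the toy double well U(x) = 3 (x^2 - 1)^2, in units of kT
    return 12.0 * x * (x * x - 1.0)

def bias(x, hills, height, width):
    """History-dependent bias: a sum of positive Gaussians at visited locations."""
    return sum(height * math.exp(-0.5 * ((x - c) / width) ** 2) for c in hills)

def grad_bias(x, hills, height, width):
    return sum(-height * (x - c) / width ** 2
               * math.exp(-0.5 * ((x - c) / width) ** 2) for c in hills)

def run_metadynamics(n_steps=20000, stride=100, height=0.3, width=0.2,
                     dt=0.005, kT=1.0, x0=-1.0, seed=0):
    rng = random.Random(seed)
    x, hills, traj = x0, [], []
    noise = math.sqrt(2.0 * kT * dt)
    for step in range(1, n_steps + 1):
        # Force from the physical potential plus the accumulated bias.
        force = -grad_potential(x) - grad_bias(x, hills, height, width)
        x += dt * force + noise * rng.gauss(0.0, 1.0)
        traj.append(x)
        if step % stride == 0:
            hills.append(x)  # deposit a Gaussian at the current location
    return traj, hills

traj, hills = run_metadynamics()
```

As Gaussians accumulate in the starting well, the walker is pushed over the barrier into the other well. In practice the negative of the accumulated bias is then used as an estimate of the free energy along the collective variable.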

MC-based Methods and Enhancements

While a significant portion of research on macromolecular structure and dynamics employs MD-based methods, a just as significant portion employs MC sampling. In MC, the evolution of a conformation into another is not guided by Newton's equation of motion but instead by a programmed move or step designed to introduce a small or large conformational change. The end result of the move is only accepted according to the Metropolis criterion in order to promote the trajectory of consecutive conformations to converge to the global minimum while allowing some non-zero probability of escaping a current minimum. MC-based methods employ the notion of effective temperature to regulate the height of energy barriers that can be crossed. While generally regarded to have higher sampling capability than MD, MC methods are also prone to convergence to local minima and forgo any direct information on time scales and kinetics. Many of the enhancement strategies for MD can be applied to MC-based methods. In the following we highlight two such enhancements.
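
The Metropolis acceptance rule described above can be written in a few lines. The sketch below is a generic illustration: the function names, the one-dimensional coordinate, and the uniform trial move are assumptions made for brevity, whereas the moves used in macromolecular MC are far more elaborate. The optional argument `u` lets the caller supply the uniform random number, which makes the rule easy to check by hand.

```python
import math
import random

def metropolis_accept(delta_e, kT, u=None, rng=random):
    """Accept downhill moves always; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    if u is None:
        u = rng.random()
    return u < math.exp(-delta_e / kT)

def mc_step(x, energy, step_size, kT, rng=random):
    """One MC step: propose a random displacement, then apply the Metropolis test."""
    trial = x + rng.uniform(-step_size, step_size)
    if metropolis_accept(energy(trial) - energy(x), kT, rng=rng):
        return trial
    return x
```

Raising the effective temperature `kT` increases the acceptance probability of uphill moves, which is how the height of crossable barriers is regulated.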

Collective Motions Molecular Dynamics and Monte Carlo

Collective MD [568] belongs to the family of enhanced MD sampling methods that simplify sampling by considering only the most dominant, low-frequency, low-resolution, collective motions. The latter are identified by modeling a structure through the anisotropic network model (ANM) [569]. The basic approach is to deform the structure collectively along the modes predicted by the ANM. A Metropolis-based MC scheme is employed to select the ANM modes; the stochasticity permits the system to occasionally circumvent energy barriers. The ANMPathway is a related sampling method that uses modes extracted from two ENMs representative of the experimental structures that constitute the end points of the transition under investigation [570]. Both methods have been tested on modeling open-close transitions in AdK [568, 570] and several transporting membrane proteins [570]; the transition pathways were captured in great detail and at significantly lower computational cost than other methods [571].

Weighted Ensemble Method

The weighted ensemble method (WEM) [572] is an enhanced sampling method with simplified sampling. WEM uses a multiple-trajectory strategy in which individual trajectories can spawn multiple daughter trajectories upon reaching new regions of configuration space called bins. The daughters are suitably weighted to ensure statistical rigor. WEM can yield rigorous estimates for time scales that are much longer than the simulations themselves. The idea to split and propagate re-weighted trajectories was initially introduced in MC simulations, but WEM can be used as a sampling method for MD simulations as well [572]. WEM has been employed to model folding [573], non-equilibrium [574] and equilibrium processes [572], and conformational transitions between end-points separated by high energy barriers [575].
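
The splitting and weighting of trajectories within a bin can be sketched as follows. This is a simplified illustration rather than the resampling scheme of [572]: it assumes each walker is a `(weight, state)` pair with positive weight, splits the heaviest walker when a bin holds too few walkers, and merges the two lightest when it holds too many. The function name and these particular split and merge choices are hypothetical; the property the sketch is meant to show is that the total statistical weight in the bin is conserved.

```python
import random

def resample_bin(walkers, target, rng=random):
    """Return `target` walkers carrying the same total weight as the input.

    walkers: non-empty list of (weight, state) pairs with positive weights.
    """
    walkers = list(walkers)
    # Split: replace the heaviest walker by two daughters with half its weight each.
    while len(walkers) < target:
        i = max(range(len(walkers)), key=lambda n: walkers[n][0])
        w, s = walkers.pop(i)
        walkers += [(w / 2.0, s), (w / 2.0, s)]
    # Merge: combine the two lightest walkers; the surviving state is chosen
    # with probability proportional to its weight and inherits the summed weight.
    while len(walkers) > target:
        walkers.sort(key=lambda ws: ws[0])
        (w1, s1), (w2, s2) = walkers[0], walkers[1]
        keep = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(w1 + w2, keep)] + walkers[2:]
    return walkers

# Hypothetical bin with three walkers, resampled up to four and down to two.
bin_walkers = [(0.5, "a"), (0.3, "b"), (0.2, "c")]
grown = resample_bin(bin_walkers, 4)
shrunk = resample_bin(bin_walkers, 2, rng=random.Random(1))
```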

Other Algorithmic Frameworks

Morphing Methods

Geometric morphing uses the linear interpolation of each atom to construct a path between conformations. MolMovDB [337, 576] was the first online tool to allow obtaining and visualizing such paths. After each linear interpolation, the morphing algorithm in MolMovDB conducts an energy minimization to fix possible distortions and restore the stereochemistry of the intermediate points in the interpolated trajectory. The created morphs are stored in the database of motions and can be found by protein name, PDB ID, or motion type [577].
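
The interpolation step itself is simple enough to sketch directly. The example below assumes two conformations given as lists of Cartesian coordinates with identical atom ordering that have already been superimposed; the function name is hypothetical, and the energy minimization that MolMovDB applies to each intermediate is deliberately omitted.

```python
def linear_morph(start, goal, n_frames):
    """Linearly interpolate every atom between two superimposed conformations.

    start, goal: lists of (x, y, z) tuples with matching atom order.
    Returns n_frames conformations, the first equal to start and the last to goal.
    """
    if len(start) != len(goal):
        raise ValueError("conformations must have the same number of atoms")
    if n_frames < 2:
        raise ValueError("need at least the two end-point frames")
    frames = []
    for f in range(n_frames):
        t = f / (n_frames - 1)  # interpolation parameter in [0, 1]
        frames.append([tuple(a + t * (b - a) for a, b in zip(p, q))
                       for p, q in zip(start, goal)])
    return frames

# Two hypothetical two-atom conformations and a three-frame morph.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
goal = [(0.0, 2.0, 0.0), (1.0, 0.0, 4.0)]
frames = linear_morph(start, goal, 3)
```

Because each atom moves along a straight line, intermediate frames can contain distorted bond lengths and steric clashes, which is why a minimization step follows each interpolation in practice.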

Conformational trajectories based on linear interpolation do not necessarily represent actual conformational pathways. Several morphing-based methods have been developed that provide non-linear interpolations between the start and goal structures to be connected through intermediate conformations [130, 338, 341, 578–580]. Non-linear morphing methods rely on normal mode analysis (NMA) of harmonic-type models, such as the ENM and its variants, to obtain principal motions of a macromolecule about a local minimum. Such models are based on early concepts by Go, Scheraga, and Flory [581–583], and they rely on the assumption that macromolecules can be treated as deformable elastic bodies, where the interatomic potential function can be represented by a harmonic model [584, 585], and interactions depend only on the density of neighbors [586, 587]. The earliest application of NMA to elucidate equilibrium dynamics was conducted in the Karplus laboratory [228], though the usage of normal modes predates this by 7 years; Levitt and Warshel used normal modes to jump out of local minima in pioneering folding simulations [68, 72]. Further work demonstrated the effectiveness of such models for capturing thermal vibrations and predicting experimental B-factors [584, 585, 588–590]. Other work employed normal modes extracted via NMA from a single structure to model equilibrium fluctuations and in some cases even capture simple conformational switching [591–598]. The NOMAD-Ref server [339] provides tools for online NMA of large molecules (of up to 100,000 atoms, maintaining atomistic detail of their structures) and access to a number of programs that use the normal modes to model deformations and conduct refinements of experimental structures.

The earliest employment of NMA in the non-linear morphing setting, to extract information on intermediate conformations mediating the transition between a start and a goal structure, appeared in [341, 599]. In [599], a geometric morphing technique is proposed to bridge two ENMs corresponding to given start and goal structures. Related ideas appeared in [600, 601], moving along a few normal modes from the start structure pointing to the target structure and then parameterizing the elastic network along the pathway. In [578], the start and goal structures are interpolated upon optimal superposition of the CA atoms, but, in contrast to linear morphing methods, the resulting displacement vector is expanded as a linear combination of the normal modes calculated on the start structure.

Since ENMs typically involve only a single energy minimum and are not immediately applicable to model transitions between multiple stable and semi-stable structural states of a macromolecule, mixed ENMs [579, 602] and other, related, ENM-based models have been developed [130, 603–606]. The fundamental issue addressed in different ways in these works is how to interpolate the ENMs at the start and goal structures so that the resulting potential retains these structures as local minima [602]. The plastic network model (PNM) introduced in [603] can include additional known intermediate structures and is parameterized to account for known fluctuations available as experimental B-factors.

A group of non-linear morphing methods based on ENMs, mixed ENMs, and variants such as PNM, compute transitions that are minimum-energy paths (MEP) in the energy landscape. In [603], the conjugate peak refinement (CPR) algorithm [607] is used to compute a series of steepest descent paths from saddle points to nearest minima to connect two structures of interest with a continuous curve in the conformation space. Similarly, in the Climber method [340, 608], a restraining energy depends linearly on the distance deviation between the current conformation and the target conformation in a way that allows full flexibility and enables the protein to move around high-energy barriers, rather than over them, resulting in the MEP. KOSMOS [609] is another online morph server that, in addition to offering NMA for nucleic acids, proteins, and their complexes, also generates plausible transition pathways by optimizing a topology-oriented cost function that guarantees a smooth transition without steric clashes.

Transition Path Sampling and Chain-of-States Methods

The main challenge with computing transitions of a macromolecule between meta-stable states or basins is that a macromolecule may spend a very long time in one basin before transitioning to another. The disparity between the effective thermal energy and the typical energy barrier is manifested in long waiting periods where the macromolecule diffuses in a basin followed by a sudden jump to another basin. Such sudden jumps are rare events, and a significant body of work in macromolecular modeling is dedicated to enhancing conventional MC or MD simulation frameworks to capture such events in a reasonable time frame. These methods operationalize seminal ideas put forth by Pratt on transition path sampling (TPS) [610]. Even though the energy landscape of a complex system is typically dense in saddle points, only a few saddle points are relevant for transitions between basins. TPS methods do not rely on identifying saddle points in the potential energy surface. Instead, they implement importance sampling over a reduced set of collective variables that span the important regions of the high-dimensional search space [611–616]. TPS methods are numerical techniques that effectively conduct MC sampling of the ensemble of transition paths [617]. Detailed reviews of these methods can be found in [617, 618].

Transition paths obtained via TPS methods can be quite complicated for systems with high-dimensional conformation spaces and rugged energy landscapes; a statistical mechanics framework, known as the transition path theory (TPT) [619], is needed to organize and analyze the transition path ensemble. Moreover, the success of TPS methods depends on the particular progress coordinate defined to distinguish the transition path in the search space, but finding an effective coordinate is non-trivial. Indeed, multiple progress coordinates may need to be defined to describe the transition. Therefore, a second group of methods founded on TPT implement the chain-of-states approach, which assumes that the transition path can be meaningfully encoded as a series or chain of structures (also referred to as images) [342, 607, 620–623]. These methods can track an arbitrary number of progress coordinates while restraining sampling to effectively one dimension. In chain-of-states methods, a string of images is created between the given meta-stable states, and the images are relaxed to the transition pathway. Similar ideas had already appeared in [607, 620]. Two types of chain-of-states methods were proposed afterwards, the nudged elastic band (NEB) methods and the string methods.

The NEB method [624] addresses a key issue that arises when an artificial spring force is introduced to maintain even spacing between images. The problem is that when minimizing the elastic band, the component of the spring force that is perpendicular to the elastic band tends to pull the images off the MEP. To address this problem, in NEB, a minimization of the elastic band is carried out where the perpendicular component of the spring force and the parallel component of the true force are projected out. In this way, the spring force does not interfere with the relaxation of the images perpendicular to the path. The result is that the series of relaxed configurations is an approximation to the MEP, converging to the MEP when there is sufficient resolution in the discrete representation of the path (when enough images are included in the chain). It is worth noting that the MEP is just one special path selected from the curves connecting two given conformations. Work in [625] explains that this special path minimizes the absolute value of the mechanical work and so is the most probable path for an overdamped Brownian particle at 0 K (in other words, the most probable Brownian trajectory in the absence of kinetic energy). Improvements to the NEB method introduced in [624] have been proposed, particularly regarding improving the tangent estimate [621] and lowering the computational cost of minimizations [342].
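
The force projection at the heart of NEB can be sketched for a single interior image. The example below is an illustration under simplifying assumptions, not the implementation of [624]: the tangent is estimated from the two neighboring images (improved estimates are the subject of [621]), all springs share one constant `k_spring`, and the function name and argument layout are hypothetical. The gradient of the true potential at the image is passed in by the caller.

```python
import math

def neb_force(prev_img, img, next_img, grad, k_spring):
    """NEB force on one interior image.

    Keeps the true force perpendicular to the path and the spring force
    parallel to the path; the other two components are projected out.
    """
    # Tangent estimate: normalized vector between the two neighboring images.
    tau = [n - p for n, p in zip(next_img, prev_img)]
    norm = math.sqrt(sum(t * t for t in tau))
    tau = [t / norm for t in tau]
    # True force (-grad) with its component along the tangent removed.
    g_par = sum(g * t for g, t in zip(grad, tau))
    f_perp = [-(g - g_par * t) for g, t in zip(grad, tau)]
    # Spring force retained only along the tangent; it evens out image spacing.
    d_next = math.dist(next_img, img)
    d_prev = math.dist(img, prev_img)
    f_spring = [k_spring * (d_next - d_prev) * t for t in tau]
    return [a + b for a, b in zip(f_perp, f_spring)]

# Image at (1, 0) between neighbors at (0, 0) and (3, 0), with gradient (2, 5).
force = neb_force((0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (2.0, 5.0), k_spring=1.0)
```

In this toy case the path runs along the x-axis, so the y-component of the resulting force comes entirely from the true potential and the x-component entirely from the springs.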

Generally, NEB methods require that the energy landscape be relatively smooth and are not effective on rugged energy landscapes [619]. Remedies have been proposed by having NEB methods operate on the free energy landscape [623], which is expected to be smoother, or by introducing temperature corrections to the MEP [626]. Caution must be exercised not to double count entropy when operating on free energy landscapes. One implication is that implicit solvent potentials cannot be employed to model dynamics on free energy landscapes.

In string methods, splines are used instead to calculate tangents. In addition, image spacing is maintained via reparameterization rather than spring forces. The first string method proposed in [622] belongs to the sub-category of zero-temperature string methods [344]. Extensions to operate on the space of collective variables and compute the minimum free energy path (MFEP) rather than the MEP have also been proposed [343, 345]. Finite-temperature string methods were later proposed [347, 627] to better deal with overly rugged energy landscapes.
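
The reparameterization step can be illustrated with a short sketch. As a simplification, it redistributes images at equal arc length along the piecewise-linear path through the current images, whereas the string methods cited above interpolate with splines; the function name and the optional image count `m` are hypothetical.

```python
import math

def reparametrize(images, m=None):
    """Redistribute images at equal arc-length spacing along the current path.

    images: list of coordinate tuples defining a piecewise-linear path.
    m: number of images to return (defaults to the current number).
    """
    n = len(images)
    m = n if m is None else m
    # Cumulative arc length at each existing image.
    s = [0.0]
    for a, b in zip(images, images[1:]):
        s.append(s[-1] + math.dist(a, b))
    total = s[-1]
    out, seg = [], 0
    for j in range(m):
        t = total * j / (m - 1)  # target arc length of the j-th new image
        while seg < n - 2 and s[seg + 1] < t:
            seg += 1
        span = s[seg + 1] - s[seg]
        frac = 0.0 if span == 0.0 else min(1.0, (t - s[seg]) / span)
        a, b = images[seg], images[seg + 1]
        out.append(tuple(x + frac * (y - x) for x, y in zip(a, b)))
    return out

# Unevenly spaced images on a straight line become evenly spaced.
even = reparametrize([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)])
```

In a string-method iteration, each image is first moved according to the (mean) force and the images are then redistributed in this manner, which plays the role that the spring force plays in NEB.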

String methods do not assume the energy landscape is smooth. They can also handle a large number of collective variables. Effective choices of collective variables have been discussed and tested in [628]. Work in [619] draws a difference between string methods and chain-of-states methods, as string methods start with an intrinsic formulation of the dynamics of curves/strings in configuration space and only resemble chain-of-states methods after discretization of the curves. String methods sample the configuration space with strings, which are smooth curves with intrinsic parameterization. The mean force and other conditional expectations are computed locally over the discretization points along the string. The string satisfies a differential equation that by construction guarantees that the string evolves to the most probable transition path connecting two meta-stable states.

In particular, the finite-temperature string method has been applied recently to model the complex α-helix to β-sheet transition in a β-hairpin mini-protein in implicit solvent [629]. Transition pathways constructed by string methods have been reported in [630–634]. To fully appreciate the scope of the string method proposed in [343], we additionally note here its application to model in detail the transition of the converter of myosin VI between the PPS and R conformations by computing the associated MFEP for the R ↔ PPS isomerization and the free-energy profile along the transition pathway, and by estimating the interconversion rate [635].

String methods make use of the approximation that, with high probability, the flux associated with transition paths is concentrated inside one or a few thin (reaction) tubes. This may not be a reasonable assumption, particularly for complex systems. The WEM is combined with a string method in [636] to address this issue. Another method, proposed in [637] and tested in [638, 639], combines a string method with swarms of trajectories.

Another drawback of string methods is their computational cost due to the multiple gradient calculations performed on images located far away from the transition state. Many methods have been proposed to reduce this computational burden. We note here the growing and the freezing string methods [640–645]. The growing string method attempts to reduce the number of calculations in the iterative steps of string methods. Essentially, two string segments are grown independently from the start and goal structures until they join each other. The freezing string method additionally reduces costs related to the parameterization in string methods. The images are optimized in a direction perpendicular to the progress coordinate with a few conjugate gradient steps and are then frozen in place, effectively constructing an approximate Hessian. Work in [646] demonstrates that this approximation performs as well as growing string methods that use the exact Hessian. As evidenced by the large number of works cited, work on methods for computing transition paths, rates, and transition states is very active.

Evolutionary Algorithms

An important group of methods to address optimization-related problems in macromolecular modeling consists of evolutionary algorithms (EAs). EAs approach stochastic optimization under the umbrella of evolutionary computation, where the main idea is for computation to mimic the process of evolution and natural selection to find local optima of a complex objective/fitness function. The realization that the potential energy landscape of a macromolecule can be non-linear and multimodal, and that many structure-centric macromolecular modeling problems can be cast as optimization problems, makes EAs highly appealing for macromolecular modeling.

Though EAs are highly customizable algorithms, they all follow a simple template. A population of samples of a configuration space (generally referred to as individuals) is evolved over a number of generations. An initialization mechanism specifies the initial population, which can consist of random samples or include configurations known to be local optima (for instance, experimentally-available structures may play this role). The population evolves either over a fixed, user-defined number of generations or until a different termination criterion is reached. In each generation, individuals with high fitness are repeatedly selected and varied. The selection mechanism specifies which individuals to select as parents for reproduction. The improvement mechanism consists of reproductive or variation operators, which can be asexual, introducing a mutation on a parent, or sexual, combining the material of two parents at one or more crossover points to generate offspring. A survival mechanism determines which individuals survive to the next generation. In non-overlapping or generational survival mechanisms, the offspring replace the parents. In overlapping ones, a subset of individuals from the combined parent and offspring pool is selected for survival onto the next generation. A comprehensive review of EAs can be found in [647].
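
The template above can be sketched as a small generic EA. The example is an illustration only: it assumes individuals are real-valued vectors, treats the objective as an energy to be minimized (so lower values mean higher fitness), and makes hypothetical choices for each mechanism, namely random initialization, binary tournament selection, one-point crossover followed by Gaussian mutation, and an overlapping survival scheme that keeps the best individuals from the combined parent and offspring pool.

```python
import random

def evolve(energy, dim, pop_size=20, generations=50, sigma=0.3, seed=0):
    """Minimize `energy` over real vectors with a small overlapping-survival EA."""
    rng = random.Random(seed)
    # Initialization: random samples of the configuration space.
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    history = [min(energy(ind) for ind in pop)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Selection: a binary tournament picks each parent.
            p1 = min(rng.sample(pop, 2), key=energy)
            p2 = min(rng.sample(pop, 2), key=energy)
            # Variation: one-point crossover (sexual), then Gaussian mutation (asexual).
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [g + rng.gauss(0.0, sigma) for g in child]
            offspring.append(child)
        # Survival: best pop_size individuals from parents plus offspring.
        pop = sorted(pop + offspring, key=energy)[:pop_size]
        history.append(energy(pop[0]))
    return pop, history

def sphere(v):
    # Hypothetical stand-in for a molecular energy function.
    return sum(x * x for x in v)

final_pop, best_per_generation = evolve(sphere, dim=4)
```

Because survival is overlapping, the best energy in the population can never get worse from one generation to the next. A generational scheme would instead replace `pop` with `offspring` and lose that guarantee.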

EAs are very rich algorithmic frameworks, as different design decisions in the initialization, variation, selection, and survival mechanisms can lead to very different behaviors. The decision on how to represent individuals is key both to the effectiveness of variation operators and to the ease with which they can be designed to produce good-quality individuals. EAs that employ crossover in addition to the asexual (mutation) operator are referred to as genetic algorithms (GAs). EAs that additionally incorporate a meme, which is a local improvement operator to improve an offspring and effectively map it to a nearby optimum, are referred to as hybrid or memetic EAs (MAs). The employment of multiple, independent objective functions as opposed to a single fitness function results in multi-objective EAs (MO-EAs). Specific variants that build over GAs are respectively referred to as MGAs and MO-GAs.

One of the first EAs for macromolecular structure modeling was a GA, proposed in [648] for the de novo protein structure prediction problem. Work in [648] also demonstrated that EAs are better able to escape local minima of a protein energy function than MC. This result is not surprising, considering that the algorithm able to compute Lennard-Jones optima of atomic clusters in [649] was in fact an EA. Referred to as Basin Hopping, the algorithm was a 1+1 MA, which refers to an MA that has only one parent and one offspring. In a 1+1 MA, the population evolving over generations has size 1, and the offspring competes with the parent. We recall that MA refers to an EA where the offspring is subjected to a local improvement operator (energetic minimization). In Basin Hopping, the offspring replaces the parent with a probability resembling the Metropolis criterion. An MC search can also be viewed as an EA, specifically, a 1+1 EA, and all MC-based methods can be conceptualized as EAs employing highly specific insight about the optimization problem at hand.
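
Viewed as a 1+1 MA, Basin Hopping reduces to a short loop. The sketch below is a toy illustration rather than the algorithm of [649]: it assumes a rugged one-dimensional energy function, uses fixed-step gradient descent as the local improvement operator, and picks hypothetical values for the jump size and effective temperature.

```python
import math
import random

def energy(x):
    # Hypothetical rugged 1D landscape with many local minima.
    return 0.1 * x * x + math.sin(3.0 * x)

def grad(x):
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x, step=0.02, n_iter=500):
    """The meme: plain gradient descent mapping x to a nearby local minimum."""
    for _ in range(n_iter):
        x -= step * grad(x)
    return x

def basin_hopping(x0, n_steps=200, jump=1.5, kT=0.5, seed=0):
    rng = random.Random(seed)
    parent = local_minimize(x0)
    best = parent
    for _ in range(n_steps):
        # Variation (a random perturbation) followed by local minimization.
        child = local_minimize(parent + rng.uniform(-jump, jump))
        delta = energy(child) - energy(parent)
        # The offspring replaces the parent with a Metropolis-like probability.
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            parent = child
        if energy(parent) < energy(best):
            best = parent
    return best

start_minimum = local_minimize(4.0)
best = basin_hopping(4.0)
```

Because every offspring is minimized before the acceptance test, the search effectively moves between local minima of the landscape rather than between arbitrary points on it.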

Given the early work in [648], EAs have a long history in de novo protein structure prediction. Customized EAs for this problem contain many evolutionary strategies and meta-heuristics, including the employment of a hall of fame to preserve "good" individuals (decoys), tabu search to improve the performance of a meme, co-evolving memes, niching, crowding, twin removal for population diversification, structuring of the solution space to facilitate distributed implementations capable of exploiting parallel computing architectures, and more. The main focus of algorithmic research on EAs is on which mechanisms avoid premature convergence and allow finding the global optimum in overly rugged fitness landscapes. This is of particular interest in applications of EAs to different structure-centric problems in macromolecular modeling [650]. A comprehensive review of EAs for de novo protein structure prediction can be found in [651].

Though they have a long history in de novo structure prediction, EAs are not considered among the top performers in this problem for proteins no longer than 200 amino acids. On long protein chains, where off-lattice models result in impractical computational demands, on-lattice EAs are by now the only viable algorithms [652, 653]. However, on shorter chains, where off-lattice models can be afforded, the injection of specialized operators (moves), such as molecular fragment replacement, and sophisticated hybrid potential energy functions have allowed rather simple MC-based algorithms to outperform non-customized EAs. Of note here are the Rosetta and Quark methods that often dominate the leader board in the CASP competition [118–120].

Even though EAs have yet to become state of the art in the de novo structure prediction setting, much progress has been made in recent years [390, 391, 654]. Recently, EAs have incorporated state-of-the-art, off-lattice representations and energy functions to become competitive with MC-based methods such as Rosetta [390, 391]. The additional recasting of the structure prediction problem as a multi-objective optimization one has resulted in higher exploration capability and conformation quality than single-objective optimization approaches such as Rosetta [392, 655]. EAs are also employed to address protein folding [656].

While there is still much work to be done to demonstrate EAs as the state-of-the-art approaches for de novo structure prediction, there are three domains in macromolecular structure modeling where EAs are by now the best performers: protein-ligand binding, multimeric protein-protein docking, and cryo-EM reconstruction.

In protein-ligand binding, some of the top algorithms are EAs. For instance, AutoDock now employs a Lamarckian GA, which has been demonstrated to result in better-quality receptor-ligand bound configurations than the MC-SA algorithm employed in earlier releases [180]. In particular, work in [180] demonstrates that both the Lamarckian GA and a traditional GA can handle ligands with more degrees of freedom than MC-SA, and that the Lamarckian GA outperforms the traditional GA. The latter is due to the fact that in a Lamarckian GA, contrary to the Darwinian model of evolution, where only genetic traits are inheritable, an offspring is replaced with the result of the local improvement operator to which it is subjected. This essentially introduces phenotypic traits in the genotypic pool (improvements are passed on to the next generation), per Jean-Baptiste Lamarck’s now discredited claim that phenotypic characteristics acquired during an individual’s lifetime can become inheritable traits (epigenetics is, however, bringing more credibility to Lamarck’s claims). It is worth pointing out that many MAs (for instance, even Basin Hopping) are Lamarckian EAs. MAs that are not Lamarckian choose not to replace the offspring with the result of the local improvement operator to which it is subjected but use the improved fitness in the survival mechanism; this is known as the Baldwin effect [657].
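
The difference between Lamarckian and Baldwinian treatment of the meme can be shown in a few lines. The sketch below is illustrative only; the scalar genotype, the toy `fitness` function, and the greedy `local_improve` operator are assumptions and do not reflect the operators used in AutoDock or in [180].

```python
def fitness(x):
    # Toy fitness to maximize (an assumption): single peak at x = 3.
    return -(x - 3.0) ** 2

def local_improve(x, step=0.25, n_steps=4):
    # The meme: a few greedy hill-climbing steps of fixed size.
    for _ in range(n_steps):
        x = max((x - step, x, x + step), key=fitness)
    return x

def evaluate(genotype, lamarckian):
    improved = local_improve(genotype)
    score = fitness(improved)   # both variants are scored after improvement
    if lamarckian:
        return improved, score  # the improved individual enters the gene pool
    return genotype, score      # Baldwinian: the genotype itself is unchanged

lam_genotype, lam_score = evaluate(1.0, lamarckian=True)
bal_genotype, bal_score = evaluate(1.0, lamarckian=False)
print(lam_genotype, lam_score)  # improved genotype is inherited
print(bal_genotype, bal_score)  # original genotype, improved score
```

Both variants present the same score to the survival mechanism; they differ only in what is handed to the next generation.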

A domain where EAs are showing promise is structure prediction for asymmetric, heteromeric assemblies. Currently, the only algorithm that has been shown capable of producing native asymmetric structures of heteromeric assemblies in the absence of wet-laboratory data is Multi-LZerD [292]. Multi-LZerD is a GA that represents multimeric conformations through spanning trees. The nodes in the tree represent the units, and the edges encode the presence of a direct interaction. As presented, Multi-LZerD proceeds over 3000 generations. While promising, the algorithm incurs too high a computational cost to be practical in its current form for multimeric assemblies of more than 6 units.
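
As a rough illustration of a spanning-tree representation, the sketch below encodes a multimer as a list of edges between unit indices and checks that the edges form a valid spanning tree. It is not the Multi-LZerD data structure; the function name `is_spanning_tree` and the edge-list encoding are hypothetical, and a practical representation would also need to attach a relative placement to each edge.

```python
def is_spanning_tree(n_units, edges):
    # A spanning tree over n units has exactly n - 1 edges and no cycles.
    if len(edges) != n_units - 1:
        return False
    parent = list(range(n_units))  # union-find forest over the units

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # this edge would close a cycle
        parent[ra] = rb
    return True

# Four units; unit 1 interacts directly with units 0, 2, and 3.
print(is_spanning_tree(4, [(0, 1), (1, 2), (1, 3)]))
```

A variation operator in such a GA could, for example, remove one edge and reconnect the two resulting subtrees through a different pair of units, re-checking validity with a function like the one above.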

Another domain where EAs are shown to be highly successful is the simultaneous registration problem in cryo-EM reconstruction. One issue with cryo-EM is that low-resolution maps are often obtained for large asymmetric and/or dynamic macromolecular assemblies. In such cases, an important problem is how to simultaneously fit known structures of the units into the given map. A GA with specialized variation operators and tabu search has been proposed in [658] to successfully address this problem. This GA has also been used in later work in [659] to trace α-helices in low- to mid-resolution cryo-EM maps.

While most of the work on EAs in the evolutionary computation community is driven by algorithmic design and analysis of the exploration capability rather than data quality, key ideas and strategies on evolutionary search are proving powerful in enhancing exploration capability in macromolecular structure modeling problems. For instance, several algorithmic decisions on how to select parents for reproduction, generate offspring, and set up the competition for survival are key for balancing breadth (exploration) and depth (exploitation) in the search [647]. Lately, interesting ideas from multi-objective optimization are being incorporated in EAs for conformation sampling in de novo protein structure prediction. Namely, instead of pursuing the global minimum of an aggregate energy score, EA-based methods are proposed to obtain conformations that optimize specific sub-groupings of interatomic interactions [392]. EA-based methods are also showing promise in mapping energy landscapes of proteins with large conformational changes [324, 660]. Due to the ongoing work in the evolutionary computation community on powerful and effective algorithmic strategies for obtaining solutions of complex objective functions and the realization of outstanding sampling bottlenecks in de novo structure prediction [661], adoption of EAs holds great promise for macromolecular structure modeling.
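
The multi-objective idea mentioned above replaces the comparison of aggregate scores with Pareto dominance over several objectives. The following minimal sketch shows the dominance test and the extraction of non-dominated individuals; the two-objective score tuples are made-up values, and treating each tuple entry as one energy sub-grouping to be minimized is an assumption for illustration.

```python
def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly better
    # in at least one (all objectives are minimized).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # Keep every point that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (objective 1, objective 2) scores for five conformations.
scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (2.0, 6.0)]
front = pareto_front(scores)
print(front)
```

An MO-EA would typically use the front (or a ranking derived from repeated front extraction) in place of a single fitness value when deciding which individuals survive.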

Robotics-inspired Methods

Since simulation of dynamics is the limiting factor in dynamics-based methods, efficiency concerns can be addressed by forgoing or at least delaying dynamics until credible conformational paths have been obtained. A different class of methods focuses not on producing transition trajectories but rather on computing a sequence of conformations (a conformational path) with a credible energy profile. The working assumption is that, once obtained, credible conformational paths can then be locally deformed with techniques that consider dynamics to obtain actual transition trajectories. Such methods adapt sampling-based algorithms developed to address the robot motion-planning problem and are thus known as robotics-inspired methods.

The objective in robot motion planning is to obtain paths that take a robot from a start to a goal configuration. The robot motion planning problem bears mechanistic analogies to the problem of computing conformations along a transition trajectory; in both problems the goal is to uncover which part of the underlying conformation or configuration space is used in motions of a mechanical or biological system from a start to a goal conformation or configuration. Analogies between molecular bonds and robot links and between atoms and robot joints are made to perform fast molecular kinematics.

Robotics-inspired methods are tree-based or roadmap-based [662]. Tree-based methods grow a tree in conformation space from a given start conformation toward a given goal conformation, which together represent the structures bridged by the sought transition. The growth of the tree is biased so the goal conformation can be reached in reasonable computational time. As a result, tree-based methods are efficient but limited in their sampling. They are known as single-query methods, as they can only answer one start-to-goal query at a time; that is, only one path of consecutive conformations that connect the start to the goal can be extracted from the tree. Running them multiple times to sample an ensemble of conformational paths for the same query results in an ensemble with high inter-path correlations due to the biasing of the conformation tree.
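
A goal-biased tree search of this kind can be sketched in a toy two-dimensional space. The sketch below is a generic illustration in the spirit of rapidly-exploring random trees, not any specific method cited here; the unit-square space, the absence of energy or collision checks, and the parameters `step` and `goal_bias` are all assumptions.

```python
import math
import random

def grow_tree(start, goal, step=0.05, goal_bias=0.2, max_iter=5000, seed=0):
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iter):
        # Biased sampling: with probability goal_bias, steer toward the goal.
        target = goal if rng.random() < goal_bias else (rng.random(), rng.random())
        # Extend the tree node nearest to the target by at most one step.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        d = math.dist(nodes[i], target)
        if d == 0.0:
            continue
        t = min(1.0, step / d)
        new = tuple(a + t * (b - a) for a, b in zip(nodes[i], target))
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < step:
            # Follow parent pointers back to the root to extract the one path.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

path = grow_tree((0.1, 0.1), (0.9, 0.9))
print(len(path) if path else None)
```

Because the goal bias pulls the tree along similar directions in every run, repeated runs with different seeds tend to return similar paths, which is the inter-path correlation noted above.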

Roadmap-based methods adapt the Probabilistic Road Map (PRM) framework [663]. These methods support multiple queries. Rather than grow a tree in conformation space, these methods detach the sampling of conformations from the structure that encodes neighborhood relationships among conformations in the conformation space. Typically, a sampling stage first provides a discrete representation of the conformation space of interest, and then a roadmap building stage embeds sampled conformations in a graph/roadmap by connecting each one to its nearest neighbors.
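
The two stages of a roadmap-based method, sampling followed by roadmap construction, can be sketched as follows. This is a generic PRM-style illustration in a toy unit square, with no energy evaluation and with straight-line edges assumed always valid; the function names and the values of `n_samples` and `k` are hypothetical.

```python
import heapq
import math
import random

def build_roadmap(n_samples=200, k=6, seed=0):
    rng = random.Random(seed)
    # Sampling stage: a discrete representation of the space of interest.
    nodes = [(rng.random(), rng.random()) for _ in range(n_samples)]
    # Roadmap stage: connect each sample to its k nearest neighbors.
    edges = {i: set() for i in range(n_samples)}
    for i, p in enumerate(nodes):
        order = sorted(range(n_samples), key=lambda j: math.dist(p, nodes[j]))
        for j in order[1:k + 1]:  # order[0] is the sample itself
            edges[i].add(j)
            edges[j].add(i)
    return nodes, edges

def shortest_path(nodes, edges, src, dst):
    # Dijkstra over the roadmap; returns a list of node indices or None.
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v in edges[u]:
            nd = d + math.dist(nodes[u], nodes[v])
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None

nodes, edges = build_roadmap()
print(shortest_path(nodes, edges, 0, 1))
```

Once built, the same roadmap answers any number of start-to-goal queries through repeated calls to `shortest_path`, which is what distinguishes the multiple-query setting from the single-query tree.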

Roadmap-based methods bring their own unique set of challenges. Randomly-sampled conformations have very low probability of being in the region of interest for the transition. In particular, for long chains with many degrees of freedom (hundreds of backbone angles in small-to-medium protein chains), a protein conformation sampled at random is very unlikely to be physically realistic. Biased sampling techniques can be used to remedy this issue [664, 665], but it is hard to know which ones will focus sampling on regions of interest for the transition. In addition, both roadmap- and tree-based methods rely on local planners or local deformation techniques to connect two neighboring conformations. It is hard to find reasonable local planners for protein conformations. A linear interpolation is often carried out over the employed parameters, typically backbone angles, but this can produce unrealistic conformations, and a lot of time can be spent energetically refining these conformations. Recent work is considering complex local planners that are not based on interpolation but are instead re-formulations of the motion computation problem. Work in [666] introduces a prioritized path sampling scheme to address the computational demands of complex local planners in roadmap-based methods for protein motion computation.
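
The simplest local planner mentioned above, linear interpolation over backbone angles, can be sketched as follows. The sketch assumes conformations are given as lists of dihedral angles in radians and interpolates each angle along the shorter arc; the function names are hypothetical, and no energy or steric check is performed, which is precisely why intermediate conformations produced this way may be unrealistic.

```python
import math

def wrap(angle):
    # Map an angle in radians to the interval [-pi, pi).
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def interpolate_angles(start, goal, n_steps):
    # Interpolate each dihedral along the shorter arc from start to goal.
    path = []
    for s in range(n_steps + 1):
        t = s / n_steps
        path.append([wrap(a + t * wrap(b - a)) for a, b in zip(start, goal)])
    return path

# Two hypothetical dihedrals; the first crosses the +/- pi boundary.
path = interpolate_angles([3.0, 0.0], [-3.0, 1.0], 4)
for conformation in path:
    print(conformation)
```

Handling the periodicity explicitly matters: interpolating the first angle naively from 3.0 to -3.0 would sweep through almost a full turn instead of the short arc across the boundary.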

Roadmap-based methods have been employed to model unfolding of small proteins [665, 667]. Tree-based methods have been employed to model conformational changes and flexibility, predict the native structure, and compute conformational paths connecting given structural states [351, 352, 387, 668–670]. In particular, the T-RRT method described in [351] and the PDST method described in [352] have focused on the problem of computing conformational paths connecting two given structures. While T-RRT has been shown to connect known low-energy states of the dialanine peptide (2 amino acids long) [351], the PDST method has been shown to produce credible information on the order of conformational changes connecting stable states of large proteins (200–500 amino acids long) [352]. Both methods control the dimensionality of the conformation space by either focusing on systems with few amino acids [351] or by employing coarse-grained representations to reduce the number of modeled parameters in large proteins [352]. The tree-based method in [353] employs the fragment replacement technique to reduce the dimensionality of the conformation space and sample conformational paths connecting two given structural states of proteins ranging from a few dozen to a few hundred amino acids. At each iteration, a conformation in the tree is selected for expansion. The expansion employs molecular fragment replacement and the Metropolis criterion to bias the tree towards low-energy conformations over time. The selection penalizes the tree from growing towards regions of the conformation space that have been oversampled, thus resulting in enhanced sampling of the conformation space.
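
One simple way to penalize oversampled regions during selection is to discretize a low-dimensional projection of the tree into cells and weight each conformation inversely to the occupancy of its cell. The sketch below illustrates this idea only; it is not necessarily the selection scheme of [353], and the function name, the grid projection, and the `cell_size` parameter are assumptions.

```python
import random
from collections import Counter

def select_for_expansion(coords, cell_size, rng):
    # Assign each projected conformation to a grid cell and count occupancy.
    cells = [tuple(int(c // cell_size) for c in p) for p in coords]
    counts = Counter(cells)
    # Weight each conformation inversely to how crowded its cell is, so that
    # every occupied cell is equally likely to be chosen overall.
    weights = [1.0 / counts[c] for c in cells]
    return rng.choices(range(len(coords)), weights=weights, k=1)[0]

rng = random.Random(0)
# Nine conformations crowd one cell; a single conformation sits in another.
coords = [(0.1, 0.1)] * 9 + [(5.0, 5.0)]
picks = [select_for_expansion(coords, 1.0, rng) for _ in range(2000)]
print(picks.count(9) / len(picks))
```

With uniform selection the isolated conformation would be chosen about one time in ten; with the occupancy weighting it is chosen about half the time, which pushes the tree toward sparsely sampled regions.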

Conclusions

This review has highlighted the breadth and depth of research in macromolecular modeling and simulation. A plethora of computational methods have been developed to study a wide spectrum of molecular events. QM methods are used to study molecular electronic structures and obtain detailed and accurate electronic structure calculations. Work in [671] employs such calculations to correlate quantum descriptors and the biological activity of 13 quinoxaline drug compounds and then suggest effective compounds against drug-resistant Mycobacterium tuberculosis. Recent efforts in quantum chemistry are devoted to circumventing computational bottlenecks of large-scale electronic structure calculations and extending applicability to molecular systems composed of hundreds of atoms [672]. At present, QM methods have too high a computational cost to be a competitive alternative to MD or MC methods and their variants. For this reason, the focus of this review has been on MM methods, such as MD and variations, which are the methods of choice to study macromolecular structure and dynamics. It should be noted that hybrid, QM/MM methods exist and are the methods of choice for modeling reactions in biomolecular systems [673].

One of the major themes in MM-based macromolecular modeling is the choice of resolution or detail. As this review has summarized, atomistic, explicit solvent MD simulations are becoming more affordable, both due to improvements in hardware and techniques that allow aggressive parallelization. Despite the challenges posed by the disparate spatial and time scales employed by macromolecules flexing their structures and interacting with their environment, significant algorithmic and hardware advances have allowed breaking the millisecond barrier [147]. Dynamical processes that involve millions of atoms can now be characterized. For example, work in [674] tracks via MD simulations the microsecond-long atomic motions of 1.2 million particles to study the dissolution of the capsid of the satellite tobacco necrosis virus.

MD and non-MD methods that employ reduced, coarse-grained macromolecular models are often regarded as “cheaper” albeit less accurate alternatives to atomistic MD methods. Such cheaper methods currently complement or facilitate atomistic MD-based studies. For example, protein docking methods are routinely employed to assist cryo-EM in resolving structures of molecular assemblies. Once such methods narrow down the possible conformation space, subsequent atomistic MD simulations are employed to make final predictions by examining stability and dynamics [111].

In some settings, these cheaper methods provide the only practical approach. Even with various accelerated MD simulations, mapping of protein energy landscapes remains challenging. For example, work in [10] shows that the sampling capability of accelerated MD greatly depends on the structure used to initiate a trajectory. In our own laboratories, we have been able to compare the cheaper methods to published atomistic MD simulations of H-Ras [660]. In particular, the evolutionary algorithm in [660] is able to map the energy landscape of H-Ras wildtype and selected variants in atomistic detail better than what can currently be achieved via known MD methods. In addition, in a similar comparison on known TIR domains, MD simulations are found to only cover a small portion of the known conformation space (unpublished data: Qi, Chen, Wei, Nussinov, and Ma, “L265P mutation changes the energy landscape of MyD88 protein”).

In MD-based research, two different directions seem to be pursued by researchers at the moment. The first involves the employment of very long MD simulations, made possible by complex MD-customized architectures, like Anton. Thermodynamic and kinetic quantities can be readily extracted from such simulations. The second involves the employment of several short, off-equilibrium MD simulations, which allows the employment of parallel architectures but necessitates the employment of statistical models, such as Markov state models, to collect and organize the simulations to describe the long-time behavior of a system. Both directions are exciting and complementary. In particular, the second direction is leading to advances in the combination of continuous and discrete models for expediting modeling of long-time scale phenomena and is likely to lead to further algorithmic advancements. Within each of these directions, several open questions remain for researchers to pursue. A combination of both directions, dedicated architectures and continuous and discrete models, promises to push the spatial and time scales that can be observed in silico even further.
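
The role of a Markov state model in organizing many short simulations can be illustrated with a minimal sketch. It assumes the trajectories have already been discretized into a small number of states (the clustering step is omitted), uses a lag of one frame, and estimates the long-time (stationary) populations by power iteration; the three toy trajectories below are made-up data, and practical MSM construction involves additional choices such as lag-time selection and validation.

```python
def estimate_transition_matrix(trajectories, n_states):
    # Count transitions between consecutive frames of each short trajectory.
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1.0
    # Row-normalize the counts into transition probabilities.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def stationary_distribution(matrix, n_iter=1000):
    # Power iteration: repeatedly propagate a distribution through the model.
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

# Three short, independent trajectories over discrete states 0, 1, 2.
trajs = [[0, 0, 1, 1, 1, 0], [1, 1, 2, 2, 1], [2, 2, 2, 1, 0, 0]]
T = estimate_transition_matrix(trajs, 3)
pi = stationary_distribution(T)
print(T)
print(pi)
```

No single trajectory visits all transitions repeatedly, yet the pooled counts yield a model whose stationary distribution describes behavior on time scales longer than any individual run.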

As summarized in this review, many non-MD algorithmic frameworks are being pursued to model different aspects of macromolecular structure and dynamics. Often, these frameworks are inspired or initiated from diverse communities of researchers. Of note here are evolutionary algorithms and robotics-inspired algorithms. While components of these algorithms are often investigated in detail in each of the corresponding communities, the focus in these communities has traditionally been on computational performance rather than quality of findings. Broad employment of these algorithms as tools complementary to MD is currently challenged by an inability to demonstrate utility on a broad class of macromolecular systems and validate findings with existing wet-laboratory or MD-based studies. Nonetheless, a growing body of researchers within each of these communities is introducing treatments focused on both computational performance and data quality.

This review has summarized the current state of the art in diverse application areas. An emerging theme is the need to characterize in detail the structural flexibility of a macromolecular system under specific conditions. While great progress is being made, computing a conformation ensemble consistent with explicit or implicit constraints is likely to motivate the development of novel algorithms for years to come.

Many other directions of research in macromolecular modeling and simulation could not be described in detail here. These include the development of accurate and sensitive molecular force fields [140, 141] for macromolecular simulation, the development of increasingly accurate coarse-grained representations of macromolecules, solvent models, and multiscaling techniques [76, 142–144], decoy/model selection algorithms [675] in de novo structure prediction, as well as the development of algorithmic tools to assist structure resolution in the wet laboratory [676, 677]. Additionally, while this review highlights some of the unique challenges posed by intrinsically disordered proteins and regions, it does not provide an overview of similar challenges posed by membrane proteins. The reader is referred to work in [678] for a review of such challenges and algorithmic advancements.

Expected advances in each of the reviewed application areas promise to provide us with a more comprehensive and detailed understanding of our biology. In particular, unraveling the behavior of macromolecules in isolation and assembly will help us understand the molecular basis of mechanisms in the healthy and diseased cell. A truly synergistic employment of in-silico and wet-lab research to unravel molecular mechanisms also promises to lead to better therapeutics for combating cancer, neurodegenerative disorders, infections, and other important human disorders of our time. The journey into the future of computational structural biology promises to be exciting, and we hope that this review has inspired a few more researchers to join us on this journey.

Acknowledgments

Funding for this work is provided in part by the National Science Foundation (Grant No. 1421001, Grant No. 1440581, and CAREER Award No. 1144106 to AS) and the Thomas F. and Kate Miller Jeffress Memorial Trust Award. This work has also been funded in whole or in part with Federal funds from the NCI, NIH, under contract number HHSN261200800001E to BM and RN. The content of this publication does not necessarily reflect the views or policies of the National Science Foundation or Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This study was supported (in part) by the Intramural Research Program of the NIH, NCI, Center for Cancer Research.

References

1. Soto C. Protein misfolding and neurodegeneration. JAMA Neurology.

2008;65(2):184–189.

2. Uversky VN. Intrinsic disorder in proteins associated with neurodegenerative

diseases. Front Biosci. 2009;14:5188–5238.

3. Fernández-Medarde A, Santos E. Ras in cancer and developmental diseases.

Genes Cancer. 2011;2(3):344–358.

4. Neudecker P, Robustelli P, Cavalli A, Walsh P, Lundström P, Zarrine-Afsar A,

et al. Structure of an intermediate state in protein folding and aggregation.

Science. 2012;336(6079):362–366.

5. Onuchic JN, Luthey-Schulten Z, Wolynes PG. Theory of protein folding: the

energy landscape perspective. Annu Rev Phys Chem. 1997;48:545–600.

6. Ozenne V, Schneider R, Yao M, Huang JR, Salmon L, Zweckstetter M, et al.

Mapping the potential energy landscape of intrinsically disordered proteins at

amino acid resolution. J Am Chem Soc. 2012;134(36):15138–15148.

7. Levy Y, Jortner J, Becker OM. Solvent effects on the energy landscapes and

folding kinetics of polyalanine. Proc Natl Acad Sci USA. 2001;98(5):2188–2193.

8. Miao Y, Nichols SE, McCammon JA. Free energy landscapes of

G-protein-coupled receptors, explored by accelerated molecular dynamics. Phys

Chem Chem Phys. 2014;16(14):6398–6406.

9. Gorfe AA, Grant BJ, McCammon JA. Mapping the nucleotide and

isoform-dependent structural and dynamical features of Ras proteins. Structure.

2008;16(6):885–896.

10. Grant BJ, Gorfe AA, McCammon JA. Ras Conformational Switching:

Simulating Nucleotide-Dependent Conformational Transitions with Accelerated

Molecular Dynamics. PLoS Comput Biol. 2009;5(3):e1000325.

11. Anfinsen CB. Principles that govern the folding of protein chains. Science.

1973;181(4096):223–230.

12. Fersht AR. Structure and Mechanism in Protein Science. A Guide to Enzyme

Catalysis and Protein Folding. 3rd ed. New York, NY: W. H. Freeman and Co.;

1999.

13. Frauenfelder H, Sligar SG, Wolynes PG. The energy landscapes and motions of proteins. Science. 1991;254(5038):1598–1603.

14. Sawaya MR, Kraut J. Loop and Domain Movements in the Mechanism of E.

Coli Dihydrofolate Reductase: Crystallographic Evidence. Biochemistry.

1997;36(3):586–603.

15. Radkiewicz JL, Brooks CL. Protein dynamics in enzymatic catalysis:

Exploration of dihydrofolate reductase. J Am Chem Soc. 2000;122(2):225–231.

16. Vendruscolo M, Dobson CM. Dynamic visions of enzymatic reactions. Science.

2006;313(5793):1586–1587.

17. Clore GM, Schwieters CD. How much backbone motion in ubiquitin is required to account for dipolar coupling data measured in multiple alignment media as assessed by independent cross-validation? J Am Chem Soc. 2004;126(9):2923–2938.

18. Henzler-Wildman K, Kern D. Dynamic personalities of proteins. Nature.

2007;450:964–972.

19. Okazaki K, Koga N, Takada S, Onuchic JN, Wolynes PG. Multiple-basin energy landscapes for large-amplitude conformational motions of proteins: Structure-based molecular dynamics simulations. Proc Natl Acad Sci USA. 2006;103(32):11844–11849.

20. Hub JS, de Groot BL. Detection of Functional Modes in Protein Dynamics.

PLoS Comput Biol. 2009;5(8):e1000480.

21. Bahar I, Lezon TR, Yang LW, Eyal E. Global dynamics of proteins: bridging

between structure and function. Annu Rev Biophys. 2010;39:23–42.

22. Boehr DD, Wright PE. How do proteins interact? Science.

2008;320(5882):1429–1430.

23. Boehr DD, Nussinov R, Wright PE. The role of dynamic conformational

ensembles in biomolecular recognition. Nature Chem Biol. 2009;5(11):789–96.

24. Feynman RP, Leighton RB, Sands M. The Feynman Lectures on Physics.

Reading, MA: Addison-Wesley; 1963.

25. McCammon JA, Gelin BR, Karplus M. Dynamics of folded proteins. Nature.

1977;267:585–590.

26. Cooper A. Protein fluctuations and the thermodynamic uncertainty principle.

Prog Biophys Mol Biol. 1984;44(3):181–214.

27. Frauenfelder H, Wolynes PG. Biomolecules: Where the Physics of Complexity

and Simplicity Meet. Physics Today. 1994;47(2):58–64.

28. Dill KA, Chan HS. From Levinthal to pathways to funnels. Nat Struct Biol.

1997;4(1):10–19.

29. Heymann JB, Conway JF, Steven AC. Molecular dynamics of protein complexes from four-dimensional cryo-electron microscopy. J Struct Biol. 2004;147(3):291–301.

30. Kleckner IR, Foster MP. An introduction to NMR-based approaches for measuring protein dynamics. Biochim Biophys Acta. 2011;1814(8):942–968.

31. Fenwick RB, van den Bedem H, Fraser JS, Wright PE. Integrated description of protein dynamics from room-temperature X-ray crystallography and NMR. Proc Natl Acad Sci USA. 2014;111(4):E445–E454.

32. Best RB, Vendruscolo M. Determination of ensembles of structures consistent

with NMR order parameters. J Am Chem Soc. 2004;126(26):8090–8091.

33. Berlin K, Castañeda CA, Schneidman-Duhovny D, Sali A, Nava-Tudela A,

Fushman D. Recovering a representative conformational ensemble from

underdetermined macromolecular structural data. J Am Chem Soc.

2013;135(44):16595–16609.

34. De Simone A, Montalvao RW, Dobson CM, Vendruscolo M. Characterization of the Interdomain Motions in Hen Lysozyme Using Residual Dipolar Couplings as Replica-Averaged Structural Restraints in Molecular Dynamics Simulations. Biochemistry. 2013;52(37):6480–6486.

35. Lindorff-Larsen K, Best RB, DePristo MA, Dobson CM, Vendruscolo M.

Simultaneous determination of protein structure and dynamics. Nature.

2005;433(7022):128–132.

36. Vendruscolo M, Paci E, Dobson C, Karplus M. Rare Fluctuations of Native

Proteins Sampled by Equilibrium Hydrogen Exchange. J Am Chem Soc.

2003;125(51):15686–15687.

37. Kay LE. Protein Dynamics from NMR. Nat Struct Biol. 1998;5(2-3):513–517.

38. Kay LE. NMR studies of protein structure and dynamics. J Magn Reson.

2005;173(2):193–207.

39. Torella JP, Holden SJ, Santoso Y, Hohlbein J, Kapanidis AN. Identifying

Molecular Dynamics in Single-Molecule FRET Experiments with Burst Variance

Analysis. Biophys J. 2011;100(6):1568–1577.

40. Zhu G, editor. NMR of proteins and small biomolecules. vol. 326 of Topics in

Current Chemistry. Springer-Verlag; 2012.

41. Karam P, Powdrill MH, Liu HW, Vasquez C, Mah W, Bernatchez J, et al.

Dynamics of hepatitis C Virus (HCV) RNA-dependent RNA Polymerase NS5B

in Complex with RNA. J Biol Chem. 2014;289(20):14399–14411.

42. Moerner WE, Fromm DP. Methods of single-molecule fluorescence spectroscopy. Rev Scientific Instruments. 2003;74(8):3597–3619.

43. Greenleaf WJ, Woodside MT, Block SM. High-Resolution, Single-Molecule

Measurements of Biomolecular Motion. Annu Rev Biophys Biomol Struct.

2007;36:171–190.

44. Michalet X, Weiss S, Jäger M. Single-Molecule Fluorescence Studies of Protein

Folding and Conformational Dynamics. Chem Rev. 2006;106(5):1785–1813.

45. Diekmann S, Hoischen C. Biomolecular dynamics and binding studies in the

living cell. Physics of Life Reviews. 2014;11(1):1–30.

46. Hohlbein J, Craggs TD, Cordes T. Alternating-laser excitation: single-molecule FRET and beyond. Chem Soc Rev. 2014;43:1156–1171.

47. Schlau-Cohen GS, Wang Q, Southall J, Cogdell RJ, Moerner WE.

Single-molecule spectroscopy reveals photosynthetic LH2 complexes switch

between emissive states. Proc Natl Acad Sci USA. 2013;110(27):10899–10903.

48. Moffat K. The frontiers of time-resolved macromolecular crystallography:

movies and chirped X-ray pulses. Faraday Discuss. 2003;122(79-88):65–77.

49. Schotte F, Lim M, Jackson TA, Smirnov AV, Soman J, Olson JS, et al.

Watching a protein as it functions with 150-ps time-resolved X-ray

crystallography. Science. 2003;300(5627):1944–1947.

50. Roy R, Hohng S, Ha T. A practical guide to single-molecule FRET. Nature

Methods. 2008;5(6):507–516.

51. Lee HM, M KS, Kim HM, Suh YD. Single-molecule surface-enhanced Raman

spectroscopy: a perspective on the current status. Phys Chem Chem Phys.

2013;15:5276–5287.

52. Socher E, Imperiali B. FRET-CAPTURE: A sensitive method for the detection of dynamic protein interactions. ChemBioChem. 2013;14(1):53–57.

53. Gall A, Ilioaia C, Krüger TP, Novoderezhkin VI, Robert B, van Grondelle R.

Conformational Switching in a Light-Harvesting Protein as Followed by

Single-Molecule Spectroscopy. Biophys J. 2015;108(11):2713–2720.

54. Ådén J, Wolf-Watz M. NMR Identification of Transient Complexes Critical to Adenylate Kinase Catalysis. J Am Chem Soc. 2007;129(45):14003–14012.

55. Russel D, Lasker K, Phillips J, Schneidman-Duhovny D, Velázquez-Muriel JA,

Sali A. The structural dynamics of macromolecular processes. Curr Opin Cell

Biol. 2009;21(1):97–108.

56. Taketomi H, Ueda Y, Go N. Studies on protein folding, unfolding and

fluctuations by computer simulation: The effect of specific amino acid sequence

represented by specific inter-unit interactions. Int J Peptide Prot Res.

1975;7(6):445–459.

57. Bashford D, Karplus M. pKa’s of ionizable groups in proteins: atomic detail

from a continuum electrostatic model. Biochemistry. 1990;29(44):10219–10225.

58. Lau KF, Dill KA. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules. 1989;22(10):3986–3997.

59. Unger R, Moult J. Finding lowest free energy conformation of a protein is an

NP-hard problem: Proof and implications. Bull Math Biol.

1993;55(6):1183–1198.

60. Hart WE, Istrail S. Robust Proofs of NP-Hardness for Protein Folding: General Lattices and Energy Potentials. J Comp Biol. 1997;4(1):1–22.

61. Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC. A

three-dimensional model of the myoglobin molecule obtained by X-ray analysis.

Nature. 1958;181(4610):662–666.

62. Kendrew JC, Dickerson RE, Strandberg BE, Hart RG, Davies DR, Phillips DC,

et al. Structure of myoglobin: A three-dimensional Fourier synthesis at 2 Å

resolution. Nature. 1960;185(4711):422–427.

63. Phillips DC. The Hen Egg-White Lysozyme Molecule. Proc Natl Acad Sci USA.

1967;57(3):483–495.

64. Berman HM, Henrick K, Nakamura H. Announcing the worldwide Protein Data

Bank. Nat Struct Biol. 2003;10(12):980–980.

65. Verlet L. Computer "experiments" on Classical Fluids. I. Thermodynamical

Properties of Lennard-Jones Molecules. Phys Rev. 1967;159:98–103.

66. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M.

CHARMM: a program for macromolecular energy, minimization, and dynamics

calculations. J Comput Chem. 1983;4(2):187–217.

67. Karplus M, McCammon JA. Protein structural fluctuations during a period of

100 ps. Nature. 1979;277(5697):578.

68. Levitt M, Warshel A. Computer simulation of protein folding. Nature.

1975;253(5494):94–96.

69. Lifson S, Warshel A. A Consistent Force Field for Calculation on

Conformations, Vibrational Spectra and Enthalpies of Cycloalkanes and

n-Alkane Molecules. J Chem Phys. 1968;49:5116–5129.

70. Levitt M, Lifson S. Refinement of Protein Conformations Using a

Macromolecular Energy Minimization Procedure. J Mol Biol. 1969;46:269–279.

71. Gibson KD, Scheraga HA. Minimization of Polypeptide Energy. I. Preliminary

Structures of Bovine Pancreatic Ribonuclease S-peptide. Proc Natl Acad Sci

USA. 1967;58:420–427.

72. Levitt M. A Simplified Representation of Protein Conformations for Rapid

Simulation of Protein Folding. J Mol Biol. 1976;104:59–107.

73. Warshel A, Levitt M. Theoretical Studies of Enzymatic Reactions: Dielectric,

Electrostatic and Steric Stabilization of the Carbonium Ion in the Reaction of

Lysozyme. J Mol Biol. 1976;103:227–249.

74. Warshel A. Computer simulations of enzyme catalysis: Methods, progress, and

insights. Annu Rev Biophys Biomol Struct. 2003;32:425–443.

75. Donchev AG, Ozrin VD, Subbotin MV, Tarasov OV, Tarasov VI. A Quantum

Mechanical Polarizable Force Field for Biomolecular Interactions. Proc Natl

Acad Sci USA. 2005;102(22):7829–7834.

76. Zhou H. Theoretical frameworks for multiscale modeling and simulation. Curr

Opinion Struct Biol. 2014;25:67–76.

77. Kamerlin SC, Haranczyk M, Warshel A. Progresses in Ab Initio QM/MM Free

Energy Simulations of Electrostatic Energies in Proteins: Accelerated QM/MM

Studies of pKa, Redox Reactions and Solvation Free Energies. J Phys Chem B.

2009;113(5):1253–1272.

78. Kamerlin SCL, Vicatos S, Dryga A, Warshel A. Coarse-Grained (Multiscale)

Simulations in Studies of Biophysical and Chemical Systems. Ann Rev Phys

Chem. 2011;62(1):41–64.

79. Plotnikov NV, Warshel A. Exploring, Refining, and Validating the

Paradynamics QM/MM Sampling. J Phys Chem B. 2012;116(34):10342–10356.

80. Vicatos S, Rychkova A, Mukherjee S, Warshel A. An effective Coarse-grained

model for biological simulations: Recent refinements and validations. Proteins:

Structure, Function, and Bioinformatics. 2014;82(7):1168–1185.

81. Warshel A. Energetics of enzyme catalysis. Proc Natl Acad Sci USA.

1978;75(11):5250–5254.

82. Mukherjee S, Warshel A. Electrostatic origin of the mechanochemical rotary

mechanism and the catalytic dwell of F1-ATPase. Proc Natl Acad Sci USA.

2011;108(51):20550–20555.

83. Mukherjee S, Warshel A. Realistic simulations of the coupling between the

protomotive force and the mechanical rotation of the F0-ATPase. Proc Natl

Acad Sci USA. 2012;109(3):14876–14881.

84. Dryga A, Chakrabarty S, Vicatos S, Warshel A. Realistic simulation of the

activation of voltage-gated ion channels. Proc Natl Acad Sci USA.

2012;109(9):3335–3340.

85. Rychkova A, Mukherjee S, Bora RP, Warshel A. Simulating the pulling of

stalled elongated peptide from the ribosome by the translocon. Proc Natl Acad

Sci USA. 2013;110(25):10195–10200.

86. Mukherjee S, Warshel A. Electrostatic origin of the unidirectionality of walking

myosin V motors. Proc Natl Acad Sci USA. 2013;110(43):17326–17331.

87. Ma J, Sigler PB, Xu Z, Karplus M. A Dynamic Model for the Allosteric

Mechanism of GroEL. J Mol Biol. 2000;302:303–313.

88. Henzler-Wildman KA, Thai V, Lei M, Ott M, Wolf-Watz M, Fenn T, et al.

Intrinsic motions along an enzymatic reaction trajectory. Nature.

2007;450(7171):838–844.

89. Gao YQ, Yang W, Karplus M. A structure-based model for the synthesis and

hydrolysis of ATP by F1-ATPase. Cell. 2005;123(2):195–205.

90. Pu J, Karplus M. How subunit coupling produces the γ-subunit rotary motion

in F1-ATPase. Proc Natl Acad Sci USA. 2008;105(4):1192–1197.

91. Scarabelli G, Grant BJ. Mapping the Structural and Dynamical Features of

Kinesin Motor Domains. PLoS Comput Biol. 2013;9(11):e1003329.

92. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation

of state calculations by fast computing machines. J Chem Phys.

1953;21(6):1087–1092.

93. Torrie GM, Valleau JP. Nonphysical sampling distributions in Monte Carlo

free-energy estimation: umbrella sampling. J Comput Phys. 1977;23(2):187–199.

94. Li Z, Scheraga HA. Monte Carlo-minimization approach to the multiple-minima

problem in protein folding. Proc Natl Acad Sci USA. 1987;84(19):6611–6615.

95. Dinner AR, Sali A, Karplus M. The folding mechanism of larger model proteins:

role of native structure. Proc Natl Acad Sci USA. 1996;93(16):8356–8361.

96. Lee J, Scheraga HA, Rackovsky S. New optimization method for conformational

energy calculations on polypeptides: Conformational space annealing. J Comput

Chem. 1997;18(9):1222–1232.

97. Lee J, Scheraga HA, Rackovsky S. Conformational analysis of the 20-residue

membrane-bound portion of melittin by conformational space annealing.

Biopolymers. 1998;46(2):103–115.

98. Lee J, Scheraga HA. Conformational space annealing by parallel computations:

Extensive conformational search of Met-enkephalin and of the 20-residue

membrane-bound portion of melittin. Int J Quantum Chem. 1999;75(3):255–265.

99. Voter AF. Introduction to the Kinetic Monte Carlo Method. In: Sickafus KE,

Kotomin EA, Uberuaga BP, editors. Radiation Effects in Solids. vol. 235 of

NATO Science Series. Springer Verlag; 2007. p. 1–23.

100. Levitt M. The birth of computational structural biology. Nat Struct Biol.

2001;8:392–393.

101. Karplus M. Development of multiscale models for complex chemical systems

from H+H2 to Biomolecules. Nobel Lecture. 2013;p. 1–33. Available from:

http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2013/

karplus-lecture.pdf.

102. Warshel A. Multiscale modeling of biological functions: from enzymes to

molecular machines. Nobel Lecture. 2013;p. 1–25. Available from:

http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2013/

warshel-lecture.pdf.

103. Levitt M. Birth and future of multiscale modeling for macromolecular systems.

Nobel Lecture. 2013;p. 1–31. Available from: http://www.nobelprize.org/

nobel_prizes/chemistry/laureates/2013/levitt-lecture.pdf.

104. Piana S, Lindorff-Larsen K, Shaw DE. Protein folding kinetics and

thermodynamics from atomistic simulation. Proc Natl Acad Sci USA.

2012;109(44):17845–17850.

105. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE. How fast-folding proteins fold.

Science. 2011;334(6055):517–520.

106. Stone JE, Phillips JC, Freddolino PL, Hardy DJ, Trabuco LG, Schulten K.

Accelerating molecular modeling applications with graphics processors. J

Comput Chem. 2007;28(16):2618–2640.

107. Harvey MJ, Giupponi G, de Fabritiis G. ACEMD: Accelerating Biomolecular

Dynamics in the microsecond timescale. J Chem Theory Comput.

2009;5(6):1632–1639.

108. Tanner DE, Phillips JC, Schulten K. GPU/CPU Algorithm for Generalized

Born/Solvent-Accessible Surface Area Implicit Solvent Calculations. J Chem

Theory Comput. 2012;8(7):2521–2530.

109. Götz AW, Williamson MJ, Xu D, Poole D, Le Grand S, Walker RC. Routine

Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1.

Generalized Born. J Chem Theory Comput. 2012;8(5):1542–1555.

110. Dubrow A. What Got Done in One Year at NSF’s Stampede Supercomputer.

Comput Sci Eng. 2015;17(2):83–88.

111. Zhao G, Perilla JR, Yufenyuy EL, Meng X, Chen B, Ning J, et al. Mature

HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular

dynamics. Nature. 2013;497(7451):643–646.

112. Perilla JR, Goh BC, Cassidy CK, Liu B, Bernardi RC, Rudack T, et al.

Molecular dynamics simulations of large macromolecular complexes. Curr Opin

Struct Biol. 2015;31:64–74.

113. Fattebert JL, Richards DF, Glosli JN. Dynamic load balancing algorithm for

molecular dynamics based on Voronoi cells domain decompositions. Comput

Phys Communic. 2012;183(12):2608–2615.

114. Proctor AJ, Lipscomb TJ, Zou A, Anderson JA, Cho SS. Performance Analyses

of a Parallel Verlet Neighbor List Algorithm for GPU-Optimized MD

Simulations; 2012.

115. Batcho P, Case DA, Schlick T. Optimized particle-mesh Ewald/multiple-time

step integration for molecular dynamics simulations. J Chem Phys.

2001;115(9):4003–4018.

116. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, et al.

Scalable molecular dynamics with NAMD. J Comput Chem.

2005;26(16):1781–1802.

117. Bradley P, Misura KMS, Baker D. Toward High-Resolution de Novo Structure

Prediction for Small Proteins. Science. 2005;309(5742):1868–1871.

118. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, et al.

ROSETTA3: an object-oriented software suite for the simulation and design of

macromolecules. Methods Enzymol. 2011;487:545–574.

119. Xu D, Zhang Y. Ab initio protein structure assembly using continuous structure

fragments and optimized knowledge-based force field. Proteins: Struct Funct

Bioinf. 2012;80(7):1715–1735.

120. Zhang Y. Interplay of I-TASSER and QUARK for template-based and ab initio

protein structure prediction in CASP10. Proteins. 2014;82(Suppl 2):175–187.

121. Grant BJ, Gorfe AA, McCammon JA. Large conformational changes in proteins:

signaling and other functions. Curr Opinion Struct Biol. 2010;20(2):142–147.

122. Prakash P, Gorfe AA. Lessons from computer simulations of Ras proteins in

solution and in membrane. Biochim Biophys Acta. 2013;1830(11):5211–5218.

123. Noé F, Schütte C, Vanden-Eijnden E, Reich L, Weikl TR. Constructing the

equilibrium ensemble of folding pathways from short off-equilibrium simulations.

Proc Natl Acad Sci USA. 2009;106(45):19011–19016.

124. Whitford PC, Sanbonmatsu KY, Onuchic JN. Biomolecular dynamics:

order-disorder transitions and energy landscapes. Reports on Progress in

Physics. 2012;75(7):076601.

125. Shehu A, Kavraki LE, Clementi C. Unfolding the Fold of Cyclic Cysteine-rich

Peptides. Protein Sci. 2008;17(3):482–493.

126. Shehu A, Kavraki LE, Clementi C. Multiscale Characterization of Protein

Conformational Ensembles. Proteins: Struct Funct Bioinf. 2009;76(4):837–851.

127. Diaz JF, Wroblowski B, Schlitter J, Engelborghs Y. Calculation of pathways for

the conformational transition between the GTP- and GDP-bound states of the

Ha-ras-p21 protein: calculations with explicit solvent simulations and

comparison with calculations in vacuum. Proteins. 1997;28(3):434–451.

128. Malmstrom RD, Lee CT, Van Wart AT, Amaro RE. Application of

Molecular-Dynamics Based Markov State Models to Functional Proteins. J

Chem Theory Comput. 2014;10(7):2648–2657.

129. Maragliano L, Vanden-Eijnden E, Roux B. Free Energy and Kinetics of

Conformational Transitions from Voronoi Tessellated Milestoning with

Restraining Potentials. J Chem Theory Comput. 2009;5(10):2589–2594.

130. Franklin J, Koehl P, Doniach S, Delarue M. MinActionPath: maximum

likelihood trajectory for large-scale structural transitions in a coarse-grained

locally harmonic energy landscape. Nucleic Acids Res. 2007;35(Web Server

issue):W477–W482.

131. Yang Z, Májek P, Bahar I. Allosteric Transitions of Supramolecular Systems

Explored by Network Models: Application to Chaperonin GroEL. PLoS Comput

Biol. 2009;5(4):e1000360.

132. Prinz JH, Keller B, Noé F. Probing molecular kinetics with Markov models:

metastable states, transition pathways and spectroscopic observables. Phys

Chem Chem Phys. 2011;13(38):16912–16927.

133. Beauchamp KA, Bowman GR, Lane TJ, Maibaum L, Haque IS, Pande VS.

MSMBuilder2: Modeling Conformational Dynamics at the Picosecond to

Millisecond Scale. J Chem Theory Comput. 2011;7(10):3412–3419.

134. Ravindranathan KP, Gallicchio E, Levy RM. Conformational equilibria and free

energy profiles for the allosteric transition of the ribose-binding protein. J Mol

Biol. 2005;353(1):196–210.

135. Pietrucci F, Marinelli F, Carloni P, Laio A. Substrate binding mechanism of

HIV-1 protease from explicit-solvent atomistic simulations. J Am Chem Soc.

2009;131(33):11811–11818.

136. Buch I, Giorgino T, De Fabritiis G. Complete reconstruction of an

enzyme-inhibitor binding process by molecular dynamics simulations. Proc Natl

Acad Sci USA. 2011;108(25):10184–10189.

137. Feher VA, Durrant JD, Van Wart AT, Amaro RE. Computational approaches to

mapping allosteric pathways. Curr Opinion Struct Biol. 2014;25:98–103.

138. Held M, Metzner P, Prinz JH, Noé F. Mechanisms of protein-ligand association

and its modulation by protein mutations. Biophys J. 2011;100(3):701–710.

139. Held M, Noé F. Calculating kinetics and pathways of protein-ligand association.

Eur J Cell Biol. 2012;91(4):357–364.

140. Freddolino PL, Park S, Roux B, Schulten K. Force field bias in protein folding

simulations. Biophys J. 2009;96(9):3772–3780.

141. Vitalini F, Mey AS, Noé F, Keller BG. Dynamic properties of force fields. J

Chem Phys. 2015;142:084101.

142. Sakae Y, Okamoto Y. Optimizations of protein force fields. In: Liwo A, editor.

Computational Methods to Study the Structure and Dynamics of Biomolecules

and Biomolecular Processes. Berlin, Heidelberg: Springer-Verlag; 2014. p.

195–247.

143. Clementi C. Coarse-grained models of protein folding: Toy-models or predictive

tools? Curr Opinion Struct Biol. 2008;18:10–15.

144. Kleinjung J, Fraternali F. Design and application of implicit solvent models in

biomolecular simulations. Curr Opinion Struct Biol. 2014;25(100):126–134.

145. Dryga A, Warshel A. Renormalizing SMD: The Renormalization Approach and

Its Use in Long Time Simulations and Accelerated PMF Calculations of

Macromolecules. J Phys Chem B. 2010;114(39):12720–12728.

146. Chodera JD, Noé F. Markov state models of biomolecular conformational

dynamics. Curr Opinion Struct Biol. 2014;25:135–144.

147. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP,

et al. Atomic-Level Characterization of the Structural Dynamics of Proteins.

Science. 2010;330(6002):341–346.

148. Zagrovic B, Snow CD, Shirts MR, Pande VS. Simulation of folding of a small

alpha-helical protein in atomistic detail using worldwide-distributed computing.

J Mol Biol. 2002;323(5):927–937.

149. Wang K, Chodera JD, Yang Y, Shirts MR. Identifying ligand binding sites and

poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics.

J Computer-Aided Mol Des. 2013;27(12):989–1007.

150. Ando T, Skolnick J. Sliding of Proteins Non-specifically Bound to DNA:

Brownian Dynamics Studies with Coarse-Grained Protein and DNA Models.

PLoS Comput Biol. 2014;10(12):e1003990.

151. Marklund EG, Mahmutovic A, Berg OG, Hammar P, van der Spoel D, Fange D,

et al. Transcription-factor binding and sliding on DNA studied using micro- and

macroscopic models. Proc Natl Acad Sci USA. 2013;110(49):19796–19801.

152. Szöllősi D, Horváth T, Han K, Dokholyan NV, Tompa P, Kalmár L, et al.

Discrete molecular dynamics can predict helical prestructured motifs in

disordered proteins. PLoS ONE. 2014;9(4):e95795.

153. Shukla D, Hernández CX, Weber JK, Pande VS. Markov State Models Provide

Insights into Dynamic Modulation of Protein Function. Acc Chem Res.

2015;48(2):414–422.

154. Koshland D. Application of a theory of enzyme specificity to protein synthesis.

Proc Natl Acad Sci USA. 1958;44(2):98–104.

155. Bosshard HR. Molecular recognition by induced fit: how fit is the concept?

Physiology. 2001;16:171–173.

156. Ma B, Kumar S, Tsai C, Nussinov R. Folding funnels and binding mechanisms.

Protein Eng. 1999;12(9):713–720.

157. Tsai C, Ma B, Nussinov R. Folding and binding cascades: shifts in energy

landscapes. Proc Natl Acad Sci USA. 1999;96(18):9970–9972.

158. Tsai C, Kumar S, Ma B, Nussinov R. Folding funnels, binding funnels, and

protein function. Protein Sci. 1999;8(6):1181–1190.

159. Monod J, Wyman J, Changeux JP. On the nature of allosteric transitions: a

plausible model. J Mol Biol. 1965;12:88–118.

160. Lange OF, Lakomek NA, Farès C, Schröder GF, Walter KF, Becker S, et al.

Recognition Dynamics Up to Microseconds Revealed from an RDC-Derived

Ubiquitin Ensemble in Solution. Science. 2008;320(5882):1471–1475.

161. Csermely P, Palotai R, Nussinov R. Induced fit, conformational selection and

independent dynamic segments: an extended view of binding events. Trends

Biochem Sci. 2010;35(10):539–546.

162. Cui Q, Karplus M. Allostery and cooperativity revisited. Protein Sci.

2008;17(8):1295–1307.

163. Feixas F, Lindert S, Sinko W, McCammon JA. Exploring the role of receptor

flexibility in structure-based drug discovery. Biophys Chem. 2014;186:31–45.

164. Ewing TJ, Makino S, Skillman AG, Kuntz ID. DOCK 4.0: search strategies for

automated molecular docking of flexible molecule databases. J Comput Aided

Mol Des. 2001;15(5):411–428.

165. Kramer B, Rarey M, Lengauer T. Evaluation of the FLEXX incremental

construction algorithm for protein-ligand docking. Proteins: Struct Funct Bioinf.

1999;37(2):228–241.

166. Wagener M, Vlieg J, Nabuurs SB. Flexible protein-ligand docking using the

Fleksy protocol. J Comput Chem. 2012;33(12):1215–1217.

167. Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD. Improved

protein-ligand docking using GOLD. Proteins: Struct Funct Bioinf.

2003;52(4):609–623.

168. Verdonk ML, Chessari G, Cole JC, Hartshorn MJ, Murray CW, Nissink JW,

et al. Modeling water molecules in protein-ligand docking using GOLD. J Med

Chem. 2005;48(20):6504–6515.

169. Goodsell DS, Morris GM, Olson AJ. Automated docking of flexible ligands:

applications of AutoDock. J Mol Recogn. 1996;9(1):1–5.

170. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al.

AutoDock4 and AutoDockTools4: Automated docking with selective receptor

flexibility. J Comput Chem. 2009;30(16):2785–2791.

171. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of

docking with a new scoring function, efficient optimization, and multithreading.

J Comput Chem. 2010;31(2):455–461.

172. Vass M, Tarcsay A, Keserű GM. Multiple ligand docking by Glide: implications

for virtual second-site screening. J Comput Aided Mol Des. 2012;26(7):821–834.

173. Davis IW, Baker D. RosettaLigand docking with full ligand and receptor

flexibility. J Mol Biol. 2009;385(2):381–392.

174. Meiler J, Baker D. ROSETTALIGAND: Protein-small molecule docking with

full side-chain flexibility. Proteins: Struct Funct Bioinf. 2006;65(3):538–548.

175. Grosdidier A, Zoete V, Michielin O. SwissDock, a protein-small molecule

docking web service based on EADock DSS. Nucleic Acids Res. 2011;39(Suppl

2):W270–W277.

176. Spitzer R, Jain AN. Surflex-Dock: Docking benchmarks and real-world

application. J Comput Aided Mol Des. 2012;26(6):687–699.

177. Chakraborty S. DOCLASP-Docking ligands to target proteins using spatial and

electrostatic congruence extracted from a known holoenzyme and applying

simple geometrical transformations. F1000Research. 2014;3.

178. Ruiz-Carmona S, Alvarez-Garcia D, Foloppe N, Garmendia-Doval AB, Juhos S,

et al. rDock: A Fast, Versatile and Open Source Program for Docking Ligands

to Proteins and Nucleic Acids. PLoS Comput Biol. 2014;10(4):e1003571.

179. Li H, Leung KS, Ballester PJ, Wong MH. istar: A web platform for large-scale

protein-ligand docking. PLoS ONE. 2014;9(1):e85678.

180. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, et al.

Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical

Binding Free Energy Function. J Comput Chem. 1998;19(14):1639–1662.

181. Huang D, Caflisch A. Library screening by fragment-based docking. J Mol

Recogn. 2010;23(2):183–193.

182. Miranker A, Karplus M. Functionality maps of binding sites: a multiple copy

simultaneous search method. Proteins. 1991;11(1):29–34.

183. Dong J, Zhao H, Zhou T, Spiliotopoulos D, Rajendran C, Li XD, et al.

Structural Analysis of the Binding of Type I, I1/2, and II Inhibitors to Eph

Tyrosine Kinases. ACS Med Chem Lett. 2015;6(1):79–83.

184. Jones S, Thornton JM. Principles of protein-protein interactions. Proc Natl

Acad Sci USA. 1996;93(1):13–20.

185. Conte LL, Chothia C, Janin J. The atomic structure of protein-protein

recognition sites. J Mol Biol. 1999;285(5):2177–2198.

186. Norel R, Petrey D, Wolfson HJ, Nussinov R. Examination of shape

complementarity in docking of unbound proteins. Proteins. 1999;36(3):307–317.

187. Betts MJ, Sternberg MJ. An analysis of conformational changes on

protein-protein association: implications for predictive docking. Protein Eng.

1999;12(4):271–283.

188. Decanniere K, Transue TR, Desmyter A, Maes D, Muyldermans S, Wyns L.

Degenerate interfaces in antigen-antibody complexes. J Mol Biol.

2001;313(3):473–478.

189. Ferrari AM, Wei BQ, Costantino L, Shoichet BK. Soft Docking and Multiple

Receptor Conformations in Virtual Screening. J Med Chem.

2004;47(21):5076–5084.

190. Sherman W, Beard HS, Farid R. Use of an induced fit receptor structure in virtual

screening. Chem Biol Drug Des. 2006;67(1):83–84.

191. Nabuurs SB, Wagener M, De Vlieg J. A flexible approach to induced fit docking.

J Med Chem. 2007;50(26):6507–6518.

192. Ieong PU, Sorensen J, Vemu PL, Wong CW, Demir O, Williams NP, et al.

Progress towards automated Kepler scientific workflows for computer-aided drug

discovery and molecular simulations. Procedia Computer Science.

2014;29:1745–1755.

193. Amaro RE, Baron R, McCammon JA. An improved relaxed complex scheme for

receptor flexibility in computer-aided drug design. J Comput Aided Mol Des.

2008;22(9):693–705.

194. B-Rao C, Subramanian J, Sharma SD. Managing protein flexibility in docking

and its applications. Drug Discov Today. 2009;14(7-8):394–400.

195. Lexa KW, Carlson HA. Protein flexibility in docking and surface mapping. Q

Rev Biophys. 2012;45(3):301–343.

196. Kokh DB, Wade RC, Wenzel W. Receptor flexibility in small-molecule docking

calculations. WIREs Comput Mol Sci. 2011;1(2):298–314.

197. Leach AR. Ligand docking to proteins with discrete side-chain flexibility. J Mol

Biol. 1994;235(1):345–356.

198. Tian S, Sun H, Pan P, Li D, Zhen X, Li Y, et al. Assessing an ensemble

docking-based virtual screening strategy for kinase targets by considering

protein flexibility. J Chem Inf Model. 2014;54(10):2664–2679.

199. Sorensen J, Demir O, Swift RV, Feher VA, Amaro RE. Molecular docking to

flexible targets. Methods Mol Biol. 2015;1215:445–469.

200. Korb O, Olsson TS, Bowden SJ, Hall RJ, Verdonk ML, Liebeschuetz JW, et al.

Potential and limitations of ensemble docking. J Chem Inf Model.

2012;52(5):1262–1274.

201. Bohnuud T, Kozakov D, Vajda S. Evidence of conformational selection driving

the formation of ligand binding sites in protein-protein interfaces. PLoS Comput

Biol. 2014;10(10):e1003872.

202. Shan Y, Kim ET, Eastwood MP, Dror RO, Seeliger MA, Shaw DE. How does a

drug molecule find its target binding site? J Am Chem Soc.

2011;133(24):9181–9183.

203. Kaus JW, Arrar M, McCammon JA. Accelerated Adaptive Integration Method.

J Phys Chem B. 2014;118(19):5109–5118.

204. Wu X, Brooks BR. Toward canonical ensemble distribution from self-guided

Langevin dynamics simulation. J Chem Phys. 2011;134(13):134108.

205. Wu X, Hodoscek M, Brooks BR. Replica exchanging self-guided Langevin

dynamics for efficient and accurate conformational sampling. J Chem Phys.

2012;137(4):044106.

206. Kaus JW, Pierce LT, Walker RC, McCammon JA. Improving the Efficiency of

Free Energy Calculations in the Amber Molecular Dynamics Package. J Chem

Theory Comput. 2013;9(9):4131–4139.

207. Grant BJ, McCammon JA, Gorfe AA. Conformational Selection in G-Proteins:

Lessons from Ras and Rho. Biophys J. 2010;99(11):L87–L89.

208. Abankwa D, Hanzal-Bayer M, Ariotti N, Plowman SJ, Gorfe AA, Parton RG,

et al. A novel switch region regulates H-Ras membrane orientation and signal

output. EMBO J. 2008;27(5):727–735.

209. Gu RX, Liu LA, Wang YH, Xu Q, Wei DQ. Structural comparison of the

wild-type and drug-resistant mutants of the influenza A M2 proton channel by

molecular dynamics simulations. J Phys Chem B. 2013;117(20):6042–6051.

210. Bozdaganyan ME, Orekhov PS, Bragazzi NL, Panatto D, Amicizia D, Pechkova

E, et al. Docking and Molecular Dynamics (MD) Simulations in Potential Drugs

Discovery: An Application to Influenza Virus M2 Protein. American J Biochem

Biotech. 2014;10(3):180–188.

211. Waldmann M, Jirmann R, Hoelscher K, Wienke M, Niemeyer FC, Rehders D,

et al. A Nanomolar Multivalent Ligand as Entry Inhibitor of the Hemagglutinin

of Avian Influenza. J Am Chem Soc. 2014;136(2):783–788.

212. Greenway KT, LeGresley EB, Pinto BM. The influence of 150-cavity binders on

the dynamics of influenza A neuraminidases as revealed by molecular dynamics

simulations and combined clustering. PLoS ONE. 2013;8(3):e59873.

213. Goh BC, Rynkiewicz MJ, Cafarella TR, White MR, Hartshorn KL, Allen K,

et al. Molecular mechanisms of inhibition of influenza by surfactant protein D

revealed by large-scale molecular dynamics simulation. Biochemistry.

2013;52(47):8527–8538.

214. Woods CJ, Shaw KE, Mulholland AJ. Combined Quantum

Mechanics/Molecular Mechanics (QM/MM) Simulations for Protein-Ligand

Complexes: Free Energies of Binding of Water Molecules in Influenza

Neuraminidase. J Phys Chem B. 2014;119(3).

215. Ermak DL, McCammon J. Brownian dynamics with hydrodynamic interactions.

J Chem Phys. 1978;69(4):1352–1360.

216. ElSawy KM, Twarock R, Lane DP, Verma CS, Caves LS. Characterization of

the ligand receptor encounter complex and its potential for in silico

kinetics-based drug development. J Chem Theory Comput. 2011;8(1):314–321.

217. Mereghetti P, Wade RC. Atomic detail Brownian dynamics simulations of

concentrated protein solutions with a mean field treatment of hydrodynamic

interactions. J Phys Chem B. 2012;116(29):8523–8533.

218. ElSawy K, Verma CS, Joseph TL, Lane DP, Twarock R, Caves L. On the

interaction mechanisms of a p53 peptide and nutlin with the MDM2 and

MDMX proteins: a Brownian dynamics study. Cell Cycle. 2013;12(3):394–404.

219. Frazier Z, Alber F. A Computational Approach to Increase Time Scales in

Brownian Dynamics–Based Reaction-Diffusion Modeling. J Comput Biol.

2012;19(6):606–618.

220. Beck M, Topf M, Frazier Z, Tjong H, Xu M, Zhang S, et al. Exploring the

spatial and temporal organization of a cell’s proteome. J Struct Biol.

2011;173(3):483–496.

221. Tsai C, Nussinov R. A Unified View of "How Allostery Works". PLoS Comput

Biol. 2014;10(2):e1003394.

222. Lockless SW, Ranganathan R. Evolutionarily conserved pathways of energetic

connectivity in protein families. Science. 1999;286(5438):295–299.

223. Daily MD, Upadhyaya TJ, Gray JJ. Contact rearrangements form coupled

networks from local motions in allosteric proteins. Proteins. 2008;71(1):455–466.

224. Kannan N, Vishveshwara S. Identification of side-chain clusters in protein

structures by a graph spectral method. J Mol Biol. 1999;292(2):441–464.

225. van den Bedem H, Bhabha G, Yang K, Wright PE, Fraser JS. Automated

identification of functional dynamic contact networks from X-ray

crystallography. Nat Methods. 2013;10(9):896–902.

226. Boehr DD, Schnell JR, McElheny D, Bae SH, Duggan BM, Benkovic SJ, et al.

A distal mutation perturbs dynamic amino acid networks in dihydrofolate

reductase. Biochemistry. 2013;52(27):4605–4619.

227. Ferreiro DU, Hegler JA, Komives EA, Wolynes PG. Localizing frustration in

native proteins and protein assemblies. Proc Natl Acad Sci USA.

2007;104(50):19819–19824.

228. Brooks B, Karplus M. Harmonic dynamics of proteins: normal modes and

fluctuations in bovine pancreatic trypsin inhibitor. Proc Natl Acad Sci USA.

1983;80:6571–6575.

229. Go N, Noguti T, Nishikawa T. Dynamics of a small globular protein in terms of

low-frequency vibrational modes. Proc Natl Acad Sci USA.

1983;80(12):3696–3700.

230. Levitt M, Sander C, Stern PS. The normal-modes of a protein-native bovine

pancreatic trypsin-inhibitor. Intl J Quant Chem. 1983;Suppl 10:181–199.

231. Garcia AE. Large-amplitude nonlinear motions in proteins. Phys Rev Lett.

1992;68(17):2696–2699.

232. Amadei A, Linssen AB, Berendsen HJ. Essential dynamics of proteins. Proteins.

1993;17(4):412–425.

233. Lange OF, Grubmüller H. Full correlation analysis of conformational protein

dynamics. Proteins. 2008;70(4):1294–1312.

234. Girvan M, Newman MEJ. Community structure in social and biological

networks. Proc Natl Acad Sci USA. 2002;99(12):7821–7826.

235. McClendon CL, Friedland G, Mobley DL, Amirkhani H, Jacobson MP.

Quantifying correlations between allosteric sites in thermodynamic ensembles. J

Chem Theory Comput. 2009;5(9):2486–2502.

236. Sethi A, Eargle J, Black AA, Luthey-Schulten Z. Dynamical networks in

tRNA:protein complexes. Proc Natl Acad Sci USA. 2009;106(16):6620–6625.

237. Eargle J, Luthey-Schulten Z. NetworkView: 3D display and analysis of protein

RNA interaction networks. Bioinformatics. 2012;28(22):3000–3001.

238. Vanwart AT, Eargle J, Luthey-Schulten Z, Amaro RE. Exploring residue

component contributions to dynamical network models of allostery. J Chem

Theory Comput. 2012;8(8):2949–2961.

239. Kaya C, Armutlulu A, Ekesan S, Haliloglu T. MCPath: Monte Carlo path

generation approach to predict likely allosteric pathways and functional residues.

Nucleic Acids Res. 2013;41(Web Server Issue):W249–W255.

240. Johnston JM, Wang H, Provasi D, Filizola M. Assessing the relative stability of

dimer interfaces in G-protein coupled receptors. PLoS Comput Biol.

2012;8(8):e100264.

241. Filizola M, Wang SX, Weinstein H. Dynamic models of G-protein coupled

receptor dimers: indications of asymmetry in the rhodopsin dimer from

molecular dynamics simulations in a POPC bilayer. J Comput Aided Mol Des.

2006;20(7-8):405–416.

242. Chen R, Li L, Weng Z. ZDock: an initial-stage protein-docking algorithm.

Proteins: Struct Funct Bioinf. 2003;52(1):80–87.

243. Dominguez C, Boelens R, Bonvin AMJJ. HADDOCK: A protein-protein

docking approach based on biochemical or biophysical information. J Am Chem

Soc. 2003;125:1731–1737.

244. Comeau SR, Gatchell DW, Vajda S, Camacho CJ. ClusPro: a fully automated

algorithm for protein-protein docking. Nucl Acids Res. 2004;32(S1):W96–W99.

245. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. PatchDock and

SymmDock: servers for rigid and symmetric docking. Nucl Acids Res.

2005;33(S2):W363–W367.

246. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. Geometry based

flexible and symmetric protein docking. Proteins: Struct Funct Bioinf.

2005;60(2):224–231.

247. Zacharias M. ATTRACT: protein-protein docking in CAPRI using a reduced

protein model. Proteins: Struct Funct Bioinf. 2005;60(2):252–256.

248. Tovchigrechko A, Vakser IA. GRAMM-X public web server for protein-protein

docking. Nucl Acids Res. 2006;34(Web Server issue):W310–W314.

249. Cheng TM, Blundell TL, Fernandez-Recio J. pyDock: electrostatics and

desolvation for effective scoring of rigid-body protein-protein docking. Proteins.

2007;68(2):503–515.

250. Terashi G, Takeda-Shitaka M, Kanou K, Iwadate M, Takaya D, Umeyama H.

The SKE-DOCK server and human teams based on a combined method of shape

complementarity and free energy estimation. Proteins: Struct Funct Bioinf.

2007;69(4):866–887.

251. Lyskov S, Gray JJ. The RosettaDock server for local protein-protein docking.

Nucl Acids Res. 2008;36(S2):W233–W238.

252. Huang SY, Zou X. MDockPP: A hierarchical approach for protein-protein

docking and its application to CAPRI rounds 15-19. Proteins: Struct Funct

Bioinf. 2010;78(15):3096–3103.

253. Mukherjee S, Zhang Y. Protein-Protein Complex Structure Predictions by

Multimeric Threading and Template Recombination. Structure.

2011;19(7):955–966.

254. Guerler A, Govindarajoo B, Zhang Y. Mapping Monomeric Threading to

Protein-Protein Structure Prediction. J Chem Inf Model.

2013;53(3):717–725.

255. Cavalli A, Salvatella X, Dobson CM, Vendruscolo M. Protein structure

determination from NMR chemical shifts. Proc Natl Acad Sci USA.

2007;104(23):9615–9620.

256. Lensink MF, Wodak SJ. Docking and scoring protein interactions: CAPRI 2009.

Proteins: Struct Funct Bioinf. 2010;78(15):3073–3084.

257. Lensink MF, Wodak SJ. Blind predictions of protein interfaces by docking

calculations in CAPRI. Proteins: Struct Funct Bioinf. 2010;78(15):3085–3095.

258. Mashiach E, Nussinov R, Wolfson HJ. FiberDock: Flexible induced-fit backbone

refinement in molecular docking. Proteins: Struct Funct Bioinf.

2010;78(6):1503–1519.

259. Pedotti M, Simonelli L, Livoti E, Varani L. Computational Docking of

Antibody-Antigen Complexes, Opportunities and Pitfalls Illustrated by

Influenza Hemagglutinin. Int J Mol Sci. 2011;12:226–251.

260. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, et al.

Protein-protein docking with simultaneous optimization of rigid-body

displacement and side-chain conformations. J Mol Biol. 2003;331(1):281–299.

261. Chaudhury S, Berrondo M, Weitzner BD, Muthu P, Bergman H, Gray JJ.

Benchmarking and Analysis of Protein Docking Performance in Rosetta v3.2.

PLoS ONE. 2011;6(8):e22477.

262. Ellingson SR, Miao Y, Baudry J, Smith JC. Multi-Conformer Ensemble

Docking to Difficult Protein Targets. J Phys Chem B. 2015;119(3):1026–1034.

263. Kozakov D, Beglov D, Bohnuud T, Mottarella SE, Xia B, Hall DR, et al. How

good is automated protein docking? Proteins: Struct Funct Bioinf.

2013;81(12):2159–2166.

264. Moitessier N, Englebienne P, Lee D, Lawandi J, Corbeil CR. Towards the

development of universal, fast and highly accurate docking/scoring methods: a

long way to go. British J Pharmacology. 2009;153(S1):S7–S27.

265. Zhu H, Domingues FS, Sommer I, Lengauer T. NOXclass: prediction of

protein-protein interaction types. BMC Bioinf. 2006;7:27.

266. Moreira IS, Fernandes PA, Ramos MJ. Hot spots-A review of the protein-protein

interface determinant amino-acid residues. Proteins. 2007;68(4):803–812.

267. Li N, Sun Z, Jiang F. Prediction of protein-protein binding site by using core

interface residue and support vector machine. BMC Bioinf. 2008;9:553.

268. Liu Q, Li J. Propensity vectors of low-ASA residue pairs in the distinction of

protein interactions. Proteins. 2009;78(3):589–602.

269. Hashmi I, Shehu A. idDock+: Integrating Machine Learning in Probabilistic

Search for Protein-protein Docking. J Comp Biol. 2015;22(9):1–18.

270. Russel D, Lasker K, Webb B, Velazquez-Muriel J, Tjioe E, Schneidman-Duhovny D, et al.

Putting the pieces together: integrative modeling platform software for structure

determination of macromolecular assemblies. PLoS Biology. 2012;10(1):e1001244.

271. Montalvao RW, Cavalli A, Salvatella X, Blundell TL, Vendruscolo M. Structure

determination of protein-protein complexes using NMR chemical shifts: case of

an endonuclease colicin-immunity protein complex. J Am Chem Soc.

2008;130(47):15990–15996.

272. Das R, André I, Shen Y, Wu Y, Lemak A, Bansal S, et al. Simultaneous

prediction of protein folding and docking at high resolution. Proc Natl Acad Sci

USA. 2009;106(45):18978–18983.

273. Cavalli A, Montalvao RW, Vendruscolo M. Using Chemical Shifts to Determine

Structural Changes in Proteins upon Complex Formation. J Phys Chem B.

2011;115(30):9491–9494.

274. Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, et al.

Determining the architectures of macromolecular assemblies. Nature.

2007;450(7170):683–694.

275. Fernandez-Martinez J, Phillips J, Sekedat MD, Diaz-Avalos R, Velazquez-Muriel

J, Franke JD, et al. Structure-function mapping of a heptameric module in the

nuclear pore complex. J Cell Biol. 2012;196(4):419–434.

276. Wang L, Yang MQ, Yang JY. Prediction of DNA-binding residues from protein

sequence information using random forests. BMC Genomics. 2009;10(Suppl1):S1.

277. Ofran Y, Mysore V, Rost B. Prediction of DNA-binding residues from sequence.

Bioinformatics. 2007;23(13):i347–i353.

278. Qin S, Zhou H. Structural Models of Protein-DNA Complexes Based on

Interface Prediction and Docking. Curr Protein Pept Sci. 2011;12(6):531–539.

279. Roberts VA, Pique ME, Ten Eyck LF, Li S. Predicting protein–DNA

interactions by full search computational docking. Proteins.

2013;81(12):2106–2118.

280. van Dijk M, van Dijk AD, Hsu V, Boelens R, Bonvin AM. Information-driven

protein-DNA docking using HADDOCK: it is a matter of flexibility. Nucleic

Acids Res. 2006;34(11):3317–3325.

281. Persikov AV, Wetzel JL, Rowland EF, Oakes BL, Xu DJ, Singh M, et al. A

systematic survey of the Cys2His2 zinc finger DNA-binding landscape. Nucleic

Acids Res. 2015;43(3):1965–1984.

282. Ghersi D, Singh M. Interaction-based discovery of functionally important genes in

cancers. Nucleic Acids Res. 2014;42(3):e18.

283. Ferré S, Navarro G, Casadó V, Cortés A, Mallol J, Canela EI, et al. G

protein-coupled receptor heteromers as new targets for drug development. Prog

Mol Biol Transl Sci. 2011;91:41–54.

284. Pietsch EC, Perchiniak E, Canutescu AA, Wang G, Dunbrack RL, Murphy ME.

Oligomerization of BAK by p53 utilizes conserved residues of the p53 DNA

binding domain. J Biol Chem. 2008;283(30):21294–21304.

285. Inbar Y, Benyamini H, Nussinov R, Wolfson HJ. Combinatorial docking

approach for structure prediction of large proteins and multi-molecular

assemblies. Phys Biol. 2005;2:S156–S165.

286. Inbar Y, Benyamini H, Nussinov R, Wolfson HJ. Prediction of multimolecular

assemblies by multiple docking. J Mol Biol. 2005;349(2):435–447.

287. Potluri S, Yan AK, Chou JJ, Donald BR, Bailey-Kellogg C. Structure

determination of symmetric homo-oligomers by a complete search of symmetry

configuration space, using NMR restraints and van der Waals packing. Proteins:

Struct Funct Bioinf. 2006;65(1):203–219.

288. Sgourakis NG, Lange OF, DiMaio F, Andre I, Fitzkee NC, Rossi P, et al.

Determination of the Structures of Symmetric Protein Oligomers from NMR

Chemical Shifts and Residual Dipolar Couplings. J Am Chem Soc.

2011;133(16):6288–6298.

289. Martin JW, Yan AK, Bailey-Kellogg C, Zhou P, Donald BR. A geometric

arrangement algorithm for structure determination of symmetric protein

homo-oligomers from NOEs and RDCs. J Comp Biol. 2011;18(11):1507–1523.

290. DiMaio F, Leaver-Fay A, Bradley P, Baker D, Andre I. Modeling Symmetric

Macromolecular Structures in Rosetta3. PLoS ONE. 2011;6(6):e20450.

291. Pierce B, Tong W, Weng Z. M-ZDOCK: a grid-based approach for Cn

symmetric multimer docking. Bioinformatics. 2005;21(8):1472–1478.

292. Esquivel-Rodriguez J, Yang YD, Kihara D. Multi-LZerD: Multiple protein

docking for asymmetric complexes. Proteins: Struct Funct Bioinf.

2012;80(7):1818–1833.

293. Robustelli P, Kohlhoff K, Cavalli A, Vendruscolo M. Using NMR Chemical Shifts as

Structural Restraints in Molecular Dynamics Simulations of Proteins. Structure.

2010;18(8):923–933.

294. Camilloni C, Cavalli A, Vendruscolo M. Assessment of the Use of NMR

Chemical Shifts as Replica-Averaged Structural Restraints in Molecular

Dynamics Simulations to Characterize the Dynamics of Proteins. J Phys Chem B.

2012;117(6):1838–1843.

295. Kannan A, Camilloni C, Sahakyan AB, Cavalli A, Vendruscolo M. A

Conformational Ensemble Derived Using NMR Methyl Chemical Shifts Reveals

a Mechanical Clamping Transition That Gates the Binding of the HU Protein to

DNA. J Am Chem Soc. 2014;136(6):2204–2207.

296. Pietrucci F, Mollica L, Blackledge M. Mapping the Native Conformational

Ensemble of Proteins from a Combination of Simulations and Experiments: New

Insight into the src-SH3 Domain. J Phys Chem Lett. 2013;4(11):1943–1948.

297. Wall ME, Van Benschoten AH, Sauter NK, Adams PD, Fraser JS, Terwilliger

TC. Conformational dynamics of a crystalline protein from microsecond-scale

molecular dynamics simulations and diffuse X-ray scattering. Proc Natl Acad

Sci USA. 2014;111(50):17887–17892.

298. König G, Brooks BR. Correcting for the free energy costs of bond or angle

constraints in molecular dynamics simulations. Biochim Biophys Acta.

2014;1850(5):932–942.

299. Mustoe AM, Brooks CL, Al-Hashimi HM. Topological constraints are major

determinants of tRNA tertiary structure and dynamics and provide basis for

tertiary folding cooperativity. Nucleic Acids Res. 2014;42(18):11792–11804.

300. Wu X, Subramaniam S, Case DA, Wu KW, Brooks BR. Targeted

conformational search with map-restrained self-guided Langevin dynamics:

Application to flexible fitting into electron microscopic density maps. J Struct

Biol. 2013;183(3):429–440.

301. Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K. Combining Experiments

and Simulations Using the Maximum Entropy Principle. PLoS Comput Biol.

2014;10(2):e1003406.

302. Granata D, Camilloni C, Vendruscolo M, Laio A. Characterization of the

free-energy landscapes of proteins by NMR-guided metadynamics. Proc Natl

Acad Sci USA. 2013;110(17):6817–6822.

303. Humphrey W, Dalke A, Schulten K. VMD - Visual Molecular Dynamics. J Mol

Graph Model. 1996;14(1):33–38. http://www.ks.uiuc.edu/Research/vmd/.

304. Cavalli A, Camilloni C, Vendruscolo M. Molecular dynamics simulations with

replica-averaged structural restraints generate structural ensembles according to

the maximum entropy principle. J Chem Phys. 2013;138(9):094112.

305. Bonvin AM, Boelens R, Kaptein R. Time- and ensemble-averaged direct NOE

restraints. J Biomol NMR. 1994;4(1):143–149.

306. Kessler H, Griesinger C, Lautz J, Mueller A, van Gunsteren WF, Berendsen

HJC. Conformational dynamics detected by nuclear magnetic resonance NOE

values and J coupling constants. J Am Chem Soc. 1988;110(11):3393–3396.

307. Loquet A, Sgourakis NG, Gupta R, Giller K, Riedel D, Goosmann C, et al.

Atomic model of the type III secretion system needle. Nature.

2012;486(7402):276–279.

308. Pieper U, Schlessinger A, Kloppmann E, Chang GA, Chou JJ, Dumont ME,

et al. Coordinating the impact of structural genomics on the human α-helical

transmembrane proteome. Nature Struct & Mol Biol. 2013;20(2):135–138.

309. Torda AE, Scheek RM, van Gunsteren WF. Time-dependent distance restraints

in molecular dynamics simulations. Chem Phys Lett. 1989;157(4):289–294.

310. Vendruscolo M, Paci E, Dobson CM, Karplus M. Three key residues form a

critical contact network in a protein folding transition state. Nature.

2001;409(6820):641–645.

311. Gong H, Shen Y, Rose GD. Building native protein conformation from NMR

backbone chemical shifts using Monte Carlo fragment assembly. Protein Sci.

2007;16(8):1515–1521.

312. Richter B, Gsponer J, Várnai P, Salvatella X, Vendruscolo M. The MUMO

(minimal under-restraining minimal over-restraining) method for the

determination of native state ensembles of proteins. J Biomol NMR.

2007;37(2):117–135.

313. Montalvao RW, De Simone A, Vendruscolo M. Determination of structural

fluctuations of proteins from structure-based calculations of residual dipolar

couplings. J Biomol NMR. 2012;53(4):281–292.

314. Fu B, Kukic P, Camilloni C, Vendruscolo M. MD Simulations of Intrinsically

Disordered Proteins with Replica-Averaged Chemical Shift Restraints. Biophys

J. 2014;106(2):481a.

315. Shen Y, Bax A. Homology modeling of larger proteins guided by chemical shifts.

Nature Methods. 2015;12(8):747–750.

316. Nasedkin A, Marcellini M, Religa TL, Freund SM, Menzel A, Fersht AR, et al.

Deconvoluting Protein (Un)folding Structural Ensembles Using X-Ray

Scattering, Nuclear Magnetic Resonance Spectroscopy and Molecular Dynamics

Simulation. PLoS One. 2015;10(5):e0125662.

317. de Groot BL, van Aalten DM, Scheek RM, Amadei A, Vriend G, Berendsen HJ.

Prediction of protein conformational freedom from distance constraints.

Proteins. 1997;29(2):240–251.

318. Wells SA. Geometric simulation of flexible motion in proteins. Methods Mol

Biol. 2014;1084:173–192.

319. Wells S, Menor S, Hespenheide B, Thorpe MF. Constrained geometric

simulation of diffusive motion in proteins. Phys Biol. 2005;2(4):127–136.

320. Shehu A, Clementi C, Kavraki LE. Modeling Protein Conformational

Ensembles: From Missing Loops to Equilibrium Fluctuations. Proteins: Struct

Funct Bioinf. 2006;65(1):164–179.

321. Shehu A, Clementi C, Kavraki LE. Sampling Conformation Space to Model

Equilibrium Fluctuations in Proteins. Algorithmica. 2007;48(4):303–327.

322. Shehu A, Kavraki LE, Clementi C. On the Characterization of Protein Native

State Ensembles. Biophys J. 2007;92(5):1503–1511.

323. Chubynsky M, Hespenheide B, Jacobs DJ, Kuhn LA, Lei M, Menor S, et al.

Constraint Theory Applied to Proteins. Nanotech Res J. 2008;2(1):61–72.

324. Clausen R, Shehu A. A Data-driven Evolutionary Algorithm for Mapping

Multi-basin Protein Energy Landscapes. J Comp Biol. 2015;22(9):844–860.

325. Huang YPJ, Montelione GT. Structural biology: Proteins flex to function.

Nature. 2005;438(7064):36–37.

326. Takala H, Björling A, Berntsson O, Lehtivuori H, Niebling S, Hoernke M, et al.

Signal amplification and transduction in phytochrome photosensors. Nature.

2014;509(7499):245–248.

327. Majek P, Weinstein H, Elber R. Pathways of conformational transitions in

proteins. In: Voth GA, editor. Coarse-Graining of Condensed Phase and

Biomolecular Systems. Taylor and Francis Group; 2008. p. 185–203.

328. Nury H, Poitevin F, Van Renterghem C, Changeux JP, Corringer PJ, Delarue

M, et al. One-microsecond molecular dynamics simulation of channel gating in a

nicotinic receptor homologue. Proc Natl Acad Sci USA. 2010;107(14):6275–6280.

329. Calimet N, Simoes M, Changeux JP, Karplus M, Taly A, Cecchini M. A gating

mechanism of pentameric ligand-gated ion channels. Proc Natl Acad Sci USA.

2013;110(42):E3987–E3996.

330. Ma J, Karplus M. Molecular switch in signal transduction: reaction paths of the

conformational changes in ras p21. Proc Natl Acad Sci USA.

1997;94(22):11905–11910.

331. Ovchinnikov V, Karplus M. Analysis and Elimination of a Bias in Targeted

Molecular Dynamics Simulations of Conformational Transitions: Application to

Calmodulin. J Phys Chem B. 2012;116(29):8584–8603.

332. Hamelberg D, Mongan J, McCammon JA. Accelerated molecular dynamics: a

promising and efficient simulation method for biomolecules. J Chem Phys.

2004;120(24):11919–11929.

333. Yao XQ, Grant BJ. Domain opening and dynamic coupling in the alpha subunit

of heterotrimeric G proteins. Biophys J. 2013;105(2):L09–L10.

334. Beckstein O, Denning EJ, Perilla JR, Woolf TB. Zipping and unzipping of

adenylate kinase: atomistic insights into the ensemble of open-closed transitions.

J Mol Biol. 2009;394(1):160–176.

335. Zuckerman DM, Woolf TB. Efficient dynamic importance sampling of rare

events in one dimension. Phys Rev E. 2000;63(1):016702.

336. Perilla JR, Beckstein O, Denning EJ, Woolf TB. Computing ensembles of

transitions from stable states: dynamic importance sampling. J Comput Chem.

2011;32(2):196–209.

337. Krebs WG, Gerstein M. The morph server: a standardized system for analyzing

and visualizing macromolecular motions in a database framework. Nucleic Acids

Res. 2000;28(8):1665–1675.

338. Ye YZ, Godzik A. FATCAT: a web server for flexible structure comparison and

structure similarity searching. Nucleic Acids Res. 2004;32(Web Server

Issue):W582–W585.

339. Lindahl E, Azuara C, Koehl P, Delarue M. NOMAD-Ref: visualization,

deformation and refinement of macromolecular structures based on all-atom

normal mode analysis. Nucleic Acids Res. 2006;34(Web Server Issue):W52–W56.

340. Weiss DR, Levitt M. Can morphing methods predict intermediate structures? J

Mol Biol. 2009;385(2):665–674.

341. Kim KM, Jernigan RL, Chirikjian GS. Efficient generation of feasible pathways

for protein conformational transitions. Biophys J. 2002;83(3):1620–1630.

342. Chu JW, Trout BL, Brooks CLI. A super-linear minimization scheme for the

nudged elastic band method. J Chem Phys. 2003;119(24):12708–12717.

343. Maragliano L, Fischer A, Vanden-Eijnden E, Ciccotti G. String method in

collective variables: minimum free energy paths and isocommittor surfaces. J

Chem Phys. 2006;125:024106.

344. Weinan E, Ren W, Vanden-Eijnden E. Simplified and improved string method

for computing the minimum energy paths in barrier-crossing events. J Chem

Phys. 2007;126:164103.

345. Maragliano L, Vanden-Eijnden E. On-the-fly string method for minimum free

energy paths calculation. Chem Phys Lett. 2007;446:182–190.

346. Weinan E, Ren W, Vanden-Eijnden E. Finite temperature string methods for

the study of rare events. J Phys Chem B. 2005;109:6688–6693.

347. Ren W, Vanden-Eijnden E, Maragakis P, Weinan E. Transition pathways in

complex systems: application of the finite-temperature string method to the

alanine dipeptide. J Chem Phys. 2005;123:134109.

348. Zhang BW, Jasnow D, Zuckerman DM. Efficient and verified simulation of a

path ensemble for conformational change in a united-residue model of

calmodulin. Proc Natl Acad Sci USA. 2007;104(46):18043–18048.

349. Adelman JL, Dale AL, Zwier MC, Bhatt D, Chong LT, Zuckerman DM, et al.

Simulations of the alternating access mechanism of the sodium symporter mhp1.

Biophys J. 2011;101(10):2399–2407.

350. Huber GA, Kim S. Weighted-ensemble Brownian dynamics simulations for

protein association reactions. Biophys J. 1996;70(1):97–110.

351. Jaillet L, Corcho FJ, Perez JJ, Cortes J. Randomized tree construction

algorithm to explore energy landscapes. J Comput Chem.

2011;32(16):3464–3474.

352. Haspel N, Moll M, Baker ML, Chiu W, Kavraki LE. Tracing conformational changes

in proteins. BMC Struct Biol. 2010;10(Suppl1):S1.

353. Molloy K, Shehu A. Elucidating the Ensemble of Functionally-relevant

Transitions in Protein Systems with a Robotics-inspired Method. BMC Struct

Biol. 2013;13(Suppl 1):S8.

354. Molloy K, Clausen R, Shehu A. A Stochastic Roadmap Method to Model

Protein Structural Transitions. Robotica. 2014;In press.

355. Dill KA, MacCallum JL. The Protein-Folding Problem, 50 Years On. Science.

2012;338(6110):1042–1046.

356. Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opinion Struct Biol.

2004;14:70–75.

357. Best RB. Atomistic molecular simulations of protein folding. Curr Opinion

Struct Biol. 2012;22(1):52–61.

358. Shaw DE, et al. Millisecond-scale molecular dynamics simulations on Anton. In:

Conf on High Performance Computing, Networking, Storage and Analysis

(SC09). New York, NY: ACM; 2009. p. 39.

359. Hess B, Kutzner C, Van der Spoel D, Lindahl E. GROMACS 4: algorithms for

highly efficient, load-balanced, and scalable molecular simulation. J Chem

Theory Comput. 2008;4(3):435–447.

360. Case DA, Darden TA, Cheatham TE III, Simmerling CL, Wang J, Duke RE, et al.

AMBER 14. University of California, San Francisco; 2014.

361. Shirts M, Pande VS. COMPUTING: Screen Savers of the World Unite! Science.

2000;290(5498):1903–1904.

362. Snow CD, Zagrovic B, Pande VS. The Trp-cage: folding kinetics and unfolded

state topology via molecular dynamics simulations. J Am Chem Soc.

2002;124(49):14548–14549.

363. Singhal N, Snow CD, Pande VS. Using path sampling to build better Markovian

state models: Predicting the folding rate and mechanism of a tryptophan zipper

beta hairpin. J Chem Phys. 2004;121(1):415–425.

364. Jayachandran G, Vishal V, Pande VS. Using massively parallel simulation and

Markovian models to study protein folding: Examining the dynamics of the

villin headpiece. J Chem Phys. 2006;124(16):164902–164914.

365. Seibert MM, Patriksson AP, Hess B, van der Spoel D. Reproducible Polypeptide

Folding and Structure Prediction using Molecular Dynamics Simulations. J Mol

Biol. 2005;354(1):173–183.

366. Sosnick TR, Hinshaw JR. How proteins fold. Science. 2011;334(6055):464–465.

367. Stigler J, Ziegler F, Gieseke A, Gebhardt JC, Rief M. The complex folding

network of single calmodulin molecules. Science. 2011;334(6055):512–516.

368. Best RB, Hummer G, Eaton WA. Native contacts determine protein folding

mechanisms in atomistic simulations. Proc Natl Acad Sci USA.

2013;110(44):17874–17879.

369. Maity H, Maity M, Krishna MG, Mayne L, Englander SW. Protein folding: the

stepwise assembly of foldon units. Proc Natl Acad Sci USA.

2005;102(13):4741–4746.

370. Bai Y, Sosnick TR, Mayne L, Englander SW. Protein folding intermediates:

native state hydrogen exchange. Science. 1995;269(5221):192–197.

371. Walters BT, Mayne L, Hinshaw JR, Sosnick TR, Englander SW. Folding of a

large protein at high structural resolution. Proc Natl Acad Sci USA.

2013;110(47):18898–18903.

372. Beauchamp KA, Ensign DL, Das R, Pande VS. Quantitative comparison of

villin headpiece subdomain simulations and triplet–triplet energy transfer

experiments. Proc Natl Acad Sci USA. 2011;108(31):12734–12739.

373. Pande VS, Beauchamp K, Bowman GR. Everything you wanted to know about

Markov state models but were afraid to ask. Methods. 2010;52(1):99–105.

374. Prinz JH, Wu H, Sarich M, Keller B, Senne M, Held M, et al. Markov models of

molecular kinetics: generation and validation. J Chem Phys.

2011;134(17):174105.

375. Da LT, Sheong FK, Silva DA, Huang X. Application of Markov State Models to

simulate long timescale dynamics of biological macromolecules. Adv Exp Med

Biol. 2014;805:29–66.

376. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical

assessment of methods of protein structure prediction (CASP) – round x.

Proteins: Struct Funct Bioinf. 2014;82(S2):1–6.

377. Söding J, Biegert A, Lupas AN. The HHpred interactive server for protein

homology detection and structure prediction. Nucleic Acids Res. 2005;33(Web

Server Issue):W244.

378. Ko J, Park H, Seok C. GalaxyTBM: template-based modeling by building a

reliable core and refining unreliable local regions. BMC Bioinf.

2012;13(1):198–206.

379. Han KF, Baker D. Global properties of the mapping between local amino acid

sequence and local structure in proteins. Proc Natl Acad Sci USA.

1996;93(12):5814–5818.

380. Zhang Y. Progress and Challenges in protein structure prediction. Curr Opinion

Struct Biol. 2008;18(3):342–348.

381. Xu J, Zhang Y. How significant is a protein structure similarity with TM-score

= 0.5? Bioinformatics. 2010;26(7):889–895.

382. Zhang Y, Skolnick J. Scoring function for automated assessment of protein

structure template quality. Proteins: Struct Funct Bioinf.

2004;57(4):702–710.

383. Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated

protein structure and function prediction. Nat Protoc. 2010;5(4):725–738.

384. DeBartolo J, Colubri A, Jha AK, Fitzgerald JE, Freed KF, Sosnick TR.

Mimicking the folding pathway to improve homology-free protein structure

prediction. Proc Natl Acad Sci USA. 2009;106(10):3734–3739.

385. Simoncini D, Berenger F, Shrestha R, Zhang KYJ. A Probabilistic

Fragment-Based Protein Structure Prediction Algorithm. PLoS ONE.

2012;7(7):e38799.

386. Brunette TJ, Brock O. Guiding conformation space search with an all-atom

energy potential. Proteins: Struct Funct Bioinf. 2009;73(4):958–972.

387. Shehu A, Olson B. Guiding the Search for Native-like Protein Conformations

with an Ab-initio Tree-based Exploration. Int J Robot Res.

2010;29(8):1106–1127.

388. Olson B, Shehu A. Evolutionary-inspired probabilistic search for enhancing

sampling of local minima in the protein energy surface. Proteome Sci.

2012;10(10):S5.

389. Olson B, Hashmi I, Molloy K, Shehu A. Basin Hopping as a General and

Versatile Optimization Framework for the Characterization of Biological

Macromolecules. Advances in AI J. 2012;2012(674832).

390. Olson B, Shehu A. Rapid Sampling of Local Minima in Protein Energy Surface

and Effective Reduction through a Multi-objective Filter. Proteome Sci.

2013;11(Suppl1):S12.

391. Olson B, De Jong KA, Shehu A. Off-Lattice Protein Structure Prediction with

Homologous Crossover. In: Conf on Genetic and Evolutionary Computation

(GECCO). New York, NY: ACM; 2013. p. 287–294.

392. Olson B, Shehu A. Multi-Objective Stochastic Search for Sampling Local

Minima in the Protein Energy Surface. In: ACM Conf on Bioinf and Comp Biol

(BCB). Washington, D. C.; 2013. p. 430–439.

393. Zhou J, Yan W, Hu G, Shen B. Amino acid network for the discrimination of

native protein structures from decoys. Curr Protein Pept Sci.

2014;15(6):522–528.

394. Uversky VN. Natively unfolded proteins: a point where biology waits for

physics. Protein Sci. 2002;11:739–756.

395. Uversky VN. A decade and a half of protein intrinsic disorder: biology still

waits for physics. Protein Sci. 2013;22:693–724.

396. Monastyrskyy B, Kryshtafovych A, Moult J, Tramontano A, Fidelis K.

Assessment of protein disorder region predictions in CASP10. Proteins: Struct

Funct Bioinf. 2014;82(S2):127–137.

397. Varadi M, Kosol S, Lebrun P, Valentini E, Blackledge M, Dunker AK, et al.

pE-DB: a database of structural ensembles of intrinsically disordered and of

unfolded proteins. Nucleic Acids Res. 2014;42(Database issue):D326–D335.

398. Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, et al.

DisProt: the database of disordered proteins. Nucleic Acids Res.

2007;35(suppl 1):D786–D793.

399. Fukuchi S, Sakamoto S, Nobe Y, Murakami SD, Amemiya T, Hosoda K, et al.

IDEAL: intrinsically disordered proteins with extensive annotations and

literature. Nucleic Acids Res. 2012;40(D1):D507–D511.

400. Rösner H, Papaleo E, Haxholm GW, Best RB, Kragelund BB, Lindorff-Larsen

K. CECAM workshop on intrinsically disordered proteins: Connecting

computation, physics, and biology ETH Zürich September 2nd to 5th, 2013.

Intrinsically Disordered Proteins. 2014;p. 1–5.

401. Dunker AK, Babu MM, Barbar E, Blackledge M, Bondos SE, Dosztányi Z, et al.

What’s in a name? Why these proteins are intrinsically disordered. Intrinsically

Disordered Proteins. 2013;1(1):e24157.

402. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK,

et al. Classification of intrinsically disordered regions and proteins. Chem Rev.

2014;114(13):6589–6631.

403. Nussinov R, Wolynes PG. A second molecular biology revolution? The energy

landscapes of biomolecular function. Phys Chem Chem Phys.

2014;16(14):6321–6322.

404. Csermely P, Sandhu KS, Hazai E, Hoksza Z, Kiss HJM, Miozzo F, et al.

Disordered proteins and network disorder in network descriptions of protein

structure, dynamics and function. Hypotheses and a comprehensive review.

Curr Protein Pept Sci. 2012;13(1):19–33.

405. Uversky VN. Unusual biophysics of intrinsically disordered proteins. Biochim

Biophys Acta. 2013;1834(5):932–951.

406. Luo Y, Ma B, Nussinov R, Wei G. Structural Insight into Tau Protein’s

Paradox of Intrinsically Disordered Behavior, Self-Acetylation Activity, and

Aggregation. J Phys Chem Lett. 2014;5(17):3026–3031.

407. Campen A, Williams RM, Brown CJ, Meng J, Uversky VN, Dunker AK.

TOP-IDP-scale: a new amino acid scale measuring propensity for intrinsic

disorder. Protein Pept Lett. 2008;15:956–963.

408. Jensen MR, Zweckstetter M, Huang J, Blackledge M. Exploring Free-Energy

Landscapes of Intrinsically Disordered Proteins at Atomic Resolution Using

NMR Spectroscopy. Chem Rev. 2014;114(13):6632–6660.

409. Deng X, Eickholt J, Cheng J. A comprehensive overview of computational

protein disorder prediction methods. Mol Biosyst. 2012;8:114–121.

410. Dosztányi Z, Mészáros B, Simon I. Bioinformatical approaches to characterize

intrinsically disordered/unstructured proteins. Briefings in Bioinformatics.

2009;p. bbp061.

411. Zhou H, Pang X, Lu C. Rate constants and mechanisms of intrinsically

disordered proteins binding to structured targets. Phys Chem Chem Phys.

2012;14(30):10466–10476.

412. Zhu X, Lopes PEM, Shim J, MacKerell AD. Intrinsic energy landscapes of

amino acid side-chains. J Chem Inf Model. 2012;52(6):1559–1572.

413. Palazzesi F, Prakash MK, Bonomi M, Barducci A. Accuracy of Current

All-Atom Force-Fields in Modeling Protein Disordered States. J Chem Theory

Comput. 2015;11(1):2–7.

414. Wang RY, Han Y, Krassovsky K, Sheffler W, Tyka M, Baker D. Modeling

disordered regions in proteins using Rosetta. PLoS ONE. 2011;6(7):e22060.

415. Jensen MR, Blackledge M. Testing the validity of ensemble descriptions of

intrinsically disordered proteins. Proc Natl Acad Sci USA.

2014;111(16):E1557–E1558.

416. Lindorff-Larsen K, Trbovic N, Maragakis P, Piana S, Shaw DE. Structure and

Dynamics of an Unfolded Protein Examined by Molecular Dynamics Simulation.

J Am Chem Soc. 2012;134(8):3787–3791.

417. Parigi G, Rezaei-Ghaleh N, Giachetti A, Becker S, Fernandez C, Blackledge M,

et al. Long-Range Correlated Dynamics in Intrinsically Disordered Proteins. J

Am Chem Soc. 2014;136(46):16201–16209.

418. Zhang W, Chen J. Replica exchange with guided annealing for accelerated

sampling of disordered protein conformations. J Comput Chem.

2014;35(23):1682–1689.

419. Fleishman SJ, Baker D. Role of the biomolecular energy gap in protein design,

structure, and evolution. Cell. 2012;149(2):262–273.

420. Donald BR. Algorithms in structural molecular biology. Cambridge, MA: MIT

Press; 2011.

421. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of

a novel globular protein fold with atomic-level accuracy. Science.

2003;302(5649):1364–1368.

422. Ashworth J, Havranek JJ, Duarte CM, Sussman D, Monnat RJ, Stoddard BL,

et al. Computational redesign of endonuclease DNA binding and cleavage

specificity. Nature. 2006;441(7093):656–659.

423. Grigoryan G, Reinke AW, Keating AE. Design of protein-interaction specificity

gives selective bZIP-binding peptides. Nature. 2009;458(7240):859–864.

424. Havranek JJ, Duarte CM, Baker D. A simple physical model for the prediction

and design of protein-DNA interactions. J Mol Biol. 2004;344(1):59–70.

425. Havranek JJ, Harbury PB. Automated design of specificity in molecular

recognition. Nat Struct Biol. 2003;10(1):45–52.

426. Fleishman SJ, Khare SD, Koga N, Baker D. Restricted sidechain plasticity in

the structures of native proteins and complexes. Protein Sci. 2011;20(4):753–757.

427. Fleishman SJ, et al. Community-wide assessment of protein-interface modeling

suggests improvements to design methodology. J Mol Biol. 2011;414(2):289–302.

428. Jha RK, Leaver-Fay A, Yin S, Wu Y, Butterfoss GL, Szyperski T, et al.

Computational design of a PAK1 binding protein. J Mol Biol.

2010;400(2):257–270.

429. Karanicolas J, Corn JE, Chen I, Joachimiak LA, Dym O, Peck SH, et al. A de

novo protein binding pair by computational design and directed evolution.

Molecular Cell. 2011;42(2):250–260.

430. Richter F, Leaver-Fay A, Khare SD, Bjelic S, Baker D. De novo enzyme design

using Rosetta3. PLoS ONE. 2011;6(5):e19230.

431. Pabo C. Molecular technology. Designing proteins and peptides. Nature.

1983;301(5897):200–200.

432. Janin J. Conformation of amino acid sidechains in proteins. J Mol Biol.

1978;125(3):357–386.

433. Kuhlman B, Baker D. Native protein sequences are close to optimal for their

structures. Proc Natl Acad Sci USA. 2000;97(19):10383–10388.

434. Dunbrack R. Rotamer libraries in the 21st century. Curr Opinion Struct Biol.

2002;12(4):431–440.

435. Dunbrack R, Cohen FE. Bayesian statistical analysis of protein side-chain

rotamer preferences. Protein Sci. 1997;6(8):1661–1681.

436. Dunbrack R, Karplus M. Backbone-dependent rotamer library for proteins.

Application to side-chain prediction. J Mol Biol. 1993;230(2):543–574.

437. Poole AM, Ranganathan R. Knowledge-based potentials in protein design. Curr

Opinion Struct Biol. 2006;16(4):508–513.

438. Pierce NA, Winfree E. Protein Design is NP-hard. Protein Eng Des Sel.

2002;15(10):779–782.

439. Desmet J, de Maeyer M, Hazes B, Lasters I. The dead-end elimination theorem

and its use in protein side-chain positioning. Nature. 1992;356:539–542.

440. Gordon DB, Mayo SL. Branch-and-terminate: a combinatorial optimization

algorithm for protein design. Structure. 1999;7(9):1089–1098.

441. Hong EJ, Lippow SM, Tidor B, Lozano-Pérez T. Rotamer optimization for protein design

through MAP estimation and problem-size reduction. J Comput Chem.

2009;30(12):1923–1945.

442. Wernisch L, Hery S, Wodak SJ. Automatic protein design with all atom

force-fields by exact and heuristic optimization. J Mol Biol. 2000;301(3):713–736.

443. Althaus E, Kohlbacher O, Lenhof HP, Müller P. A combinatorial approach to

protein docking with flexible side chains. J Comp Biol. 2002;9(4):597–612.

444. Kingsford CL, Chazelle B, Singh M. Solving and analyzing side-chain

positioning problems using linear and integer programming. Bioinformatics.

2005;21(7):1028–1039.

445. Leaver-Fay A, Kuhlman B, Snoeyink J. An adaptive dynamic programming

algorithm for the side chain placement problem. In: Pac Symp Biocomput; 2005.

p. 16–27.

446. Traoré S, Allouche D, André I, de Givry S, Katsirelos G, Schiex T, et al. A new

framework for computational protein design through cost function network

optimization. Bioinformatics. 2013;29(17):2129–2136.

447. Li Z, Yang Y, Zhan J, Dai L, Zhou Y. Energy Functions in De Novo Protein

Design: Current Challenges and Future Prospects. Annu Rev Biophys.

2013;42:315–335.

448. Arnold FH. Combinatorial and computational challenges for biocatalyst design.

Nature. 2001;409(6817):253–257.

449. Gainza P, Roberts KE, Donald BR. Protein Design Using Continuous Rotamers.

PLOS Comput Biol. 2012;8:1.

450. Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, et al.

OSPREY: protein design with ensembles, flexibility, and provable algorithms.

Methods Enzymol. 2013;523:87.

451. Shapovalov MV, Dunbrack RL. A smoothed backbone-dependent rotamer

library for proteins derived from adaptive kernel density estimates and

regressions. Structure. 2011;19(6):844–858.

452. Reeve SM, Gainza P, Frey KM, Georgiev I, Donald BR, Anderson AC. Protein

design algorithms predict viable resistance to an experimental antifolate. Proc

Natl Acad Sci USA. 2015;112(3):749–754.

453. Voigt CA, Gordon DB, Mayo SL. Trading accuracy for speed: A quantitative

comparison of search algorithms in protein sequence design. J Mol Biol.

2000;299(3):789–803.

454. Desjarlais JR, Handel TM. De novo design of the hydrophobic cores of proteins.

Protein Sci. 1995;4(10):2006–2018.

455. Raha K, Wollacott AM, Italia MJ, Desjarlais JR. Prediction of amino acid

sequence from structure. Protein Sci. 2000;9(6):1106–1119.

456. Allen BD, Mayo SL. Dramatic performance enhancements for the FASTER

optimization algorithm. J Comput Chem. 2006;27(10):1071–1075.

457. Desmet J, Spriet J, Lasters I. Fast and accurate side-chain topology and energy

refinement (FASTER) as a new method for protein structure optimization.

Proteins: Struct Funct Bioinf. 2002;48(1):31–43.

458. Liu Y, Kuhlman B. RosettaDesign Server for protein design. Nucleic Acids Res.

2006;34(Web Server Issue):W235–W238.

459. Canutescu AA, Shelenkov AA, Dunbrack Jr RL. A graph-theory algorithm for

rapid protein side chain prediction. Protein Sci. 2003;12(9):2001–2014.

460. Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D. Progress in

modeling of protein structures and interactions. Science.

2005;310(5748):638–642.

461. Skolnick J. In quest of an empirical potential for protein structure prediction.

Curr Opinion Struct Biol. 2006;16(2):166–171.

462. Humphris EL, Kortemme T. Prediction of protein-protein interface sequence

diversity using flexible backbone computational protein design. Structure.

2008;16(12):1777–1788.

463. Smith CA, Kortemme T. Backrub-like backbone simulation recapitulates

natural protein conformational variability and improves mutant side-chain

prediction. J Mol Biol. 2008;380(4):742–756.

464. Friedland GD, Linares AJ, Smith CA, Kortemme T. A simple model of

backbone flexibility improves modeling of side-chain conformational variability.

J Mol Biol. 2008;380(4):757–774.

465. Smith CA, Kortemme T. Predicting the tolerated sequences for proteins and

protein interfaces using RosettaBackrub flexible backbone design. PLoS One.

2011;6(7):e20451.

466. Canutescu AA, Dunbrack RL. Cyclic Coordinate Descent: A Robotics

Algorithm for Protein Loop Closure. Protein Sci. 2003;12(5):963–972.

467. Georgiev I, Keedy D, Richardson JS, Richardson DC, Donald BR. Algorithm for

backrub motions in protein design. Bioinformatics. 2008;24(13):i196–204.

468. Keedy DA, Georgiev I, Triplett EB, Donald BR, Richardson DC, Richardson JS.

The role of local backrub motions in evolved and designed mutations. PLoS

Comp Biol. 2012;8(8):e1002629.

469. Murphy GS, Mills JL, Miley MJ, Machius M, Szyperski T, Kuhlman B.

Increasing sequence diversity with flexible backbone protein design: the complete

redesign of a protein hydrophobic core. Structure. 2012;20(6):1086–1096.

470. Ollikainen N, Kortemme T. Computational protein design quantifies structural

constraints on amino acid covariation. PLoS Comp Biol. 2013;9(11):e1003313.

471. Jana B, Morcos F, Onuchic JN. From structure to function: the convergence of

structure based models and co-evolutionary information. Phys Chem Chem

Phys. 2014;16(14):6496–6507.

472. Sandler I, Zigdon N, Levy E, Aharoni A. The functional importance of

co-evolving residues in proteins. Cell Mol Life Sci. 2014;71(4):673–682.

473. Kajan L, Hopf TA, Kalaus M, Marks DS, Rost B. FreeContact: fast and free

software for protein contact prediction from residue co-evolution. BMC Bioinf.

2014;15:85.

474. Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of

residue-residue interactions across protein interfaces using evolutionary

information. eLife. 2014;3:e02030.

475. Kosciolek T, Jones DT. De Novo Structure Prediction of Globular Proteins

Aided by Sequence Variation-Derived Contacts. PLoS ONE. 2014;9(3):e92197.

476. Huang H, Ozkirimli E, Post CB. A comparison of three perturbation molecular

dynamics methods for modeling conformational transitions. J Chem Theory

Comput. 2009;5(5):1301–1314.

477. Malek R, Mousseau N. Dynamics of Lennard-Jones clusters: A characterization

of the activation-relaxation technique. Phys Rev E. 2000;62(6):7723–7728.

478. Earl DJ, Deem MW. Parallel tempering: theory, applications, and new

perspectives. Phys Chem Chem Phys. 2005;7:3910–3916.

479. Arora K, Brooks CL III. Large-scale allosteric conformational transitions of

adenylate kinase appear to involve a population-shift mechanism. Proc Natl

Acad Sci USA. 2007;104(47):18496–18501.

480. Zhang Y, Kihara D, Skolnick J. Local energy landscape flattening: parallel

hyperbolic Monte Carlo sampling of protein folding. Proteins: Struct Funct

Bioinf. 2002;48(2):192–201.

481. Huber T, Torda AE, van Gunsteren WF. Local elevation: a method for

improving the searching properties of molecular dynamics simulation. J Comput

Aided Mol Design. 1994;8(6):695–708.

482. Schulze BG, Grubmueller H, Evanseck JD. Functional significance of

hierarchical tiers in carbonmonoxy myoglobin: conformational substates and

transitions studied by conformational flooding simulations. J Am Chem Soc.

2000;122(36):8700–8711.

483. Krueger P, Verheyden S, Declerck PJ, Engelborghs Y. Extending the

capabilities of targeted molecular dynamics: simulation of a large conformational

transition in plasminogen activator inhibitor 1. Protein Sci. 2001;10(4):798–808.

484. Schlitter J, Engels M, Krueger P. Targeted molecular dynamics - a new

approach for searching pathways of conformational transitions. Proteins: Struct

Funct Bioinf. 1994;12(2):84–89.

485. Mashl RJ, Jakobsson E. End-point targeted molecular dynamics: large-scale

conformational changes in potassium channels. Biophys J.

2008;94(11):4307–4319.

486. van der Vaart A, Karplus M. Minimum free energy pathways and free energy

profiles for conformational transitions based on atomistic molecular dynamics

simulations. J Chem Phys. 2007;126:164106.

487. Ding F, Tsao D, Nie H, Dokholyan NV. Ab initio folding of proteins with

all-atom discrete molecular dynamics. Structure. 2008;16(7):1010–1018.

488. Pan AC, Sezer D, Roux B. Finding transition pathways using the string method

with swarms of trajectories. J Phys Chem B. 2008;112(11):3432–3440.

489. Noé F, Doose S, Daidone I, Löllmann M, Sauer M, Chodera JD, et al.

Dynamical fingerprints for probing individual relaxation processes in

biomolecular dynamics with simulations and kinetic experiments. Proc Natl

Acad Sci USA. 2011;108(12):4822–4827.

490. Sim AYL, Minary P, Levitt M. Modeling nucleic acids. Curr Opinion Struct

Biol. 2012;22(3):273–278.

491. Schneidman-Duhovny D, Pellarin R, Sali A. Uncertainty in integrative

structural modeling. Curr Opinion Struct Biol. 2014;28:96–104.

492. Rohrdanz MA, Zheng W, Clementi C. Discovering mountain passes via

torchlight: methods for the definition of reaction coordinates and pathways in

complex macromolecular reactions. Annu Rev Phys Chem.

2013;64:295–316.

493. Kalyaanamoorthy S, Chen YPP. Modelling and enhanced molecular dynamics

to steer structure-based drug discovery. Prog Biophys Mol Biol.

2014;114(3):123–136.

494. Sponer J, Banas P, Jurecka P, Zgarbova M, Kuhrova P, Havrila M, et al.

Molecular Dynamics Simulations of Nucleic Acids. From Tetranucleotides to the

Ribosome. J Phys Chem Lett. 2014;5(10):1771–1782.

495. Biedermann J, Ullrich A, Schöneberg J, Noé F. ReaDDyMM: Fast Interacting

Particle Reaction-Diffusion Simulations Using Graphical Processing Units.

Biophys J. 2015;108(3):457–461.

496. Hamelberg D, Mongan J, McCammon JA. Accelerated molecular dynamics: a

promising and efficient simulation method for biomolecules. J Chem Phys.

2004;120(24):11919–11929.

497. Wang Y, Harrison CB, Schulten K, McCammon JA. Implementation of

accelerated molecular dynamics in NAMD. Computational science & discovery.

2011;4(1):015002.

498. Pierce LC, Salomon-Ferrer R, de Oliveira CAF, McCammon JA, Walker

RC. Routine access to millisecond time scale events with accelerated molecular

dynamics. J Chem Theory Comput. 2012;8(9):2997–3002.

499. Miao Y, Nichols SE, Gasper PM, Metzger VT, McCammon JA. Activation and

dynamic network of the M2 muscarinic receptor. Proc Natl Acad Sci USA.

2013;110(27):10982–10987.

500. Miao Y, Nichols SE, McCammon JA. Mapping of Allosteric Druggable Sites in

Activation-Associated Conformers of the M2 Muscarinic Receptor. Chem Biol &

Drug Design. 2014;83(2):237–246.

501. Sinko W, Miao Y, de Oliveira CAF, McCammon JA. Population Based

Reweighting of Scaled Molecular Dynamics. J Phys Chem B.

2013;117(42):12759–12768.

502. Tribello GA, Ceriotti M, Parrinello M. A self-learning algorithm for biased

molecular dynamics. Proc Natl Acad Sci USA. 2010;107(41):17509–17514.

503. Swendsen RH, Wang JS. Replica Monte Carlo simulation of spin glasses. Phys

Rev Lett. 1986;57:2607–2609.

504. Sugita Y, Okamoto Y. Replica-exchange molecular dynamics method for protein

folding. Chem Phys Lett. 1999;314(1):141–151.

505. Wang L, Friesner RA, Berne B. Replica exchange with solute scaling: A more

efficient version of replica exchange with solute tempering (REST2). J Phys

Chem B. 2011;115(30):9431–9438.

506. van der Spoel D, Seibert MM. Protein Folding Kinetics and Thermodynamics

from Atomistic Simulations. Phys Rev Lett. 2006;96(23):238102.

507. Hess B, Scheek RM. Orientation restraints in molecular dynamics simulations

using time and ensemble averaging. J Magn Reson. 2003;164(1):19–27.

508. De Simone A, Richter B, Salvatella X, Vendruscolo M. Toward an Accurate

Determination of Free Energy Landscapes in Solution States of Proteins. J Am

Chem Soc. 2009;131(11):3810–3811.

509. De Simone A, Montalvao RW, Vendruscolo M. Determination of Conformational

Equilibria in Proteins Using Residual Dipolar Couplings. J Chem Theory

Comput. 2011;7(12):4189–4195.

510. Allison JR, Hertig S, Missimer JH, Smith LJ, Steinmetz MO, Dolenc J. Probing

the Structure and Dynamics of Proteins by Combining Molecular Dynamics

Simulations and Experimental NMR Data. J Chem Theory Comput.

2012;8(10):3430–3444.

511. Markwick PRL, Nilges M. Computational approaches to the interpretation of

NMR data for studying protein dynamics. Chem Phys. 2012;396(2):124–134.

512. Salmon L, Pierce L, Grimm A, Roldan JO, Mollica L, Jensen MR, et al.

Multi-Timescale Conformational Dynamics of the SH3 Domain of

CD2-Associated Protein using NMR Spectroscopy and Accelerated Molecular

Dynamics. Angew Chem Int Ed Engl. 2012;51(25):6103–6106.

513. Jaynes ET. Information Theory and Statistical Mechanics. Phys Rev.

1957;106(4):620–630.

514. Roux B, Weare J. On the statistical equivalence of restrained-ensemble

simulations with the maximum entropy method. J Chem Phys.

2013;138(8):084107.

515. Fu B, Sahakyan AB, Camilloni C, Tartaglia GG, Paci E, Caflisch A, et al.

ALMOST: An all atom molecular simulation toolkit for protein structure

determination. J Comput Chem. 2014;35(14):1101–1105.

516. Camilloni C, Cavalli A, Vendruscolo M. Replica-Averaged Metadynamics. J

Chem Theory Comput. 2013;9(12):5610–5617.

517. Torrie GM, Valleau JP. Monte Carlo free energy estimates using non-Boltzmann

sampling: application to the sub-critical Lennard-Jones fluid. Chem Phys Lett.

1974;28(4):578–581.

518. Roux B. The calculation of the potential of mean force using computer

simulations. Computer Physics Communications. 1995;91(1):275–282.

519. Bartels C, Karplus M. Multidimensional adaptive umbrella sampling:

applications to main chain and side chain peptide conformations. J Comput

Chem. 1997;18(12):1450–1462.

520. Kumar S, Rosenberg JM, Bouzida D, Swendsen RH, Kollman PA. The weighted

histogram analysis method for free-energy calculations on biomolecules. I. The

method. J Comput Chem. 1992;13(8):1011–1021.

521. Zhu F, Hummer G. Convergence and error estimation in free energy calculations

using the weighted histogram analysis method. J Comput Chem.

2012;33(4):453–465.

522. Hub JS, de Groot BL, van der Spoel D. g_wham - A Free Weighted Histogram

Analysis Implementation Including Robust Error and Autocorrelation Estimates.

J Chem Theory Comput. 2010;6(12):3713–3720.

523. Wojtas-Niziurski W, Meng Y, Roux B, Berneche S. Self-learning adaptive

umbrella sampling method for the determination of free energy landscapes in

multiple dimensions. J Chem Theory Comput. 2013;9(4):1885–1895.

524. Snyder R, Wang B, Roark M, Feller SE. Replica Exchange Umbrella Sampling

Simulations Provide Insight into the Role of Docosahexaenoic Acid in

Modulating the Stability of Transmembrane Proteins. Biophys J.

2014;106(2):16a.

525. Krivov SV, Karplus M. Hidden complexity of free energy surfaces for peptide

(protein) folding. Proc Natl Acad Sci USA. 2004;101(41):14766–14770.

526. Zheng W, Rohrdanz MA, Maggioni M, Clementi C. Polymer reversal rate

calculated via locally scaled diffusion map. J Chem Phys. 2011;134(14):144109.

527. Rohrdanz MA, Zheng W, Maggioni M, Clementi C. Determination of reaction

coordinates via locally scaled diffusion map. J Chem Phys. 2011;134(12):124116.

528. Zheng W, Rohrdanz MA, Clementi C. Rapid Exploration of Configuration

Space with Diffusion-Map-Directed Molecular Dynamics. J Phys Chem B.

2013;117(42):12769–12776.

529. Preto J, Clementi C. Fast recovery of free energy landscapes via

diffusion-map-directed molecular dynamics. Phys Chem Chem Phys.

2014;16(36):19181–19191.

530. Becker OM, Karplus M. The topology of multidimensional potential energy

surfaces: Theory and application to peptide structure and kinetics. J Chem

Phys. 1997;106(4):1495–1517.

531. Doye J, Miller M, Wales D. Evolution of the Potential Energy Surface with Size

for Lennard-Jones Clusters. J Chem Phys. 1999;111(18):8417–8428.

532. Krivov SV, Karplus M. Free energy disconnectivity graphs: Application to

peptide models. J Chem Phys. 2002;117(23):10894–10903.

533. Rao F, Caflisch A. The protein folding network. J Mol Biol.

2004;342(1):299–306.

534. Muff S, Caflisch A. Kinetic analysis of molecular dynamics simulations reveals

changes in the denatured state and switch of folding pathways upon single-point

mutation of a β-sheet miniprotein. Proteins: Struct Funct Bioinf.

2008;70(4):1185–1195.

535. Caflisch A. Network and graph analyses of folding free energy surfaces. Curr

Opinion Struct Biol. 2006;16(1):71–78.

536. Krivov SV, Karplus M. Diffusive reaction dynamics on invariant free energy

profiles. Proc Natl Acad Sci USA. 2008;105(37):13841–13846.

537. Zhou R. Free energy landscape of protein folding in water: explicit vs. implicit

solvent. Proteins: Struct Funct Bioinf. 2003;53(2):148–161.

538. Barron LD, Hecht L, Wilson G. The lubricant of life: A proposal that solvent

water promotes extremely fast conformational fluctuations in mobile

heteropolypeptide structure. Biochemistry. 1997;36(43):13143–13147.

539. Singhal N, Pande VS. Error analysis and efficient sampling in Markovian state

models for molecular dynamics. J Chem Phys. 2005;123(20):204909.

540. Noé F, Fischer S. Transition networks for modeling the kinetics of

conformational change in macromolecules. Curr Opinion Struct Biol.

2008;18:154–162.

541. Pérez-Hernández G, Paul F, Giorgino T, De Fabritiis G, Noé F. Identification of

slow molecular order parameters for Markov model construction. J Chem Phys.

2013;139(1):015102.

542. Piana S, Lindorff-Larsen K, Shaw DE. Atomic-level description of ubiquitin

folding. Proc Natl Acad Sci USA. 2013;110(15):5915–5920.

543. Weber JK, Jack RL, Pande VS. Emergence of glass-like behavior in Markov

state models of protein folding dynamics. J Amer Chem Soc.

2013;135(15):5501–5504.

544. Deng Nj, Dai W, Levy RM. How kinetics within the unfolded state affects

protein folding: An analysis based on Markov state models and an ultra-long

MD trajectory. J Phys Chem B. 2013;117(42):12787–12799.

545. Voelz VA, Jäger M, Yao S, Chen Y, Zhu L, Waldauer SA, et al. Slow

unfolded-state structuring in Acyl-CoA binding protein folding revealed by

simulation and experiment. J Amer Chem Soc. 2012;134(30):12565–12577.

546. Weber M, Bujotzek A, Haag R. Quantifying the rebinding effect in multivalent

chemical ligand-receptor systems. J Chem Phys. 2012;137(5):054111.

547. Shukla D, Meng Y, Roux B, Pande VS. Activation pathway of Src kinase reveals

intermediate states as targets for drug design. Nature Communications. 2014;5.

548. Kohlhoff KJ, Shukla D, Lawrenz M, Bowman GR, Konerding DE, Belov D,

et al. Cloud-based simulations on Google Exacycle reveal ligand modulation of

GPCR activation pathways. Nature Chem. 2014;6(1):15–21.

549. Bowman GR, Geissler PL. Equilibrium fluctuations of a single folded protein

reveal a multitude of potential cryptic allosteric sites. Proc Natl Acad Sci USA.

2012;109(29):11681–11686.

550. Lin YS, Bowman GR, Beauchamp KA, Pande VS. Investigating how peptide

length and a pathogenic mutation modify the structural ensemble of amyloid

beta monomer. Biophys J. 2012;102(2):315–324.

551. Qiao Q, Bowman GR, Huang X. Dynamics of an intrinsically disordered protein

reveal metastable conformations that potentially seed aggregation. J Amer

Chem Soc. 2013;135(43):16092–16101.

552. Du WN, Bolhuis PG. Adaptive single replica multiple state transition interface

sampling. J Chem Phys. 2013;139(4):044105.

553. Noé F. Beating the millisecond barrier in molecular dynamics simulations.

Biophys J. 2015;108:228–229.

554. Laio A, Rodriguez-Fortea A, Gervasio FL, Ceccarelli M, Parrinello M. Assessing

the accuracy of metadynamics. J Phys Chem B. 2005;109(14):6714–6721.

555. Barducci A, Bonomi M, Parrinello M. Metadynamics. Wiley Interdisciplinary

Reviews: Computational Molecular Science. 2011;1(5):826–843.

556. Bonomi M, Branduardi D, Bussi G, Camilloni C, Provasi D, Raiteri P, et al.

PLUMED: a portable plugin for free-energy calculations with molecular

dynamics. Comput Phys Communications. 2009;180(10):1961–1972.

557. Bonomi M, Branduardi D, Gervasio FL, Parrinello M. The unfolded ensemble

and folding mechanism of the C-terminal GB1 β-hairpin. J Am Chem Soc.

2008;130(42):13938–13944.

558. Piana S, Laio A, Marinelli F, Van Troys M, Bourry D, Ampe C, et al.

Predicting the effect of a point mutation on a protein fold: the villin and advillin

headpieces and their Pro62Ala mutants. J Mol Biol. 2008;375(2):460–470.

559. Berteotti A, Cavalli A, Branduardi D, Gervasio FL, Recanatini M, Parrinello M.

Protein conformational transitions: the closure mechanism of a kinase explored

by atomistic simulations. J Am Chem Soc. 2008;131(1):244–250.

560. Melis C, Bussi G, Lummis SC, Molteni C. Trans-cis Switching Mechanisms in

Proline Analogues and Their Relevance for the Gating of the 5-HT3 Receptor. J

Phys Chem B. 2009;113(35):12148–12153.

561. Prakash MK, Barducci A, Parrinello M. Probing the mechanism of pH-induced

large-scale conformational changes in dengue virus envelope protein using

atomistic simulations. Biophys J. 2010;99(2):588–594.

562. Bocahut A, Bernad S, Sebban P, Sacquin-Mora S. Relating the diffusion of

small ligands in human neuroglobin to its structural and mechanical properties.

J Phys Chem B. 2009;113(50):16257–16267.

563. Nishihara Y, Hayashi S, Kato S. A search for ligand diffusion pathway in

myoglobin using a metadynamics simulation. Chem Phys Lett.

2008;464(4):220–225.

564. Provasi D, Bortolato A, Filizola M. Exploring molecular mechanisms of ligand

recognition by opioid receptors with metadynamics. Biochemistry.

2009;48(42):10020–10029.

565. Limongelli V, Bonomi M, Marinelli L, Gervasio FL, Cavalli A, Novellino E, et al.

Molecular basis of cyclooxygenase enzymes (COXs) selective inhibition. Proc

Natl Acad Sci USA. 2010;107(12):5411–5416.

566. Masetti M, Cavalli A, Recanatini M, Gervasio FL. Exploring Complex

Protein-Ligand Recognition Mechanisms with Coarse Metadynamics. J Phys Chem B.

2009;113(14):4807–4816.

567. Cavalli A, Spitaleri A, Saladino G, Gervasio FL. Investigating Drug–Target

Association and Dissociation Mechanisms Using Metadynamics-Based

Algorithms. Accounts Chem Res. 2014;.

568. Gur M, Madura JD, Bahar I. Global transitions of proteins explored by a

multiscale hybrid methodology: application to adenylate kinase. Biophys J.

2013;105(7):1643–1652.

569. Atilgan A, Durell S, Jernigan R, Demirel M, Keskin O, Bahar I. Anisotropy of

fluctuation dynamics of proteins with an elastic network model. Biophys J.

2001;80(1):505–515.

570. Das A, Gur M, Cheng MH, Jo S, Bahar I, Roux B. Exploring the

Conformational Transitions of Biomolecular Systems Using a Simple Two-State

Anisotropic Network Model. PLoS Comput Biol. 2014;10(4):e1003521.

571. Baron R. Fast Sampling of A-to-B Protein Global Conformational Transitions:

From Galileo Galilei to Monte Carlo Anisotropic Network Modeling. Biophys J.

2013;105(7):1545–1546.

572. Suarez E, Lettieri S, Zwier MC, Stringer CA, Subramanian SR, Chong LT, et al.

Simultaneous computation of dynamical and equilibrium information using a

weighted ensemble of trajectories. J Chem Theory Comput.

2014;10(7):2658–2667.

573. Rojnuckarin A, Kim S, Subramaniam S. Brownian dynamics simulations of

protein folding: access to milliseconds time scale and beyond. Proc Natl Acad

Sci USA. 1998;95(8):4288–4292.

574. Bhatt D, Zuckerman DM. Beyond microscopic reversibility: Are observable

nonequilibrium processes precisely reversible? J Chem Theory Comput.

2011;7(8):2520–2527.

575. Bhatt D, Zuckerman DM. Heterogeneous path ensembles for conformational

transitions in semiatomistic models of adenylate kinase. J Chem Theory

Comput. 2010;6(11):3527–3539.

576. Echols N, Milburn D, Gerstein M. MolMovDB: analysis and visualization of

conformational change and structural flexibility. Nucleic Acids Res.

2003;31(1):478–482.

577. Flores S, Echols N, Milburn D, Hespenheide B, Keating K, Lu J, et al. The

Database of Macromolecular Motions: new features added at the decade mark.

Nucleic Acids Res. 2006;34(suppl 1):D296–D301.

578. Cecchini M, Houdusse A, Karplus M. Allosteric communication in myosin V:

from small conformational changes to large directed movements. PLoS Comput

Biol. 2008;4(8):e1000129.

579. Zhu F, Hummer G. Gating transition of pentameric ligand-gated ion channels.

Biophys J. 2009;97(9):2456–2463.

580. Zimmermann MT, Kloczkowski A, Jernigan RL. MAVENs: motion analysis and

visualization of elastic networks and structural ensembles. BMC Bioinf.

2011;12(1):264.

581. Go N, Scheraga H. Analysis of contribution of internal vibrations to statistical

weights of equilibrium conformations of macromolecules. J Chem Phys.

1969;51(11):4751–4767.

582. Go N, Scheraga H. On the use of classical statistical-mechanics in treatment of

polymer-chain conformation. Macromolecules. 1976;9(4):535–542.

583. Flory PJ. Statistical thermodynamics of random networks. Proc Royal Soc.

1976;351(1666):351–380.

584. Tirion MM. Large amplitude elastic motions in proteins from a single

parameter, atomic analysis. Phys Rev Lett. 1996;77(9):1905–1908.

585. Bahar I, Atilgan A, Erman B. Direct evaluation of thermal fluctuations in

proteins using a single-parameter harmonic potential. Fold Des.

1997;2(3):173–181.

586. Micheletti C, Seno F, Banavar JR, Maritan A. Learning effective amino acid

interactions through iterative stochastic techniques. Proteins.

2001;42(3):422–431.

587. Halle B. Flexibility and packing in proteins. Proc Natl Acad Sci USA.

2002;99(3):1274–1279.

588. Haliloglu T, Bahar I, Erman B. Gaussian dynamics of folded proteins. Phys Rev

Lett. 1997;79(16):3090–3093.

589. Kundu S, Melton JS, Sorensen DC, Phillips GN. Dynamics of proteins in

crystals: comparison of experiment with simple models. Biophys J.

2002;83(2):723–732.

590. Li G, Cui Q. Analysis of functional motions in Brownian molecular machines

with an efficient block normal mode approach: myosin-II and Ca2+-ATPase.

Biophys J. 2004;86(2):743–763.

591. Ming D, Kong Y, Lambert MA, Huang Z, Ma J. How to describe protein

motion without amino acid sequence and atomic coordinates. Proc Natl Acad

Sci USA. 2002;99(13):8620–8625.

592. Delarue M, Sanejouand YH. Simplified normal mode analysis of conformational

transitions in DNA-dependent polymerases: the elastic network model. J Mol

Biol. 2002;320(5):1011–1024.

593. Tama F, Valle M, Frank J, Brooks CL. Dynamic reorganization of the

functionally active ribosome explored by normal mode analysis and cryo-electron

microscopy. Proc Natl Acad Sci USA. 2003;100(16):9319–9323.

594. Reuter N, Hinsen K, Lacapère JJ. Transconformations of the SERCA1

Ca-ATPase: a normal mode study. Biophys J. 2003;85(4):2186–2197.

595. Xu C, Tobi D, Bahar I. Allosteric changes in protein structure computed by a

simple mechanical model: hemoglobin T–R2 transition. J Mol Biol.

2003;333(1):153–158.

596. Tama F, Sanejouand YH. Conformational change of proteins arising from

normal mode calculations. Protein Eng. 2001;14(1):1–6.

597. Zheng W, Doniach S. A comparative study of motor-protein motions by using a

simple elastic-network model. Proc Natl Acad Sci USA.

2003;100(23):13253–13258.

598. Ikeguchi M, Ueno J, Sato M, Kidera A. Protein structural change upon ligand

binding: linear response theory. Phys Rev Lett. 2005;94(7):078102.

599. Kim MK, Chirikjian GS, Jernigan RL. Elastic models of conformational

transitions in macromolecules. J Mol Graph Model. 2002;21(2):151–160.

600. Tama F, Brooks CL. Diversity and identity of mechanical properties of

icosahedral viral capsids studied with elastic network normal mode analysis. J

Mol Biol. 2005;345(2):299–314.

601. Tama F, Feig M, Liu J, Brooks CL, Taylor KA. The requirement for mechanical

coupling between head and s2 domains in smooth muscle Myosin ATPase

regulation and its implications for dimeric motor function. J Mol Biol.

2005;345(4):837–854.

602. Protein conformational transitions explored by mixed elastic network models. Proteins:

Struct Funct Bioinf. 2007;69(1):43–57.

603. Maragakis P, Karplus M. Large amplitude conformational change in proteins

explored with a plastic network model: adenylate kinase. J Mol Biol.

2005;352(4):807–822.

604. Miyashita O, Onuchic JN, Wolynes PG. Nonlinear elasticity, proteinquakes, and

the energy landscapes of functional transitions in proteins. Proc Natl Acad Sci

USA. 2003;100(22):12570–12575.

605. Miyashita O, Wolynes PG, Onuchic JN. Simple energy landscape model for the

kinetics of functional transitions in proteins. J Phys Chem B.

2005;109(5):1959–1969.

606. Chu JW, Voth GA. Coarse-grained free energy functions for studying protein

conformational changes: a double-well network model. Biophys J.

2007;93(11):3860–3871.

607. Fischer S, Karplus M. Conjugate peak refinement: an algorithm for finding

reaction paths and accurate transition states in systems with many degrees of

freedom. Chem Phys Lett. 1992 Jun;194(3):252–261. Available from:

http://dx.doi.org/10.1016/0009-2614(92)85543-j.

608. Weiss DR, Koehl P. Morphing Methods to Visualize Coarse-Grained Protein

Dynamics. In: Protein Dynamics. Springer; 2014. p. 271–282.

609. Seo S, Kim MK. KOSMOS: a universal morph server for nucleic acids, proteins

and their complexes. Nucleic Acids Res. 2012;40(Web Server issue):W531–W536.

610. Pratt L. A statistical method for identifying transition states in high

dimensional problems. J Chem Phys. 1986;85(9):5045–5048.

611. Dellago C, Bolhuis PG, Csajka FS, Chandler D. Transition path sampling and

the calculation of rate constants. J Chem Phys. 1998;108(5):1964–1977.

612. Woolf T. Path corrected functionals of stochastic trajectories: Towards relative

free energy and reaction coordinate calculations. Chem Phys Lett.

1998;289(5-6):433–441.

613. van Erp TS, Moroni D, Bolhuis PG. A novel path sampling method for the

calculation of rate constants. J Chem Phys. 2003;118(17):7762–7774.

614. Faradjian AK, Elber R. Computing time scales from reaction coordinates by

milestoning. J Chem Phys. 2004;120(23):10880–10889.

615. Allen RJ, Warren PB, Ten Wolde PR. Sampling rare switching events in

biochemical networks. Phys Rev Lett. 2005;94(1):018104.

616. Warmflash A, Bhimalapuram P, Dinner AR. Umbrella sampling for

nonequilibrium processes. J Chem Phys. 2007;127(15):154112.

617. Bolhuis PG, Chandler D, Dellago C, Geissler PL. Transition path sampling:

throwing ropes over mountain passes in the dark. Annu Rev Phys Chem.

2002;53:291–318.

618. Dellago C, Bolhuis PG. Transition path sampling and other advanced

simulation techniques for rare events. In: Holm C, Kremer K, editors. Advanced

Computer Simulation Approaches for Soft Matter Sciences III. vol. 221 of

Advances in Polymer Science. Springer Berlin Heidelberg; 2009. p. 167–233.

619. E W, Vanden-Eijnden E. Towards a theory of transition paths. J Stat Phys.

2006;123(3):503–523.

620. Elber R, Karplus M. A method for determining reaction paths in large

molecules: Application to myoglobin. Chem Phys Lett. 1987;139(5):375–380.

621. Henkelman G, Jónsson H. Improved tangent estimate in the nudged elastic

band method for finding minimum energy paths and saddle points. J Chem

Phys. 2000;113:9978–9985.

622. Weinan E, Ren W, Vanden-Eijnden E. String method for the study of rare

events. Phys Rev B. 2002;66:052301.

623. Bohner MU, Zeman J, Smiatek J, Arnold A, Kästner J. Nudged-elastic band

used to find reaction coordinates based on the free energy. J Chem Phys.

2014;140(7):074109.

624. Jónsson H, Mills G, Jacobsen KW. Nudged Elastic Band Method for Finding

Minimum Energy Paths of Transitions. In: Berne BJ, Ciccotti G, Coker DF,

editors. Classical and Quantum Dynamics in Condensed Phase Simulations.

Singapore: World Scientific; 1998. p. 385–404.

625. Olender R, Elber R. Yet another look at the steepest descent path. J Mol Struct

THEOCHEM. 1997;398-399:63–71.

626. Crehuet R, Field MJ. A temperature-dependent nudged-elastic-band algorithm.

J Chem Phys. 2003;118(21):9563–9571.

627. Ren W, Vanden-Eijnden E. Finite temperature string method for the study of

rare events. J Phys Chem B. 2005;109(14):6688–6693.

628. Pan AC, Weinreich TM, Shan Y, Scarpazza DP, Shaw DE. Assessing the

accuracy of two enhanced sampling methods using EGFR kinase transition

pathways: the influence of collective variable choice. J Chem Theory

Comput. 2014;10(7):2860–2865.

629. Ovchinnikov V, Karplus M. Investigations of α-helix - β-sheet transition

pathways in a miniprotein using the finite-temperature string method. J Chem

Phys. 2014;140(17):175103.

630. Ovchinnikov V, Karplus M, Vanden-Eijnden E. Free energy of conformational

transition paths in biomolecules: The string method and its application to

myosin VI. J Chem Phys. 2011;134(8):085103.

631. Stober ST, Abrams CF. Energetics and mechanism of the

normal-to-amyloidogenic isomerization of β2-microglobulin: On-the-fly string

method calculations. J Phys Chem B. 2012;116(31):9371–9375.

632. Matsunaga Y, Fujisaki H, Terada T, Kidera A. Conformational Transition

Pathways of Adenylate Kinase Explored by the String Method. Biophys J.

2012;102(3):733a.

633. Kumari M, Kozmon S, Kulhanek P, Stepan J, Tvaroska I, Koča J. Exploring

Reaction Pathways for O-GlcNAc Transferase Catalysis. A String Method Study.

J Phys Chem B. 2015;.

634. Fajer M, Meng Y, Roux B. Simulation of the Conformational Transition

Pathway for the Activation of Full-Length C-Src Kinase using the String

Method. Biophys J. 2014;106(2):639a–640a.

635. Ovchinnikov V, Cecchini M, Vanden-Eijnden E, Karplus M. A conformational

transition in the myosin VI converter contributes to the variable step size.

Biophys J. 2011;101(10):2436–2444.

636. Adelman JL, Grabe M. Simulating rare events using a weighted ensemble-based

string method. J Chem Phys. 2013;138(4):044105.

637. Gan W, Yang S, Roux B. Atomistic view of the conformational activation of Src

kinase using the string method with swarms-of-trajectories. Biophys J.

2009;97(4):L8–L10.

638. Maragliano L, Roux B, Vanden-Eijnden E. Comparison between Mean Forces

and Swarms-of-Trajectories String Methods. J Chem Theory Comput.

2014;10(2):524–533.

639. Sanchez-Martinez M, Field M, Crehuet R. Enzymatic Minimum Free Energy

Path Calculations Using Swarms of Trajectories. J Phys Chem B. 2014;.

640. Peters B, Heyden A, Bell AT, Chakraborty A. A growing string method for

determining transition states: comparison to the nudged elastic band and string

methods. J Chem Phys. 2004;120(17):7877–7886.

641. Quapp W. A growing string method for the reaction pathway defined by a

Newton trajectory. J Chem Phys. 2005;122(17):174106/1–174106/11.

642. Goodrow A, Bell AT, Head-Gordon M. Development and application of a hybrid

method involving interpolation and ab initio calculations for the determination

of transition states. J Chem Phys. 2008;129(17):174109/1–174109/12.

643. Goodrow A, Bell AT, Head-Gordon M. Transition state-finding strategies for

use with the growing string method. J Chem Phys.

2009;130(24):244108/1–244108/14.

644. Goodrow A, Bell AT, Head-Gordon M. A strategy for obtaining a more accurate

transition state estimate using the growing string method. Chem Phys Lett.

2010;484(4-6):392–398.

645. Behn A, Zimmerman PM, Bell AT, Head-Gordon M. Efficient exploration of

reaction paths via a freezing string method. J Chem Phys.

2011;135(22):224108–224116.

646. Mallikarjun Sharada S, Zimmerman PM, Bell AT, Head-Gordon M. Automated

transition state searches without evaluating the Hessian. J Chem Theory

Comput. 2012;8(12):5166–5174.

647. De Jong KA. Evolutionary Computation: A Unified Approach. Cambridge, MA:

MIT Press; 2006.

648. Unger R. The Genetic Algorithm Approach to Protein Structure Prediction.

Structure and Bonding. 2004;110:153–175.

649. Wales DJ, Doye JPK. Global Optimization by Basin-Hopping and the Lowest

Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms. J

Phys Chem A. 1997;101(28):5111–5116.

650. Shehu A. Probabilistic Search and Optimization for Protein Energy Landscapes.

In: Aluru S, Singh A, editors. Handbook of Computational Molecular Biology.

Chapman & Hall/CRC Computer & Information Science Series; 2013.

651. Shehu A. Computer-Aided Drug Discovery. In: Zhang W, editor. Methods in

Pharmacology and Toxicology. Springer Verlag; 2015.

652. Hoque M, Chetty M, Sattar A. Genetic Algorithm in Ab Initio Protein

Structure Prediction Using Low Resolution Model: A Review. Biomed Data and

Applications. 2009;p. 317–342.

653. Dotu II, Cebrián MM, Van Hentenryck PP, Clote PP. On lattice protein

structure prediction revisited. IEEE/ACM Trans Comput Biol Bioinf. 2011

Nov;8(6):1620–1632.

654. Prentiss MC, Wales DJ, Wolynes PG. Protein structure prediction using

basin-hopping. J Chem Phys. 2008;128(22):225106–225106.

655. Olson B, Shehu A. Multi-Objective Optimization Techniques for Conformational

Sampling in Template-Free Protein Structure Prediction. In: Intl Conf on Bioinf

and Comp Biol (BICoB). Las Vegas, NV; 2014.

656. Verma A, Schug A, Lee KH, Wenzel W. Basin hopping simulations for all-atom

protein folding. J Chem Phys. 2006;124(4):044515.

657. Baldwin JM. A new factor in evolution. American Naturalist. 1896;p. 441–451.

658. Rusu M, Birmanns S. Evolutionary tabu search strategies for the simultaneous

registration of multiple atomic structures in cryo-EM reconstructions. J Struct

Biol. 2010;170(1):164–171.

659. Rusu M, Wriggers W. Evolutionary bidirectional expansion for the tracing of

alpha helices in cryo-electron microscopy reconstructions. J Struct Biol.

2012;177(2):410–419.

660. Clausen R, Ma B, Nussinov R, Shehu A. Mapping the Conformation Space of

Wildtype and Mutant H-Ras with a Memetic, Cellular, and Multiscale

Evolutionary Algorithm. PLoS Comput Biol. 2015;11(9):e1004470.

661. Kim D, Blum B, Bradley P, Baker D. Sampling bottlenecks in de novo protein

structure prediction. J Mol Biol. 2009;393(1):249–60.

662. Choset H, et al. Principles of Robot Motion: Theory, Algorithms, and

Implementations. 1st ed. Cambridge, MA: MIT Press; 2005.

663. Kavraki LE, Švestka P, Latombe JC, Overmars M. Probabilistic roadmaps for

path planning in high-dimensional configuration spaces. IEEE Trans Robot

Autom. 1996;12(4):566–580.

664. Amato NM, Dill KA, Song G. Using motion planning to map protein folding

landscapes and analyze folding kinetics of known native structures. J Comp Biol.

2002;10(3-4):239–255.

665. Song G, Amato NM. A Motion Planning Approach to Folding: From Paper

Craft to Protein Folding. IEEE Trans Robot Autom. 2004;20(1):60–71.

666. Molloy K, Shehu A. Interleaving Global and Local Search for Protein Motion

Computation. In: Harrison R, Li Y, Mandoiu I, editors. LNCS: Bioinformatics

Research and Applications. vol. 9096. Norfolk, VA: Springer International

Publishing; 2015. p. 175–186.

667. Chiang TH, Apaydin MS, Brutlag DL, Hsu D, Latombe JC. Using stochastic

roadmap simulation to predict experimental quantities in protein folding

kinetics: folding rates and phi-values. J Comp Biol. 2007;14(5):578–593.

668. Cortes J, Simeon T, de Angulo R, Guieysse D, Remaud-Simeon M, Tran V. A

path planning approach for computing large-amplitude motions of flexible

molecules. Bioinformatics. 2005;21(S1):116–125.

669. Shehu A. An Ab-initio tree-based exploration to enhance sampling of low-energy

protein conformations. In: Trinkle J, Matsuoka Y, A CJ, editors. Robotics:

Science and Systems V. Seattle, WA, USA; 2009. p. 241–248.

670. Olson B, Molloy K, Shehu A. In Search of the Protein Native State with a

Probabilistic Sampling Approach. J Bioinf & Comp Biol. 2011;9(3):383–398.

671. Behzadi M, Roonasi P, Assle taghipoura K, van der Spoel D, Manzetti S.

Relationship between electronic properties and drug activity of seven

quinoxaline compounds: A DFT study. J Phys Chem. 2015;1091(5):196–202.

672. Khaliullin RZ, VandeVondele J, Hutter J. Efficient Linear-Scaling Density

Functional Theory for Molecular Systems. J Chem Theory Comput.

2013;9(10):4421–4427.

673. Senn HM, Thiel W. QM/MM methods for biomolecular systems. Angew Chem

Int Ed Engl. 2009;48(7):1198–1229.

674. Larsson DSD, Liljas L, van der Spoel D. Virus capsid dissolution studied by

microsecond molecular dynamics simulations. PLoS Comput Biol.

2012;8(5):e1002502.

675. Roy A, Zhang Y. Protein Structure Prediction. In: Encyclopedia of Life Sciences.

John Wiley & Sons, Ltd; 2012. p. a0003031.

676. Scheres SHW. A Bayesian View on Cryo-EM Structure Determination. J Mol

Biol. 2012;415(2):406–418.

677. Topf M, Lasker K, Webb B, Wolfson H, Chiu W, Sali A. Protein Structure

Fitting and Refinement Guided by cryoEM Density. Structure.

2008;16(2):295–307.

678. Engel A, Gaub HE. Structure and Mechanics of Membrane Proteins. Annu Rev

Biochem. 2008;77:127–148.

S1 Text

Data

April 2016

Tatiana Maximova · Ryan Moffatt · Buyong Ma · Ruth Nussinov · Amarda Shehu
