Article

Automated design of specificity in molecular recognition

Abstract

Specific protein-protein interactions are crucial in signaling networks and for the assembly of multi-protein complexes, and represent a challenging goal for protein design. Optimizing interaction specificity requires both positive design, the stabilization of a desired interaction, and negative design, the destabilization of undesired interactions. Currently, no automated protein-design algorithms use explicit negative design to guide a sequence search. We describe a multi-state framework for engineering specificity that selects sequences maximizing the transfer free energy of a protein from a target conformation to a set of undesired competitor conformations. To test the multi-state framework, we engineered coiled-coil interfaces that direct the formation of either homodimers or heterodimers. The algorithm identified three specificity motifs that have not been observed in naturally occurring coiled coils. In all cases, experimental results confirm the predicted specificities.
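
The selection criterion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the energy function or the search procedure, so everything below is an assumption made for illustration: the energies are made-up numbers, the competitor set is collapsed into one free energy by Boltzmann weighting, and the names `competitor_free_energy`, `specificity_score` and `pick_most_specific` are hypothetical.

```python
import math

KT = 0.593  # kcal/mol near 298 K (assumed units for the toy energies)


def competitor_free_energy(energies, kt=KT):
    """Free energy of the competitor ensemble: -kT * ln(sum(exp(-E / kT)))."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    z = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(z)


def specificity_score(target_energy, competitor_energies, kt=KT):
    """Transfer free energy from the target state to the competitor ensemble.

    A larger value means the sequence prefers the target more strongly.
    """
    return competitor_free_energy(competitor_energies, kt) - target_energy


def pick_most_specific(candidates):
    """Return the candidate name with the largest specificity score."""
    return max(candidates, key=lambda name: specificity_score(*candidates[name]))


# Hypothetical candidates: (energy in target state, energies in competitor states)
candidates = {
    "seq_A": (-10.0, [-9.5, -9.0]),  # most stable target, but competitors nearly as stable
    "seq_B": (-9.0, [-4.0, -3.0]),   # less stable target, competitors strongly disfavored
}
best = pick_most_specific(candidates)
```

In this toy case `seq_B` is selected even though `seq_A` has the lower target energy, which is the difference between optimizing stability alone (positive design) and optimizing the gap to competitors (positive plus negative design).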

... Residues a and d are typically non-polar amino acids buried at the interface between the two helices, while e and g are charged amino acids which contribute to the dimeric coil stability through salt bridges 24 . In our design the heptad repeats of a GCN4-based ideal coiled coil 25 were matched to the predicted heptads of the NEMO sequence 51-112, to create a continuous and seamless coiled coil. The desired outcome was to increase the crystallization potential of the NEMO construct in two ways: by increasing the intrinsic stability (or decreasing conformational heterogeneity) and possibly by facilitating crystallization through the GCN4 adaptor portion of the protein. ...
... These regions are characterized by canonical hydrophobic residues in a-d positions packed as knobs-into-holes, forming the hydrophobic zipper (highlighted in Fig. 3a,b). Buried Asn residues in position a (Asn30, Asn58 and Asn118) create a-a' hydrogen bonds in this structure, imparting dimerization specificity 23,25 . Similarly, charged residues in positions g and e form stabilizing salt bridges: g-Glu36/e'-Lys41, g-Glu43/e'-Lys48, g-Glu124/e'-Lys129 23 . ...
... The interhelical spacing goes from an average value of 7.6 Å in the regular coiled-coil structure to a maximum of 11.5 Å for the stutter "heptad". As previously observed in discontinuous regions of coiled coils 25 , buried polar a-residues, such as lysine, can form favorable interactions with g' glutamate (a-Lys90/g'-Glu89, and a-Lys111/g'-Glu110), lowering the energetic penalty for burying polar residues 33 . ...
Article
NEMO is an essential component in the activation of the canonical NF-κB pathway and exerts its function by recruiting the IκB kinases (IKK) to the IKK complex. Inhibition of the NEMO/IKKs interaction is an attractive therapeutic paradigm for diseases related to NF-κB mis-regulation, but a difficult endeavor because of the extensive protein-protein interface. Here we report the high-resolution structure of the unbound IKKβ-binding domain of NEMO that will greatly facilitate the design of NEMO/IKK inhibitors. The structures of unbound NEMO show a closed conformation that partially occludes the three binding hot-spots and suggest a facile transition to an open state that can accommodate ligand binding. By fusing coiled-coil adaptors to the IKKβ-binding domain of NEMO, we succeeded in creating a protein with improved solution behavior, IKKβ-binding affinity and crystallization compatibility, which will enable the structural characterization of new NEMO/inhibitor complexes.
... Despite these simplifications, the size of the search space remains exponentially large and the problem of searching for a sequence with a minimum energy conformation is known to be decision NP-complete [81]. For this reason, most CPD approaches rely on stochastic optimization algorithms such as Monte Carlo Simulated Annealing [82,83] or Genetic algorithms [84], which provide only asymptotic convergence guarantees. However, recent progress in guaranteed discrete optimization techniques showed that such stochastic methods may durably fail to find or even get close to the GMEC when the problem becomes hard. ...
... Indeed, the traditional single-state protein design (SSD) contrasts with the increasing evidence that proteins do not remain fixed in a unique conformational state but rather sample conformational ensembles. Compared to the usual SSD approach, multistate design (MSD) has been shown to provide enhanced design capacities [123] to stabilize an ensemble of backbones [124,125], to design conformational switches [126,101] or proteins with specific binding properties [127,84,128]. In 2017 Loffler and coworkers showed that the Rosetta modular framework for multistate design offered a 15% higher performance than single-state design on a ligand-binding benchmark [129]. ...
Thesis
Proteins are fundamental components of life. Over the billions of years of evolution, proteins have evolved to perform certain functions better and faster or to achieve new functions in order to meet biological needs under diverse and changing conditions. The field of protein engineering is becoming a research domain of great importance. Interest in proteins with new or improved properties is increasing in health, nano/biotechnology and green chemistry. Computational Protein Design (CPD) plays a critical role in advancing the field of protein engineering and accelerating the delivery of novel proteins displaying high specificity, high efficiency and better stability. The CPD problem can be formalized as an optimization problem. Using an all-atom energy function and a reliable search method, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The traditional Single State Protein Design (SSD) contrasts with the increasing evidence that proteins do not remain in a unique conformational state but rather sample conformational ensembles. In this thesis we propose a MultiState Design (MSD) method which aims at alleviating SSD limitations by simultaneously considering several conformational states. In the second part of this thesis, MSD was applied to two projects that led to an experimental characterization and validation. These two projects concern two application domains: health and white biotechnologies. The first one targets GH11 xylanases. To understand the molecular basis underlying their thermal stability and activity, Molecular Dynamics simulations were used and revealed useful characteristics to design this enzyme. This produced GH11 xylanases with improved thermal stability and catalytic activity. The second project concerns the design of a synthetic humanized nanobody scaffold. The resulting nanobody is highly expressed and shows suitable affinity with different CDR loops.
... Crucial to our AIR architecture is the ability to rationally design drug-insensitive receptors that can retain binding activity to the designed binders. In computational design this relates to the well-defined problem of multistate optimization where the sequence space is searched to optimize simultaneously several objective functions 33,34 . In our case, residues in the drug receptor close to the drug and away from the designed binder were selected and sampled for mutations that knocked-out drug binding (negative design) and maintained affinity to the protein binder (positive design) [35][36][37] (Fig. 4a) (see Methods). ...
... Drug-insensitive receptor mutations predictions. Bcl-XL and Bcl2 were redesigned for resistance to Drug-1 and Drug-2, respectively, following a computational strategy similar to one used to predict drug resistance 33,34 . Briefly, a set of residues in the receptor protein's (Bcl-XL/Bcl2) binding site was selected for redesign. ...
Article
Small-molecule responsive protein switches are crucial components to control synthetic cellular activities. However, the repertoire of small-molecule protein switches is insufficient for many applications, including those in the translational spaces, where properties such as safety, immunogenicity, drug half-life, and drug side-effects are critical. Here, we present a computational protein design strategy to repurpose drug-inhibited protein-protein interactions as OFF- and ON-switches. The designed binders and drug-receptors form chemically-disruptable heterodimers (CDH) which dissociate in the presence of small molecules. To design ON-switches, we converted the CDHs into a multi-domain architecture which we refer to as activation by inhibitor release switches (AIR) that incorporate a rationally designed drug-insensitive receptor protein. CDHs and AIRs showed excellent performance as drug responsive switches to control combinations of synthetic circuits in mammalian cells. This approach effectively expands the chemical space and logic responses in living cells and provides a blueprint to develop new ON- and OFF-switches.
... Examples of computational optimization and material design range from high-hyperpolarizability materials 3 over organic (polymer) photovoltaics 4,5 to ceramic and cement materials design 6 to protein design. [7][8][9] Two aspects of the design problem dictate the computational expense: the combinatorial size of the search space 10,11 and the expense of property determination of a given material or compound. Machine-learning (ML) approaches have shown promise in tackling the latter. ...
... where n is the number of domains, for all but the 1 optimization, which requires 13 function evaluations instead of 25. Most distributions of function evaluations (see e.g. Figure 5) show a peak at the minimum number of function evaluations, and follow-up peaks of lower height close to multiples (7,14, and 21 in the case of 4 A ) are observed for reordered optimizations. ...
Article
We report developments in combinatorial optimization under constraints in chemical space. We considered random functions, which serve as a baseline to measure performance, and constrained optimizations over two databases of electrochromic molecules (∼5000 and 1012, respectively). These problems were optimized using sequential heuristic next-neighbor search (HNNS) (introduced in Elward and Rinderspacher Phys. Chem. Chem. Phys. 2015, 17, 24322 and Elward and Rinderspacher Mol. Syst. Des. Eng. 2018, 3, 485) and kernel-based efficient global optimization (EGO) with two reordering strategies. In addition to the average ordering method introduced with sequential HNNS, a new reordering based on a locally separable linear model is formulated and applied. Presented is the analysis of periodic kernels for EGO: the modified Dirichlet kernels $\tilde{D}_n(x) = \sum_{i}^{n} [\cos(2\pi i x) - (-1)^{i}]/i$, the minimalist kernel $K_N(x) = \cos(\pi x/N)^{2}/2 - \delta_x/2$, and an analogue to the popular Gaussian kernel, $G_N^{\sigma}(x) = e^{-|x \bmod N|^{2}/(2\sigma^{2})}$, where $n$ is the order of the Dirichlet kernel, $N$ is the number of choices on a combinatorial site, and $\sigma$ is a broadening factor. We find that reordering is pivotal for high optimization efficiency, in both the average and worst-case scenarios. The new linear-estimation ordering paradigm is superior in the chemistry context. Furthermore, with judicious use of hyperparameters and algorithmic choices, EGO outperforms HNNS. The global optimum for the small chemical problem is found with >98% likelihood for all global search methods employing a linear reordering heuristic for organizing the search space.
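
The three periodic kernels named in the abstract above can be written down directly. The following is a literal transcription under stated assumptions, not the authors' implementation: the Dirichlet sum is assumed to start at i = 1 (the summand divides by i), δ is assumed to be a Kronecker delta that equals 1 only at x = 0, and the function names are hypothetical.

```python
import math


def dirichlet_kernel(x, n):
    """Modified Dirichlet kernel of order n; sum assumed to run over i = 1..n."""
    return sum(
        (math.cos(2 * math.pi * i * x) - (-1) ** i) / i for i in range(1, n + 1)
    )


def minimalist_kernel(x, n_choices):
    """cos(pi * x / N)^2 / 2 minus half a Kronecker delta (assumed 1 at x == 0)."""
    delta = 1.0 if x == 0 else 0.0
    return math.cos(math.pi * x / n_choices) ** 2 / 2 - delta / 2


def gaussian_kernel(x, n_choices, sigma):
    """Periodic Gaussian analogue: exp(-|x mod N|^2 / (2 * sigma^2))."""
    return math.exp(-abs(x % n_choices) ** 2 / (2 * sigma ** 2))
```

Here `x` stands for the difference between two choices on a combinatorial site and `n_choices` for N; how these kernels are combined across sites in EGO is not stated in the abstract and is left out.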
... Our results reveal the central role of "negative constraints", used here to denote an amino acid in a cognate interface that interferes with binding to a non-cognate partner. The term negative constraint has been used in the field of protein design [14][15][16][17] to denote a domain that must be designed against, in effect an "anti-target". By contrast, our use of the term here focuses on individual amino acids rather than entire domains. ...
... In general, more than one negative constraint is required to significantly weaken binding. We also discuss the trade-off between specificity and affinity, a phenomenon that appears essential for the generation of multiple specificity groups 14,15,19 . Overall, our analysis reveals general principles about the design of protein families whose members are very similar in sequence and structure and yet exhibit exquisitely controlled binding affinities and specificities. ...
Article
Differential binding affinities among closely related protein family members underlie many biological phenomena, including cell-cell recognition. Drosophila DIP and Dpr proteins mediate neuronal targeting in the fly through highly specific protein-protein interactions. We show here that DIPs/Dprs segregate into seven specificity subgroups defined by binding preferences between their DIP and Dpr members. We then describe a sequence-, structure- and energy-based computational approach, combined with experimental binding affinity measurements, to reveal how specificity is coded on the canonical DIP/Dpr interface. We show that binding specificity of DIP/Dpr subgroups is controlled by “negative constraints”, which interfere with binding. To achieve specificity, each subgroup utilizes a different combination of negative constraints, which are broadly distributed and cover the majority of the protein-protein interface. We discuss the structural origins of negative constraints, and potential general implications for the evolutionary origins of binding specificity in multi-protein families. Dpr (Defective proboscis extension response) and DIP (Dpr Interacting Proteins) are immunoglobulin-like cell-cell adhesion proteins that form highly specific pairwise interactions, which control synaptic connectivity during Drosophila development. Here, the authors combine a computational approach with binding affinity measurements and find that DIP/Dpr binding specificity is controlled by negative constraints that interfere with non-cognate binding.
... Our results reveal the central role of "negative constraints", used here to denote an amino acid in a cognate interface that interferes with binding to a non-cognate partner. The term negative constraint has been used in the field of protein design [14][15][16][17] to denote a domain that must be designed against, in effect an "anti-target". By contrast, our use of the term here focuses on individual amino acids rather than entire domains. ...
... In general, more than one negative constraint is required to significantly weaken binding. We also discuss the trade-off between specificity and affinity, a phenomenon that appears essential for the generation of multiple specificity groups 14,15,19 . Overall, our analysis reveals general principles about the design of protein families whose members are very similar in sequence and structure and yet exhibit exquisitely controlled binding affinities and specificities. ...
Preprint
Differential binding affinities among closely related protein family members underlie many biological phenomena, including cell-cell recognition. Drosophila DIP and Dpr proteins mediate neuronal targeting in the fly through highly specific protein-protein interactions. DIPs/Dprs segregate into seven specificity subgroups defined by binding preferences between their DIP and Dpr members. Here we describe a novel sequence-, structure- and energy-based computational approach, combined with experimental binding affinity measurements, to reveal how specificity is coded on the canonical DIP/Dpr interface. We show that binding specificity of DIP/Dpr subgroups is controlled by "negative constraints", which interfere with binding. To achieve specificity, each subgroup utilizes a different combination of negative constraints, which are broadly distributed and cover the majority of the protein-protein interface. We discuss the structural origins of negative constraints, and potential general implications for the evolutionary origins of binding specificity in multi-protein families.
... Highly efficient and specific biomolecular recognition requires both affinity and specificity [8][9][10][11] . The stability of the complex is determined by the affinity while the specificity is controlled by either partner binding to other competitive biomolecules discriminatively. ...
... The reason that the specificity usually was not taken into account explicitly is that the description of binding specificity was challenging to quantify. The conventional definition (Fig. 1a) of specificity is the ability of a ligand to specifically bind to a protein against other proteins, namely the relative difference in affinity of one specific protein-ligand complex to others [8][9][10][11] . The quantification of conventional specificity is challenging since specificity requires comparison of the affinities of all the different receptors with the same ligand. ...
... A gap in binding free energy or affinity will lead to significant population discrimination between the specific complex and alternative ones. As such, introducing the consideration of specificity into the computational design of protein-protein interactions has achieved a few successful applications (Bolon et al., 2005;Grigoryan et al., 2009;Havranek et al., 2003;Kortemme et al., 2004;Shifman and Mayo, 2003). These works designed interactions that seek to stabilize the desired structures and also destabilize the competitive structures, as the specificity-related interactions lie in the binding patches constituting the interface of the complex (Malod-Dognin et al., 2012). ...
... The conventional definition of specificity is the preference of a protein ligand specifically binding to a protein receptor over other competitive alternatives (Bolon et al., 2005;Grigoryan et al., 2009;Havranek et al., 2003;Janin, 1995;Kortemme et al., 2004;Shifman and Mayo, 2003). The definition is clear, although in practice, the quantification of the conventional specificity still remains challenging (Grigoryan et al., 2009). ...
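
The statement that a gap in binding free energy leads to population discrimination can be made concrete with the Boltzmann relation. This is a minimal sketch assuming thermodynamic equilibrium and equal concentrations of all competing partners; the function names and the example energies are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def population_ratio(ddg_kcal, temperature_k=298.15):
    """Ratio of specific to alternative complex populations for a free energy gap.

    ddg_kcal is how much more favorable the specific complex is (positive = gap).
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


def specific_fraction(dg_specific, dg_alternatives, temperature_k=298.15):
    """Fraction of the ligand bound in the specific complex among all complexes."""
    rt = R_KCAL * temperature_k
    weights = [math.exp(-dg / rt) for dg in [dg_specific, *dg_alternatives]]
    return weights[0] / sum(weights)


# A gap of about 1.36 kcal/mol corresponds to roughly a tenfold population ratio.
ratio = population_ratio(1.364)
```

Because the ratio is exponential in the gap, a few kcal/mol separate weak from near-complete discrimination, which is why the cited designs target the gap and not only the stability of the desired complex.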
... In protein structure prediction, the aim is to deduce the three-dimensional structure of an amino acid sequence; while, in protein design, having a specific, known function or protein structure as a target, the aim is to figure out which amino acid sequences will lead to successful, correct folding, and biological activity. One may think of the protein design problem as the inverse of the protein structure prediction problem [11], or as an inverted-folding problem [12]. ...
... Similar in concept to enzyme design is the design of biosensors [12,73], including metalloproteins [37], and single chain antibodies called nanobodies [74]. For these classes, specificity and regulation of affinity by external cues, e.g. ...
Article
Introduction: Protein function is determined by protein structure which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
... This is because these algorithms are, by construction, designed to explicitly approximate the Pareto front in a user-specified objective space, and, as we demonstrate in this work, can iteratively guide the sampling process through the construction of biophysically-informed mutation operators. This emphasis on both explicit approximation of the Pareto front and use of informative mutation operators distinguishes this work from previous applications of genetic algorithms to protein design [47][48][49][50][51][52][53]. ...
Preprint
With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the multistate design problem of the foldswitching protein RfaH as an in-depth case study, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
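
The notion of explicitly approximating a Pareto front can be illustrated by the non-dominated filter that underlies NSGA-II. The sketch below shows only that filter for objectives to be minimized. It omits the ranking into successive fronts, crowding distance, and the mutation operators described in the abstract, and the objective values are made-up numbers.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores for five design candidates.
scores = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
front = pareto_front(scores)
```

Each point on the front represents a different tradeoff between the two objectives, and none of them can be improved in one objective without losing in the other.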
... A straightforward extension samples according to the binding free energy difference between two substrates (Mignon et al., 2020;Opuu et al., 2020), such as α-Met and β-Met. Although design for binding affinity or specificity was pioneered by Havranek and Harbury (2003) and has been applied by others (Dowling et al., 2023;Gainza et al., 2016;Lowegard et al., 2020), the earlier methods did not provide, as here, a Boltzmann sampling of very large ensembles of sequences. Furthermore, in most previous CPD work, enzyme mutations were sampled simply according to the total energy of the enzyme-substrate complex (Lechner et al., 2018;Li et al., 2018;Michael and Simonson, 2022;Richter et al., 2011;Simonson et al., 2016;Stoddard, 2016). ...
Article
Amino acids (AAs) with a noncanonical backbone would be a valuable tool for protein engineering, enabling new structural motifs and building blocks. To incorporate them into an expanded genetic code, the first, key step is to obtain an appropriate aminoacyl‐tRNA synthetase. Currently, directed evolution is not available to optimize AAs with noncanonical backbones, since an appropriate selective pressure has not been discovered. Computational protein design (CPD) is an alternative. We used a new CPD method to redesign MetRS and increase its activity towards β‐Met, which has an extra backbone methylene. The new method considered a few active site positions for design and used a Monte Carlo exploration of the corresponding sequence space. During the exploration, a bias energy was adaptively learned, such that the free energy landscape of the apo enzyme was flattened. Enzyme variants could then be sampled, in the presence of the ligand and the bias energy, according to their β‐Met binding affinities. Eighteen predicted variants were chosen for experimental testing; 10 exhibited detectable activity for β‐Met adenylation. Top predicted hits were characterized experimentally in detail. Dissociation constants, catalytic rates, and Michaelis constants for both α‐Met and β‐Met were measured. The best mutant retained a preference for α‐Met over β‐Met; however, the preference was reduced, compared to the wildtype, by a factor of 29. For this mutant, high resolution crystal structures were obtained in complex with both α‐Met and β‐Met, indicating that the predicted, active conformation of β‐Met in the active site was retained.
... The optimization of functional proteins is a problem with competing objectives [19][20][21][22][23][24]. An example is the design of asymmetric bispecific antibodies [24,[25][26][27][28][29][30] relative to the symmetric chain composition of naturally occurring antibodies. ...
Preprint
Functional biologics design is a multi-objective optimization problem often with competing design objectives. We report on a novel deep-learning-based protein sequence prediction framework, ZymeSwapNet, that can be customized to handle a wide range of quantifiable design objectives, a current limitation of traditional protein design methods. We train a simple convolutional neural network (1D-CNN) on non-redundant curated protein crystal structures, using a set of geometric and topological features that describes a local protein environment, to predict the likelihood of each amino acid type for residue sites in the design region. While the model can be directly used to rank templates derived from mutagenesis campaigns, we extend the scope by developing a sequence/mutation generator that optimizes the desired multivariate distribution using Monte Carlo sampling. Using a case study, the design of a stable heterodimeric Fc (HetFc) antibody domain, we show that we can further include a Metropolis criterion to bias the sampling to enhance features such as the heterodimeric binding specificity, in addition to the original sampling objective of enhancing stability. We demonstrate that ZymeSwapNet can generate, within minutes, stable HetFc designs that had previously taken several rounds of rational structure- and physical force-field-based modelling attempts.
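
The Metropolis criterion mentioned in the abstract above can be sketched generically. This is not ZymeSwapNet: the score function, the alphabet, the temperature and all names below are hypothetical stand-ins, and lower scores are assumed to be better.

```python
import math
import random


def metropolis_accept(delta_score, temperature, rng):
    """Accept every improvement; accept a worsening with probability exp(-delta / T)."""
    if delta_score <= 0:
        return True
    return rng.random() < math.exp(-delta_score / temperature)


def sample(sequence, score_fn, alphabet, steps, temperature, seed=0):
    """Single-site Monte Carlo over sequences, biased by the Metropolis criterion."""
    rng = random.Random(seed)
    current = list(sequence)
    current_score = score_fn(current)
    for _ in range(steps):
        proposal = current.copy()
        proposal[rng.randrange(len(proposal))] = rng.choice(alphabet)
        new_score = score_fn(proposal)
        if metropolis_accept(new_score - current_score, temperature, rng):
            current, current_score = proposal, new_score
    return "".join(current), current_score
```

A second objective such as binding specificity could be folded into `score_fn` as a weighted term. Whether the original work combines its objectives this way is not stated in the abstract.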
... In the process of molecular recognition, it is necessary to require the molecular specificity and affinity. Molecular specificity may be defined as why two or more binding partners approach and bind together to form a specific complex in aqueous environment [124][125][126][127]. Regarding the binding affinity, it determines whether the complex will be formed in aqueous environments, which is related to the strength of non-covalent interactions between partners. ...
Article
Hydrophobic interactions are involved in and believed to be the fundamental driving force of many chemical and biological phenomena in aqueous environments. This review focuses on our current understanding on hydrophobic effects. As a solute is embedded into water, the interface appears between solute and water, which mainly affects the structure of interfacial water (the topmost water layer at the solute/water interface). From our recent structural studies on water and air-water interface, hydration free energy is derived and utilized to investigate the origin of hydrophobic interactions. It is found that hydration free energy depends on the size of solute. With increasing the solute size, it is reasonably divided into initial and hydrophobic solvation processes, and various dissolved behaviors of the solutes are expected in different solvation processes, such as dispersed and accumulated distributions in solutions. Regarding the origin of hydrophobic effects, it is ascribed to the structural competition between the hydrogen bondings of interfacial and bulk water. This can be applied to understand the characteristics of hydrophobic interactions, such as the dependence of hydrophobic interactions on solute size (or concentrations), the directional natures of hydrophobic interactions, and temperature effects on hydrophobic interactions.
... However, the individual protomers, either helical hairpins or individual helices, lack a hydrophobic core and are thus flexible and unstable as monomers, allowing a wide range of potential off-target homo-oligomers to form (Fig. 1B). Explicit negative design methods favor one state by considering the effect of amino acid substitutions on the free energies of both states (15)(16)(17). However, such methods cannot be readily applied to disfavor self association, as there are in general a large number of possible self associated states which cannot be systematically enumerated. ...
Article
Asymmetric multiprotein complexes that undergo subunit exchange play central roles in biology but present a challenge for design because the components must not only contain interfaces that enable reversible association but also be stable and well behaved in isolation. We use implicit negative design to generate β sheet-mediated heterodimers that can be assembled into a wide variety of complexes. The designs are stable, folded, and soluble in isolation and rapidly assemble upon mixing, and crystal structures are close to the computational models. We construct linearly arranged hetero-oligomers with up to six different components, branched hetero-oligomers, closed C4-symmetric two-component rings, and hetero-oligomers assembled on a cyclic homo-oligomeric central hub and demonstrate that such complexes can readily reconfigure through subunit exchange. Our approach provides a general route to designing asymmetric reconfigurable protein systems.
... For example, enzymes adopt different conformations during a catalytic cycle and more generally, all important biological effects are best represented by an ensemble of conformational states; for a review see [13]. This is why multi-state design (MSD) protocols have been introduced, which score a single sequence with respect to conformationally different backbones modelled as states [14][15][16][17][18][19][20][21][22][23][24]. Moreover, MSD allows for negative design, i.e., the computation of sequences that destabilize certain states related to misfolded conformations or an undesired binding interaction [25]. ...
Article
Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences to introduce the desired properties. Challenging design problems necessitate the representation of a protein by means of a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed wherein each state represents one protein conformation. Computational demands of MSD protocols are high, because for each of the candidate sequences a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NN) proved well-suited to learn such solution spaces, we integrated one into the framework Rosetta:MSF instead of the so far used genetic algorithm with the aim to reduce computational costs. As its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores are used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked on the scores predicted by the NN. Costly 3D models are computed only for a small fraction of best-scoring sequences; these and the corresponding 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores quite well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten-times faster than the genetic algorithm and finds sequences with comparable scores.
... MSA showed interesting results when combined with local backbone fluctuation search algorithms for each state. Such multi-state design has been applied successfully in several protein design cases [30,2,4,14,5]. When, instead, the aim is to design a sequence that fits several conformational states that must be adopted for the targeted function (e.g., states defining conformational switches), it is important that the energies of all states contribute to the definition of the fitness. ...
Article
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent algorithmic developments for incorporating molecular flexibility in the design process.
... Both specificity and affinity are required in the process of molecular recognition. Molecular specificity concerns why two or more particular binding partners approach and bind to each other [58][59][60][61], a question that is fundamental in various fields of molecular and biological sciences. Binding affinity is the strength of these interactions, which determines whether an interaction will be formed in solution. ...
Article
Building on recent studies of hydrophobic interactions, we have investigated their directional nature. Hydrophobic interactions depend on the relative orientation of the solutes when they aggregate in water. In the H1w process, they are attracted and approach each other in the specific direction with the lowest energy barrier until their surfaces come into contact. In the H2s process, to maximize the hydrogen bonding of water, the solutes aggregate in a specific direction to minimize the surface area to volume ratio. Additionally, with decreasing separation between the solutes, the solute–solute interactions become stronger. This was demonstrated by calculating the potential of mean force from molecular dynamics simulations. From these results, a hydrophobic-interaction-driven model is proposed to understand the specificity and affinity of molecular recognition in water.
... Early efforts to design specific protein-protein interactions focused on coiled coils due to their structural simplicity. Researchers used a combination of hydrophobic and hydrophilic (polar) interactions, and peripheral charges to program desired binding specificities (Acharya et al., 2006; Fletcher et al., 2012; Gonzalez et al., 1996; Gradišar and Jerala, 2011; Havranek and Harbury, 2003; Lumb and Kim, 1995; Reinke et al., 2010); however, the limited size of the binding interface between coiled coils restricted the diversity of specificity-determining residues that could fit and therefore limited the number of possible orthogonal pairs that could be designed. ...
Article
A fundamental challenge in synthetic biology is to create molecular circuits that can program complex cellular functions. Because proteins can bind, cleave, and chemically modify one another and interface directly and rapidly with endogenous pathways, they could extend the capabilities of synthetic circuits beyond what is possible with gene regulation alone. However, the very diversity that makes proteins so powerful also complicates efforts to harness them as well-controlled synthetic circuit components. Recent work has begun to address this challenge, focusing on principles such as orthogonality and composability that permit construction of diverse circuit-level functions from a limited set of engineered protein components. These approaches are now enabling the engineering of circuits that can sense, transmit, and process information; dynamically control cellular behaviors; and enable new therapeutic strategies, establishing a powerful paradigm for programming biology.
... Moreover, there is no immediate recipe for updating the designed sequence based on the prediction results-instead, sequences that do not have the designed structure as their lowest-energy state are typically discarded. Multistate design (10)(11)(12) can be carried out to maximize the energy gap between the desired conformation and other specified conformations, but the latter must be known in advance and be relatively few in number for such calculations to be tractable. ...
Article
Significance Almost all proteins fold to their lowest free energy state, which is determined by their amino acid sequence. Computational protein design has primarily focused on finding sequences that have very low energy in the target designed structure. However, what is most relevant during folding is not the absolute energy of the folded state but the energy difference between the folded state and the lowest-lying alternative states. We describe a deep learning approach that captures aspects of the folding landscape, in particular the presence of structures in alternative energy minima, and show that it can enhance current protein design methods.
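The point that the relevant quantity is the energy difference between the target state and the alternatives, not the absolute energy of the target, can be made concrete with a small sketch. The function names, the example energies, and the choice of a Boltzmann-weighted competitor ensemble are assumptions for illustration, not the scoring used in any of the papers listed here.

```python
import math

KT = 0.593  # kcal/mol, thermal energy at roughly 298 K

def energy_gap(target_energy, competitor_energies):
    """Lowest competitor energy minus target energy (positive favors the target)."""
    return min(competitor_energies) - target_energy

def ensemble_gap(target_energy, competitor_energies, kt=KT):
    """Free energy of the whole competitor ensemble minus the target energy.

    The ensemble free energy is -kT * ln(sum_i exp(-E_i / kT)); energies are
    shifted by the lowest value before exponentiating for numerical stability.
    """
    lowest = min(competitor_energies)
    partition = sum(math.exp(-(e - lowest) / kt) for e in competitor_energies)
    competitor_free_energy = lowest - kt * math.log(partition)
    return competitor_free_energy - target_energy

def pick_most_specific(candidates):
    """candidates maps a sequence name to (target_energy, [competitor energies])."""
    return max(candidates, key=lambda name: ensemble_gap(*candidates[name]))

# Hypothetical energies in kcal/mol: seqA is more stable in the target state,
# but a competitor lies only 0.5 kcal/mol above it; seqB has the larger gap.
example = {
    "seqA": (-10.0, [-9.5, -4.0]),
    "seqB": (-8.0, [-3.0, -2.5]),
}
```

With these made-up numbers, ranking by target energy alone would choose `seqA`, whereas `pick_most_specific(example)` chooses `seqB`. The ensemble gap is never larger than the simple gap, because every additional competitor state lowers the free energy of the competitor ensemble.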
... Crucial to our AIR architecture is the ability to rationally design drug insensitive receptors that can retain binding activity to the designed binders. In computational design this relates to the well-defined problem of multistate optimization where the sequence space is searched to optimize simultaneously several objective functions 35,36 . In our case, residues in the drug receptor close to the drug and away from the designed binder were selected and sampled for mutations that knocked-out drug binding (negative design) and maintained affinity to the protein binder (positive design) [37][38][39] (Fig. 4a) (see Methods). ...
Preprint
Small-molecule responsive protein switches are crucial components to control synthetic cellular activities. However, the repertoire of small-molecule protein switches is insufficient for many applications, including those in the translational spaces, where properties such as safety, immunogenicity, drug half-life, and drug side-effects are critical. Here, we present a computational protein design strategy to repurpose drug-inhibited protein-protein interactions as OFF- and ON-switches. The designed binders and drug-receptors form chemically-disruptable heterodimers (CDH) which dissociate in the presence of small molecules. To design ON-switches, we converted the CDHs into a multi-domain architecture which we refer to as activation by inhibitor release switches (AIR) that incorporate a rationally designed drug-insensitive receptor protein. CDHs and AIRs showed excellent performance as drug responsive switches to control combinations of synthetic circuits in mammalian cells. This approach effectively expands the chemical space and logic responses in living cells and provides a blueprint to develop new ON- and OFF- switches for basic and translational applications.
... Selectivity can be very difficult to achieve for peptides by rational design, since they can have many potential interaction partners in addition to the desired target [168,169]. However, library screening approaches capable of considering multiple off-targets in addition to the target are being developed. ...
Article
c-Myc is a transcription factor that is constitutively and aberrantly expressed in over 70% of human cancers. Its direct inhibition has been shown to trigger rapid tumor regression in mice with only mild and fully reversible side effects, suggesting this to be a viable therapeutic strategy. Here we reassess the challenges of directly targeting c-Myc, evaluate lessons learned from current inhibitors, and explore how future strategies such as miniaturisation of Omomyc and targeting E-box binding could facilitate translation of c-Myc inhibitors into the clinic.
... In fact, both specificity and affinity are required in the process of molecular recognition. Molecular specificity concerns why two or more particular binding partners are able to approach and bind to each other [55][56][57][58], a question that is fundamental in various fields of molecular and biological sciences. Binding affinity is the strength of these interactions, which determines whether or not an interaction will be formed in solution. ...
Preprint
Building on recent studies of hydrophobic interactions, this work investigates their directional nature: hydrophobic interactions depend on the relative orientation of the solutes as they aggregate in water. In the H1w process, the solutes are attracted and approach each other in the specific direction with the lower energy barrier until their surfaces come into contact. In the H2s process, to maximize the hydrogen bonding of water, the solutes aggregate in the specific direction that minimizes their ratio of surface area to volume. Additionally, as the separation between the solutes decreases, the short-range interactions between them become stronger. These findings are demonstrated by the potential of mean force (PMF) calculated from molecular dynamics simulations. From this work, a hydrophobic-interaction-driven model is proposed to understand the specificity and affinity of molecular recognition in water.
... One important application of computational interface design includes the alteration of binding specificity. Interface design for selectivity can consider both positive and negative design elements to jointly optimize the desired interactions and reduce unwanted interactions (Bolon et al., 2005;Havranek and Harbury, 2003;Shrestha et al., 2019). ...
Article
Herpes virus entry mediator (HVEM) regulates positive and negative signals for T cell activation through co-signaling pathways. Dysfunction of the HVEM co-signaling network is associated with multiple pathologies related to autoimmunity, infectious disease, and cancer, making the associated molecules biologically and therapeutically attractive targets. HVEM interacts with three ligands from two different superfamilies using two different binding interfaces. The engagement with ligands CD160 and B- and T-lymphocyte attenuator (BTLA), members of immunoglobulin superfamily, is associated with inhibitory signals, whereas inflammatory responses are regulated through the interaction with LIGHT from the TNF superfamily. We computationally redesigned the HVEM recognition interfaces using a residue-specific pharmacophore approach, ProtLID, to achieve switchable-binding specificity. In subsequent cell-based binding assays the new interfaces, designed with only single or double mutations, exhibited selective binding to only one or two out of the three cognate ligands.
... These approaches are based on finding an optimal sequence for a given single structure or ensemble of related states, and do not provide a strategy to construct a protein capable of large on-demand conformational transitions (4,5). A number of multistate protein design algorithms (4,6) have been proposed; however, designing an experimentally confirmed, regulatable multistate protein, or a conformational switch (5), still remains as a challenging task because of the necessity of engineering and controlling multiple protein states (4,7,8). ...
Article
Chemically inducible dimerization (CID) uses a small molecule to induce binding of two different proteins. CID tools such as the FK506-binding protein (FKBP)–FKBP-rapamycin-binding domain (FRB)–rapamycin system have been widely used to probe molecular events inside and outside cells. While various CID tools are available, chemically inducible trimerization (CIT) does not exist, due to inherent challenges in designing a chemical that simultaneously binds three proteins with high affinity and specificity. Here, we developed CIT by rationally splitting FRB and FKBP. Cellular and structural datasets showed efficient trimerization of split pairs of FRB or FKBP with full-length FKBP or FRB, respectively, by rapamycin. CIT rapidly induced tri-organellar junctions and perturbed intended membrane lipids exclusively at select membrane contact sites. By conferring one additional condition to what is achievable with CID, CIT expands the types of manipulation in single live cells to address cell biology questions otherwise intractable and engineer cell functions for future synthetic biology applications. Chemically inducible trimerization tools based on split FRB or FKBP with full-length FKBP or FRB, respectively, expand the chemogenetics toolbox. Their efficiency and fast kinetics enable new types of protein manipulation in live cells.
... Moreover, there is no immediate recipe for updating the designed sequence based on the prediction results-instead sequences which do not have the designed structure as their lowest energy state are typically discarded. Multistate design (10)(11)(12) can be carried out to maximize the energy gap between the desired conformation and other specified conformations, but the latter must be known in advance and be relatively few in number for such calculations to be tractable. ...
Preprint
The protein design problem is to identify an amino acid sequence which folds to a desired structure. Given Anfinsen’s thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the lowest energy conformation is that structure. As this calculation involves not only all possible amino acid sequences but also all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest energy conformation for the designed sequence, and discarding the in many cases large fraction of designed sequences for which this is not the case. Here we show that by backpropagating gradients through the trRosetta structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures, and in one calculation explicitly design amino acid sequences predicted to fold into the desired structure and not any other. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by landscape optimization to the standard fixed backbone sequence design methodology in Rosetta, and show that the results of the former, but not the latter, are sensitive to the presence of competing low-lying states. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low resolution trRosetta model serves to disfavor alternative states, and the high resolution Rosetta model, to create a deep energy minimum at the design target structure. 
Significance Computational protein design has primarily focused on finding sequences which have very low energy in the target designed structure. However, what is most relevant during folding is not the absolute energy of the folded state, but the energy difference between the folded state and the lowest lying alternative states. We describe a deep learning approach which captures the entire folding landscape, and show that it can enhance current protein design methods.
... Several characteristics of protein-protein interfaces need to be taken into consideration in a typical model, such as the size, shape, residue interface propensities, interface complementarity, hydrophobicity, secondary structure, and conformational changes (Jones and Thornton 1996). To design the specificity of protein binding, specificity motifs can be constructed using computer algorithms (Havranek and Harbury 2003;Kortemme et al. 2004;Fromer and Linial 2010;Chakraborty et al. 2012). Huang and colleagues used a fast Fourier transform-based docking algorithm to generate a computational model for a dimeric version of the β1 domain of streptococcal protein G (GB1). ...
Article
Conceptual constructive models are a type of scientific model that can be used to construct or reshape the target phenomenon conceptually. Though they have received scant attention from philosophers, they raise an intriguing issue of how a conceptual constructive model can construct the target phenomenon in a conceptual way. Proponents of the conception of conceptual constructive models have not been explicit about the application of the constructive force of a model in the target construction. It is far from clear how a conceptual constructive model exerts its constructive force on the constructed phenomenon of interest. Consequently, the function and the epistemic status of a conceptual constructive model are dubious at best. Making use of the conception of abstraction-as-aggregation, I argue that a conceptual constructive model can be used to construct the phenomenon of interest conceptually via a two-step process of abstraction: (1) abstracting away the lower-level details; and (2) aggregating the relevant information into a higher-level composite element. I contend that this process of abstraction, which does not play the representational role found in a typical representational model, confers the constructive force on a model.
... Structure coordinates can further be built into cryo-EM density maps for large RNA-protein complexes with DRRAFTER (De novo Ribonucleoprotein modeling in Real space through Assembly of Fragments Together with Experimental density in Rosetta) 158 . Redesign and prediction of protein-DNA interfaces 159,160 have been accomplished with flexible protein backbones 161 , genetic algorithms 159,161,162 and motif-biased rotamer sampling 163,164 . A potential limitation is the reliance on fixed DNA backbone conformations, as DNA backbone conformations can be flexible. ...
Article
The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.
... When engineering new aTF biosensor specificity, it is important to acknowledge that relaxation of ligand specificity towards cognate ligands can impose a challenge in maintaining allostery in transcriptional regulators (16), and for this reason engineering specificity requires both negative selection (i.e. loss of specificity for the native ligand) and positive selection (i.e. gain of specificity for the new ligand) (28,45). Hence, similar to evolving dynamic range variants, we carried out a toggled selection procedure using adipic acid as an inducer in the ON state, and subsequently sorted variants without background fluorescence under uninduced conditions (OFF state) (Figure 2B). ...
Article
Allosteric transcription factors (aTFs) have proven widely applicable for biotechnology and synthetic biology as ligand-specific biosensors enabling real-time monitoring, selection and regulation of cellular metabolism. However, both the biosensor specificity and the correlation between ligand concentration and biosensor output signal, also known as the transfer function, often need to be optimized before meeting application needs. Here, we present a versatile and high-throughput method to evolve prokaryotic aTF specificity and transfer functions in a eukaryote chassis, namely baker's yeast Saccharomyces cerevisiae. From a single round of mutagenesis of the effector-binding domain (EBD) coupled with various toggled selection regimes, we robustly select aTF variants of the cis,cis-muconic acid-inducible transcription factor BenM evolved for change in ligand specificity, increased dynamic output range, shifts in operational range, and a complete inversion-of-function from activation to repression. Importantly, by targeting only the EBD, the evolved biosensors display DNA-binding affinities similar to BenM, and are functional when ported back into a prokaryotic chassis. The developed platform technology thus leverages aTF evolvability for the development of new host-agnostic biosensors with user-defined small-molecule specificities and transfer functions.
... Redesign and prediction of protein-DNA interfaces is also possible in Rosetta 145,146 and has been accomplished with flexible protein backbones 147 and motif-biased rotamer sampling 148 . However, the biggest limitation of these approaches is that they rely on fixed DNA backbone conformations, which in nature can be highly flexible. ...
Preprint
The Rosetta software suite for macromolecular modeling, docking, and design is widely used in pharmaceutical, industrial, academic, non-profit, and government laboratories. Despite its broad modeling capabilities, Rosetta remains consistently among leading software suites when compared to other methods created for highly specialized protein modeling and design tasks. Developed for over two decades by a global community of over 60 laboratories, Rosetta has undergone multiple refactorings, and now comprises over three million lines of code. Here we discuss methods developed in the last five years in Rosetta, involving the latest protocols for structure prediction; protein–protein and protein–small molecule docking; protein structure and interface design; loop modeling; the incorporation of various types of experimental data; modeling of peptides, antibodies and proteins in the immune system, nucleic acids, non-standard chemistries, carbohydrates, and membrane proteins. We briefly discuss improvements to the energy function, user interfaces, and usability of the software. Rosetta is available at www.rosettacommons.org.
... Computational design algorithms have been used to design new folds (Kuhlman et al., 2003), enzymatic functions (Rothlisberger et al., 2008), and novel binding functions (Looger et al., 2003). Computational approaches for optimizing selectivity often require both positive design considerations to stabilize the desired interactions and negative design considerations to distinguish among a number of sequence and structurally similar competitor molecules (Havranek and Harbury, 2003;Bolon et al., 2005). ...
Article
Chronic or persistent stimulation of the programmed cell death-1 (PD-1) pathway prevents T cells from mounting anti-tumor and anti-viral immune responses. Blockade of this inhibitory checkpoint pathway has shown therapeutic importance by rescuing T cells from their exhausted state. Cognate ligands of the PD-1 receptor include the tissue-specific PD-L1 and PD-L2 proteins. Engineering a human PD-1 interface specific for PD-L1 or PD-L2 can provide a specific reagent and therapeutic advantage for tissue-specific disruption of the PD-1 pathway. We utilized ProtLID, a computational framework, which constitutes a residue-based pharmacophore approach, to custom-design a human PD-1 interface specific to human PD-L1 without any significant affinity to PD-L2. In subsequent cell assay experiments, half of all single-point mutant designs proved to introduce a statistically significant selectivity, with nine of these maintaining a close to wild-type affinity to PD-L1. This proof-of-concept study suggests a general approach to re-engineer protein interfaces for specificity. Shrestha et al. present a computational approach that employs a residue-based pharmacophore approach to design mutations for the interface of PD-1, specific to one of its cognate ligands only, PD-L1 without any significant affinity to PD-L2. In subsequent cell assay experiments half of all single-point mutant designs proved to introduce a statistically significant selectivity.
... A gap in binding free energy or affinity leads to significant population discrimination of the native complex against alternative ones [19–24], which is required for specific biomolecular recognition to function properly in the cell. Recent work that incorporates specificity into the computational design and optimization of interface interactions has produced several successful applications [6,19–23,25–29]. These works designed and optimized interactions that seek to stabilize the desired structures and also destabilize the competing structures. ...
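How a free-energy gap translates into population discrimination follows from the Boltzmann relation; a short worked sketch is given here. It assumes a simple equilibrium between one desired and one competing complex with equal partner concentrations, and the function names are illustrative only.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def population_ratio(gap_kcal_per_mol, temperature_k=298.15):
    """Equilibrium ratio of desired to competing complex for a free-energy gap.

    A positive gap means the desired complex is more stable than the competitor.
    """
    return math.exp(gap_kcal_per_mol / (R_KCAL * temperature_k))

def gap_for_ratio(ratio, temperature_k=298.15):
    """Free-energy gap (kcal/mol) needed to reach a given population ratio."""
    return R_KCAL * temperature_k * math.log(ratio)
```

At room temperature each tenfold change in relative population corresponds to roughly 1.4 kcal/mol, so `gap_for_ratio(100)` is about 2.7 kcal/mol. Under these assumptions, a difference of two orders of magnitude in affinity corresponds to a gap of only a few kcal/mol.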
... Large-Scale Panels in Multistate Design. Multistate design has been successful in a number of different applications; however, it is generally applied to modulating specificity in protein-protein binding partners (24)(25)(26) or modeling conformational ensembles of a single protein (27). We instead focused here on design of an antibody against a large ensemble of targets. ...
Article
Influenza is a yearly threat to global public health. Rapid changes in influenza surface proteins resulting from antigenic drift and shift events make it difficult to readily identify antibodies with broadly neutralizing activity against different influenza subtypes with high frequency, specifically antibodies targeting the receptor binding domain (RBD) on influenza HA protein. We developed an optimized computational design method that is able to optimize an antibody for recognition of large panels of antigens. To demonstrate the utility of this multistate design method, we used it to redesign an antiinfluenza antibody against a large panel of more than 500 seasonal HA antigens of the H1 subtype. As a proof of concept, we tested this method on a variety of known antiinfluenza antibodies and identified those that could be improved computationally. We generated redesigned variants of antibody C05 to the HA RBD and experimentally characterized variants that exhibited improved breadth and affinity against our panel. C05 mutants exhibited improved affinity for three of the subtypes used in design by stabilizing the CDRH3 loop and creating favorable electrostatic interactions with the antigen. These mutants possess increased breadth and affinity of binding while maintaining high-affinity binding to existing targets, surpassing a major limitation up to this point.
... Previous studies mutated interfacial amino acids and changed the rigid-body orientation of the colE wt7 /Im wt7 pair to block binding to the wild-type partners 11,12 . These and other computational specificity-design studies yielded at most two orders of magnitude difference in affinity between the newly designed partners and undesired interactions between the designed and wild-type proteins [11][12][13][14][15][16][17][18][19] . ...
Article
Protein networks in all organisms comprise homologous interacting pairs. In these networks, some proteins are specific, interacting with one or a few binding partners, whereas others are multispecific and bind a range of targets. We describe an algorithm that starts from an interacting pair and designs dozens of new pairs with diverse backbone conformations at the binding site as well as new binding orientations and sequences. Applied to a high-affinity bacterial pair, the algorithm results in 18 new ones, with cognate affinities from pico- to micromolar. Three pairs exhibit 3-5 orders of magnitude switch in specificity relative to the wild type, whereas others are multispecific, collectively forming a protein-interaction network. Crystallographic analysis confirms design accuracy, including in new backbones and polar interactions. Preorganized polar interaction networks are responsible for high specificity, thus defining design principles that can be applied to program synthetic cellular interaction networks of desired affinity and specificity.
... However, for multistate protein design with substate ensembles, no exact algorithm exists except an extension of DEE/A*-COMETS (Constrained Optimization of Multi-state Energies by Tree Search) (Hallen and Donald, 2015). Progress has been focused on heuristic or approximation algorithms (Grigoryan et al., 2009;Harbury et al., 1998;Havranek and Harbury, 2003;Leaver-Fay et al., 2011a;Loffler et al., 2017;Negron and Keating, 2013;Sevy et al., 2015). For multistate design, DEE has been extended to type-dependent DEE where only rotamers of the same amino-acid type can prune each other (Yanover et al., 2007). ...
Article
Motivation: Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states with an ensemble of substates for each. It also presents an enormous challenge to exact algorithms that guarantee the optimal solutions and enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design. Results: We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications such as stability, affinity and specificity designs while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN; and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size to exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared with state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity. Availability and implementation: https://shen-lab.github.io/software/iCFN Supplementary information Supplementary data are available at Bioinformatics online.
... Predicting the specificity of protein-protein interactions is more complex than predicting affinity. While specificity prediction requires both positive design (i.e., stabilization of the desired complex) and negative design (i.e., destabilization of unwanted complexes), affinity prediction considers only positive design [10,20]. For instance, computationally saturated mutagenesis and similar classical approaches focus chiefly on single targets (namely, stabilization of the desired complex) and only allow for testing the effects of single mutations [1,3,9,21,22]. ...
Article
Developing selective inhibitors for proteolytic enzymes that share high sequence homology and structural similarity is important for achieving high target affinity and functional specificity. Here, we used a combination of yeast surface display and dual-color selective library screening to obtain selective inhibitors for each of the matrix metalloproteinases (MMPs) MMP14 and MMP9 by modifying the non-specific N-terminal domain of the tissue inhibitor of metalloproteinase-2 (N-TIMP2). We generated inhibitor variants with 30- to 1175-fold improved specificity to each of the proteases, respectively, relative to wild type N-TIMP2. These biochemical results accurately predicted the selectivity and specificity obtained in cell-based assays. In U87MG cells, the activation of MMP2 by MMP14 was inhibited by MMP14-selective blockers but not MMP9-specific inhibitors. Target specificity was also demonstrated in MCF-7 cells stably expressing either MMP14 or MMP9, with only the MMP14-specific inhibitors preventing the mobility of MMP14-expressing cells. Similarly, the mobility of MMP9-expressing cells was inhibited by the MMP9-specific inhibitors, yet was not altered by the MMP14-specific inhibitors. The strategy developed in this study for improving the specificity of an otherwise broad-spectrum inhibitor will likely enhance our understanding of the basis for target specificity of inhibitors to proteolytic enzymes, in general, and to MMPs, in particular. We, moreover, envision that this study could serve as a platform for the development of next-generation, target-specific therapeutic agents. Finally, our methodology can be extended to other classes of proteolytic enzymes and other important target proteins.
... The abstraction processes in computational modeling, as I shall discuss in this paper, are critical for the successful construction of protein-protein interfaces. Advances in computational tools have greatly improved the ability to enhance protein affinity and alter specificity in protein construction (Havranek and Harbury 2002; Karanicolas and Kuhlman 2009). A well-designed computational algorithm may facilitate compound modification for lead structure optimization (Hartenfeller and Schneider 2011). ...
Article
Computational modeling is one of the primary approaches to constructing protein–protein interfaces in the laboratory. Algorithm-driven computational protein design has been successfully applied to the construction of functional proteins with improved binding affinity and increased thermostability. It is intriguing how a computational protein modeling approach can construct and shape the reality of new functional proteins from scratch. I articulate an account of abstraction and exploration-driven strategies in this computational endeavor. I aim to show how a computational modeling approach, which is laden with mathematics and algorithms, can have a constructive force on the target protein.
... Structural moieties that determine specific interactions are often termed positive- and negative-design elements [63,64]. In the context of our work, Gα residues that contribute favorably to interactions with RGS2 act as positive-design elements, while Gα residues that perturb favorable interactions with RGS2 can be considered as negative-design elements. ...
Article
Regulators of G protein Signaling (RGS) proteins inactivate Gα subunits, thereby controlling G protein-coupled signaling networks. Among all RGS proteins, RGS2 is unique in interacting only with the Gαq and not with the Gαi sub-family. Previous studies suggested that this specificity is determined by the RGS domain, and in particular by three RGS2-specific residues that lead to a unique mode of interaction with Gαq. This interaction was further proposed to act through contacts with the Gα GTPase domain. Here, we combined energy calculations and GTPase activity measurements to determine which Gα residues dictate specificity toward RGS2. We identified putative specificity-determining residues in the Gα helical domain, which among G proteins is found only in Gα subunits. Replacing these helical domain residues in Gαi with their Gαq counterparts resulted in a dramatic specificity-switch towards RGS2. We further show that Gα-RGS2 specificity is set by Gαi residues that perturb interactions with RGS2, and by Gαq residues that enhance these interactions. These results show, for the first time, that the Gα helical domain is central to dictating specificity towards RGS2, suggesting this domain plays a general role in governing Gα-RGS specificity. Our insights provide new options for manipulating RGS-G protein interactions in vivo, for better understanding of their "wiring" into signaling networks, and for devising novel drugs targeting such interactions.
... Additionally, to be useful in cell-engineering applications, the designed proteins should retain the properties of signaling switches and signal only upon receiving extracellular ligand agonist stimuli (21,22). To achieve these goals, we further developed our homology modeling (23,24) and design (25,26) techniques (SI Appendix, Supplementary Methods, Supplementary Discussion, and Fig. S1). ...
Article
Significance Membrane receptors sense and translate extracellular signals into specific intracellular functions and represent powerful molecules for biomedicine and synthetic biology. However, the computational design of receptors with novel signaling functions remains challenging because receptors often couple to multiple intracellular proteins with poorly characterized interactions. We developed a computational approach to model and rationally engineer orthogonal receptor-intracellular protein pairs that bind and signal with high selectivity. The orthogonal receptor-effector systems coupled with high efficiency and triggered the intended cellular functions without interfering with natural systems. The designed proteins displayed key, distinct sequence motifs when compared with native proteins, which expanded the alphabet of receptor-effector recognition. This design approach can be used to reprogram cellular functions in cell-engineering applications.
... Looking beyond the RGS family, specific protein-protein interactions between families of signaling proteins are crucial for the wiring of signaling networks. Useful terms to define structural elements that determine such specific interactions were coined by the protein design field: "Positive design elements" stabilize favorable interactions that strengthen particular protein pairings, whereas "Negative design elements" introduce unfavorable interactions that limit selected interactions between some family members (66,67). In particular, negative design elements are critical specificity determinants among well-studied examples in protein-protein interactions, such as heterodimeric coiled-coil pairs (68,69), colicin-immunity protein interactions (70,71), β-lactamase and its protein inhibitors (72), and Bcl-2 receptors binding to BH3-only proteins (73). ...
Article
Understanding the molecular basis of interaction specificity between RGS (regulator of G protein signaling) proteins and heterotrimeric (αβγ) G proteins would enable the manipulation of RGS-G protein interactions, explore their functions, and effectively target them therapeutically. RGS proteins are classified into four subfamilies (R4, R7, RZ, and R12) and function as negative regulators of G protein signaling by inactivating Gα subunits. We found that the R12 subfamily members RGS10 and RGS14 had lower activity than most R4 subfamily members toward the Gi subfamily member Gαo. Using structure-based energy calculations with multiple Gα-RGS complexes, we identified R12-specific residues in positions that are predicted to determine the divergent activity of this subfamily. This analysis predicted that these residues, which we call “disruptor residues,” interact with the Gα helical domain. We engineered the R12 disruptor residues into the RGS domains of the high-activity R4 subfamily and found that these altered proteins exhibited reduced activity toward Gαo. Reciprocally, replacing the putative disruptor residues in RGS18 (a member of the R4 subfamily that exhibited low activity toward Gαo) with the corresponding residues from a high-activity R4 subfamily RGS protein increased its activity toward Gαo. Furthermore, the high activity of the R4 subfamily toward Gαo was independent of the residues in the homologous positions to the R12 subfamily and RGS18 disruptor residues. Thus, our results suggest that the identified RGS disruptor residues function as negative design elements that attenuate RGS activity for specific Gα proteins.
... Molecular inverse design beyond the purview of drug design has of late enjoyed increasing popularity. Examples can be found in protein design [1][2][3][4] or high-hyperpolarizability materials [5][6][7]. The design problem is complicated by the vastness of possible chemicals, termed chemical space. ...
Article
Finding optimal solutions to design problems in chemistry is hampered by the combinatorially large search space. We develop a general theoretical framework for finding chemical compounds with prescribed properties using nuclear charge distributions. The key is the reformulation of the design problem into an optimization problem on probability density functions in chemical space. In order to achieve tractability, a constrained search formalism on the nuclear charge distributions, which are non-negative, is used to reduce the dimensionality of the problem. Furthermore, we introduce approximations to the exact functional, as derived, for common design properties and constraints.
Article
Activating transcription factor 3 (ATF3) is an activation transcription factor/cyclic adenosine monophosphate (cAMP) responsive element-binding (CREB) protein family member. It is recognized as an important regulator of cancer progression by repressing expression of key inflammatory factors such as interferon-γ and chemokine (C–C motif) ligand 4 (CCL4). Here, we describe a novel library screening approach that probes individual leucine zipper components before combining them to search exponentially larger sequence spaces not normally accessible to intracellular screening. To do so, we employ two individual semirational library design approaches and screen using a protein-fragment complementation assay (PCA). First, a 248,832-member library explored 12 amino acid options at each of the five a positions to identify those that provided improved binding, with all e/g positions fixed as Q, placing selection pressure onto the library options provided. Next, a 59,049-member library probed all ten e/g positions with 3 options each. Similarly, during e/g library screening, a positions were locked into a generically bindable sequence pattern (AIAIA), weakly favoring leucine zipper formation, while placing selection pressure onto the e/g options provided. The combined a/e/g library represents ∼14.7 billion members, with the resulting peptide, ATF3W_aeg, binding ATF3 with high affinity (Tm = 60 °C; Kd = 151 nM) while strongly disfavoring homodimerization. Moreover, ATF3W_aeg is notably improved over component PCA hits, with target specificity found to be driven predominantly by electrostatic interactions. The combined a/e/g exponential library screening approach provides a robust, accelerated platform for exploring larger peptide libraries, toward derivation of potent yet selective antagonists that avoid homoassociation to provide new insight into rational peptide design.
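The library sizes quoted in this abstract follow from simple combinatorics. The short check below assumes that each of the five a positions samples 12 residue options and each of the ten e/g positions samples 3 options; the variable names are illustrative only.

```python
# Options per position and number of positions, as read from the abstract.
a_options, a_positions = 12, 5
eg_options, eg_positions = 3, 10

a_library = a_options ** a_positions      # a-position library
eg_library = eg_options ** eg_positions   # e/g-position library
combined = a_library * eg_library         # full a/e/g sequence space

print(a_library)   # 248832
print(eg_library)  # 59049
print(combined)    # 14693280768, i.e. about 14.7 billion
```

Screening the two component libraries separately requires sampling roughly 3 × 10⁵ members in total, whereas the combined space is about five orders of magnitude larger.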
Article
Biomolecular recognition usually leads to the formation of binding complexes, often accompanied by large-scale conformational changes. This process is fundamental to biological functions at the molecular and cellular levels. Uncovering the physical mechanisms of biomolecular recognition and quantifying the key biomolecular interactions are vital to understand these functions. The recently developed energy landscape theory has been successful in quantifying recognition processes and revealing the underlying mechanisms. Recent studies have shown that in addition to affinity, specificity is also crucial for biomolecular recognition. The proposed physical concept of intrinsic specificity based on the underlying energy landscape theory provides a practical way to quantify the specificity. Optimization of affinity and specificity can be adopted as a principle to guide the evolution and design of molecular recognition. This approach can also be used in practice for drug discovery using multidimensional screening to identify lead compounds. The energy landscape topography of molecular recognition is important for revealing the underlying flexible binding or binding-folding mechanisms. In this review, we first introduce the energy landscape theory for molecular recognition and then address four critical issues related to biomolecular recognition and conformational dynamics: (1) specificity quantification of molecular recognition; (2) evolution and design in molecular recognition; (3) flexible molecular recognition; (4) chromosome structural dynamics. The results described here and the discussions of the insights gained from the energy landscape topography can provide valuable guidance for further computational and experimental investigations of biomolecular recognition and conformational dynamics.
Chapter
Scoring functions for protein-ligand interactions are used to recognize the “native” binding pose of a ligand on the protein and to predict the binding affinity, so that active small molecules can be discriminated from inactive ones. Scoring functions are widely used in computational molecular docking and structure-based drug discovery. The development and improvement of scoring functions have broad implications for the pharmaceutical industry and academic research. During the past three decades, much progress has been made in methodology and accuracy for scoring functions, and many successful cases have been witnessed in virtual database screening. In this chapter, the authors introduce the basic types of scoring functions and their derivations, the commonly used evaluation methods and benchmarks, as well as the underlying challenges and current solutions. Finally, the authors discuss promising directions to improve and develop scoring functions for future molecular docking-based drug discovery.
Article
Computational protein design relies on simulations of a protein structure, where selected amino acids can mutate randomly, and mutations are selected to enhance a target property, such as stability. Often, the protein backbone is held fixed and its degrees of freedom are modeled implicitly to reduce the complexity of the conformational space. We present a hybrid method where short molecular dynamics (MD) segments are used to explore conformations and alternate with Monte Carlo (MC) moves that apply mutations to side chains. The backbone is fully flexible during MD. As a test, we computed side chain acid/base constants or pKa’s in five proteins. This problem can be considered a special case of protein design, with protonation/deprotonation playing the role of mutations. The solvent was modeled as a dielectric continuum. Due to cost, in each protein we allowed just one side chain position to change its protonation state and the other position to change its type or mutate. The pKa’s were computed with a standard method that scans a range of pH values and with a new method that uses adaptive landscape flattening (ALF) to sample all protonation states in a single simulation. The hybrid method gave notably better accuracy than standard, fixed-backbone MC. ALF decreased the computational cost by a factor of 13.
Article
The concentration of each protein in the cellular heterogeneous mixture is precisely regulated for function and to prevent aggregation. The regulation follows a simple linear law that is counter-intuitive considering the combinatorial number of potential protein-protein interactions. With computer simulations and experiments, we prove that in a protein mixture, folding takes place undisturbed up to the aggregation concentration of each isolated species. Our results open new possibilities to understand the evolutionary physiology of cells because they imply that proteins can be optimised to fold and regulated independently of the other proteins in the cell. Moreover, our findings suggest that our protein design protocol generates sequences that are not prone to aggregation and ease the requirement for negative design procedures.
Article
Protein design, also known as the inverse protein folding problem, is the identification of a protein sequence that folds into a target protein structure. Protein design has been proven to be an NP-hard problem. While researchers are working on designing heuristics with an emphasis on new scoring functions, we propose a replica-exchange Monte Carlo (REMC) search algorithm that ensures faster convergence using a greedy strategy. Using biological insights, we construct an evolutionary profile to encode the amino acid variability in different positions of the target protein from its structural homologs. The evolutionary profile guides the REMC, and the greedy approach confirms appreciable exploration and exploitation of the sequence-structure fitness surface. We allow termination of a simulation trajectory once a stagnant situation is detected. A series of sequence- and structure-level validations establish the goodness of our design. On a benchmark dataset, our algorithm reports an average root-mean-square deviation of 1.21 between the target and the design proteins when modeled with protein folding software. Besides, our algorithm assures a 6.16-fold overall speedup. In molecular dynamics simulations, we observe that four out of five selected design proteins show better or comparable stability relative to the corresponding target proteins.
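For readers unfamiliar with replica exchange, the sketch below shows the standard Metropolis criterion for swapping configurations between two replicas at different inverse temperatures. It illustrates the generic technique only and is not the greedy, profile-guided variant described in the abstract; the function names and the example values are assumptions.

```python
import math
import random

def swap_probability(beta_i, energy_i, beta_j, energy_j):
    """Standard replica-exchange acceptance probability.

    P = min(1, exp[(beta_i - beta_j) * (energy_i - energy_j)])
    where beta = 1 / (k_B * T) for each replica.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(beta_i, energy_i, beta_j, energy_j, rng=random):
    """Return True if the two replicas should exchange configurations."""
    return rng.random() < swap_probability(beta_i, energy_i, beta_j, energy_j)

# A cold replica (beta = 1.0) holding a high-energy state always swaps
# with a hot replica (beta = 0.5) holding a lower-energy state.
print(swap_probability(1.0, 5.0, 0.5, 2.0))
# The reverse exchange is accepted only with probability exp(-1.5).
print(swap_probability(1.0, 2.0, 0.5, 5.0))
```

The criterion lets low-energy sequences found at high temperature migrate to the cold replicas, which is what allows the search to escape local minima.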
Article
The human fibroblast growth factor-2 (FGF-2) highly expressed in tumors is an important factor to promote tumor angiogenesis and lymphangiogenesis. A disulfide-stabilized diabody (ds-Diabody) could specifically target FGF-2 and show its advantages in inhibition of tumor angiogenesis and growth. It is very important for antibody drugs to confirm the fine epitope. Here, theoretical structure models of FGF-2 and antibody were built by homology modeling. The amino acid residues in the interaction interface of antigen and antibody were analyzed by molecular docking. The potential epitope was predicted by homology modeling and molecular docking of antigen-antibody and site-directed mutation assays of alanine scanning. The predicted epitope was verified by antigen mutagenesis and enzyme-linked immunosorbent assay (ELISA). The epitope mapping assay showed that the epitope of ds-Diabody against FGF-2 was defined by the discontinuous sites including six amino acid residues (P23, Q65, R69, G70, Y82 and R118). The results showed that the epitope was localized in the interaction interface of FGF-2 and ds-Diabody. The fine epitope mapping provided the important information for understanding the inhibition activity of ds-Diabody against FGF-2 and helping in the further development of ds-Diabody against FGF-2 as a potentially promising antibody drug for future cancer therapy.
Article
T cell receptors (TCRs) have emerged as a new class of immunological therapeutics. However, though antigen specificity is a hallmark of adaptive immunity, TCRs themselves do not possess the high specificity of monoclonal antibodies. Although a necessary function of T cell biology, the resulting cross-reactivity presents a significant challenge for TCR-based therapeutic development, as it creates the potential for off-target recognition and immune toxicity. Efforts to enhance TCR specificity by mimicking the antibody maturation process and enhancing affinity can inadvertently exacerbate TCR cross-reactivity. Here we demonstrate this concern by showing that even peptide-targeted mutations in the TCR can introduce new reactivities against peptides that bear similarity to the original target. To counteract this, we explored a novel structure-guided approach for enhancing TCR specificity independent of affinity. Tested with the MART-1-specific TCR DMF5, our approach had a small but discernible impact on cross-reactivity toward MART-1 homologs yet was able to eliminate DMF5 cross-recognition of more divergent, unrelated epitopes. Our study provides a proof of principle for the use of advanced structure-guided design techniques for improving TCR specificity, and it suggests new ways forward for enhancing TCRs for therapeutic use.
Article
Molecular recognition is a critical process for many biological functions and consists in non-covalent binding of different molecules, such as protein-protein, antigen-antibody and many others. The host-guest molecules involved often show a shape complementarity, and one of the leading requirements for molecular recognition is that the interaction must be selective, i.e. the host should strongly bind to one selected guest and poorly, if at all, to all other biomolecules. Our work focuses on the role played by the chemical heterogeneity and the steric compatibility on the selectivity power of the binding site between two proteins. We tackle the problem computationally, reducing the complexity of the system by simulating a protein and a surface-like element, that shapes part of the protein and represents the binding site of an interaction partner. We investigate four systems, differing in terms of binding site size. A significant result is that, despite the fact that protein and surface chemical sequences are interdependent and simultaneously generated to stabilise the bound folded structure, the protein is stable in the folded conformation even in the absence of the surface-like partner for all investigated systems. We observe that an increase of the surface area results in a significant increase of the binding affinity. Interestingly, our data suggest the presence of upper and lower limits for the maximum and minimum area size available for a binding site. Our data match the experimental observation of such limits (750–1500 Ų), and provide a rationale for them: the extent of the binding site area is limited by the value of the binding constant. For large contact areas, at physiological conditions, the binding is orders of magnitude stronger (Ka > 10⁴⁰ l/mol) than what is typically observed in natural biological processes. Conversely, the smallest surface tested is just the minimal size to allow for selective binding.
Article
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
Article
A DNA sequence coding for the immunogenic capsid protein VP3 of foot-and-mouth disease virus A12, prepared from the virion RNA, was ligated to a plasmid designed to express a chimeric protein from the Escherichia coli tryptophan promoter-operator system. When Escherichia coli transformed with this plasmid was grown in tryptophan-depleted media, approximately 17 percent of the total cellular protein was found to be an insoluble and stable chimeric protein. The purified chimeric protein competed equally on a molar basis with VP3 for specific antibodies to foot-and-mouth disease virus. When inoculated into six cattle and two swine, this protein elicited high levels of neutralizing antibody and protection against challenge with foot-and-mouth disease virus.
Article
The stability of globular proteins arises largely from the burial of non-polar amino acids in their interior. These residues are efficiently packed to eliminate energetically unfavorable cavities. Contrary to these observations, high resolution X-ray crystallographic analyses of four homologous lipases from filamentous fungi reveal an alpha/beta fold which contains a buried conserved constellation of charged and polar side chains with associated cavities containing ordered water molecules. It is possible that this structural arrangement plays an important role in interfacial catalysis.
Article
Progress in homology modeling and protein design has generated considerable interest in methods for predicting side-chain packing in the hydrophobic cores of proteins. Present techniques are not practically useful, however, because they are unable to model protein main-chain flexibility. Parameterization of backbone motions may represent a general and efficient method to incorporate backbone relaxation into such fixed main-chain models. To test this notion, we introduce a method for treating explicitly the backbone motions of alpha-helical bundles based on an algebraic parameterization proposed by Francis Crick in 1953 [Crick, F. H. C. (1953) Acta Crystallogr. 6, 685-689]. Given only the core amino acid sequence, a simple calculation can rapidly reproduce the crystallographic main-chain and core side-chain structures of three coiled coils (one dimer, one trimer, and one tetramer) to within 0.6-Å root-mean-square deviations. The speed of the predictive method [approximately 3 min per rotamer choice on a Silicon Graphics (Mountain View, CA) 4D/35 computer] permits it to be used as a design tool.
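As an illustration of what an algebraic backbone parameterization looks like, the sketch below evaluates one commonly quoted form of the Crick coiled-coil equations, in which a minor helix of radius R1 winds around a superhelical axis of radius R0. This is a hedged sketch rather than the implementation used in the work above: the sign conventions, the function name, and all numerical values are assumptions, and alpha must be non-zero.

```python
import math

def crick_point(t, r0, w0, r1, w1, alpha, phase1=0.0):
    """Position of one backbone atom on a coiled coil (Crick form).

    t      : residue index (continuous parameter)
    r0, w0 : superhelix radius (A) and angular frequency (rad/residue)
    r1, w1 : minor-helix radius (A) and angular frequency (rad/residue)
    alpha  : pitch angle of the superhelix (rad), must be non-zero
    phase1 : starting phase of the minor helix (rad)
    """
    a0 = w0 * t
    a1 = w1 * t + phase1
    x = (r0 * math.cos(a0) + r1 * math.cos(a0) * math.cos(a1)
         - r1 * math.cos(alpha) * math.sin(a0) * math.sin(a1))
    y = (r0 * math.sin(a0) + r1 * math.sin(a0) * math.cos(a1)
         + r1 * math.cos(alpha) * math.cos(a0) * math.sin(a1))
    z = r0 * w0 * t / math.tan(alpha) - r1 * math.sin(alpha) * math.sin(a1)
    return x, y, z

# Illustrative (assumed) values loosely resembling a left-handed dimer.
params = dict(r0=4.9, w0=-2 * math.pi / 100, r1=2.26,
              w1=4 * math.pi / 7, alpha=math.radians(-12.0))
trace = [crick_point(t, **params) for t in range(7)]
for point in trace:
    print(tuple(round(c, 2) for c in point))
```

Because the whole backbone is generated from a handful of parameters, backbone flexibility can be explored by scanning those parameters instead of moving individual atoms.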
Article
We report a blind test of lattice-model-based search strategies for finding global minima of model protein chains. One of us (E.I.S.) selected 10 compact conformations of 48-mer chains on the three-dimensional cubic lattice and used their inverse folding algorithm to design HP (H, hydrophobic; P, polar) sequences that should fold to those "target" structures. The sequences, but not the structures, were sent to the UCSF group (K.Y., K.M.F., P.D.T., H.S.C., and K.A.D.), who used two methods to attempt to find the globally optimal conformations: "hydrophobic zippers" and a constraint-based hydrophobic core construction (CHCC) method. The CHCC method found global minima in all cases, and the hydrophobic zippers method found global minima in some cases, in minutes to hours on workstations. In 9 out of 10 sequences, the CHCC method found lower energy conformations than the 48-mers were designed to fold to. Thus the search strategies succeed for the HP model but the design strategy does not. For every sequence the global energy minimum was found to have multiple degeneracy with 10³ to 10⁶ conformations. We discuss the implications of these results for (i) searching conformational spaces of simple models of proteins and (ii) how these simple models relate to proteins.
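The HP model energy that such searches minimize is easy to state in code. The sketch below assumes the usual convention of one unit of favorable energy for every pair of H residues that are lattice neighbors but not adjacent along the chain; the function name and the toy four-residue chain are illustrative and not taken from the study.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a cubic lattice.

    sequence : string of 'H' and 'P' characters
    coords   : list of integer (x, y, z) lattice sites, one per residue
    Each non-bonded H-H contact between nearest-neighbor sites adds -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    energy = 0
    for i in range(len(sequence)):
        # Start at i + 2 so that chain neighbors are not counted.
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                # Manhattan distance 1 means adjacent lattice sites.
                dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if dist == 1:
                    energy -= 1
    return energy

# A four-residue chain folded into a square brings its two ends together.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))  # -1
print(hp_energy("HPPP", square))  # 0
```

Since many distinct conformations can share the same number of H-H contacts, a degenerate global minimum of the kind reported above arises naturally in this model.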
Article
A backbone-dependent rotamer library for amino acid side-chains is developed and used for constructing protein side-chain conformations from the main-chain co-ordinates. The rotamer library is obtained from 132 protein chains in the Brookhaven Protein Database. A grid of 20 degrees by 20 degrees blocks for the main-chain angles phi, psi is used in the rotamer library. Significant correlations are found between side-chain dihedral angle probabilities and backbone phi, psi values. These probabilities are used to place the side-chains on the known backbone in test applications for six proteins for which high-resolution crystal structures are available. A minimization scheme is used to reorient side-chains that conflict with the backbone or other side-chains after the initial placement. The initial placement yields 59% of both chi 1 and chi 2 values in the correct position (to within 40 degrees) for thermolysin to 81% for crambin. After refinement the values range from 61% (lysozyme) to 89% (crambin). It is evident from the results that a single protein does not adequately test a prediction scheme. The computation time required by the method scales linearly with the number of side-chains. An initial prediction from the library takes only a few seconds of computer time, while the iterative refinement takes on the order of hours. The method is automated and can easily be applied to aid experimental side-chain determinations and homology modeling. The high degree of correlation between backbone and side-chain conformations may introduce a simplification in the protein folding process by reducing the available conformational space.
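A backbone-dependent rotamer library of this kind can be pictured as a lookup table keyed by (phi, psi) bin. The sketch below assumes 20° bins starting at -180°, which gives an 18 × 18 grid; the bin origin, the function names, the rotamer labels, and the example observations are all hypothetical.

```python
from collections import Counter, defaultdict

def phi_psi_bin(phi, psi, bin_width=20):
    """Map backbone angles in degrees to a (phi_bin, psi_bin) index pair."""
    n_bins = 360 // bin_width

    def index(angle):
        # Shift to [0, 360) and wrap +180 onto the same bin as -180.
        return int((angle + 180.0) // bin_width) % n_bins

    return index(phi), index(psi)

def build_library(observations):
    """Turn (phi, psi, rotamer) observations into per-bin probabilities."""
    counts = defaultdict(Counter)
    for phi, psi, rotamer in observations:
        counts[phi_psi_bin(phi, psi)][rotamer] += 1
    return {
        bin_key: {rot: n / sum(c.values()) for rot, n in c.items()}
        for bin_key, c in counts.items()
    }

# Three made-up observations that fall in the same helical bin.
library = build_library([
    (-60.0, -45.0, "g+"),
    (-58.0, -42.0, "g+"),
    (-55.0, -50.0, "t"),
])
print(phi_psi_bin(-60.0, -45.0))  # (6, 6)
print(library[(6, 6)])
```

Placing a side chain then amounts to binning the residue's backbone angles and choosing the most probable rotamer recorded for that bin.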
Article
This review examines protein complexes in the Brookhaven Protein Databank to gain a better understanding of the principles governing the interactions involved in protein-protein recognition. The factors that influence the formation of protein-protein complexes are explored in four different types of protein-protein complexes--homodimeric proteins, heterodimeric proteins, enzyme-inhibitor complexes, and antibody-protein complexes. The comparison between the complexes highlights differences that reflect their biological roles.
Article
A series of synthetic receptors capable of binding to the calmodulin-binding domain of calcineurin (CN393-414) was designed, synthesized and characterized. The design was accomplished by docking CN393-414 against a two-helix receptor, using an idealized three-stranded coiled coil as a starting geometry. The sequence of the receptor was chosen using a side-chain re-packing program, which employed a genetic algorithm to select potential binders from a total of 7.5 × 10⁶ possible sequences. A total of 25 receptors were prepared, representing 13 sequences predicted by the algorithm as well as 12 related sequences that were not predicted. The receptors were characterized by CD spectroscopy, analytical ultracentrifugation, and binding assays. The receptors predicted by the algorithm bound CN393-414 with apparent dissociation constants ranging from 0.2 μM to >50 μM. Many of the receptors that were not predicted by the algorithm also bound to CN393-414. Methods to circumvent this problem and to improve the automated design of functional proteins are discussed.
Article
Recent advances in computational techniques have allowed the design of precise side-chain packing in proteins with predetermined, naturally occurring backbone structures. Because these methods do not model protein main-chain flexibility, they lack the breadth to explore novel backbone conformations. Here the de novo design of a family of α-helical bundle proteins with a right-handed superhelical twist is described. In the design, the overall protein fold was specified by hydrophobic-polar residue patterning, whereas the bundle oligomerization state, detailed main-chain conformation, and interior side-chain rotamers were engineered by computational enumerations of packing in alternate backbone structures. Main-chain flexibility was incorporated through an algebraic parameterization of the backbone. The designed peptides form α-helical dimers, trimers, and tetramers in accord with the design goals. The crystal structure of the tetramer matches the designed structure in atomic detail.
Article
It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.
Article
The tractability of many algorithms for determining the energy state of a system depends on the pairwise nature of an energy expression. Some energy terms, such as the standard implementation of the van der Waals potential, satisfy this criterion whereas others do not. One class of important potentials that are not pairwise involves benefits and penalties for burying hydrophobic and/or polar surface areas. It has been found previously that, in some cases, a pairwise approximation to these surface areas correlates with the true surface areas. We set out to generalize the applicability of this approximation. Results: We develop a pairwise expression with one scalable parameter that closely reproduces both the true buried and the true exposed solvent-accessible surface areas. We then refit our previously published coiled-coil stability data to give solvation parameters of 26 cal/mol Å^2 favoring hydrophobic burial and 100 cal/mol Å^2 opposing polar burial. Conclusions: An accurate pairwise approximation to calculate exposed and buried protein solvent-accessible surface area is achieved.
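The two solvation parameters quoted above translate directly into a burial energy once the buried areas are known. The sketch below applies them to areas supplied by the caller; it does not implement the pairwise area approximation itself, and the function name and example areas are assumptions.

```python
# Solvation parameters from the abstract, in cal/(mol * A^2).
HYDROPHOBIC_BURIAL = -26.0  # burying hydrophobic surface is favorable
POLAR_BURIAL = 100.0        # burying polar surface is penalized

def burial_energy(hydrophobic_area, polar_area):
    """Solvation energy in cal/mol for buried surface areas given in A^2."""
    return HYDROPHOBIC_BURIAL * hydrophobic_area + POLAR_BURIAL * polar_area

# Burying 100 A^2 of hydrophobic and 10 A^2 of polar surface.
print(burial_energy(100.0, 10.0))  # -1600.0
```

With these values, burying a given area of polar surface costs nearly four times what burying the same area of hydrophobic surface gains.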
Article
We have solved, refined, and analyzed the 2.0-Å resolution crystal structure of a 1:1 complex between the bac[...] a certain degree of flexibility within the barnase active site is required to allow for the structural differences between barnase-barstar binding and barnase-RNA binding. A comparison between the bound and the free barstar structure shows that the overall structural response to barnase binding is significant. This response can be best described as outwardly oriented, rigid-body movements of the four α-helices of barstar, resulting in the structure of bound barstar being somewhat expanded. Understanding the nature of the recognition between two
Article
A method and parametrization scheme which allow fast and accurate calculations of hydration free energies are described. The solute is treated as a polarizable cavity of a shape defined by the molecular surface, containing point charges at the location of atomic nuclei. Electrostatic contributions to solvation are derived from finite difference solutions of the Poisson equation (FDPB method). Nonpolar (cavity/van der Waals) energies are added as a surface area dependent term, with a single surface tension coefficient (gamma) derived from hydrocarbon solubility in water. Atomic charges and radii are obtained by modifying existing force-field or quantum-mechanically-derived values, by fitting to experimental solvation energies of small organic molecules. A new, simple parameter set (parameters for solvation energy, PARSE) is developed specifically for the FDPB/gamma method, by choosing atomic charges and radii which reproduce the estimated contributions to solvation of simple functional groups. The PARSE parameters reproduce hydration free energies for a test set of 67 molecules with an average error of 0.4 kcal/mol. For amino acid side chain and peptide backbone analogs the average error is only 0.1 kcal/mol.
Article
A complete set of intermolecular potential functions has been developed for use in computer simulations of proteins in their native environment. Parameters are reported for 25 peptide residues as well as the common neutral and charged terminal groups. The potential functions have the simple Coulomb plus Lennard-Jones form and are compatible with the widely used models for water, TIP4P, TIP3P, and SPC. The parameters were obtained and tested primarily in conjunction with Monte Carlo statistical mechanics simulations of 36 pure organic liquids and numerous aqueous solutions of organic ions representative of subunits in the side chains and backbones of proteins. Bond stretch, angle bend, and torsional terms have been adopted from the AMBER united-atom force field. As reported here, further testing has involved studies of conformational energy surfaces and optimizations of the crystal structures for four cyclic hexapeptides and a cyclic pentapeptide. The average root-mean-square deviation from the X-ray structures of the crystals is only 0.17 Å for the atomic positions and 3% for the unit cell volumes. A more critical test was then provided by performing energy minimizations for the complete crystal of the protein crambin, including 182 water molecules that were initially placed via a Monte Carlo simulation. The resultant root-mean-square deviation for the non-hydrogen atoms is still ca. 0.2 Å and the variation in the errors for charged, polar, and nonpolar residues is small. Improvement is apparent over the AMBER united-atom force field which has previously been demonstrated to be superior to many alternatives.
Article
The construction and characterization of a series of proteins in which the Blue Copper CysHis2Met primary coordination sphere was placed in various orientations within the hydrophobic core of thioredoxin has allowed exploration of the principles of molecular recognition between proteins and metals. An automated rational protein design algorithm predicted structurally suitable locations for these centers without use of either a potential preexisting binding site or any structural or sequence homology between the thioredoxin host and known Blue Copper proteins. A series of four primary designs and 32 variants were constructed. It was necessary to surround the designed primary coordination sphere with a hydrophobic shell to ensure the absence of potential alternative coordinating residues. Formation of a stable Cu(II)-thiolate bond required destabilization of a normally favored redox reaction in which the thiol is oxidized to a disulfide. This was achieved by more deeply burying the coordinating cysteine, presumably via a mechanism in which the free energy of protein unfolding opposes the competing redox reaction. The distorted tetrahedral coordination geometry of the Cu(II) complex is unstable with respect to a competing tetragonal geometry resulting from incorporation of bound water. Although natural systems appear to sterically exclude such water binding, this exclusion mechanism was not successfully reproduced in the designs presented here. Instead, a suitably placed small cavity allowed a strong, exogenous ligand, such as azide, to be introduced axially, which competitively stabilizes the tetrahedral geometry corresponding to a “Type 1.5” Blue Copper complex in favor of the tetragonally bound water. 
This iterative rational design study demonstrates that destabilization of competing reactions (“negative design”) is a crucial, if cryptic, aspect of molecular recognition in proteins, and that proteins have evolved a variety of mechanisms that impose negative design constraints.
Article
The Random Energy Model of statistical physics is applied to the problem of the specificity of recognition between two biological (macro)molecules forming a non-covalent complex. In this model, the native mode of association is separated by an energy gap from a large body of non-native modes. Whereas the native mode is unique, the non-native modes form an energy spectrum which is approximated by a gaussian distribution. Specificity can then be estimated by writing the partition function and calculating the ratio r of non-native to native modes at thermodynamic equilibrium. We examine three situations: (i) recognition in the absence of a competitor; (ii) recognition in the presence of a competing ligand; (iii) recognition in a heterogeneous mixture. We derive the dependence of the ratio r on temperature and on the concentration of competing ligands, and we estimate the effect of a local perturbation such as can result from a point mutation. Cases (i) and (iii) are modeled by docking experiments in the computer. In case (iii), which is representative of a wide variety of biological situations, we show that increasing the heterogeneity of a mixture affects the specificity of recognition, even when the concentration of competing species is kept constant. © 1996 Wiley-Liss, Inc.
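The ratio r described above can be sketched numerically for the simplest case of a single native mode and no competitor. The sketch assumes the standard annealed Gaussian average for the non-native spectrum, which only holds above the model's freezing temperature; the function name and parameter names are illustrative, and the abstract does not give the paper's own expressions.

```python
import math


def nonnative_to_native_ratio(n_modes, gap, sigma, kT):
    """Ratio r of non-native to native populations at equilibrium.

    Assumes n_modes non-native modes with Gaussian-distributed energies
    (mean 0, standard deviation sigma) and one native mode at energy -gap.
    Uses the annealed average <exp(-E/kT)> = exp(sigma^2 / (2 kT^2)).
    """
    beta = 1.0 / kT
    # log r = log(N) + (beta*sigma)^2 / 2 - beta*gap
    log_r = math.log(n_modes) - beta * gap + 0.5 * (beta * sigma) ** 2
    return math.exp(log_r)


if __name__ == "__main__":
    # Hypothetical numbers: energies in units where kT = 1 at the reference temperature.
    for kT in (0.8, 1.0, 1.5):
        r = nonnative_to_native_ratio(n_modes=1000, gap=10.0, sigma=1.0, kT=kT)
        print(f"kT = {kT:.1f}  r = {r:.4g}")
```

In this form a larger gap lowers r, while more non-native modes or a broader spectrum raise it.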
Article
VMD is a molecular graphics program designed for the display and analysis of molecular assemblies, in particular biopolymers such as proteins and nucleic acids. VMD can simultaneously display any number of structures using a wide variety of rendering styles and coloring methods. Molecules are displayed as one or more "representations," in which each representation embodies a particular rendering method and coloring scheme for a selected subset of atoms. The atoms displayed in each representation are chosen using an extensive atom selection syntax, which includes Boolean operators and regular expressions. VMD provides a complete graphical user interface for program control, as well as a text interface using the Tcl embeddable parser to allow for complex scripts with variable substitution, control loops, and function calls. Full session logging is supported, which produces a VMD command script for later playback. High-resolution raster images of displayed molecules may be produced by generating input scripts for use by a number of photorealistic image-rendering applications. VMD has also been expressly designed with the ability to animate molecular dynamics (MD) simulation trajectories, imported either from files or from a direct connection to a running MD simulation. VMD is the visualization component of MDScope, a set of tools for interactive problem solving in structural biology, which also includes the parallel MD program NAMD, and the MDCOMM software used to connect the visualization and simulation programs. VMD is written in C++, using an object-oriented design; the program, including source code and extensive documentation, is freely available via anonymous ftp and through the World Wide Web.
Article
A structure-based, sequence-design procedure is proposed in which one considers a set of decoy structures that compete significantly with the target structure in being low energy conformations. The decoy structures are chosen to have strong overlaps in contacts with the putative native state. The procedure allows the design of sequences with large and small stability gaps in a random-bond heteropolymer model in both two and three dimensions by an appropriate assignment of the contact energies to both the native and nonnative contacts. The design procedure is also successfully applied to the two-dimensional HP model.
Article
A combination of enzyme kinetics and X-ray crystallographic analysis of site-specific mutants has been used to probe the determinants of substrate specificity for the enzyme alpha-lytic protease. We now present a generalized model for understanding the effects of mutagenesis on enzyme substrate specificity. This algorithm uses a library of side-chain rotamers to sample conformation space within the binding site for the enzyme-substrate complex. The free energy of each conformation is evaluated with a standard molecular mechanics force field, modified to include a solvation energy term. This rapid energy calculation based on coarse conformation sampling quite accurately predicts the relative catalytic efficiency of over 40 different alpha-lytic protease-substrate combinations. Unlike other computational approaches, with this method it is feasible to evaluate all possible mutations within the binding site. Using this algorithm, we have successfully designed a protease that is both highly active and selective for a non-natural substrate. These encouraging results indicate that it is possible to design altered enzymes solely on the basis of empirical energy calculations.
Article
Oligonucleotide-directed mutagenesis is a widely used procedure for studying the structure and function of DNA and the macromolecules for which it codes. The most commonly used strategy for site-directed mutagenesis is to clone the segment of DNA to be mutated into a vector whose DNA can be obtained in single-stranded form. An oligonucleotide partially complementary to the region to be altered, but containing the mutation to be introduced, is hybridized to the single-stranded DNA. A complementary strand is synthesized by DNA polymerase using the oligonucleotide as a primer. The efficiency of site-directed mutagenesis, that is, the proportion of progeny containing the desired sequence alteration, depends on the quality of each of the steps in the procedure. The number of progeny clones that must be monitored to obtain the desired mutant increases as the efficiency of mutagenesis decreases.
Article
We assume that each class of protein has a core structure that is defined by internal residues, and that the external, solvent-contacting residues contribute to the stability of the structure, are of primary importance to function, but do not determine the architecture of the core portions of the polypeptide chain. An algorithm has been developed to supply a list of permitted sequences of internal residues compatible with a known core structure. This list is referred to as the tertiary template for that structure. In general the positions in the template are not sequentially adjacent and are distributed throughout the polypeptide chain. The template is derived using the fixed positions for the main-chain and beta-carbon atoms in the test structure and selected stereochemical rules. The focus of this paper is on the use of two packing criteria: avoidance of steric overlap and complete filling of available space. The program also notes potential polar group interactions and disulfide bonds as well as possible burial of formal charges. Central to the algorithm is the side-chain rotamer library. In an update of earlier studies by others, we show that 17 of the 20 amino acids (omitting Met, Lys and Arg) can be represented adequately by 67 side-chain rotamers. A list of chi angles and their standard deviations is given. The newer, high-resolution, refined structures in the Brookhaven Protein Data Bank show similar mean chi values, but have much smaller deviations than those of earlier studies. This suggests that a rotamer library may be a better structural approximation than was previously thought. In using packing constraints, it has been found essential to include all hydrogen atoms specifically. The "unified atom" representation is not adequate. The permitted rotamer sequences are severely restricted by the main-chain plus beta-carbon atoms of the test structure. 
Further restriction is introduced if the full set of atoms of the external residues are held fixed, the full-chain model. The space-filling requirement has a major role in restricting the template lists. The preliminary tests reported here make it appear likely that templates prepared from the currently known core structures will be able to discriminate between these structures. The templates should thus be useful in deciding whether a sequence of unknown tertiary structure fits any of the known core classes and, if a fit is found, how the sequence should be aligned in three dimensions to fit the core of that class.(ABSTRACT TRUNCATED AT 400 WORDS)
Article
A discontinuous sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) system for the separation of proteins in the range from 1 to 100 kDa is described. Tricine, used as the trailing ion, allows a resolution of small proteins at lower acrylamide concentrations than in glycine-SDS-PAGE systems. A superior resolution of proteins, especially in the range between 5 and 20 kDa, is achieved without the necessity to use urea. Proteins above 30 kDa are already destacked within the sample gel. Thus a smooth passage of these proteins from sample to separating gel is warranted and overloading effects are reduced. This is of special importance when large amounts of protein are to be loaded onto preparative gels. The omission of glycine and urea prevents disturbances which might occur in the course of subsequent amino acid sequencing.
Article
The products of the nuclear oncogenes fos and jun are known to form heterodimers that bind to DNA and modulate transcription. Both proteins contain a leucine zipper that is important for heterodimer formation. Peptides corresponding to these leucine zippers were synthesized. When mixed, these peptides preferentially form heterodimers over homodimers by at least 1000-fold. Both homodimers and the heterodimer are parallel alpha helices. The leucine zipper regions from Fos and Jun therefore correspond to autonomous helical dimerization sites that are likely to be short coiled coils, and these regions are sufficient to determine the specificity of interaction between Fos and Jun. The Fos leucine zipper forms a relatively unstable homodimer. Instability of homodimers provides a thermodynamic driving force for preferential heterodimer formation.
Article
Buried polar residues are a common feature of natural proteins. ACID-p1 and BASE-p1 are two designed peptides that form a parallel, heterodimeric coiled coil with a fixed tertiary structure [O'Shea, E. K., Lumb, K. J., & Kim, P. S. (1993) Curr. Biol. 3, 658-667]. The interface between the ACID-p1 and BASE-p1 helices consists of hydrophobic Leu residues, with the exception of a single polar residue, Asn 14. In the crystal structure of the GCN4 leucine zipper coiled coil, an analogous Asn is hydrogen bonded to the corresponding Asn of the opposing helix, thereby forming a buried polar interaction in an otherwise hydrophobic interface between the helices [O'Shea, E. K., Klemm, J. D., Kim, P. S., & Alber, T. (1991) Science 254, 539-544]. This buried polar interaction in the ACID-p1/BASE-p1 heterodimer was removed by substituting Asn 14 with Leu. The Asn 14→Leu variants are significantly more stable than the p1 peptides and preferentially form a heterotetramer instead of a heterodimer. Strikingly, the heterotetramer does not fold into a unique structure; in particular, the helices lack a unique orientation. Thus, the Asn 14 residue imparts specificity for formation of a two-stranded, parallel coiled coil at the expense of stability. The results suggest that, whereas nonspecific hydrophobic interactions contribute to protein stability, the requirement to satisfy the hydrogen bonding potential of buried polar residues in the generally hydrophobic environment of the protein interior can impart specificity (structural uniqueness) to protein folding and design.
Article
Enzymes are thought to use their ordered structures to facilitate catalysis. A corollary of this theory suggests that enzyme residues involved in function are not optimized for stability. We tested this hypothesis by mutating functionally important residues in the active site of T4 lysozyme. Six mutations at two catalytic residues, Glu-11 and Asp-20, abolished or reduced enzymatic activity but increased thermal stability by 0.7-1.7 kcal/mol. Nine mutations at two substrate-binding residues, Ser-117 and Asn-132, increased stability by 1.2-2.0 kcal/mol, again at the cost of reduced activity. X-ray crystal structures show that the substituted residues complement regions of the protein surface that are used for substrate recognition in the native enzyme. In two of these structures the enzyme undergoes a general conformational change, similar to that seen in an enzyme-product complex. These results support a relationship between stability and function for T4 lysozyme. Other evidence suggests that the relationship is general.
Article
Rational design of protein structure requires the identification of optimal sequences to carry out a particular function within a given backbone structure. A general solution to this problem requires that a potential function describing the energy of the system as a function of its atomic coordinates be minimized simultaneously over all available sequences and their three-dimensional atomic configurations. Here we present a method that explicitly minimizes a semiempirical potential function simultaneously in these two spaces, using a simulated annealing approach. The method takes the fixed three-dimensional coordinates of a protein backbone and stochastically generates possible sequences through the introduction of random mutations. The corresponding three-dimensional coordinates are constructed for each sequence by "redecorating" the backbone coordinates of the original structure with the corresponding side chains. These are then allowed to vary in their structure by random rotations around free torsional angles to generate a stochastic walk in configurational space. We have named this method protein simulated evolution, because, in loose analogy with natural selection, it randomly selects for allowed solutions in the sequence of a protein subject to the "selective pressure" of a potential function. Energies predicted by this method for sequences of a small group of residues in the hydrophobic core of the phage lambda cI repressor correlate well with experimentally determined biological activities. This "genetic selection by computer" approach has potential applications in protein engineering, rational protein design, and structure-based drug discovery.
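The stochastic search described above can be sketched as a Metropolis simulated-annealing loop over sequences. This is a toy under stated assumptions: `toy_energy` is a hypothetical stand-in for the semiempirical potential, the geometric cooling schedule is chosen here for illustration, and the side-chain torsion moves of the published method are omitted.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFW")


def toy_energy(sequence, buried):
    """Hypothetical score: -1 for each residue that matches its environment."""
    energy = 0.0
    for residue, is_buried in zip(sequence, buried):
        if (residue in HYDROPHOBIC) == is_buried:
            energy -= 1.0
    return energy


def anneal(buried, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Return the lowest-energy sequence visited and its energy."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in buried]
    e_current = toy_energy(current, buried)
    best, e_best = list(current), e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(ALPHABET)  # random point mutation
        e_new = toy_energy(current, buried)
        delta = e_new - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_current = e_new  # Metropolis acceptance
            if e_current < e_best:
                best, e_best = list(current), e_current
        else:
            current[pos] = old  # rejected: undo the mutation
    return "".join(best), e_best


if __name__ == "__main__":
    burial_pattern = [True, False] * 5  # hypothetical buried/exposed pattern
    sequence, energy = anneal(burial_pattern)
    print(sequence, energy)
```

Tracking the best sequence separately from the current one means the returned energy never exceeds that of the random starting sequence, even though uphill moves are accepted along the way.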
Article
Understanding the relations between the conformation of the side-chains and the backbone geometry is crucial for structure prediction as well as for homology modelling. To attempt to unravel these rules, we have developed a method which allows us to predict the position of the side-chains from the co-ordinates of the main-chain atoms. This method is based on a rotamer library and refines iteratively a conformational matrix of the side-chains of a protein, CM, such that its current element at each cycle CM(i,j) gives the probability that side-chain i of the protein adopts the conformation of its possible rotamer j. Each residue feels the average of all possible environments, weighted by their respective probabilities. The method converges in only a few cycles, thereby deserving the name of self-consistent mean field method. Using the rotamer with the highest probability in the optimized conformational matrix to define the conformation of the side-chain leads to the result that on average 72% of χ1, 75% of χ2 and 62% of χ1+2 are correctly predicted for a set of 30 proteins. Tests with six pairs of homologous proteins have shown that the method is quite successful even when the protein backbone deviates from the correct conformation. The second application of the optimized conformational matrix was to provide estimates of the conformational entropy of the side-chains in the folded state of the protein. The relevance of this entropy is discussed.
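The iterative refinement of the conformational matrix can be sketched as follows. This is a minimal mean-field loop on a toy two-residue system: the energy tables, the Boltzmann update at a fixed kT, the damping factor, and the convergence threshold are all assumptions made here, not details taken from the paper.

```python
import math

# Hypothetical energies: TOY_SELF[i][j] is rotamer j of residue i against the
# backbone; TOY_PAIR[(i, k)][j][l] couples rotamer j of i with rotamer l of k.
TOY_SELF = [[0.0, 1.0], [0.0, 0.0]]
TOY_PAIR = {(0, 1): [[-2.0, 0.0], [0.0, 0.0]]}


def mean_field(self_e, pair_e, kT=1.0, max_cycles=100, tol=1e-6, damping=0.5):
    """Iterate cm[i][j], the probability that side chain i adopts rotamer j."""
    n = len(self_e)
    # Start from a uniform distribution over each residue's rotamers.
    cm = [[1.0 / len(row)] * len(row) for row in self_e]

    def pair(i, j, k, l):
        return pair_e[(i, k)][j][l] if i < k else pair_e[(k, i)][l][j]

    for cycle in range(1, max_cycles + 1):
        new_cm = []
        for i in range(n):
            # Each rotamer feels the probability-weighted average of all
            # rotamers at every other position.
            eff = [
                self_e[i][j] + sum(
                    cm[k][l] * pair(i, j, k, l)
                    for k in range(n) if k != i
                    for l in range(len(self_e[k]))
                )
                for j in range(len(self_e[i]))
            ]
            e_min = min(eff)
            weights = [math.exp(-(e - e_min) / kT) for e in eff]
            z = sum(weights)
            new_cm.append([w / z for w in weights])
        change = max(abs(a - b)
                     for ra, rb in zip(new_cm, cm) for a, b in zip(ra, rb))
        # Damped update (an assumption here) to suppress oscillations.
        cm = [[damping * a + (1 - damping) * b for a, b in zip(ra, rb)]
              for ra, rb in zip(new_cm, cm)]
        if change < tol:
            break
    return cm, cycle


if __name__ == "__main__":
    matrix, cycles = mean_field(TOY_SELF, TOY_PAIR)
    for i, row in enumerate(matrix):
        best = max(range(len(row)), key=row.__getitem__)
        print(f"residue {i}: most probable rotamer {best}, P = {row[best]:.3f}")
```

Picking the highest-probability rotamer in each row gives the predicted side-chain conformation, and the rows themselves could feed an entropy estimate of the form -Σ p ln p.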
Article
We have developed and experimentally tested a novel computational approach for the de novo design of hydrophobic cores. A pair of computer programs has been written, the first of which creates a “custom” rotamer library for potential hydrophobic residues, based on the backbone structure of the protein of interest. The second program uses a genetic algorithm to globally optimize for a low energy core sequence and structure, using the custom rotamer library as input. Success of the programs in predicting the sequences of native proteins indicates that they should be effective tools for protein design. Using these programs, we have designed and engineered several variants of the phage 434 cro protein, containing five, seven, or eight sequence changes in the hydrophobic core. As controls, we have produced a variant consisting of a randomly generated core with six sequence changes but equal volume relative to the native core and a variant with a “minimalist” core containing predominantly leucine residues. Two of the designs, including one with eight core sequence changes, have thermal stabilities comparable to the native protein, whereas the third design and the minimalist protein are significantly destabilized. The randomly designed control is completely unfolded under equivalent conditions. These results suggest that rational de novo design of hydrophobic cores is feasible, and stress the importance of specific packing interactions for the stability of proteins. A surprising aspect of the results is that all of the variants display highly cooperative thermal denaturation curves and reasonably dispersed NMR spectra. This suggests that the non‐core residues of a protein play a significant role in determining the uniqueness of the folded structure.
Article
Octanol-to-water solvation free energies of acetyl amino amides (Ac-X-amides) [Fauchère, J.L., & Pliska, V. (1983) Eur. J. Med. Chem.-Chim. Ther. 18, 369] form the basis for computational comparisons of protein stabilities by means of the atomic solvation parameter formalism of Eisenberg and McLachlan [(1986) Nature 319, 199]. In order to explore this approach for more complex systems, we have determined by octanol-to-water partitioning the solvation energies of (1) the guest (X) side chains in the host-guest pentapeptides AcWL-X-LL, (2) the carboxy terminus of the pentapeptides, and (3) the peptide bonds of the homologous series of peptides AcWLm (m = 1-6). Solvation parameters were derived from the solvation energies using estimates of the solvent-accessible surface areas (ASA) obtained from hard-sphere Monte Carlo simulations. The measurements lead to a side chain solvation-energy scale for the pentapeptides and suggest the need for modifying the Asp, Glu, and Cys values of the "Fauchère-Pliska" solvation-energy scale for the Ac-X-amides. We find that the unfavorable solvation energy of nonpolar residues can be calculated accurately by a solvation parameter of 22.8 +/- 0.8 cal/mol/Å^2, which agrees satisfactorily with the Ac-X-amide data and thereby validates the Monte Carlo ASA results. Unlike the Ac-X-amide data, the apparent solvation energies of the uncharged polar residues are also largely unfavorable. This unexpected finding probably results, primarily, from differences in conformation and hydrogen bonding in octanol and buffer but may also be due to the additional flanking peptide bonds of the pentapeptides. The atomic solvation parameter (ASP) for the peptide bond is comparable to the ASP of the charged carboxy terminus which is an order of magnitude larger than the ASP of the uncharged polar side chains of the Ac-X-amides. 
The very large peptide bond ASP, -96 +/- 6 cal/mol/Å^2, profoundly affects the results of computational comparisons of protein stability which use ASPs derived from octanol-water partitioning data.
Article
We have previously reported the development and evaluation of a computational program to assist in the design of hydrophobic cores of proteins. In an effort to investigate the role of core packing in protein structure, we have used this program, referred to as Repacking of Cores (ROC), to design several variants of the protein ubiquitin. Nine ubiquitin variants containing from three to eight hydrophobic core mutations were constructed, purified, and characterized in terms of their stability and their ability to adopt a uniquely folded native-like conformation. In general, designed ubiquitin variants are more stable than control variants in which the hydrophobic core was chosen randomly. However, in contrast to previous results with 434 cro, all designs are destabilized relative to the wild-type (WT) protein. This raises the possibility that β-sheet structures have more stringent packing requirements than α-helical proteins. A more striking observation is that all variants, including random controls, adopt fairly well-defined conformations, regardless of their stability. This result supports conclusions from the cro studies that non-core residues contribute significantly to the conformational uniqueness of these proteins while core packing largely affects protein stability and has less impact on the nature or uniqueness of the fold.
Article
By using a protein-design algorithm that quantitatively considers side-chain packing, the effect of specific steric constraints on protein design was assessed in the core of the streptococcal protein G β1 domain. The strength of packing constraints used in the design was varied, resulting in core sequences that reflected differing amounts of packing specificity. The structural flexibility and stability of several of the designed proteins were experimentally determined and showed a trend from well-ordered to highly mobile structures as the degree of packing specificity in the design decreased. This trend both demonstrates that the inclusion of specific packing interactions is necessary for the design of native-like proteins and defines a useful range of packing specificity for the design algorithm. In addition, an analysis of the modeled protein structures suggested that penalizing for exposed hydrophobic surface area can improve design performance.
Article
The rational design of protein structure and function is rapidly emerging as a powerful approach to test general theories in protein chemistry (1). De novo creation of a protein or an active site requires that all the necessary interactions are provided. The design approach is therefore a way to test the limits of completeness of understanding experimentally. Furthermore, if the experiments are devised in a progressive fashion, such that the simplest possible designs are tried first, followed by iterative additions of more complex interactions until the desired result is achieved, then it may be possible to identify a minimally sufficient set of components. At the center of the design approach is the “design cycle,” in which theory and experiment alternate. The starting point is the development of a molecular model, based on rules of protein structure and function, combined with an algorithm for applying these. This is followed by experimental construction and analysis of the properties of the designed protein. If the experimental outcome is failure or partial success, then a next iteration of the design cycle is started in which additional complexity is introduced, rules and parameters are refined, or the algorithms for applying them are modified. The paper by Dahiyat and Mayo (2) in the current issue of these Proceedings describes such a design cycle. Sequences predicted to repack the interior of a small protein were generated by a computer design algorithm using different sets of parameters describing the packing interactions, thereby establishing a direct experimental correlation between the design parameters and the properties of the resulting proteins. This work is the latest addition to a series of efforts in which objective computational techniques developed to create protein structure (3–8) or function (9, 10) are being tested directly by experiment. The ultimate goal of such procedures is to develop a fully automated protein design method (6).
Article
The first fully automated design and experimental validation of a novel sequence for an entire protein is described. A computational design algorithm based on physical chemical potential functions and stereochemical constraints was used to screen a combinatorial library of 1.9 × 10²⁷ possible amino acid sequences for compatibility with the design target, a ββα protein motif based on the polypeptide backbone structure of a zinc finger domain. A BLAST search shows that the designed sequence, full sequence design 1 (FSD-1), has very low identity to any known protein sequence. The solution structure of FSD-1 was solved by nuclear magnetic resonance spectroscopy and indicates that FSD-1 forms a compact well-ordered structure, which is in excellent agreement with the design target structure. This result demonstrates that computational methods can perform the immense combinatorial search required for protein design, and it suggests that an unbiased and quantitative algorithm can be used in various structural contexts.
Article
Here we report the use of an objective computer algorithm in the design of a hyperstable variant of the Streptococcal protein Gβ1 domain (Gβ1). The designed seven-fold mutant, Gβ1-c3b4, has a melting temperature in excess of 100 °C and an enhancement in thermodynamic stability of 4.3 kcal/mol at 50 °C over the wild-type protein. Gβ1-c3b4 maintains the Gβ1 fold, as determined by nuclear magnetic resonance spectroscopy, and also retains a significant level of binding to human IgG in qualitative comparisons with wild type. The basis of the stability enhancement appears to have multiple components including optimized core packing, increased burial of hydrophobic surface area, more favorable helix dipole interactions, and improvement of secondary structure propensity. The design algorithm is able to model such complex contributions simultaneously using empirical physical/chemical potential functions and a combinatorial optimization algorithm based on the dead-end elimination theorem. Because the design methodology is based on general principles, there is the potential of applying the methodology to the stabilization of other unrelated protein folds.
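The abstract above names the dead-end elimination (DEE) theorem without spelling it out. As a rough illustration only, the sketch below applies the original single-rotamer DEE criterion under the assumption that the energy is a sum of self and pairwise terms: a rotamer is discarded when its best-case energy is still worse than the worst-case energy of some alternative at the same position. The function names, the dictionary layout, and the toy energies are hypothetical and are not taken from the cited work.

```python
from typing import Dict, List, Tuple

SelfTable = Dict[Tuple[int, int], float]            # (position, rotamer) -> energy
PairTable = Dict[Tuple[int, int, int, int], float]  # (pos_i, rot_r, pos_j, rot_s) -> energy


def pair_energy(pair_e: PairTable, i: int, r: int, j: int, s: int) -> float:
    """Look up a symmetric pair energy; missing entries count as 0.0."""
    if (i, r, j, s) in pair_e:
        return pair_e[(i, r, j, s)]
    return pair_e.get((j, s, i, r), 0.0)


def dee_prune(rotamers: List[List[int]], self_e: SelfTable,
              pair_e: PairTable) -> List[List[int]]:
    """Repeat the single-rotamer DEE test until nothing more can be removed."""
    alive = [list(rs) for rs in rotamers]
    changed = True
    while changed:
        changed = False
        for i, rots in enumerate(alive):
            others = [j for j in range(len(alive)) if j != i]
            for r in list(rots):
                # Best energy rotamer r could possibly reach.
                best_r = self_e[(i, r)] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                    for j in others
                )
                for t in rots:
                    if t == r:
                        continue
                    # Worst energy the alternative rotamer t could reach.
                    worst_t = self_e[(i, t)] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in others
                    )
                    if best_r > worst_t:
                        rots.remove(r)  # r cannot be part of the global minimum
                        changed = True
                        break
    return alive


# Hypothetical toy system: two positions with two rotamers each (kcal/mol).
toy_self = {(0, 0): 0.0, (0, 1): 10.0, (1, 0): 0.0, (1, 1): 0.5}
toy_pair = {(0, 0, 1, 0): -1.0, (0, 0, 1, 1): 0.5,
            (0, 1, 1, 0): 0.2, (0, 1, 1, 1): -0.3}
surviving = dee_prune([[0, 1], [0, 1]], toy_self, toy_pair)
```

In this toy case the pruning leaves a single rotamer at each position; on realistic problems DEE typically only shrinks the search space, and the remaining combinations still have to be searched by some other means.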
Article
The tractability of many algorithms for determining the energy state of a system depends on the pairwise nature of an energy expression. Some energy terms, such as the standard implementation of the van der Waals potential, satisfy this criterion whereas others do not. One class of important potentials that are not pairwise involves benefits and penalties for burying hydrophobic and/or polar surface areas. It has been found previously that, in some cases, a pairwise approximation to these surface areas correlates with the true surface areas. We set out to generalize the applicability of this approximation. We develop a pairwise expression with one scalable parameter that closely reproduces both the true buried and the true exposed solvent-accessible surface areas. We then refit our previously published coiled-coil stability data to give solvation parameters of 26 cal/mol Å² favoring hydrophobic burial and 100 cal/mol Å² opposing polar burial. An accurate pairwise approximation to calculate exposed and buried protein solvent-accessible surface area is achieved.
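To make the two fitted solvation parameters concrete, the sketch below converts buried surface areas into an energy, assuming a simple linear model in which buried nonpolar area is rewarded and buried polar area is penalized. Only the two coefficients come from the abstract; the linear form, the function name, and the example areas are illustrative assumptions.

```python
# Solvation parameters quoted in the abstract (cal/mol per square angstrom).
HYDROPHOBIC_BURIAL_CAL_PER_A2 = 26.0   # favors burial of hydrophobic area
POLAR_BURIAL_CAL_PER_A2 = 100.0        # opposes burial of polar area


def solvation_energy_kcal(buried_nonpolar_a2: float, buried_polar_a2: float) -> float:
    """Return a surface-area solvation energy in kcal/mol.

    Assumed linear model: negative values are favorable.
    """
    benefit = -HYDROPHOBIC_BURIAL_CAL_PER_A2 * buried_nonpolar_a2
    penalty = POLAR_BURIAL_CAL_PER_A2 * buried_polar_a2
    return (benefit + penalty) / 1000.0  # cal/mol -> kcal/mol


# Hypothetical side chain burying 100 A^2 of nonpolar and 10 A^2 of polar area.
example_energy = solvation_energy_kcal(100.0, 10.0)
```

With these assumed areas the polar penalty offsets a little over a third of the hydrophobic benefit, which illustrates why even a small amount of buried polar surface matters under these parameters.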
Article
The information about the conformational behavior of monomeric helical peptides in solution, as well as the alpha-helix stability in proteins, has been previously utilized to derive a database with the energy contributions for various interactions taking place in an alpha-helix: intrinsic helical propensities, side-chain-side-chain interactions, main-chain-main-chain hydrogen bonds, and capping effects. This database was implemented in an algorithm based on the helix/coil transition theory (AGADIR). Here, we have modified this algorithm to include previously described local motifs: hydrophobic staple, Schellman motif and Pro-capping motif, new variants of these, and newly described side-chain-side-chain interactions. Based on recent experimental data we have introduced a position dependence of the helical propensities for some of the 20 amino acid residues. A new electrostatic model that takes into consideration all electrostatic interactions up to 12 residues in distance in the helix and random-coil conformations, as well as the effect of ionic strength, has been implemented. We have synthesized and analyzed several peptides, and used data from peptides already analyzed by other groups, to test the validity of our electrostatic model. The modified algorithm predicts, with an overall standard deviation value of 6.6 (maximum helix is 100%), the helical content of 778 peptides, of which 223 correspond to wild-type and modified protein fragments. To improve the prediction potential of the algorithm and to have a direct comparison with nuclear magnetic resonance data, the algorithm now predicts the conformational shift of the CαH protons and the ¹³Cα and ³JαN values. We have found that for those peptides correctly predicted from the point of view of circular dichroism, the prediction of the NMR parameters is very good.
Article
We apply a new approach to the reverse protein folding problem. Our method uses a minimization function in the design process which is different from the energy function used for folding. For a lattice model, we show that this new approach produces sequences that are likely to fold into desired structures. Our method is a significant improvement over previous attempts which used the energy function for designing sequences.
Article
A strategy is outlined for obtaining the free energy of a typical designed heteropolymer. The design procedure considers the probability that the target conformation is occupied in comparison with all the other conformations that could house the given sequence. Numerical calculations on lattice heteropolymer models are presented to illustrate the key physical principles.
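The idea of comparing the target conformation with every competing conformation can be illustrated with a Boltzmann-weighted occupancy calculation. The sketch below is not the procedure of the cited work; it assumes that each conformation has a single energy and that states are populated according to Boltzmann weights. The function name, the default kT of 0.593 kcal/mol (roughly room temperature), and the example energies are assumptions made for illustration.

```python
import math
from typing import Sequence


def target_probability(target_energy: float,
                       competitor_energies: Sequence[float],
                       kt: float = 0.593) -> float:
    """Boltzmann probability that a sequence occupies the target conformation.

    Energies are in kcal/mol; kt is the thermal energy in the same units.
    """
    energies = [target_energy] + list(competitor_energies)
    lowest = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - lowest) / kt) for e in energies]
    return weights[0] / sum(weights)


# Hypothetical sequence: the target lies 5 kcal/mol below two competitors.
p_specific = target_probability(-5.0, [0.0, 0.0])
# Hypothetical sequence: the target is degenerate with one competitor.
p_ambiguous = target_probability(0.0, [0.0])
```

A sequence whose target energy is low but whose competitors are equally low gains nothing in occupancy, which is the motivation for treating competing conformations explicitly during design.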
Article
Capping interactions associated with specific sequences at or near the ends of alpha-helices are important determinants of the stability of protein secondary and tertiary structure. We investigate here the role of the helix-capping motif Ser-X-X-Glu, a sequence that occurs frequently at the N termini of alpha helices in proteins, on the conformation and stability of the GCN4 leucine zipper. The 1.8 Å resolution crystal structure of the capped molecule reveals distinct conformations, packing geometries and hydrogen-bonding networks at the amino terminus of the two helices in the leucine zipper dimer. The free energy of helix stabilization associated with the hydrogen-bonding and hydrophobic interactions in this capping structure is -1.2 kcal/mol, evaluated from thermal unfolding experiments. A single cap thus contributes appreciably to stabilizing the terminated helix and thereby the native state. These results suggest that helix capping plays a further role in protein folding, providing a sensitive connector linking alpha-helix formation to the developing tertiary structure of a protein.
Article
With the objective of improving side-chain conformation prediction, we have analyzed the influence of various factors on prediction by the Self-Consistent Mean Field Theory method, applied to a set of high resolution x-ray protein structure models. These factors may be classed as variations in the mean field optimization protocol, variations in the potential energy function, and variations in rotamer library completeness. We have developed an optimization protocol that consistently reached lower mean field conformational free energies than two other protocols. This protocol led to an important improvement in prediction. We observed a major improvement in prediction with two more detailed van der Waals parameter sets, which we found to be due mainly to the introduction of scaling of 1-4 interactions. In a comparison of two knowledge-based rotamer libraries of considerably different size, we observed an unexpected decrease in prediction with an increase in library completeness. However, when we introduced a torsion potential term in the potential energy function, we found an important increase in average prediction and in the prediction of almost all residue types with a more complete rotamer set. The two knowledge-based rotamer libraries now became equivalent in terms of average prediction. The results we obtained in an analysis of the effect of the introduction of an additional electrostatic term in the potential energy function were largely inconclusive. However, we found a small increase in average prediction for an electrostatic potential term with a fixed dielectric constant of 15. The combined effect of all the factors we analyzed in this study resulted in average prediction accuracies of 79.9% for χ1, 68.1% for χ1+2, and 1.590 Å for global rms deviation (RMSD); the corresponding values for core residues were 88.2%, 78.6%, and 1.171 Å. These values represent improvements in average prediction of 6.5% for χ1, 9.1% for χ1+2, and 0.163 Å for global RMSD over the original conditions; the corresponding improvements in the core were 5.9%, 9.0%, and 0.180 Å, respectively.
Article
We have developed a computational approach for the design and prediction of hydrophobic cores that includes explicit backbone flexibility. The program consists of a two-stage combination of a genetic algorithm and Monte Carlo sampling using a torsional model of the protein. Backbone structures are evaluated either by a canonical force-field or a constraining potential that emphasizes the preservation of local geometry. The utility of the method for protein design and engineering is explored by designing three novel hydrophobic core variants of the protein 434 cro. We use the new method to evaluate these and previously designed 434 cro variants, as well as a series of phage T4 lysozyme variants. In order to properly evaluate the influence of backbone flexibility, we have also analyzed the effects of varying amounts of side-chain flexibility on the performance of fixed backbone methods. Comparison of results using a fixed versus flexible backbone reveals that, surprisingly, the two methods are almost equivalent in their abilities to predict relative experimental stabilities, but only when full side-chain flexibility is allowed. The prediction of core side-chain structure can vary dramatically between methods. In some, but not all, cases the flexible backbone method is a better predictor of structure. The development of a flexible backbone approach to core design is particularly important for attempts at de novo protein design, where there is no prior knowledge of a precise backbone structure.
Article
Recent successes in protein design have illustrated the promise of computational approaches. These methods rely on energy expressions to evaluate the quality of different amino acid sequences for target protein structures. The force fields optimized for design differ from those typically used in molecular mechanics and molecular dynamics calculations.
Article
Solvent plays a significant role in determining the electrostatic potential energy of proteins, most notably through its favorable interactions with charged residues and its screening of electrostatic interactions. These energetic contributions are frequently ignored in computational protein design and protein modeling methodologies because they are difficult to evaluate rapidly and accurately. To address this deficiency, we report a revised form of the original Tanford-Kirkwood continuum electrostatic model [Tanford, C. & Kirkwood, J. G. (1957) J. Am. Chem. Soc. 79, 5333-5339], which accounts for the effects of solvent polarization on charged atoms in proteins. The Tanford-Kirkwood model was modified to increase its speed and to improve its sensitivity to the details of protein structure. For the 37 electrostatic self-energies of the polar side-chains in bovine pancreatic trypsin inhibitor, and their 666 interaction energies, the modified Tanford-Kirkwood potential of mean force differs from a computationally intensive numerical potential (DelPhi) by root-mean-square errors of 0.6 kcal/mol and 0.08 kcal/mol, respectively. The Tanford-Kirkwood approach makes possible a realistic treatment of electrostatics in computationally demanding protein modeling calculations. For example, pH titration calculations for ovomucoid third domain that model polar side-chain relaxation (including >2 × 10²³ rotamer conformations of the protein) provide pKa values of unprecedented accuracy.