
Backbone-dependent Rotamer Library for Proteins: Application to Side-chain Prediction

Abstract

A backbone-dependent rotamer library for amino acid side-chains is developed and used for constructing protein side-chain conformations from the main-chain co-ordinates. The rotamer library is obtained from 132 protein chains in the Brookhaven Protein Database. The library uses a grid of 20° × 20° blocks for the main-chain angles φ, ψ. Significant correlations are found between side-chain dihedral angle probabilities and backbone φ, ψ values. These probabilities are used to place the side-chains on the known backbone in test applications for six proteins for which high-resolution crystal structures are available. A minimization scheme is used to reorient side-chains that conflict with the backbone or other side-chains after the initial placement. The initial placement yields between 59% (thermolysin) and 81% (crambin) of both χ1 and χ2 values in the correct position (to within 40°). After refinement the values range from 61% (lysozyme) to 89% (crambin). It is evident from the results that a single protein does not adequately test a prediction scheme. The computation time required by the method scales linearly with the number of side-chains. An initial prediction from the library takes only a few seconds of computer time, while the iterative refinement takes on the order of hours. The method is automated and can easily be applied to aid experimental side-chain determinations and homology modeling. The high degree of correlation between backbone and side-chain conformations may introduce a simplification in the protein folding process by reducing the available conformational space.
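The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the three-well χ1 labelling (`g+`, `g-`, `t`) and its angular boundaries are assumptions made here, and only the 20° grid and the 40° tolerance come from the abstract.

```python
from collections import Counter, defaultdict

BIN = 20  # width of a (phi, psi) block in degrees, as stated in the abstract


def wrap(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def backbone_bin(phi, psi):
    """Index of the 20 x 20 degree (phi, psi) block that contains the residue."""
    return (int((wrap(phi) + 180.0) // BIN), int((wrap(psi) + 180.0) // BIN))


def chi_rotamer(chi):
    """Assign a chi angle to one of three staggered wells (assumed labels)."""
    chi = wrap(chi)
    if 0.0 <= chi < 120.0:
        return "g+"  # well centred near +60 degrees
    if -120.0 <= chi < 0.0:
        return "g-"  # well centred near -60 degrees
    return "t"       # well centred near 180 degrees


def build_library(observations):
    """Turn (residue, phi, psi, chi1) tuples into per-block rotamer probabilities."""
    counts = defaultdict(Counter)
    for res, phi, psi, chi1 in observations:
        counts[(res, backbone_bin(phi, psi))][chi_rotamer(chi1)] += 1
    return {
        key: {rot: n / sum(c.values()) for rot, n in c.items()}
        for key, c in counts.items()
    }


def most_probable(library, res, phi, psi):
    """Most probable rotamer for this residue type and backbone block, or None."""
    probs = library.get((res, backbone_bin(phi, psi)))
    if not probs:
        return None
    return max(probs, key=probs.get)


def within_tolerance(predicted, native, tol=40.0):
    """True if two dihedral angles differ by at most `tol` degrees (periodic)."""
    return abs(wrap(predicted - native)) <= tol


# Toy data: three helical leucines and one extended leucine (invented values).
observations = [
    ("LEU", -55, -45, -65),
    ("LEU", -50, -41, -70),
    ("LEU", -45, -50, 178),
    ("LEU", -120, 130, 175),
]
library = build_library(observations)
```

With the toy data, `most_probable(library, "LEU", -52, -44)` looks up the block shared by the three helical residues and returns the rotamer seen most often there. A real library would also need a fallback for blocks with few or no observations, which this sketch leaves out.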
... Sampling-based side-chain modeling methods typically consist of three main components: a rotamer library, a scoring function, and a search algorithm. To minimize the sampling space, some studies have concentrated on developing rotamer libraries that contain a limited set of representative side-chain conformations [29–32]. A precise scoring function is crucial for effective side-chain packing. ...
... Two docking programs, AutoDock-Vina and Dock6.10, are utilized in this study. For AutoDock-Vina, we set the box size as (30,30,30) and set the exhaustiveness to 16. For Dock6.10, we proceed with the default parameters for docking. ...
Preprint
Accurate protein side-chain modeling is crucial for protein folding and design. This is particularly true for molecular docking, as ligands primarily interact with side chains. A protein structure with large side-chain errors is of limited use in applications such as drug design. Previous research on AlphaFold2 (AF2) predictions of GPCR targets indicates that docking natural ligands back onto AF2-predicted structures has a limited success rate, presumably due to large errors in side chains. Here, we introduce a two-stage side-chain modeling approach called OPUS-Rota5. It leverages a modified 3D-Unet to capture the local environmental features of each residue, including ligand information, and then employs a RotaFormer module to aggregate the various types of features. Evaluation on three test sets, including recently released targets from CAMEO and CASP15, reveals that side chains modeled by OPUS-Rota5 are significantly more accurate than those predicted by other methods. We also employ OPUS-Rota5 to refine the side chains of 25 GPCR targets predicted by AF2 and then dock their natural ligands back, with a significantly improved success rate. Such results suggest that OPUS-Rota5 could be a valuable tool for molecular docking, particularly for targets with relatively accurate predicted backbones but inaccurate side chains.
... Analysis of the observed distributions of backbone and sidechain dihedral angles has been an object of intense interest since the early protein structural and biophysical studies: Ramachandran et al. (1963), Janin and Wodak (1978), McGregor et al. (1987), Dunbrack and Karplus (1993), Dunbrack and Cohen (1997), Dunbrack (2002), and Dunbrack (2007, 2011). This interest is fuelled by the need for accurate statistical models that can effectively characterize the observed dihedral angle distributions of proteins, as these models are used by techniques for protein experimental structure determination, computational prediction, rational design, and many other protein structural analyses. ...
... Rotamer libraries fall into two classes: backbone independent and backbone dependent. Backbone-dependent rotamer libraries contain rotameric preferences conditioned on any observed backbone dihedral angles (Dunbrack and Karplus 1993; Dunbrack and Cohen 1997; Shapovalov and Dunbrack 2011), and differ from the backbone-independent libraries, which simply cluster sidechain conformations agnostic to the backbone conformation of amino acids (Ponder and Richards 1987; Lovell et al. 2000). ...
... The Dunbrack rotamer library (Dunbrack and Karplus 1993; Dunbrack and Cohen 1997; Dunbrack 2002; Dunbrack 2007, 2011) is a continually maintained and improved rotamer library. It defines the state of the art and is among the most widely used rotamer libraries across the many downstream applications that employ them. ...
Article
The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (〈ϕ,ψ,χ1,χ2,…〉) of any amino acid by a mixture of a product of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. The continuous space it uses to specify the dihedral angles provides an alternative to the commonly used rotamer libraries. These rotamer libraries discretize the space of dihedral angles into coarse angular bins, and cluster combinations of sidechain dihedral angles (〈χ1,χ2,…〉) as a function of backbone 〈ϕ,ψ〉 conformations. A 'good' model is one that is both concise and explains (compresses) observed data. Competing models can be compared directly, and in particular our model is shown to outperform the Dunbrack rotamer library in terms of model complexity (by three orders of magnitude) and fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under/over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, ranging from experimental structure refinement and de novo protein design to protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). Availability and implementation: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.
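For reference, the general form of a mixture of products of von Mises densities can be written out as below. This is the standard textbook form, given as a sketch; the symbols K (number of components), w_k (mixing weights), μ_kj (mean directions), κ_kj (concentrations) and d (number of dihedral angles) are notation assumed here and are not taken from the paper.

```latex
p(\theta_1,\dots,\theta_d)
  = \sum_{k=1}^{K} w_k \prod_{j=1}^{d}
    \frac{\exp\!\bigl(\kappa_{kj}\cos(\theta_j-\mu_{kj})\bigr)}{2\pi\, I_0(\kappa_{kj})},
\qquad w_k \ge 0,\quad \sum_{k=1}^{K} w_k = 1
```

Here (θ_1, …, θ_d) stands for the angle vector 〈ϕ,ψ,χ1,χ2,…〉 and I_0 is the modified Bessel function of the first kind of order zero, which normalizes each von Mises factor on the circle.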
... Thus, across all structures, all three Ser χ1 rotamers were represented, similar to what is observed in the PDB, where a relatively even distribution of Ser χ1 rotamers is present [52,53]. In the other conformation in the unit cell (Figure 5c), Ser was in the β conformation. The β structure was stabilized by an intraresidue C5 hydrogen bond. ...
... In contrast, in Ser-trans-Pro structures, the three χ1 rotamers are relatively equally populated, as has been observed previously for Ser [52,53]. Notably, the t rotamer was exclusively observed in structures with C–H/O interactions at Ser-cis-Pro, as had also been observed by small-molecule X-ray crystallography (Figure 3). ...
Preprint
Structures at serine-proline sites in proteins were analyzed using a combination of peptide synthesis with structural methods and bioinformatics analysis of the PDB. Dipeptides were synthesized with the proline derivative (2S,4S)-(4-iodophenyl)hydroxyproline [hyp(4-I-Ph)]. The crystal structure of Boc-Ser-hyp(4-I-Ph)-OMe had two molecules in the unit cell. One molecule exhibited cis-proline and a type VIa2 β-turn (BcisD). The cis-proline conformation was stabilized by a C–H/O interaction between Pro C–Hα and the Ser side-chain oxygen. NMR data were consistent with stabilization of cis-proline by a C–H/O interaction in solution. The other crystallographically observed molecule had trans-Pro and both residues in the PPII conformation. Two conformations were observed in the crystal structure of Ac-Ser-hyp(4-I-Ph)-OMe, with Ser adopting PPII in one and the β conformation in the other, each with Pro in the δ conformation and trans-Pro. Structures at Ser-Pro sequences were further examined via bioinformatics analysis of the PDB and via DFT calculations. Ser-Pro versus Ala-Pro sequences were compared to identify bases for Ser stabilization of local structures. C–H/O interactions between the Ser side-chain Oγ and Pro C–Hα were observed in 45% of structures with Ser-cis-Pro in the PDB, with nearly all Ser-cis-Pro structures adopting a type VI β-turn. 53% of Ser-trans-Pro sequences exhibited main-chain C=O(i)•••H–N(i+3) or C=O(i)•••H–N(i+4) hydrogen bonds, with Ser as the i residue and Pro as the i+1 residue. These structures were overwhelmingly either type I β-turns or N-terminal capping motifs on α-helices or 3₁₀-helices. These results indicate that Ser-Pro sequences are particularly potent in favoring these structures. In each, Ser is in either the PPII or β conformation, with the Ser Oγ capable of engaging in a hydrogen bond with the amide N–H of the i+2 (type I β-turn or 3₁₀-helix; Ser χ1 t) or i+3 (α-helix; Ser χ1 g⁺) residue.
Non-proline cis amide bonds can also be stabilized by C–H/O interactions.
... It was noticed that the diversity of the structures is essential for building rotamer libraries. Dunbrack and Karplus (1993) chose not to include identical structures and, later, Lovell et al. (2000) extended that idea by excluding proteins with sequence similarity >50%. Later, Scouras and Daggett (2011) (van der Kamp et al. 2010) used proteins with unique folds in order to get even more diverse datasets for building the rotamer library. ...
... Until a sufficient number of protein structures had been solved, all sidechain χi angles were analysed. With the increasing number of protein crystal structures, additional criteria for clustering dihedral angles were used, such as protein secondary structure (McGregor et al. 1987) and ranges of the protein backbone φ and ψ angles (Dunbrack and Karplus 1993). Backbone-independent (BBIND) and backbone-dependent (BBDEP) methods were the two main ways to cluster side-chain angles, and both depended heavily on the number of protein structures initially available. ...
Article
Motivation: Identifying the probable positions of the protein side-chains is one of the protein modelling steps that can improve the prediction of protein-ligand [Meiler and Baker, 2006, Shin and Seok, 2012] and protein-protein [Kamisetty et al., 2011] interactions. Most of the strategies predicting the side-chain conformations use predetermined dihedral angle lists, also called rotamer libraries, that are usually generated from a subset of high-quality protein structures [Shapovalov and Dunbrack, 2011, Towse et al., 2016, Hintze et al., 2016]. Although these methods are fast to apply, they tend to average out geometries instead of taking into account the surrounding atoms and molecules and ignore structures not included in the selected subset. Such simplifications can result in inaccuracies when predicting possible side-chain atom positions [Zavodszky, 2005]. Results: We propose an approach that takes into account both of these circumstances by scanning through sterically accessible side-chain conformations and generating dihedral angle libraries specific to the target proteins. The method avoids the drawbacks of lacking conformations due to unusual or rare protein structures and successfully suggests potential rotamers with average RMSD closer to the experimentally determined side-chain atom positions than other widely used rotamer libraries. Availability: The technique is implemented in open-source software package rotag and available at GitHub: https://www.github.com/agrybauskas/rotag, under GNU Lesser General Public License. Supplementary information: Supplementary data are available at Bioinformatics online.
... The protein side chain packing (PSCP) problem has been traditionally formulated as a guided search over a library of discrete side chain conformations, or rotamers, given a protein backbone and its amino acid sequence [1]. There have been decades of research into and development of rotamer libraries that effectively capture the distribution of conformations observed for each amino acid in naturally occurring proteins [2–11]. Evaluating the favorability of individual rotamers in a residue's environment often entails an energy function that models various physical phenomena such as hydrogen bonding and van der Waals interactions [12–17]. ...
Preprint
Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high confidence and low energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein-protein interactions, and protein-ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state-of-the-art predictions and speed in the PSCP task. Building off the success of graph and message passing neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack) which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using χ-angle distribution predictions and geometry-aware invariant point message passing (IPMP). To demonstrate its broad applicability to protein-related tasks, IPMP was additionally incorporated in a fixed backbone protein design method, which enabled the generation of more native-like sequences than common message passing schemes. On a test set of ~1,400 high-quality protein chains, PIPPack outperforms other state-of-the-art PSCP methods in rotamer recovery, while producing competitive per-residue RMSDs and being significantly faster.
... In addition, as controls, geometry optimization was performed on these peptides with all residues in a PPII conformation, and with the Pro residue in either an exo Table S73]). 43,44 Thus, the differences in rotamer energies are relatively small compared to the observed interaction energies, allowing approximation of the interaction energies. ...
Preprint
In proteins, proline-aromatic sequences exhibit increased frequencies of cis-proline amide bonds, via proposed C–H/π interactions between the aromatic ring and either the proline ring or the backbone C–Hα of the residue prior to proline. These interactions would be expected to result in tryptophan, as the most electron-rich aromatic residue, exhibiting the highest frequency of cis-proline. However, prior results from bioinformatics studies on proteins and experiments on proline-aromatic sequences in peptides have not revealed a clear correlation between the properties of the aromatic ring and the population of cis-proline. An investigation of the effects of aromatic residue (aromatic ring properties) on the conformation of proline-aromatic sequences was conducted using three distinct approaches: (1) NMR spectroscopy in model peptides of the sequence Ac-TGPAr-NH2 (Ar = encoded and unnatural aromatic amino acids); (2) bioinformatics analysis of structures in proline-aromatic sequences in the PDB; and (3) computational investigation using DFT and MP2 methods on models of proline-aromatic sequences and interactions. C–H/π and hydrophobic interactions were observed to stabilize local structures in both the trans-proline and cis-proline conformations, with both proline amide conformations exhibiting C–H/π interactions between the aromatic ring and Hα of the residue prior to proline (Hα-trans-Pro-aromatic and Hα-cis-Pro-aromatic interactions) and/or with the proline ring (trans-ProH-aromatic and cis-ProH-aromatic interactions). These C–H/π interactions were strongest with tryptophan (Trp) and weakest with cationic histidine (HisH+). Aromatic interactions with histidine were modulated in strength by His ionization state. Proline-aromatic sequences were associated with specific conformational poses, including type I and type VI β-turns. 
C–H/π interactions at the pre-proline Hα, which were stronger than interactions at Pro, stabilize normally less favorable conformations, including the ζ or αL conformations at the pre-proline residue, cis-proline, and/or the g+ χ1 rotamer or αL conformation at the aromatic residue. These results indicate that proline-aromatic sequences, especially Pro-Trp sequences, are loci to nucleate turns, helices, loops, and other local structures in proteins. These results also suggest that mutations that introduce proline-aromatic sequences, such as the R406W mutation that is associated with protein misfolding and aggregation in the microtubule-binding protein tau, might result in substantial induced structure, particularly in intrinsically disordered regions of proteins.
Article
Structures at serine-proline sites in proteins were analyzed using a combination of peptide synthesis with structural methods and bioinformatics analysis of the PDB. Dipeptides were synthesized with the proline derivative (2S,4S)-(4-iodophenyl)hydroxyproline [hyp(4-I-Ph)]. The crystal structure of Boc-Ser-hyp(4-I-Ph)-OMe had two molecules in the unit cell. One molecule exhibited cis-proline and a type VIa2 β-turn (BcisD). The cis-proline conformation was stabilized by a C–H/O interaction between Pro C–Hα and the Ser side-chain oxygen. NMR data were consistent with stabilization of cis-proline by a C–H/O interaction in solution. The other crystallographically observed molecule had trans-Pro and both residues in the PPII conformation. Two conformations were observed in the crystal structure of Ac-Ser-hyp(4-I-Ph)-OMe, with Ser adopting PPII in one and the β conformation in the other, each with Pro in the δ conformation and trans-Pro. Structures at Ser-Pro sequences were further examined via bioinformatics analysis of the PDB and via DFT calculations. Ser-Pro versus Ala-Pro sequences were compared to identify bases for Ser stabilization of local structures. C–H/O interactions between the Ser side-chain Oγ and Pro C–Hα were observed in 45% of structures with Ser-cis-Pro in the PDB, with nearly all Ser-cis-Pro structures adopting a type VI β-turn. 53% of Ser-trans-Pro sequences exhibited main-chain C=O(i)•••H–N(i+3) or C=O(i)•••H–N(i+4) hydrogen bonds, with Ser as the i residue and Pro as the i+1 residue. These structures were overwhelmingly either type I β-turns or N-terminal capping motifs on α-helices or 3₁₀-helices. These results indicate that Ser-Pro sequences are particularly potent in favoring these structures. In each, Ser is in either the PPII or β conformation, with the Ser Oγ capable of engaging in a hydrogen bond with the amide N–H of the i+2 (type I β-turn or 3₁₀-helix; Ser χ1 t) or i+3 (α-helix; Ser χ1 g⁺) residue.
Non‐proline cis amide bonds can also be stabilized by C–H/O interactions.
Preprint
Oxalate decarboxylase (OxDC) from Bacillus subtilis is a Mn-dependent hexameric enzyme that converts oxalate to carbon dioxide and formate. Recently, OxDC has attracted the interest of the scientific community because of its biotechnological and medical applications in the treatment of hyperoxalurias, a group of pathologic conditions associated with excessive urinary oxalate excretion due to either increased endogenous production or increased exogenous absorption. The fact that OxDC has its pH optimum in the acidic range is a major limitation for most biotechnological applications involving processes that occur at neutral pH, where the activity and stability of the enzyme are remarkably reduced. Here, through bioinformatics-guided protein engineering, followed by combinatorial mutagenesis and analyses of activity and thermodynamic stability, we identified a double mutant of OxDC endowed with enhanced catalytic efficiency and stability under physiological conditions. The engineered form of OxDC offers a potential tool for improved intestinal oxalate degradation in hyperoxaluria patients.
Article
The prediction of a protein's tertiary structure is still a considerable problem because the huge amount of possible conformational space¹ makes it computationally difficult. With regard to side-chain modelling, a solution has been attempted by the grouping of side-chain conformations into representative sets of rotamers²⁻⁵. Nonetheless, an exhaustive combinatorial search is still limited to carefully identified packing units⁵,⁶ containing a limited number of residues. For larger systems other strategies had to be developed, such as the Monte Carlo procedure⁶,⁷ and the genetic algorithm and clustering approach⁸. Here we present a theorem, referred to as the 'dead-end elimination' theorem, which imposes a suitable condition to identify rotamers that cannot be members of the global minimum energy conformation. Application of this theorem effectively controls the computational explosion of the rotamer combinatorial problem, thereby allowing the determination of the global minimum energy conformation of a large collection of side chains.
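The abstract does not spell out the elimination condition. As a hedged sketch, the code below uses the form in which the criterion is commonly stated: rotamer r at position i is discarded if its best-case energy (self energy plus the minimum pair energy with every other position) is still higher than the worst-case energy of some alternative rotamer t at the same position. The data layout (`self_e`, `pair_e`) and all names are assumptions made for illustration.

```python
def dead_end_eliminate(self_e, pair_e):
    """Iteratively remove rotamers that cannot be in the global minimum.

    self_e[i][r]         : self energy of rotamer r at position i
    pair_e[(i, j)][r][s] : pair energy of rotamer r at i and s at j, for i < j
    Returns a list with the set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]

    def pair(i, r, j, s):
        # Pair tables are stored once per unordered pair, keyed with i < j.
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def bound(i, r, pick):
        # pick=min gives the best case for rotamer r, pick=max the worst case.
        return self_e[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j])
            for j in range(n) if j != i
        )

    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_case = bound(i, r, min)
                if any(best_case > bound(i, t, max)
                       for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


# Toy problem with two positions and two rotamers each (invented energies).
self_e = [[0.0, 10.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.5, 0.5]]}
survivors = dead_end_eliminate(self_e, pair_e)
```

In the toy problem each position is reduced to a single rotamer, so the global minimum is read off directly. In general several rotamers survive per position and a search over the reduced set is still needed.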
Article
We have developed a rapid and completely automatic method for prediction of protein side-chain conformation, applying the simulated annealing algorithm to optimization of side-chain packing (van der Waals) interactions. The method directly attacks the combinatorial problem of simultaneously predicting many residues' conformation, solving in 8 to 12 hours problems for which the systematic search would require over 10^300 central processing unit years. Over a test set of nine proteins ranging in size from 46 to 323 residues, the program's predictions for side-chain atoms had a root-mean-square (r.m.s.) deviation of 1.77 Å overall versus the native structures. More importantly, the predictions for core residues were especially accurate, with an r.m.s. value of 1.25 Å overall: 80 to 90% of the large hydrophobic side-chains dominating the internal core were correctly predicted, versus 30 to 40% for most current methods. The predictions' main errors were in surface residues poorly constrained by packing and small residues with greater steric freedom and hydrogen bonding interactions, which were not included in the program's potential function. van der Waals interactions appear to be the supreme determinant of the arrangement of side-chains in the core, enforcing a unique allowed packing that in every case so far examined matches the native structure.
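A generic Metropolis-style simulated annealing loop over discrete rotamer choices might look like the sketch below. It is not the program described in the abstract: the energy tables stand in for the van der Waals packing term, and the geometric cooling schedule, step count, temperatures and all names are assumptions chosen for illustration.

```python
import math
import random


def total_energy(assign, self_e, pair_e):
    """Energy of one rotamer assignment (one rotamer index per position)."""
    energy = sum(self_e[i][r] for i, r in enumerate(assign))
    energy += sum(table[assign[i]][assign[j]]
                  for (i, j), table in pair_e.items())
    return energy


def anneal(self_e, pair_e, steps=5000, t_start=5.0, t_end=0.05, seed=0):
    """Return the lowest-energy assignment seen and its energy."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    current = [rng.randrange(len(energies)) for energies in self_e]
    e_cur = total_energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(self_e))          # pick one residue
        trial = list(current)
        trial[i] = rng.randrange(len(self_e[i]))  # propose a rotamer for it
        e_trial = total_energy(trial, self_e, pair_e)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if e_trial <= e_cur or rng.random() < math.exp((e_cur - e_trial) / temp):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


# Toy problem: three residues, two rotamers each, no pair coupling.
self_e = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 0.0], [0.0, 0.0]]}
best, e_best = anneal(self_e, pair_e)
```

Recomputing the full energy at every step keeps the sketch short; a practical implementation would update only the terms involving the changed residue.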
Article
Retention of known geometry, with regard to mean atomic positions, has proved useful in the refinement of macromolecules. In structures with a paucity of diffraction data and large displacements of the atoms from their mean positions, it is also of value to restrain the thermal factors to be consistent with known stereochemistry. This paper presents a technique for accomplishing this by restraining the variances of the interatomic distributions (which are functions of the mean atomic positions and the thermal parameters) to suitably small values. This procedure allows meaningful anisotropic refinement of macromolecules to be carried out with low-resolution diffraction data. Anisotropic thermal parameters obtained in this way should prove useful in understanding the dynamics of the biological functions of macromolecules.
Article
We assume that each class of protein has a core structure that is defined by internal residues, and that the external, solvent-contacting residues contribute to the stability of the structure, are of primary importance to function, but do not determine the architecture of the core portions of the polypeptide chain. An algorithm has been developed to supply a list of permitted sequences of internal residues compatible with a known core structure. This list is referred to as the tertiary template for that structure. In general the positions in the template are not sequentially adjacent and are distributed throughout the polypeptide chain. The template is derived using the fixed positions for the main-chain and beta-carbon atoms in the test structure and selected stereochemical rules. The focus of this paper is on the use of two packing criteria: avoidance of steric overlap and complete filling of available space. The program also notes potential polar group interactions and disulfide bonds as well as possible burial of formal charges.
Article
We have analysed the side-chain dihedral angles in 2536 residues from 19 protein structures. The distributions of χ1 and χ2 are compared with predictions made on the basis of simple energy calculations. The χ1 distribution is trimodal: the g− position of the side-chain (trans to Hα), which is rare except in serine, the t position (trans to the amino group), and the g+ position (trans to the carbonyl group), which is preferred in all residues. Characteristic χ2 distributions are observed for residues with a tetrahedral γ-carbon, for aromatic residues, and for aspartic acid/asparagine. The number of configurations actually observed is small for all types of side-chains, with 60% or more of them in only one or two configurations. We give estimates of the experimental errors on χ1 and χ2 (3° to 16°, depending on the type of the residue), and show that the dihedral angles remain within 15° to 18° (standard deviation) from the configurations with the lowest calculated energies. The distribution of the side-chains among the permitted configurations varies slightly with the conformation of the main chain, and with the position of the residue relative to the protein surface. Configurations that are rare for exposed residues are even rarer for buried residues, suggesting that, while the folded structure puts little strain on side-chain conformations, the side-chain positions with the lowest energy in the unfolded structure are chosen preferentially during folding.
Article
Side-chain torsional potentials in the bovine pancreatic trypsin inhibitor are calculated from empirical energy functions by use of the known X-ray structure of the protein and the rigid-geometry mapping technique. The potentials are analyzed to determine the roles and relative importance of contributions from the dipeptide backbone, the protein, and the crystalline environment of solvent and other protein molecules. The structural characteristics of the side chains determine two major patterns of energy surfaces, E(χ1,χ2): a gamma-branched pattern and a pattern for longer, straight side chains (Arg, Lys, Glu, and Met). Most of the dipeptide potential curves and surfaces have a local minimum corresponding to the side-chain torsional angles in the X-ray structure. Addition of the protein forces sharpens and/or selects from these minima, providing very good agreement with the experimental conformation for most side chains at the surface or in the core of the protein. Inclusion of the crystalline environment produces still better results, especially for the side chains extending away from the protein. The results are discussed in terms of the details of the interactions due to the surrounding, calculated solvent-accessibility figures and the temperature factors derived from the crystallographic refinement of the pancreatic trypsin inhibitor.
Article
Conformational potentials of sidechains in the bovine pancreatic trypsin inhibitor have been studied with an empirical energy function. Calculated minimum-energy positions are in excellent agreement with the x-ray structure for sidechains in the core or at the surface of the protein; as expected, angles for sidechains that are directed out into the solvent do not agree with the calculated values. The contributions to the potentials are analyzed and compared with the potentials for the free amino acid. Although there is a large restriction in the available conformational space due to nonbonded interactions, the minimum energy positions in the protein are close to those of the free amino acid; the significance of this result is discussed. To estimate the effective barriers for rotation of the aromatic rings (tyrosine and phenylalanine), calculations are done in which the protein is permitted to relax as a function of the ring orientation. The resulting barriers, which are much lower than the rigid rotation barriers, are used to evaluate the rotation rates; comparison is made with the available nuclear magnetic resonance data.
Article
A program (PROBIT) has been developed that allows the reconstruction of a complete set of three-dimensional protein coordinates from alpha-carbon coordinates. The program generates a statistical measure of polypeptide conformational behavior for substructures in a defined structural context from a library of highly refined protein structures. These statistics provide a prescription for substructure substitution from the database to allow regeneration of the complete protein structure.
Article
The problem of constructing all-atom model co-ordinates of a protein from an outline of the polypeptide chain is encountered in protein structure determination by crystallography or nuclear magnetic resonance spectroscopy, in model building by homology and in protein design. Here, we present an automatic procedure for generating full protein co-ordinates (backbone and, optionally, side-chains) given the Cα trace and amino acid sequence. To construct backbones, a protein structure database is first scanned for fragments that locally fit the chain trace according to distance criteria. A best path algorithm then sifts through these segments and selects an optimal path with minimal mismatch at fragment joints. In blind tests, using fully known protein structures, backbones (Cα, C, N, O) can be reconstructed with a reliability of 0.4 to 0.6 Å root-mean-square position deviation and not more than 0 to 5% peptide flips. This accuracy is sufficient to identify possible errors in protein co-ordinate sets. To construct full co-ordinates, side-chains are added from a library of frequently occurring rotamers using a simple and fast Monte Carlo procedure with simulated annealing. In tests on X-ray structures determined at better than 2.5 Å resolution, the positions of side-chain atoms in the protein core (less than 20% relative accessibility) have an accuracy of 1.6 Å (r.m.s. deviation) and 70% of χ1 angles are within 30° of the X-ray structure. The computer program MaxSprout is available on request.
Article
Free energy simulation methods are used to analyze the effects of the mutation Arg 96 → His on the stability of T4 lysozyme. The calculated stability change and the lack of significant structural rearrangement in the folded state due to the mutation are in agreement with experimental studies [Kitamura, S., & Sturtevant, J. M. (1989) Biochemistry 28, 3788-3792; Weaver, L. H., et al. (1989) Biochemistry 28, 3793-3797]. By use of thermodynamic integration, the contributions of specific interactions to the free energy change are evaluated. It is shown that a number of contributions that stabilize the wild type or the mutant partially cancel in the overall free energy difference; some of these involve the unfolded state. Comparison of the results with conclusions based on structural and thermodynamic data leads to new insights into the origin of the stability difference between wild-type and mutant proteins. Of particular interest is the importance of the contributions of more distant residues, solvent water, and the covalent linkage of the mutated amino acid. Also, the analysis of the interactions of Arg/His 96 with the C-terminal end of a helix (residues 82-90) makes it clear that the nearby carbonyl groups (Tyr 88 and Asp 89) make the dominant contribution, that the amide groups do not contribute significantly, and that the helix-dipole model is inappropriate for this case.