Conference Paper

Optimization of Expanded Genetic Codes via Genetic Algorithms

Authors:

Abstract

In recent decades, researchers have proposed the use of genetically modified organisms that utilize unnatural amino acids, i.e., amino acids other than the 20 amino acids encoded by the standard genetic code. Unnatural amino acids have been incorporated into genetically engineered organisms for the development of new drugs, fuels and chemicals. Incorporating new amino acids requires modifying the standard genetic code, yet expanded genetic codes have been created without considering the robustness of the code. The objective of this work is to use genetic algorithms (GAs) to optimize expanded genetic codes. The GA indicates which codons of the standard genetic code should be used to encode a new unnatural amino acid. The fitness function has two terms: one for the robustness of the new code and another that takes into account the frequency of use of amino acids. Experiments show that, by controlling the weighting between the two terms, it is possible to obtain fewer or more amino acid substitutions while still minimizing the robustness term.
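The GA described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the chromosome is assumed to be a bit mask over the 61 sense codons (1 = reassign that codon to a hypothetical new amino acid "X"), the robustness term is assumed to be the mean squared change in polar requirement over single-point mutations, and the frequency term is assumed to be the summed usage of the reassigned codons. The property value for "X", the uniform codon usage and all GA parameters are hypothetical placeholders; the polar requirement values are as commonly tabulated and should be checked against the original source.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AA))
SENSE = [c for c in CODONS if STANDARD[c] != "*"]

# Polar requirement as commonly tabulated (verify before use);
# "X" is a HYPOTHETICAL value for the new unnatural amino acid.
PROP = {"A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8, "Q": 8.6, "E": 12.5,
        "G": 7.9, "H": 8.4, "I": 4.9, "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0,
        "P": 6.6, "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6, "X": 7.0}


def neighbours(codon):
    """Yield every codon that differs from `codon` by one point mutation."""
    for i, base in enumerate(codon):
        for b in BASES:
            if b != base:
                yield codon[:i] + b + codon[i + 1:]


def expand(mask):
    """Expanded code: sense codons whose mask bit is 1 now encode "X"."""
    code = dict(STANDARD)
    for codon, bit in zip(SENSE, mask):
        if bit:
            code[codon] = "X"
    return code


def robustness_cost(code):
    """Mean squared property change over sense-to-sense point mutations (lower = more robust)."""
    diffs = [(PROP[code[c]] - PROP[code[n]]) ** 2
             for c in SENSE for n in neighbours(c) if code[n] != "*"]
    return sum(diffs) / len(diffs)


def fitness(mask, w, usage):
    """Weighted sum of both terms (minimised); invalid codes score +inf."""
    code = expand(mask)
    # Invalid if nothing is reassigned or a canonical amino acid loses all its codons.
    if not any(mask) or not set(AA) <= set(code.values()):
        return float("inf")
    freq = sum(usage[c] for c, bit in zip(SENSE, mask) if bit)
    return w * robustness_cost(code) + (1 - w) * freq


def run_ga(w, usage, pop_size=20, generations=30, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, w, usage)
        return cache[key]

    # Start from valid one-codon reassignments (skip the single codons of M and W).
    movable = [i for i, c in enumerate(SENSE) if AA.count(STANDARD[c]) > 1]

    def random_mask():
        mask = [0] * len(SENSE)
        mask[rng.choice(movable)] = 1
        return mask

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [min(pop, key=score)]  # elitism: keep the best individual
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 2), key=score)  # binary tournament selection
            p2 = min(rng.sample(pop, 2), key=score)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=score)
```

For example, `run_ga(0.5, {c: 1 / len(SENSE) for c in SENSE})` returns a mask whose set bits are the codons proposed for "X". With uniform usage the frequency term just counts reassigned codons; because the two terms are on different scales in this sketch, `w` would need tuning (or the terms normalizing) before the weighting is meaningful.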
Chapter
There is great interest in the creation of genetically modified organisms that use amino acids different from the naturally encoded amino acids. Unnatural amino acids have been incorporated into genetically modified organisms to develop new drugs, fuels and chemicals. When incorporating new amino acids, it is necessary to change the standard genetic code. Expanded genetic codes have been created without considering the robustness of the code. In this work, multi-objective genetic algorithms are proposed for the optimization of expanded genetic codes. Two different approaches are compared: weighted and Pareto. The expanded codes are optimized with respect to the frequency of replaced codons and two robustness-based measures (for polar requirement and molecular volume). The experiments indicate that multi-objective approaches make it possible to obtain a list of expanded genetic codes optimized according to combinations of the three objectives. Thus, specialists can choose an optimized solution according to their needs.
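The difference between the two approaches compared above can be illustrated with a short sketch. Each candidate expanded code is represented only by a hypothetical vector of three objective values to be minimized (frequency of replaced codons, robustness cost for polar requirement, robustness cost for molecular volume); the numbers are made up for illustration. The weighted approach collapses each vector into one score and returns a single code, whereas the Pareto approach returns every non-dominated code.

```python
from typing import Sequence

Objectives = Sequence[float]


def weighted_score(obj: Objectives, weights: Objectives) -> float:
    """Weighted-formula approach: a single scalar score (lower is better)."""
    return sum(w * o for w, o in zip(weights, obj))


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Pareto approach: keep every candidate that no other candidate dominates."""
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


# HYPOTHETICAL (frequency, robustness_pr, robustness_volume) values for four codes.
codes = [(1.0, 5.0, 5.0), (2.0, 2.0, 2.0), (3.0, 1.0, 6.0), (4.0, 4.0, 4.0)]
best_weighted = min(codes, key=lambda c: weighted_score(c, (1 / 3, 1 / 3, 1 / 3)))
front = pareto_front(codes)
# best_weighted is (2.0, 2.0, 2.0); front keeps the first three codes,
# because only (4.0, 4.0, 4.0) is dominated (by (2.0, 2.0, 2.0)).
```

The Pareto front leaves the final trade-off to the specialist, which matches the motivation stated in the abstract; the weighted score instead fixes that trade-off in advance through the weights.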
Article
Since at least the last common ancestor of all life on Earth, genetic information has been stored in a four-letter alphabet that is propagated and retrieved by the formation of two base pairs. The central goal of synthetic biology is to create new life forms and functions, and the most general route to this goal is the creation of semi-synthetic organisms whose DNA harbours two additional letters that form a third, unnatural base pair. Previous efforts to generate such semi-synthetic organisms culminated in the creation of a strain of Escherichia coli that, by virtue of a nucleoside triphosphate transporter from Phaeodactylum tricornutum, imports the requisite unnatural triphosphates from its medium and then uses them to replicate a plasmid containing the unnatural base pair dNaM-dTPT3. Although the semi-synthetic organism stores increased information when compared to natural organisms, retrieval of the information requires in vivo transcription of the unnatural base pair into mRNA and tRNA, aminoacylation of the tRNA with a non-canonical amino acid, and efficient participation of the unnatural base pair in decoding at the ribosome. Here we report the in vivo transcription of DNA containing dNaM and dTPT3 into mRNAs with two different unnatural codons and tRNAs with cognate unnatural anticodons, and their efficient decoding at the ribosome to direct the site-specific incorporation of natural or non-canonical amino acids into superfolder green fluorescent protein. The results demonstrate that interactions other than hydrogen bonding can contribute to every step of information storage and retrieval. The resulting semi-synthetic organism both encodes and retrieves increased information and should serve as a platform for the creation of new life forms and functions. © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
Article
Background: The organization of the canonical code has intrigued researchers since it was first described. If we consider all codes mapping the 64 codons onto 20 amino acids and one stop signal, there are more than 1.51×10^84 possible genetic codes. The main question related to the organization of the genetic code is why exactly the canonical code was selected from this huge number of possible genetic codes. Many researchers argue that the organization of the canonical code is a product of natural selection and that the code's robustness against mutations supports this hypothesis. To investigate the natural selection hypothesis, some researchers employ optimization algorithms to identify regions of the genetic code space where the best codes, according to a given evaluation function, can be found (the engineering approach). The optimization process uses only one objective to evaluate the codes, generally based on robustness with respect to a single amino acid property. Only one objective is also employed in the statistical approach, which compares the canonical code with random codes. We propose a multiobjective approach in which two or more objectives are considered simultaneously to evaluate the genetic codes. Results: To test our hypothesis that the multiobjective approach is useful for analyzing the adaptability of the genetic code, we implemented a multiobjective optimization algorithm in which two objectives are optimized simultaneously. Using as objectives the robustness against mutation with respect to the amino acid property polar requirement (objective 1) and robustness with respect to hydropathy index or molecular volume (objective 2), we found solutions closer to the canonical genetic code in terms of robustness than those reported by other authors using only one objective. Conclusions: Using more objectives, more optimal solutions are obtained and, as a consequence, more information can be used to investigate the adaptability of the genetic code. 
The multiobjective approach is also more natural, because more than one objective was adapted during the evolutionary process of the canonical genetic code. Our results suggest that the evaluation function employed to compare genetic codes should consider simultaneously more than one objective, in contrast to what has been done in the literature.
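The figure of 1.51×10^84 possible codes can be reproduced if, as assumed here, it counts every assignment of the 64 codons to 21 labels (20 amino acids plus stop) in which each label receives at least one codon, i.e. the number of surjections from 64 codons onto 21 labels. The sketch computes this by inclusion-exclusion and cross-checks it with Stirling numbers of the second kind.

```python
from math import comb, factorial


def surjections(n: int, k: int) -> int:
    """Maps from n codons onto k labels that use every label (inclusion-exclusion)."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))


def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind S(n, k) via the standard recurrence."""
    s = [[0] * (k + 1) for _ in range(n + 1)]
    s[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            s[i][j] = j * s[i - 1][j] + s[i - 1][j - 1]
    return s[n][k]


n_codes = surjections(64, 21)
assert n_codes == factorial(21) * stirling2(64, 21)  # independent cross-check
print(f"{n_codes:.2e}")  # 1.51e+84
```

Python's arbitrary-precision integers keep the count exact; only the final formatting converts it to a float.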
Article
Genetically modified organisms (GMOs) are increasingly used in research and industrial systems to produce high-value pharmaceuticals, fuels and chemicals. Genetic isolation and intrinsic biocontainment would provide essential biosafety measures to secure these closed systems and enable safe applications of GMOs in open systems, which include bioremediation and probiotics. Although safeguards have been designed to control cell growth by essential gene regulation, inducible toxin switches and engineered auxotrophies, these approaches are compromised by cross-feeding of essential metabolites, leaked expression of essential genes, or genetic mutations. Here we describe the construction of a series of genomically recoded organisms (GROs) whose growth is restricted by the expression of multiple essential genes that depend on exogenously supplied synthetic amino acids (sAAs). We introduced a Methanocaldococcus jannaschii tRNA:aminoacyl-tRNA synthetase pair into the chromosome of a GRO derived from Escherichia coli that lacks all TAG codons and release factor 1, endowing this organism with the orthogonal translational components to convert TAG into a dedicated sense codon for sAAs. Using multiplex automated genome engineering, we introduced in-frame TAG codons into 22 essential genes, linking their expression to the incorporation of synthetic phenylalanine-derived amino acids. Of the 60 sAA-dependent variants isolated, a notable strain harbouring three TAG codons in conserved functional residues of MurG, DnaA and SerS and containing targeted tRNA deletions maintained robust growth and exhibited undetectable escape frequencies upon culturing ∼10^11 cells on solid media for 7 days or in liquid media for 20 days. This is a significant improvement over existing biocontainment approaches. We constructed synthetic auxotrophs dependent on sAAs that were not rescued by cross-feeding in environmental growth assays. 
These auxotrophic GROs possess alternative genetic codes that impart genetic isolation by impeding horizontal gene transfer and now depend on the use of synthetic biochemical building blocks, advancing orthogonal barriers between engineered organisms and the environment.
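The reassignment of TAG from a stop signal to a dedicated sense codon can be illustrated with a toy translation function. This is a simplified sketch, not the authors' pipeline: "X" is a hypothetical placeholder for the synthetic amino acid, and the recoded table simply maps TAG to it, mimicking an organism that lacks release factor 1 and carries an orthogonal tRNA/synthetase pair. The gene sequence is made up for illustration.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AA))

# Recoded table: TAG becomes a sense codon for the HYPOTHETICAL synthetic amino acid "X".
RECODED = {**STANDARD, "TAG": "X"}


def translate(orf: str, table: dict) -> str:
    """Translate codon by codon, stopping at the first stop codon ("*")."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = table[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def in_frame_tag_positions(orf: str) -> list:
    """Codon indices of in-frame TAG codons (the sites that depend on the sAA)."""
    return [i // 3 for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == "TAG"]


gene = "ATGAAATAGGGCTAA"  # hypothetical toy gene
print(translate(gene, STANDARD))     # MK   (TAG read as stop: truncated product)
print(translate(gene, RECODED))      # MKXG (TAG read as the synthetic amino acid)
print(in_frame_tag_positions(gene))  # [2]
```

The contrast between the two translations is the essence of the containment strategy: without the synthetic amino acid, a gene carrying an in-frame TAG yields only a truncated product.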
Article
The genetic code is the interface between the genetic information stored in DNA molecules and the proteins. Considering the hypothesis that the genetic code evolved to its current structure, some researchers use optimization algorithms to find hypothetical codes to be compared with the canonical genetic code. For this purpose, a function with only one objective is employed to evaluate the codes, generally a function based on the robustness of the code against mutations. Very few random codes are better than the canonical genetic code when the evaluation function based on robustness is considered. However, when only robustness is used to evaluate the codes, most codons in the best hypothetical codes are associated with only a few amino acids, which makes it hard to believe that the genetic code evolved based on a single objective, i.e., robustness against mutations. We therefore propose using entropy as a second objective for the evaluation of the codes, together with a Pareto approach to deal with both objectives. The results indicate that the Pareto approach generates codes closer to the canonical genetic code than the single-objective approach employed in the literature.
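One plausible reading of entropy as a second objective is the Shannon entropy of how the codons of a code are distributed among its amino acids (and stop): a code in which most codons map to a few amino acids has low entropy. The exact definition used by the authors is not given here, so the formula below is an assumption for illustration.

```python
from collections import Counter
from math import log2

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AA))


def code_entropy(code: dict) -> float:
    """Shannon entropy (bits) of the fraction of codons assigned to each amino acid or stop."""
    counts = Counter(code.values())
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


degenerate = {c: "L" for c in CODONS}  # every codon encodes Leu: maximally robust, no diversity
print(code_entropy(degenerate))          # 0.0
print(round(code_entropy(STANDARD), 2))  # 4.22 (the maximum for 21 labels is log2(21) ≈ 4.39)
```

A degenerate code is trivially robust (no point mutation changes the amino acid) but scores zero entropy, which is why pairing the two objectives penalizes the degenerate optima that robustness alone favours.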
Article
Using a robustness measure based on values of the polar requirement of amino acids, Freeland and Hurst (1998) showed that fewer than one in a million random hypothetical codes are better than the standard genetic code. In this paper, instead of comparing the standard code with randomly generated codes, we use an optimisation algorithm to find the best hypothetical codes. This approach has been used before, but only with a single objective to be optimised. The robustness measure based on the polar requirement is considered the most effective objective to be optimised by the algorithm. We propose here that the polar requirement is not the only property to be considered when computing the robustness of the genetic code. We include the hydropathy index and molecular volume in the evaluation of the amino acids using three multi-objective approaches: the weighted formula, lexicographic and Pareto approaches. To our knowledge, this is the first work proposing multi-objective optimisation approaches with a non-restrictive encoding for studying the evolution of the genetic code. Our results indicate that multi-objective approaches considering the three amino acid properties obtain better results than those obtained by single-objective approaches reported in the literature. The codes obtained by the multi-objective approach are more robust and structurally more similar to the standard code.
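The random-code comparison mentioned at the start of this abstract can be sketched as follows. Assumptions: random codes are generated by permuting the 20 amino acids among the synonymous codon blocks of the standard code while keeping stop codons fixed, and robustness is the unweighted mean squared change in polar requirement over all single-point mutations between sense codons. Freeland and Hurst's published measure also weights mutation types (e.g., transition/transversion bias), which is omitted here, so fractions from this sketch are not comparable with their one-in-a-million figure. The polar requirement values are as commonly tabulated and should be verified against the original source.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AA))
SENSE = [c for c in CODONS if STANDARD[c] != "*"]
AMINO_ACIDS = sorted(set(AA) - {"*"})

# Polar requirement as commonly tabulated (verify against the original source).
POLAR_REQ = {"A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8, "Q": 8.6, "E": 12.5,
             "G": 7.9, "H": 8.4, "I": 4.9, "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0,
             "P": 6.6, "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6}

# Ordered (codon, mutant) pairs of sense codons differing at exactly one position.
PAIRS = [(c, c[:i] + b + c[i + 1:]) for c in SENSE for i in range(3)
         for b in BASES if b != c[i] and STANDARD[c[:i] + b + c[i + 1:]] != "*"]


def ms_cost(code):
    """Mean squared change in polar requirement (lower = more robust)."""
    return sum((POLAR_REQ[code[a]] - POLAR_REQ[code[b]]) ** 2 for a, b in PAIRS) / len(PAIRS)


def random_code(rng):
    """Permute amino acids among the standard code's synonymous blocks; stops stay fixed."""
    shuffled = AMINO_ACIDS[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(AMINO_ACIDS, shuffled))
    return {c: relabel.get(aa, aa) for c, aa in STANDARD.items()}


def fraction_better(n_samples=1000, seed=0):
    """Fraction of random codes with a lower (better) cost than the standard code."""
    rng = random.Random(seed)
    reference = ms_cost(STANDARD)
    better = sum(ms_cost(random_code(rng)) < reference for _ in range(n_samples))
    return better / n_samples
```

With a fixed seed the estimate is reproducible; increasing `n_samples` tightens it, at a cost linear in the number of samples.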
Article
The ability to site-specifically incorporate noncanonical amino acids (ncAAs) with novel structures into proteins in living cells affords a powerful tool to investigate and manipulate protein structure and function. More than 200 ncAAs with diverse biological, chemical, and physical properties have been genetically encoded in response to nonsense or frameshift codons in both prokaryotic and eukaryotic organisms with high fidelity and efficiency. In this review, recent advances in the technology and its application to problems in protein biochemistry, cellular biology, and medicine are highlighted.
Book
This manual teaches theoretical and practical molecular genetic approaches to bacterial pathogenicity. Chapters on concepts, technologies and applications are followed by 15 experiments with Salmonella and Vibrio, in which protocols and expected findings are demonstrated and strategies for similar approaches to other bacteria are discussed. This manual is aimed at microbiologists in research, industrial and public health laboratories.
Article
This paper addresses the problem of how to evaluate the quality of a model built from the data in a multi-objective optimization scenario, where two or more quality criteria must be optimized simultaneously. A typical example is a scenario where one wants to maximize both the accuracy and the simplicity of a classification model or of a candidate attribute subset in attribute selection. The paper reviews three very different approaches to this problem, namely: (a) transforming the original multi-objective problem into a single-objective problem by using a weighted formula; (b) the lexicographic approach, where the objectives are ranked in order of priority; and (c) the Pareto approach, which consists of finding as many non-dominated solutions as possible and returning the set of non-dominated solutions to the user. It also presents a critical review of the case for and against each of these approaches. The general conclusions are that the weighted formula approach, which is by far the most used in the data mining literature, is to a large extent an ad hoc approach to multi-objective optimization, whereas the lexicographic and Pareto approaches are more principled and therefore deserve more attention from the data mining community.
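Of the three approaches, the lexicographic one is perhaps the least familiar. A minimal sketch, assuming the objectives are listed in priority order, all are minimized, and each has a user-chosen tolerance threshold below which two values are treated as equivalent so that the comparison moves on to the next objective; the models and tolerance values are hypothetical.

```python
from typing import Sequence


def lex_better(a: Sequence[float], b: Sequence[float], tolerances: Sequence[float]) -> bool:
    """True if a beats b: compare objectives in priority order (all minimised), moving to the
    next objective only when the difference is within that objective's tolerance."""
    for x, y, tol in zip(a, b, tolerances):
        if abs(x - y) > tol:
            return x < y
    return False  # equivalent on every objective


def lex_best(candidates, tolerances):
    """Sequential scan that keeps the current winner."""
    best = candidates[0]
    for cand in candidates[1:]:
        if lex_better(cand, best, tolerances):
            best = cand
    return best


# HYPOTHETICAL models scored as (error rate, model size): error first, then size.
models = [(0.10, 10), (0.11, 3), (0.20, 1)]
print(lex_best(models, tolerances=(0.02, 0)))  # (0.11, 3)
```

Here the second model wins because its error is within the 0.02 tolerance of the first and it is much smaller. Tolerance-based equivalence is not transitive, so with many close candidates the winner of a sequential scan can depend on their order; that is a design choice worth keeping in mind.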
Book
Genetic algorithms have been used in science and engineering as adaptive algorithms for solving practical problems and as computational models of natural evolutionary systems. This brief, accessible introduction describes some of the most interesting research in the field and also enables readers to implement and experiment with genetic algorithms on their own. It focuses in depth on a small set of important and interesting topics—particularly in machine learning, scientific modeling, and artificial life—and reviews a broad span of research, including the work of Mitchell and her colleagues. The descriptions of applications and modeling projects stretch beyond the strict boundaries of computer science to include dynamical systems theory, game theory, molecular biology, ecology, evolutionary biology, and population genetics, underscoring the exciting "general purpose" nature of genetic algorithms as search methods that can be employed across disciplines. An Introduction to Genetic Algorithms is accessible to students and researchers in any scientific discipline. It includes many thought and computer exercises that build on and reinforce the reader's understanding of the text. The first chapter introduces genetic algorithms and their terminology and describes two provocative applications in detail. The second and third chapters look at the use of genetic algorithms in machine learning (computer programs, data analysis and prediction, neural networks) and in scientific models (interactions among learning, evolution, and culture; sexual selection; ecosystems; evolutionary activity). Several approaches to the theory of genetic algorithms are discussed in depth in the fourth chapter. The fifth chapter takes up implementation, and the last chapter poses some currently unanswered questions and surveys prospects for the future of evolutionary computation. Bradford Books imprint
Article
The segregated, linear and digital character of the genome has allowed us to apply information theory and other mathematical theorems about sequences or strings of symbols to make a quantitative rather than an anecdotal and ad hoc discussion of significant problems in molecular biology. This procedure has led us to avoid a number of illusions common in the literature. The application of these mathematical procedures will play a role in molecular biology analogous to that of thermodynamics in chemistry.