Article

Analysis of codon usage bias of chloroplast genomes in Gynostemma species

Abstract

Gynostemma plants are important Chinese medicinal materials and economic crops. Codon usage analysis is a good way to understand organism evolution and phylogeny. No analysis of codon usage bias in the chloroplast genomes of Gynostemma species has yet been reported. In this study, the chloroplast genomes of nine Gynostemma species were analyzed systematically to explore the factors that shape codon usage bias. Codon usage indicators were calculated, and multivariate statistical analyses, including neutrality plot, effective number of codons plot, parity rule 2 plot and correspondence analyses, were performed. Composition analysis of codons showed that the GC content of chloroplast genes in all nine Gynostemma species was less than 50%, and that the protein-coding sequences of chloroplast genes preferred codons ending with A/T at the third position. The chloroplast genes had an overall weak codon usage bias. A total of 29 high-frequency codons and 12 optimal codons were identified. These could provide useful information for optimizing and modifying codons and thus improving gene expression in Gynostemma species. The results of the multivariate analyses showed that the codon usage patterns were affected not by a single factor but by multiple factors. Mutation pressure, natural selection and base composition might all influence the codon usage patterns, while natural selection might be the main determinant. The study could provide a reference for organism evolution and phylogeny of Gynostemma species and help to understand codon usage patterns in the chloroplast genomes of other plant species. Supplementary information: The online version contains supplementary material available at 10.1007/s12298-021-01105-z.
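The composition result in the abstract (GC below 50% and a preference for A/T at the third codon position) rests on counting G and C separately at each codon position. The following is a minimal sketch of that count, not the authors' pipeline. The function name `gc_by_position` and the toy sequence are hypothetical, and for simplicity the sketch counts every codon, whereas published GC3s values usually exclude Met, Trp and stop codons.

```python
def gc_by_position(cds: str) -> dict:
    """Return the GC fraction overall and at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence (length a multiple of 3).
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")

    def gc_fraction(bases: str) -> float:
        return sum(base in "GC" for base in bases) / len(bases)

    return {
        "GC": gc_fraction(seq),
        "GC1": gc_fraction(seq[0::3]),  # first base of every codon
        "GC2": gc_fraction(seq[1::3]),  # second base of every codon
        "GC3": gc_fraction(seq[2::3]),  # third base of every codon
    }


# Hypothetical toy sequence: ATG GCT AAA TTT GGA TAA
toy = gc_by_position("ATGGCTAAATTTGGATAA")
for name, value in toy.items():
    print(f"{name}: {value:.3f}")
```

In this toy sequence GC3 is the lowest of the three positional values, the same qualitative pattern the abstract reports for chloroplast genes.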

... Codon usage bias in angiosperms is relatively weak, as indicated by Nair et al. [48], Zeng et al. [49], and Zhang et al. [50]. Of the genes in the T. costata cp genome, 83.02% had values greater than 45, suggesting a weak bias. ...
... In total, 28 of the 29 high-frequency codons and 8 of the optimal codons end in A/T, indicating a preference for codons ending in A/U. This result is consistent with studies on Gynogyra [50], Hemiptelea davidii [51], and Porphyra umbilicalis [52]. A significant correlation was found between GC1 and GC2, but no significant correlation was found between GC3 and GC1, suggesting that the base composition of the first and second codon positions differs from that of the third position. ...
... In general, natural selection is the main factor affecting codon usage bias in the cp genes of T. costata. It has been detected as a major factor influencing the codon usage bias in the cp genomes of several groups and species, for instance, in Mesona chinensis [55], Elaeagnus [56], Gynostemma [50], Juglandaceae [49], and Euphorbiaceae [53]. It has been reported that codon preference affects gene expression by regulating the accuracy and efficiency of gene translation [57]. ...
Article
Chloroplasts (cp) are important organelles in plant cells that have been widely used in phylogenetic, molecular evolution, and gene expression studies due to their conserved molecular structure. In this study, we obtained the complete cp genome of Trivalvaria costata (Annonaceae) and analyzed its structural characteristics. Additionally, we analyzed the rps12 gene in the phylogenetic framework of magnoliids. The T. costata cp genome comprises 1,662,002 bp and contains 132 genes. We detected 48 simple sequence repeats (SSRs) and identified 29 high-frequency codons as well as 8 optimal codons. Our multiple analyses show that codon usage bias is mainly influenced by natural selection. For the first time, we found the rps12 gene to be entirely located in the IR region (in Annona). In groups with exon 1 located in the single-copy (SC) region and exons 2–3 located in the inverted repeat (IR) region, the transition rate and synonymous substitution rate of exon 1 were higher than those of exons 2–3. Adaptive evolution identified a positive selection site (116) located in the 310-helix region, suggesting that the rps12 gene may undergo adaptive changes during the evolutionary history of magnoliids. This study enhances our knowledge regarding genetic information on T. costata and provides support for reduced substitution rates in the IR region.
... Furthermore, we downloaded all the available chloroplast genomes for this genus and performed a comprehensive analysis of the chloroplast genomes of this basal group of Cucurbitaceae. The chloroplast genomes of different species of Gynostemma are highly conserved, and the gene compositions, structures and GC contents are similar, with RNA editing sites and codon usage bias showing close species relationships within the genus Gynostemma [13,17–21,51]. Our newly sequenced chloroplast genomes of Gynostemma species revealed differences between the two subgenera, despite the low chloroplast polymorphism in this genus. ...
... The most enriched amino acid within the Gynostemma species was leucine, and this result was frequently reported in other angiosperms [52]. The frequently used codons (RSCU > 1) usually end in A/U, which is consistent with reports from other plants [51]. The occurrence of base addition, loss, or conversion in the coding region after transcription is known as RNA editing. ...
Article
Gynostemma is an important medicinal and food plant of the Cucurbitaceae family. The phylogenetic position of the genus Gynostemma in the Cucurbitaceae family has been determined by morphology and phylogenetics, but the evolutionary relationships within the genus Gynostemma remain to be explored. The chloroplast genomes of seven species of the genus Gynostemma were sequenced and annotated, of which the genomes of Gynostemma simplicifolium, Gynostemma guangxiense and Gynostemma laxum were sequenced and annotated for the first time. The chloroplast genomes ranged from 157,419 bp (Gynostemma compressum) to 157,840 bp (G. simplicifolium) in length, including 133 identical genes: 87 protein-coding genes, 37 tRNA genes, eight rRNA genes and one pseudogene. Phylogenetic analysis showed that the genus Gynostemma is divided into three primary taxonomic clusters, which differs from the traditional morphological classification of the genus Gynostemma into the subgenus Gynostemma and Trirostellum. The highly variable regions of atpH-atpL, rpl32-trnL, and ccsA-ndhD, the repeat units of AAG/CTT and ATC/ATG in simple sequence repeats (SSRs) and the length of overlapping regions between rps19 and inverted repeats (IRb) and between ycf1 and small single-copy (SSC) were found to be consistent with the phylogeny. Observations of fruit morphology of the genus Gynostemma revealed that transitional state species have independent morphological characteristics, such as oblate fruit and inferior ovaries. In conclusion, both molecular and morphological results showed consistency with those of phylogenetic analysis.
... Codon bias in the plastomes of some taxa, such as the genus Quercus [78] and Coffea arabica [79], was mainly affected by mutation pressure. In others, such as the genus Gynostemma, natural selection played a major role in shaping codon usage bias [80]. In Sinojackia plastomes, natural selection and mutation pressure together played a crucial role in shaping the codon usage pattern, but natural selection was the primary driver. ...
Article
Sinojackia Hu. comprises five to eight Chinese endemic species with high ornamental and medicinal value. However, the generic limits, interspecific relationships and evolutionary history of the genus remain unresolved. In this study, we newly sequenced three plastomes of S. oblongicarpa and compared them with those of the other congeneric species to explore the taxonomic delimitation of the species and the evolutionary history of the genus. The plastome structure of Sinojackia species was extremely conserved in terms of number of genes, sequence length, and GC content. The codon usage patterns revealed that natural selection may be the main factor shaping codon usage bias. Our phylogenetic tree shows that Sinojackia is monophyletic and can be divided into two clades. Sinojackia oblongicarpa as a distinct species is supported for it is distantly related to S. sarcocarpa. The evolutionary analysis of morphological features indicates that the woody mesocarp is an ancestral feature. Sinojackia originated in central Southeast China during the early Miocene. In this period, it experienced elevated diversification and migrated from central Southeast China to the Hunan Province and the Sichuan Province with the development of the Asian monsoon and East Asian flora. Glacial–interglacial interactions with the monsoon climate may provide favorable expansion conditions for Sinojackia on a small scale.
... Codon bias in the plastomes of some taxa, such as the genus Quercus [78] and Coffea arabica [79], was mainly affected by mutation pressure. In others, such as the genus Gynostemma, natural selection plays a major role in shaping codon usage bias [80]. In Sinojackia plastomes, natural selection and mutation pressure together played a crucial role in shaping the codon usage pattern, but natural selection is the primary driver. ...
Preprint
Sinojackia Hu. comprises five to eight Chinese endemic species with high ornamental and medicinal value. However, the generic limits, interspecific relationships and evolutionary history of the genus remain unresolved. In this study, we newly sequenced and assembled three plastomes of S. oblongicarpa and compared them with those of the other congeneric species to explore the taxonomic delimitation of the species and the evolutionary history of the genus. The plastome structure of Sinojackia species was extremely conserved in terms of number of genes, sequence length and GC content. Codon usage patterns revealed that natural selection may be the main factor shaping codon usage bias. Our phylogenetic tree shows that Sinojackia is monophyletic and can be divided into two clades. Sinojackia oblongicarpa is supported as a distinct species because it is distantly related to S. sarcocarpa. The evolutionary analysis of morphological features indicates that a woody mesocarp is an ancestral feature, while undeveloped, spongy and fleshy mesocarps are later-derived states. Sinojackia originated in Central-Southeast China during the early Miocene. In this period, it experienced elevated diversification and migrated from Central-Southeast China to Hunan Province and Sichuan Province with the development of the Asian monsoon and East Asian flora. Sinojackia experienced elevated diversification at intraspecies levels that mainly occurred in the Quaternary. Glacial-interglacial interactions with the monsoon climate may have provided favorable expansion conditions for Sinojackia on a small scale.
... The predominant usage and frequency of A or T-ending codons, along with the lower preference for codons ending in G or C, are major factors that influence the codon usage bias of Jasminum chloroplast genes. The genomes of plant chloroplasts usually exhibit an AT bias, as seen in Camellia [58], H. davidii [59], Gynostemma [60], Asteraceae [61], and M. chinensis [62]. This tendency may be associated with enhancing gene expression. ...
Article
Background Jasmine (Jasminum), renowned for its ornamental value and captivating fragrance, has given rise to numerous species and accessions. However, limited knowledge exists regarding the evolutionary relationships among various Jasminum species. Results In the present study, we sequenced seven distinct Jasminum species, resulting in the assembly of twelve high-quality complete chloroplast (cp) genomes. Our findings revealed that the size of the 12 cp genomes ranged from 159 to 165 kb and encoded 134–135 genes, including 86–88 protein-coding genes, 38–40 tRNA genes, and 8 rRNA genes. J. nudiflorum exhibited a larger genome size compared to other species, mainly attributed to the elevated number of forward repeats (FRs). Despite the typically conservative nature of chloroplasts, variations in the presence or absence of accD have been observed within J. sambac. The calculation of nucleotide diversity (Pi) values for 19 cp genomes indicated that potential mutation hotspots were more likely to be located in LSC regions than in other regions, particularly in genes ycf2, rbcL, atpE, ndhK, and ndhC (Pi > 0.2). Ka/Ks values revealed strong selection pressure on the genes rps2, atpA, rpoA, rpoC1, and rpl33 when comparing J. sambac with the three most closely related species (J. auriculatum, J. multiflorum, and J. dichotomum). Additionally, SNP identification, along with the results of Structure, PCA, and phylogenetic tree analyses, divided the Jasminum cp genomes into six groups. Notably, J. polyanthum showed gene flow signals from both the G5 group (J. nudiflorum) and the G3 group (J. tortuosum and J. fluminense). Phylogenetic tree analysis reflected that most species from the same genus clustered together with robust support in Oleaceae, strongly supporting the monophyletic nature of cp genomes within the genus Jasminum. 
Conclusion Overall, this study provides comprehensive insights into the genomic composition, variation, and phylogenetic relationships among various Jasminum species. These findings enhance our understanding of the genetic diversity and evolutionary history of Jasminum.
... Nevertheless, it is important to note that only tryptophan and methionine do not have alternative codons [36]. As in other higher plants, for amino acids encoded by multiple codons, the third nucleotide of the codon was more frequently A/T than C/G [37,38]. ...
Article
The Meconopsis species are widely distributed in the Qinghai-Tibet Plateau, Himalayas, and Hengduan Mountains in China, and have high medicinal and ornamental value. The high diversity of plant morphology in this genus poses significant challenges for species identification, given their propensity for highland dwelling, which makes it a question worth exploring how they cope with the harsh surroundings. In this study, we recently generated chloroplast (cp) genomes of two Meconopsis species, Meconopsis paniculata (M. paniculata) and M. pinnatifolia, and compared them with those of ten Meconopsis cp genomes to comprehend cp genomic features, their phylogenetic relationships, and what part they might play in plateau adaptation. These cp genomes shared a great deal of similarities in terms of genome size, structure, gene content, GC content, and codon usage patterns. The cp genomes were between 151,864 bp and 154,997 bp in length, and contain 133 predictive genes. Through sequence divergence analysis, we identified three highly variable regions (trnD-psbD, ccsA-ndhD, and ycf1 genes), which could be used as potential markers or DNA barcodes for phylogenetic analysis. Between 22 and 38 SSRs and some long repeat sequences were identified from 12 Meconopsis species. Our phylogenetic analysis confirmed that 12 species of Meconopsis clustered into a monophyletic clade in Papaveraceae, which corroborated their intrageneric relationships. The results indicated that M. pinnatifolia and M. paniculata are sister species in the phylogenetic tree. In addition, the atpA and ycf2 genes were positively selected in high-altitude species. The functions of these two genes might be involved in adaptation to the extreme environment in the cold and low CO2 concentration conditions at the plateau.
... Research has demonstrated that codons in the cp genomes of plants preferentially end in A/U (Tang et al. 2021; Chu and Wei 2019; Chakraborty et al. 2020). In this study, a total of 31 high-frequency codons were identified in the L. cylindrica cp genome, 29 of which ended with A or U, which might be caused by base mutation and natural selection (Zhang et al. 2021). In addition, leucine was the most frequently used amino acid. ...
Article
White towel gourd (Luffa cylindrica) is an important cultivated vegetable and economic plant belonging to the Cucurbitaceae family. In this study, the complete cp genome of L. cylindrica was successfully sequenced using the Illumina NovaSeq 6000 platform and compared with those of closely related Cucurbitaceae species. The complete cp genome of L. cylindrica had a length of 157,296 bp. The cp genome exhibited a typical quadripartite structure, including a pair of inverted repeat (IRa and IRb) regions (26,287 bp), a large single-copy (LSC) region (86,321 bp), and a small single-copy (SSC) region (18,401 bp). The genome encoded 131 genes, consisting of 37 tRNA genes (tRNAs), eight rRNA genes (rRNAs), and 86 protein-coding genes (CDS). A total of 31 high-frequency codons and 270 simple sequence repeats (SSRs) were found in the cp genome of L. cylindrica. Ka/Ks analysis showed purifying selection in most of the coding genes. However, the ycf2, accD, atpE, rps4, clpP, rpl32 and ndhG genes of L. cylindrica were influenced by positive selection. Seven highly divergent regions were the ycf1, rps12, rpl22, accD, clpP, petL and rps19 genes, which are located in the LSC and SSC regions. Phylogenetic analysis revealed that L. aegyptiaca (OQ810000.1) was closely related to L. cylindrica. Our results on the cp genome of L. cylindrica could provide useful information for further species identification, classification, molecular marker development, and studies of the evolution and phylogeny of Cucurbitaceae plants.
... Therefore, an investigation of CUB in the mitogenome could provide a basic understanding of mitogenomic evolution and offer deeper insight into improving the expression efficiency of exogenous target genes in host organisms. Typically, the optimal genes in the nuclear genome use predominantly C- or G-ending codons, whereas those in the organelle genome prefer A- or T-ending codons [37,50,51]. In this study, we identified a total of 29 high-frequency codons and 22 optimal codons, and most of them exhibit a preference for A or T at the synonymous site. ...
Article
Background Hemerocallis citrina Baroni is a traditional vegetable crop widely cultivated in eastern Asia for its high edible, medicinal, and ornamental value. The phenomenon of codon usage bias (CUB) is prevalent in various genomes and provides excellent clues for gaining insight into organism evolution and phylogeny. Comprehensive analysis of the CUB of mitochondrial (mt) genes can provide rich genetic information for improving the expression efficiency of exogenous genes and optimizing molecular-assisted breeding programmes in H. citrina. Results Here, the CUB patterns in the mt genome of H. citrina were systematically analyzed, and the possible factors shaping CUB were further evaluated. Composition analysis of codons revealed that the overall GC (GCall) and GC at the third codon position (GC3) contents of mt genes were lower than 50%, presenting a preference for A/T-rich nucleotides and A/T-ending codons in H. citrina. The high values of the effective number of codons (ENC) are indicative of fairly weak CUB. Significant correlations of ENC with the GC3 and codon counts were observed, suggesting that not only compositional constraints but also gene length contributed greatly to CUB. Combined ENC-plot, neutrality plot, and Parity rule 2 (PR2)-plot analyses augmented the inference that the CUB patterns of the H. citrina mitogenome can be attributed to multiple factors. Natural selection, mutation pressure, and other factors might play a major role in shaping the CUB of mt genes, although natural selection is the decisive factor. Moreover, we identified a total of 29 high-frequency codons and 22 optimal codons, which exhibited a consistent preference for ending in A/T. Subsequent relative synonymous codon usage (RSCU)-based cluster and mt protein coding gene (PCG)-based phylogenetic analyses suggested that H. citrina is close to Asparagus officinalis, Chlorophytum comosum, Allium cepa, and Allium fistulosum in evolutionary terms, reflecting a certain correlation between CUB and evolutionary relationships. Conclusions There is weak CUB in the H. citrina mitogenome that is subject to the combined effects of multiple factors, especially natural selection. H. citrina was found to be closely related to Asparagus officinalis, Chlorophytum comosum, Allium cepa, and Allium fistulosum in terms of their evolutionary relationships as well as the CUB patterns of their mitogenomes. Our findings provide a fundamental reference for further studies on genetic modification and phylogenetic evolution in H. citrina.
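An ENC-plot of the kind mentioned in this abstract compares each gene's observed ENC with the value expected if GC3s composition alone constrained codon choice. The sketch below assumes Wright's (1990) approximation for the expected curve and the commonly used ENC ratio (expected minus observed, divided by expected). The function names and the example numbers are hypothetical and are not taken from the study.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC under compositional constraint only (Wright's approximation).

    `gc3s` is the GC fraction at synonymous third codon positions, in [0, 1].
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_ratio(enc_observed: float, gc3s: float) -> float:
    """Relative distance of a gene below (positive) or above (negative) the curve."""
    enc_exp = expected_enc(gc3s)
    return (enc_exp - enc_observed) / enc_exp


# Hypothetical gene: GC3s = 0.28, observed ENC = 47.0
print(round(expected_enc(0.28), 2))
print(round(enc_ratio(47.0, 0.28), 3))
```

Genes that fall well below the expected curve (a clearly positive ratio) are usually read as being shaped by something beyond base composition, which is the kind of evidence such studies cite for natural selection.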
... Interestingly, there is a higher prevalence of A/T-ending codons, consistent with the findings of codon bias studies in Gynostemma [42]. ...
Article
Background The Aconitum genus is a crucial member of the Ranunculaceae family. There are 350 Aconitum species worldwide, with about 170 species found in China. These species are known for their various pharmacological effects and are commonly used to treat joint pain, cold abdominal pain, and other ailments. Codon usage bias (CUB) analysis contributes to the study of evolutionary relationships and phylogeny. Based on protein-coding sequences (PCGs), we selected 48 species of Aconitum for CUB analysis. Results The results revealed that Aconitum species had less than 50% GC content. Furthermore, the distribution of GC content was irregular and followed a trend of GC1 > GC2 > GC3, indicating a bias towards A/T bases. The relative synonymous codon usage (RSCU) heat map revealed the presence of conservative codons with slight variations within the genus. The effective number of codons (ENC)-Plot and the parity rule 2 (PR2)-bias plot analysis indicate that natural selection is the primary factor influencing the variation in codon usage. As a result, we screened various optimal codons and found that A/T bases were preferred at the last codon position. Furthermore, our Maximum Likelihood (ML) analysis based on PCGs among 48 Aconitum species yielded results consistent with those obtained from complete chloroplast (cp.) genome data. This suggests that analyzing mutation in PCGs is an efficient method for demonstrating the phylogeny of species at the genus level. Conclusions The CUB of the 48 Aconitum species was mainly influenced by natural selection. This study reveals the CUB pattern of Aconitum and lays the foundation for future genetic modification and phylogenetic analyses.
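The PR2-bias plot mentioned in this abstract places each gene at the point (G3/(G3+C3), A3/(A3+T3)). A gene at the centre (0.5, 0.5) uses A and T, and G and C, equally at the third position. The sketch below is one common way to compute that point, assuming it is restricted to fourfold degenerate codon families of the standard genetic code. The function name, the family set as written, and the toy sequence are illustrative assumptions, not the study's code.

```python
# First two bases of the eight fourfold degenerate codon boxes in the
# standard genetic code: Ala, Gly, Pro, Thr, Val and the fourfold parts
# of Leu, Ser and Arg.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG"}


def pr2_point(cds: str) -> tuple:
    """Return (G3/(G3+C3), A3/(A3+T3)) over fourfold degenerate codons."""
    seq = cds.upper().replace("U", "T")
    counts = {"A": 0, "T": 0, "G": 0, "C": 0}
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in counts:
            counts[codon[2]] += 1
    gc_total = counts["G"] + counts["C"]
    at_total = counts["A"] + counts["T"]
    if gc_total == 0 or at_total == 0:
        raise ValueError("not enough fourfold degenerate codons to place the gene")
    return counts["G"] / gc_total, counts["A"] / at_total


# Hypothetical toy sequence: GCA GCT GGG CCC ACT
x, y = pr2_point("GCAGCTGGGCCCACT")
print(round(x, 3), round(y, 3))
```

A point below 0.5 on the vertical axis means T is used more often than A at the third position, and a point to the right of 0.5 on the horizontal axis means G is used more often than C.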
... Research on CUB is well established in many organisms [42,43]. Ciliates are the most specialized and complex group of protozoa. ...
Article
Ciliated protozoa (ciliates) are unicellular eukaryotes, several of which are important model organisms for molecular biology research. Analyses of codon usage bias (CUB) of the macronuclear (MAC) genome of ciliates can promote a better understanding of the genetic mode and evolutionary history of these organisms and help optimize codons to improve gene editing efficiency in model ciliates. In this study, the following indices were calculated: the guanine-cytosine (GC) content, the frequency of the nucleotides at the third position of codons (T3, C3, A3, G3), the effective number of codons (ENc), GC content at the 3rd position of synonymous codons (GC3s), and the relative synonymous codon usage (RSCU). Parity rule 2 plot analysis, neutrality plot analysis, ENc plot analysis, and correlation analysis were employed to explore the main influencing factors of CUB. The results showed that the GC content in the MAC genomes of each of 21 ciliate species, the genomes of which were relatively complete, was lower than 50%, and the base compositions of GC and GC3s were markedly distinct. Synonymous codon analysis revealed that the codons in most of the 21 ciliates ended with A or T and four codons were the general putative optimal codons. Collectively, our results indicated that most of the ciliates investigated preferred codons with an AT ending and that codon usage bias was affected by gene mutation and natural selection.
... However, most codons showed a bias toward an A/U ending, and these findings are consistent with those observed in other chloroplast genomes (Yan et al., 2019; Du et al., 2020). Previous studies revealed that this unequal usage of nucleotides derived from mutation selection and natural selection was the primary driver of codon bias in angiosperms (Nie et al., 2014; Wang et al., 2020; Zhang et al., 2021). These findings indicate that the high proportion of A/U-ending codons in the chloroplast genome, along with the selective pressure of the chloroplast genome of the tribe Hibisceae, may have driven several degenerate codon biases. ...
Article
Hibiscus syriacus, a member of the tribe Hibisceae, is considered an important ornamental and medicinal plant in east Asian countries. Here, we sequenced and assembled the complete chloroplast genome of H. syriacus var. Baekdansim using the PacBio long-read sequencing platform. A quadripartite structure with 161,026 base pairs was obtained, consisting of a pair of inverted repeats (IRA and IRB) with 25,745 base pairs, separated by a large single-copy region of 89,705 base pairs and a short single-copy region of 19,831 base pairs. This chloroplast genome had 79 protein-coding genes, 30 transfer RNA genes, 4 ribosomal RNA genes, and 109 simple sequence repeat regions. Among them, ndhD and rpoC1, containing traces of RNA-editing events associated with adaptive evolution, were identified by analysis of putative RNA-editing sites. Codon usage analysis revealed a preference for A/U-terminated codons. Furthermore, the codon usage pattern had a clustering tendency similar to that of the phylogenetic analysis of the tribe Hibisceae. This study provides clues for understanding the relationships and refining the taxonomy of the tribe Hibisceae.
... However, only tryptophan and methionine had no alternative codon [66]. As in other higher plants, for amino acids encoded by more than one codon, the third nucleotide of the codon was more frequently A/T than C/G [67]. Another analysis for ENC and GC3 was conducted on each PCG. ...
Article
Saxifraga species are widely distributed in alpine and arctic regions in the Northern Hemisphere. The high morphological diversity within this genus makes species identification very difficult, and the typical highland habitat of these species raises the question of how they adapt to the extreme environment. Here, we newly generated the chloroplast (cp) genomes of two Saxifraga species and compared them with another five Saxifraga cp genomes to understand the characteristics of cp genomes and their potential roles in highland adaptation. The genome size, structure, gene content, GC content, and codon usage pattern were found to be highly similar. Cp genomes ranged from 146,549 bp to 151,066 bp in length, most of which comprised 130 predicted genes. Yet, due to the expansion of IR regions, the second copy of rps19 in Saxifraga stolonifera was uniquely kept. Through sequence divergence analysis, we identified seven hypervariable regions and detected some signatures of regularity associated with genetic distance. We also identified 52 to 89 SSRs and some long repeats among seven Saxifraga species. Both ML and BI phylogenetic analyses confirmed that seven Saxifraga species formed a monophyletic clade in the Saxifragaceae family, and their intragenus relationship was also well supported. Additionally, the ndhI and ycf1 genes were considered under positive selection in species inhabiting relatively high altitudes. Given the conditions of intense light and low CO2 concentration in the highland, the products of these two genes might participate in the adaptation to the extreme environment.
Article
Leguminosae is one of the three largest families of angiosperms after Compositae and Orchidaceae. Legumes are widely distributed and can be found in almost all environments, including plains, mountains, deserts, forests, grasslands, and even waters. The family is one of the most important sources of starch, protein and oil in human food and also an important source of high-quality forage for animals, which gives it considerable economic significance. In our study, the codon usage patterns and variation sources of the chloroplast genomes of nine important forage legumes were systematically analyzed. We also constructed a phylogenetic tree based on the whole chloroplast genomes and protein coding sequences of these nine forage legumes. Our results showed that codons in the chloroplast genomes of the nine forage legumes preferentially end with A/T bases, and seven identical high-frequency (HF) codons were detected among the nine forage legumes. ENC-GC3s mapping, PR2 analysis, and neutrality analysis showed that the codon bias of the nine forage legumes was influenced by many factors, among which natural selection was the main influencing factor. The codon usage frequency showed that Nicotiana tabacum and Saccharomyces cerevisiae can be considered as receptors for the exogenous expression of chloroplast genes of these nine forage legumes. The phylogenetic relationships of the chloroplast genomes and protein coding genes were highly similar, and the nine forage legumes were divided into three major clades. Among the clades, Melilotus officinalis was more closely related to Medicago sativa, and Galega officinalis was more closely related to Galega orientalis. This study provides a scientific basis for molecular marker research, species identification and phylogenetic studies of forage legumes. Supplementary Information The online version contains supplementary material available at 10.1007/s12298-024-01421-0.
Article
Codon usage bias (CUB) reveals the characteristics of species and can be utilized to understand their evolutionary relationships, increase the expression of target genes in heterologous receptor plants, and provide theoretical support for related studies in molecular biology and genetic breeding. The chief aim of this work was to analyze the CUB in chloroplast (cp.) genes in nine Elaeagnus species to provide references for subsequent studies. The codons of Elaeagnus cp. genes preferred to end with A/T bases rather than with G/C bases. Most of the cp. genes were prone to mutation, while the rps7 genes were identical in sequences. Natural selection was inferred to have a powerful impact on the CUB in Elaeagnus cp. genomes, and their CUB was extremely strong. In addition, the optimal codons were identified in the nine cp. genomes based on the relative synonymous codon usage (RSCU) values, and the optimal codon numbers were between 15 and 19. The clustering analyses based on RSCU were contrasted with the maximum likelihood (ML)-based phylogenetic tree derived from coding sequences, suggesting that the t-distributed Stochastic Neighbor Embedding clustering method was more appropriate for evolutionary relationship analysis than the complete linkage method. Moreover, the ML-based phylogenetic trees based on the conservative matK genes and the whole cp. genomes had visible differences, indicating that the sequences of specific cp. genes were profoundly affected by their surroundings. Following the clustering analysis, Arabidopsis thaliana was considered the optimal heterologous expression receptor plant for the Elaeagnus cp. genes. Supplementary information: The online version contains supplementary material available at 10.1007/s12298-023-01289-6.
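RSCU, on which the codon screening in this abstract is based, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. Values above 1 mark codons used more often than expected. The sketch below assumes the standard genetic code, whose codon-to-amino-acid assignments the plant plastid code shares. The function name and the toy sequence are hypothetical, and the additional criteria that studies apply to call a codon "optimal" are not reproduced here.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU for every codon of each amino acid present in `cds`.

    Stop codons and single-codon amino acids (Met, Trp) are skipped.
    """
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or len(family) < 2 or total == 0:
            continue
        for codon in family:
            # observed count / (total count / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy sequence: GCT GCT GCA GCC AAA AAG
values = rscu("GCTGCTGCAGCCAAAAAG")
high_frequency = sorted(c for c, v in values.items() if v > 1)
print(high_frequency)
```

In practice the counts would be pooled over all protein-coding genes of a genome before the RSCU values are compared.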
Preprint
Background: The Ranunculaceae family includes an essential genus known as Aconitum. Globally, 350 Aconitum species are found, and about 170 species are found in China. Aconitum species have several pharmacological effects and are also frequently used to treat joint pain, cold abdominal pain, and other diseases. Codon usage bias (CUB) analysis is an effective method for studying evolutionary relationships and phylogenetics. On the basis of protein-coding sequences (PCGs), 41 Aconitum species were selected for performing CUB analysis. Results: The results revealed that the GC content was less than 50% at different positions in Aconitum species. The distribution of GC content was also irregular and showed a trend of GC1 > GC2 > GC3, indicating that Aconitum species were biased towards A/T bases. Relative synonymous codon usage (RSCU) heat map analysis found the presence of conservative codons with slight differences in the genus. The effective number of codons (ENC)-Plot and the parity rule 2 (PR2)-bias plot analysis found natural selection as the main factor affecting the variation in codon usage. Consequently, various optimal codons were screened out, and A/T bases were preferred at the last codon position. In addition, the results of Maximum Likelihood (ML) based on PCGs among 41 Aconitum species were consistent with the results of complete chloroplast (cp) genome data, inferring that the mutation analysis of PCGs is an efficient method to show phylogeny between species at the genus level. Conclusions: The CUB in 41 Aconitum species was majorly impacted by natural selection. The present study highlights the CUB patterns of Aconitum species in order to establish sources for future research on the genetic modifications and phylogeny.
Article
Chloroplast codon preference affects gene expression efficiency and is important for chloroplast genetic engineering applications. Despite being the third most important food crop, cassava (Manihot esculenta) has received little attention for its chloroplast codon usage patterns. The analysis of cassava chloroplast genomic codon usage and preferences was conducted using CodonW, CUSP, and R software programs. The codons of cassava chloroplast genes had a GC content of 37.90%, whereas codon positions 1 to 3 had GC contents of 46.64%, 39.19%, and 27.58%, respectively, suggesting a codon preference ending in A or U. A weak codon bias was observed, with the effective number of codons (ENC) ranging from 36.55 to 61.29. There was a significant correlation between GC1 and GC2, but not between GC3 and GC1, or GC3 and GC2, indicating that the base composition of the first two codon positions was similar but significantly different from that of the third codon position. Relative synonymous codon usage (RSCU) analysis showed that of the 30 codons with RSCU greater than 1, 13 ended in A, 16 ended in U, and 1 ended in G. This further demonstrated that the codons in the cassava chloroplast genome preferentially ended in A or U. The correlation and regression coefficients for GC12 and GC3 were 0.228 and 0.351, respectively, and the correlations were not significant, suggesting that natural selection and mutational pressure worked together to shape the codon preference in cassava chloroplasts. Furthermore, the ENC plot and ENC ratio analysis revealed that natural selection rather than mutational pressure significantly influenced codon preferences in cassava chloroplasts. In the PR2-plot analysis, most of the genes were located in the lower half of the plot, particularly in the lower right corner, indicating that the third base of the synonymous codon was preferred to end in U/G, especially U. 
Finally, a two-way Chi-squared contingency test was performed to identify the optimal codons of UUC, UCU, CUA, CCU, GCU, AAC, and GGU. These results can be used as a scientific basis to improve the expression of exogenous genes in cassava chloroplasts by optimizing their codons.
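The PR2-plot described in this abstract places each gene at the coordinates G3/(G3+C3) on the horizontal axis and A3/(A3+U3) on the vertical axis, so a point in the lower right corner means U is preferred over A and G over C at the third position. A minimal sketch of that coordinate calculation follows; the function name and the restriction to fourfold degenerate codon families (a common convention for PR2 analysis) are assumptions, not details taken from the abstract.

```python
# First two bases of the eight fourfold degenerate codon families
# (Ala, Pro, Gly, Thr, Val, Leu CTN, Arg CGN, Ser TCN) in the standard code.
FOURFOLD_PREFIXES = {"GC", "CC", "GG", "AC", "GT", "CT", "CG", "TC"}


def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for one gene (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third = [
        cds[i + 2]
        for i in range(0, usable, 3)
        if cds[i:i + 2] in FOURFOLD_PREFIXES
    ]
    a, t, g, c = (third.count(base) for base in "ATGC")
    if a + t == 0 or g + c == 0:
        raise ValueError("not enough fourfold degenerate codons to compute PR2")
    return g / (g + c), a / (a + t)
```

Under this convention a gene with no third-position bias sits at (0.5, 0.5), and distance from that centre indicates the direction and strength of the bias.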
Article
Mesona chinensis Benth (MCB) is one of the main economic crops in tropical and subtropical areas. Understanding the codon usage bias (CUB) of the M. chinensis Benth chloroplast genome is essential for studying its genetic law, molecular phylogenetic relationships, and exogenous gene expression. Results showed that the GC content of 53 CDS sequences was 37.95%, and GC1, GC2, and GC3 content were 46.02%, 38.26%, and 29.85%, respectively. The general GC content order was GC1>GC2>GC3. Moreover, the majority of genes had an effective number of codons (ENC) value greater than 40, except ndhE, rps8, and rps18. Correlation analysis results revealed that the GC content was significantly correlated with GC1, GC2, GC3, and ENC. Neutrality plot analysis, ENC-plot analysis, and PR2-plot analysis showed that the CUB of the M. chinensis Benth chloroplast genome was mainly affected by mutation and selection. In addition, GGG, GCA, and TCC were found to be the optimal codons. Furthermore, results of cluster analysis and evolutionary tree showed that M. chinensis Benth was closely related to Ocimum basilicum, indicating that there was a certain correlation between the CUB of the chloroplast gene and the genetic relationship of plant species. Overall, the study on the CUB of chloroplast genome laid a basis for genetic modification and phylogenetic research of M. chinensis Benth chloroplast genome.
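The GC1, GC2 and GC3 figures quoted in this abstract are the GC percentages at the first, second and third codon positions. As a minimal sketch of how such values can be obtained from a coding sequence (the function name `gc_by_position` is hypothetical, and real analyses typically use tools such as CodonW or CUSP):

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) as percentages for one coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    result = []
    for position in range(3):
        # Every third base, starting at codon position 1, 2 or 3.
        bases = cds[position:usable:3]
        gc = sum(base in "GC" for base in bases)
        result.append(100.0 * gc / len(bases))
    return tuple(result)
```

A pattern of GC1 > GC2 > GC3 with GC3 well below 50% is what the abstracts in this list describe as a preference for A/T-ending codons.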
Article
Gynostemma yixingense, an important medicinal member of the Cucurbitaceae family, is an endemic herbaceous species distributed in East China. It is morphologically similar to the plants in the same genus, which resulted in some confusion in identification and application. Meanwhile, there are still some controversies in taxonomy. Herein, the complete chloroplast genome sequence of G. yixingense was obtained by Illumina paired-end sequencing technology and compared to other chloroplast genome sequences of congeneric species. The complete chloroplast genome of G. yixingense is 157,910 bp in length with 36.94% GC content and contains a large single-copy (LSC) region of 86,791 bp, a small single-copy (SSC) region of 18,635 bp and a pair of inverted repeat (IR) regions of 26,242 bp. The whole genome contains 133 unique genes, including 87 protein-coding genes, 37 tRNA genes, eight rRNA genes and one pseudogene. In addition, 74 simple sequence repeats (SSRs) were identified, most of which were A/T rich. The phylogenetic analysis indicated that G. yixingense had the closest relationship to G. laxiflorum. The result of this study provided an important theoretical basis for chloroplast genome and phylogenetic analysis of G. yixingense.
Article
Main conclusion: The codon usage bias in chloroplast genes of Oryza species was low and AT rich. The pattern of codon usage was different among Oryza species and mainly influenced by mutation pressure and natural selection. Codon usage bias (CUB) is the unequal usage of synonymous codons in which some codons are more preferred to others in the coding sequences of genes. It shows a species-specific property. We studied the patterns of codon usage and the factors that influenced the CUB of protein-coding chloroplast (cp) genes in 18 Oryza species as no work was yet reported. The nucleotide composition analysis revealed that the overall GC content of cp genes in different species of Oryza was lower than 50%, i.e., Oryza cp genes were AT rich. Synonymous codon usage order (SCUO) suggested that CUB was weak in the cp genes of different Oryza species. A highly significant correlation was observed between overall nucleotides and its constituents at the third codon position suggesting that both, mutation pressure and natural selection, might influence the CUB. Correspondence analysis (COA) revealed that codon usage pattern differed across Oryza species. In the neutrality plot, a narrow range of GC3 distribution was recorded and some points were diagonally distributed in all the plots, suggesting that natural selection and mutation pressure might have influenced the CUB. The slope of the regression line was < 0.5, augmenting our inference that natural selection might have played a major role, while mutation pressure had a minor role in shaping the CUB of cp genes. The magnitudes of mutation pressure and natural selection on cp genes varied across Oryza species.
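The neutrality plot referred to in this abstract regresses GC12 (the mean of GC1 and GC2) on GC3 across genes; a slope near 1 is usually read as mutation pressure dominating, while a slope near 0 points to selection constraining the first two positions. The sketch below fits that regression line by ordinary least squares. The function name and argument layout are assumptions for illustration, not the authors' code.

```python
def neutrality_slope(gc3, gc12):
    """Least-squares fit of GC12 = slope * GC3 + intercept over genes.

    gc3 and gc12 are equal-length sequences with one value per gene.
    Returns (slope, intercept).
    """
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need at least two genes with paired values")
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With this reading, the slope below 0.5 reported in the abstract corresponds to the authors' inference that natural selection played the larger role.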
Article
Euphorbiaceae plants are important as suppliers of biodiesel. In the current study, the codon usage patterns and sources of variance in chloroplast genome sequences of six different Euphorbiaceae plant species have been systematically analyzed. Our results revealed that the chloroplast genomes of six Euphorbiaceae plant species were biased towards A/T bases and A/T-ending codons, followed by detection of 17 identical high-frequency codons including GCT, TGT, GAT, GAA, TTT, GGA, CAT, AAA, TTA, AAT, CCT, CAA, AGA, TCT, ACT, TAT and TAA. It was found that mutation pressure was a minor factor affecting the variation of codon usage, however, natural selection played a significant role. Comparative analysis of codon usage frequencies of six Euphorbiaceae plant species with four model organisms reflected that Arabidopsis thaliana , Populus trichocarpa , and Saccharomyces cerevisiae should be considered as suitable exogenous expression receptor systems for chloroplast genes of six Euphorbiaceae plant species. Furthermore, it is optimal to choose Saccharomyces cerevisiae as the exogenous expression receptor. The outcome of the present study might provide important reference information for further understanding the codon usage patterns of chloroplast genomes in other plant species.
Article
Plant genetic engineering is an important tool used in current efforts in crop improvement, pharmaceutical product biosynthesis and sustainable agriculture. However, conventional genetic engineering techniques target the nuclear genome, prompting concerns about the proliferation of foreign genes to weedy relatives. Chloroplast transformation does not have this limitation, since the plastid genome is maternally inherited in most plants, motivating the need for organelle-specific and selective nanocarriers. Here, we rationally designed chitosan-complexed single-walled carbon nanotubes, utilizing the lipid exchange envelope penetration mechanism. The single-walled carbon nanotubes selectively deliver plasmid DNA to chloroplasts of different plant species without external biolistic or chemical aid. We demonstrate chloroplast-targeted transgene delivery and transient expression in mature Eruca sativa, Nasturtium officinale, Nicotiana tabacum and Spinacia oleracea plants and in isolated Arabidopsis thaliana mesophyll protoplasts. This nanoparticle-mediated chloroplast transgene delivery tool provides practical advantages over current delivery techniques as a potential transformation method for mature plants to benefit plant bioengineering and biological studies. © 2019, The Author(s), under exclusive licence to Springer Nature Limited.
Article
The development of technologies for the stable genetic transformation of plastid (chloroplast) genomes has been a boon to both basic and applied research. However, extension of the transplastomic technology to major crops and model plants has proven extremely challenging, and the species range of plastid transformation is still very much limited in that most species currently remain recalcitrant to plastid genome engineering. Here, we report an efficient plastid transformation technology for the model plant Arabidopsis thaliana that relies on root-derived microcalli as a source tissue for biolistic transformation. The method produces fertile transplastomic plants at high frequency when combined with a clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated protein 9 (Cas9)-generated knockout allele of a nuclear locus that enhances sensitivity to the selection agent used for isolation of transplastomic events. Our work makes the model organism of plant biology amenable to routine engineering of the plastid genome, facilitates the combination of plastid engineering with the power of Arabidopsis nuclear genetics, and informs the future development of plastid transformation protocols for other recalcitrant species. © 2019, The Author(s), under exclusive licence to Springer Nature Limited.
Article
The flower-meristem-identity gene APETALA2 (AP2), one of class-A genes, is involved in the establishment of the floral meristem and the forming of sepals and petals. Codon usage bias (CUB) identifies differences among species, meanwhile dynamic analysis of base composition can identify the molecular mechanisms and evolutionary relationships of a specific gene. In this study, eight coding sequences (CDS) of AP2 gene were selected from different plant species using the GenBank database. Their nucleotide composition (GC content), genetic index, relative synonymous codon usage (RSCU) and relative codon usage bias (RCUB) were calculated with R Software to compare codon bias and base composition dynamics of AP2 gene codon usage patterns in different plant species. The results showed that the usage of AP2 gene codons from different plant species was influenced by GC bias, especially GC3s. Overall, base composition analysis indicated that the usage frequency of codon AT in the gene coding sequence was higher than GC among AP2 gene CDS from different plant species. Furthermore, most AP2 gene CDSs ended with AT; AGA, GCU and UGU had relatively high RSCU values as the most dominant codon; the usage characteristic of the AP2 gene codon in Malus domestica was similar to that of Vitis vinifera; Paeonia lactiflora was similar to Paeonia suffruticosa and Solanum lycopersicum was similar to Petunia×hybrida. There was a moderate preference in the usage of AP2 gene codon among different plant species from relatively low frequency of optimal codon (Fop) values and high effective number of codons (ENC) value. This study has revealed the usage characteristics of the AP2 gene codon from the comparison of AP2 gene codon preference and base dynamics in different plant species and provides a platform for further study towards transgenic engineering and codon optimization.
Article
Gynostemma BL., belonging to the family Cucurbitaceae, is a genus containing 17 creeping herbaceous species mainly distributed in East Asia. It can be divided into two subgenera based on different fruit morphology. Herein, we report eight complete chloroplast genome sequences of the genus Gynostemma, which were obtained by Illumina paired-end sequencing, assembly, and annotation. The length of the eight complete cp genomes ranged from 157,576 bp (G. pentaphyllum) to 158,273 bp (G. laxiflorum). Each encoded 133 genes, including 87 protein-coding genes, 37 tRNA genes, eight rRNA genes, and one pseudogene. The four types of repeated sequences had been discovered and indicated that the repeated structure for species in the Subgen. Triostellum was greater than that for species in the Subgen. Gynostemma. The percentage of variation of the eight cp genomes in different regions were calculated, which demonstrated that the coding and inverted repeats regions were highly conserved. Phylogenetic analysis based on Bayesian inference and maximum likelihood methods strongly supported the phylogenetic position of the genus Gynostemma as a member of family Cucurbitaceae. The phylogenetic relationships among the eight species were clearly resolved using the complete cp genome sequences in this study. It will also provide potential molecular markers and candidate DNA barcodes for future studies and enrich the valuable complete cp genome resources of Cucurbitaceae.
Article
Nonalcoholic steatohepatitis (NASH) is the most frequent cause of liver dysfunction and a common global problem. Gypenosides can decrease pathological modifications of high-fat diet-induced rat atherosclerosis; however, its effect and mechanism on NASH remain unclear. In this study, rats were randomly divided into normal control and model groups. Model rats were fed with a high-fat diet and treated with gypenosides, rosiglitazone, or water for 6 weeks. We found that liver tissues showed significant hepatic steatosis and vacuolar degeneration with significantly higher triglyceride (TG), free fatty acid (FFA) and malonyl CoA, serum alanine aminotransferase (ALT), aspartate aminotransferase (AST) and gamma-glutamyl transferase (GGT) activities in model group versus normal control group (p<0.01). Liver tissue mRNA and protein levels of sterol regulatory element binding protein-1c (SREBP-1c), carbohydrate response element binding protein (ChREBP), acetyl-CoA carboxylase (ACCase), and stearoyl CoA desaturase enzyme 1 (SCD1) were significantly increased, while the carnitine palmitoyl transferase-1 (CPT-1) level was significantly decreased in the model group versus the normal control group (p<0.01). Pathological changes of hepatic steatosis; body weight and liver wet weight; liver tissue TG, FFA and malonyl CoA concentrations; serum ALT, AST and GGT activities; liver tissue mRNA and protein levels of SREBP-1c, ChREBP, ACCase, and SCD-1 were significantly decreased; protein and mRNA levels of CPT-1 were significantly increased in the gypenosides group versus model group (p<0.01). In conclusion, gypenosides has therapeutic effect on NASH through regulating key transcriptional factors and lipogenic enzymes involved in fatty acid oxidation during hepatic lipogenesis.
Article
Background: Analysis of codon usage bias is an extremely versatile method used to further the understanding of the genetic and evolutionary paths of species. Codon usage bias of envelope glycoprotein genes in nuclear polyhedrosis virus (NPV) has remained largely unexplored at present. Hence, the codon usage bias of NPV envelope glycoprotein was analyzed here to reveal the genetic and evolutionary relationships between different viral species in baculovirus genus. Results: A total of 9236 codons from 18 different species of NPV of the baculovirus genera were used to perform this analysis. Glycoprotein of NPV exhibits weaker codon usage bias. Neutrality plot analysis and correlation analysis of effective number of codons (ENC) values indicate that natural selection is the main factor influencing codon usage bias, and that the impact of mutation pressure is relatively smaller. Another cluster analysis shows that the kinship or evolutionary relationships of these viral species can be divided into two broad categories even though all 18 species are from the same baculovirus genus. Conclusions: There are many elements that can affect codon bias, such as the composition of amino acids, mutation pressure, natural selection, and gene expression level. In the meantime, cluster analysis also illustrates that codon usage bias of virus envelope glycoprotein can serve as an effective means of evolutionary classification in baculovirus genus.
Article
The sub-3 Mbp genomes from microsporidian species of the Encephalitozoon genus are the smallest known among eukaryotes and paragons of genomic reduction and compaction in parasites. However, their diminutive stature is not characteristic of all Microsporidia, whose genome sizes vary by an order of magnitude. This large variability suggests that different evolutionary forces are applied on the group as a whole. In this study, we have compared the codon usage bias (CUB) between eight taxonomically distinct microsporidian genomes: Encephalitozoon intestinalis, Encephalitozoon cuniculi, Spraguea lophii, Trachipleistophora hominis, Enterocytozoon bieneusi, Nematocida parisii, Nosema bombycis and Nosema ceranae. While the CUB was found to be weak in all eight Microsporidia, nearly all (98%) of the optimal codons in S. lophii, T. hominis, E. bieneusi, N. parisii, N. bombycis and N. ceranae are fond of A/U in third position whereas most (64.6%) optimal codons in the Encephalitozoon species E. intestinalis and E. cuniculi are biased towards G/C. Although nucleotide composition biases are likely the main factor driving the CUB in Microsporidia according to correlation analyses, directed mutational pressure also likely affects the CUB as suggested by ENc-plots, correspondence and neutrality analyses. Overall, the Encephalitozoon genomes were found to be markedly different from the other microsporidians and, despite being the first sequenced representatives of this lineage, are uncharacteristic of the group as a whole. The disparities observed cannot be attributed solely to differences in host specificity and we hypothesize that other forces are at play in the lineage leading to Encephalitozoon species.
Article
Synonymous codon usage bias (SCUB) is the nonuniform usage of codons, occurring often in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule exists in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found NNA and NNT (A- and T-ending codons) are preferential in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among taxonomies of plants, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied introns and among exons, indicating that the bias of NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation-induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that these SCUB profiles in the plastid genome are distinctly different compared with the nuclear genome.
Article
Codon usage bias (CUB) is an important evolutionary feature in a genome and has been widely documented from prokaryotes to eukaryotes. However, the significance of CUB in the Asteraceae family has not been well understood, with no Asteraceae species having been analyzed for this characteristic. Here, we use bioinformatics approaches to comparatively analyze the general patterns and influencing factors of CUB in five Asteraceae chloroplast (cp) genomes. The results indicated that the five genomes had similar codon usage patterns, showing a strong bias towards a high representation of NNA and NNT codons. Neutrality analysis showed that these cp genomes had a narrow GC distribution and no significant correlation was observed between GC12 and GC3. Parity Rule 2 (PR2) plot analysis revealed that purines were used more frequently than pyrimidines. Effective number of codons (ENc)-plot analysis showed that most genes followed the parabolic line of trajectory, but several genes with low ENc values lying below the expected curve were also observed. Furthermore, correspondence analysis of relative synonymous codon usage (RSCU) yielded a first axis that explained only a partial amount of variation of codon usage. These findings suggested that both natural selection and mutational bias contributed to codon bias, while selection was the major force to shape the codon usage in these Asteraceae cp genomes. Our study, which is the first to investigate codon usage patterns in Asteraceae plastomes, will provide helpful information about codon distribution and variation in these species, and also shed light on the genetic and evolutionary mechanisms of codon biology within this family.
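The "expected curve" in an ENc-plot is usually taken to be Wright's relationship between the effective number of codons and GC3s under the assumption that base composition alone shapes codon usage: ENc = 2 + s + 29 / (s² + (1 − s)²), with s the GC3s fraction. The sketch below evaluates that curve and the relative distance of an observed gene from it; the function names are hypothetical, and the abstract does not state which implementation the authors used.

```python
def expected_enc(gc3s):
    """Expected ENc for a GC3s fraction in [0, 1] under mutation-only bias."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_ratio(observed_enc, gc3s):
    """Relative gap (expected - observed) / expected for one gene."""
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected
```

Genes lying on or near the curve are consistent with compositional constraint alone, while genes well below it (a clearly positive ratio) are the ones such studies interpret as shaped by additional forces such as selection.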
Article
Codon usage patterns of 23 Poaceae chloroplast genomes were analysed in this study. Neutrality analysis indicated that the codon usage patterns have significant correlations with GC12 and GC3 and also showed strong bias towards a high representation of NNA and NNT codons. The Nc-plot showed that although a large proportion of points follow the parabolic line of trajectory, several genes with low ENc values lie below the expected curve, suggesting that mutational bias played a major role in the codon biology of the Poaceae chloroplast genome. Parity Rule 2 plot analysis showed that T was used more frequently than A in all the genomes. Correspondence analysis of relative synonymous codon usage indicated that the first axis explained only a partial amount of variation of codon usage. Furthermore, the gene length and expression level were also found to drive codon usage variation. These findings revealed that besides natural selection, other factors might also exert some influences in shaping the codon usage bias in Poaceae chloroplast genomes. The optimal codons of these 23 genomes were also identified in this study.
Article
Codon usage bias (CUB) is an omnipresent phenomenon, which occurs in nearly all organisms. Previous studies of codon bias in Plasmodium species were based on a limited dataset. This study uses whole genome datasets for comparative genome analysis of six Plasmodium species using CUB and other related methods for the first time. Codon usage bias, compositional variation in translated amino acid frequency, effective number of codons and optimal codons are analyzed for P.falciparum, P.vivax, P.knowlesi, P.berghei, P.chabaudii and P.yoelli. A plot of effective number of codons versus GC3 shows their differential codon usage pattern arises due to a combination of mutational and translational selection pressure. The increased relative usage of adenine and thymine ending optimal codons in highly expressed genes of P.falciparum is the result of higher composition biased pressure, and usage of guanine and cytosine bases at third codon position can be explained by translational selection pressure acting on them. In contrast, the higher usage of adenine and thymine bases at third codon position in optimal codons of P.vivax highlights the role of translational selection pressure apart from composition biased mutation pressure in shaping their codon usage pattern. The frequency of those amino acids that are encoded by AT ending codons is significantly high in P.falciparum due to action of high composition biased mutational pressure compared with other Plasmodium species. The CUB variation in the three rodent parasites, P.berghei, P.chabaudii and P.yoelli is strikingly similar to that of P.falciparum. The simian and human malarial parasite, P.knowlesi shows a variation in codon usage bias similar to P.vivax but on closer study there are differences confirmed by the method of Principal Component Analysis (PCA).
Abbreviations: CDS - Coding sequences, GC1 - GC composition at first site of codon, GC2 - GC composition at second site of codon, GC3 - GC composition at third site of codon, Ala - Alanine, Arg - Arginine, Asn - Asparagine, Asp - Aspartic acid, Cys - Cysteine, Gln - Glutamine Glu - Glutamic acid Gly - Glycine His - Histidine Ile - Isoleucine Leu - Leucine Lys - Lysine Met - Methionine Phe - Phenylalanine Pro - Proline Ser - Serine Thr - Threonine Trp - Tryptophan Tyr - Tyrosine Val - Valine.
Article
Analysis of synonymous codon usage pattern in the genome of a thermophilic cyanobacterium, Thermosynechococcus elongatus BP-1 using multivariate statistical analysis revealed a single major explanatory axis accounting for codon usage variation in the organism. This axis is correlated with the GC content at third base of synonymous codons (GC3s) in correspondence analysis taking T. elongatus genes. A negative correlation was observed between effective number of codons i.e. Nc and GC3s. Results suggested a mutational bias as the major factor in shaping codon usage in this cyanobacterium. In comparison to the lowly expressed genes, highly expressed genes of this organism possess significantly higher proportion of pyrimidine-ending codons suggesting that besides, mutational bias, translational selection also influenced codon usage variation in T. elongatus. Correspondence analysis of relative synonymous codon usage (RSCU) with A, T, G, C at third positions (A3s, T3s, G3s, C3s, respectively) also supported this fact and expression levels of genes and gene length also influenced codon usage. A role of translational accuracy was identified in dictating the codon usage variation of this genome. Results indicated that although mutational bias is the major factor in shaping codon usage in T. elongatus, factors like translational selection, translational accuracy and gene expression level also influenced codon usage variation.
Article
Synonymous codons are used with different frequencies both among species and among genes within the same genome and are controlled by neutral processes (such as mutation and drift) as well as by selection. Up to now, a systematic examination of the codon usage for the chicken genome has not been performed. Here, we carried out a whole genome analysis of the chicken genome by the use of the relative synonymous codon usage (RSCU) method and identified 11 putative optimal codons, all of them ending with uracil (U), which is significantly departing from the pattern observed in other eukaryotes. Optimal codons in the chicken genome are most likely the ones corresponding to highly expressed transfer RNA (tRNAs) or tRNA gene copy numbers in the cell. Codon bias, measured as the frequency of optimal codons (Fop), is negatively correlated with the G + C content, recombination rate, but positively correlated with gene expression, protein length, gene length and intron length. The positive correlation between codon bias and protein, gene and intron length is quite different from other multi-cellular organism, as this trend has been only found in unicellular organisms. Our data displayed that regional G + C content explains a large proportion of the variance of codon bias in chicken. Stepwise selection model analyses indicate that G + C content of coding sequence is the most important factor for codon bias. It appears that variation in the G + C content of CDSs accounts for over 60% of the variation of codon bias. This study suggests that both mutation bias and selection contribute to codon bias. However, mutation bias is the driving force of the codon usage in the Gallus gallus genome. Our data also provide evidence that the negative correlation between codon bias and recombination rates in G. gallus is determined mostly by recombination-dependent mutational patterns.
Article
The frequencies of alternative synonymous codons vary both among species and among genes from the same genome. These patterns have been inferred to reflect the action of natural selection. Here we evaluate this in bacteria. While intragenomic variation in many species is consistent with selection favouring translationally optimal codons, much of the variation among species appears to be due to biased patterns of mutation. The strength of selection on codon usage can be estimated by two different approaches. First, the extent of bias in favour of translationally optimal codons in highly expressed genes, compared to that in genes where selection is weak, reveals the long-term effectiveness of selection. Here we show that the strength of selected codon usage bias is highly correlated with bacterial growth rate, suggesting that selection has favoured translational efficiency. Second, the pattern of bias towards optimal codons at polymorphic sites reveals the ongoing action of selection. Using this approach we obtained results that were completely consistent with the first method; importantly, the frequency spectra of optimal codons at polymorphic sites were similar to those predicted under an equilibrium model. Highly expressed genes in Escherichia coli appear to be under continuing strong selection, whereas selection is very weak in genes expressed at low levels.
Article
Full-text available
Transposable elements (TEs) are mobile genetic entities ubiquitously distributed in nearly all genomes. High frequency of codons ending in A/T in TEs has been previously observed in some species. In this study, the biases in nucleotide composition and codon usage of TE transposases and host nuclear genes were investigated in the AT-rich genome of Arabidopsis thaliana and the GC-rich genome of Oryza sativa. Codons ending in A/T are more frequently used by TEs compared with their host nuclear genes. A remarkable positive correlation between highly expressed nuclear genes and C/G-ending codons were detected in O. sativa (r=0.944 and 0.839, respectively, P<0.0001) but not in A. thaliana, indicating a close association between the GC content and gene expression level in monocot species. In both species, TE codon usage biases are similar to that of weakly expressed genes. The expression and activity of TEs may be strictly controlled in plant genomes. Mutation bias and selection pressure have simultaneously acted on the TE evolution in A. thaliana and O. sativa. The consistently observed biases of nucleotide composition and codon usage of TEs may also provide a useful clue to accurately detect TE sequences in different species.
Article
Full-text available
Baculovirus-insect cell systems have been widely used over the past decades. However, few studies to date have addressed baculovirus codon usage. In this study, we calculated the effective number of codons (ENC) for all 5,842 ORFs from 42 completely sequenced baculoviruses. The results revealed that most of the baculoviruses lacked strong codon bias (ENC > 35). Exceptions were Lymantria dispar nucleopolyhedrovirus (LdMNPV) and Orgyia pseudotsugata nucleopolyhedrovirus (OpMNPV), which were found to have a strong codon bias (ENC < 35) in 20.9 and 11.8%, respectively, of their total genes. Comparisons of preferred codons based on taxonomic clades showed that the preferred codons differed among clades, but nine codons (UUU, UAC, UUG, CAC, CAA, AAA, GUG, GAA, and AUU) were preferentially used by most baculovirus genes. Correspondence analysis showed that the major trend in codon usage variation among all genes was significantly correlated with the GC content of sequences. Analyses also suggested that the high codon bias of LdMNPV and OpMNPV was correlated with their high GC%.
Article
Full-text available
Codon usage patterns in the slime mould Dictyostelium discoideum have been reexamined (a total of 58 genes have been analysed). Considering the extreme A+T-richness of this genome (G+C≈22%), there is a surprising degree of codon usage variation among genes. For example, G+C content at silent sites varies from less than 10% to greater than 30%. It was previously suggested [Warrick, H.M. and Spudich, J.A. (1988) Nucleic Acids Res. 16: 6617–6635] that highly expressed genes contain fewer 'optimal' codons than genes expressed at lower levels. However, it appears that the optimal codons were misidentified. Multivariate statistical analysis shows that the greatest variation among genes is in relative usage of a particular subset of codons (about one per amino acid), many of which are C-ending. We have identified these as optimal codons, since (i) their frequency is positively correlated with gene expression level, and (ii) there is a strong mutation bias in this genome towards A and T nucleotides. Thus, codon usage in D. discoideum can be explained by a balance between the forces of mutational bias and translational selection.
Article
Full-text available
Codon usage data for 56 Bacillus subtilis genes show that synonymous codon usage in B. subtilis is less biased than in Escherichia coli, or in Saccharomyces cerevisiae. Nevertheless, certain genes with a high codon bias can be identified by correspondence analysis, and also by various indices of codon bias. These genes are very highly expressed, and a general trend (a decrease) in codon bias across genes seems to correspond to decreasing expression level. This, then, may be a general phenomenon in unicellular organisms. The unusually small effect of translational selection on the pattern of codon usage in lowly expressed genes in B. subtilis yields similar dinucleotide frequencies among different codon positions, and on complementary strands. These patterns could arise through selection on DNA structure, but more probably are largely determined by mutation. This prevalence of mutational bias could lead to difficulties in assessing whether open reading frames encode proteins.
Article
Full-text available
The genetic code is degenerate, but alternative synonymous codons are generally not used with equal frequency. Since the pioneering work of Grantham's group (1, 2) it has been apparent that genes from one species often share similarities in codon frequency; under the “genome hypothesis” (1, 2) there is a species-specific pattern to codon usage. However, it has become clear that in most species there are also considerable differences among genes (3–7). Multivariate analyses have revealed that in each species so far examined there is a single major trend in codon usage among genes, usually from highly biased to more nearly even usage of synonymous codons. Thus, to represent the codon usage pattern of an organism it is not sufficient to sum over all genes (8), as this conceals the underlying heterogeneity. Rather, it is necessary to describe the trend among genes seen in that species. We illustrate these trends for six species where codon usage has been examined in detail, by presenting the pooled codon usage for the 10% of genes at either end of the major trend (Table 1). Closely related organisms have similar patterns of codon usage, and so the six species in Table 1 are representative of wider groups. For example, with respect to codon usage, Salmonella typhimurium closely resembles E. coli (9), while all mammalian species so far examined (principally mouse, rat and cow) largely resemble humans (4, 8).
Article
Full-text available
A simple, effective measure of synonymous codon usage bias, the Codon Adaptation Index, is detailed. The index uses a reference set of highly expressed genes from a species to assess the relative merits of each codon, and a score for a gene is calculated from the frequency of use of all codons in that gene. The index assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
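The scoring idea in this abstract can be sketched as follows. The sketch assumes the commonly used formulation, which the abstract does not spell out: each codon gets a relative adaptiveness w (its count in the reference set divided by the count of the most-used synonymous codon), and a gene's index is the geometric mean of w over its codons. The partial `SYNONYMS` table, the function names, and the 0.5 pseudocount for unseen codons are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical, partial synonymous-codon table (standard code, DNA alphabet).
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}

def split_codons(seq):
    """Split an in-frame sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def relative_adaptiveness(reference_seqs):
    """Compute w for each codon from a reference set of highly expressed genes."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    w = {}
    for family in SYNONYMS.values():
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid not seen in the reference set
        for codon in family:
            # Assumed pseudocount of 0.5 keeps log(w) defined for unseen codons.
            w[codon] = max(counts[codon], 0.5) / top
    return w

def cai(seq, w):
    """Geometric mean of w over the codons of seq that have a defined w."""
    logs = [math.log(w[c]) for c in split_codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

w = relative_adaptiveness(["AAAAAAAAGGCTGCTGCC"])
print(cai("AAAGCT", w))  # gene using only the most frequent reference codons
```

A gene built only from the most frequent reference codons scores 1, and the score falls toward 0 as rarer codons are used.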
Article
Full-text available
G:C pairs are more stable than A:T pairs because they have an additional hydrogen bond. This has led to many studies on the correlation between the guanine+cytosine (G+C) content of nucleic acids and temperature over the last 20 years. We collected the optimal growth temperatures (Topt) and the G+C contents of genomic DNA; 23S, 16S, and 5S ribosomal RNAs; and transfer RNAs for 764 prokaryotic species. No correlation was found between genomic G+C content and Topt, but there were striking correlations between the G+C content of ribosomal and transfer RNA stems and Topt. Two explanations have been proposed-neutral evolution and selection pressure-for the approximate equalities of G and C (respectively, A and T) contents within each strand of DNA molecules. Our results do not support the notion that selection pressure induces complementary oligonucleotides in close proximity and therefore numerous secondary structures in prokaryotic DNA, as the genomic G+C content does not behave in the same way as that of folded RNA with respect to optimal growth temperature.
Article
Full-text available
The polyketide epothilone is a potential anticancer agent that stabilizes microtubules in a similar manner to Taxol. The gene cluster responsible for epothilone biosynthesis in the myxobacterium Sorangium cellulosum was cloned and completely sequenced. It encodes six multifunctional proteins composed of a loading module, one nonribosomal peptide synthetase module, eight polyketide synthase modules, and a P450 epoxidase that converts desoxyepothilone into epothilone. Concomitant expression of these genes in the actinomycete Streptomyces coelicolor produced epothilones A and B. Streptomyces coelicolor is more amenable to strain improvement and grows about 10-fold as rapidly as the natural producer, so this heterologous expression system portends a plentiful supply of this important agent.
Article
Full-text available
Mitochondrial genetic codons can be categorized by four patterns of nucleotide-site degeneracy based on varying combinations of twofold- or nondegenerate sites at first codon positions and twofold- or fourfold-degenerate sites at third codon positions. Herein, a model of molecular evolution is introduced that uses these patterns to calculate expected substitution frequencies for each codon position and substitution type relative to overall number of synonymous or nonsynonymous substitutions. Regions of the pocket gopher cytochrome oxidase subunit I (COI) and cytochrome b (cyt-b) genes are analyzed using this model. Chi-square distributions are used to produce relative goodness-of-fit (GF) scores for measuring the difference between substitution frequencies predicted by the codon-degeneracy model (CDM), and frequencies inferred using a well-supported phylogenetic tree of closely related species. The GF scores for expected and observed synonymous (GF(syn) = 0.429, p = 0.807) and nonsynonymous (GF(ns) = 2.309, p = 0.679) substitution frequencies resulted in a failure to reject the CDM as a null hypothesis for the molecular evolution of COI and cyt-b in pocket gophers. Alternative tree topologies and calculations of transition bias for these data result in higher GF scores.
Article
Full-text available
Microsatellites, or tandem simple sequence repeats (SSR), are abundant across genomes and show high levels of polymorphism. SSR genetic and evolutionary mechanisms remain controversial. Here we attempt to summarize the available data related to SSR distribution in coding and noncoding regions of genomes and SSR functional importance. Numerous lines of evidence demonstrate that SSR genomic distribution is nonrandom. Random expansions or contractions appear to be selected against for at least part of SSR loci, presumably because of their effect on chromatin organization, regulation of gene activity, recombination, DNA replication, cell cycle, mismatch repair system, etc. This review also discusses the role of two putative mutational mechanisms, replication slippage and recombination, and their interaction in SSR variation.
Article
The protein-coding genes and pseudogenes of Cuscuta australis had the diverse contribution to the formation and evolution of parasitism. The codon usage pattern analysis of these two type genes could be used to understand the gene transcription and translation. In this study, we systematically analyzed the codon usage patterns of protein-coding sequences and pseudogenes sequences in C. australis. The results showed that the high frequency codons of protein coding sequences and pseudogenes had the same A/U bias in the third position. However, these two sequences had converse bias at the third base in optimal codons: the protein coding sequences preferred G/C-ending codons while pseudogene sequences preferred A/U-ending codons. Neutrality plot and effective number of codons plot revealed that natural selection played a more important role than mutation pressure in two sequences codon usage bias. Furthermore, the gene expression level had a significant positive correlation with codon usage bias in C. australis. Highly-expressed protein coding genes exhibited a higher codon bias than lowly-expressed genes. Meanwhile, the high-expression genes tended to use G/C-ending synonymous codons. This result further verified the optimal codons usage bias and its correlation with the gene expression in C. australis.
Article
The base composition of the chloroplast genes is of great interest because they play a highly significant role in the evolutionary development of the plants. Evaluation of the 48 chloroplast protein-coding genes of Hemiptelea davidii showed that the average GC content was about 37.32%, while at the third codon base position alone the average GC content was only 27.80%. The 48 genes were classified into five groups based on gene function, and each group displayed specific codon characteristics. Based on the relative synonymous codon usage analysis, a total of 30 high-frequency codons and 11 optimal codons were identified, most of which ended with A or T. Neutrality plot, ENC-plot and PR2-plot analyses showed that the codon usage bias of the chloroplast genes of H. davidii was greatly influenced by natural selection pressures. Meanwhile, the frequency of codon usage of chloroplast genes among different plant species displayed similarities, with some synonymous codons being preferentially used in H. davidii. In this study, the codon usage pattern of the chloroplast protein-coding genes of H. davidii provides us with a better understanding of the expression of chloroplast genes, and may inform future molecular breeding programmes.
Article
Background Synonymous codon usage bias is noticed in the genome of every organism, influenced by mutation pressure and natural selection. The analysis of codon usage pattern in Porphyra umbilicalis chloroplast genome are inferred while previous study focused on codon bias in nuclear genome. Objective To develop a better understanding of the factors affecting synonymous codon usage, codon usage patterns and nucleotide composition of 150 genes in P. umbilicalis cp genome, and provide a theoretical basis for genetic modification of chloroplast genome. Methods In this study, all codon usage bias parameters and nucleotide compositions were calculated by Python script, Codon W, DNA Star, CUSP of EMBOSS and Microsoft Excel. Results It shows that codon usage models are mainly influenced by compositional constraints under mutational pressure and synonymous codon prefers to use codons ending with A/T, comparing to C/G. The ENC value is slight low which shows the weak codon bias. For all coding genes of P. umbilicalis chloroplast genome except Photosystem I genes, a weak correlation between GC3 and GC12 suggests natural selection might play a significant role in synonymous codon usage bias. Conclusion The codon usage bias in P. umbilicalis cp genome is low and in some way or other, influenced by natural selection, mutation pressure, nucleotide composition. Our results can provide a theoretical basis for codon modification of exogenous genes, accuracy of prediction about new members of chloroplast gene family and identification of unknown genome.
Article
This study was attempted to focus on the pattern of codon usage bias (CUB) of chloroplast genes in two species of Pisum viz. P. fulvum and P. sativum and to identify the factors which influence CUB. Bioinformatic tools were used to understand codon usage pattern in the protein-coding sequences of Pisum chloroplast genomes. It was found that GC content was lower than AT content in the genes. Low synonymous codon usage order (SCUO) values of genes indicated low CUB in chloroplast genes of Pisum species. Heatmaps showed positive correlations of GC3 with all the GC and AT ending codons. Neutrality plot analysis revealed that natural selection might have played a prominent role over mutation pressure in sculpturing the CUB of chloroplast genes in these two taxa. Positive correlation between SCUO and mRNA free energy (mFE) suggested that higher energy release by entire mRNA was related to high degree of CUB. Further, highly significant (p<.01) negative correlation was found between parameters in pair i.e. mFE-GC, mFE-GC1, mFE-GC2 and mFE for entire mRNA-GC3. This pointed out that higher GC content might have influenced lesser energy release by mRNA molecules of chloroplast genes.
Article
Damulin B, a dammarane-type saponin from steamed Gynostemma pentaphyllum, exhibits the strongest activity against human lung carcinoma A549 cells among the isolated active saponins. In this study, the structure-activity relationship of a series of saponin compounds was discussed. The inhibitory effect of damulin B on human lung cancer A549 and H1299 cells was investigated from apoptosis, cell cycle, and migration aspects. In vitro, human lung cancer cells were more susceptible to damulin B treatment than human normal fibroblasts. Damulin B exhibited a strong cytotoxic effect, as evidenced by the increase of apoptosis rate, reduction of mitochondrial membrane potential (MMP), generation of reactive oxygen species, and G0/G1 phase arrest. Furthermore, damulin B activated the following: both intrinsic and extrinsic apoptosis pathways along with early G1 phase arrest via the upregulation of the Bax, Bid, tBid, cleaved caspase-8, and p53 expression levels; downregulation of the procaspase-8/-9, CDK4, CDK6, and cyclin D1 expression levels; and more release of cytochrome c in the cytoplasm. In addition, antimigratory activities and suppressive effects on metastasis-related factors, such as MMP-2 and MMP-9, accompanied by the upregulation of IL-24 were revealed. Altogether, the results proved that damulin B could inhibit human lung cancer cells by inducing apoptosis, blocking the cell cycle at early G0/G1 phase and suppressing the migration. Hence, damulin B has potential therapeutic efficacy against lung cancer.
Article
In many organisms, the difference in codon usage patterns among genes reflects variation in local base compositional biases and the intensity of natural selection. In this study, a comparative analysis was performed to investigate the characteristics of codon bias and factors in shaping the codon usage patterns among mitochondrion, chloroplast and nuclear genes in common wheat (Triticum aestivum L.). GC contents in nuclear genes were higher than that in mitochondrion and chloroplast genes. The neutrality and correspondence analyses indicated that the codon usage in nuclear genes would be a result of relative strong mutational bias, while the codon usage patterns of mitochondrion and chloroplast genes were more conserved in GC content and influenced by translation level. The Parity Rule 2 (R2) plot analysis showed that pyrimidines were used more frequently than purines at the third codon position in the three genomes. In addition, using a new alterative strategy, 11, 12, and 24 triplets were defined as preferred codons in the mitochondrion, chloroplast and nuclear genes, respectively. These findings suggested that the mitochondrion, chloroplast and nuclear genes shared particularly different features of codon usage and evolutionary constraints.
Article
The pattern of codon usage in the chloroplast genome of Populus alba was investigated. Correspondence analysis (a commonly used multivariate statistical approach) and method of effective number of codons (ENc)-plot were conducted to analyze synonymous codon usage. The results of correspondence analysis showed that the distribution of genes on the major axis was significantly correlated with the frequency of use of G+C in synonymously variable third position of sense codon (GC3S), (r=0.349), and the positions of genes on the axis 2 and axis 3 were significantly correlated with CAI (r=−0.348, p<0.01 and r=0.602, p<0.01). The ENc for most genes was similar to that for the expected ENc based on the GC3S, but several genes with low ENC values were lying below the expected curve. All of these data indicated that codon usage was dominated by a mutational bias in chloroplast genome of P. alba. The selection in nature for translational efficiency only played a minor role in shaping codon usage in the chloroplast genome of P. alba.
Article
Codon usage in the chloroplast genomes of six seed plants (Arabidopsis thaliana, Populus alba, Zea mays, Triticum aestivum, Pinus koraiensis and Cycas taitungensis) was analyzed to find general patterns of codon usage in chloroplast genomes of seed plants. The results show that chloroplast genomes of the six seed plants had similar codon usage patterns, with a strong bias towards a high representation of NNA and NNT codons. In chloroplast genomes of the six seed plants, the effective number of codons (ENC) for most genes was similar to the expected ENC based on the GC content at the third codon position, but several genes with low ENC values lay below the expected curve. All of these data indicate that codon usage was dominated by a mutational bias in chloroplast genomes of seed plants and that selection appeared to be limited to a subset of genes and to only subtly affect codon usage. In addition, four, six, eight, nine, ten and 12 codons were defined as the optimal codons in chloroplast genomes of the six seed plants.
Article
Observed patterns of synonymous codon usage are explained in terms of the joint effects of mutation, selection, and random drift. Examination of the codon usage in 165 Escherichia coli genes reveals a consistent trend of increasing bias with increasing gene expression level. Selection on codon usage appears to be unidirectional, so that the pattern seen in lowly expressed genes is best explained in terms of an absence of strong selection. A measure of directional synonymous-codon usage bias, the Codon Adaptation Index, has been developed. In enterobacteria, rates of synonymous substitution are seen to vary greatly among genes, and genes with a high codon bias evolve more slowly. A theoretical study shows that the patterns of extreme codon bias observed for some E. coli (and yeast) genes can be generated by rather small selective differences. The relative plausibilities of various theoretical models for explaining nonrandom codon usage are discussed.
Article
In the plant chloroplast genome the codon usage of the highly expressed psbA gene is unique and is adapted to the tRNA population, probably due to selection for translation efficiency. In this study the role of selection on codon usage in each of the fully sequenced chloroplast genomes, in addition to Chlamydomonas reinhardtii, is investigated by measuring adaptation to this pattern of codon usage. A method is developed which tests selection on each gene individually by constructing sequences with the same amino acid composition as the gene and randomly assigning codons based on the nucleotide composition of noncoding regions of that genome. The codon bias of the actual gene is then compared to a distribution of random sequences. The data indicate that within the algae selection is strong in Cyanophora paradoxa, affecting a majority of genes, of intermediate intensity in Odontella sinensis, and weaker in Porphyra purpurea and Euglena gracilis. In the plants, selection is found to be quite weak in Pinus thunbergii and the angiosperms but there is evidence that an intermediate level of selection exists in the liverwort Marchantia polymorpha. The role of selection is then further investigated in two comparative studies. It is shown that average relative codon bias is correlated with expression level and that, despite saturation levels of substitution, there is a strong correlation among the algae genomes in the degree of codon bias of homologous genes. All of these data indicate that selection for translation efficiency plays a significant role in determining the codon bias of chloroplast genes but that it acts with different intensities in different lineages. In general it is stronger in the algae than the higher plants, but within the algae Euglena is found to have several unusual features which are noted. The factors that might be responsible for this variation in intensity among the various genomes are discussed.
Article
A simple measure is presented that quantifies how far the codon usage of a gene departs from equal usage of synonymous codons. This measure of synonymous codon usage bias, the 'effective number of codons used in a gene', Nc, can be easily calculated from codon usage data alone, and is independent of gene length and amino acid (aa) composition. Nc can take values from 20, in the case of extreme bias where one codon is exclusively used for each aa, to 61 when the use of alternative synonymous codons is equally likely. Nc thus provides an intuitively meaningful measure of the extent of codon preference in a gene. Codon usage patterns across genes can be investigated by the Nc-plot: a plot of Nc vs. G + C content at synonymous sites. Nc-plots are produced for Homo sapiens, Saccharomyces cerevisiae, Escherichia coli, Bacillus subtilis, Dictyostelium discoideum, and Drosophila melanogaster. A FORTRAN77 program written to calculate Nc is available on request.
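The Nc-plot described in this abstract compares each gene's Nc with the value expected if G+C composition at synonymous third positions (GC3s) were the only constraint on codon choice. A minimal sketch of that expected curve follows, assuming the commonly cited approximation Nc = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3s. The formula is not quoted in the abstract itself, and the function name is hypothetical.

```python
def expected_nc(gc3s):
    """Expected effective number of codons for a given GC3s (a fraction in [0, 1]).

    Uses the commonly cited approximation
        Nc = 2 + s + 29 / (s**2 + (1 - s)**2)
    which assumes codon choice is shaped by base composition alone.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)

# Points for drawing the reference curve of an Nc-plot.
curve = [(s / 100, expected_nc(s / 100)) for s in range(0, 101, 5)]
print(expected_nc(0.5))  # 60.5
```

Genes that fall well below this curve show more bias than base composition alone would predict, which is usually read as a hint that selection may be acting.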
Article
A quantitative theory of directional mutation pressure proposed in 1962 explained the wide variation of DNA base composition observed among different bacteria and its small heterogeneity within individual bacterial species. The theory was based on the assumption that the effect of mutation on a genome is not random but has a directionality toward higher or lower guanine-plus-cytosine content of DNA, and this pressure generates directional changes more in neutral parts of the genome than in functionally significant parts. Now that DNA sequence data are available, the theory allows the estimation of the extent of neutrality of directional mutation pressure against selection. Newly defined parameters were used in the analysis, and two apparently universal constants were discovered. Analysis of DNA sequence has revealed that practically all organisms are subject to directional mutation pressure. The theory also offers plausible explanations for the large heterogeneity in guanine-plus-cytosine content among different parts of the vertebrate genome.
Article
We measured the expression pattern and analyzed codon usage in 8,133, 1,550, and 2,917 genes, respectively, from Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana. In those three species, we observed a clear correlation between codon usage and gene expression levels and showed that this correlation is not due to a mutational bias. This provides direct evidence for selection on silent sites in those three distantly related multicellular eukaryotes. Surprisingly, there is a strong negative correlation between codon usage and protein length. This effect is not due to a smaller size of highly expressed proteins. Thus, for a same-expression pattern, the selective pressure on codon usage appears to be lower in genes encoding long rather than short proteins. This puzzling observation is not predicted by any of the current models of selection on codon usage and thus raises the question of how translation efficiency affects fitness in multicellular organisms.
Article
The relative contribution of mutation and selection to the G+C content of DNA was analyzed in bacterial species having widely different G+C contents. The analysis used two methods that were developed previously. The first method was to plot the average G+C content of a set of nucleotides against the G+C content of the third codon position for each gene. This method was used to present the G+C distribution of the third codon position and to assess the relative neutrality of a set of nucleotides to that of the G+C content of the third codon position. The second method was to plot the intrastrand bias of the third codon position from Parity Rule 2 (PR2), where A = T and G = C. It was found that whereas intragenomic distributions of the DNA G+C content of these bacteria are narrow in the majority of species, in some species the G+C content of the minor class of genes distributes over wider ranges than the major class of genes. On the other hand, ubiquitous PR2 biases are amino acid specific and independent of the G+C content of DNA, so that when averaged over the amino acids, the biases are small and not correlated with the DNA G+C content. Therefore, translation-coupled PR2 biases are unlikely to explain the wide range of G+C contents among different species. Considering all data available, it was concluded that the amino acid-specific PR2 bias has only a minor effect, if any, on the average G+C content. In addition, PR2 bias patterns of different species show phylogenetic relationships, and the pattern can be used as a taxon-specific fingerprint.
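The first method in this abstract is what later chloroplast studies call a neutrality plot. A minimal sketch follows, assuming the "set of nucleotides" is taken to be the first and second codon positions (GC12) and that the relationship is summarized by an ordinary least-squares slope. Both function names are hypothetical.

```python
def gc_by_position(cds):
    """Return (GC12, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    gc = [0, 0, 0]                        # G/C counts at codon positions 1, 2, 3
    for i in range(usable):
        if cds[i] in "GC":
            gc[i % 3] += 1
    gc1, gc2, gc3 = (count / n_codons for count in gc)
    return (gc1 + gc2) / 2, gc3

def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy genes, for illustration only; real input would be the genome's CDSs.
genes = ["GCTGCA", "ATGAAATTT", "GGCGCGCCG"]
points = [gc_by_position(g) for g in genes]
slope = regression_slope([gc3 for _, gc3 in points], [gc12 for gc12, _ in points])
```

A slope near 1 is usually read as GC12 tracking GC3 (mutation pressure dominating), while a slope near 0 suggests that the first two positions are constrained by selection.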
Article
The genome of higher eukaryotes consists of genes having a widely heterogeneous base composition at the third codon position. Ubiquitous variability of the DNA base composition has the following two aspects: intragenomic heterogeneity of the G+C content and the amino-acid-specific translation-coupled biases from the Parity Rule 2 (PR2). PR2 is an intrastrand rule where A = T and G = C are expected if there is no bias in mutation and selection between the two complementary strands of DNA. To examine whether or not the biases from PR2 are responsible for the wide heterogeneity of the DNA G+C content in human, the third codon position of 846 human genes was analyzed. Genes were separated into six groups according to their G+C content of the third codon position, and each group was examined for the translation-coupled PR2 biases in the nucleotide composition of the third codon position for two- and four-codon amino acids. The results show that genes in the different G+C content groups have similar PR2 biases, indicating that the intragenomic heterogeneity of the G+C content is not correlated with translation-coupled biases from the PR2. Therefore, the heterogeneity of the G+C content is likely to be determined by some other mechanism (e.g. locally variable directional mutation pressures) than amino-acid-specific selections for the codon preference.
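The PR2 expectation stated in this abstract (A = T and G = C within a strand) is often checked per gene by computing A3/(A3+T3) and G3/(G3+C3) at third codon positions. A minimal sketch follows, assuming the analysis is restricted to the eight fourfold-degenerate codon boxes of the standard genetic code. That restriction and the function name are illustrative choices, not the procedure of the cited study, which also examined two-codon amino acids.

```python
# First two bases of the fourfold-degenerate codon boxes in the standard code:
# Leu (CTN), Val (GTN), Ser (TCN), Pro (CCN), Thr (ACN), Ala (GCN), Arg (CGN), Gly (GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def pr2_bias(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) over fourfold-degenerate codons of cds."""
    cds = cds.upper()
    counts = dict.fromkeys("ACGT", 0)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in counts:
            counts[codon[2]] += 1
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    g_ratio = counts["G"] / gc if gc else float("nan")
    a_ratio = counts["A"] / at if at else float("nan")
    return g_ratio, a_ratio

# AAA is not in a fourfold box, so only the four GCN codons are counted.
print(pr2_bias("GCAGCTGCGGCCAAA"))  # (0.5, 0.5)
```

On a PR2 plot each gene is one point; points at (0.5, 0.5) satisfy the rule, and displacement from the centre shows the direction and size of the bias.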
Article
The human genome, as in other eukaryotes, has a wide heterogeneity in the DNA base composition. The evolutionary basis for this heterogeneity has been unknown. A previous study of the human genome (846 genes analyzed) has shown that, in the major range of the G+C content in the third codon position (0.25-0.75), biases from the Parity Rule 2 (PR2) among the synonymous codons of the four-codon amino acids are similar except in the highest G+C range (Sueoka, N., 1999. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238, 53-58.). PR2 is an intra-strand rule where A=T and G=C are expected when there are no biases between the two complementary strands of DNA in mutation and selection rates (substitution rates). In this study, 14,026 human genes were analyzed. In addition, the third codon positions of two-codon amino acids were analyzed. New results show the following: (a) The G+C contents of the third codon position of human genes are scattered in the G+C range of 0.22-0.96 in the third codon position. (b) The PR2 biases are similar in the range of 0.25-0.75, whereas, in the high G+C range (0.75-0.96; 13% of the genes), the PR2-bias fingerprints are different from those of the major range. (c) Unlike the PR2 biases, the G+C contents of the third codon position for both four-codon and two-codon amino acids are all correlated almost perfectly with the G+C content of the third codon position over the total G+C ranges. These results support the notion that the directional mutation pressure, rather than the directional selection pressure, is mainly responsible for the heterogeneity of the G+C content of the third codon position.
Article
Transcription in eukaryotic cells has been described as quantal, with pulses of messenger RNA produced in a probabilistic manner. This description reflects the inherently stochastic nature of gene expression, known to be a major factor in the heterogeneous response of individual cells within a clonal population to an inducing stimulus. Here we show in Saccharomyces cerevisiae that stochasticity (noise) arising from transcription contributes significantly to the level of heterogeneity within a eukaryotic clonal population, in contrast to observations in prokaryotes, and that such noise can be modulated at the translational level. We use a stochastic model of transcription initiation specific to eukaryotes to show that pulsatile mRNA production, through reinitiation, is crucial for the dependence of noise on transcriptional efficiency, highlighting a key difference between eukaryotic and prokaryotic sources of noise. Furthermore, we explore the propagation of noise in a gene cascade network and demonstrate experimentally that increased noise in the transcription of a regulatory protein leads to increased cell-cell variability in the target gene output, resulting in prolonged bistable expression states. This result has implications for the role of noise in phenotypic variation and cellular differentiation.
Article
The influence of local base composition on mutations in chloroplast DNA (cpDNA) is studied in detail and the resulting, empirically derived, mutation dynamics are used to analyze both base composition and codon usage bias. A 4 x 4 substitution matrix is generated for each of the 16 possible flanking base combinations (contexts) using 17,253 noncoding sites, 1309 of which are variable, from an alignment of three complete grass chloroplast genome sequences. It is shown that substitution bias at these sites is correlated with flanking base composition and that the A+T content of these flanking sites as well as the number of flanking pyrimidines on the same strand appears to have general influences on substitution properties. The context-dependent equilibrium base frequencies predicted from these matrices are then applied to two analyses. The first examines whether or not context dependency of mutations is sufficient to generate average compositional differences between noncoding cpDNA and silent sites of coding sequences. It is found that these two classes of sites exist, on average, in very different contexts and that the observed mutation dynamics are expected to generate significant differences in overall composition bias that are similar to the differences observed in cpDNA. Context dependency, however, cannot account for all of the observed differences: although silent sites in coding regions appear to be at the equilibrium predicted, noncoding cpDNA has a significantly lower A+T content than expected from its own substitution dynamics, possibly due to the influence of indels. The second study examines the codon usage of low-expression chloroplast genes. When context is accounted for, codon usage is very similar to what is predicted by the substitution dynamics of noncoding cpDNA. However, certain codon groups show significant deviation when followed by a purine in a manner suggesting some form of weak selection other than translation efficiency. Overall, the findings indicate that a full understanding of mutational dynamics is critical to understanding the role selection plays in generating composition bias and sequence structure.
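As a rough illustration of how equilibrium base frequencies can be predicted from a substitution matrix, the sketch below finds the stationary distribution of a single 4 x 4 row-stochastic matrix by repeated multiplication. It treats one flanking context in isolation, which is a simplification of the context-dependent analysis described above, and the matrix values are invented for the example rather than taken from the cited study.

```python
BASES = "ACGT"

def equilibrium_frequencies(P, max_iter=100000, tol=1e-12):
    """Stationary distribution pi (pi = pi * P) of a row-stochastic 4 x 4 matrix.

    P[i][j] is the assumed per-step probability that base i becomes base j.
    """
    pi = [0.25] * 4  # start from uniform base composition
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
        if max(abs(n - o) for n, o in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Hypothetical AT-biased matrix (rows and columns ordered A, C, G, T).
P = [
    [0.97, 0.01, 0.01, 0.01],
    [0.03, 0.92, 0.01, 0.04],
    [0.04, 0.01, 0.92, 0.03],
    [0.01, 0.01, 0.01, 0.97],
]

freqs = equilibrium_frequencies(P)
print({b: round(f, 3) for b, f in zip(BASES, freqs)})
print("Predicted A+T content:", round(freqs[0] + freqs[3], 3))
```

Comparing such a predicted A+T content with the observed composition of a class of sites is the kind of check the abstract describes when it asks whether sites are at their mutational equilibrium.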
Article
Codon usage in nuclear genes of four monocot and three dicot species was analyzed to find general patterns in codon choice of plant species. Codon bias was correlated with GC content at the third codon position. GC contents were higher in monocot species than in dicot species at all codon positions. The high GC contents of monocot species might be the result of relatively strong mutational bias that occurred in the lineage of the Poaceae species. In both dicot and monocot species, the effective number of codons (ENC) for most genes was similar to the ENC expected from the GC content at the third codon positions. G- and C-ending codons were detected as the "preferred" codons in monocot species, as in Drosophila. Many of the same "preferred" codons were also found in dicot species. Pyrimidines (C and T) are used more frequently than purines (G and A) in four-fold degenerate codon groups.
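The expected ENC mentioned in this abstract is commonly taken from Wright's (1990) curve, which gives the ENC a gene would have if its codon usage were determined by GC3 alone: ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC fraction at the third codon position. The sketch below assumes that formula. The function name and the observed values in the usage lines are hypothetical.

```python
def expected_enc(gc3):
    """Expected effective number of codons under GC3-only constraint (Wright 1990).

    gc3 is the G+C fraction at third codon positions, between 0 and 1.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)

# Hypothetical gene: observed ENC of 45.0 at GC3 = 0.30
observed_enc, gc3 = 45.0, 0.30
expected = expected_enc(gc3)
print(f"expected ENC = {expected:.2f}, observed - expected = {observed_enc - expected:.2f}")
```

In an ENC plot, genes lying close to this curve are consistent with base composition alone shaping codon usage, while genes falling well below it suggest additional factors such as selection.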