Article

Simple Methods for Estimating the Numbers of Synonymous and Nonsynonymous Nucleotide Substitutions


Abstract

Two simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions are presented. Although they give no weights to different types of codon substitutions, these methods give essentially the same results as those obtained by Miyata and Yasunaga's and by Li et al.'s methods. Computer simulation indicates that estimates of synonymous substitutions obtained by the two methods are quite accurate unless the number of nucleotide substitutions per site is very large. It is shown that all available methods tend to give an underestimate of the number of nonsynonymous substitutions when the number is large.
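The unweighted counting scheme described in this abstract can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- it uses the standard genetic code;
- the sequences are aligned, gap-free and in frame, with no stop codons;
- every mutational pathway between codons that differ at more than one position gets equal weight, and pathways through stop codons are dropped;
- the proportions pS and pN are corrected with the Jukes-Cantor formula, as several of the citing studies below do.

The function names are hypothetical. This is an illustration of the idea, not the authors' program.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def syn_sites(codon):
    """Synonymous sites of one codon: each of the 9 single-base changes that
    keeps the amino acid adds 1/3 (changes to a stop count as nonsynonymous)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two codons, averaged
    with equal weights over all stop-free mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    sd_sum = nd_sum = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, sd, nd = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        else:  # runs only if the pathway was not discarded
            sd_sum, nd_sum, n_paths = sd_sum + sd, nd_sum + nd, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway from {c1} to {c2}")
    return sd_sum / n_paths, nd_sum / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction; raises a math domain error once p >= 0.75."""
    return -0.75 * log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, gap-free, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    if any(CODE[c] == "*" for pair in pairs for c in pair):
        raise ValueError("remove stop codons before comparing")
    # Synonymous sites are averaged over the two sequences
    S = sum(syn_sites(a) + syn_sites(b) for a, b in pairs) / 2
    N = 3 * len(pairs) - S
    Sd = Nd = 0.0
    for a, b in pairs:
        sd, nd = codon_differences(a, b)
        Sd += sd
        Nd += nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

Consider `nei_gojobori("TTTCTG", "TTCCTG")`. The pair has 5/3 synonymous sites and one synonymous difference, so pS = 0.6. The call therefore returns a dN of zero and dS = 0.75·ln 5 ≈ 1.207. The correction is undefined at pS or pN ≥ 0.75, which is one reason very divergent sequences are hard to handle.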


... Here, the ω value for each gene has been validated by five established methods, including the "NG" and "ML" methods and other advanced methods such as LPB93 [89][90][91]. Values of ω from all five methods are comparable, with little difference. The ω value of each gene is correlated with experimental evidence and epidemiological studies that justify the increase or decrease in ω. ...
... Synonymous and nonsynonymous amino acid changes are estimated based on the "NG" method [89], the "ML" (Maximum Likelihood) method [90], Li et al. ...
Article
An estimate of the ratio of nonsynonymous to synonymous substitution rates (dN/dS, ω) in the SARS-CoV-2 genome would indicate the evolutionary dynamics that allow the virus to evolve into novel strains with increased infectivity, virulence, and evasion of vaccine-induced neutralization. We estimated ω over time for the whole genome and for all twenty-nine SARS-CoV-2 genes of the major virulent strains (alpha, delta, and omicron). The genome originally emerged under strong purifying selection (ω ~ 0.04; ω < 1) and reached ω ~ 0.85 in omicron, moving towards diversifying selection (ω > 1). A marked increase in ω occurred in the spike gene from alpha (ω = 0.2) to omicron (ω = 1.97). The ω values of the replication machinery genes, including RDRP, NSP3, NSP4, NSP7, NSP8, NSP10, NSP13, NSP14, and ORF9, are markedly increased. This indicates that these genes/proteins are not yet evolutionarily stabilized and are contributing to the evolution of novel virulent strains. Delta shows the largest increase in ω in the immunomodulatory genes NSP8, NSP10, NSP16, ORF4, ORF5, ORF6, ORF7A, and ORF8 compared with alpha or omicron, indicating delta-specific vulnerabilities for severe COVID-19-related hospitalization and death. The maximum values of ω are observed for spike (S), NSP4, ORF8, and NSP15. This indicates that gene-specific temporal estimation of ω identifies specific genes associated with the virus's super-infectivity and virulence that could be targeted for drug development.
... Calculations were performed assuming uniform and homogeneous rates, and with pairwise deletion of gapped positions. We used MEGA to perform Tajima's (1989) test for neutrality and the codon-based Z-test for purifying selection (Nei and Gojobori 1986). The McDonald-Kreitman test was performed with ingroup CO1 sequences against J. sarsi in DnaSP v6 (Rozas et al. 2017) to examine possible selection on CO1. ...
... We obtained NI < 1 in comparisons with J. sarsi, indicating adaptive evolution in the albifrons group (Bazin et al. 2006; Meiklejohn et al. 2007). The Z-test (Nei and Gojobori 1986) was significant for purifying selection in all species except J. sarsi. Tajima's test, however, was not significant for non-neutrality of the CO1 sequences. ...
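The neutrality index summarises a McDonald-Kreitman 2×2 table of polymorphisms and fixed differences. Values below 1 mean relatively more nonsynonymous fixed differences than nonsynonymous polymorphisms, which is why NI < 1 is read as adaptive evolution. Below is a minimal Python sketch with a hypothetical function name and made-up counts. A real analysis would also test the table for significance, for example with Fisher's exact test, which is omitted here.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index NI = (Pn/Ps) / (Dn/Ds) from a McDonald-Kreitman table.

    pn, ps: nonsynonymous and synonymous polymorphisms within the ingroup
    dn, ds: nonsynonymous and synonymous fixed differences to the outgroup
    """
    if ps <= 0 or dn <= 0:
        raise ValueError("NI is undefined when Ps or Dn is zero")
    # Algebraically equal to (pn / ps) / (dn / ds), but safe when ds == 0
    return (pn * ds) / (ps * dn)


# Hypothetical counts, for illustration only
ni = neutrality_index(pn=2, ps=8, dn=10, ds=10)
print(ni)  # 0.25: NI < 1, i.e. excess nonsynonymous divergence
```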
Article
Here, we characterise the standard “Folmer region” of the mitochondrial cytochrome c oxidase subunit 1 (CO1) marker and a fragment of nuclear 28S marker in four species of the Jaera albifrons complex. Jaera albifrons (Leach, 1814), Jaera ischiosetosa Forsman, 1949, Jaera praehirsuta Forsman, 1949, and Jaera forsmani Bocquet, 1950 were collected from localities on the Norwegian coast and identified with morphological characters. We compared DNA sequences with sequences available in GenBank and BOLDsystems and calculated haplotype networks and interspecific versus intraspecific genetic distances. These analyses revealed low interspecific genetic distance (CO1 0.00–1.57%, 28S 0.00–0.39%) and extensive haplotype sharing between J. albifrons group species and specimens from both sides of the North Atlantic for both CO1 and 28S. Genetic distances between J. albifrons group species and other Jaera species, however, exceeded 29% for both CO1 and 28S, with no haplotype sharing. These assessments, together with taxonomically unconstrained analyses with software ABGD and ASAP, show that these markers are unable to distinguish between the J. albifrons group of morphospecies. The sequences do, however, clearly identify J. albifrons species complex from other Jaera species. Thus, a likely hypothesis is that taxa in this complex represent a single species. Our results corroborate previous finds where discordance between mitochondrial gene clusters, AFLP, and other data highlights the potential conflict between different “species criteria” and the well-established distinction between gene trees and species trees. In operational terms, common protocols for metabarcoding will potentially underestimate sympatric species diversity with cases like the J. albifrons complex, if the members of this complex indeed represent different species.
... considering a cutoff of sequence identity ≥ 80%, one-to-one orthology type, and confidence score = 1. Later on, to deduce the evolutionary rate (ω) of these pairs of orthologous genes in both budding yeast and mice, we calculated the nonsynonymous to synonymous divergence ratio (dN/dS) using the codeml program of PAML (version 4.9), following the Nei & Gojobori 1986 method (Nei and Gojobori 1986; Yang 2007). To calculate dN/dS using codeml, we initially downloaded nucleotide sequences of the orthologous pairs from Ensembl (Howe et al. 2021) and then translated them into peptide sequences using EMBOSS Transeq. ...
Thesis
This thesis uses in-silico methods to elucidate important molecular events leading to the switch from mitotic to meiotic growth in S. cerevisiae. It unravels probable regulatory cascades by delineating important Master Regulators (MRs), Transcription Factors (TFs) and their Target Genes (TGs) for three pivotal stages of meiotic growth in S. cerevisiae: the pre-meiotic, meiosis initiation and meiosis commitment stages. Lastly, it reports that the transcriptional cascade for meiosis commitment is more evolutionarily conserved than the transcriptional cascade for meiosis initiation, between lower eukaryotes such as S. cerevisiae and higher eukaryotes such as M. musculus. This thesis sheds light on the regulatory mechanisms of three important phases of meiotic growth in S. cerevisiae, which will be helpful for gaining intricate knowledge of the meiotic process in eukaryotes.
... Then the evolutionary rate of the TLR genes (TLR1-TLR10) of each TLR group (example: TLR1) was estimated relative to their consensus (example: TLR1_consensus) sequences using the Codeml program of the PAML software (ver. 4.5) with runmode = -2 and CodonFreq = 1 (Nei and Gojobori 1986; Yang 2007). ...
... Protein domains of the ancestral sequences were annotated using the ScanProsite tool (de Castro et al. 2006). Evolutionary parameters of the ancestral sequences were analyzed with respect to the root node sequence of the phylogenetic tree (Nei and Gojobori 1986; Yang 2007). These were the rate of non-synonymous substitutions per non-synonymous site (Ka) and the rate of synonymous substitutions per synonymous site (Ks). The interaction of the ancestral protein sequences and the Human_TLR9 sequence, which has been used as a reference for the remaining species (Zhou et al. 2013), with the CpG ODN (Areal et al. 2011) was studied in HDOCK. ...
Article
Toll-like receptors (TLRs), transmembrane pattern recognition receptors, are best known for their roles in innate immunity via recognition of pathogens and initiation of signaling responses. Mammalian TLRs recognize molecular patterns associated with pathogens and initiate the innate immune response. We have studied the evolutionary diversity of mammalian TLR genes for differences in immunological response. Reconstruction of ancestral sequences is a key aspect of the molecular evolution of TLRs, because it allows changes across the TLR genes to be tracked. The comprehensive analysis of mammalian TLRs revealed a distinct pattern of evolution of TLR9. Various sequence-based features, such as amino acid usage, hydrophobicity, GC content, and evolutionary constraints, are found to influence the divergence of TLR9 from other TLRs. Ancestral sequence reconstruction analysis also revealed that the gradual evolution of TLR genes in several ancestral lineages led to the distinct pattern of TLR9. This demonstrates evolutionary divergence in which the progressive accumulation of mutations results in the distinct pattern of TLR9.
... the numbers of nonsynonymous and synonymous substitutions per site of the SARS-CoV-2 genes were estimated, because the selective force will depend on the function of the protein, which in turn depends on the amino acid sequence. Table 3 shows the number of nonsynonymous and synonymous changes per site of the SARS-CoV-2 genes estimated by the NG86 model (16). ...
... The numbers of Gs and As were almost unchanged throughout the same time period. Solid lines in Fig … Table 4 shows the number of nonsynonymous and synonymous substitutions per site of the SARS-CoV-2 genes estimated by the Nei and Gojobori model (16). In this table, dn indicates the number of nonsynonymous substitutions per site and ds indicates the number of synonymous substitutions per site. ...
Article
SARS-CoV-2 is the cause of the current worldwide pandemic of severe acute respiratory syndrome. The change of nucleotide composition of the SARS-CoV-2 genome is crucial for understanding the spread and transmission dynamics of the virus because viral nucleotide sequences are essential in identifying viral strains. Recent studies have shown that cytosine (C) to uracil (U) substitutions are overrepresented in SARS-CoV-2 genome sequences. These asymmetric substitutions between C and U indicate that traditional time-reversible substitution models cannot be applied to the evolution of SARS-CoV-2 sequences. Thus, we develop a new time-irreversible model of nucleotide substitutions to estimate the substitution rates in SARS-CoV-2 genomes. We investigated the number of nucleotide substitutions among the 7,862 genomic sequences of SARS-CoV-2 registered in the Global Initiative on Sharing All Influenza Data (GISAID) that have been sampled from all over the world. Using the new method, the substitution rates in SARS-CoV-2 genomes were estimated. The C-to-U substitution rate of SARS-CoV-2 was estimated to be 1.95 × 10⁻³ ± 4.88 × 10⁻⁴ per site per year, compared with 1.48 × 10⁻⁴ ± 7.42 × 10⁻⁵ per site per year for all other types of substitutions.
... S is the number of synonymous differences between two CYP19 genes and ES is the number of expected (i.e., all possible) synonymous sites in a gene. We computed the number of synonymous mutations using the method of Nei and Gojobori (1986) [45]; to determine the ES of a CYP19 gene, we determined the number of possible synonymous mutations for each individual codon and summed them across all codons. When comparing CYP19 gene pairs, we averaged the individual ES values. ...
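The per-codon summation of ES described here can be sketched in Python under two assumptions. First, the standard genetic code applies. Second, each possible synonymous change is scaled to 1/3 of a site, following the Nei-Gojobori convention. If ES is meant as a raw count of possible synonymous mutations, drop the division by 3. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def synonymous_sites(codon):
    """Possible synonymous changes of one codon, scaled to sites (1/3 each)."""
    n_syn = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                n_syn += CODE[mutant] == CODE[codon]
    return n_syn / 3


def expected_synonymous_sites(seq):
    """ES of one in-frame coding sequence: per-codon values summed over codons.
    Stop codons (e.g. a terminal stop) are skipped; a trailing partial codon
    is ignored."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return sum(synonymous_sites(c) for c in codons if CODE[c] != "*")


def pair_es(seq_a, seq_b):
    """ES for a gene pair: the average of the two genes' individual ES values."""
    return (expected_synonymous_sites(seq_a) + expected_synonymous_sites(seq_b)) / 2
```

For example, CTG (Leu) has four synonymous neighbours, so it contributes 4/3 sites, while ATG (Met) contributes none.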
Article
Estrogens play critical roles in embryonic development, gonadal sex differentiation, behavior, and reproduction in vertebrates and in several human cancers. Estrogens are synthesized from testosterone and androstenedione by the endoplasmic reticulum membrane-bound P450 aromatase/cytochrome P450 oxidoreductase complex (CYP19/CPR). Here, we report the characterization of novel mammalian CYP19 isoforms encoded by CYP19 gene copies. These CYP19 isoforms are all defined by a combination of mutations in the N-terminal transmembrane helix (E42K, D43N) and in helix C of the catalytic domain (P146T, F147Y). The mutant CYP19 isoforms show increased androgen conversion due to the KN transmembrane helix. In addition, the TY substitutions in helix C result in a substrate preference for androstenedione. Our structural models suggest that CYP19 mutants may interact differently with the membrane (affecting substrate uptake) and with CPR (affecting electron transfer), providing structural clues for the catalytic differences.
... Genomic collinearity between Pongamia and soybean was analyzed using MCScanX based on alignments of protein sequences. The synonymous substitutions per synonymous site (Ks) of gene pairs within collinearity blocks were calculated using ParaAT (v2.0) with the NG method [51]. The Ks values of TDG gene pairs were calculated using the same method. ...
Article
Background Soybean (Glycine max) is a vital oil-producing crop. Augmenting oleic acid (OA) levels in soybean oil enhances its oxidative stability and health benefits, representing a key objective in soybean breeding. Pongamia (Pongamia pinnata), known for its abundant oil, OA, and flavonoid in the seeds, holds promise as a biofuel and medicinal plant. A comparative analysis of the lipid and flavonoid biosynthesis pathways in Pongamia and soybean seeds would facilitate the assessment of the potential value of Pongamia seeds and advance the genetic improvements of seed traits in both species. Results The study employed multi-omics analysis to systematically compare differences in metabolite accumulation and associated biosynthetic genes between Pongamia seeds and soybean seeds at the transcriptional, metabolic, and genomic levels. The results revealed that OA is the predominant free fatty acid in Pongamia seeds, being 8.3 times more abundant than in soybean seeds. Lipidomics unveiled a notably higher accumulation of triacylglycerols (TAGs) in Pongamia seeds compared to soybean seeds, with 23 TAG species containing OA. Subsequently, we identified orthologous groups (OGs) involved in lipid biosynthesis across 25 gene families in the genomes of Pongamia and soybean, and compared the expression levels of these OGs in the seeds of the two species. Among the OGs with expression levels in Pongamia seeds more than twice as high as in soybean seeds, we identified one fatty acyl-ACP thioesterase A (FATA) and two stearoyl-ACP desaturases (SADs), responsible for OA biosynthesis, along with two phospholipid:diacylglycerol acyltransferases (PDATs) and three acyl-CoA:diacylglycerol acyltransferases (DGATs), responsible for TAG biosynthesis. Furthermore, we observed a significantly higher content of the flavonoid formononetin in Pongamia seeds compared to soybean seeds, by over 2000-fold. 
This difference may be attributed to the tandem duplication expansions of 2,7,4ʹ-trihydroxyisoflavanone 4ʹ-O-methyltransferases (HI4ʹOMTs) in the Pongamia genome, which are responsible for the final step of formononetin biosynthesis, combined with their high expression levels in Pongamia seeds. Conclusions This study extends beyond observations made in single-species research by offering novel insights into the molecular basis of differences in lipid and flavonoid biosynthetic pathways between Pongamia and soybean, from a cross-species comparative perspective.
... Statistics of positive selection were retrieved from Pop-HumanScan [39]. We considered a classical statistic such as π for computing the average number of nucleotide differences per site [40]. We further considered tests covering different time scales. ...
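The π statistic mentioned here is the mean proportion of differing sites over all pairs of sequences. Below is a minimal Python sketch. It assumes gap-free aligned sequences and gives every sequence pair equal weight. It computes the raw statistic only and does not reproduce the Pop-HumanScan pipeline; the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: average number of nucleotide differences per site over all pairs
    of aligned, equal-length sequences (gaps and ambiguity codes not handled)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return differences / (len(pairs) * length)


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs x 4 sites)
```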
Article
Background The human lineage has undergone a postcranial skeleton gracilization (i.e. lower bone mass and strength relative to body size) compared to other primates and archaic populations such as the Neanderthals. This gracilization has been traditionally explained by differences in the mechanical load that our ancestors exercised. However, there is growing evidence that gracilization could also be genetically influenced. Results We have analyzed the LRP5 gene, which is known to be associated with high bone mineral density conditions, from an evolutionary and functional point of view. Taking advantage of the published genomes of archaic Homo populations, our results suggest that this gene has a complex evolutionary history both between archaic and living humans and within living human populations. In particular, we identified the presence of different selective pressures in archaics and extant modern humans, as well as evidence of positive selection in the African and South East Asian populations from the 1000 Genomes Project. Furthermore, we observed very limited evidence of archaic introgression in this gene (only three haplotypes of East Asian ancestry out of the 1000 Genomes). This is compatible with a general erasing of the fingerprint of archaic introgression due to functional differences in archaics compared to extant modern humans. In agreement with this hypothesis, we observed private mutations in the archaic genomes that we experimentally validated as putatively increasing bone mineral density. In particular, four of five archaic missense mutations affecting the first β-propeller of LRP5 displayed enhanced Wnt pathway activation, of which two also displayed reduced negative regulation. Conclusions In summary, these data suggest a genetic component contributing to the understanding of skeletal differences between extant modern humans and archaic Homo populations.
... This ratio refers to the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) [33,35]. We employed the Nei-Gojobori test, complemented by the Jukes-Cantor correction, within the MEGA 11 software to calculate the dN/dS ratio [36][37][38]. To estimate the variance of the difference, we conducted Z-tests of selection, utilizing 1000 bootstrap replications. ...
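This excerpt pairs Nei-Gojobori counts with a Jukes-Cantor correction and a Z-test whose variance comes from 1000 bootstrap replicates. The Python sketch below assumes that per-codon synonymous and nonsynonymous sites and differences have already been counted. It resamples codons with replacement to estimate Var(dN − dS), then reports Z = (dN − dS)/SE with one-sided normal p-values. It follows the idea of the codon-based Z-test in MEGA but is not MEGA's code. The names and the example numbers are hypothetical.

```python
import random
from math import log, sqrt
from statistics import NormalDist, variance


def jukes_cantor(p):
    return -0.75 * log(1 - 4 * p / 3)


def dn_ds(codons):
    """codons: list of (S_i, N_i, Sd_i, Nd_i) per codon, assumed to have been
    precomputed with Nei-Gojobori counting. Returns Jukes-Cantor (dN, dS)."""
    S = sum(c[0] for c in codons)
    N = sum(c[1] for c in codons)
    Sd = sum(c[2] for c in codons)
    Nd = sum(c[3] for c in codons)
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


def codon_z_test(codons, replicates=1000, seed=1):
    """Z = (dN - dS) / SE, with SE from bootstrap resampling of codons."""
    rng = random.Random(seed)
    dn, ds = dn_ds(codons)
    diffs = []
    for _ in range(replicates):
        sample = rng.choices(codons, k=len(codons))
        try:
            bn, bs = dn_ds(sample)
        except (ValueError, ZeroDivisionError):
            continue  # degenerate replicate: no sites, or p >= 0.75
        diffs.append(bn - bs)
    if len(diffs) < 2:
        raise ValueError("too few usable bootstrap replicates")
    var = variance(diffs)
    if var == 0:
        raise ValueError("bootstrap variance is zero")
    z = (dn - ds) / sqrt(var)
    p_purifying = NormalDist().cdf(z)  # small value supports dN < dS
    return z, p_purifying, 1 - p_purifying


# Hypothetical per-codon values: 45 codons, each with 1 synonymous and 2 nonsynonymous sites
codons = [(1.0, 2.0, 0.0, 0.0)] * 30 + [(1.0, 2.0, 0.5, 0.0)] * 10 + [(1.0, 2.0, 0.0, 0.5)] * 5
z, p_pur, p_pos = codon_z_test(codons)
```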
Preprint
Trachoma, caused by Chlamydia trachomatis (Ct), remains a leading cause of preventable infection-induced blindness worldwide. We conducted a four-year longitudinal study in three trachoma-endemic villages in Northern Tanzania, tracking infection dynamics and the factors influencing the progression and persistence of trachomatous scarring before and after Mass Drug Administration (MDA) interventions. We analysed 118 whole genomes of Ct originating from ocular swabs of children. Samples were collected at three-month intervals over four years, covering 15 timepoints. We studied Ct phylogeny, patterns of single nucleotide polymorphism (SNP) accumulation in individual isolates, and single nucleotide variation (SNV) in the population, and associated these with clinical signs of trachoma and scarring progression. Seventy-one (60.2%) samples were classified as serovar A (SvA) and 47 (39.8%) as serovar B (SvB) genomes. Initially, SvB dominated among pre-MDA samples (36/40, 90%), but SvA gradually became dominant after the first round of MDA (67/78, 85.9%) (P < 0.0001). Two distinct subsets of SvA were found:
- subset_1 (29 sequences), pre-MDA, aligning with Tanzanian reference strain A/2497;
- subset_2 (42 sequences), post-MDA, with a mutation rate roughly twice that of subset_1 and a 6 kbp genome reduction in the PZ, forming a distinct cluster.

Similarly, 13 SvB sequences exhibited diverse PZ genome reduction (~4 and ~10 kbp), yet all grouped with Tanzanian reference strain B/TZ1A828/OT. Importantly, we observed a shift in the types of Ct serovars after the first round of MDA. A unique SvA subset emerged whose genetic characteristics differ from those circulating before MDA. The observed decrease in the size of the Ct genome suggests that the Tanzanian ocular Ct strains may be streamlining, highlighting ongoing evolution.
Further research is needed to understand the factors driving these changes and their impact on Ct biology and response to azithromycin.
... To detect whether population evolution deviated from neutral evolution, KaKs Calculator 2.0 software (Wang et al., 2010) was used to calculate the ratio of nonsynonymous to synonymous substitution rates (kn/ks). The NG (Nei and Gojobori, 1986), LWL (Li et al., 1985), and YN (Yang and Nielsen, 2000) algorithms were used. ...
... where p is the site of polymorphism, N_i is the nucleotide difference proportion at the ith site, L is the sequence length, k_i is the coverage of the sequence at the ith site, and A_i, C_i, G_i, and T_i are the counts of the four bases at the ith site (Nei and Gojobori, 1986; Kadoya et al., 2022). The nucleotide diversity in nonsynonymous and synonymous coding sites (πN and πS, respectively) was determined separately using the same formula as the nucleotide diversity. ...
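The excerpt does not reproduce the formula itself. As an assumption, one common unbiased form of per-site diversity from read counts is N_i = 1 − Σ x_i(x_i − 1) / (k_i(k_i − 1)), summed over x ∈ {A, C, G, T}. This is the probability that two reads sampled without replacement at site i carry different bases. It is not necessarily the exact expression used by the cited authors. A minimal Python sketch with hypothetical names:

```python
def site_diversity(a, c, g, t):
    """Probability that two reads drawn without replacement at one site differ,
    computed from the read counts of A, C, G and T at that site."""
    k = a + c + g + t  # coverage at this site
    if k < 2:
        raise ValueError("need coverage of at least 2 reads")
    same = a * (a - 1) + c * (c - 1) + g * (g - 1) + t * (t - 1)
    return 1 - same / (k * (k - 1))


def population_diversity(site_counts, length):
    """Average per-site diversity over a sequence of the given length.
    site_counts holds (A, C, G, T) for the polymorphic sites only;
    monomorphic sites contribute zero."""
    return sum(site_diversity(*counts) for counts in site_counts) / length
```

For example, a site with 5 A reads and 5 C reads has diversity 5/9. A site covered only by A reads has diversity 0.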
Article
High genetic diversity in RNA viruses contributes to their rapid adaptation to environmental stresses, including disinfection. Insufficient disinfection can occur because of the emergence of viruses that are less susceptible to disinfection. However, understanding regarding the mechanisms underlying the alteration of viral susceptibility to disinfectants is limited. Here, we performed an experimental adaptation of murine norovirus (MNV) using chlorine to understand the genetic characteristics of virus populations adapted to chlorine disinfection. Several MNV populations exposed to an initial free chlorine concentration of 50 ppm exhibited reduced susceptibility, particularly after the fifth and tenth passages. A dominant mutation identified using whole-genome sequencing did not explain the reduced susceptibility of the MNV populations to chlorine. Conversely, MNV populations with less susceptibility to chlorine, which appeared under higher chlorine stress, were accompanied by significantly lower synonymous nucleotide diversity (πS) in the major capsid protein (VP1). The nonsynonymous nucleotide diversity (πN) in VP1 in the less-susceptible populations was higher than that in the susceptible populations, although the difference was not significant. Therefore, the ability of MNV populations to adapt to chlorine was associated with the change in nucleotide diversity in VP1, which may lead to viral aggregate formation and reduction in chlorine exposure. Moreover, the appearance of some nonsynonymous mutations can also contribute to the alteration in chlorine susceptibility by influencing the efficiency of viral replication. This study highlights the importance of understanding the genetic characteristics of virus populations under disinfection, which can contribute to the development of effective disinfection strategies and prevent the development of virus populations less susceptible to disinfectants.
... The values were plotted in the plastomes using an R program. The paired nucleotide distances of the three genomes were calculated using MEGA v11.0.13 [49] based on the Nei-Gojobori method [50]. ...
Article
Millions of years of isolation have given Madagascar a unique flora that still reflects some of its relationship with the continents of Africa and India. Here, the complete chloroplast sequence of Beilschmiedia moratii, a tropical tree in Madagascar, was determined. The plastome, with a length of 158,410 bp, was 143 bp and 187 bp smaller than those of two closely related species, B. pierreana and Potameia microphylla, in sub-Saharan Africa and Madagascar with published sequences, respectively. A total of 124 repeats and 114 simple sequence repeats (SSRs) were detected in the plastome of B. moratii. Six highly variable regions, including ndhF, ndhF-rpl32, trnC-petN, pebE-petL, rpl32-trnL, and ycf1, among the three African species were identified and 1151 mutation events, including 14 SVs, 351 indels, and 786 substitutions, were accurately located. There were 634 mutation events between B. moratii and P. microphylla with a mean nucleotide variability (π) value of 0.00279, while there were 827 mutation events between B. moratii and B. pierreana with a mean π value of 0.00385. The Ka/Ks ratios of 86 protein-coding genes in the three African species were less than 1, and the mean value between B. moratii and P. microphylla was 0.184, while the mean value between B. moratii and B. pierreana was 0.286. In this study, the plastid genomes of the three African Beilschmiediineae species were compared for the first time and revealed that B. moratii and P. microphylla from Madagascar were relatively conserved, with low mutation rates and slower evolutionary rates.
... Additionally, the ratio between the number of nonsynonymous substitutions per nonsynonymous site (dN) and the number of synonymous substitutions per synonymous site (dS) was analyzed using MEGA version 11 (Tamura et al., 2021). For dN/dS, sequences were evaluated both pairwise and overall, using the codon-based Z-test of Selection with the Nei-Gojobori method (Nei and Gojobori, 1986; Tamura et al., 2021). The new sequences obtained in this study have been submitted to the GenBank database, including the complete mitogenomes for E. chape (OQ935836) and E. onthophagus (OQ935837), as well as the partial mitogenome for D. nigrofasciatus (BK063406) (Table S1). ...
Article
Mitochondrial genomes have provided significant insights into the evolution of several insects. A typical mitogenome contains 37 genes, and variations in gene order can indicate evolutionary relationships between species. In this study, we assembled the first complete mitogenomes of Endecous chape and E. onthophagus and analyzed their phylogenetic implications for the infraorder Gryllidea. We performed DNA extractions and genome sequencing for both Endecous species. We then searched for raw data in the Sequence Read Archive (SRA) at NCBI. Using the SRA data, we assembled the partial mitogenome of Dianemobius nigrofasciatus and annotated the protein-coding genes (PCGs) for nine species. Phylogenomic relationships were reconstructed using Maximum Likelihood (ML) and Bayesian Inference (BI), utilizing the PCGs from 49 Gryllidea species. The mitogenome lengths of E. chape and E. onthophagus are 16,266 bp and 16,023 bp, respectively, while that of D. nigrofasciatus is 15,359 bp. Our results indicate that species within the infraorder exhibit four types of gene order arrangements that align with their phylogenetic relationships. Both phylogenomic trees displayed strong support, and the ML tree agreed with the literature. Gryllidea species have contributed significantly to various fields, and studying their mitogenomes can provide valuable insights into the evolution of this infraorder.
... Analysis of the ratio ω between non-synonymous and synonymous substitution (dN/dS) could provide evidence for positive selection on functional alleles. This was calculated according to the Nei-Gojobori method with the Jukes-Cantor correction in MEGA7 (Jukes and Cantor 1969; Nei and Gojobori 1986; Kumar et al. 2016). Standard errors were estimated with 1000 bootstrap replications. ...
Article
High polymorphism in major histocompatibility complex (MHC) genes plays an essential role in the adaptive immune response of vertebrates through antigen recognition and presentation. For the vulnerable Asiatic black bear, a lack of DQB gene sequences from continental populations has hindered further genetic diversity analysis across a large geographical region. Here, we used PCR cloning and sequencing to characterize the genetic diversity of the DQB gene among different populations of the species. Trans-species polymorphism (TSP) and the selective strength acting on the DQB gene were assessed by sequence analysis in Ursidae. Forty-seven novel Urth-MHC haplotypes were identified in the population, including 32 putative functional alleles (PFA, Urth-DQB*33–Urth-DQB*64) and one presumed pseudogene (Urth-DQB*65). Allelic frequency varied greatly (Urth-DQB*4601 had the highest value) and the proportion of rare alleles was high (34.04%). This might suggest a risk of allele loss through inbreeding and genetic drift if the effective populations continue to be subdivided and decline without appropriate conservation strategies. In the southern continental population (captive animals), the total number of alleles and the number of population-specific alleles were higher than in the northern population. This suggests that the southern continental population was exposed to various pathogens and that close conservation attention is required to keep the population safe. Based on values of Hd, π, and K, genetic diversity of the island population was lower than that of continental populations. This could be explained by fewer pathogen communities in island populations. It also confirms the prediction that the large majority of island populations would be less genetically diverse than their continental counterparts. No allele was shared between the continental and island populations, including the ancestral alleles that are similar among Ursidae species.
In phylogenetic analysis, DQB alleles were not monophyletic for any single species, and four alleles were shared among Ursidae species, a pattern consistent with TSP. The ratio ω = dN/dS was significantly higher than unity at PBR codons (4.029). These features support the influence of balancing selection on the DQB locus among continental populations, contributing to the genetic diversity of Urth-DQB. All codons under positive selection, as detected by four testing methods, matched the PBR sites inferred from HLA-DQB. Pathogen-driven positive selection could be another important mechanism maintaining advantageous mutations in DQB alleles. This information will not only improve understanding of MHC diversity and polymorphism in the Asiatic black bear. It will also support the protection of this vulnerable species, in the wild and in captivity, through appropriate management and conservation initiatives.
... pN/pS. We used the genetic distance-based method by Nei and Gojobori (Nei, M., and T. Gojobori. 1986) to calculate pN/pS, the ratio of nonsynonymous to synonymous polymorphism rates, within each sample. Other than the rare cases of mixed infections (detected by Freyja above), all sequences within a sample share a common ancestor at the time of infection (Lythgoe et al., 2021). Over this relatively short evolutionary time, we considered ...
Preprint
Infectious disease transmission to different host species makes eradication very challenging and expands the diversity of evolutionary trajectories taken by the pathogen. Since the beginning of the ongoing COVID-19 pandemic, SARS-CoV-2 has been transmitted from humans to many different animal species, and viral variants of concern could potentially evolve in a non-human animal. Previously, using available whole genome consensus sequences of SARS-CoV-2 from four commonly sampled animals (mink, deer, cat, and dog) we inferred similar numbers of transmission events from humans to each animal species but a relatively high number of transmission events from mink back to humans (Naderi et al., 2023). Using a genome-wide association study (GWAS), we identified 26 single nucleotide variants (SNVs) that tend to occur in deer – more than any other animal – suggesting a high rate of viral adaptation to deer. Here we quantify intra-host SARS-CoV-2 across animal species and show that deer harbor more intra-host SNVs (iSNVs) than other animals, providing a larger pool of genetic diversity for natural selection to act upon. Within-host diversity is particularly high in deer lymph nodes compared to nasopharyngeal samples, suggesting tissue-specific differences in viral population sizes or selective pressures. Neither mixed infections involving more than one viral lineage nor large changes in the strength of selection are likely to explain the higher intra-host diversity within deer. Rather, deer are more likely to contain larger viral population sizes, to be infected for longer periods of time, or to be systematically sampled at later stages of infections. Combined with extensive deer-to-deer transmission, the high levels of within-deer viral diversity help explain the apparent rapid adaptation of SARS-CoV-2 to deer.
... Then, dN and dS are calculated considering allele frequencies. Calculations are performed in Python using the Nei-Gojobori method (Nei and Gojobori 1986) with support of gb2seq v0.0.20 (Charité Institute of Virology 2023) for codon annotation. ...
Article
Viral mutations within patients nurture the adaptive potential of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during chronic infections, which are a potential source of variants of concern. However, there is no integrated framework for the evolutionary analysis of intra-patient SARS-CoV-2 serial samples. Herein, we describe Viral Intra-Patient Evolution Reporting and Analysis (VIPERA), a new software that integrates the evaluation of the intra-patient ancestry of SARS-CoV-2 sequences with the analysis of evolutionary trajectories of serial sequences from the same viral infection. We have validated it using positive and negative control datasets and have successfully applied it to a new case, which revealed population dynamics and evidence of adaptive evolution. VIPERA is available under a free software license at https://github.com/PathoGenOmics-Lab/VIPERA.
... Evaluating q(0.05) in different sequence classes yields again a signal of positive selection in the antigenic epitopes, albeit weaker than by the propagator ratio method, and of negative selection in the remainder of the HA sequence (Table 1). The q measure is related to the classical dN/dS method [76,77,78]. Previous theoretical work has highlighted potential biases and inaccuracies in dN/dS estimates [79,80,81,82,83]. ...
Preprint
The seasonal human influenza virus undergoes rapid evolution, leading to significant changes in circulating viral strains from year to year. These changes are typically driven by adaptive mutations, particularly in the antigenic epitopes, the regions of the viral surface protein haemagglutinin targeted by human antibodies. Here we describe a consistent set of methods for data-driven predictive analysis of viral evolution. Our pipeline integrates four types of data: (1) sequence data of viral isolates collected on a worldwide scale, (2) epidemiological data on incidences, (3) antigenic characterization of circulating viruses, and (4) intrinsic viral phenotypes. From the combined analysis of these data, we obtain estimates of relative fitness for circulating strains and predictions of clade frequencies for periods of up to one year. Furthermore, we obtain comparative estimates of protection against future viral populations for candidate vaccine strains, providing a basis for pre-emptive vaccine strain selection. Continuously updated predictions obtained from the prediction pipeline for influenza and SARS-CoV-2 are available on the website https://previr.app.
... [45]. We estimated the nonsynonymous (dN) and synonymous (dS) substitution rates and their ratio (dN/dS, termed omega, ω) according to the Nei and Gojobori method [46]. Window size and step size were set to 30 and 6 codons, respectively, to determine the variation in selection pressure [47] along the ADCY1 gene sequence for two classes of bat species, namely CF bats and FM bats. ...
Article
Background The majority of bat species have developed remarkable echolocation ability, especially the laryngeally echolocating bats, along with high-frequency hearing. Adaptive evolution has been widely detected in the cochleae of laryngeally echolocating bats; however, little is known about the brain, which is central to echolocation signal processing in the auditory perception system and may also have undergone adaptive changes in these bats. Result In order to uncover the molecular adaptations related to high-frequency hearing in the brain of laryngeally echolocating bats, the genes expressed in the brains of Rhinolophus ferrumequinum (CF bat) and Myotis pilosus (FM bat) were detected and compared. A total of 346,891 genes were detected; the largest number were annotated to signal transduction mechanisms, followed by transcription. In addition, 3,088 DEGs were found between the two bat brains, with 1,426 highly expressed in the brain of R. ferrumequinum, which were significantly enriched in neuron and neurodevelopmental processes. Moreover, we found a key candidate hearing gene, ADCY1, that plays an important role in the R. ferrumequinum brain and has undergone adaptive evolution in CF bats. Conclusions Our study provides new insight into the molecular bases of high-frequency hearing in the brains of two laryngeally echolocating bats and reveals different nervous system activities during auditory perception in the brain of CF bats.
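The sliding-window scan in the ADCY1 excerpt above (windows of 30 codons moved in steps of 6) can be sketched as follows. This is a hedged illustration, not the software the study used: it assumes per-codon NG86 quantities (synonymous and nonsynonymous sites and differences) have already been computed, reports uncorrected pN/pS for each full window, and returns None where a window has no synonymous differences. The function names are hypothetical.

```python
def sliding_windows(n_codons, size=30, step=6):
    """(start, end) codon indices of each full window; end is exclusive."""
    return [(start, start + size) for start in range(0, n_codons - size + 1, step)]


def window_omega(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs, size=30, step=6):
    """Uncorrected pN/pS per window from per-codon NG86 counts.

    Each argument is a list with one value per codon. The ratio is None
    where the window has no synonymous differences (pS = 0).
    """
    results = []
    for start, end in sliding_windows(len(syn_sites), size, step):
        S = sum(syn_sites[start:end])
        N = sum(nonsyn_sites[start:end])
        ps = sum(syn_diffs[start:end]) / S if S else 0.0
        pn = sum(nonsyn_diffs[start:end]) / N if N else 0.0
        results.append((start, end, pn / ps if ps else None))
    return results
```

Plotting the third element of each tuple against the window start gives a profile of selection pressure along the gene; short windows often contain few synonymous changes, so many windows may return None or very noisy ratios.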
... The dN/dS ratio can indicate neutral evolution (dN/dS=1), positive selection (dN/dS>1), and purifying selection (dN/dS<1). We used the Nei-Gojobori test, with the Jukes-Cantor correction in MEGA 11 to compute the dN/dS ratio [52][53][54]. Z-tests of selection were performed with 1000 bootstrap replications to compute the variance of the difference. A positive value signifies an excess of non-synonymous substitutions. ...
Article
Trachoma, a neglected tropical disease caused by Chlamydia trachomatis (Ct) serovars A-C, is the leading infectious cause of blindness worldwide. Africa bears the highest burden, accounting for over 86 % of global trachoma cases. We investigated Ct serovar A (SvA) and B (SvB) whole genome sequences prior to the introduction of mass antibiotic drug administration in The Gambia. Here, we explore the factors contributing to Ct strain diversification and the implications for Ct evolution within the context of ocular infection. A cohort study in 2002-2003 collected ocular swabs across nine Gambian villages during a 6-month follow-up study. To explore the genetic diversity of Ct within and between individuals, we conducted whole-genome sequencing (WGS) on a limited number (n=43) of Ct-positive samples with an omcB load ≥10 from four villages. WGS was performed using target enrichment with SureSelect and Illumina paired-end sequencing. Out of 43 WGS samples, 41 provided sufficient quality for further analysis. ompA analysis revealed that 11 samples had the highest identity to ompA from strain A/HAR13 (NC_007429) and 30 had the highest identity to ompA from strain B/Jali20 (NC_012686). While SvB genome sequences formed two distinct village-driven subclades, the heterogeneity of SvA sequences led to the formation of many individual branches within the Gambian SvA subclade. Comparing the Gambian SvA and SvB sequences with their reference strains, Ct A/HAR13 and Ct B/Jali20, indicated a single nucleotide polymorphism accumulation rate of 2.4×10⁻⁵ per site per year for the Gambian SvA and 1.3×10⁻⁵ per site per year for SvB variants (P<0.0001). Variant calling resulted in a total of 1371 single nucleotide variants (SNVs) with a frequency >25 % in SvA sequences, and 438 SNVs in SvB sequences.
Of note, in SvA variants, the highest evolutionary pressure was recorded on genes responsible for host cell modulation and intracellular survival mechanisms, whereas in SvB variants this pressure was mainly on genes essential for DNA replication/repair mechanisms and protein synthesis. A comparison of the sequences between observed separate infection events (4-20 weeks between infections) suggested that the majority of the variations accumulated in genes responsible for host-pathogen interaction such as CTA_0166 (phospholipase D-like protein), CTA_0498 (TarP) and CTA_0948 (deubiquitinase). This comparison of Ct SvA and SvB variants within a trachoma-endemic population focused on their local evolutionary adaptation. We found a different variation accumulation pattern in the Gambian SvA chromosomal genes compared with SvB, hinting at the potential of Ct serovar-specific variation in diversification and evolutionary fitness. These findings may have implications for optimizing trachoma control and prevention strategies.
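The Jukes-Cantor correction and Z-test of selection described in the MEGA 11 excerpt for this trachoma study can be illustrated with a short Python sketch. It assumes uncorrected proportions pN and pS are already available and that Var(dN - dS) is estimated from bootstrap replicates of the difference; MEGA's own implementation details may differ, and the function names are hypothetical. The correction is d = -3/4 ln(1 - 4p/3), and the statistic is Z = (dN - dS) / sqrt(Var(dN - dS)).

```python
import math
import statistics


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from an uncorrected proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def selection_z(dn, ds, boot_diffs):
    """Z statistic for H0: dN = dS.

    boot_diffs holds bootstrap replicates of dN - dS; their sample
    variance is used as Var(dN - dS). Positive Z means an excess of
    nonsynonymous substitutions.
    """
    se = math.sqrt(statistics.variance(boot_diffs))
    return (dn - ds) / se
```

In this scheme, `jukes_cantor(pn)` and `jukes_cantor(ps)` turn NG86 p-distances into dN and dS before they are compared.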
... where pN is the number of non-synonymous substitutions divided by the number of non-synonymous sites and pS is the number of synonymous substitutions divided by the number of synonymous sites [41]. DnaSP software was used to calculate the synonymous and non-synonymous sites. ...
Article
Variability in how individuals respond to pathogens is a hallmark of infectious disease, yet the basis for individual variation in host response is often poorly understood. The titer of infectious virus among individual mosquitoes infected with arboviruses is frequently observed to vary by several orders of magnitude in a single experiment, even when the mosquitoes are highly inbred. To better understand the basis for this titer variation, we sequenced populations of Sindbis virus (SINV) obtained from individual infected Aedes aegypti mosquitoes that, despite being from a highly inbred laboratory colony, differed in their titers of infectious virus by approximately 10,000-fold. We observed genetic differences between these virus populations that indicated the virus present in the midguts of low titer mosquitoes was less fit than that of high titer mosquitoes, possibly due to founder effects that occurred during midgut infection. Furthermore, we found dramatic differences in the specific infectivity or SI (the ratio of infectious units/viral genome equivalents) between these virus populations, with the SI of low titer mosquitoes being up to 10,000-fold lower than that of high titer mosquitoes. Despite having similar amounts of viral genomes, low titer mosquitoes appeared to contain fewer viral particles, suggesting that viral genomes were packaged into virions less efficiently than in high titer mosquitoes. Finally, antibiotic treatment, which has been shown to suppress mosquito antiviral immunity, caused an increase in SI. Our results indicate that the extreme variation that is observed in SINV infectious titer between individual Ae. aegypti mosquitoes is due to both genetic differences between virus populations and to differences in the proportion of genomes that are packaged into infectious particles.
... The sequences of fungal isolates were compared with fungal 18S rRNA sequences in the GenBank database by the BLAST online program at NCBI. The phylogenetic tree was created using the Kimura 2-parameter model and the Neighbour-Joining method in MEGA X software [9,10]. ...
Article
This study aimed to isolate and identify marine fungi from Bai Tu Long Bay and assess their antimicrobial potential. We successfully isolated twenty strains of marine-derived fungi. The crude extracts from these fungi were tested against pathogenic microorganisms. All twenty strains exhibited some degree of growth inhibition against the tested microorganisms. Notably, strains M223, M250, M253, and M256 showed significant antimicrobial activity, with MIC values equal to or lower than the positive control. These results highlight the potential of marine fungi as a rich source of antimicrobial agents, a finding of considerable importance to marine mycology and pharmaceuticals. Further analysis was conducted on four promising isolates. M253 was identified as Hamigera avellanea, while M223, M250, and M256 were found to belong to the Aspergillus genus. These isolates were then placed in a phylogenetic tree constructed with MEGA X software.
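The Kimura two-parameter distance named in the marine-fungi excerpt above treats transitions (A/G and C/T changes) separately from transversions. With P and Q the proportions of compared sites showing transitions and transversions, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The minimal sketch below is not MEGA's code: skipping gapped or ambiguous sites is an assumption, and the function fails when a logarithm argument is not positive, that is, when the sequences are too divergent for the correction.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    P = transitions / compared
    Q = transversions / compared
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

A matrix of these pairwise distances is what a Neighbour-Joining tree is built from.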
... Nucleotide diversity (π) and pairwise differences in nucleotide substitutions between alleles (NPD) within each sheep breed were calculated using Arlequin 3.5. The mean number of nonsynonymous (dN), and synonymous (dS) nucleotide substitutions per site calculated as an average over all sequence pairs were estimated within each group using the modified Nei-Gojobori model [29] and Jukes-Cantor's formula implemented in the software MEGA X [30]. ...
Article
Introduction The Ovar-DRB1 gene, a crucial element of the Major Histocompatibility Complex (MHC) Class II region, initiates adaptive immunity by presenting antigens to T-cells. Genetic diversity in sheep, particularly in MHC Class II genes like Ovar-DRB1, directly influences the spectrum of presented antigens, impacting immune responses and disease susceptibility. Understanding the allelic diversity of the Ovar-DRB1 gene in Sudan Desert Sheep (SDS) is essential for uncovering the genetic basis of immune responses and disease resistance, given the breeds' significance in Sudan's unique environment. Methods Utilizing targeted next-generation sequencing (NGS), we explore allelic diversity in the Ovar-DRB1 gene within SDS. Successfully amplifying and sequencing the second exon of this gene in 288 SDS samples representing six breeds provided a comprehensive allelic profile, enabling a detailed examination of the gene's genetic makeup. Results We identified forty-six alleles, including four previously unreported, adding to the known genetic diversity of SDS breeds. These alleles exhibit a non-uniform distribution, with varying frequencies across breeds, indicating a breed-specific genetic landscape. Certain alleles, both known and novel, show higher frequencies in specific populations, suggesting potential associations with adaptive immune responses. Identifying these alleles sets the stage for investigating their functional roles and implications for disease resistance. Genetic differentiation among SDS breeds, as indicated by FST values and clustering analyses, highlights a unique genetic makeup shaped by geographic and historical factors. These differentiation patterns among SDS breeds have broader implications for breed conservation and targeted breeding to enhance disease resistance in specific populations.
Conclusion This study unveils Ovar-DRB1 gene allelic diversity in SDS breeds through targeted NGS and genetic analyses, revealing new alleles that underscore the breeds’ unique genetic profile. It yields insights into the genetic factors governing immune responses and disease resistance, which are promising for optimizing breeding strategies for enhanced livestock health in Sudan’s unique environment.
... To evaluate and compare the selection pressure involved in the evolution of pestivirus genomes [16,17], two major species, including BVDV-1 and CSFV (Table S3), ...
Article
Border disease virus (BDV), a member of the Pestivirus genus within the Flaviviridae family, is known to inflict significant economic losses on livestock farms due to its association with reproductive disorders and persistent infections in sheep and goats. However, comprehensive epidemiological investigations of BDV in China are scarce. This study examined BDV infection in sheep from Hulunbuir, Inner Mongolia, northeastern China, utilizing metagenomic sequencing and polymerase chain reaction (PCR) assay. Among the 96 serum samples analyzed, only one tested positive for BDV nucleotide sequence, yielding a prevalence rate of 1.0%. An 11,985-nt genome sequence was amplified, which showed nucleotide identities ranging from 76.6% to 87.2% and amino acid identities ranging from 85.2% to 93.2% with other BDV strains worldwide. Phylogenetic analysis unequivocally placed the viral strain within genotype BDV-3, showing a close genetic affinity with strain JSLS12-01 identified in Jiangsu province, China. Furthermore, selection pressure analyses suggested that purifying selection predominantly influenced the evolutionary dynamics of BDV genomes. This study marks the first detection of BDV in sheep within Inner Mongolia, northeastern China, thereby enhancing our understanding of the extensive genetic diversity and geographical distribution of BDV strains across the country. These findings hold relevance for the livestock industry and disease surveillance efforts, offering valuable insights into the prevalence and genetic characteristics of BDV in this region.
... The genetic variability in each of the L1, E2, E6, E7 genes and the LCR of study participants was estimated by aligning obtained sequences with L1, E2, E6, E7 open-reading frames and the LCR sequence of the prototype HPV35H, respectively, using the MEGA 7.1 software (http://www.megasoftware.net). To evaluate whether and to what extent the genetic variability correlates with the selective pressure into key genes of HPV35 from study participants, synonymous (dS) and nonsynonymous (dN) nucleotide differences between each pair of L1, E2, E6, E7, and LCR sequences from study participants and those from the HPV35H prototype, corresponding to the genetic distance (d) between paired sequences, were determined using the Nei-Gojobori model [48]. Thus, the nonsynonymous/synonymous rate ratios (ω = dN/dS), giving insight into the selective pressure in the encoded protein, were further calculated for each amino acid sequence deriving from L1, E2, E6, and E7 genes. ...
Article
Human Papillomavirus (HPV)-35 accounts for up to 10% of cervical cancers in Sub-Saharan Africa. We herein assessed the genetic diversity of HPV35 in HIV-negative women from Chad (identified as #CHAD) and HIV-infected men having sex with men (MSM) in the Central African Republic (CAR), identified as #CAR. Ten HPV35 DNA from self-collected genital secretions (n = 5) and anal margin samples (n = 5) obtained from women and MSM, respectively, were sequenced using the ABI PRISM ® BigDye Sequencing technology. All but one HPV35 strain belonged to the A2 sublineage, and only #CAR5 belonged to A1. HPV35 from #CAR had higher L1 variability compared to #CHAD (mean number of mutations: 16 versus 6). L1 of #CAR5 showed a significant variability (2.29%), suggesting a possible intra-type divergence from HPV35H. Three (BC, DE, and EF) out of the 5 capsid loops domains remained totally conserved, while FG- and HI- loops of #CAR exhibited amino acid variations. #CAR5 also showed the highest LCR variability with a 16bp insertion at binding sites of the YY1. HPV35 from #CHAD exhibited the highest variability in E2 gene ( P <0.05). E6 and E7 oncoproteins remained well conserved. There is a relative maintenance of a well-conserved HPV35 A2 sublineage within heterosexual women in Chad and MSM with HIV in the Central African Republic.
... To determine the evolutionary dynamics of duplicated genes, the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates between SbNAC paralogs was calculated using the KaKs Calculator software with the method described by Nei and Gojobori [33]. Ka/Ks = 1 indicates neutral evolution, Ka/Ks < 1 indicates purifying selection, and a gene pair experiences positive selection if Ka/Ks > 1. ...
Article
Background Sorghum (Sorghum bicolor) is an important cereal crop grown worldwide because of its multiple uses as food, forage, and bioenergy feedstock and its wide range of adaptation, even in marginal environments. Greenbug can cause severe damage to sorghum plants and yield loss. Plant NAC transcription factors (TFs) have been reported to have diverse functions in plant development and plant defense but have not yet been studied in sorghum. Methods and results In this study, a comprehensive analysis of the sorghum NAC (SbNAC) gene family was conducted through genome-wide analysis. A total of 112 NAC genes have been identified in the sorghum genome. These SbNAC genes are phylogenetically clustered into 15 distinct subfamilies and are unevenly distributed, in clusters at the telomeric ends of each chromosome. Twelve pairs of SbNAC genes are possibly involved in segmental duplication among nine chromosomes, all except chromosome 10. Structure analysis showed diverse structures with a highly variable number of exons in the SbNAC genes. Furthermore, most of the SbNAC genes showed specific temporal and spatial expression patterns according to the results of RNA-seq analysis, suggesting their diverse functions during sorghum growth and development. We have also identified nine greenbug-inducible SbNAC genes by comparing the expression profiles of two sorghum genotypes (susceptible BTx623 and resistant PI607900) in response to greenbug infestation. Conclusions Our systematic analysis of the NAC gene expression profiles provides both a preliminary survey of their roles in plant defense against insect pests and a useful reference for in-depth characterization of the SbNAC genes and the regulatory network that contributes to genetic resistance to aphids.
... To identify the sites subjected to selection in the icaA gene, DnaSP 5.0 was used to calculate the parameter (ω) for functional alleles by estimating the dN/dS of the icaA gene (Librado, 2009). Furthermore, MEGA7 software was used to apply the Nei-Gojobori method (Nei and Gojobori, 1986) with the Jukes and Cantor correction to the codons. Standard error estimates were derived from 1000 bootstrap replicates. ...
Article
Background Biofilm production by Staphylococcus aureus is a prevailing cause of multidrug resistance. The evolutionary mechanisms of adaptation to the host and of pathogenicity are poorly understood. Aims The present study aimed to investigate the biofilm-forming potential, associated multidrug resistance, and the evolutionary analysis of S. aureus isolated from bovine subclinical mastitis. Methods A total of 122 S. aureus isolates were subjected to the Congo red agar (CRA) method, the microtitre plate (MTP) method, and PCR to check their biofilm-forming potential. The Kirby-Bauer disk diffusion method was used to evaluate the antibiotic resistance pattern. The icaA gene of isolates was subjected to molecular and evolutionary analysis using different bioinformatics tools. Results The results showed that 63.93% of S. aureus isolates carried the icaA gene and the detection rate of CRA was higher (36.07%) than that of the MTP test (24.59%). A total of 78.21% and 56.41% of biofilm-positive isolates were methicillin-resistant S. aureus (MRSA) and vancomycin-resistant S. aureus (VRSA), respectively. All S. aureus isolates (100%) showed multidrug resistance. The molecular analysis showed an evolutionary link between isolates and revealed a strong codon bias, three different recombination events, and positive selection in some residues of the semi-conserved segments of the icaA gene. Conclusion The study concluded that biofilm-positive isolates have a high tendency to exhibit methicillin, vancomycin, and multidrug resistance. The findings suggest that mutation and selection are the most likely causes of codon bias in the icaA gene sequences. The variations caused by recombination events and positive selection are suggestive of a bacterial strategy to combat antimicrobial effects and to escape the host’s immune surveillance.
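The bootstrap standard errors mentioned in the icaA excerpt above (1000 replicates) can be illustrated with a minimal sketch that resamples codons with replacement and recomputes a statistic, here pN - pS, for each replicate. Resampling over codons, the per-codon tuple layout, the fixed random seed, and both function names are assumptions of this example, not details taken from the study or from MEGA.

```python
import random
import statistics


def bootstrap_se(codon_counts, statistic, replicates=1000, seed=1):
    """Bootstrap standard error of `statistic`, resampling codons with replacement.

    codon_counts: list of per-codon tuples
        (syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs)
    statistic: function mapping such a list to a single number
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    values = [
        statistic(rng.choices(codon_counts, k=len(codon_counts)))
        for _ in range(replicates)
    ]
    return statistics.stdev(values)


def pn_minus_ps(codon_counts):
    """Uncorrected pN - pS from summed per-codon counts."""
    S = sum(c[0] for c in codon_counts)
    N = sum(c[1] for c in codon_counts)
    Sd = sum(c[2] for c in codon_counts)
    Nd = sum(c[3] for c in codon_counts)
    return Nd / N - Sd / S
```

The replicate values themselves can also serve as the bootstrap sample for Var(dN - dS) in a codon-based Z-test. A replicate that happens to draw only codons without synonymous sites would divide by zero, which is unlikely for real genes but worth guarding against for very short sequences.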
... It has also been shown that gBGC can alter our ability to detect selection in protein-coding genes, mainly through incorrect interpretation of the dN/dS ratio (Berglund et al., 2009; Galtier et al., 2009; Ratnakumar et al., 2010; Bolívar et al., 2016). Originally, the dN/dS ratio was designed to quantify the selective constraints exerted on the sequence of a protein (Miyata and Yasunaga, 1980; Nei and Gojobori, 1986). It ...
Preprint
It is commonly thought that the long-term advantage of meiotic recombination is to dissipate genetic linkage, allowing natural selection to act independently on different loci. It is thus theoretically expected that genes with higher recombination rates evolve under more effective selection, and can adapt more easily to environmental change. On the other hand, recombination is often associated with GC-biased gene conversion (gBGC), which theoretically interferes with selection by promoting the fixation of deleterious GC alleles. To test these predictions, several empirical studies assessed whether selection was more effective in highly recombining genes (due to dissipation of genetic linkage) or less effective (due to gBGC), implicitly assuming a fixed distribution of fitness effects (DFE) for all genes. In this study, I directly derive the expected shape of the DFE from the evolutionary history of a gene (shaped by mutation, selection, drift and gBGC) under empirical fitness landscapes. I show that genes that have experienced high levels of gBGC are less fit and thus have more opportunities for beneficial mutations. Only a slight decrease in the genome-wide intensity of gBGC leads to the fixation of these beneficial mutations, particularly in highly recombining genes. This shows that increased positive selection in highly recombining genes is not by itself evidence for more effective selection due to the dissipation of genetic linkage. Additionally, I show that the death of a long-lived recombination hotspot can lead to a higher dN/dS than its birth, but with substitution patterns biased towards AT, and only at selected positions. Controlling for a substitution bias towards GC is therefore not sufficient to rule out the contribution of gBGC to signatures of accelerated evolution.
... Polymorphism metrics, i.e. nucleotide diversity (calculated as in [51]), the number of non-synonymous over the number of synonymous substitutions pN/pS [52], and the Direction of Selection (DoS) [53] were calculated using custom python scripts. DoS measures the direction and extent of selection by comparing the number of non-silent (pN) and silent polymorphisms (pS) to the number of non-silent (dN) and silent (dS) substitutions per locus, avoiding statistical bias due to low sampling sizes sometimes observed with other selection metrics such as the McDonald and Kreitman test or the Neutrality Index [53]. ...
Article
The extent of intraspecific genomic variation is key to understanding species evolutionary history, including recent adaptive shifts. Intraspecific genomic variation remains poorly explored in eukaryotic micro-organisms, especially in the nuclear-dimorphic ciliates, despite their fundamental role as laboratory model systems and their ecological importance in many ecosystems. We sequenced the macronuclear genome of 22 laboratory strains of the oligohymenophoran Tetrahymena thermophila , a model species in both cellular biology and evolutionary ecology. We explored polymorphisms at the junctions of programmed eliminated sequences and revealed their utility for barcoding very closely related cells. As for other species of the genus Tetrahymena , we confirm micronuclear centromeres as gene diversification centres in T. thermophila , but also reveal a two-speed evolution in these regions. In the rest of the genome, we highlight recent diversification of genes coding for extracellular proteins and cell adhesion. We discuss all these findings in relation to this ciliate’s ecology and cellular characteristics.
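The Direction of Selection (DoS) statistic described in the Tetrahymena excerpt above compares the nonsynonymous fraction of fixed differences with the nonsynonymous fraction of polymorphisms: DoS = dN/(dN + dS) - pN/(pN + pS), using counts rather than per-site rates. Positive values are consistent with adaptive evolution, negative values with slightly deleterious polymorphisms, and zero with neutrality. The sketch below takes raw counts; returning None when either denominator is zero is an assumption of this example, and the function name is hypothetical.

```python
def direction_of_selection(dn, ds, pn, ps):
    """DoS = dn/(dn + ds) - pn/(pn + ps).

    dn, ds: counts of nonsynonymous and synonymous fixed differences (substitutions)
    pn, ps: counts of nonsynonymous and synonymous polymorphisms
    Returns None when divergence or polymorphism counts are both zero.
    """
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)
```

Because each term is a proportion bounded between 0 and 1, DoS stays between -1 and 1 and remains defined when one of the four counts is zero, which is one reason it is preferred over ratio-based statistics for small samples.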
... Both scripts, and the bash commands for running the codon-aware alignments, are available in v1.1.0 of this repository: https://github.com/gavinmdouglas/handy_pop_gen. The latter script identifies potential non-synonymous and synonymous mutation sites between each sequence pair using the NG86 approach 45 . We computed the mean values across all pairwise strain comparisons, resulting in a single measure of dN/dS and dS per species. ...
Article
A long-standing question is to what degree genetic drift and selection drive the divergence in rare accessory gene content between closely related bacteria. Rare genes, including singletons, make up a large proportion of pangenomes (all genes in a set of genomes), but it remains unclear how many such genes are adaptive, deleterious or neutral to their host genome. Estimates of species’ effective population sizes (Ne) are positively associated with pangenome size and fluidity, which has independently been interpreted as evidence for both neutral and adaptive pangenome models. We hypothesized that pseudogenes, used as a neutral reference, could be used to distinguish these models. We find that most functional categories are depleted for rare pseudogenes when a genome encodes only a single intact copy of a gene family. In contrast, transposons are enriched in pseudogenes, suggesting they are mostly neutral or deleterious to the host genome. Thus, even if individual rare accessory genes vary in their effects on host fitness, we can confidently reject a model of entirely neutral or deleterious rare genes. We also define the ratio of singleton intact genes to singleton pseudogenes (si/sp) within a pangenome, compare this measure across 668 prokaryotic species and detect a signal consistent with the adaptive value of many rare accessory genes. Taken together, our work demonstrates that comparing with pseudogenes can improve inferences of the evolutionary forces driving pangenome variation.
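The species-level summary in the pangenome excerpt above (mean values across all pairwise strain comparisons) can be sketched generically. Here `pairwise` stands for any function returning (dN, dS) for two aligned sequences, such as an NG86 implementation; it and `mean_pairwise` are hypothetical names. The mean of per-pair dN/dS ratios is not the same as the ratio of mean dN to mean dS, and pairs with dS = 0 have to be dropped from the former; which summary the cited study used beyond "mean values" is not stated here.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise(seqs, pairwise):
    """Mean dN, mean dS, and mean dN/dS over every unordered pair of sequences.

    pairwise: function (seq_a, seq_b) -> (dn, ds).
    Pairs with dS == 0 are excluded from the ratio average only.
    """
    dns, dss, ratios = [], [], []
    for a, b in combinations(seqs, 2):
        dn, ds = pairwise(a, b)
        dns.append(dn)
        dss.append(ds)
        if ds > 0:
            ratios.append(dn / ds)
    return mean(dns), mean(dss), (mean(ratios) if ratios else None)
```

With n strains this evaluates n(n - 1)/2 pairs, so for large species sets the pairwise function dominates the run time.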
... GC%, GC3%, dS, dN, and RSCU were determined using MEGA7 and AA content by the ProtrWeb server (Xiao et al. 2015). The Nei-Gojobori method (Nei & Gojobori 1986) was employed to calculate dS and dN. PCA was performed using the prcomp function, a default package of R. To account for the phylogenetic relationship between the sequences for PCA, phylogenetic principal component analysis (pPCA) was implemented using the ppca function of adephylo package in R (with the following parameters: method = "sumDD," center = FALSE, scale = FALSE, scannf = FALSE) (Jombart et al. 2010). ...
Article
The Praja family of E3 ubiquitin ligases promotes polyubiquitination and subsequent degradation of substrates. It comprises two paralogs, praja1 and praja2. Prior research suggests these paralogs have undergone functional divergence, with examples, such as their distinct roles in neurite outgrowth. However, the specific evolutionary trajectories of each paralog remain largely unexplored, preventing a mechanistic understanding of functional differences between paralogs. Here, we investigated the phylogeny and divergence of the vertebrate Praja family through molecular evolutionary analysis. Phylogenetic examination of the vertebrate praja revealed that praja1 and praja2 originated from the common ancestor of placentals via gene duplication, with praja1 evolving at twice the rate of praja2 shortly after the duplication. Moreover, a unique evolutionary trajectory for praja1 relative to other vertebrate Praja was indicated, as evidenced by principal component analysis on GC content, codon usage frequency, and amino acid composition. Subsequent motif/domain comparison revealed conserved N terminus and C terminus in praja1 and praja2, together with praja1-specific motifs, including a nuclear localization signal and Ala–Gly–Ser repeats. The nuclear localization signal was demonstrated to be functional in human neuroblastoma SH-SY5Y cells using a deletion mutant, while praja2 was exclusively expressed in the nucleus. These discoveries contribute to a more comprehensive understanding of the Praja family’s phylogeny and suggest a functional divergence between praja1 and praja2. Specifically, the shift of praja1 into the nucleus implies the degradation of novel substrates located in the nucleus as an evolutionary consequence.
Article
It is commonly thought that the long-term advantage of meiotic recombination is to dissipate genetic linkage, allowing natural selection to act independently on different loci. It is thus theoretically expected that genes with higher recombination rates evolve under more effective selection. On the other hand, recombination is often associated with GC-biased gene conversion (gBGC), which theoretically interferes with selection by promoting the fixation of deleterious GC alleles. To test these predictions, several studies assessed whether selection was more effective in highly recombining genes (due to dissipation of genetic linkage) or less effective (due to gBGC), assuming a fixed distribution of fitness effects (DFE) for all genes. In this study, I directly derive the DFE from a gene's evolutionary history (shaped by mutation, selection, drift and gBGC) under empirical fitness landscapes. I show that genes that have experienced high levels of gBGC are less fit and thus have more opportunities for beneficial mutations. Only a small decrease in the genome-wide intensity of gBGC leads to the fixation of these beneficial mutations, particularly in highly recombining genes. This results in increased positive selection in highly recombining genes that is not caused by more effective selection. Additionally, I show that the death of a recombination hotspot can lead to a higher dN/dS than its birth, but with substitution patterns biased towards AT, and only at selected positions. Controlling for a substitution bias towards GC is therefore not sufficient to rule out the contribution of gBGC to signatures of accelerated evolution. Finally, although gBGC does not affect the fixation probability of GC-conservative mutations, I show that by altering the DFE, gBGC can also significantly affect non-synonymous GC-conservative substitution patterns.
Preprint
Hemizygous genes are present on only one of the two homologous chromosomes of diploid organisms. They are best known for their prevalence and vital roles on sex chromosomes. However, hemizygous genes in the genomes of diploid plants remain largely unexplored. In this study, we investigated the features and the genetic, cis-regulatory, and epigenetic regulation of hemizygous genes in seven crops. These crops represent three clonal lineages, one outcrossing species, and three putatively homozygous (selfed or doubled haploid) genomes. By remapping long reads to the primary genome assembly, we identified structural variants that included annotated genes. We found 3,399–5,610 hemizygous genes (10.1%–15.1%) in the three clonal plants. As expected, very few genes (0.003%–0.007%) were hemizygous in the three homozygous genomes, which served as negative controls. The genome of the outcrossing species was intermediate between the two extremes. Hemizygous genes had a more recent origin and experienced stronger selection pressure than diploid genes. We also found reduced expression of hemizygous genes compared to diploid genes, with ~20% expression levels on average, which violates the evolutionary model of dosage compensation. Furthermore, we detected higher average DNA methylation levels in hemizygous genes and transposable elements, which may contribute to the reduced expression of hemizygous genes. Finally, expression profiles showed that hemizygous genes had more tissue- and treatment-specific expression than diploid genes during fruit development, organ differentiation, and responses to abiotic and biotic stresses. Overall, hemizygous genes displayed distinct genomic, genetic, and epigenomic features compared to diploid genes, providing new insights for the genetics and breeding of crops with heterozygous genomes.
Article
The seasonal human influenza virus undergoes rapid evolution, leading to significant changes in circulating viral strains from year to year. These changes are typically driven by adaptive mutations, particularly in the antigenic epitopes, the regions of the viral surface protein haemagglutinin targeted by human antibodies. Here we describe a consistent set of methods for data-driven predictive analysis of viral evolution. Our pipeline integrates four types of data: (1) sequence data of viral isolates collected on a worldwide scale, (2) epidemiological data on incidences, (3) antigenic characterization of circulating viruses, and (4) intrinsic viral phenotypes. From the combined analysis of these data, we obtain estimates of relative fitness for circulating strains and predictions of clade frequencies for periods of up to one year. Furthermore, we obtain comparative estimates of protection against future viral populations for candidate vaccine strains, providing a basis for pre-emptive vaccine strain selection. Continuously updated predictions obtained from the prediction pipeline for influenza and SARS-CoV-2 are available on the website https://previr.app.
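One common way to turn relative fitness estimates into clade frequency forecasts is to let each clade grow exponentially at its fitness and renormalise. That the pipeline uses exactly this model is an assumption here; the sketch below (hypothetical function name, fitness per unit time, dt in the same time unit) only illustrates the idea:

```python
import math


def project_frequencies(freqs, fitness, dt):
    """Project clade frequencies forward by dt, assuming constant fitness.

    Assumed model (illustrative only): x_i(t + dt) is proportional to
    x_i(t) * exp(f_i * dt), renormalised so the projections sum to 1.
    """
    if len(freqs) != len(fitness):
        raise ValueError("freqs and fitness must have the same length")
    weights = [x * math.exp(f * dt) for x, f in zip(freqs, fitness)]
    total = sum(weights)
    return [w / total for w in weights]
```

In this toy model, two clades starting at 50% each, one with a fitness advantage of ln 3 per year, are projected to 25% and 75% after one year (`project_frequencies([0.5, 0.5], [0.0, math.log(3)], 1.0)`).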
Article
Distyly is an iconic floral polymorphism governed by a supergene, which promotes efficient pollen transfer and outcrossing through reciprocal differences in the position of sexual organs in flowers, often coupled with heteromorphic self-incompatibility (SI). Distyly has evolved convergently in multiple flowering plant lineages, but has also broken down repeatedly, often resulting in homostylous, self-compatible populations with elevated rates of self-fertilization. Here, we aimed to study the genetic causes and genomic consequences of the shift to homostyly in Linum trigynum, which is closely related to distylous Linum tenue. Building on a high-quality genome assembly, we show that L. trigynum harbors a genomic region homologous to the dominant haplotype of the distyly supergene conferring long stamens and short styles in L. tenue, suggesting that loss of distyly first occurred in a short-styled individual. In contrast to homostylous Primula and Fagopyrum, L. trigynum harbors no fixed loss-of-function mutations in coding sequences of S-linked distyly candidate genes. Instead, floral gene expression analyses and controlled crosses suggest that mutations downregulating the S-linked LtWDR-44 candidate gene for male SI and/or anther height could underlie homostyly and self-compatibility (SC) in L. trigynum. Population genomic analyses of 224 whole-genome sequences further demonstrate that L. trigynum is highly self-fertilizing, exhibits significantly lower genetic diversity genome-wide, and is experiencing relaxed purifying selection and less frequent positive selection on nonsynonymous mutations relative to L. tenue. Our analyses shed light on the loss of distyly in L. trigynum, and advance our understanding of a common evolutionary transition in flowering plants.
Article
Nature has devised many ways of producing males and females. Here, we report on a previously undescribed mechanism for Lepidoptera that functions without a female-specific gene. The number of alleles or allele heterozygosity in a single Z-linked gene (BaMasc) is the primary sex-determining switch in Bicyclus anynana butterflies. Embryos carrying a single BaMasc allele develop into WZ (or Z0) females, those carrying two distinct alleles develop into ZZ males, while ZZ homozygotes initiate female development, have mismatched dosage compensation, and die as embryos. Consequently, selection against homozygotes has favored the evolution of spectacular allelic diversity: 205 different coding sequences of BaMasc were detected in a sample of 246 females. The structural similarity of a hypervariable region (HVR) in BaMasc to the HVR in Apis mellifera csd suggests molecular convergence between deeply diverged insect lineages. Our discovery of this primary switch highlights the fascinating diversity of sex-determining mechanisms and underlying evolutionary drivers.
Preprint
We analyzed the nucleotide sequences of the mitochondrial genome (mitogenome) of specimens from two mussel species (Arcuatula senhousia and Mytilus coruscus) using a combination of genomic, molecular phylogenetic, and evolutionary genetic approaches. As in many other invertebrates, the mitogenome structure of the studied mussels appears to be much more variable than in vertebrates, with changes in gene order, duplications, and deletions that were most frequent for tRNA genes; mussel mitogenomes also vary in size. The results confirmed an important property of the encoded polypeptides: their hydrophobicity and its determination by the ratio of purine to pyrimidine nucleotides. This might indirectly indicate that purifying natural selection is needed to maintain polypeptide functionality. However, although the widely accepted concept of natural cutoff (purifying) selection holds that selection acts against deleterious nucleotide substitutions in nonsynonymous codons and maintains functional polypeptides in a population, we were unable to obtain unambiguous evidence in favor of this concept in the current paper. We studied the phylogeny and systematics of mussels in one of the largest taxa of bivalve mollusks, the family Mytilidae. The phylogeny of the family Mytilidae (order Mytilida), whose systematics currently lacks consensus, is reconstructed using data matrices of 26–27 mitogenomes. Initially, a set of 100 GenBank sequences was downloaded and checked for whether they were of female (F) or male (M) origin. Our analysis of the new data confirms the known drastic differences between the F and M mitogenome lineages in mussels.
Phylogenies of the F lineages were reconstructed using several combinations of genetic markers: protein-coding genes (PCGs) only, rRNA + tRNA genes only, and all genes. Additional analyses used other data matrices comprising 20–68 mitogenome sequences. The divergence time from the most recent common ancestor (MRCA) of Mytilidae, estimated with BEAST2, is close to 293 Mya, pointing to the Silurian Period. Based on all these data, a consensus phylogeny and systematics for the subfamily Mytilinae are suggested. In particular, long-standing questions in mussel systematics about whether the family Mytilidae and the subfamily Mytilinae are monophyletic were resolved. The strongly supported topological signal, recovered in this paper and in the literature, refutes the monophyly of the subfamily Mytilinae.
Article
Endogenous retroviruses (ERVs) are remnants of ancestral viruses in the host genome. The present study identified the expression of a defective retroviral env gene belonging to the ERV group V member Env (EnvV) in Felis catus (EnvV-Fca). EnvV-Fca was specifically detected in the syncytiotrophoblast layer of the placenta and was expressed as a secreted protein in cultured cells. Genetic analyses indicated that EnvV2 genes are widely present in vertebrates and are under purifying selection among carnivores, suggesting a potential benefit for the host. This study suggests that birds, bats, and rodents carrying EnvV2 may play significant roles as intermediate vectors in spreading or cross-transmitting viruses among species. Our findings provide valuable insights into the evolution of ERVs in vertebrate hosts.
Article
The regressive evolution of independent lineages often results in convergent phenotypes. Several teleost groups display secondary loss of the stomach, and four gastric genes, atp4a, atp4b, pgc, and pga2, have been co-deleted in agastric (stomachless) fish. Analyses of genotypic convergence among agastric fishes showed that four further genes, slc26a9, kcne2, cldn18a, and vsig1, were co-deleted or pseudogenized in most agastric fishes of the four major groups. kcne2 and vsig1 were also deleted or pseudogenized in the agastric monotremes echidna and platypus, respectively. In the stomachs of sticklebacks, these genes are expressed in gastric gland cells or surface epithelial cells. An ohnolog of cldn18 was retained in some agastric teleosts but exhibited increased non-synonymous substitution compared with gastric species. These results revealed novel convergent gene losses at multiple loci among the four major groups of agastric fish, as well as a single gene loss in the echidna and platypus.
Article
The Fagaceae, a plant family with a wide distribution and diverse adaptability, has garnered significant interest as a subject of study in plant speciation and adaptation. Meanwhile, certain Fagaceae species are regarded as highly valuable wood resources due to the exceptional quality of their wood. In this study, we present two high-quality, chromosome-scale genome sequences for Quercus sichourensis (848.75 Mb) and Quercus rex (883.46 Mb). Comparative genomic analysis reveals that differences among Fagaceae species in the number of plant disease resistance genes and in the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks) of protein-coding genes are related to different environmental adaptations. Interestingly, most genes related to starch synthesis in the investigated Quercoideae species are located on a single chromosome, in contrast to the outgroup species, Fagus sylvatica. Furthermore, resequencing and population analysis of Q. sichourensis and Q. rex reveal that Q. sichourensis has lower genetic diversity and more deleterious mutations than Q. rex. The high-quality, chromosome-level genomes and the population genomic analysis of the critically endangered Q. sichourensis and Q. rex will provide an invaluable resource, as well as insights for future studies of these two species and of the genus Quercus more broadly, to facilitate their conservation.
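Ka/Ks values like those compared above can be estimated with counting methods such as the unweighted-pathway approach of the paper this page is about (commonly called the Nei-Gojobori or "NG" method); whether the Fagaceae study used it is not stated here. The compact sketch below assumes the standard genetic code, a gap-free in-frame alignment without stop codons or ambiguity codes, counts single-base changes to a stop codon as nonsynonymous sites, discards pathways through stop codons, and applies the Jukes-Cantor correction (undefined for p ≥ 0.75). Function names are hypothetical:

```python
import math
from itertools import permutations

# Standard genetic code, codons ordered TCAG at each position.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites in a sense codon (changes to stop count as nonsynonymous)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def syn_diffs(c1, c2):
    """Synonymous and nonsynonymous differences between two sense codons,
    averaged with equal weight over all pathways that avoid stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    sd_total = nd_total = 0.0
    n_paths = 0
    for order in permutations(diff):
        cur, sd, nd = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        else:  # only reached when the pathway was not discarded
            sd_total += sd
            nd_total += nd
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return sd_total / n_paths, nd_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits (defined only for p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (pS, pN, dS, dN) for two aligned, gap-free, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            raise ValueError("stop codons and ambiguous bases are not handled")
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # sites averaged over both codons
        S += s
        N += 3 - s
        sd, nd = syn_diffs(c1, c2)
        Sd += sd
        Nd += nd
    pS, pN = Sd / S, Nd / N
    return pS, pN, jukes_cantor(pS), jukes_cantor(pN)
```

For the toy pair `TTATTTGGG` and `TTGTTTGGG`, the only difference (TTA to TTG, both Leu) is synonymous and the three codons contribute 2 synonymous sites, so pS = 0.5, pN = 0 and dS = 0.75 ln 3 ≈ 0.824; the ratio dN/dS then plays the role of Ka/Ks.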
Article
Chromosomal fusions represent one of the most common types of chromosomal rearrangements found in nature. Yet, their role in shaping the genomic landscape of recombination, and hence genome evolution, remains largely unexplored. Here, we take advantage of wild mouse populations with chromosomal fusions to evaluate the effect of this type of structural variant on genomic landscapes of recombination and divergence. To this aim, we combined cytological analysis of meiotic crossovers (COs) in primary spermatocytes with recombination rates inferred from linkage disequilibrium between single nucleotide polymorphisms. Our results suggest a combined effect of Robertsonian (Rb) fusions and Prdm9 allelic background (Prdm9 being a gene involved in the formation of meiotic double-strand breaks and in postzygotic reproductive isolation) in reshaping genomic landscapes of recombination. We detected a chromosomal redistribution of meiotic recombination towards telomeric regions in metacentric chromosomes in mice with Rb fusions when compared to non-fused mice. This repatterning was accompanied by increased levels of CO interference and reduced levels of estimated recombination rates between populations, together with high levels of genomic divergence. Interestingly, we detected that Prdm9 allelic background was a major determinant of recombination rates at the population level, whereas Rb fusions showed limited effects, restricted to centromeric regions of fused chromosomes. Altogether, our results provide new insights into the effect of Rb fusions and Prdm9 background on meiotic recombination.
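LD-based recombination estimates build on pairwise linkage disequilibrium statistics, of which r² is the most common. A minimal sketch, assuming phased haplotypes at two biallelic SNPs coded 0/1 (hypothetical function name; this is not the estimator used in the study):

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: sequence of (allele_at_snp1, allele_at_snp2) pairs coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom
```

Perfectly associated alleles, as in `[(0, 0), (1, 1), (0, 0), (1, 1)]`, give r² = 1, while all four haplotypes at equal frequency give r² = 0. Under more recombination, LD between loci decays faster, which is what LD-based methods exploit.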
Article
Clonal expansion of antigen‐specific lymphocytes is the fundamental mechanism enabling potent adaptive immune responses and the generation of immune memory. Accompanied by pronounced epigenetic remodeling, the massive proliferation of individual cells generates a critical mass of effectors for the control of acute infections, as well as a pool of memory cells protecting against future pathogen encounters. Although clonal expansion is classically associated with the adaptive immune system, recent work has demonstrated that innate immune memory to human cytomegalovirus (CMV) infection is stably maintained as large clonal expansions of natural killer (NK) cells, raising questions about the mechanisms of clonal selection and expansion in the absence of re‐arranged antigen receptors. Here, we discuss clonal NK cell memory in the context of the mechanisms underlying clonal competition of adaptive lymphocytes and propose alternative selection mechanisms that might determine the clonal success of their innate counterparts. We propose that the integration of external cues with cell‐intrinsic sources of heterogeneity, such as variegated receptor expression, transcriptional states, and somatic variants, forms a bottleneck for clonal selection, contributing to the large size of memory NK cell clones.
Article
Bacteria have developed various defense mechanisms to avoid infection and killing in response to the fast evolution and turnover of viruses and other genetic parasites. This pan-immune system (defensome) encompasses a growing number of defense lines that include well-studied innate and adaptive systems such as restriction-modification, CRISPR-Cas and abortive infection, but also newly found ones whose mechanisms are still poorly understood. While the abundance and distribution of defense systems are well known in complete and culturable genomes, there is a void in our understanding of their diversity and richness in complex microbial communities. Here we performed a large-scale in-depth analysis of the defensomes of 7759 high-quality bacterial population genomes reconstructed from soil, marine, and human gut environments. We observed a wide variation in the frequency and nature of the defensome among large phyla, which correlated with lifestyle, genome size, habitat, and geographic background. The defensome's genetic mobility, its clustering in defense islands, and its genetic variability were found to be system-specific and shaped by the bacterial environment. Hence, our results provide a detailed picture of the multiple immune barriers present in environmentally distinct bacterial communities and set the stage for subsequent identification of novel and ingenious strategies of diversification among uncultivated microbes.
Preprint
The CO2 content of Earth's atmosphere is rapidly increasing due to human consumption of fossil fuels. Models based on short-term culture experiments predict that major changes will occur in marine phytoplankton communities in the future ocean, but these models rarely consider how the evolutionary potential of phytoplankton or interactions within marine microbial communities may influence these changes. Here we experimentally evolved representatives of four phytoplankton functional types (silicifiers, calcifiers, coastal cyanobacteria, and oligotrophic cyanobacteria) in co-culture with a heterotrophic bacterium, Alteromonas, under either present-day or predicted future pCO2 conditions. Growth rates of cyanobacteria generally increased under both conditions, and the growth defects observed in ancestral Prochlorococcus cultures at elevated pCO2 and in axenic culture were diminished after evolution, possibly due to regulatory mutations in antioxidant genes. Except for Prochlorococcus, mutational profiles suggested phytoplankton experienced primarily purifying selection, but most Alteromonas lineages showed evidence of directional selection, especially when co-cultured with eukaryotic phytoplankton, where evolution appeared to favor a broad metabolic switch from growth on small organic acids to catabolism of more complex carbon substrates. Evolved Alteromonas were also poorer "helpers" for Prochlorococcus, supporting the assertion that the interaction between Prochlorococcus and heterotrophic bacteria is not a true mutualism but rather a competitive interaction stabilized by Black Queen processes. This work provides new insights on how phytoplankton will respond to anthropogenic change and on the evolutionary mechanisms governing the structure and function of marine microbial communities.
Preprint
Background: The human lineage has undergone skeletal gracilization compared to other primates and archaic populations such as the Neanderthals. This gracilization has traditionally been explained by differences in the mechanical loads our ancestors experienced. However, there is growing evidence that gracilization could also be genetically determined. Results: We have analyzed the LRP5 gene from an evolutionary and functional point of view, taking advantage of the published genomes of archaic Homo populations. Mutations in LRP5 are involved in high bone mineral density conditions. Our results suggest that this gene has a complex evolutionary history both between archaic and anatomically modern humans and within anatomically modern human populations. In particular, we identified different selective pressures in archaics and anatomically modern humans, as well as evidence of positive selection in the African and South East Asian populations from the 1000 Genomes Project (1000G). Furthermore, we observed limited evidence of archaic introgression in this gene, at haplotypes of East Asian ancestry, consistent with a general removal of archaic introgression due to functional differences between archaics and anatomically modern humans. In agreement with this hypothesis, we observed private mutations in the archaic genomes that we experimentally validated as putatively increasing bone mineral density. In particular, four of five archaic missense mutations affecting the first β-propeller of LRP5 displayed enhanced Wnt pathway activation, two of which also displayed reduced negative regulation. Conclusions: In summary, these data suggest a genetic component contributing to the skeletal differences between anatomically modern humans and archaic Homo populations.
Article
Evidence from a variety of organisms points to convergent evolution on the mitochondria associated with a physiological response to oxygen deprivation or temperature stress, including mechanisms for high-altitude adaptation. Here, we examine whether demography and/or selection explains standing mitogenome nucleotide diversity in high-altitude adapted populations of three Andean waterfowl species: yellow-billed pintail (Anas georgica), speckled teal (Anas flavirostris), and cinnamon teal (Spatula cyanoptera). We compared a total of 60 mitogenomes from these three duck species (n = 20 per species) across low and high altitudes and tested whether part(s) or all of the mitogenome exhibited expected signatures of purifying selection within the high-altitude populations of these species. Historical effective population sizes (Ne) were inferred to be similar between high- and low-altitude populations of each species, suggesting that selection rather than genetic drift best explains the reduced genetic variation found in mitochondrial genes of high-altitude populations compared to low-altitude populations of the same species. Specifically, we provide evidence that the establishment of these three Andean waterfowl species in the high-altitude environment coincided at least in part with a persistent pattern of purifying (negative) selection acting on the oxidative phosphorylation (OXPHOS) function of the mitochondria. Our results further reveal that the extent of gene-specific purifying selection has been greatest in the speckled teal, the species with the longest history of high-altitude occupancy.
Article
We are launching a series to celebrate the 40th anniversary of the first issue of Molecular Biology and Evolution. In 2024, we will publish virtual issues containing selected papers published in the Society for Molecular Biology and Evolution journals, Molecular Biology and Evolution and Genome Biology and Evolution. Each virtual issue will be accompanied by a perspective that highlights the historic and contemporary contributions of our journals to a specific topic in molecular evolution. This perspective, the first in the series, presents an account of the broad array of methods that have been published in the Society for Molecular Biology and Evolution journals, including methods to infer phylogenies, to test hypotheses in a phylogenetic framework, and to infer population genetic processes. We also mention many of the software implementations that make methods tractable for empiricists. In short, the Society for Molecular Biology and Evolution community has much to celebrate after four decades of publishing high-quality science including numerous important inferential methods.
Article
A model of evolutionary base substitutions that can incorporate different substitutional rates between the four bases and that takes into account unequal composition of bases in DNA sequences is proposed. Using this model, we derived formulae that enable us to estimate the evolutionary distances in terms of the number of nucleotide substitutions through comparative studies of nucleotide sequences. In order to check the validity of the various formulae, Monte Carlo experiments were performed. These formulae were applied to analyze data on DNA sequences from diverse organisms. Particular attention was paid to problems concerning a globin pseudogene in the mouse and the time of its origin through duplication. We obtained a result suggesting that the evolutionary rates of substitution in the first and second codon positions of the pseudogene were roughly 10 times faster than those in the normal globin genes, whereas the rate in the third position remained almost unchanged. Application of our formulae to histone genes H2B and H3 of the sea urchin showed that, in each of these genes, the rate in the third codon position is tremendously higher than that in the second position. All of these observations can easily and consistently be interpreted by the neutral theory of molecular evolution.
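The model above is more general than what follows. As a simpler, well-known special case (two substitution rates, transitions versus transversions, and equal base composition), Kimura's two-parameter distance shows how a corrected number of substitutions per site is obtained from observed differences. This is a swapped-in illustration, not the authors' formulae, and the function name is hypothetical:

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    P and Q are the proportions of transitional and transversional
    differences; d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).
    Sites with gaps or ambiguity codes are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    ts = sum(1 for a, b in pairs if a != b and frozenset((a, b)) in TRANSITIONS)
    tv = sum(1 for a, b in pairs if a != b) - ts
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

With one transition in ten sites (P = 0.1, Q = 0), the distance is -0.5 ln 0.8 ≈ 0.112, slightly larger than the raw 0.1 because of the multiple-hit correction.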
Article
Rabbit chromosomal DNA contains a cluster of four linked beta-like globin genes arranged in the orientation 5'-beta 4-(8 kb)-beta 3-(5 kb)-beta 2-(7 kb)-beta 1-3'. Determination of the nucleotide sequence of gene beta 1 confirms that this gene corresponds to the second type of two common co-dominant alleles encoding the adult beta-globin chain. With the exception of two nucleotide substitutions in the large intervening sequence (intron), the intron and flanking sequences are identical with the nucleotide sequence of the first type determined by Weissmann et al. (1979). A 14S polyadenylated transcript containing large intron sequences (possibly a mRNA precursor) is detected in the bone marrow cells of anemic rabbits. Gene beta 2 has limited sequence homology to adult and embryonic beta-globin probes and lacks a detectable mRNA transcript in the erythropoietic tissues examined. It contains at least one intervening sequence analogous to the large intron in gene beta 1. Genes beta 3 and beta 4 both contain an intron of 0.8 kb. Partial DNA sequence analysis indicates that the large intron in beta 4 is located between codons for amino acids lysine and leucine in an analogous position to that of the large intron in beta 1. In addition, a second smaller intron interrupts the 5' coding sequences of gene beta 4. Both genes beta 3 and beta 4 are transcribed in embryonic globin-producing cells. Their DNA sequence homology is limited, however, to a segment of approximately 0.2 kb located on the 5' side of the large intron.
Article
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10⁻⁹ (histone H4) to 2.80 × 10⁻⁹ (interferon gamma), with a mean of 0.88 × 10⁻⁹ substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10⁻⁹ substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10⁻⁹), intermediate at twofold degenerate sites (2.26 × 10⁻⁹), and highest at fourfold degenerate sites (4.2 × 10⁻⁹). The implications of our results for the mechanisms of DNA evolution and for the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction are discussed.
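The site classification described above can be sketched by checking, for each codon position, how many of the three alternative bases leave the amino acid unchanged. This is a simplified sketch under stated assumptions: the standard genetic code, a rule that treats any position with one or two synonymous alternatives as twofold degenerate, and no transition/transversion split at twofold sites (which the abstract says the method does use). Function names are hypothetical:

```python
# Standard genetic code, codons ordered TCAG at each position.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def degeneracy(codon: str) -> list:
    """Degeneracy class (0, 2 or 4) of each position in a sense codon.

    Simplification (an assumption): no synonymous alternative -> 0 (nondegenerate),
    all three synonymous -> 4 (fourfold), anything in between -> 2 (twofold).
    """
    codon = codon.upper()
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons are not classified")
    classes = []
    for pos in range(3):
        n_syn = sum(
            CODE[codon[:pos] + b + codon[pos + 1:]] == aa
            for b in BASES if b != codon[pos]
        )
        classes.append(0 if n_syn == 0 else 4 if n_syn == 3 else 2)
    return classes


def count_sites(cds: str) -> dict:
    """Total nondegenerate (0), twofold (2) and fourfold (4) sites in an in-frame CDS."""
    totals = {0: 0, 2: 0, 4: 0}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        for cls in degeneracy(cds[i:i + 3]):
            totals[cls] += 1
    return totals
```

For example, GGG (Gly) gives [0, 0, 4], and under this rule the third position of ATT (Ile), where two of three alternatives are synonymous, is counted as twofold.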
Article
The assumptions underlying the use of the Poisson distribution are essentially that the probability of an event is small but nearly identical for all occurrences and that the occurrence of an event does not alter the probability of recurrence of such events. These assumptions do not seem to be met for evolutionary events since (i) the probability of fixing nucleotide codon substitutions is not equal for all substitutions at a codon, and probably varies for the same substitution in different lineages; (ii) the probability of fixing codon substitutions varies among positions of a cistron; and (iii) the fixation of a nucleotide codon substitution at one position in a cistron modifies, and may even promote, the fixation of a codon substitution elsewhere along the cistron. Natural selection presumably is the causative factor that acts to modify the probability of a nucleotide codon substitution's being fixed in a population. The use of the negative binomial distribution is consistent with the evidence that selective pressure on amino acid or nucleotide codon positions varies both among codon positions of a cistron and at a particular position during evolutionary time. If the numbers of fixations of nucleotide codon substitutions per position of cistrons encoding cytochromes c are phyletically inferred (phylogeny based on a paleontological record) rather than phenetically inferred (based on paired comparisons of extant species' differences in the absence of a phylogeny), the distribution of these fixation data cannot be described adequately by a single Poisson distribution. The fit of these same data to a negative binomial distribution is very satisfactory. It has been argued that the fit of phenetically inferred fixation data, which do not take account of parallel or reverse fixations, to the Poisson distribution was supportive evidence for the hypothesis that protein evolution results from the fixation of selectively neutral codon substitutions.
This argument now appears to be undercut by the evidence that data on nucleotide codon fixation are more probably distributed according to the negative binomial distribution. The fact that fixation data can be described by a particular discrete probability distribution does not of itself provide insight into the mechanisms of the evolutionary process. However, two facts suggest that the negative binomial is a very appropriate discrete probability distribution for describing evolutionary events: (i) the assumptions underlying its use adequately deal with the varying probability of fixing amino acid or nucleotide codon substitutions at and among the positions of a cistron, and (ii) it provides an excellent fit for the phyletically inferred fixation data. Amino acids or their nucleotide codon substitutions may be fixed at a position of a cistron as though selectively neutral relative to the codon being replaced, even though the codon position itself is not selectively neutral, since many amino acids cannot function there. The negative binomial distribution handles this situation well, whereas a single Poisson distribution could be satisfactory only if all codon positions that could vary were selectively neutral.
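The comparison above can be sketched by fitting both distributions to per-position fixation counts by the method of moments: a Poisson has variance equal to its mean, while a negative binomial with size r = mean² / (variance − mean) absorbs extra variance among positions. This is an illustrative sketch with hypothetical function names, not the fitting procedure of the paper; it assumes at least two positions, a positive mean, and (for the negative binomial) variance greater than the mean:

```python
import math


def dispersion_summary(counts):
    """Mean, sample variance and variance-to-mean ratio of per-position counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var, var / mean


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


def negbin_pmf(k, mean, var):
    """Negative binomial pmf with moment-matched size r = mean^2 / (var - mean)."""
    if var <= mean:
        raise ValueError("negative binomial needs variance > mean")
    r = mean ** 2 / (var - mean)
    p = r / (r + mean)  # gives expected value `mean` and variance `var`
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))


def expected_counts(counts, kmax):
    """Rows of (k, observed, Poisson expected, negative binomial expected)."""
    mean, var, _ = dispersion_summary(counts)
    n = len(counts)
    return [(k, counts.count(k), n * poisson_pmf(k, mean), n * negbin_pmf(k, mean, var))
            for k in range(kmax + 1)]
```

For hypothetical counts `[0, 0, 1, 5]` the mean is 1.5 and the variance 17/3, a variance-to-mean ratio of about 3.8; that overdispersion is what a single Poisson cannot capture and a negative binomial can.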
Article
We report the complete nucleotide sequence of the human beta-globin gene. The purpose of this study is to obtain information necessary to study the evolutionary relationships between members of the human beta-like globin gene family and to provide the basis for comparing normal beta-globin genes with those obtained from the DNA of individuals with genetic defects in hemoglobin expression.
Article
A mathematical model for codon substitution is presented, taking into account unequal mutation rates among different nucleotides and purifying selection. This model is constructed by using a 61 × 61 transition probability matrix for the 61 nonterminating codons. Under this model, a computer simulation is conducted to study the numbers of silent (synonymous) and amino acid-altering (nonsynonymous) nucleotide substitutions when the underlying mutation rates among the four kinds of nucleotides are not equal. It is assumed that the substitution rates are constant over evolutionary time and that the codon frequencies are in equilibrium, so that the numbers of synonymous and nonsynonymous substitutions both increase linearly with evolutionary time. It is shown that, when the mutation rates are not equal, the estimate of synonymous substitutions obtained by F. Perler, A. Efstratiadis, P. Lomedico, W. Gilbert, R. Kolodner and J. Dodgson's "Percent Corrected Divergence" method increases nonlinearly, although the true number of synonymous substitutions increases linearly. It is, therefore, possible that the "saturation" of synonymous substitutions observed by Perler et al. is due to the inability of their method to detect all synonymous substitutions.
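The structure of such a 61 × 61 codon model can be sketched as an instantaneous rate matrix over the sense codons, from which a transition probability matrix P(t) could be obtained by matrix exponentiation (for example with `scipy.linalg.expm`). The parameterisation below is an assumption for illustration, not the study's: single-nucleotide changes only, transitions weighted by a hypothetical ratio `kappa`, nonsynonymous changes scaled by a hypothetical selection factor `omega`, equal codon frequencies, and changes to stop codons disallowed:

```python
# Standard genetic code, codons ordered TCAG at each position.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SENSE = [c for c in CODE if CODE[c] != "*"]  # the 61 nonterminating codons
PURINES = set("AG")


def is_transition(a, b):
    """True for A<->G or C<->T (assumes a != b)."""
    return (a in PURINES) == (b in PURINES)


def codon_rate_matrix(kappa=2.0, omega=0.2):
    """61 x 61 instantaneous rate matrix Q whose rows sum to zero."""
    index = {c: i for i, c in enumerate(SENSE)}
    q = [[0.0] * len(SENSE) for _ in SENSE]
    for c in SENSE:
        i = index[c]
        for pos in range(3):
            for b in BASES:
                if b == c[pos]:
                    continue
                d = c[:pos] + b + c[pos + 1:]
                if d not in index:  # mutation to a stop codon: treated as lethal
                    continue
                rate = kappa if is_transition(c[pos], b) else 1.0
                if CODE[d] != CODE[c]:
                    rate *= omega  # purifying selection on amino acid changes
                q[i][index[d]] = rate
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    return q


def synonymous_rate_share(q):
    """Share of the total substitution rate (equal codon frequencies) that is synonymous."""
    syn = total = 0.0
    for i, c in enumerate(SENSE):
        for j, d in enumerate(SENSE):
            if i != j and q[i][j] > 0:
                total += q[i][j]
                if CODE[c] == CODE[d]:
                    syn += q[i][j]
    return syn / total
```

Lowering `omega` raises the synonymous share of substitutions, which is the qualitative effect of purifying selection that the simulation in the abstract builds on.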
Article
Comparison of about 50 pairs of homologous nucleotide sequences for different genes revealed that substitutions between synonymous codons occurred at much higher rates than amino acid substitutions did. Furthermore, five pairs of mRNA sequences for different genes were compared in species that had diverged at the same time. The evolutionary rate of synonymous substitution was estimated to be 5.1 × 10⁻⁹ per site per year on average and is approximately constant among different genes. It is also suggested that this property would make synonymous substitutions suitable as a molecular clock for determining the evolutionary relationships and branching order of duplicated genes. Each functional block of the noncoding region evolves at an almost constant rate, regardless of gene type. The intervening sequence and the 5' portion of the 3' noncoding region show considerable divergence, the extent of which is almost comparable to that at synonymous codon sites, whereas the other blocks, consisting of the 5' noncoding region and the 3' portion of the 3' noncoding region, are strongly conserved, showing approximately half of the divergence of the synonymous sites. This strong sequence preservation might be due to the functional requirements for transcription and modification of mRNA.
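The average synonymous rate quoted above can be turned into a rough divergence-time estimate. This sketch assumes the usual clock relation K_s = 2rT, where the factor 2 reflects substitutions accumulating on both lineages since the split. The input K_s = 0.5 is a hypothetical value, and in practice K_s should first be corrected for multiple substitutions at the same site.

```python
SYNONYMOUS_RATE = 5.1e-9  # substitutions per synonymous site per year (average quoted above)


def divergence_time(k_s, rate=SYNONYMOUS_RATE):
    """Years since two sequences split, assuming K_s = 2 * rate * T.

    The factor 2 is there because substitutions accumulate independently
    along both lineages after the split.
    """
    return k_s / (2.0 * rate)


def expected_divergence(years, rate=SYNONYMOUS_RATE):
    """Inverse relation: synonymous divergence expected after `years`."""
    return 2.0 * rate * years


if __name__ == "__main__":
    # Hypothetical K_s of 0.5 substitutions per synonymous site
    print(f"{divergence_time(0.5) / 1e6:.1f} million years")  # 49.0 million years
```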
Article
The pattern of point mutations is inferred from nucleotide substitutions in pseudogenes. The pattern obtained suggests that transition mutations occur somewhat more frequently than transversion mutations and that mutations result more often in A or T than in G or C. Our results are discussed with respect to the predictions from Topal and Fresco's model for the molecular basis of point (substitution) mutations (Nature 263:285–289, 1976). The pattern of nucleotide substitution at the first and second positions of codons in functional genes is quite similar to that in pseudogenes, but the relative frequency of the transition C→T in the sense strand is drastically reduced and those of the transversions C→G and G→C are doubled. The differences between the two patterns can be explained by the observation that in protein evolution amino acid substitutions occur mainly between amino acids with similar biochemical properties (Grantham, Science 185:862–864, 1974). Our results for the patterns of nucleotide substitutions in pseudogenes and in functional genes lead to the prediction that both the coding and non-coding regions of protein coding genes should have high frequencies of A and T. Available data show that the non-coding regions are indeed high in A and T but the coding regions are low in T, though high in A.
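Inferring a substitution pattern amounts to tallying directed base changes and summarizing them. This is a minimal sketch with invented counts (not data from the study). It shows how the transition fraction and the fraction of changes that produce A or T could be computed. `summarize` and `HYPOTHETICAL_COUNTS` are illustrative names. Because 4 of the 12 possible directed substitutions are transitions, 1/3 is the transition fraction expected if every type were equally common.

```python
PURINES = {"A", "G"}

# Invented x->y substitution counts, for illustration only
HYPOTHETICAL_COUNTS = {
    ("C", "T"): 30, ("G", "A"): 30, ("A", "G"): 10, ("T", "C"): 10,  # transitions
    ("C", "A"): 5, ("G", "T"): 5, ("A", "T"): 3, ("T", "A"): 3,      # transversions
    ("A", "C"): 1, ("T", "G"): 1, ("C", "G"): 1, ("G", "C"): 1,
}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return x != y and (x in PURINES) == (y in PURINES)


def summarize(counts):
    """counts[(x, y)] = number of inferred x->y substitutions (x != y)."""
    total = sum(counts.values())
    transitions = sum(n for (x, y), n in counts.items() if is_transition(x, y))
    to_a_or_t = sum(n for (x, y), n in counts.items() if y in "AT")
    return {
        "transition_fraction": transitions / total,
        # 4 of the 12 directed substitution types are transitions
        "transition_baseline": 4 / 12,
        "fraction_ending_in_A_or_T": to_a_or_t / total,
    }


if __name__ == "__main__":
    print(summarize(HYPOTHETICAL_COUNTS))
```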
Article
We have characterized a clone carrying a chicken preproinsulin gene, which is present in only one copy in the chicken genome. The gene contains two introns: a 3.5 kb intron interrupting the region encoding the connecting peptide and a 119 bp intron interrupting the DNA corresponding to the 5' non-coding region of the mRNA. This is similar to the structure of rat insulin gene II; therefore, this structure represents the ancestral form. Since rat insulin gene I lacks a 499 bp intron in the coding region, the rat genes evolved by a recent gene duplication followed by loss of this intron in one copy. The divergences between insulin gene sequences, and also between globin genes, show that changes at introns and silent positions in coding regions appear very rapidly (7 × 10⁻⁹ substitutions per nucleotide site per year), but that the accumulation of changes at these sites saturates, although not completely, after about 100 million years. From this we conclude that not all of these sites are neutral and that they do not behave as accurate evolutionary clocks over long periods of time. However, nucleotide substitutions leading to amino acid replacements are an excellent clock. Our analysis indicates that this clock is driven by selection.
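Part of the apparent saturation of divergence at rapidly changing sites can come from multiple substitutions at the same site. As an illustration, this sketch uses the Jukes–Cantor one-parameter model (Jukes and Cantor 1969) rather than the method of the study summarized above. Under that model, the expected proportion of differing sites approaches 0.75 even while the true number of substitutions per site grows linearly, and the standard Jukes–Cantor correction recovers the true value.

```python
import math


def jc_observed(d):
    """Expected proportion of differing sites after d substitutions per site
    under the Jukes-Cantor one-parameter model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p):
    """Jukes-Cantor correction: substitutions per site from an observed proportion p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: too many differences for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0, 4.0):
        p = jc_observed(d)
        print(f"true d = {d:.1f}  observed p = {p:.3f}  corrected = {jc_distance(p):.3f}")
```

The observed proportion flattens out near 0.75, so an uncorrected divergence curve bends even when substitutions accumulate at a constant rate.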
Article
DNA sequence analysis of a cloned, partially deleted human alpha-thalassemia globin gene revealed a novel 3' untranslated region displaying at least nineteen differences when compared with previously published alpha mRNA sequences. Restriction enzyme mapping established the origin of the alpha-thalassemia gene as the more 3' of the normal, duplicated alpha genes (alpha 1). DNA sequencing of a previously isolated alpha 1 gene revealed a 3' untranslated region identical to that of the alpha-thalassemia gene. The sequence of the corresponding region of the more 5' alpha gene (alpha 2) was consistent with published mRNA sequences except at three probably polymorphic positions. Therefore, the 3' untranslated regions of the highly homologous alpha-globin genes differ significantly. The recognition that the duplicated alpha genes differ in a region expressed in mature mRNA should now permit direct assessment of relative gene output in various normal and pathologic states. The divergence of the alpha gene 3' untranslated regions in the face of minimal coding sequence differences must be reconciled with current models for matching homologous gene sequences by recombination events.
Analysis of the adult chicken β-globin gene
DOLAN, M., J. B. DODGSON, and J. D. ENGEL. 1983. Analysis of the adult chicken β-globin gene. J. Biol. Chem. 258:3983–3990.
Evolution of protein molecules
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. MUNRO, ed. Mammalian protein metabolism III. Academic Press, New York.
426 Nei and Gojobori
Molecular evolution of human and rabbit β-globin mRNAs
KAFATOS, F. C., A. EFSTRATIADIS, B. G. FORGET, and S. M. WEISSMAN. 1977. Molecular evolution of human and rabbit β-globin mRNAs. Proc. Natl. Acad. Sci. USA 74:5618–5622.