Article

RNA-based Phylogenetic Methods: Application to Mammalian Mitochondrial RNA Sequences


Abstract

The PHASE software package allows phylogenetic tree construction with a number of evolutionary models designed specifically for use with RNA sequences that have conserved secondary structure. Evolution in the paired regions of RNAs occurs via compensatory substitutions, hence changes on either side of a pair are correlated. Accounting for this correlation is important for phylogenetic inference because it affects the likelihood calculation. In the present study we use the complete set of tRNA and rRNA sequences from 69 complete mammalian mitochondrial genomes. The likelihood calculation uses two evolutionary models simultaneously for different parts of the sequence: a paired-site model for the paired sites and a single-site model for the unpaired sites. We use Bayesian phylogenetic methods, with a Markov chain Monte Carlo algorithm to obtain the most probable trees and the posterior probabilities of clades. The results are well resolved for almost all the important branches on the mammalian tree. They support the arrangement of mammalian orders within the four supra-ordinal clades that have been identified by studies of much larger data sets mainly comprising nuclear genes. Groups such as the hedgehogs and the murid rodents, which have been problematic in previous studies with mitochondrial proteins, appear in their expected positions with the other members of their order. Our choice of genes and evolutionary model appears to be more reliable and less subject to biases caused by variation in base composition than previous studies with mitochondrial genomes.
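The split into paired and unpaired sites described in the abstract can be illustrated with a minimal sketch. It assumes the conserved secondary structure is given in dot-bracket notation, which is an assumption made here for illustration and not necessarily the input format PHASE uses. The function names are hypothetical. The sketch only partitions the sites; it does not compute a likelihood.

```python
def parse_pairs(structure):
    """Return the sorted (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {idx}")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def partition_sites(structure):
    """Split site indices into paired (i, j) tuples and unpaired singles."""
    pairs = parse_pairs(structure)
    paired_idx = {i for pair in pairs for i in pair}
    unpaired = [i for i in range(len(structure)) if i not in paired_idx]
    return pairs, unpaired


def doublet_columns(seq, pairs):
    """Read one sequence as two-letter states, one per paired site."""
    return [seq[i] + seq[j] for i, j in pairs]


# Hypothetical toy example: a two-base-pair stem with a two-base loop.
pairs, unpaired = partition_sites("((..)).")
doublets = doublet_columns("GCAAGCU", pairs)
```

In a mixed-model analysis of the kind the abstract describes, the doublets would be scored under a paired-site model and the unpaired indices under a single-site model.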


... These characters, however, are no less independent than the other helices throughout the rRNA molecule, which often account for over 80% of the data (Higgs, 2000). Given that more attention is being paid to addressing the issue of stem interdependence in rRNA molecules (Savill et al., 2000 and others therein; Jow et al., 2002; Hudelot et al., 2003), the next step is to accommodate the non-independence of complementary RECs within a maximum likelihood approach. While an acceptable model for ambiguously aligned regions has not been formulated, most likely due to the difficulty associated with modeling indels, compositional information within RECs and terminal bulges may be modeled to determine the directionality of expansion and contraction in an rRNA hairpin-stem loop. ...
... Phylogenetic studies that simultaneously incorporate RNA and DNA models for the analysis of rRNA and tRNA sequences are beginning to appear (e.g., Hudelot et al., 2002; Jow et al., 2003; Kjer, 2004; Gibson et al., 2005; Gillespie et al., 2005). expansion segments from 231 chrysomelid leaf beetles (Insecta: Coleoptera) (Fig. 11). ...
... Published phylogenetic studies using mixed RNA models in PHASE have all used model 7A (Hudelot et al., 2002; Jow et al., 2003; Gillespie et al., 2005), and the performance of models 7A and 7D has not been compared with empirical datasets. ...
Article
Recent discoveries of basal extracellular Rickettsiales have illuminated divergent evolutionary paths to host dependency in later-evolving lineages. Family Rickettsiaceae, primarily comprised of numerous protist- and invertebrate-associated species, also includes human pathogens from two genera, Orientia and Rickettsia. Once considered sister taxa, these bacteria form distinct lineages with newly appreciated lifestyles and morphological traits. Contrasting other rickettsial human pathogens in Family Anaplasmataceae, Orientia and Rickettsia species do not reside in host-derived vacuoles and lack glycolytic potential. With only a few described mechanisms, strategies for commandeering host glycolysis to support cytosolic growth remain to be discovered. While regulatory systems for this unique mode of intracellular parasitism are unclear, conjugative transposons unique to Orientia and Rickettsia species provide insights that are critical for determining how these obligate intracellular pathogens overtake eukaryotic cytosol.
... Most widely used programs for phylogenetic inference do not implement these RNA-specific substitution models, though MrBayes (Ronquist et al., 2012) does provide a doublet model. The PHASE package (Jow et al., 2002; Hudelot et al., 2003; Allen and Whelan, 2014) was specifically designed for phylogenetic analyses of RNA and includes RNA-specific substitution models. ...
... Application of these RNA models has often demonstrated their superiority over commonly used DNA models based on their shorter inferred branch lengths and higher likelihoods (e.g. Hudelot et al., 2003; Telford et al., 2005; Patiño-Galindo et al., 2018). But other studies have found that using RNA models downweights phylogenetic signal from stems, thereby effectively up-weighting signal from loops (Letsch et al., 2010; Letsch and Kjer, 2011). ...
... Application of RNA-specific models is thus theoretically justified but still largely confined to studies of ancient lineages (e.g. Hudelot et al., 2003; Mallatt et al., 2010; Letsch and Kjer, 2011; Allen and Whelan, 2014; Patiño-Galindo et al., 2018). ...
Article
Background and aims: Compensatory base changes (CBCs) that occur in stems of ribosomal internal transcribed spacer 2 (ITS2) can have important phylogenetic implications because they are not expected to occur within a single species and also affect selection of appropriate DNA substitution models. These effects have been demonstrated when studying ancient lineages. Here we examine these effects to quantify their importance within a more recent lineage by using both DNA- and RNA-specific models. Methods: We examined the phylogenetic implications of the CBC process by using a comprehensive sampling of ITS2 from ten closely related species of Corydalis. We predicted ITS2 secondary structures by using homology modelling, which was then used for a structure-based alignment. Paired and unpaired regions were analysed separately and in combination by using both RNA-specific substitution models and conventional DNA models. We mapped all base-pair states of CBCs on the phylogenetic tree to infer their evolution and relative timing. Key results: Our results indicate that selection acted to increase the thermodynamic stability of the secondary structure. Thus, the unpaired and paired regions did not evolve under a common substitution model. Only two CBCs occurred within the lineage sampled and no striking differences in topology or support for the shared clades were found between trees constructed using DNA- or RNA-specific substitution models. Conclusions: Although application of RNA-specific substitution models remains preferred over more conventional DNA models, we infer that application of conventional DNA models is unlikely to be problematic when conducting phylogenetic analyses of ITS2 within closely related lineages wherein few CBCs are observed. Each of the two CBCs was found within the same lineages but was not observed within a given species, which supports application of the CBC species concept.
... although the hypothesis they present is based upon relatively simple evolutionary models, including maximum parsimony, and a Bayesian analysis employing a model that allows for differential rates of transitions and transversions (HKY85). Phylogenetic studies have demonstrated that structural partitioning (i.e., the partitioning of RNA sequences into 'stems' and 'loops' and separately modeling the substitution parameters in each) and the use of complex evolutionary models may better account for the actual mutational processes occurring in RNA sequences with conserved secondary structure (Wilgenbusch and de Queiroz, 2000; Savill et al, 2001; Jow et al, 2002; Hudelot et al, 2003; Kjer, 2004; Telford et al., 2005). ...
... In particular, maximum likelihood approaches to phylogeny reconstruction have facilitated the development of biologically realistic models of RNA sequence evolution, and specifically, several models are now available which treat stem nucleotides as paired sites, and thus account for the possible non-independence of sites within stem-pairing regions (e.g. Tillier and Collins, 1998; Schoniger and von Haeseler, 1999; Higgs, 2000; Savill et al, 2001; Jow et al, 2003). ...
... Recently, several substitution models have been proposed which treat stem nucleotides as paired sites (see Savill et al, 2001, and references therein), and a number of studies have supported their utility for RNAs with conserved secondary structure (e.g. Savill et al, 2001; Hudelot et al, 2003; Telford et al, 2005). The more parameter-rich models provide rates for the commonly observed base pairs in secondary structure (i.e. ...
... These covariation patterns of paired sites do not display independent phylogenetic signal. Ignoring this correlation results in an overestimation of the phylogenetic information of these sites, which can lead to inflated measurements of tree robustness (Schoeniger & von Haeseler 1994; Rzhetsky 1995; Tillier & Collins 1995; Tillier & Collins 1998; Parsch et al. 2000; Savill et al. 2001; Jow et al. 2002; Hudelot et al. 2003; Higgs et al. 2003). As a solution to the two above-mentioned drawbacks, rRNA secondary structure information has been used as an independent set of characters (1) to guide alignments of ribosomal RNA genes, as structure motifs can provide frame homology statements for positional alignment, and (2) to aid tree reconstruction by the application of specific RNA substitution models, which take interdependence of corresponding sites into account. ...
... As a consequence, the application of secondary structure information in specifically focused phylogenetic analyses should rely on taxonomically less-inclusive models, to benefit most from the additional information (Misof et al. 2006). However, with a few exceptions (Kjer et al. 1994; Kjer 1995; Gillespie et al. 2006; Niehuis et al. 2006a; Niehuis et al. 2006b), taxon-specific models of complete mitochondrial and nuclear ribosomal genes are rare at present, but as the application of sequence evolution models, which take correlation of paired sites of rRNA molecules into account, requires explicit statements on base pairings in a given data set (Jow et al. 2002; Hudelot et al. 2003), investigations on individual and reliable secondary-structure models should be paramount in phylogenetic analyses based on rRNA genes. ...
... Only few tools are available, which try to apply objective algorithms to identify Stephan 1996; Tillier & Collins 1998; Parsch et al. 2000) and implemented in phylogenetic methods Hudelot et al. 2003; Ronquist & Huelsenbeck 2003), further enhanced by the possibility to combine RNA and DNA substitution models in partitioned data (see chapter 4.2. and 5.2; Hudelot et al. 2003; Nylander et al. 2004; Niehuis et al. 2006b; Fleck et al. submitted). ...
Thesis
Ribosomal genes still form the backbone of molecular systematics. A pattern of highly variable areas, nested within conserved, slowly substituting elements, holds valuable sources for studying phylogenetic relationships of both recent and ancient splits. Nevertheless, recent studies have shown some drawbacks of these markers: many variable domains in ribosomal sequences are only ambiguously alignable, and functional constraints on ribosomal secondary structures result in interdependent variation of correlated sites. These covariation patterns of paired sites do not display independent phylogenetic signal. Ignoring this correlation results in an overestimation of the phylogenetic information of these sites. The present thesis aims to investigate the potential of a new alignment approach, RNASALSA. This method considers secondary structure information of ribosomal sequences in the alignment process. Structure motifs can provide frame homology statements for positional alignment. They further allow the application of specific RNA substitution models in tree reconstruction which take interdependence of corresponding sites into account. This thesis further seeks to reconstruct the phylogenetic relationships of anisopteran dragonflies. The monophyly of Anisoptera, with roughly 2600 extant taxa a fairly species-poor suborder of the insect order Odonata, is assumed in nearly all recent work on dragonfly phylogenetics. Within Anisoptera, species are classified into 13 families. All families seem to form monophyletic groups, well characterized by morphological autapomorphies. But among these groups, the relationships remain disputed. Special focus is on the Libellulidae. This family represents the most diverse clade and many of the 13 traditionally recognized subfamilies have been proposed to be unnatural.
To investigate the relationships within Anisoptera and Libellulidae, the nearly complete sequence of the nuclear (nc) 28S rRNA, as well as a major fragment of the mitochondrial (mt) 12S rRNA, the complete mt tRNA valine and the (nearly) complete mt 16S rRNA were used. The taxon sampling includes representatives of all anisopteran families. Within Libellulidae, species of all currently recognized subfamilies are considered. Alignment was done with RNASALSA and further tree reconstruction relied on Bayesian inference with application of specific RNA substitution models. RNASALSA (developed by Roman Stocsits, University of Leipzig, in cooperation with the work group of Bernhard Misof, Forschungsmuseum Koenig, Bonn) is a new method for aligning ribosomal RNA sequences, by adopting thermodynamic folding and comparative evidence algorithms, combined in a suitable framework. The program simultaneously generates secondary structures for a set of homologous RNA genes and aligns them by taking sequence and structure information into account. This additional information for each individual sequence extends the scoring function of the alignment algorithm. In RNASALSA, simple maximal sequence similarity is not the exclusive score, as with secondary structure, functional properties of the molecule are incorporated and corroborate homology hypotheses for individual sequence positions. To test the potential of RNASALSA, a secondary structure model of the nuclear 28S was investigated and compared with previously published models of arthropod 28S structures. RNASALSA was able to predict nearly 95% of the conserved core domains of the 28S molecule. Covariation patterns additionally supported conserved structure motifs in the expansion segments D2 and D10. Structures of the remaining expansion segments mainly rely on thermodynamic folding.
Bayesian analysis of the combined sequence data and applied RNA substitution models corroborates the commonly proposed monophyly of Anisoptera. Furthermore, the superfamily Libelluloidea and all major families of the Anisoptera appear monophyletic. Aeshnomorpha (Aeshnidae + Austropetaliidae) form the sister group to all remaining Anisoptera and Petaluridae turned out as sister group to Libelluloidea. Nodal support for these relationships appears low. A subsequent analysis which ignores correlation of base pairings in both the alignment process and tree reconstruction shows an essentially congruent topology with increased nodal support. These observations corroborate the previously proposed influence of nucleotide interdependences: if paired sites are strongly correlated, but treated as independent, phylogenetic information is scored twice, thus leading to unjustified high support for in itself strongly supported trees. The phylogenetic reconstruction of the Libellulidae reveals that many subfamilies of this family are not monophyletic. The common classification seems artificial: only Leucorrhiniinae, Urothemiinae, Zyxommatinae and Rhyothemistinae turned out monophyletic. Libellulinae appear paraphyletic and members of Brachydiplacinae, Trithemistinae and Tetrathemistinae are scattered throughout the libellulid tree.
... Finally, RNAforester has a correlation coefficient of about 0.65 between true and predicted structures for the divergence times of the snoRNA1 and miRNA1 alignments [55]. I used the con- Phase-2.0 is a software package for PHylogenetics And Sequence Evolution analysis [97]. It is an RNA analogue of PAML which can also deal with DNA sequences. ...
... Previous studies using Phase showed that general models fit known ncRNAs better than simpler models [97]. In the context of the work described in this thesis, the criterion for model selection was therefore to use the most general model that the amount of available data would allow. ...
... Moreover, estimating evolutionary rates for base paired nucleotides is imperative for assessing my hypothesis. Phase [97] is the only available software that allows RNA evolutionary analysis using both structure information and class definitions. ...
Article
I would like to thank my supervisor Prof. Jotun Hein for providing the academic environment which allowed me to grow as a Bioinformatician. From my group, I am also eternally indebted to Dr. Rune Lyngsoe; he is a very conscientious individual and I shall certainly aspire to follow his model. My first supervisor in Bioinformatics was Dr. Gerton Lunter who transmitted his passion for the field to me. Throughout the D.Phil, I have continued collaborating with him, and with Dr. Andrea Rocco we published on statistical alignment. Prof. Dave Gavaghan and Dr. James Wakefield, with the financial assistance of the Clarendon Fund, allowed me onto Oxford and the DTC course where I learnt a great deal. I would also like to express my gratitude to all my teachers from the little primary school I attended in central Algiers to Oxbridge. They have all inspired me and they all strove to impart their knowledge onto me. My friends over the years have contributed a great deal to my happiness. I’ve lost many and gained many along the way, but every friend I’ve had has taught me something. You know who you are. Thank you.
... Multiple sequence alignments were carried out using T-COFFEE (Notredame et al. 2000) and the alignment was edited to remove columns having more than 20% gaps. Phylogenetic analysis was performed using the MCMC program available in the PHASE package (Jow et al. 2003). The mtREV24 amino acid substitution matrix with a gamma distribution for site variation with four categories was used. ...
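The gap-filtering step quoted above (removing alignment columns with more than 20% gaps) can be sketched as follows. This is an illustrative reconstruction, not the cited authors' script; the function name, the default threshold argument and the choice of gap characters are assumptions.

```python
def drop_gappy_columns(seqs, max_gap_fraction=0.2, gap_chars="-."):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    # Keep a column when its gap fraction is at or below the threshold.
    keep = [
        col for col in range(length)
        if sum(s[col] in gap_chars for s in seqs) / n <= max_gap_fraction
    ]
    return ["".join(s[col] for col in keep) for s in seqs]


# Hypothetical toy alignment: the last column has 2 gaps in 5 rows (40%).
alignment = ["ACGT-", "ACG--", "AC-TA", "ACGTA", "ACGTA"]
filtered = drop_gappy_columns(alignment)
```

Columns with exactly 20% gaps are kept here, since the quoted criterion removes only columns with more than 20% gaps.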
... The general reversible model was used for the unpaired sites, and a model specifically treating compensatory substitutions was used for the paired sites (model 7A in Savill et al. 2001). Parameters for both models are optimized simultaneously by the MCMC program (see Hudelot et al. 2003 and the documentation to the PHASE package available at http://www.bioinf.man.ac.uk/resources/phase/). For both models, variation of rates across sites was accounted for using 4 gamma-distributed categories. ...
... In addition, the programs RAxML version 8.2.12 [23], MrBayes version 3.2.7a [24], and PHASE package 2.0 [25][26][27][28][29] were used. ...
Article
Coccoid Ulvophyceae are often overlooked despite their wide distribution. They occur as epiphytes on marine seaweeds or grow on stones or on shells of mussels and corals. Most of the species are not easy to identify based solely on morphology. However, they form two groups based on the flagellated cells during asexual reproduction. The biflagellated coccoids are monophyletic and represent the genus Sykidion (Sykidiales). In contrast, the quadriflagellated taxa are polyphyletic and belong to different genera and orders. The newly investigated strains NIES-1838 and NIES-1839, originally identified as Halochlorococcum, belong to the genus Chlorocystis (C. john-westii) among the order Chlorocystidales. The unidentified strain CCMP 1293 had an almost identical SSU and ITS-2 sequence to Symbiochlorum hainanense (Ignatiales) but showed morphological differences (single chloroplast, quadriflagellated zoospores) compared with the original description of this species (multiple chloroplasts, aplanospores). Surprisingly, the strain SAG 2662 (= ULVO-129), together with the published sequence of MBIC 10461, formed a new monophyletic lineage among the Ulvophyceae, which is highly supported in all of the bootstrap and Bayesian analyses and approximately unbiased tests of user-defined trees. This strain is characterized by a spherical morphology, forms quadriflagellated zoospores, has a unique ITS-2 barcode, and can tolerate a wide range of salinities. Considering our results, we emend the diagnosis of Symbiochlorum and propose the new genus Solotvynia among the new order Solotvyniales.
... The 28S and 18S ratio is 2:1 but the actual ratio is 2.7:1. RNA secondary structure is very useful for phylogenetic inference (Jow et al, 2002; Hudelot et al, 2003; Anonymous, 2003; Anonymous, 2020; Anonymous, 2020a). 28S rDNA sequences showed great resolution for Schwenkiella icemi (Pal and Singh, 2016). ...
Article
Single-stranded RNA molecules fold quickly through hydrogen bonding when left in their environment. The helices produced by this folding are known as stems. Of the 16 possible base pairs, only six (AU, GU, GC, UA, UG and CG) are stable. The nucleotide sequence of stems can vary, giving rise to variable RNA helical regions. Substitutions of RNA bases are important in maintaining the secondary structure of RNA. DNA structure is not useful for studying these evolutionary models because DNA is double-stranded, so its base pairs do not give accurate results. The secondary structure of RNA therefore gives valid results for the evolution of parasitic nematodes of Periplaneta americana. How to cite: Sangeeta Pal, Manoj Nimesh, Ashish Kumar Gupta and Pankaj Pandey (2021) Phylogenetic comparison of some nematode parasites of Periplaneta americana based on electropherogram and secondary RNA structure. J. Exp. Zool. India 24, 1545-1552. DocID: https://connectjournals.com/03895.2021.24.1545
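The count of six stable pairs among 16 possible ones can be checked with a short enumeration. This is a minimal sketch; the variable names are illustrative, and the set of stable pairs is taken directly from the abstract above.

```python
from itertools import product

BASES = "ACGU"
# The six pairs the abstract lists as stable (Watson-Crick plus G-U wobble).
STABLE_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

# All ordered two-base states: 4 x 4 = 16.
all_pairs = ["".join(p) for p in product(BASES, repeat=2)]
stable = [p for p in all_pairs if p in STABLE_PAIRS]
mismatches = [p for p in all_pairs if p not in STABLE_PAIRS]

print(len(all_pairs), len(stable), len(mismatches))  # prints: 16 6 10
```

The ten remaining states are mismatches. How paired-site substitution models treat them (as separate states, as one lumped state, or not at all) differs between models and is not specified here.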
... The following methods were used for the phylogenetic analyses: distance, maximum parsimony, maximum likelihood, and Bayesian inference. Programs used included PAUP version 4.0b169 [32] and the PHASE package 2.0 [33][34][35][36][37]. For the Bayesian calculations, the secondary structure models of SSU and ITS (RNA7D in PHASE) were also taken into account. ...
Article
Two novel Chlamydomonas-like species, belonging to the Moewusii clade, have been described. The first species inhabits eutrophic and neutral to basic pH waters in Sweden and England. It is easily recognizable under a light microscope due to its morphology (a small green prolate spheroidal shape with a large and truncated papilla at its anterior end, two equal flagella, a single lateral eyespot, a basal nucleus, and a well-defined pyrenoid) and to its peculiar whole-body pendulum movement while resting on surfaces or attached to floating particles. The species occurs as free-living individuals and is able to gather temporarily into groups of individual cells. No particular binding structures or palmelloid cells were observed in cultures. The second species, previously assigned to Chlamydomonas cf. proboscigera, was collected from persistent snow in Svalbard, Norway. Its morphology is revised herein. Using SSU rDNA sequence analyses, these two species formed a well-supported clade. Moreover, ITS-2 secondary structure analyses confirmed sexual incompatibility between these biological species. Considering these results, a new genus Limnomonas and its type species L. gaiensis and L. spitsbergensis are proposed.
... Consequently, the SSU and ITS sequences were included into the SSU/ITS dataset (20 taxa, 2375 bp) and analyzed using the same programs as described in Frantal et al. [1]. The analyses were conducted using PAUP version 4.0a169 [15], RAxML [16] and PHASE [17][18][19][20][21]. The settings of the best models according to the Akaike Information Criterion are provided in the figure legends. ...
Article
Ciliates of the genus Urotricha are widely distributed and occur in almost any freshwater body. Thus far, almost all species have been described from morphology only. Here, we applied an integrative approach on the morphology, molecular phylogeny and biogeography of two species isolated from high mountain lakes in the Central Alps, Austria. As these remote lakes are known to have water temperatures <15 °C, our hypothesis was that these urotrichs might prefer ‘cold’ environments. We studied the morphological details from living and silver-stained individuals, and their molecular sequences (ribosomal operon, ITS), and screened available datasets for their biogeography. The two Urotricha species resembled morphological features of several congeners. An accurate species assignment was difficult due to several overlapping characteristics. However, we tentatively attributed the investigated species to Urotricha nais and Urotricha globosa. The biogeographic analyses revealed their occurrence in Europe, Africa and Asia, and no correlations to (cold) temperatures were found. Our findings suggest that these two urotrichs, originating from two cold and remote habitats, are probably cryptic species well adapted to their harsh environment.
... Compensatory substitutions occur frequently in the paired regions; the property contradicts the assumption of independent mutations [95,96]. The analysis of RNA secondary structures is helpful to aid the alignment of rRNA sequences [97] and contributes to the increasingly sophisticated models of sequence evolution being applied in maximum likelihood and Bayesian approaches [20,98]. For instance, the doublet model of MrBayes [81,82] is intended for stem regions of ribosomal sequences, where nucleotides pair with each other to form doublets. ...
Article
Comparative studies on mitochondrial genomes (mitogenomes) as well as the structure and evolution of the mitochondrial control region are few in the Lacertidae family. Here, the complete mitogenomes of five individuals of Eremias scripta (2 individuals), Eremias nikolskii, Eremias szczerbaki, and Eremias yarkandensis were determined using next-generation sequencing and were compared with other lacertids available in GenBank. The circular mitogenomes comprised the standard set of 13 protein-coding genes (PCGs), 22 transfer RNA genes, 2 ribosomal RNA genes and a long non-coding control region (CR). The extent of purifying selection was less pronounced for the COIII and ND2 genes in comparison with the rest of the PCGs. The codons encoding Leucine (CUN), Threonine, and Isoleucine were the three most frequently present. The secondary structure of rRNA of Lacertidae (herein, E. scripta KZL15 as an example) comprised four domains and 28 helices for 12S rRNA, with six domains and 50 helices for 16S rRNA. Five types and twenty-one subtypes of CR in Lacertidae were described by following the criteria of the presence and position of tandem repeats (TR), termination-associated sequence 1 (TAS1), termination-associated sequence 2 (TAS2), conserved sequence block 1 (CSB1), conserved sequence block 2 (CSB2), and conserved sequence block 3 (CSB3). The compositions of conserved structural elements in four genera, Acanthodactylus, Darevskia, Eremias, and Takydromus, were further explored in detail. The base composition of TAS2 – TATACATTAT in Lacertidae was updated. In addition, the motif “TAGCGGCTTTTTTG” of tandem repeats in Eremias and the motif “GCGGCTT” in Takydromus were presented. Nucleotide lengths between CSB2 and CSB3 remained 35 bp in Eremias and Darevskia. The phylogenetic analyses of Lacertidae recovered the higher-level relationships among the three subfamilies and corroborated a hard polytomy in the Lacertinae phylogeny. The phylogenetic position of E. nikolskii challenged the monophyly of the subgenus Pareremias within Eremias. Some mismatches between the types of CR and their phylogeny demonstrated the complicated evolutionary signals of CR such as convergent evolution. These findings will promote research on the structure and evolution of the CR and highlight the need for more mitogenomes in Lacertidae.
... Earlier studies on tRNA genes regarding ti and tv have mainly been carried out by comparing genes across the species (Higgs 2000; Savill et al. 2001; Jow et al. 2002; Hudelot et al. 2003). The main finding is that compensatory transition or transversion substitutions are more frequent than single site independent substitution in stem regions of these genes. ...
Article
Transversion and transition mutations have variable effects on the stability of RNA secondary structure considering that the former destabilizes the double helix geometry to a greater extent by introducing purine:purine (R:R) or pyrimidine:pyrimidine (Y:Y) base pairs. Therefore, transversion frequency is likely to be lower than that of transition in the secondary structure regions of RNA genes. Here, we performed an analysis of transition and transversion frequencies in tRNA genes defined well with secondary structure and compared with the intergenic regions in five bacterial species namely Escherichia coli, Klebsiella pneumoniae, Salmonella enterica, Staphylococcus aureus and Streptococcus pneumoniae using a large genome sequence data set. In general, the transversion frequency was observed to be lower than that of transition in both tRNA genes and intergenic regions. The transition to transversion ratio was observed to be greater in tRNA genes than that in the intergenic regions in all the five bacteria that we studied. Interestingly, the intraspecies base substitution analysis in tRNA genes revealed that non-compensatory substitutions were more frequent than compensatory substitutions in the stem region. Further, transition to transversion ratio in the loop region was observed to be significantly lesser than that among the non-compensatory substitutions in the stem region. This indicated that the transversion is more deleterious than transition in the stem regions. In addition, substitutions from amino bases (A/C) to keto bases (G/T) were also observed to be more than the reverse substitutions in the stem region. Substitution from amino bases to keto bases are likely to facilitate the stable G:U pairing unlike the reverse substitution that facilitates the unstable A:C pairing in the stem region of tRNA. This work provides additional support that the secondary structure of tRNA molecule is what drives the different substitutions in its gene sequence.
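The transition/transversion distinction used in the abstract above can be made concrete with a small sketch: a substitution within the purines (A, G) or within the pyrimidines (C, U/T) is a transition, and any purine-pyrimidine exchange is a transversion. The function names are illustrative and this is not the cited study's pipeline. It treats T and U as equivalent and skips gaps and ambiguity codes, which are assumptions made here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def normalise(base):
    """Upper-case a base and treat DNA 'T' as RNA 'U'."""
    return base.upper().replace("T", "U")


def substitution_type(a, b):
    """Classify a base change as 'none', 'transition' or 'transversion'."""
    a, b = normalise(a), normalise(b)
    if a == b:
        return "none"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ti_tv_counts(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for a, b in zip(seq1, seq2):
        # Skip gaps and ambiguity codes.
        if normalise(a) not in "ACGU" or normalise(b) not in "ACGU":
            continue
        kind = substitution_type(a, b)
        if kind == "transition":
            ti += 1
        elif kind == "transversion":
            tv += 1
    return ti, tv


# Hypothetical toy pair: G->A is a transition; C->G and A->U are transversions.
ti, tv = ti_tv_counts("GAUCCA", "AAUCGU")
```

A transition to transversion ratio then follows as `ti / tv` whenever `tv` is non-zero.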
... For the phylogenetic analysis, read alignments were further filtered, discarding all haplotypes below 0.5% [35]. Haplotypes from the pre-and post-LT quasispecies were clustered by UPGMA (Unweighted Pair Group Method with Arithmetic mean) on the matrix of Kimura-80 genetic distances [45]. ...
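The Kimura-80 (two-parameter) distance mentioned in the excerpt above has a closed form, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The sketch below computes it for one pair of aligned sequences. The function name and the handling of gaps (compared sites must both be A, C, G or T/U) are assumptions, not details taken from the cited study.

```python
import math


def k80_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    purines, pyrimidines = set("AG"), set("CT")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    n = ti = tv = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            ti += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ti / n, tv / n
    arg1, arg2 = 1 - 2 * p - q, 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        raise ValueError("sequences too divergent for the K80 correction")
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


# Hypothetical toy pair: one transition in ten sites (P = 0.1, Q = 0).
d = k80_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

A matrix of such pairwise distances could then be clustered by UPGMA, which corresponds to average-linkage hierarchical clustering.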
Article
Cirrhosis derived from chronic hepatitis C virus (HCV) infection is still a common indication for liver transplantation (LT). Reinfection of the engrafted liver is universal in patients with detectable viral RNA at the time of transplant and causes fast progression of cirrhosis (within 5 years) in around one-third of these patients. To prevent damage to the liver graft, effective direct-acting antiviral (DAA) therapy is required as soon as possible. However, because of post-LT clinical instability, it is difficult to determine the optimal time to start DAAs with a low risk of complications. Evaluate changes in quasispecies complexity following LT and seek a predictive index of fast liver damage progression to determine the timing of DAA initiation. HCV genomes isolated from pre-LT and 15-day post-LT serum samples of ten patients, who underwent orthotopic LT, were quantified and sequenced using a next-generation sequencing platform. Sequence alignments, phylogenetic trees, quasispecies complexity measures, biostatistics analyses, adjusted R2 values, and analysis of variance (ANOVA) were carried out. Three different patterns of reinfection were observed (viral bottlenecking, conserved pre-LT population, and mixed populations), suggesting that bottlenecking or homogenization of the viral population is not a generalized effect after liver graft reinfection. None of the quasispecies complexity measures predicted the future degree of liver damage. Higher and more uniform viral load (VL) values were observed in all pre-LT samples, but values were more dispersed in post-LT samples. However, VL increased significantly from the pre-LT to 15-day post-LT samples in patients with advanced fibrosis at 1-year post-LT, suggesting that a VL increase on day 15 may be a predictor of fast liver fibrosis progression. HCV kinetics after LT differ between patients and are not fibrosis-dependent. Higher VL at day 15 post-LT versus pre-LT samples may predict fast liver fibrosis progression.
... Programs used included PAUP version 4.0a169 [36], RAxML version 8.2.12 [37], MrBayes version 3.2.7a [38], and PHASE package 2.0 [39][40][41][42][43]. ...
Article
Full-text available
Most marine coccoid and sarcinoid green algal species have traditionally been placed within genera dominated by species from freshwater or soil habitats. For example, the genera Chlorocystis and Halochlorococcum contain exclusively marine species; however, their familial and ordinal affinities are unclear. They are characterized by a vegetative cell with lobated or reticulated chloroplast, formation of quadriflagellated zoospores and living epi- or endophytically within benthic macroalgae. They were integrated into the family Chlorochytriaceae which embraces all coccoid green algae with epi- or endophytic life phases. Later, they were excluded from the family of Chlorococcales based on studies of their life histories in culture, and transferred to their newly described order, Chlorocystidales of the Ulvophyceae. Both genera form a “Codiolum”-stage that serves as the unicellular sporophyte in their life cycles. Phylogenetic analyses of SSU and ITS rDNA sequences confirmed that these coccoid taxa belong to the Chlorocystidales, together with the sarcinoid genus Desmochloris. The biflagellated coccoid strains were members of the genus Sykidion, which represented its own order, Sykidiales, among the Ulvophyceae. Considering these results and the usage of the ITS-2/CBC approach revealed three species of Desmochloris, six of Chlorocystis, and three of Sykidion. Three new species and several new combinations were proposed.
... Analysis of RNA secondary structure is helpful to aid alignment of rRNA sequences (e.g. Kjer 1995) and contributes to the increasingly sophisticated models of sequence evolution being applied in maximum likelihood and Bayesian approaches (Brown 2005; Hudelot et al. 2003). Accordingly, the phylogenetic performance of the rRNA can be improved by incorporating information regarding its secondary structure in analyses for more accurate phylogenetic inference (Telford et al. 2005). ...
Article
Full-text available
Vertebrate mitochondrial genomes (mitogenomes) are valuable for studying phylogeny, evolutionary genetics and genomics. To date, however, compared to other vertebrate groups, our knowledge about the mitogenomes of skinks (the family Scincidae), and even of reptiles, has been relatively limited. In the present study, we determined the complete mitogenome of a blue-tailed skink Plestiodon capito for the first time, and compared it with other skinks available in GenBank. The circular genome is 17,344 bp long, showing a typical vertebrate pattern with 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and one control region (CR). The gene organization, nucleotide composition, and codon usage are similar to those from skinks previously published. Twelve out of 13 PCGs initiate with the canonical start codon (ATG), while COX1 starts with GTG. The codon usage analysis revealed a preferential use of the LeuCUN (Leu1), Pro, and Thr codons with the A/U ending. All tRNAs in P. capito were predicted to fold into the typical clover-leaf secondary structure, except tRNA-Ser AGY. The secondary structures of 12S rRNA and 16S rRNA comprise 34 helices and 56 helices, respectively. The alignment of the Plestiodon species CRs exhibited high genetic variability and rich A + T content. Besides, variable types and numbers of tandem repeat units were also identified in the CR of Plestiodon. Phylogenetic analyses recovered P. capito as the sister species to P. tunganus; monophyly of the Scincidae is well supported. Our results will help to better understand the structure and evolution of the mitochondrial DNA control region in reptiles as well as the evolutionary status of P. capito, and to lay a foundation for further phylogenetic study of skinks in a mitogenomic framework.
... Programs used included PAUP version 4.0b167 (Swofford 2002), RAxML version 8.2.12 (Stamatakis 2014), MrBayes version 3.2.7a (Ronquist et al. 2012) and the PHASE package 2.0 (Jow et al. 2002, Hudelot et al. 2003, Gibson et al. 2005, Telford et al. 2005). For the Bayesian calculations, the secondary structure models of SSU and ITS (doublet in MrBayes and RNA7D in PHASE) have been taken. ...
Article
Full-text available
Associations of freshwater sponges with coccoid green algae have been known for a long time. Two types of coccoid green algae, which are commonly assigned as zoochlorellae, are recognized by morphology: small coccoids (< 3 μm) without pyrenoids and larger Chlorella-like algae (4–6 μm) with pyrenoids. Despite their wide distribution in some freshwater sponges, these green algae were never studied using a combined analysis of morphology and molecular phylogeny. We investigated several endosymbiotic strains isolated from different Spongilla species, which were available in culture collections. Phylogenetic analyses of SSU and ITS rDNA sequences revealed that the strain SAG 211-40a is a member of the Chlorellaceae and represents a new species of the newly erected genus Lewiniosphaera, L. symbiontica. The phylogenetic position was confirmed by morphology and ITS-2 barcode. The endosymbionts without pyrenoid were identified as Choricystis parasitica by morphology and phylogenetic analyses. The comparison with free-living strains revealed the recognition of two new Choricystis species, C. krienitzii and C. limnetica, which were confirmed by molecular signatures in V9 region of SSU rDNA and ITS-2 barcode.
... Programs used included PAUP version 4.0a166 (Swofford, 2002), RAxML version 8.2.12 (Stamatakis, 2014), MrBayes version 3.2.7a using the doublet approach (Ronquist et al., 2012), and PHASE package 2.0 (Jow et al., 2002, Hudelot et al., 2003, Gibson et al., 2005, Telford et al., 2005). To find genetic synapomorphies in the SSU, the program DeSigNate (https://designate.dbresearch.uni-salzburg.at) was used. ...
... Gene-tree analyses for each of the 80 separate lineages were performed using the PHASE 3.0 package (Jow et al. 2002;Hudelot et al. 2003;Allen and Whelan 2014) with the best-fit mixed model (HKY85 or REV for unpaired regions and a 16-state RNA-base-pair model for paired regions; Allen and Whelan 2014). Bayesian MCMC phylogenetic inference was performed using the mcmcPHASE program (Allen and Whelan 2014). ...
Article
Full-text available
Compensatory mutations are crucial for functional RNA because they maintain RNA configuration and thus function. Compensatory mutation has traditionally been considered to be a two-step substitution through the GU-base-pair intermediate. We tested for an alternative AC-mediated compensatory mutation (ACCM). We investigated ACCMs by using a comprehensive sampling of ribosomal internal transcribed spacer 2 (ITS2) from 3934 angiosperm species in 80 genera and 55 families. We predicted ITS2 consensus secondary structures by using LocARNA for structure-based alignment and partitioning paired and unpaired regions. We examined and compared the substitution rates and frequencies among base pairs by using RNA-specific models. Base-pair states of ACCMs were mapped onto the inferred phylogenetic trees to infer their evolution. All types of compensatory mutations involving the AC intermediate were observed, but the most frequent substitutions were with AU or GC pairs, which are part of the AU-AC-GC pathway. Compared with the GU intermediate, AC had a lower frequency and higher mutability. Within the AU-AC-GC pathway, the AU-AC substitution rate was much slower than the AC-GC substitution rate. No consistently higher overall rate was identified for either pathway among all 80 sampled lineages, though compensatory mutations through the AC intermediate averaged about half that through the GU intermediate. These results demonstrate an alternative compensatory mutation between AU and GC that helps address the controversial inference of inferred simultaneous double substitutions.
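The two-step pathways described in this abstract can be illustrated with a short sketch. It is only an illustration of the idea, not the authors' analysis: the function name `one_step_intermediates` is hypothetical, and base pairs are written as two-letter strings such as "AU". The code lists every pair that is exactly one substitution away from both the starting pair and the final pair, which for AU and GC yields the AC and GU intermediates.

```python
from itertools import product

BASES = "ACGU"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_intermediates(start: str, end: str) -> list[str]:
    """Hypothetical helper: pairs one substitution away from both `start` and `end`.

    A compensatory change between two Watson-Crick pairs needs two
    substitutions, so each returned pair is a possible intermediate state.
    """
    candidates = ("".join(p) for p in product(BASES, repeat=2))
    return [c for c in candidates
            if hamming(c, start) == 1 and hamming(c, end) == 1]


# Intermediates between an AU pair and a GC pair.
print(one_step_intermediates("AU", "GC"))  # ['AC', 'GU']
```

The sketch only enumerates states; it says nothing about the relative rates of the two pathways, which is what the study estimates with RNA-specific models.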
... Further phylogenetic analysis incorporating paleontological evidence also suggests that primates and colugos are sister taxa (Beard, 1993). Previous molecular studies also support colugos as the closest living relatives of primates (Bininda-Emonds et al., 2007;Hudelot et al., 2003;Waddell et al., 2001). Genomic analyses further postulated a third potential topology: ((primates, colugos), tree shrews) (Janecka et al., 2007;Perelman et al., 2011), though this was based on analyses of limited genomic changes (insertion and deletions, InDels) and few nuclear gene fragments. ...
Article
Full-text available
Elucidating the closest living relatives of extant primates is essential for fully understanding important biological processes related to the genomic and phenotypic evolution of primates, especially of humans. However, the phylogenetic placement of these primate relatives remains controversial, with three primary hypotheses currently espoused based on morphological and molecular evidence. In the present study, we used two algorithms to analyze differently partitioned genomic datasets consisting of 45.4 Mb of conserved non-coding elements and 393 kb of concatenated coding sequences to test these hypotheses. We assessed different genomic histories and, in comparison with other molecular studies, found solid support for colugos being the closest living relatives of primates. Our phylogeny showed Cercopithecinae to have low levels of nucleotide divergence, especially for Papionini, and gibbons to have a high rate of divergence. The MCMCtree comprehensively updated divergence dates of early evolution of Primatomorpha and Primates.
... (Stamatakis, 2006), MrBayes version 3.2.3 (Ronquist et al., 2012), and PHASE package 2.0 (Jow et al., 2002, Hudelot et al., 2003, Gibson et al., 2005, Telford et al., 2005). ...
Article
Chlorella-like coccoid green algae are widely distributed in many types of habitats such as freshwater, terrestrial and marine. One group of terrestrial microalgae belonging to the Trebouxiophyceae forms the monophyletic lineage of the Watanabea clade. This clade comprises exclusively ellipsoid and spherical coccoid green algae, which traditionally have been assigned as different species of Chlorella. Within this clade, seven out of ten genera are described mainly based on phylogenetic analyses of SSU and rbcL sequences. Most of the genera are represented by only one or two species that are rarely found in natural samples. In contrast, the genus Chloroidium is widely distributed across different habitats. We investigated 34 new isolates, which were originally assigned as Chloroidium or Chlorella, using an integrative approach. The phylogenetic analyses of SSU and ITS rDNA sequences revealed nine lineages, eight of which were highly supported in all of our bootstrap and Bayesian analyses. The ITS-2/CBC approach clearly demonstrated that these nine lineages represent individual species. The haplotype network analyses revealed that three of them were widely distributed and showed no preference for any habitat. The comprehensive study of SSU and rbcL datasets also revealed that no clear synapomorphy could be found to support the assigned genus Parachloroidium. As a result of our findings, we proposed that both species belonging to Parachloroidium be transferred to Chloroidium. In addition, we re-established two species originally described by Chodat as new members of Chloroidium (C. lichenum, C. viscosum). Two of the nine lineages (C. antarcticum, C. arboriculum) were newly described in this study.
... (Stamatakis, 2006), MrBayes version 3.2.3 (Ronquist et al., 2012), and PHASE package 2.0 (Jow et al., 2002, Hudelot et al., 2003, Gibson et al., 2005, Telford et al., 2005). ...
Article
Chlamydomonas in the traditional sense is one of the largest green algal genera, comprising more than 500 described species. However, since the designation of the model organism C. reinhardtii as conserved type of this genus in 2007, only two species remained in Chlamydomonas. Investigations of three new strains isolated from soil samples, which were collected near Lake Nakuru (Kenya), demonstrated that the isolates represent a new species of Chlamydomonas. Phylogenetic analyses of nuclear SSU and ITS rDNA and plastid-coding rbcL sequences have clearly revealed that this species is closely related to C. reinhardtii and C. incerta. These results were confirmed by cross experiments of sporangium wall autolysins (VLE). All species belonged to the VLE group 1 sensu Schlösser. The comparison of the ITS-1 and ITS-2 secondary structures showed several compensatory base changes among the three species. In addition, the rbcL amino acid composition was also species-specific. The genus Chlamydomonas was phylogenetically closely related to the colonial families Goniaceae, Tetrabaenaceae and Volvocaceae. Chlamydomonas debaryana (VLE group 2) formed a separate clade among these colonial families of the Volvocales, a species of which autolysin dissolved the sporangium walls of the members of VLE group 1, suggesting its close relationship to Chlamydomonas. As a consequence of our results, we propose Chlamydomonas schloesseri sp. nov. for the new Kenyan isolates. We also propose a new combination of C. debaryana to the newly erected genus Edaphochlamys.
... For the mitochondrial network, the haploid mitochondrial genes were concatenated and the median joining network option was used to calculate relationships among individuals in the network based on pairwise differences among haplotypes (Bandelt et al. 1999). To obtain haplotypes from the diploid nuclear DNA, we used the program seqPHASE (Flot 2010) to prepare our FASTA file for use in the computer program PHASE, which is a software package for phylogenetics and sequence evolution (Jow et al. 2002;Hudelot et al. 2003), to predict haplotypes for each individual. The two haplotypes for each individual were used to estimate a nuclear DNA haplotype network using the median joining network in PopART. ...
Article
Full-text available
A primary goal of landscape genetics is to elucidate factors associated with genetic structure among populations. Among the important patterns identified have been isolation by distance (IBD), isolation by barrier (IBB), and isolation by environment (IBE). We tested hypotheses relating each of these possible patterns to genetic divergence in the Slimy Salamander (Plethodon glutinosus (Green, 1818)) species complex across the lower Piedmont and Coastal Plain of Georgia, USA, and adjacent areas of South Carolina, USA. We sequenced 2148 total bp, including three regions of the mitochondrial genome and a nuclear intron, and related genetic distance to GIS-derived surrogate variables representing possible IBD (geographic distance), IBE (principal components of 19 climate variables, watershed, and normalized difference vegetation index (NDVI)), and IBB (streams of fourth order and higher). Multiple matrix regression with randomization analysis indicated significant relationships between genetic distance and two principal components of climate, as well as NDVI. These results support roles for environment (IBE) in helping to drive genetic divergence in this group of salamanders. The absence of a significant influence of IBD and IBB was surprising. It is possible that the signal effects of geographic distance and barriers on genetic divergence may have been erased by more recent responses to the environment.
... The phylogenetic calculations were conducted using the programs PAUP version 4.0b10 (Swofford 2002), RAxML version 7.0.3 (Stamatakis 2006), MrBayes version 3.1 (Huelsenbeck & Ronquist 2001, Ronquist & Huelsenbeck 2003), and PHASE package 2.0 (Jow et al. 2002, Hudelot et al. 2003, Gibson et al. 2005, Telford et al. 2005). Pseudendoclonium basiliense (CCALA 423) MF034643 # Pseudendoclonium basiliense (ULVO-11) MF034644 # Pseudendoclonium basiliense var. ...
Article
Phylogenetic analyses of SSU rDNA sequences have shown that coccoid and filamentous green algae are distributed among all classes of the Chlorophyta. One of these classes, the Ulvophyceae, mostly contains marine seaweeds and microalgae. However, new studies have shown that there are filamentous and sarcinoid freshwater and terrestrial species (including symbionts in lichens) among the Ulvophyceae, but very little is known about these species. Ultrastructural studies of some of them have confirmed that the flagellar apparatus of zoospores (counterclockwise basal body orientation) is typical for the Ulvophyceae. In addition to ultrastructural features, the presence of a “Codiolum”-stage is characteristic of some members of this algal class. We studied more than 50 strains of freshwater and terrestrial ulvophycean microalgae obtained from different public culture collections and our own isolates using an integrative approach. Three independent lineages of the Ulvophyceae containing terrestrial species were revealed by these methods. Unexpectedly, each of these lineages contained several isolates that morphologically developed a high degree of phenotypic plasticity, and included hidden phylogenetic diversity that led us to the description of several new genera and species.
... Several methods have been developed to sample sequences using an evolutionary model derived from a given phylogeny [14][15][16]. To the best of our knowledge, however, there is no previously published method for sampling sequences in overlapping coding regions. ...
Article
Full-text available
Background Retroviruses transcribe messenger RNA for the overlapping Gag and Gag-Pol polyproteins, by using a programmed -1 ribosomal frameshift which requires a slippery sequence and an immediate downstream stem-loop secondary structure, together called frameshift stimulating signal (FSS). It follows that the molecular evolution of this genomic region of HIV-1 is highly constrained, since the retroviral genome must contain a slippery sequence (sequence constraint), code appropriate peptides in reading frames 0 and 1 (coding requirements), and form a thermodynamically stable stem-loop secondary structure (structure requirement). Results We describe a unique computational tool, RNAsampleCDS, designed to compute the number of RNA sequences that code two (or more) peptides p,q in overlapping reading frames, that are identical (or have BLOSUM/PAM similarity that exceeds a user-specified value) to the input peptides p,q. RNAsampleCDS then samples a user-specified number of messenger RNAs that code such peptides; alternatively, RNAsampleCDS can exactly compute the position-specific scoring matrix and codon usage bias for all such RNA sequences. Our software allows the user to stipulate overlapping coding requirements for all 6 possible reading frames simultaneously, even allowing IUPAC constraints on RNA sequences and fixing GC-content. We generalize the notion of codon preference index (CPI) to overlapping reading frames, and use RNAsampleCDS to generate control sequences required in the computation of CPI. Moreover, by applying RNAsampleCDS, we are able to quantify the extent to which the overlapping coding requirement in HIV-1 [resp. HCV] contribute to the formation of the stem-loop [resp. double stem-loop] secondary structure known as the frameshift stimulating signal.
Using our software, we confirm that certain experimentally determined deleterious HCV mutations occur in positions for which our software RNAsampleCDS and RNAiFold both indicate a single possible nucleotide. We generalize the notion of codon preference index (CPI) to overlapping coding regions, and use RNAsampleCDS to generate control sequences required in the computation of CPI for the Gag-Pol overlapping coding region of HIV-1. These applications show that RNAsampleCDS constitutes a unique tool in the software arsenal now available to evolutionary biologists. Conclusion Source code for the programs and additional data are available at http://bioinformatics.bc.edu/clotelab/RNAsampleCDS/.
... Several Markov models have been developed for pairs of nucleotides [70, 71, 73-75, 77, 78, 80] and used to address a variety of phylogenetic questions [81, 126, 142-144]. However, each of these Markov models still assumes that the sites in the DNA have evolved under stationary, reversible, and homogeneous conditions, an issue already discussed. ...
[Figure caption fragment: ... assigned its own model of evolution, given the role it serves in the gene product (here, category 1 corresponds to a model assigned to sites encoding the anticodon, category 2 corresponds to a model assigned to sites that encode loops in the gene product, and category 3 corresponds to a model assigned to sites encoding the stems in the gene product). A more advanced approach uses structural information to link nucleotides that match each other in the gene product (c) (thin lines connect these pairs of nucleotides).]
Chapter
Full-text available
Most phylogenetic methods are model-based and depend on models of evolution designed to approximate the evolutionary processes. Several methods have been developed to identify suitable models of evolution for phylogenetic analysis of alignments of nucleotide or amino acid sequences and some of these methods are now firmly embedded in the phylogenetic protocol. However, in a disturbingly large number of cases, it appears that these models were used without acknowledgement of their inherent shortcomings. In this chapter, we discuss the problem of model selection and show how some of the inherent shortcomings may be identified and overcome.
... The programs that were used for these analyses included PAUP version 4.0b10 (Swofford 2002), RAxML version 7.0.3 (Stamatakis 2006), MrBayes version 3.1 (Huelsenbeck and Ronquist 2001, Ronquist and Huelsenbeck 2003), and PHASE package 2.0 (Jow et al. 2002, Hudelot et al. 2003, Gibson et al. 2005, Telford et al. 2005). ...
Article
The genera Elliptochloris and Pseudochlorella were erected for Chlorella-like green algae producing two types of autospores and cell packages, respectively. Both genera are widely distributed in different soil habitats, either free-living or as photobionts of lichens. The species of these genera are often difficult to identify because of the high phenotypic plasticity and sometimes lack of characteristic features. The taxonomic and nomenclatural status of these species therefore remains unclear. In this study, 34 strains were investigated using an integrative approach. Phylogenetic analyses demonstrated that the isolates belong to two independent lineages of the Trebouxiophyceae (Elliptochloris and Prasiola clades) and confirmed that the genera are not closely related. The comparison of morphology, molecular phylogeny, and analyses of secondary structures of SSU and ITS rDNA sequences revealed that all of the strains belong to three genera: Elliptochloris, Pseudochlorella, and Edaphochlorella. As a consequence of the taxonomic revisions, we propose two new combinations (Elliptochloris antarctica and Pseudochlorella signiensis), and validate Elliptochloris reniformis, which is invalidly described according to the International Code for Nomenclature (ICN), by designating a holotype. To reflect the high phenotypic plasticity of Pseudochlorella signiensis, two new varieties were described: P. signiensis var. magna and P. signiensis var. communis. Chlorella mirabilis was not closely related to any of these genera and was therefore transferred to the new genus Edaphochlorella. All of the taxonomical changes were highly supported by all phylogenetic analyses and were confirmed by the ITS-2 Barcodes using the ITS-2/CBC approach.
... Phylogenetic trees were inferred using neighbor joining-based, Bayesian-based, and maximum likelihood-based methods as implemented in ClustalW v.2.1 (Chenna et al. 2003), Phase v. 2.0 (Jow et al. 2002; Hudelot et al. 2003) and FastTree v. 2.1.7 (Price et al. 2010), respectively. ...
Article
Full-text available
The genetic code is the cellular translation table for the conversion of nucleotide sequences into amino acid sequences. Changes to the meaning of sense codons would introduce errors into almost every translated message and are expected to be highly detrimental. However, reassignment of single or multiple codons in mitochondria and nuclear genomes, although extremely rare, demonstrates that the code can evolve. Several models for the mechanism of alteration of nuclear genetic codes have been proposed (including 'codon capture', 'genome streamlining' and 'ambiguous intermediate' theories), but with little resolution. Here, we report a novel sense codon reassignment in Pachysolen tannophilus, a yeast related to the Pichiaceae. By generating proteomics data and using tRNA sequence comparisons we show that Pachysolen translates CUG codons as alanine and not as the more usual leucine. The Pachysolen tRNA(CUG) is an anticodon-mutated tRNA(Ala) containing all major alanine tRNA recognition sites. The polyphyly of the CUG-decoding tRNAs in yeasts is best explained by a tRNA loss driven codon reassignment mechanism. Loss of the CUG-tRNA in the ancient yeast is followed by gradual decrease of respective codons and subsequent codon capture by tRNAs whose anticodon is not part of the aminoacyl-tRNA synthetase recognition region. Our hypothesis applies to all nuclear genetic code alterations and provides several testable predictions. We anticipate more codon reassignments to be uncovered in existing and upcoming genome projects.
... Many phylogenetic analyses of nematode rRNA sequences are based on datasets obtained using progressive multiple alignment rather than secondary structure; the latter are more difficult and time consuming to produce (Gardner et al., 2005). There are several theoretical advantages of alignments based on secondary structure, including greater potential accuracy of positional homology inference within features of rRNA such as stems that are conserved across distantly related species in the absence of significant sequence similarity and the ability to identify base-pairing and non-pairing positions and apply different paired-site models to these regions such as implemented in PHASE (Hudelot et al., 2003), MrBayes (Ronquist and Huelsenbeck, 2003), or RAxML (Stamatakis, 2014). Theoretically, incorporating structural information in analysis of rRNA sequences, including modeling compensatory (and non-independent) evolution of paired-site regions, should increase phylogenetic accuracy (Gillespie, 2004). ...
Article
Full-text available
Near-full-length 18S and 28S rRNA gene sequences were obtained for 33 nematode species. Datasets were constructed based on secondary structure and progressive multiple alignments, and clades were compared for phylogenies inferred by Bayesian and maximum likelihood methods. Clade comparisons were also made following removal of ambiguously aligned sites as determined using the program ProAlign. Different alignments of these data produced tree topologies that differed, sometimes markedly, when analyzed by the same inference method. With one exception, the same alignment produced an identical tree topology when analyzed by different methods. Removal of ambiguously aligned sites altered the tree topology and also reduced resolution. Nematode clades were sensitive to differences in multiple alignments, and more than doubling the amount of sequence data by addition of 28S rRNA did not fully mitigate this result. Although some individual clades showed substantially higher support when 28S data were combined with 18S data, the combined analysis yielded no statistically significant increases in the number of clades receiving higher support when compared to the 18S data alone. Secondary structure alignment increased accuracy in positional homology assignment and, when used in combination with paired-site substitution models, these structural hypotheses of characters and improved models of character state change yielded high levels of phylogenetic resolution. Phylogenetic results included strong support for inclusion of Daubaylia potomaca within Cephalobidae, whereas the position of Fescia grossa within Tylenchina varied depending on the alignment, and the relationships among Rhabditidae, Diplogastridae, and Bunonematidae were not resolved.
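The separation of base-pairing from non-pairing positions that paired-site models rely on can be sketched as follows. This is a minimal illustration under stated assumptions: the secondary structure is assumed to be given in dot-bracket notation without pseudoknots, the function name `partition_sites` is hypothetical, and nothing here reflects the input format of PHASE, MrBayes, or RAxML.

```python
def partition_sites(structure: str) -> tuple[list[tuple[int, int]], list[int]]:
    """Split alignment columns into paired and unpaired sites.

    `structure` is an assumed dot-bracket string: '(' and ')' mark the two
    sides of a base pair, '.' marks an unpaired position. Returns the list
    of (i, j) column pairs and the list of unpaired column indices (0-based).
    """
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at column {i}")
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    paired = {i for pair in pairs for i in pair}
    unpaired = [i for i in range(len(structure)) if i not in paired]
    return sorted(pairs), unpaired


# A two-base-pair stem closing a two-base loop, plus one trailing free base.
pairs, unpaired = partition_sites("((..)).")
print(pairs)     # [(0, 5), (1, 4)]
print(unpaired)  # [2, 3, 6]
```

In a mixed-model analysis, the paired columns would then be assigned to a paired-site model and the unpaired columns to a single-site model.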
... The phylogenetic tree (Fig 2) was inferred by distance (neighbor-joining [NJ] using the GTR+I+G model), maximum parsimony (MP), and maximum likelihood (ML; using GTR+I+G) criteria using PAUP version 4.0b10 [51], by randomized accelerated maximum likelihood using RAxML version 7.0.3 [52], and by Bayesian inference (BI) using MrBayes version 3.1 [53,54] and the PHASE package 2.0 [55][56][57][58][59]. The RAxML analyses of the concatenated dataset were performed partitioned according to their genes. ...
... (Stamatakis, 2006), MrBayes version 3.2.3 (Ronquist et al., 2012), and PHASE package 2.0 (Jow et al., 2002, Hudelot et al., 2003, Gibson et al., 2005, Telford et al., 2005). ...
Article
The genus Chlorella (in its traditional sense) is polyphyletic and belongs to at least twelve independent lineages of the Trebouxiophyceae and Chlorophyceae. Most of the aquatic species belong to the Chlorella and Parachlorella clades (within the so-called Chlorella-lineage of the Trebouxiophyceae), or to the genera Scenedesmus and Mychonastes (within the DO-group of the Chlorophyceae) according to phylogenetic analyses of the SSU and ITS rDNA sequences. In contrast to the aquatic species, the terrestrial strains investigated so far form a monophyletic lineage (Watanabea-clade) within the Trebouxia-lineage of the Trebouxiophyceae. Several genera with Chlorella-like morphology (Chloroidium, Heterochlorella, Watanabea, Kalinella, Viridiella and others) belong to the Watanabea clade. We studied 22 strains isolated from soil, bark, and artificial hard substrates, which have been traditionally identified as Chlorella luteoviridis or as unidentified Chlorella. To clarify the taxonomical status and intrageneric diversity of this group, we used an integrated approach (molecular phylogeny of SSU and ITS rDNA sequences, secondary structures, DNA barcoding, and morphology) including the ecological distribution. All investigated strains showed a low phenotypic plasticity, but a high genetic diversity, which could be only resolved in complex phylogenetic analyses based on the secondary structures of the investigated genes. Considering these results, we reestablished the genus Jaagichlorella for Heterochlorella and Heveochlorella, and proposed new combinations (J. luteoviridis, J. hainangensis, J. roystonensis, and J. sphaerica) as well as the new species, J. africana.
... Wible & Covert, 1987;Kay et al. 1992;Godinot, 2007). Clearly, the phylogenetic relationships between Dermoptera, Primates, and Scandentia cannot be considered resolved yet, with a minority of molecular analyses questioning the monophyly of Euarchonta altogether (Bailey et al. 1992;Porter et al. 1996;Nishihara et al. 2002;Schmitz et al. 2002;Hudelot et al. 2003;Zhou et al. 2015). Within primates, increasingly well-supported phylogenies are readily available for phylogenetic analysis, not least thanks to the on-going work of the 10 K Trees Project (Arnold et al. 2010). ...
Article
Ecomorphology - the characterisation of the adaptive relationship between an organism's morphology and its ecological role - has long been central to theories of the origin and early evolution of the primate order. This is exemplified by two of the most influential theories of primate origins: Matt Cartmill's Visual Predation Hypothesis, and Bob Sussman's Angiosperm Co-Evolution Hypothesis. However, the study of primate origins is constrained by the absence of data directly documenting the events under investigation, and has to rely instead on a fragmentary fossil record and the methodological assumptions inherent in phylogenetic comparative analyses of extant species. These constraints introduce particular challenges for inferring the ecomorphology of primate origins, as morphology and environmental context must first be inferred before the relationship between the two can be considered. Fossils can be integrated in comparative analyses and observations of extant model species and laboratory experiments of form-function relationships are critical for the functional interpretation of the morphology of extinct species. Recent developments have led to important advancements, including phylogenetic comparative methods based on more realistic models of evolution, and improved methods for the inference of clade divergence times, as well as an improved fossil record. This contribution will review current perspectives on the origin and early evolution of primates, paying particular attention to their phylogenetic (including cladistic relationships and character evolution) and environmental (including chronology, geography, and physical environments) contextualisation, before attempting an up-to-date ecomorphological synthesis of primate origins.
... For the loop regions, the evolutionary model GTR was used. The stem regions were analyzed under the S16 evolutionary model, thus taking into account secondary structure topology, i.e., compensatory mutations [61][62][63]. Substitution rate heterogeneity was taken into account using the gamma model of Yang [64]. Bootstrap values were computed over 1,000 replicates. ...
... For the mitochondrial network, the haploid mitochondrial genes were concatenated and the median joining network option was used to calculate relationships among individuals in the network based on pairwise differences among haplotypes (Bandelt et al. 1999). To obtain haplotypes from the diploid nuclear DNA, we used the program seqPHASE (Flot 2010) to prepare our FASTA file for use in the computer program PHASE, which is a software package for phylogenetics and sequence evolution (Jow et al. 2002;Hudelot et al. 2003), to predict haplotypes for each individual. The two haplotypes for each individual were used to estimate a nuclear DNA haplotype network using the median joining network in PopART. ...
Conference Paper
Full-text available
The advent of modern molecular techniques has enabled the discovery of unimagined genetic diversity within what were believed to represent widespread, largely homogeneous species. While early work depended on genetic distances determined by the analysis of allozymes, researchers now rely heavily on gene sequences because of their ability to detect differences among silent-site mutations and non-coding genes. They can also determine relationships directly from ancestor-descendent lineages. A quarter of a century ago the widespread taxon Plethodon glutinosus (Slimy Salamander) was split into 14 parapatric species based on allozyme analysis. Many workers have investigated the relationships among the named forms using DNA-sequence data. However, virtually none has conducted a fine-scale analysis of the phylogeography of single purported species. We sequenced four genes (three mitochondrial and one nuclear) for populations across the range of P. ocmulgee, which is distributed across the lower Piedmont and Coastal Plain of Georgia. We also examined populations of neighboring taxa within these regions. We found that genetic divergence is significantly related to geographic distance and climatic similarity with major rivers possibly also contributing as a barrier to gene flow. Sequence data did not conform to currently recognized taxa. This could be due to (1) introgression, (2) incomplete lineage assortment, or (3) the failure of allozyme-based distance measures to accurately define species and their boundaries.
... The phylogenetic tree (Fig 2) was inferred by distance (neighbor-joining [NJ] using the GTR+I+G model), maximum parsimony (MP), and maximum likelihood (ML; using GTR+I+G) criteria using PAUP version 4.0b10 [51], by randomized accelerated maximum likelihood using RAxML version 7.0.3 [52], and by Bayesian inference (BI) using MrBayes version 3.1 [53,54] and the PHASE package 2.0 [55][56][57][58][59]. The RAxML analyses of the concatenated dataset were performed partitioned according to their genes. ...
Article
Full-text available
Integrative taxonomy is an approach for defining species and genera by taking phylogenetic, morphological, physiological, and ecological data into account. This approach is appropriate for microalgae, where morphological convergence and high levels of morphological plasticity complicate the application of the traditional classification. Although DNA barcode markers are well-established for animals, fungi, and higher plants, there is an ongoing discussion about suitable markers for microalgae and protists because these organisms are genetically more diverse compared to the former groups. To solve these problems, we assess the usage of a polyphasic approach combining phenotypic and genetic parameters for species and generic characterization. The application of barcode markers for database queries further allows conclusions about the 'coverage' of culture-based approaches in biodiversity studies and integrates additional aspects into modern taxonomic concepts. Although the culture-dependent approach revealed three new lineages, which are described as new species in this paper, the culture-independent analyses discovered additional putative new species. We evaluated three barcode markers (V4, V9 and ITS-2 regions, nuclear ribosomal operon) and studied the morphological and physiological plasticity of Coccomyxa, which became a model organism because its whole genome sequence has been published. In addition, several biotechnological patents have been registered for Coccomyxa. Coccomyxa representatives are distributed worldwide, are free-living or in symbioses, and colonize terrestrial and aquatic habitats. We investigated more than 40 strains and reviewed the biodiversity and biogeographical distribution of Coccomyxa species using DNA barcoding. The genus Coccomyxa formed a monophyletic group within the Trebouxiophyceae separated into seven independent phylogenetic lineages representing species. 
Summarizing, the combination of different characteristics in an integrative approach helps to evaluate environmental data and clearly identifies microalgae at generic and species levels.
... For RNA encoding regions, however, secondary structure dictates that this is not a valid assumption, so that paired-site models developed to deal specifically with stem structures in RNA encoding sequences are more appropriate (Jow et al., 2002;Hudelot et al., 2003). It has also been shown that phylogenetic reconstructions that employ independent assumptions for non-independent data can over-estimate support (in terms of bootstrap) for internal branches (Jow et al., 2002;Galtier, 2004;Smith et al., 2004). ...
... In this study, only protein-encoding genes in the mitogenomes were analyzed. Hudelot et al. (2003) instead analyzed mitogenome tRNAs and rRNAs. Despite sparse taxon sampling, their Bayesian tree of 69 mammals, obtained using both paired and unpaired regions of mitochondrial rRNA and tRNA genes, was mostly consistent with the well-established nuclear tree including the sister group position of Dermoptera with Primates, although the sister group relationship of Tarsiidae with Strepsirrhini was preferred. ...
Article
Full-text available
Although molecular phylogenetics is a strong tool for reconstructing the tree of life, many problems persist due to systematic errors caused by model mis-specifications. Resolving misconstructed trees should lead us to better understand the processes of molecular evolution. Mammalian mitogenomes provide us with a good opportunity in this respect, because the mammalian tree is well established on the basis of multiple nuclear genes, and mitogenome trees are sometimes in conflict with it, for example concerning the positions of tarsiers and colugos. The utility of mitogenomes as a phylogenetic marker is therefore sometimes questioned, and an important problem is whether any method can overcome the misleading phylogenetic signals of mitogenomes. Here we show that the maximum likelihood tree of 463 eutherian mitogenomes reconstructed from nucleotide sequences of protein-encoding genes gives positions of tarsiers and colugos that are consistent with the well-established nuclear tree; this is the first study to obtain a consistent tree with respect to the positions of tarsiers and colugos using mitogenomes. Furthermore, our mitogenome tree of the 463 eutherians is mostly consistent with the nuclear gene tree. Previous mitogenomic studies have been hampered by sparse taxon sampling, and our analysis demonstrates the importance of dense taxon sampling to relieve the misleading phylogenetic signals of mitogenomes. However, because there are many convergent and parallel substitutions in the amino acid sequences, the effect of dense taxon sampling on the accuracy of tree reconstruction seems to be very limited. We further show the importance of using synonymous substitutions with dense taxon sampling as well as with appropriate modeling in recovering the well-established tree from lower to even higher levels of eutherian phylogeny.
... A further reason is that compensatory substitutions in paired regions of RNA genes are not directly accounted for by the phylogenetic inference methods unless explicitly modeled, such as the doublet model (Schöniger and Von Haeseler, 1994) used by Letsch et al. (2010). Therefore, substitutions at each site are assumed to be independent of one another possibly resulting in conflicting signal to the PCGs partition (Hudelot et al., 2003). ...
... The robustness of the tree topology was tested by bootstrapping using distance (neighbor-joining; NJ using the GTR+I+G model; 1000 replicates), parsimony (MP; 1000 replicates), and maximum likelihood (ML; using GTR+I+G; 1000 replicates) methods in PAUP, using randomized accelerated maximum likelihood (10000 replicates) using the RAxML version 7.3.0 (Stamatakis 2006) and by Bayesian inference (BI; 5 million generations) using the MrBayes version 3.2.1 (Huelsenbeck & Ronquist 2001, Ronquist & Huelsenbeck 2003) and the PHASE package 2.0 (Gibson et al. 2005, Telford et al. 2005, Hudelot et al. 2003, Jow et al. 2002). The RAxML analyses of the data set were performed partitioned according to their genes. ...
Article
Full-text available
Members of the cosmopolitan green algal genus Klebsormidium (Klebsormidiales, Streptophyta) are typical components of biological soil crusts, which exert many important ecological functions. In the present study four different Klebsormidium genotypes according to the clades of Rindi et al. (2011) were isolated from alpine soil crusts in the Tyrolean Alps, Austria between 649 and 2435 m a.s.l. The photophysiological performance was investigated under increasing photon fluence rates using an oxygen optode. Although isolate-specific response patterns could be documented, the differences were rather small. All data clearly indicated very low light requirements in the four Klebsormidium strains as reflected in low light compensation as well as in low light saturation points. In spite of this shade acclimation, no indication of photoinhibition was observed in 3 out of 4 isolates, at least up to the maximum applied photon fluence rate of 500 μmol photons PAR m–2 s–1. The remaining strain exhibited a small decrease in maximum oxygen development under the highest photon fluence rate tested. Dark respiration was measured directly before and after the application of the different light levels, and all Klebsormidium isolates showed strongly enhanced rates after treatment with the highest photon fluence rate. Although the photophysiological data showed some differences, the response patterns in the four different genotypes of Klebsormidium were relatively similar, which well explains the widespread abundance of members of this genus in biological soil crusts of the alpine regions of the Tyrolean Alps.
Preprint
The genetic code is the universal cellular translation table to convert nucleotide into amino acid sequences. Changes to sense codons are expected to be highly detrimental. However, reassignments of single or multiple codons in mitochondria and nuclear genomes demonstrated that the code can evolve. Still, alterations of nuclear genetic codes are extremely rare leaving hypotheses to explain these variations, such as the ‘codon capture’, the ‘genome streamlining’ and the ‘ambiguous intermediate’ theory, in strong debate. Here, we report on a novel sense codon reassignment in Pachysolen tannophilus, a yeast related to the Pichiaceae. By generating proteomics data and using tRNA sequence comparisons we show that in Pachysolen CUG codons are translated as alanine and not as the universal leucine. The polyphyly of the CUG-decoding tRNAs in yeasts is best explained by a tRNA loss driven codon reassignment mechanism. Loss of the CUG-tRNA in the ancient yeast is followed by gradual decrease of respective codons and subsequent codon capture by tRNAs whose anticodon is outside the aminoacyl-tRNA synthetase recognition region. Our hypothesis applies to all nuclear genetic code alterations and provides several testable predictions. We anticipate more codon reassignments to be uncovered in existing and upcoming genome projects.
Article
Full-text available
Endosymbioses between coccoid green algae and ciliates are widely distributed and occur in various phylogenetic lineages among the Ciliophora. Most mixotrophic ciliates live in symbiosis with different species and genera of the so-called Chlorella clade (Trebouxiophyceae). The mixotrophic ciliates can be differentiated into two groups: (i) obligate, which always live in symbiosis with such green algae and are rarely algae-free, and (ii) facultative, which form an association with algae only under certain circumstances, such as in anoxic environments. A case of facultative endosymbiosis is found in the recently described species of Tetrahymena, T. utriculariae, which lives in the bladder traps of the carnivorous aquatic plant Utricularia reflexa. The green endosymbiont of this ciliate belonged to the genus Micractinium. We characterized the isolated algal strain using an integrative approach and compared it to all described species of this genus. The phylogenetic analyses using complex evolutionary secondary structure-based models revealed that this endosymbiont represents a new species of Micractinium, M. tetrahymenae sp. nov., which was further confirmed by the ITS2/CBC approach.
Conference Paper
Sequence comparison is an important part of bioinformatics for understanding the biological properties of genomes. Although alignment-based sequence comparison is the traditional and reliable approach, alignment-free methods have been actively researched because of their advantage in computational complexity. In this paper, we propose a new alignment-free genome comparison scheme based on a statistical approach. Word frequency information is estimated from the sequence components. By investigating the relationship between the estimated frequency information and the actual word frequencies, the characteristics of the sequence are represented numerically. A phylogenetic tree and a sequence classification of mammalian sequences are provided to demonstrate the performance of our statistical algorithm.
Chapter
Full-text available
This review summarizes some major events in the evolution of body plans along the backbone of the arthropod tree, with a special focus on the origin of insects. The incompatibility among recent molecular phylogenies motivates a discussion about possible causes for failures: there is a worrisome lack of information in alignments, which can be visualized with spectra of split-supporting positions, and there are systematic errors occurring even when using correct models in maximum likelihood methods (Kück et al., this book). Currently, these problems cannot be avoided. Combining information from the fossil record and from extant arthropods, the morphology-based evolutionary scenario leads from worm-like stem-lineage arthropods via first euarthropods to the crown group of Mandibulata. The evolution of the mandibulate head is well documented in the Cambrian Orsten fossils. The evolution within crustaceans is also the evolution that leads to characters of the bauplan of myriapods and insects. It is argued that morphologically myriapods do not fit to the base of the mandibulatan tree and that this placement is also not plausible from a paleontological point of view. Available morphological evidence suggests that myriapods are the sister-group to Hexapoda and that tracheates evolved from a marine ancestor that was similar in many ways to Remipedia. In the extant fauna, the Remipedia are the sister-group of Tracheata.
Article
Full-text available
Most molecular phylogenetic studies place all placental mammals into four superordinal groups, Laurasiatheria (e.g. dogs, bats, whales), Euarchontoglires (e.g. humans, rodents, colugos), Xenarthra (e.g. armadillos, anteaters) and Afrotheria (e.g. elephants, sea cows, tenrecs), and estimate that these clades last shared a common ancestor 90–110 million years ago. This phylogeny has provided a framework for numerous functional and comparative studies. Despite the high level of congruence among most molecular studies, questions still remain regarding the position and divergence time of the root of placental mammals, and certain ‘hard nodes’ such as the Laurasiatheria polytomy and Paenungulata that seem impossible to resolve. Here, we explore recent consensus and conflict among mammalian phylogenetic studies and explore the reasons for the remaining conflicts. The question of whether the mammal tree of life is or can be ever resolved is also addressed. This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.
Chapter
RNA Structure and Evolution; Fitting Evolutionary Models to Sequence Data; Applications of Molecular Phylogenetics; Summary; References
Chapter
Understanding Phylogenetic Trees; Choosing Sequences; Distance Matrices and Clustering Methods; Bootstrapping; Tree Optimization Criteria and Tree Search Methods; The Maximum-Likelihood Criterion; The Parsimony Criterion; Other Methods Related to Maximum Likelihood; Summary; References; Problems; Self-Test
Article
Full-text available
Annotation of orthologous and paralogous genes is necessary for many aspects of evolutionary analysis. Methods to infer these homology relationships have traditionally focused on protein-coding genes and evolutionary models used by these methods normally assume the positions in the protein evolve independently. However, as our appreciation for the roles of non-coding RNA genes has increased, consistently annotated sets of orthologous and paralogous ncRNA genes are increasingly needed. At the same time, methods such as PHASE or RAxML have implemented substitution models that consider pairs of sites to enable proper modelling of the loops and other features of RNA secondary structure. Here, we present a comprehensive analysis pipeline for the automatic detection of orthologues and paralogues for ncRNA genes. We focus on gene families represented in Rfam and for which a specific covariance model is provided. For each family ncRNA genes found in all Ensembl species are aligned using Infernal, and several trees are built using different substitution models. In parallel, a genomic alignment that includes the ncRNA genes and their flanking sequence regions is built with PRANK. This alignment is used to create two additional phylogenetic trees using the neighbour-joining (NJ) and maximum-likelihood (ML) methods. The trees arising from both the ncRNA and genomic alignments are merged using TreeBeST, which reconciles them with the species tree in order to identify speciation and duplication events. The final tree is used to infer the orthologues and paralogues following Fitch's definition. We also determine gene gain and loss events for each family using CAFE. All data are accessible through the Ensembl Comparative Genomics (‘Compara’) API, on our FTP site and are fully integrated in the Ensembl genome browser, where they can be accessed in a user-friendly manner. Database URL: http://www.ensembl.org
Article
Empirical models of substitution are often used in protein sequence analysis because the large alphabet of amino acids requires that many parameters be estimated in all but the simplest parametric models. When information about structure is used in the analysis of substitutions in structured RNA, a similar situation occurs. The number of parameters necessary to adequately describe the substitution process increases in order to model the substitution of paired bases. We have developed a method to obtain substitution rate matrices empirically from RNA alignments that include structural information in the form of base pairs. Our data consisted of alignments from the European Ribosomal RNA Database of Bacterial and Eukaryotic Small Subunit and Large Subunit Ribosomal RNA (Wuyts et al. 2001. Nucleic Acids Res. 29:175-177; Wuyts et al. 2002. Nucleic Acids Res. 30:183-185). Using secondary structural information, we converted each sequence in the alignments into a sequence over a 20-symbol code: one symbol for each of the four individual bases, and one symbol for each of the 16 ordered pairs. Substitutions in the coded sequences are defined in the natural way, as observed changes between two sequences at any particular site. For given ranges (windows) of sequence divergence, we obtained substitution frequency matrices for the coded sequences. Using a technique originally developed for modeling amino acid substitutions (Veerassamy, Smith, and Tillier. 2003. J. Comput. Biol. 10:997-1010), we were able to estimate the actual evolutionary distance for each window. The actual evolutionary distances were used to derive instantaneous rate matrices, and from these we selected a universal rate matrix. The universal rate matrices were incorporated into the Phylip Software package (Felsenstein 2002. http://evolution.genetics.washington.edu/phylip.html), and we analyzed the ribosomal RNA alignments using both distance and maximum likelihood methods.
The empirical substitution models performed well on simulated data, and produced reasonable evolutionary trees for 16S ribosomal RNA sequences from sequenced Bacterial genomes. Empirical models have the advantage of being easily implemented, and the fact that the code consists of 20 symbols makes the models easily incorporated into existing programs for protein sequence analysis. In addition, the models are useful for simulating the evolution of RNA sequence and structure simultaneously.
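The 20-symbol recoding described in this abstract can be illustrated with a minimal sketch. It assumes the secondary structure is supplied as a dot-bracket string and that each paired site is coded by the ordered pair (its own base, its partner's base); neither detail is stated in the abstract, and the names `ALPHABET`, `pair_partners`, and `encode` are hypothetical.

```python
from itertools import product

BASES = "ACGU"
# 4 single-base symbols followed by the 16 ordered pairs: 20 symbols in total.
ALPHABET = list(BASES) + [a + b for a, b in product(BASES, repeat=2)]


def pair_partners(structure):
    """Map each paired index to its partner index (dot-bracket input, an assumption)."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def encode(sequence, structure):
    """Recode an RNA sequence over the 20-symbol alphabet.

    Unpaired sites keep their single-base symbol; paired sites become the
    ordered pair (base at this site, base at the partner site).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = pair_partners(structure)
    coded = []
    for i, base in enumerate(sequence):
        coded.append(base + sequence[partner[i]] if i in partner else base)
    return coded


if __name__ == "__main__":
    # A toy hairpin: two base pairs closing a three-base loop.
    print(encode("GGAAACC", "((...))"))
```

Because every coded site is one of 20 symbols, the recoded alignment has the same alphabet size as a protein alignment, which is the property the abstract points to when it mentions reuse of protein-analysis programs.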
Article
Full-text available
Many waterfowl have a long history of domestication for socioeconomic reasons. However, phylogenetic relationships in this group remain contradictory. In this study, divergence times among waterfowl were inferred from mitochondrial RNAs, which have a unique mode of inheritance, in order to elucidate speciation events in domestic geese. First, total DNA was extracted from Taihue goose and Landaise goose, which represent Chinese and European geese, respectively, and mitochondrial RNAs were amplified by PCR. Second, 10 complete mtDNA sequences were retrieved from GenBank. A phylogenetic tree was then reconstructed with a Bayesian method and RNA secondary structure, with the ostrich mtDNA assigned as the outgroup. Finally, divergence times were inferred with r8s, using the Anser-Branta split at 7.15 Mya as the fossil calibration. The results place the common ancestor of Chinese and European geese at 0.61 Mya, and the split between Muscovy and Mallard in the early Miocene, i.e., 15.5 Mya. By associating divergence times with geological events in the Pleistocene, we inferred that climatic cycles, with successive glacial and interglacial periods, had an important impact on the speciation of the two domestic geese. These results are useful for the conservation and utilization of genetic resources in waterfowl.
Article
Many of the known microRNAs are encoded in polycistronic transcripts. Here, we reconstruct the evolutionary history of the mir17 microRNA clusters which consist of miR-17, miR-18, miR-19a, miR-19b, miR-20, miR-25, miR-92, miR-93, miR-106a, and miR-106b. The history of this cluster is governed by an initial phase of local (tandem) duplications, a series of duplications of entire clusters and subsequent loss of individual microRNAs from the resulting paralogous clusters. The complex history of the mir17 microRNA family appears to be closely linked to the early evolution of the vertebrate lineage.
Article
Deciphering relationships among the orders of placental mammals remains an important problem in evolutionary biology and has implications for understanding patterns of morphological character evolution, reconstructing the ancestral placental genome, and evaluating the role of plate tectonics and dispersal in the biogeographic history of this group. Until recently, both molecular and morphological studies provided only a limited and questionable resolution of placental relationships. Studies based on larger and more diverse molecular datasets, and using an array of methodological approaches, are now converging on a stable tree topology with four major groups of placental mammals. The emerging tree has revealed numerous instances of convergent evolution and suggests a role for plate tectonics in the early evolutionary history of placental mammals. The reconstruction of mammalian phylogeny illustrates both the pitfalls and the powers of molecular systematics.
Article
Full-text available
We explore the tree of mammalian mtDNA sequences, using particularly the LogDet transform on amino acid sequences, the distance Hadamard transform, and the Closest Tree selection criterion. The amino acid composition of different species show significant differences, even within mammals. After compensating for these differences, nearest-neighbor bootstrap results suggest that the tree is locally stable, though a few groups show slightly greater rearrangements when a large proportion of the constant sites are removed. Many parts of the trees we obtain agree with those on published protein ML trees. Interesting results include a preference for rodent monophyly. The detection of a few alternative signals to those on the optimal tree were obtained using the distance Hadamard transform (with results expressed as a Lento plot). One rearrangement suggested was the interchange of the position of primates and rodents on the optimal tree. The basic stability of the tree, combined with two calibration points (whale/cow and horse/rhinoceros), together with a distant secondary calibration from the mammal/bird divergence, allows inferences of the times of divergence of putative clades. Allowing for sampling variances due to finite sequence length, most major divergences amongst lineages leading to modern orders, appear to occur well before the Cretaceous/Tertiary (K/T) boundary. Implications arising from these early divergences are discussed, particularly the possibility of competition between the small dinosaurs and the new mammal clades.
Article
Full-text available
As a discipline, phylogenetics is becoming transformed by a flood of molecular data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a number of outstanding issues in evolutionary biology, including the analysis of large phylogenetic trees and complex evolutionary models and the detection of the footprint of natural selection in DNA sequences.
Article
Full-text available
The traditional views regarding the mammalian order Insectivora are that the group descended from a single common ancestor and that it is comprised of the following families: Soricidae (shrews), Tenrecidae (tenrecs), Solenodontidae (solenodons), Talpidae (moles), Erinaceidae (hedgehogs and gymnures), and Chrysochloridae (golden moles). Here we present a molecular analysis that includes representatives of all six families of insectivores, as well as 37 other taxa representing marsupials, monotremes, and all but two orders of placental mammals. These data come from complete sequences of the mitochondrial 12S rRNA, tRNA-Valine, and 16S rRNA genes (2.6 kb). A wide range of different methods of phylogenetic analysis groups the tenrecs and golden moles (both endemic to Africa) in an all-African superordinal clade comprised of elephants, sirenians, hyracoids, aardvark, and elephant shrews, to the exclusion of the other four remaining families of insectivores. Statistical analyses reject the idea of a monophyletic Insectivora as well as traditional concepts of the insectivore suborder Soricomorpha. These findings are supported by sequence analyses of several nuclear genes presented here: vWF, A2AB, and α-β hemoglobin. These results require that the order Insectivora be partitioned and that the two African families (golden moles and tenrecs) be placed in a new order. The African superordinal clade now includes six orders of placental mammals.
Article
Full-text available
The precise hierarchy of ancient divergence events that led to the present assemblage of modern placental mammals has been an area of controversy among morphologists, palaeontologists and molecular evolutionists. Here we address the potential weaknesses of limited character and taxon sampling in a comprehensive molecular phylogenetic analysis of 64 species sampled across all extant orders of placental mammals. We examined sequence variation in 18 homologous gene segments (including nearly 10,000 base pairs) that were selected for maximal phylogenetic informativeness in resolving the hierarchy of early mammalian divergence. Phylogenetic analyses identify four primary superordinal clades: (I) Afrotheria (elephants, manatees, hyraxes, tenrecs, aardvark and elephant shrews); (II) Xenarthra (sloths, anteaters and armadillos); (III) Glires (rodents and lagomorphs), as a sister taxon to primates, flying lemurs and tree shrews; and (IV) the remaining orders of placental mammals (cetaceans, artiodactyls, perissodactyls, carnivores, pangolins, bats and core insectivores). Our results provide new insight into the pattern of the early placental mammal radiation.
Article
Full-text available
Molecular phylogenetic studies have resolved placental mammals into four major groups, but have not established the full hierarchy of interordinal relationships, including the position of the root. The latter is critical for understanding the early biogeographic history of placentals. We investigated placental phylogeny using Bayesian and maximum-likelihood methods and a 16.4-kilobase molecular data set. Interordinal relationships are almost entirely resolved. The basal split is between Afrotheria and other placentals, at about 103 million years, and may be accounted for by the separation of South America and Africa in the Cretaceous. Crown-group Eutheria may have their most recent common ancestry in the Southern Hemisphere (Gondwana).
Article
Full-text available
where. Abstracts to talks and posters presented at this meeting can be found at www.utexas.edu/ftp/depts/systbiol/. The talks fell into three main sections, which we will now consider, followed by a summary where we present our current best estimate of the tree for placental mammals.
The Age of Intraordinal Divergences
Perhaps even more than the tree of relationships, the ages molecular divergence times are suggesting have caused greatest consternation to morphologists. Part of this seems to be semantics. For example, when some authors suggest that perissodactyls originated well back in the late Cretaceous, it is not always clear if they mean the stem group Perissodactyla (i.e., all species more
Article
Full-text available
A recent analysis of amino acid sequence data (Graur et al.) suggested that the mammalian order Rodentia is polyphyletic, in contrast to most morphological data, which support rodent monophyly. At issue is whether the hystricognath rodents, such as the guinea pig, represent an independent evolutionary lineage within mammals, separate from the sciurognath rodents. To resolve this problem, we sequenced a region (2,645 bp) of the mitochondrial genome of the guinea pig containing the complete 12S ribosomal RNA, 16S ribosomal RNA, and transfer RNA(VAL) genes for comparison with the available sciurognath and other mammalian sequences. Several methods of analysis and statistical tests of the data all show strong support for rodent monophyly (91%-98% bootstrap probability, or BP). Calibration with the mammalian fossil record suggests a Cretaceous date (107 mya) for the divergence of sciurognaths and hystricognaths. An older date (38 mya) for the controversial Mus-Rattus divergence also is supported by these data. Our neighbor-joining analyses of all available sequence data (25 genes) confirm that some individual genes support rodent polyphyly but that tandem analysis of all data does not. We propose that the conflicting results are due to several compounding factors. The unique biochemical properties of some hystricognath metabolic proteins, largely responsible for generating this controversy, may have a single explanation: a cascade effect resulting from inactivation of the zinc-binding abilities of insulin. After excluding six genes possibly affected by insulin inactivation, analyses of all available sequence data (7,117 nucleotide sites, 3,099 amino acid sites) resulted in strong support for rodent monophyly (94% BP for DNA sequences, 90% for protein sequences), which lends support to the insulin-cascade hypothesis.
Article
Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called the "discrete gamma model," uses several categories of rates to approximate the gamma distribution, with equal probability for each category. The mean of each category is used to represent all the rates falling in the category. The performance of this method is found to be quite good, and four such categories appear to be sufficient to produce both an optimum, or near-optimum fit by the model to the data, and also an acceptable approximation to the continuous distribution. The second method, called "fixed-rates model", classifies sites into several classes according to their rates predicted assuming the star tree. Sites in different classes are then assumed to be evolving at these fixed rates when other tree topologies are evaluated. Analyses of the data sets suggest that this method can produce reasonable results, but it seems to share some properties of a least-squares pairwise comparison; for example, interior branch lengths in nonbest trees are often found to be zero. The computational requirements of the two methods are comparable to that of Felsenstein's (1981, J Mol Evol 17:368-376) model, which assumes a single rate for all the sites.
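The discrete gamma construction described in this abstract can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the original implementation: the function names (`reg_lower_gamma`, `gamma_quantile`, `discrete_gamma_rates`) are hypothetical, and the sketch assumes the gamma distribution is scaled to mean 1 (shape and rate both equal to alpha), which is a common convention but is not stated in the abstract. Each of the k equal-probability categories is represented by the mean rate within it.

```python
import math

def reg_lower_gamma(a, x, max_terms=100000, tol=1e-15):
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, shape, rate):
    """Quantile of a gamma(shape, rate) distribution, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, rate * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, rate * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability categories (assumed mean-1 gamma)."""
    # Category boundaries: 0, then the i/k quantiles; the last boundary is infinity.
    cuts = [0.0] + [gamma_quantile(i / k, alpha, alpha) for i in range(1, k)]
    # Integral of x*f(x) up to a cut equals P(alpha + 1, alpha * cut) when the mean is 1.
    cum = [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]

rates = discrete_gamma_rates(0.5, 4)
print([round(r, 4) for r in rates])
```

Because every category has probability 1/k, the category means average to exactly 1, so the discretisation leaves the overall mean rate unchanged.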
Article
In 1991 Graur et al. raised the question of whether the guinea-pig, Cavia porcellus, is a rodent. They suggested that the guinea-pig and myomorph rodents diverged before the separation between myomorph rodents and a lineage leading to primates and artiodactyls. Several findings have since been reported, both for and against this phylogeny, thereby highlighting the issue of the validity of molecular analysis in mammalian phylogeny. Here we present findings based on the sequence of the complete mitochondrial genome of the guinea-pig, which strongly contradict rodent monophyly. The conclusions are based on cumulative evidence provided by orthologically inherited genes and the use of three different analytical methods, none of which joins the guinea-pig with myomorph rodents. In addition to the phylogenetic conclusions, we also draw attention to several factors that are important for the validity of phylogenetic analysis based on molecular data.
Article
A model is introduced describing nucleotide substitution in ribosomal RNA (rRNA) genes. In this model, substitution in the stem and loop regions of rRNA is modeled with 16- and four-state continuous time Markov chains, respectively. The mean substitution rates at nucleotide sites are assumed to follow gamma distributions that are different for the two types of regions. The simplest formulation of the model allows for explicit expressions for transition probabilities of the Markov processes to be found. These expressions were used to analyze several 16S-like rRNA genes from higher eukaryotes with the maximum likelihood method. Although the observed proportion of invariable sites was only slightly higher in the stem regions, the estimated average substitution rates in the stem regions were almost two times as high as in the loop regions. Therefore, the degree of site heterogeneity of substitution rates in the stem regions seems to be higher than in the loop regions of animal 16S-like rRNAs due to presence of a few rapidly evolving sites. The model appears to be helpful in understanding the regularities of nucleotide substitution in rRNAs and probably minimizing errors in recovering phylogeny for distantly related taxa from these genes.
Article
We show that in animal mitochondria homologous genes that differ in guanine plus cytosine (G + C) content code for proteins differing in amino acid content in a manner that relates to the G + C content of the codons. DNA sequences were analyzed using square plots, a new method that combines graphical visualization and statistical analysis of compositional differences in both DNA and protein. Square plots divide codons into four groups based on first and second position A + T (adenine plus thymine) and G + C content and indicate differences in amino acid content when comparing sequences that differ in G + C content. When sequences are compared using these plots, the amino acid content is shown to correlate with the nucleotide bias of the genes. This amino acid effect is shown in all protein-coding genes in the mitochondrial genome, including cox I, cox II, and cyt b, mitochondrial genes which are commonly used for phylogenetic studies. Furthermore, nucleotide content differences are shown to affect the content of all amino acids with A + T- and G + C-rich codons. We speculate that phylogenetic analysis of genes so affected may tend erroneously to indicate relatedness (or lack thereof) based only on amino acid content.
Article
Since branch lengths provide important information about the timing and the extent of evolutionary divergence among taxa, accurate resolution of evolutionary history depends as much on branch length estimates as on recovery of the correct topology. However, the empirical relationship between the choice of genes to sequence and the quality of branch length estimation remains ill defined. To address this issue, we evaluated the accuracy of branch lengths estimated from subsets of the mitochondrial genome for a mammalian phylogeny with known subordinal relationships. Using maximum-likelihood methods, we estimated branch lengths from an 11-kb sequence of all 13 protein-coding genes and compared them with estimates from single genes (0.2-1.8 kb) and from 7 different combinations of genes (2-3.5 kb). For each sequence, we separated the component of the log-likelihood deviation due to branch length differences associated with alternative topologies from that due to those that are independent of the topology. Even among the sequences that recovered the same tree topology, some produced significantly better branch length estimates than others did. The combination of correct topology and significantly better branch length estimation suggests that these gene combinations may prove useful in estimating phylogenetic relationships for mammalian divergences below the ordinal level. Thus, the proper choice of genes to sequence is a critical factor for reliable estimation of evolutionary history from molecular data.
Article
The complete mitochondrial genome of Tupaia belangeri, a representative of the eutherian order Scandentia, was determined and compared with full-length mitochondrial sequences of other eutherian orders described to date. The complete mitochondrial genome is 16,754 nt in length, with no obvious deviation from the general organization of the mammalian mitochondrial genome. Thus, features such as start codon usage, incomplete stop codons, and overlapping coding regions, as well as the presence of tandem repeats in the control region, are within the range of mammalian mitochondrial (mt) DNA variation. To address the question of a possible close phylogenetic relationship between primates and Tupaia, the evolutionary affinities among primates, Tupaia and bats as representatives of the Archonta superorder, ferungulates, guinea pigs, armadillos, rats, mice, and hedgehogs were examined on the basis of the complete mitochondrial DNA sequences. The opossum sequence was used as an outgroup. The trees, estimated from 12 concatenated genes encoded on the mitochondrial H-strand, add further molecular evidence against an Archonta monophyly. With the new data described in this paper, most of both the mitochondrial and the nuclear data point away from Scandentia as the closest extant relatives to primates. Instead, the complete mitochondrial data support a clustering of Scandentia with Lagomorpha connecting to the branch leading to ferungulates. This closer phylogenetic relationship of Tupaia to rabbits than to primates first received support from several analyses of nuclear and partial mitochondrial DNA data sets. Given that short sequences are of limited use in determining deep mammalian relationships, the partial mitochondrial data available to date support this hypothesis only tentatively. Our complete mitochondrial genome data therefore add considerably more evidence in support of this hypothesis.
Article
A number of mitochondrial (mt) tRNAs have strong structural deviations from the classical tRNA cloverleaf secondary structure and from the conventional L-shaped tertiary structure. As a consequence, there is a general trend to consider all mitochondrial tRNAs as "bizarre" tRNAs. Here, a large sequence comparison of the 22 tRNA genes within 31 fully sequenced mammalian mt genomes has been performed to define the structural characteristics of this specific group of tRNAs. Vertical alignments define the degree of conservation/variability of primary sequences and secondary structures and search for potential tertiary interactions within each of the 22 families. Further horizontal alignments ascertain that, with the exception of serine-specific tRNAs, mammalian mt tRNAs do fold into cloverleaf structures with mostly classical features. However, deviations exist and concern large variations in size of the D- and T-loops. The predominant absence of the conserved nucleotides G18G19 and T54T55C56, respectively in these loops, suggests that classical tertiary interactions between both domains do not take place. Classification of the tRNA sequences according to their genomic origin (G-rich or G-poor DNA strand) highlight specific features such as richness/poorness in mismatches or G-T pairs in stems and extremely low G-content or C-content in the D- and T-loops. The resulting 22 "typical" mammalian mitochondrial sequences built up a phylogenetic basis for experimental structural and functional investigations. Moreover, they are expected to help in the evaluation of the possible impacts of those point mutations detected in human mitochondrial tRNA genes and correlated with pathologies.
Article
We test models for the evolution of helical regions of RNA sequences, where the base pairing constraint leads to correlated compensatory substitutions occurring on either side of the pair. These models are of three types: 6-state models include only the four Watson-Crick pairs plus GU and UG; 7-state models include a single mismatch state that combines all of the 10 possible mismatches; 16-state models treat all mismatch states separately. We analyzed a set of eubacterial ribosomal RNA sequences with a well-established phylogenetic tree structure. For each model, the maximum-likelihood values of the parameters were obtained. The models were compared using the Akaike information criterion, the likelihood-ratio test, and Cox's test. With a high significance level, models that permit a nonzero rate of double substitutions performed better than those that assume zero double substitution rate. Some models assume symmetry between GC and CG, between AU and UA, and between GU and UG. Models that relaxed this symmetry assumption performed slightly better, but the tests did not all agree on the significance level. The most general time-reversible model significantly outperformed any of the simplifications. We consider the relative merits of all these models for molecular phylogenetics.
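The three state spaces named in this abstract can be made concrete with a short sketch. It only shows how an observed base pair might be mapped to a model state; it does not reproduce any rate matrix or the authors' software. The function name `pair_state` and the label `"MM"` are hypothetical, and returning `None` for a mismatch under a 6-state model is an assumption about handling, since the abstract says only that these models include the six listed pairs.

```python
BASES = "ACGU"

# All 16 ordered pairs (left base, right base) of a helix position.
ALL_PAIRS = [a + b for a in BASES for b in BASES]

# The four Watson-Crick pairs plus GU and UG: the states of the 6-state models.
STABLE_PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]

# The remaining 10 pairs are the mismatches.
MISMATCHES = [p for p in ALL_PAIRS if p not in STABLE_PAIRS]

def pair_state(left, right, n_states):
    """Map an observed pair to its state label under a 6-, 7-, or 16-state model."""
    pair = left + right
    if pair not in ALL_PAIRS:
        raise ValueError(f"not an RNA base pair: {pair!r}")
    if n_states == 16:
        return pair              # every pair, including each mismatch, is its own state
    if pair in STABLE_PAIRS:
        return pair              # shared by the 6- and 7-state models
    if n_states == 7:
        return "MM"              # single lumped mismatch state (label is illustrative)
    if n_states == 6:
        return None              # assumption: mismatches have no state in a 6-state model
    raise ValueError("n_states must be 6, 7, or 16")

print(len(MISMATCHES), pair_state("G", "U", 6), pair_state("A", "C", 7), pair_state("A", "C", 16))
```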
Article
Transpositions of Alu sequences, representing the most abundant primate short interspersed elements (SINE), were evaluated as molecular cladistic markers to analyze the phylogenetic affiliations among the primate infraorders. Altogether 118 human loci, containing intronic Alu elements, were PCR analyzed for the presence of Alu sequences at orthologous sites in each of two strepsirhine, New World and Old World monkey species, Tarsius bancanus, and a nonprimate outgroup. Fourteen size-polymorphic amplification patterns exhibited longer fragments for the anthropoids (New World and Old World monkeys) and T. bancanus whereas shorter fragments were detected for the strepsirhines and the outgroup. From these, subsequent sequence analyses revealed three Alu transpositions, which can be regarded as shared derived molecular characters linking tarsiers and anthropoid primates. Concerning the other loci, scenarios are represented in which different SINE transpositions occurred independently in the same intron on the lineages leading both to the common ancestor of anthropoids and to T. bancanus, albeit at different nucleotide positions. Our results demonstrate the efficiency and possible pitfalls of SINE transpositions used as molecular cladistic markers in tracing back a divergence point in primate evolution over 40 million years old. The three Alu insertions characterized underpin the monophyly of haplorhine primates (Anthropoidea and Tarsioidea) from a novel perspective.
Article
Extensive phylogenetic analyses of the updated sequence data of mammalian mitochondrial genomes were carried out using the maximum likelihood method in order to resolve deep branchings in eutherian evolution. The divergence times in the mammalian tree were estimated by a relaxed molecular clock of the mitochondrial proteins calibrated with multiple references. A Chiroptera/Eulipotyphla (i.e. bat/mole) clade and a close relationship of this clade to Fereuungulata (Carnivora+Perissodactyla+Cetartiodactyla) were reconfirmed with high statistical significance. However, a support for a monophyly of Fereuungulata relative to the Chiroptera/Eulipotyphla clade was fragile, and we suggest that the three branchings among Carnivora, Perissodactyla, Cetartiodactyla and Chiroptera/Eulipotyphla occurred successively in a short time period, estimated to be approximately 77Myr BP. The Chiroptera/Eulipotyphla divergence was estimated to roughly coincide with the Cretaceous-Tertiary boundary (65Myr BP). The monophyly of Rodentia, the Lagomorpha/Rodentia clade (traditionally called Glires), and the Afrotheria/Xenarthra clade were preferred over alternative relationships, but the supports of these clades were not strong enough to exclude other possibilities. Although several super-order taxa of eutherians were strongly supported by the analyses of the mitochondrial genome data, the branching order in the deepest part of the eutherian tree remained ambiguous from the data presently available.
Article
1. Background to RNA structure
1.1 Types of RNA
1.1.1 Transfer RNA (tRNA)
1.1.2 Messenger RNA (mRNA)
1.1.3 Ribosomal RNA (rRNA)
1.1.4 Other ribonucleoprotein particles
1.1.5 Viruses and viroids
1.1.6 Ribozymes
1.2 Elements of RNA secondary structure
1.3 Secondary structure versus tertiary structure
2. Theoretical and computational methods for RNA secondary structure determination
2.1 Dynamic programming algorithms
2.2 Kinetic folding algorithms
2.3 Genetic algorithms
2.4 Comparative methods
3. RNA thermodynamics and folding mechanisms
3.1 The reliability of minimum free energy structure prediction
3.2 The relevance of RNA folding kinetics
3.3 Examples of RNA folding kinetics simulations
3.4 RNA as a disordered system
4. Aspects of RNA evolution
4.1 The relevance of RNA for studies of molecular evolution
4.1.1 Molecular phylogenetics
4.1.2 tRNAs and the genetic code
4.1.3 Viruses and quasispecies
4.1.4 Fitness landscapes
4.2 The interaction between thermodynamics and sequence evolution
4.3 Theory of compensatory substitutions in RNA helices
4.4 Rates of compensatory substitutions obtained from sequence analysis
5. Conclusions
6. Acknowledgements
7. References
This article takes an inter-disciplinary approach to the study of RNA secondary structure, linking together aspects of structural biology, thermodynamics and statistical physics, bioinformatics, and molecular evolution. Since the intended audience for this review is diverse, this section gives a brief elementary level discussion of the chemistry and structure of RNA, and a rapid overview of the many types of RNA molecule known. It is intended primarily for those not already familiar with molecular biology and biochemistry. Ribonucleic acid consists of a linear polymer with a backbone of ribose sugar rings linked by phosphate groups.
Each sugar has one of the four ‘bases’ adenine, cytosine, guanine and uracil (A, C, G, and U) linked to it as a side group. The structure and function of an RNA molecule is specific to the sequence of bases. The phosphate groups link the 5′ carbon of one ribose to the 3′ carbon of the next. This imposes a directionality on the backbone. The two ends are referred to as 5′ and 3′ ends, since one end has an unlinked 5′ carbon and one has an unlinked 3′ carbon. The chemical differences between RNA and DNA (deoxyribonucleic acid) are fairly small: one of the OH groups in ribose is replaced by an H in deoxyribose, and DNA contains thymine (T) bases instead of U. However, RNA structure is very different from DNA structure. In the familiar double helical structure of DNA the two strands are perfectly complementary in sequence. RNA usually occurs as single strands, and base pairs are formed intramolecularly, leading to a complex arrangement of short helices which is the basis of the secondary structure. Some RNA molecules have well-defined tertiary structures. In this sense, RNA structures are more akin to globular protein structures than to DNA. The role of proteins as biochemical catalysts and the role of DNA in storage of genetic information have long been recognised. RNA has sometimes been considered as merely an intermediary between DNA and proteins. However, an increasing number of functions of RNA are now becoming apparent, and RNA is coming to be seen as an important and versatile molecule in its own right.
Article
The order Rodentia contains half of all extant mammal species, and from an evolutionary standpoint, there are persistent controversies surrounding the monophyly of the order, divergence dates for major lineages, and relationships among families. Exons of growth hormone receptor (GHR) and breast cancer susceptibility (BRCA1) genes were sequenced for a wide diversity of rodents and other mammals and combined with sequences of the mitochondrial 12S rRNA gene and previously published sequences of von Willebrand factor (vWF). Rodents exhibit rates of amino acid replacement twice those observed for nonrodents, and this rapid rate of evolution influences estimates of divergence dates. Based on GHR sequences, monophyly is supported, with the estimated divergence between hystricognaths and most sciurognaths dating to about 75 MYA. Most estimated dates of divergence are consistent with the fossil record, including a date of 23 MYA for Mus-Rattus divergence. These dates are considerably later than those derived from some other molecular studies. Among combined and separate analyses of the various gene sequences, moderate to strong support was found for several clades. GHR appears to have greater resolving power than do 12S or vWF. Despite its complete unresponsiveness to growth hormone, Cavia (and other hystricognaths) exhibits a conservative rate of change in the intracellular domain of GHR.
Article
The monotremes, the duck-billed platypus and the echidnas, are characterized by a number of unique morphological characteristics, which have led to the common belief that they represent the living survivors of an ancestral stock of mammals. Analysis of new data from the complete mitochondrial (mt) genomes of a second monotreme, the spiny anteater, and another marsupial, the wombat, yielded clear support for the Marsupionta hypothesis. According to this hypothesis marsupials are more closely related to monotremes than to eutherians, consistent with a basal split between eutherians and marsupials/monotremes among extant mammals. This finding was also supported by analysis of new sequences from a nuclear gene--18S rRNA. The mt genome of the wombat shares some unique features with previously described marsupial mtDNAs (tRNA rearrangement, a missing tRNA(Lys), and evidence for RNA editing of the tRNA(Asp)). Molecular estimates of genetic divergence suggest that the divergence between the platypus and the spiny anteater took place approximately 34 million years before present (MYBP), and that between South American and Australian marsupials approximately 72 MYBP.
Article
As a discipline, phylogenetics is becoming transformed by a flood of molecular data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a number of outstanding issues in evolutionary biology, including the analysis of large phylogenetic trees and complex evolutionary models and the detection of the footprint of natural selection in DNA sequences.
Article
Comparative analysis of RNA sequences is the basis for the detailed and accurate predictions of RNA structure and the determination of phylogenetic relationships for organisms that span the entire phylogenetic tree. Underlying these accomplishments are very large, well-organized, and processed collections of RNA sequences. This data, starting with the sequences organized into a database management system and aligned to reveal their higher-order structure, and patterns of conservation and variation for organisms that span the phylogenetic tree, has been collected and analyzed. This type of information can be fundamental for and have an influence on the study of phylogenetic relationships, RNA structure, and the melding of these two fields. We have prepared a large web site that disseminates our comparative sequence and structure models and data. The four major types of comparative information and systems available for the three ribosomal RNAs (5S, 16S, and 23S rRNA), transfer RNA (tRNA), and two of the catalytic intron RNAs (group I and group II) are: (1) Current Comparative Structure Models; (2) Nucleotide Frequency and Conservation Information; (3) Sequence and Structure Data; and (4) Data Access Systems. This online RNA sequence and structure information, the result of extensive analysis, interpretation, data collection, and computer program and web development, is accessible at our Comparative RNA Web (CRW) Site http://www.rna.icmb.utexas.edu. In the future, more data and information will be added to these existing categories, new categories will be developed, and additional RNAs will be studied and presented at the CRW Site.
Article
Inconsistencies between phylogenetic interpretations obtained from independent sources of molecular data occasionally hamper the recovery of the true evolutionary history of certain taxa. One prominent example concerns the primate infraordinal relationships. Phylogenetic analyses based on nuclear DNA sequences traditionally represent Tarsius as a sister group to anthropoids. In contrast, mitochondrial DNA (mtDNA) data only marginally support this affiliation or even exclude Tarsius from primates. Two possible scenarios might cause this conflict: a period of adaptive molecular evolution or a shift in the nucleotide composition of higher primate mtDNAs through directional mutation pressure. To test these options, the entire mt genome of Tarsius bancanus was sequenced and compared with mtDNA of representatives of all major primate groups and mammals. Phylogenetic reconstructions at both the amino acid (AA) and DNA level of the protein-coding genes led to faulty tree topologies depending on the algorithms used for reconstruction. We propose that these artifactual affiliations rather reflect the nucleotide compositional similarity than phylogenetic relatedness and favor the directional mutation pressure hypothesis because: (1) the overall nucleotide composition changes dramatically on the lineage leading to higher primates at both silent and nonsilent sites, and (2) a highly significant correlation exists between codon usage and the nucleotide composition at the third, silent codon position. Comparisons of mt genes with mt pseudogenes that presumably transferred to the nucleus before the directional mutation pressure took place indicate that the ancestral DNA composition is retained in the relatively fossilized mtDNA-like sequences, and that the directed acceleration of the substitution rate in higher primates is restricted to mtDNA.
Article
The strict orthology of mitochondrial (mt) coding sequences has promoted their use in phylogenetic analyses at different levels. Here we present the results of a mitogenomic study (i.e., analysis based on the set of protein-coding genes from complete mt genomes) of 60 mammalian species. This number includes 11 new mt genomes. The sampling comprises all but one of the traditional eutherian orders. The previously unrepresented order Dermoptera (flying lemurs) fell within Primates as the sister group of Anthropoidea, making Primates paraphyletic. This relationship was strongly supported. Lipotyphla ("insectivores") split into three distinct lineages: Erinaceomorpha, Tenrecomorpha, and Soricomorpha. Erinaceomorpha was the basal eutherian lineage. Sirenia (dugong) and Macroscelidea (elephant shrew) fell within the African clade. Pholidota (pangolin) joined the Cetferungulata as the sister group of Carnivora. The analyses identified monophyletic Pinnipedia with Otariidae (sea lions, fur seals) and Odobenidae (walruses) as sister groups to the exclusion of Phocidae (true seals).
Article
We explore the tree of mammalian mtDNA sequences, using particularly the LogDet transform on amino acid sequences, the distance Hadamard transform, and the Closest Tree selection criterion. The amino acid composition of different species show significant differences, even within mammals. After compensating for these differences, nearest-neighbor bootstrap results suggest that the tree is locally stable, though a few groups show slightly greater rearrangements when a large proportion of the constant sites are removed. Many parts of the trees we obtain agree with those on published protein ML trees. Interesting results include a preference for rodent monophyly. The detection of a few alternative signals to those on the optimal tree were obtained using the distance Hadamard transform (with results expressed as a Lento plot). One rearrangement suggested was the interchange of the position of primates and rodents on the optimal tree. The basic stability of the tree, combined with two calibration points (whale/cow and horse/rhinoceros), together with a distant secondary calibration from the mammal/bird divergence, allows inferences of the times of divergence of putative clades. Allowing for sampling variances due to finite sequence length, most major divergences amongst lineages leading to modern orders, appear to occur well before the Cretaceous/Tertiary (K/T) boundary. Implications arising from these early divergences are discussed, particularly the possibility of competition between the small dinosaurs and the new mammal clades.
Chapter
There are many examples of RNA molecules in which the secondary structure has been strongly conserved during evolution, but the base sequence is much less conserved, e.g., transfer RNA, ribosomal RNA, and ribonuclease P. A model of compensatory neutral mutations is used here to describe the evolution of the base sequence in RNA helices. There are two loci (i.e., the two sides of the pair) with four alleles at each locus (corresponding to A, C, G, U). Watson-Crick base pairs (AU, CG, GC, and UA) are each assigned a fitness 1, whilst all other pairs are treated as mismatches and assigned fitness 1-s. A population of N diploid individuals is considered with a mutation rate of u per base. For biologically reasonable parameter values, the frequency of mismatches is always small but the frequency of the four matching pairs can vary over a wide range. Using a diffusion model, the stationary distribution for the frequency x of any of the four matching pairs is calculated. The shape depends on the combination of variables $\beta = 8Nu^{2}/(9s)$. For small β, the distribution diverges at the two extremes, x = 0 and x = 1-z, where z is the mean frequency of mismatches. The population typically consists almost entirely of one of the four types of matching pairs, but occasionally makes shifts between the four possible states. The mean rate at which these shifts occur is calculated here. The effect of recombination between the two loci is to decrease the probability density at intermediate x, and to increase the weight at the extremes. The rate of transition between the four states is slowed by recombination (as originally shown by Kimura in a two-allele model with irreversible mutation). A very small recombination rate $r \sim u^{2}/s$ is sufficient to increase the mean time between transitions dramatically.
In addition to its application to RNA, this model is also relevant to the ‘shifting balance’ theory describing the drift of populations between alternative equilibria separated by low fitness valleys. Equilibrium values for the frequencies of the different allele combinations in an infinite population are also calculated. It is shown that for low recombination rates the equilibrium is symmetric, but there is a critical recombination rate above which alternative asymmetric equilibria become stable.
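The fitness scheme and the composite parameter in this abstract are simple enough to write down directly. The sketch below is only an illustration: the function names are hypothetical, the expression for β is read as 8Nu²/(9s), and the numerical values in the usage lines are arbitrary and not taken from the source.

```python
# Pairs assigned fitness 1 in the abstract; every other pair counts as a mismatch.
WATSON_CRICK = {"AU", "UA", "GC", "CG"}

def pair_fitness(left, right, s):
    """Fitness of a two-locus genotype: 1 for Watson-Crick pairs, 1 - s otherwise."""
    return 1.0 if left + right in WATSON_CRICK else 1.0 - s

def beta(N, u, s):
    """Composite parameter beta = 8 N u^2 / (9 s) governing the stationary distribution."""
    return 8.0 * N * u ** 2 / (9.0 * s)

# Arbitrary illustrative values: population size N, mutation rate u, selection coefficient s.
print(pair_fitness("G", "C", 0.01), pair_fitness("G", "U", 0.01))
print(beta(N=1000, u=1e-3, s=0.01))
```

Note that GU is treated as a mismatch here because the abstract lists only the four Watson-Crick pairs as having fitness 1.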
Article
We concatenated sequences for four mitochondrial genes (12S rRNA, tRNA valine, 16S rRNA, cytochrome b) and four nuclear genes [aquaporin, alpha 2B adrenergic receptor (A2AB), interphotoreceptor retinoid-binding protein (IRBP), von Willebrand factor (vWF)] into a multigene data set representing 11 eutherian orders (Artiodactyla, Hyracoidea, Insectivora, Lagomorpha, Macroscelidea, Perissodactyla, Primates, Proboscidea, Rodentia, Sirenia, Tubulidentata). Within this data set, we recognized nine mitochondrial partitions (both stems and loops, for each of 12S rRNA, tRNA valine, and 16S rRNA; and first, second, and third codon positions of cytochrome b) and 12 nuclear partitions (first, second, and third codon positions, respectively, of each of the four nuclear genes). Four of the 21 partitions (third positions of cytochrome b, A2AB, IRBP, and vWF) showed significant heterogeneity in base composition across taxa. Phylogenetic analyses (parsimony, minimum evolution, maximum likelihood) based on sequences for all 21 partitions provide 99–100% bootstrap support for Afrotheria and Paenungulata. With the elimination of the four partitions exhibiting heterogeneity in base composition, there is also high bootstrap support (89–100%) for cow + horse. Statistical tests reject Altungulata, Anagalida, and Ungulata. Data set heterogeneity between mitochondrial and nuclear genes is most evident when all partitions are included in the phylogenetic analyses. Mitochondrial-gene trees associate cow with horse, whereas nuclear-gene trees associate cow with hedgehog and these two with horse. However, after eliminating third positions of A2AB, IRBP, and vWF, nuclear data agree with mitochondrial data in supporting cow + horse. Nuclear genes provide stronger support for both Afrotheria and Paenungulata. Removal of third positions of cytochrome b results in improved performance for the mitochondrial genes in recovering these clades.
Article
A pair of mutations at different loci (or sites) which are singly deleterious but restore normal fitness in combination may be called compensatory neutral mutations. Population dynamics concerning evolutionary substitutions of such mutants was developed by making use of the diffusion equation method. Based on this theory and, also, by the help of Monte Carlo simulation experiments, a remarkable phenomenon was disclosed that the double mutants can easily become fixed in the population by random drift under continued mutation pressure if the loci are tightly linked, even when the single mutants are definitely deleterious. More specifically, I consider two loci with alleles A and A' in the first locus, and alleles B and B' in the second locus, and assign relative fitnesses 1, 1-s', 1-s' and 1 respectively to the four gene combinations AB, A'B, AB' and A'B', where s' is the selection coefficient against the single mutants (s' > 0). Let v be the mutation rate per locus per generation and assume that mutation occurs irreversibly from A to A' at the first locus, and from B to B' at the second locus, where A and B are wild type genes, and A' and B' are their mutant alleles. In a diploid population of effective size Ne (or a haploid population of 2Ne breeding individuals), it was shown that the average time (T̄) until joint fixation of the double mutant (A'B') starting from the state in which the population consists exclusively of the wild type genes (AB) is not excessively long even for large 4Nes' values. In fact, assuming 2Nev = 1 we have T̄ = 54Ne for 4Nes' = 400, and T̄ = 128Ne for 4Nes' = 1000. These values are not unrealistically long as compared with T̄ ≈ 5Ne obtained for 4Nes' = 0. The approximate analytical treatment has also been extended to estimate the effect of low rate crossing over in retarding fixation.
The bearing of these findings on molecular evolution is discussed with special reference to coupled substitutions at interacting amino acid (or nucleotide) sites within a folded protein (or RNA) molecule. It is concluded that compensatory neutral mutants may play an important role in molecular evolution.
Article
Base composition varies at all levels of the phylogenetic hierarchy and throughout the genome, and can be caused by active selection or passive mutation pressure. This variation can make reconstructing trees difficult. However, recent tree-based analyses have shed light on the forces responsible for the evolution of base composition, forces that might be very general. More explicit tree-based work is encouraged.
Article
Long restricted to the domain of molecular systematics and studies of molecular evolution, likelihood methods are now being used in analyses of discrete morphological data, specifically to estimate ancestral character states and for tests of character correlation. Biologists are beginning to apply likelihood models within a Bayesian statistical framework, which promises not only to provide answers that evolutionary biologists desire, but also to make practical the application of more realistic evolutionary models.
Article
12S ribosomal RNA (rRNA) gene sequences from a suite of mammalian taxa (13 placentals, 4 marsupials, 1 monotreme), for which phylogenetic relationships are well established based on independent criteria, were employed to study the evolution of this gene. Phylogenetic analysis of 12S sequences produces a phylogeny that agrees with expectations. Base composition provides evidence for directional symmetrical substitution pressure in loops; in stems, base composition is much more even. Rates of nucleotide substitution are lower in stems than loops. Patterns of nucleotide substitution show an overall preference for transitions over transversions, with this difference more profound in stems than loops. Among different transversion pathways, there is a wide range of transformation frequencies. An analysis of compensatory substitutions shows that there is strong evidence for their occurrence and that a weighting factor of 0.61 should be applied in phylogenetic analyses to account for the dependence of mutations at stem positions relative to positions where changes are independent. Among stem variables (i.e., stem length, interaction distance, substitution rates, G+C content, and the percentage of bases that are paired), several significant correlations were discovered, but stem length and interaction distance are uncorrelated with other variables.
Article
Intrastrand base pairings give ribosomal and other RNA molecules characteristic structures that are important for their function. In order to maintain these structures, a substitution at one paired site may have to be compensated for by an appropriate substitution at the complementary site. Thus paired sites do not evolve independently of one another. Most current methods for inferring phylogeny from molecular sequences assume that the sites are independent and will therefore give statistically unreliable and possibly erroneous results when used on structured RNA sequences. We analyze a new probabilistic model for the evolution of double-stranded RNA molecules that considers substitutions of the base pairs rather than of each of the bases independently. The new model, called the double-stranded model, was incorporated into the neighbor-joining distance and maximum likelihood methods. Computer simulations show that maximum likelihood is very robust to the violation of the assumption of the independence of sites. In contrast, the neighbor-joining method is sensitive to such violations: the double-stranded model can provide a significant increase in the chance of obtaining the correct tree topologies with neighbor joining when distances are large and the tree is difficult to obtain. The new model also leads to lower but more realistic estimates for the statistical confidence in the branch lengths and tree topologies.
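The difference between treating bases and treating base pairs as the unit of change can be illustrated with a small sketch. It is not the double-stranded model itself: it only recodes paired positions as pair states and counts differences. The 16-state space, the toy hairpin sequences, and the helper name `pair_columns` are assumptions made for illustration.

```python
from itertools import product

BASES = "ACGU"
# Assumption: every ordered pair of bases is a possible state (16 in total).
PAIR_STATES = ["".join(p) for p in product(BASES, repeat=2)]

def pair_columns(sequence, pairs):
    """Return the base-pair state at each (i, j) pair of 0-based positions."""
    return [sequence[i] + sequence[j] for i, j in pairs]

# Hypothetical hairpin: positions 0, 1, 2 pair with positions 8, 7, 6.
pairs = [(0, 8), (1, 7), (2, 6)]
ancestor = "GCAAAAUGC"
descendant = "GCGAAACGC"  # compensatory change A-U -> G-C at pair (2, 6)

# Counting each base independently sees two changes.
site_diffs = sum(a != b for a, b in zip(ancestor, descendant))

# Counting base pairs as single characters sees one change.
pair_diffs = sum(
    a != b
    for a, b in zip(pair_columns(ancestor, pairs), pair_columns(descendant, pairs))
)

print(site_diffs, pair_diffs)
```

A single compensatory event is counted twice under the independent-sites view and once under the pair view, which is the kind of double counting that the abstract says inflates statistical confidence.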
Article
Currently used stochastic models of DNA sequence evolution assume independent and identically distributed nucleotide sites. They are too simple to account for dependence structures obviously present in molecular data. Up to now more realistic stochastic models for nucleotide substitutions have been considered intractable. In this paper a procedure that accounts for non-overlapping correlations among pairs of sites of a DNA sequence is developed. We show that currently used models that ignore correlated sites underestimate distances inferred from observed sequence dissimilarities. For the analyzed mitochondrial sequence data this underestimation is not drastic in contrast to paired regions (stems) of bacterial 23S rRNA sequences.
Article
Evolutionary models appropriate for analyzing nucleotide sequences that are subject to constraints on secondary structure are developed. The models consider the evolution of pairs of nucleotides, and they incorporate the effects of base-pairing constraints on nucleotide substitution rates by introducing a new parameter to extensions of standard models of sequence evolution. To illustrate some potential uses of the models, a likelihood-ratio test is constructed for the null hypothesis that two (prespecified) regions of DNA evolve independently of each other. The sampling properties of the test are explored via simulation. The test is then incorporated into a heuristic method for identifying the location of unknown stems. The test and related procedures are applied to data from ribonuclease P RNA sequences of bacteria.
Article
Phylogenetic relationships among 27 extant mammalian species (representing 15 placental orders) were studied using sequences of exon 28 of the gene encoding von Willebrand Factor (vWF), a glycoprotein which functions in blood clotting. Analysis of sequences coding for vWF revealed evidence for several subordinal and superordinal groupings, but the earliest branching sequence of placental mammals was left largely unresolved. Strong support was found for a monophyletic clade consisting of elephants, sea cows, hyraxes, aardvarks, and elephant shrews. This systematic placement of the elephant shrews agrees strongly with two other molecular data sets (interphotoreceptor retinoid binding protein and alpha-lens crystallins) and is consistent with analysis of fossil elephant shrews recently discovered in north Africa. Evidence from vWF sequences agrees with a number of previous molecular and morphological studies in providing strong support for the monophyly of both bats and rodents. The orders Primates, Proboscidea, Carnivora, Perissodactyla, and Artiodactyla were represented by more than one species which joined in each case to form a monophyletic order.
Article
A two-locus model is presented to analyze the evolution of compensatory mutations occurring in stems of RNA secondary structures. Single mutations are assumed to be deleterious but harmless (neutral) in appropriate combinations. In proceeding under mutation pressure, natural selection and genetic drift from one fitness peak to another, a population must therefore pass through a valley of intermediate deleterious states of individual fitness. The expected time for this transition is calculated using diffusion theory. The rate of compensatory evolution, κ_c, is then defined as the inverse of the expected transition time. When selection against deleterious single mutations is strong, κ_c depends on the recombination fraction r between the two loci. Recombination generally reduces the rate of compensatory evolution because it breaks up favorable combinations of double mutants. For complete linkage, κ_c is given by the rate at which favorable combinations of double mutants are produced by compensatory mutation. For r > 0, κ_c decreases exponentially with r. In contrast, κ_c becomes independent of r for weak selection. We discuss the dynamics of evolutionary substitutions of compensatory mutants in relation to Wright's shifting balance theory of evolution and use our results to analyze the substitution process in helices of mRNA secondary structures.
Article
The complete mitochondrial 12S rRNA sequences of 5 placental mammals belonging to the 3 orders Sirenia, Proboscidea, and Hyracoidea are reported together with phylogenetic analyses (distance and parsimony) of a total of 51 mammalian orthologues. This 12S rRNA database now includes the 2 extant proboscideans (the African and Asiatic elephants Loxodonta africana and Elephas maximus), 2 of the 3 extant sirenian genera (the sea cow Dugong dugon and the West Indian manatee Trichechus manatus), and 2 of the 3 extant hyracoid genera (the rock and tree hyraxes Procavia capensis and Dendrohyrax dorsalis). The monophyly of the 3 orders Sirenia, Proboscidea, and Hyracoidea is supported by all kinds of analysis. There are 23 and 3 diagnostic substitutions shared by the 2 proboscideans and the 2 hyracoids, respectively, but none by the 2 sirenians. The 2 proboscideans exhibit the fastest rates of 12S rRNA evolution among the 11 placental orders studied. Based on various taxonomic sampling methods among eutherian orders and marsupial outgroups, the most strongly supported clade in our comparisons clusters together the 3 orders Sirenia, Proboscidea, and Hyracoidea in the superorder Paenungulata. Within paenungulates, the grouping of sirenians and proboscideans within the mirorder Tethytheria is observed. This branching pattern is supported by all analyses by high bootstrap percentages (BPs) and decay indices. When only one species is selected per order or suborder, the taxonomic sampling leads to a relative variation in bootstrap support of 53% for Tethytheria (BPs ranging from 44 to 93%) and 7% for Paenungulata (92-99%). When each order or suborder is represented by two species, this relative variation decreased to 10% for Tethytheria (78-87%) and 3% for Paenungulata (96-99%). Two nearly exclusive synapomorphies for paenungulates are identified in the form of one transitional compensatory change, but none were detected for tethytherians.
Such a robust and reliable resolution of the paenungulate node implies a long history of the common ancestors, allowing time for synapomorphies to accumulate. This observation suggests a Late Cretaceous/Early Paleocene origin for the Paenungulata.
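The abstract does not define "relative variation in bootstrap support". One reading that reproduces all four reported percentages is the spread of bootstrap values divided by the maximum, (max - min) / max. The sketch below checks that reading; the formula is an inference from the reported numbers, not a definition taken from the paper.

```python
def relative_variation(bp_min, bp_max):
    """Assumed definition: spread of bootstrap support relative to its maximum, in percent."""
    return 100.0 * (bp_max - bp_min) / bp_max

# (min BP, max BP) ranges quoted in the abstract.
reported_ranges = {
    "Tethytheria, one species per order": (44, 93),
    "Paenungulata, one species per order": (92, 99),
    "Tethytheria, two species per order": (78, 87),
    "Paenungulata, two species per order": (96, 99),
}

rounded = {name: round(relative_variation(lo, hi))
           for name, (lo, hi) in reported_ranges.items()}

for name, value in rounded.items():
    print(f"{name}: {value}%")
```

Under this assumed definition the rounded values are 53, 7, 10 and 3, matching the percentages in the abstract.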
Article
We present a model for the evolution of paired bases in RNA sequences. The new model allows for the instantaneous rate of substitution of both members of a base pair in a compensatory substitution (e.g., A-U → G-C) and expands our previous work by allowing for unpaired bases or noncanonical pairs. We implemented the model with distance and maximum likelihood methods to estimate the rates of simultaneous substitution of both bases, α_d, vs. rates of substitution of individual bases, α_s, in rRNA. In the rapidly evolving D2 expansion segments of Drosophila large subunit rRNA, we estimate a low ratio of α_d/α_s, indicating that most compensatory substitutions involve a G-U intermediate. In contrast, we find a surprisingly high ratio of α_d/α_s in the core small subunit rRNA, indicating that the evolution of the slowly evolving rRNA sequences is modeled much more accurately if simultaneous substitution of both members of a base pair is allowed to occur approximately as often as substitution of individual bases. Using simulations, we have ruled out several potential sources of error in the estimation of α_d/α_s. We conclude that in the core rRNA sequences compensatory substitutions can be fixed so rapidly as to appear to be instantaneous.
Article
There are many examples of RNA molecules in which the secondary structure has been strongly conserved during evolution, but the base sequence is much less conserved, e.g., transfer RNA, ribosomal RNA, and ribonuclease P. A model of compensatory neutral mutations is used here to describe the evolution of the base sequence in RNA helices. There are two loci (i.e., the two sides of the pair) with four alleles at each locus (corresponding to A, C, G, U). Watson-Crick base pairs (AU, CG, GC, and UA) are each assigned a fitness 1, whilst all other pairs are treated as mismatches and assigned fitness 1-s. A population of N diploid individuals is considered with a mutation rate of u per base. For biologically reasonable parameter values, the frequency of mismatches is always small but the frequency of the four matching pairs can vary over a wide range. Using a diffusion model, the stationary distribution for the frequency x of any of the four matching pairs is calculated. The shape depends on the combination of variables β = 8Nu²/(9s). For small β, the distribution diverges at the two extremes, x = 0 and x = 1-z, where z is the mean frequency of mismatches. The population typically consists almost entirely of one of the four types of matching pairs, but occasionally makes shifts between the four possible states. The mean rate at which these shifts occur is calculated here. The effect of recombination between the two loci is to decrease the probability density at intermediate x, and to increase the weight at the extremes. The rate of transition between the four states is slowed by recombination (as originally shown by Kimura in a two-allele model with irreversible mutation). A very small recombination rate r ≈ u²/s is sufficient to increase the mean time between transitions dramatically.
In addition to its application to RNA, this model is also relevant to the 'shifting balance' theory describing the drift of populations between alternative equilibria separated by low fitness valleys. Equilibrium values for the frequencies of the different allele combinations in an infinite population are also calculated. It is shown that for low recombination rates the equilibrium is symmetric, but there is a critical recombination rate above which alternative asymmetric equilibria become stable.
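The fitness scheme and the composite parameter described in this abstract can be written down as a short sketch. It assumes the parameter reads β = 8Nu²/(9s), with the whole product 9s in the denominator. The function names and the numerical values of N, u and s are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairs get fitness 1; every other pair (including GU) is a mismatch.
WATSON_CRICK = {"AU", "UA", "CG", "GC"}

def pair_fitness(pair, s):
    """Fitness of a two-base state under the scheme in the abstract."""
    return 1.0 if pair in WATSON_CRICK else 1.0 - s

def beta(N, u, s):
    """Composite parameter, assuming the reading beta = 8 N u^2 / (9 s)."""
    return 8.0 * N * u**2 / (9.0 * s)

# Hypothetical parameter values, for illustration only.
N, u, s = 10_000, 1e-6, 0.01

print(pair_fitness("GC", s), pair_fitness("GU", s))
print(beta(N, u, s))
```

With values like these β is far below 1, which corresponds to the small-β regime the abstract describes, where the population sits almost entirely on one matching pair and only occasionally shifts to another.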
Article
Higher level relationships among placental mammals, as well as the historical biogeography and morphological diversification of this group, remain unclear. Here we analyse independent molecular data sets, having aligned lengths of DNA of 5,708 and 2,947 base pairs, respectively, for all orders of placental mammals. Phylogenetic analyses resolve placental orders into four groups: Xenarthra, Afrotheria, Laurasiatheria, and Euarchonta plus Glires. The first three groups are consistently monophyletic with different methods of analysis. Euarchonta plus Glires is monophyletic or paraphyletic depending on the phylogenetic method. A unique nine-base-pair deletion in exon 11 of the BRCA1 gene provides additional support for the monophyly of Afrotheria, which includes proboscideans, sirenians, hyracoids, tubulidentates, macroscelideans, chrysochlorids and tenrecids. Laurasiatheria contains cetartiodactyls, perissodactyls, carnivores, pangolins, bats and eulipotyphlan insectivores. Parallel adaptive radiations have occurred within Laurasiatheria and Afrotheria. In each group, there are aquatic, ungulate and insectivore-like forms.
Article
The mitochondrial genome (mtDNA), due to its peculiar features such as exclusive presence of orthologous genes, uniparental inheritance, lack of recombination, small size and constant gene content, certainly represents a major model system in studies on evolutionary genomics in metazoans. In 800 million years of evolution the gene content of metazoan mitochondrial genomes has remained practically frozen but several evolutionary processes have taken place. These processes, reviewed here, include rearrangements of gene order, changes in base composition and the emergence of compositional asymmetry between the two strands, variations in the genetic code and evolution of codon usage, lineage-specific nucleotide substitution rates and evolutionary patterns of mtDNA control regions.