
Abstract and Figures

Current understanding of the phylogeny of prokaryotes is based on the comparison of the highly conserved small-subunit rRNA (ssu-rRNA) and similar regions. Although such molecules have proved to be very useful phylogenetic markers, mutational saturation is a problem because of their restricted lengths. A growing number of complete prokaryotic genomes are now available. This paper addresses the problem of determining a prokaryotic phylogeny by comparing complete genomes. We introduce a new strategy, GBDP, ‘genome blast distance phylogeny’, and show that different variants of this approach robustly produce biologically sound phylogenies when applied to 91 prokaryotic genomes. In this approach, Blast is first used to compare genomes, then a distance matrix is computed, and finally a tree- or network-reconstruction method such as UPGMA, Neighbor-Joining, BioNJ or Neighbor-Net is applied. Contact: huson@informatik.uni-tuebingen.de
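The three steps named in the abstract (pairwise comparison, distance matrix, tree reconstruction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the GBDP distance formulas, so the coverage-based distance below is an assumption chosen for illustration, the Blast hits are supplied as hypothetical (start, end) intervals rather than produced by running Blast, and the names `covered_length`, `coverage_distance` and `upgma` are invented here.

```python
from itertools import combinations


def covered_length(intervals):
    """Length of the union of closed, 1-based integer intervals (start, end)."""
    total = 0
    current_end = 0  # right-most position covered so far
    for start, end in sorted(intervals):
        if end <= current_end:
            continue  # interval lies entirely inside what is already covered
        total += end - max(start, current_end + 1) + 1
        current_end = end
    return total


def coverage_distance(hits_on_x, hits_on_y, len_x, len_y):
    """Assumed distance: 1 minus the fraction of both genomes covered by hits."""
    covered = covered_length(hits_on_x) + covered_length(hits_on_y)
    return 1.0 - covered / (len_x + len_y)


def upgma(names, dist):
    """UPGMA clustering; dist maps frozenset({a, b}) to a distance.

    Returns the tree topology as nested tuples (branch lengths omitted).
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical toy distances between three genomes.
toy = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.6,
}
tree = upgma(["A", "B", "C"], toy)  # A and B are joined first, then C
```

Merging overlapping hit intervals before counting prevents repeated or overlapping matches from being counted twice. Any of the other reconstruction methods the abstract lists could replace `upgma` without changing the first two steps.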
... In the case of bacteria, the average nucleotide identity (ANI) method was developed to mimic DDH using genome sequence information [8]. Another method, the genome blast distance phylogeny (GBDP) method, is also used in combination with the ANI method [9]. Using whole genome data for fungal identification is not yet widespread but can be a powerful tool [10]. ...
... B; GBDP score calculated with GGDC. The scores for formula 2 are shown according to the recommendation in Henz et al. [9], and scores by all three formulae are shown in Fig. S6. C; Percentage identity in the ITS sequence. ...
... As for the GBDP method, genome-to-genome distance scores showed significant differences depending on the formula used in both Cutaneotrichosporon and Saccharomyces, although the genome assemblies are almost complete. This result contrasts with a case in bacteria in which comparison of well-assembled genomes resulted in similar scores by all formulae [9]. The reason for obtaining different scores with different formulae is not clear. ...
Article
Background: Since DNA information was first used in taxonomy, barcode sequences such as the internal transcribed spacer (ITS) region have greatly aided fungal identification; however, a barcode sequence alone is often insufficient. Thus, multi-gene- or whole-genome-based methods were developed. We previously isolated Basidiomycota yeasts classified in the Trichosporonales. Some strains were described as Cutaneotrichosporon cavernicola and C. spelunceum, whereas strain HIS471 remained unidentified. We analysed the genomes of these strains to elucidate their taxonomic relationship and genetic diversity. Results: The long-read-based assembly resulted in chromosome-level draft genomes consisting of seven chromosomes and one mitochondrial genome. The genome of strain HIS471 has more than ten chromosome inversions or translocations compared to the type strain of C. cavernicola despite sharing identical ITS barcode sequences and displaying an average nucleotide identity (ANI) above 93%. Also, the chromosome synteny between C. cavernicola and the related species, C. spelunceum, showed significant rearrangements, whereas the ITS sequence identity exceeds 98.6% and the ANI is approximately 82%. Our results indicate that the relative evolutionary rates of barcode sequences, whole-genome nucleotide sequences, and chromosome synteny in Cutaneotrichosporon significantly differ from those in the model yeast Saccharomyces. Conclusions: Our results revealed that the relative evolutionary rates of nucleotide sequences and chromosome synteny are different among fungal clades, likely because different clades have diverse mutation/repair rates and distinct selection pressures on their genomic sequences and syntenic structures. Because diverse syntenic structures can be a barrier to meiotic recombination and may lead to speciation, the non-linear relationships between nucleotide and synteny diversification indicate that sequence-level distances at the barcode or whole-genome level are not sufficient for delineating species boundaries.
... accessed on 23 December 2023), including ФKEN04. The phages included in the phylogenetic tree were selected according to the following criteria: i) should be a complete genome sequence [54]; ii) should exhibit a high similarity of >70% to ФKEN04 [55]; and iii) should have a genome size similar to that of ФKEN04 [56]. The analysis was conducted using the Virus Classification and Tree Building Online Resource (VICTOR), a method for the genome-based phylogeny and classification of prokaryotic viruses [54] (https://victor.dsmz.de, ...
Preprint
Enterococcus faecalis is a growing cause of nosocomial and antibiotic-resistant infections. Treating drug-resistant E. faecalis requires novel approaches. The use of bacteriophages (phages) against multidrug-resistant (MDR) bacteria has recently garnered global attention. Biofilms play a vital role in E. faecalis pathogenesis as they enhance antibiotic resistance. Phages eliminate biofilms by producing lytic enzymes, including depolymerases. In this study, Enterococcus phage vB_Efs8_KEN04 (ФKEN04), isolated from a sewage treatment plant in Nairobi, Kenya, was tested against clinical strains of MDR Enterococcus faecalis. This phage had a broad host range against 100% (26/26) of MDR E. faecalis clinical isolates and cross-species activity against Enterococcus faecium. It was able to withstand acidic and alkaline conditions, from pH 3 to 11, as well as temperatures between -80 °C and 37 °C. It could inhibit and disrupt the biofilms of MDR E. faecalis. Its linear double-stranded DNA genome of 142,402 bp contains 238 coding sequences with a G+C content and coding gene density of 36.01% and 91.46%, respectively. Genomic analyses showed that ФKEN04 belongs to the genus Kochikohdavirus in the family Herelleviridae. It lacked antimicrobial resistance, virulence, and lysogeny genes, and its stability, broad host range, and cross-species lysis indicate strong potential for the treatment of Enterococcus infections.
... The analysis of in silico DNA-DNA hybridization (DDH) was performed using the Genome-to-Genome Distance Calculator (GGDC) (Meier-Kolthoff et al. 2013). In silico DDH is calculated based on the Genome Blast Distance Phylogeny (GBDP), which was devised as an approach for the construction of phylogenetic trees or networks from a given set of complete genomes (or even incompletely sequenced genomes) (Henz et al. 2005) and was subsequently improved (Auch et al. 2010). The GBDP determines genome-to-genome distances between pairs of entirely or partially sequenced genomes (González-Castillo et al. 2021). ...
Article
The family Vibrionaceae is classified into many clades based on their phylogenetic relationships. The Ponticus clade is one of these and consists of four species: Vibrio panuliri, V. ponticus, V. rhodolitus, and V. taketomensis. Two strains, CAIM 703 and CAIM 1902, were isolated from an external lesion of a diseased spotted rose snapper (Lutjanus guttatus) and analyzed to determine their taxonomic position. A phylogenetic analysis based on 16S rRNA sequences showed that the two strains are members of the genus Vibrio and belong to the Ponticus clade. Then, a phylogenomic analysis was performed with four type strains and four reference strains isolated from marine organisms and aquatic environments. Multilocus Sequence Analysis (MLSA) of 139 single-copy genes showed that CAIM 703 and CAIM 1902 belong to V. panuliri. The 16S rRNA sequence similarity value between CAIM 703 and CAIM 1902 was 99.61%. The Ponticus clade species showed Average Nucleotide Identity (ANI) values between 78 and 80% against the two strains for ANIb, except V. panuliri LBS2T (99% and 100% similarity). Finally, this analysis represents the first phylogenomic analysis of the Ponticus clade in which V. panuliri strains are reported from Mexico.
... TYGS also extracted the 16S rDNA sequences from the chromosome, compared them to the 16S rRNA gene sequences of all type strains in the TYGS database, and selected the 10 type strains with the most similar 16S rRNA genes [44]. Based on this information the Genome Blast Distance Phylogeny (GBDP) approach [46,47] was used to obtain precise distances between the SO9 chromosome and the best 50 matching type strains using the "coverage" algorithm and distance formula d5 [48] to determine the 10 closest type strains. To infer phylogenetic information, accurate intergenomic distances were calculated for the selected strains using the "trimming" algorithm and distance formula d5 [48] with 100 distance replicates. ...
... support determined through 100 pseudo-bootstrap replicates [30]. ...
Article
A bacterial strain designated as UC was isolated from farmland soil. Strain UCT formed a pale yellow colony on nutrient agar. Cell morphology showed a rod-shaped bacterium that stained Gram-negative. The 16S rRNA gene sequence analysis identified strain UCT as a member of the genus Lysobacter that showed high identity with L. soli DCY21T (99.5%), L. panacisoli CJ29T (98.7%), and L. tabacisoli C8-1T (97.9%). It formed a distinct cluster with these strains in the neighbor-joining phylogenetic tree. A similar tree topology was observed in TYGS-based phylogenomic analysis. However, genome sequence analyses of strain UCT showed 87.7% average nucleotide identity and 34.7% digital DNA–DNA hybridization similarity with the phylogenetically closest species, L. soli DCY21T. The similarity was much lower with other closely related strains of the genus Lysobacter. The G + C content of strain UCT was 68.1%. Major cellular fatty acids observed were C14:0 iso (13.4%), C15:0 iso (13.6%), and C15:0 anteiso (14.8%). Quinone Q-8 was the major respiratory ubiquinone. Predominant polar lipids were phosphatidylethanolamine, diphosphatidylglycerol, and phosphatidylglycerol. Production of xanthomonadin pigment was observed. Based on phenotypic differences and phylogenomic analysis, strain UCT represents a novel species of the genus Lysobacter, for which the name Lysobacter arvi is proposed. The type strain of the novel species is UCT (= KCTC 92613T = JCM 23757T = MTCC 12824T).
Chapter
DNA-DNA hybridization (DDH) has been used as the gold standard in prokaryote taxonomy, but it is labor-intensive and prone to experimental errors. The development of high-throughput DNA sequencing technologies has greatly reduced the cost of genome sequencing, resulting in increased availability of microbial genomes. Traditional taxonomic methods, such as DDH, are being challenged by the rapid development of genomic metrics. Overall genome relatedness indices (OGRIs) have been developed to replace DDH. The present chapter describes the OGRIs that have been developed and their interpretation for defining new taxonomic ranks.
Chapter
A key challenge in microbial phylogenomics is that microbial gene families are often affected by extensive horizontal gene transfer (HGT). As a result, most existing methods for microbial phylogenomics can only make use of a small subset of the gene families present in the microbial genomes under consideration, potentially biasing their results and affecting their accuracy. One well-known approach for truly genome-scale phylogenomics is gene tree parsimony (GTP), which takes as input a collection of gene trees and finds a species tree that most parsimoniously reconciles with the input gene trees. While GTP based methods are widely used for phylogenomic studies of non-microbial species, their underlying reconciliation models are not designed to handle HGT and, therefore, they cannot be meaningfully applied to microbes. No GTP based methods have yet been developed for microbial phylogenomics. In this work, we (i) design and implement the first GTP based approach, PhyloGTP, for microbial phylogenomics, (ii) use an extensive simulation study to systematically assess the accuracies of PhyloGTP and two other recently developed methods, SpeciesRax and ASTRAL-Pro-2, under a range of different conditions, and (iii) analyze two real microbial datasets with different characteristics. We find that PhyloGTP and SpeciesRax are more accurate than ASTRAL-Pro-2 across nearly all tested conditions, that PhyloGTP and SpeciesRax have similar accuracies overall, but there are conditions under which PhyloGTP consistently outperforms SpeciesRax, and that both PhyloGTP and SpeciesRax can sometimes yield incorrect, misleading phylogenies on complex real datasets.
Chapter
You will be introduced to the classification of prokaryotes from species to the higher levels of class and phyla. You will learn that the 16S rRNA gene sequence comparison including phylogenetic analysis provides the main information for classification. DNA–DNA hybridization predicted in silico from whole genomic sequences is mainly used for classification of species. Now, whole genomic sequences (WGS) are increasingly used for classification and identification. In the activity, you will learn to identify an isolate by 16S rRNA sequence in the EzBioCloud server and by whole genomic sequencing in Type Strain Genome Server (TYGS).
Chapter
Phylogeny is a model of the relationships between organisms, genes, proteins, or other structures based on common ancestry. It is also used for epidemiological investigations and analysis of parallel evolution between host and parasite. Phylogenetic trees can be visualized as dendrograms or as radial trees. The most important information read from a phylogenetic tree is the location of the different monophyletic groups. The main types of model parameters needed to construct a tree from a given dataset are the tree shape and the substitution matrix. One of the four types of phylogenetic methods (maximum parsimony, neighbor joining, maximum likelihood and Bayesian inference, as implemented in MrBayes) can then be used to construct the tree. The strength of trees can be evaluated by bootstrap analysis. The major data formats used as input for phylogenetic programs are presented as well as the major program packages. Finally, the reader is guided through constructing their own neighbor-joining tree.
Article
We describe here the complete genome sequence (1,111,523 base pairs) of the obligate intracellular parasite Rickettsia prowazekii, the causative agent of epidemic typhus. This genome contains 834 protein-coding genes. The functional profiles of these genes show similarities to those of mitochondrial genes: no genes required for anaerobic glycolysis are found in either R. prowazekii or mitochondrial genomes, but a complete set of genes encoding components of the tricarboxylic acid cycle and the respiratory-chain complex is found in R. prowazekii. In effect, ATP production in Rickettsia is the same as that in mitochondria. Many genes involved in the biosynthesis and regulation of biosynthesis of amino acids and nucleosides in free-living bacteria are absent from R. prowazekii and mitochondria. Such genes seem to have been replaced by homologues in the nuclear (host) genome. The R. prowazekii genome contains the highest proportion of non-coding DNA (24%) detected so far in a microbial genome. Such non-coding sequences may be degraded remnants of 'neutralized' genes that await elimination from the genome. Phylogenetic analyses indicate that R. prowazekii is more closely related to mitochondria than is any other microbe studied so far.
Conference Paper
Comparison of large, unfinished genomic sequences requires fast methods that are robust to misordering, misorientation, and duplications. A number of fast methods exist that can compute local similarities between such sequences, from which an optimal one-to-one correspondence might be desired. However, existing methods for computing such a correspondence are either too costly to run or are inappropriate for unfinished sequence. We propose an efficient method for refining a set of segment matches such that the resulting segments are of maximal size without non-identity overlaps. This resolved set of segments can be used in various ways to compute a similarity measure between any two large sequences, and hence can be used in alignment, matching, or tree construction algorithms for two or more sequences.
Article
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
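A single neighbor-joining step can be sketched in Python as below. The abstract does not spell out the selection formula, so this sketch uses the Q-criterion, Q(i, j) = (n - 2) d(i, j) - r_i - r_j with r_i the sum of row i, which is the form commonly given in textbooks for choosing the pair that minimizes total branch length. The function name `nj_pick_pair` and the example matrix are illustrative assumptions.

```python
def nj_pick_pair(names, d):
    """Choose the pair of OTUs to join in one neighbor-joining step.

    names: list of OTU labels; d: symmetric distance matrix (list of lists)
    with zeros on the diagonal. Requires at least three OTUs.
    Returns (name_i, name_j, branch_i, branch_j), where the branch values are
    the lengths from each joined OTU to the new internal node.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least three OTUs")
    row_sums = [sum(row) for row in d]

    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q-criterion: smaller values correspond to shorter total tree length.
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)

    _, i, j = best
    branch_i = 0.5 * d[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    branch_j = d[i][j] - branch_i
    return names[i], names[j], branch_i, branch_j


# Hypothetical five-taxon distance matrix.
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair = nj_pick_pair(list("abcde"), D)
```

In this example the pair with the smallest raw distance is (d, e), yet the Q-criterion selects (a, b). This is the difference between neighbor joining and simply clustering the closest pair, as the unweighted pair group method does.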
Article
We consider specific additive decompositions d = d1 + … + dn of metrics, defined on a finite set X (where a metric may give distance zero to pairs of distinct points). The simplest building stones are the split metrics, associated to splits (i.e., bipartitions) of the given set X. While an additive decomposition of a Hamming metric into split metrics is in no way unique, we achieve uniqueness by restricting ourselves to coherent decompositions, that is, decompositions d = d1 + … + dn such that for every map f: X → R with f(x) + f(y) ⩾ d(x, y) for all x, y ∈ X there exist maps f1, …, fn: X → R with f = f1 + … + fn and fi(x) + fi(y) ⩾ di(x, y) for all i = 1, …, n and all x, y ∈ X. These coherent decompositions are closely related to a geometric decomposition of the injective hull of the given metric. A metric with a coherent decomposition into a (weighted) sum of split metrics will be called totally split-decomposable. Tree metrics (and more generally, the sum of two tree metrics) are particular instances of totally split-decomposable metrics. Our main result confirms that every metric admits a coherent decomposition into a totally split-decomposable metric and a split-prime residue, where all the split summands and hence the decomposition can be determined in polynomial time, and that a family of splits can occur this way if and only if it does not induce on any four-point subset all three splits with block size two.
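To make the notion of a split metric concrete, the Python sketch below builds a metric as a weighted sum of split metrics. It only illustrates the definition: a split metric gives distance 1 to points on opposite sides of a bipartition and 0 otherwise. It does not implement the coherent decomposition itself, and the function names and the example splits are assumptions.

```python
def split_metric(side, x, y):
    """Split metric for the bipartition {side, X minus side}.

    Returns 1 if x and y lie on opposite sides of the split, else 0.
    """
    return 0 if (x in side) == (y in side) else 1


def metric_from_splits(points, weighted_splits):
    """Weighted sum of split metrics.

    weighted_splits: list of (side, weight), each side a set of points.
    Returns a dict mapping (x, y) to the summed distance.
    """
    return {
        (x, y): sum(w * split_metric(side, x, y) for side, w in weighted_splits)
        for x in points
        for y in points
    }


# Hypothetical example on four points with two weighted splits.
points = ["a", "b", "c", "d"]
splits = [({"a"}, 1.0), ({"a", "b"}, 2.0)]
d = metric_from_splits(points, splits)
```

Here `d[("c", "d")]` is zero even though c and d are distinct, because no split separates them. This matches the abstract's remark that a metric may give distance zero to pairs of distinct points.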
Conference Paper
During evolution, chromosomal rearrangements, such as reciprocal translocation, transposition and inversion, disrupt gene content and gene order on chromosomes. We discuss algorithmic and statistical approaches to the analysis of comparative genomic data. In a phylogenetic context, a combined approach is suggested, leading to the median problem for breakpoints. We solve this problem first for the case where all genomes have the same gene content, and then for the general case.
Conference Paper
We introduce NeighborNet, a network construction and data representation method that combines aspects of neighbor joining (NJ) and SplitsTree. Like NJ, NeighborNet uses agglomeration: taxa are combined into progressively larger and larger overlapping clusters. Like SplitsTree, NeighborNet constructs networks rather than trees, and so can be used to represent multiple phylogenetic hypotheses simultaneously, or to detect complex evolutionary processes like recombination, lateral transfer and hybridization. NeighborNet tends to produce networks that are substantially more resolved than those made with SplitsTree. The method is efficient (O(n^3) time) and is well suited for the preliminary analyses of complex phylogenetic data. We report results of three case studies: one based on mitochondrial gene order data from early branching eukaryotes, another based on nuclear sequence data from New Zealand alpine buttercups (Ranunculi), and a third on poorly corrected synthetic data.