About
52 Publications
22,820 Reads
6,128 Citations
Publications (52)
Replication stress (RS) is a primary source of genomic instability, tumorigenesis, and cancer progression. RS is defined as an uncoupling of the replicative helicase and DNA polymerase, resulting in long stretches of fragile single-stranded DNA (ssDNA) that are prone to damage. Excessive RS can result in replication catastrophe and cell death, which...
DNA viruses are important infectious agents known to mediate a large number of human diseases, including cancer. Viral integration into the host genome and the formation of hybrid transcripts are also associated with increased pathogenicity. The high variability of viral genomes, however, requires the use of sensitive ensemble hidden Markov models t...
Adaptive radiation is an important mechanism of organismal diversification and can be triggered by new ecological opportunities. Although poorly studied in this regard, parasites are an ideal group in which to study adaptive radiations because of their close associations with host species. Both experimental and comparative studies suggest that the...
We investigated the prevalence of coronaviruses in 44 bats from four families in northeastern Eswatini using high-throughput sequencing of fecal samples. We found evidence of coronaviruses in 18% of the bats. We recovered full or near-full-length genomes from two bat species: Chaerephon pumilus and Afronycteris nana, as well as additional coronavi...
The recent discovery and clinical validation of KRAS inhibitors (KRASi) have ushered in a new therapeutic approach to directly address the previously undruggable mutant KRAS-driven cancers. Unfortunately, as with other oncogene-directed therapies, acquired resistance to KRASi has been observed that is partially attributed to secondary mutations in K...
Purpose:
Human papillomavirus (HPV) plays a major role in oncogenesis and circular extrachromosomal DNA (ecDNA) is found in many cancers. However, the relationship between HPV and circular ecDNA in human cancer is not understood.
Experimental design:
Forty-four primary tumor tissue samples were obtained from a cohort of HPV-positive OPSCC patien...
Adaptive radiation is an important mechanism of organismal diversification, and can be triggered by new ecological opportunities. Although poorly studied in this regard, parasites present an ideal system to study adaptive radiations because of their close associations with host species. Both experimental and comparative studies suggest that the ect...
Introduction: Tumors with oncogene copy number amplification are aggressive, have a poor prognosis and, to date, have been very difficult to treat. Computational analyses in a large pan-cancer study revealed that ecDNA comprises over 50% of highly amplified oncogenes. We sought to determine the underlying mechanisms that render tumors with amplified...
Extrachromosomal DNA (ecDNA) amplification promotes intratumoral genetic heterogeneity and accelerated tumor evolution [1–3]; however, its frequency and clinical impact are unclear. Using computational analysis of whole-genome sequencing data from 3,212 cancer patients, we show that ecDNA amplification frequently occurs in most cancer types but not in...
Background: In the KEYNOTE-059 study, the anti-PD-1 checkpoint inhibitor pembrolizumab was shown to have a modest overall response of 11.6%. Common predictors of response, including high microsatellite instability (MSI-H), PD-L1 expression, tumor mutational burden (TMB), and tumor inflammation signature (TIS), were not individually sufficient f...
Extrachromosomal DNA (ecDNA) amplification promotes high oncogene copy number, intratumoral genetic heterogeneity, and accelerated tumor evolution, but its frequency and clinical impact are not well understood. Here we show, using computational analysis of whole-genome sequencing data from 1,979 cancer patients, that ecDNA amplification occurs in a...
Oncogenes are commonly amplified on particles of extrachromosomal DNA (ecDNA) in cancer [1,2], but our understanding of the structure of ecDNA and its effect on gene regulation is limited. Here, by integrating ultrastructural imaging, long-range optical mapping and computational analysis of whole-genome sequencing, we demonstrate the structure of circ...
Green plants (Viridiplantae) include around 450,000–500,000 species [1,2] of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida),...
Microbiomes are vast communities of microorganisms and viruses that populate all natural ecosystems. Viruses have been considered to be the most variable component of microbiomes, as supported by virome surveys and examples of high genomic mosaicism. However, recent evidence suggests that the human gut virome is remarkably stable compared with that...
Microbiomes are vast communities of microbes and viruses that populate all natural ecosystems. Viruses have been considered the most variable component of microbiomes, as supported by virome surveys and examples of high genomic mosaicism. However, recent evidence suggests that the human gut virome is remarkably stable compared to other environments...
Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important but increasingly computationally expensive step, given the recent surge in DNA sequence data. Much of this sequence data is publicly available, but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compoun...
The diversification of parasite groups often occurs at the same time as the diversification of their hosts. However, most studies demonstrating this concordance only examine single host–parasite groups. Multiple diverse lineages of ectoparasitic lice occur across both birds and mammals. Here, we describe the evolutionary history of lice based on an...
Insects with restricted diets rely on symbiotic bacteria to provide essential metabolites missing in their diet. The blood-sucking lice are obligate, host-specific parasites of mammals and are themselves host to symbiotic bacteria. In human lice, these bacterial symbionts supply the lice with B-vitamins. Here we sequenced the genomes of symbiotic a...
Novel sequencing technologies are rapidly expanding the size of datasets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic datasets more economical for organisms with large genomes, they reduce the genomic c...
Parasitic "wing lice" (Phthiraptera: Columbicola) and their dove and pigeon hosts are a well-recognized model system for coevolutionary studies at the intersection of micro- and macroevolution. Selection on lice in microevolutionary time occurs as pigeons and doves defend themselves against lice by preening. In turn, behavioral and morphological ad...
Background
Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences...
Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for impor...
The standard pipeline for 16S amplicon analysis starts by clustering sequences within a percent sequence similarity threshold (typically 97%) into ‘Operational Taxonomic Units’ (OTUs). From each OTU, a single sequence is selected as a representative. This representative sequence is annotated, and that annotation is applied to all remaining sequence...
Many biological questions rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate MSA estimation is difficult for large datasets, especially when the dataset evolved under high rates of evolution or contains fragmentary sequences.
Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains f...
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improv...
Motivation:
Abundance profiling (also called 'phylogenetic profiling') is a crucial step in understanding the diversity of a metagenomic sample, and one of the basic techniques used for this is taxonomic identification of the metagenomic reads.
Results:
We present taxon identification and phylogenetic profiling (TIPP), a new marker-based taxon i...
Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resol...
Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly reso...
The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how t...
In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing...
We address the problem of Phylogenetic Placement, in which the objective is to insert short molecular sequences (called query sequences) into an existing phylogenetic tree and alignment on full-length sequences for the same gene. Phylogenetic placement has the potential to provide information beyond pure “species identificatio...
Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0,1, ?}, and then runn...
Electronic design automation (EDA) tools have facilitated the design of ever more complex integrated circuits each year. Synthetic biology would also benefit from the development of genetic design automation (GDA) tools. Existing GDA tools require biologists to design genetic circuits at the molecular level, roughly equivalent to designing electron...
This paper presents results on the design and analysis of a robust genetic Muller C-element. The Muller C-element is a standard logic gate commonly used to synchronize independent processes in most asynchronous electronic circuits. Synthetic biological logic gates have been previously demonstrated, but there remain many open issues in the design of...
The power of electronic computation is due in part to the development of modular gate structures that can be coupled to carry out sophisticated logical operations and whose performance can be readily modelled. However, the equivalences between electronic and biochemical operations are far from obvious. In order to help cross between these disciplin...
iBioSim is a tool that supports learning of genetic circuit models, efficient abstraction-based analysis of these models and the design of synthetic genetic circuits. iBioSim includes project management features and a graphical user interface that facilitate the development and maintenance of genetic circuit models as well as both experimental and...
Synthetic biology uses engineering principles to design circuits out of genetic materials that are inserted into bacteria to perform various tasks. While synthetic combinational Boolean logic gates have been constructed, there are many open issues in the design of sequential logic gates. One such gate common in most asynchronous circuits is the Mul...
EDA tools have facilitated the design of ever more complex integrated circuits each year. Synthetic biology would also benefit from the development of genetic design automation (GDA) tools. Existing GDA tools require biologists to design genetic circuits at the molecular level, roughly equivalent to designing electronic circuits at the layout leve...
Electronic Design Automation (EDA) tools have facilitated the design of ever more complex integrated circuits each year. Synthetic biology would also benefit from the development of Genetic Design Automation (GDA) tools. Existing GDA tools require biologists to design genetic circuits at the molecular level, roughly equivalent to designing electron...