Christopher J Neely's research while affiliated with University of Southern California and other places


Publications (7)


Eukaryotic genomes from a global metagenomic data set illuminate trophic modes and biogeography of ocean plankton
Article · November 2023 · 112 Reads · 10 Citations · mBio®
Metagenomics is a powerful method for interpreting the ecological roles and physiological capabilities of mixed microbial communities. Yet, many tools for processing metagenomic data are neither designed to consider eukaryotes nor are they built for an increasing amount of sequence data. EukHeist is an automated pipeline to retrieve eukaryotic and prokaryotic metagenome-assembled genomes (MAGs) from large-scale metagenomic sequence data sets. We developed the EukHeist workflow to specifically process large amounts of both metagenomic and/or metatranscriptomic sequence data in an automated and reproducible fashion. Here, we applied EukHeist to the large-size fraction data (0.8–2,000 µm) from Tara Oceans to recover both eukaryotic and prokaryotic MAGs, which we refer to as TOPAZ (Tara Oceans Particle-Associated MAGs). The TOPAZ MAGs consisted of >900 environmentally relevant eukaryotic MAGs and >4,000 bacterial and archaeal MAGs. The bacterial and archaeal TOPAZ MAGs expand upon the phylogenetic diversity of likely particle- and host-associated taxa. We use these MAGs to demonstrate an approach to infer the putative trophic mode of the recovered eukaryotic MAGs. We also identify ecological cohorts of co-occurring MAGs, which are driven by specific environmental factors and putative host-microbe associations. These data together add to a number of growing resources of environmentally relevant eukaryotic genomic information. Complementary and expanded databases of MAGs, such as those provided through scalable pipelines like EukHeist, stand to advance our understanding of eukaryotic diversity through increased coverage of genomic representatives across the tree of life.

IMPORTANCE: Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain. Environmental sequencing enables researchers to document naturally occurring protistan communities, without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To more completely capture the genomic content of mixed protistan populations, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We show exciting insight into what protistan communities are present and their trophic roles in the ocean. Scalable computational tools, like EukHeist, may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers’ efforts to leverage MAG databases for addressing ecological questions, resolving evolutionary relationships, and discovering potentially novel biodiversity.


Estimating the maximal growth rates of eukaryotic microbes from cultures and metagenomes via codon usage patterns

October 2021 · 86 Reads · 2 Citations

Microbial eukaryotes are ubiquitous in the environment and play important roles in key ecosystem processes, including accounting for a significant portion of global primary production. Yet, our tools for assessing the functional capabilities of eukaryotic microbes in the environment are quite limited because many microbes have yet to be grown in culture. Maximum growth rate is a fundamental parameter of microbial lifestyle that reveals important information about an organism's functional role in a community. We developed and validated a genomic estimator of maximum growth rate for eukaryotic microbes, enabling the assessment of growth potential for both cultivated and yet-to-be-cultivated organisms. We produced a database of over 700 growth predictions from genomes, transcriptomes, and metagenome-assembled genomes, and found that closely related and/or functionally similar organisms tended to have similar maximal growth rates. By comparing the maximal growth rates of existing culture collections with environmentally-derived genomes we found that, unlike for prokaryotes, culture collections of microbial eukaryotes are only minimally biased in terms of growth potential. We then extended our tool to make community-wide estimates of growth potential from over 500 marine metagenomes, mapping growth potential across the global oceans. We found that prokaryotic and eukaryotic communities have highly correlated growth potentials near the ocean surface, but that this relationship disappears deeper in the water column. This suggests that fast growing eukaryotes and prokaryotes thrive under similar conditions at the ocean surface, but that there is a decoupling of these communities as resources become scarce deeper in the water column.


Figure 3: Reliability of EukMetaSanity using Order-level depleted databases for 34 fungal genomes in 15 Orders. Box plots comparing BUSCO completeness and protein prediction results for the three gene prediction tools and Tier 1 approach against the NCBI reference (n = 34) for the complete OrthoDB-MMETSP database (green) and depleted database lacking the associated Order-level set of proteins (orange). From left to right: Panel 1: BUSCO completeness for each genome; p-value from a Wilcoxon rank-sum test with Benjamini-Hochberg false discovery rate correction. Panel 2: The total number of BUSCO proteins lost/gained for each genome. Panel 3: The log decrease in BUSCO proteins. Panel 4: The log increase in BUSCO proteins. Panel 5: Log size of proteins recovered.
Figure 4: Comparison of EukMetaSanity results to alternative annotation pipelines used for Tara Oceans MAGs. (A) MAGs from Delmont et al. (2021) (n = 682). (B) MAGs from Alexander et al. (2021) (n = 987). From left to right: Panel 1: BUSCO completeness for each genome; p-value from a Wilcoxon rank-sum test with Benjamini-Hochberg false discovery rate correction. Panel 2: The total number of BUSCO proteins lost/gained for each genome. Panel 3: The log decrease for genomes that lost BUSCO proteins. Panel 4: The log increase for genomes that gained BUSCO proteins. Panel 5: Log size of proteins recovered.

The high-throughput gene prediction of more than 1,700 eukaryote genomes using the software package EukMetaSanity

July 2021 · 126 Reads · 8 Citations

Gene prediction and annotation for eukaryotic genomes is challenging with large data demands and complex computational requirements. For most eukaryotes, genomes are recovered from specific target taxa. However, it is now feasible to reconstruct or sequence hundreds of metagenome-assembled genomes (MAGs) or single-amplified genomes directly from the environment. To meet this forthcoming wave of eukaryotic genome generation, we introduce EukMetaSanity, which combines state-of-the-art tools into three pipelines that have been specifically designed for extensive parallelization on high-performance computing infrastructure. EukMetaSanity performs an automated taxonomy search against a protein database of 1,482 species to identify phylogenetically compatible proteins to be used in downstream gene prediction. We present the results for intron, exon, and gene locus prediction for 112 genomes collected from NCBI, including fungi, plants, and animals, along with 1,669 MAGs and demonstrate that EukMetaSanity can provide reliable preliminary gene predictions for a single target taxon or at scale for hundreds of MAGs. EukMetaSanity is freely available at https://github.com/cjneely10/EukMetaSanity.


Eukaryotic genomes from a global metagenomic dataset illuminate trophic modes and biogeography of ocean plankton

July 2021 · 227 Reads · 17 Citations

Metagenomics is a powerful method for interpreting the ecological roles and physiological capabilities of mixed microbial communities. Yet, many tools for processing metagenomic data are not designed to consider eukaryotes, nor are they built for an increasing amount of sequence data. EukHeist is an automated pipeline to retrieve eukaryotic and prokaryotic metagenome assembled genomes (MAGs) from large-scale metagenomic datasets. We developed the EukHeist workflow to specifically process large amounts of both metagenomic and/or metatranscriptomic sequence data in an automated and reproducible fashion. Here, we applied EukHeist to the large-size fraction data (0.8-2000 µm) from Tara Oceans to recover both eukaryotic and prokaryotic MAGs, which we refer to as TOPAZ (Tara Oceans Particle-Associated MAGs). The TOPAZ MAGs consisted of >900 environmentally-relevant eukaryotic MAGs and >4,000 bacterial and archaeal MAGs. The bacterial and archaeal TOPAZ MAGs expand the known marine phylogenetic diversity through the increase in coverage of likely particle- and host-associated taxa. We also demonstrate an approach to infer the putative functional mode of the recovered eukaryotic MAGs. A global survey of the TOPAZ MAGs enabled the identification of ecological cohorts, driven by specific environmental factors, and putative host-microbe associations.

Importance: Despite the ecological importance of single-celled eukaryotic organisms in marine environments, the majority are difficult to cultivate in the lab. Sequencing genetic material extracted from environmental samples enables researchers to document naturally-occurring protistan communities. However, conventional sequencing methodologies cannot separate out the genomes of individual organisms. To more completely capture the entire genomic content of a mixed protistan community, we can create bins of sequences that represent the same organism. We developed a pipeline that enables scientists to bin individual organisms out of metagenomic reads, and show results that provide exciting insights into what protistan communities are present in the ocean and what roles they play in the ecosystem. Here, a global survey of both eukaryotic and prokaryotic MAGs enabled the identification of ecological cohorts, driven by specific environmental factors, and putative host-microbe associations. Accessible and scalable computational tools, such as EukHeist, are likely to accelerate the identification of meaningful genetic signatures from large datasets, ultimately expanding the eukaryotic tree of life.


MetaSanity: An integrated microbial genome evaluation and annotation pipeline

May 2020 · 50 Reads · 27 Citations · Bioinformatics

As the importance of microbiome research continues to become more prevalent and essential to understanding a wide variety of ecosystems (e.g., marine, built, host-associated, etc.), there is a need for researchers to be able to perform highly reproducible and quality analysis of microbial genomes. MetaSanity incorporates analyses from eleven existing and widely used genome evaluation and annotation suites into a single, distributable workflow, thereby decreasing the workload of microbiologists by allowing for a flexible, expansive data analysis pipeline. MetaSanity has been designed to provide separate, reproducible workflows, that (1) can determine the overall quality of a microbial genome, while providing a putative phylogenetic assignment, and (2) can assign structural and functional gene annotations with varying degrees of specificity to suit the needs of the researcher. The software suite combines the results from several tools to provide broad insights into overall metabolic function. Importantly, this software provides built-in optimization for "big data" analysis by storing all relevant outputs in an SQL database, allowing users to query all the results for the elements that will most impact their research. Availability: MetaSanity is provided under the GNU General Public License v.3.0 and is available for download at https://github.com/cjneely10/MetaSanity. This application is distributed as a Docker image. MetaSanity is implemented in Python3/Cython and C++. Instructions for its installation and use are available within the GitHub wiki page at https://github.com/cjneely10/MetaSanity/wiki, and additional instructions are available at https://cjneely10.github.io/year-archive/. MetaSanity is optimized for users with limited programming experience. Supplementary information: Supplementary data are available at Bioinformatics online.
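
The abstract above mentions storing pipeline outputs in an SQL database so that results can be queried. As a rough illustration of that idea only, the sketch below uses Python's standard `sqlite3` module. The table name (`annotations`), its columns (`genome_id`, `completeness`, `contamination`, `kegg_ko`), the helper functions, the thresholds, and the demo rows are all hypothetical; they are assumptions made for this example and do not reflect MetaSanity's actual schema or interface, which the abstract does not describe.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical annotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotations ("
        "genome_id TEXT, completeness REAL, contamination REAL, kegg_ko TEXT)"
    )
    # Made-up demo rows; these are not real MetaSanity results.
    rows = [
        ("genome_A", 95.0, 1.2, "K00001"),
        ("genome_A", 95.0, 1.2, "K00002"),
        ("genome_B", 72.5, 8.0, "K00001"),
        ("genome_C", 98.1, 0.5, "K00003"),
    ]
    conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def high_quality_with_ko(conn, ko, min_completeness=90.0, max_contamination=5.0):
    """Return sorted genome IDs that carry `ko` and pass the quality cutoffs."""
    cursor = conn.execute(
        "SELECT DISTINCT genome_id FROM annotations "
        "WHERE kegg_ko = ? AND completeness >= ? AND contamination <= ? "
        "ORDER BY genome_id",
        (ko, min_completeness, max_contamination),
    )
    return [row[0] for row in cursor.fetchall()]
```

Calling `high_quality_with_ko(build_demo_db(), "K00001")` selects only those demo genomes that carry the requested annotation and meet both quality cutoffs; the parameterized query keeps the thresholds adjustable without editing the SQL string.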


MetaSanity: An integrated, customizable microbial genome evaluation and annotation pipeline

October 2019 · 15 Reads · 3 Citations

As the importance of microbiome research continues to become more prevalent and essential to understanding a wide variety of ecosystems (e.g., marine, built, host-associated, etc.), there is a need for researchers to be able to perform highly reproducible and quality analysis of microbial genomes. MetaSanity incorporates analyses from eleven existing and widely used genome evaluation and annotation suites into a single, distributable workflow, thereby decreasing the workload of microbiologists by allowing for a flexible, expansive data analysis pipeline. MetaSanity has been designed to provide separate, reproducible workflows, that (1) can determine the overall quality of a microbial genome, while providing a putative phylogenetic assignment, and (2) can assign structural and functional gene annotations with varying degrees of specificity to suit the needs of the researcher. The software suite combines the results from several tools to provide broad insights into overall metabolic function and putative extracellular localization of peptidases and carbohydrate-active enzymes. Importantly, this software provides built-in optimization for “big data” analysis by storing all relevant outputs in an SQL database, allowing users to query all the results for the elements that will most impact their research. Availability and implementation: MetaSanity is provided under the GNU General Public License v.3.0 and is available for download at https://github.com/cjneely10/MetaSanity. This application is distributed as a Docker image. MetaSanity is implemented in Python3/Cython and C++. Supplementary information: Supplementary data are available below.


Genome Sequence of Hydrogenovibrio sp. Strain SC-1, a Chemolithoautotrophic Sulfur and Iron Oxidizer

February 2018 · 75 Reads · 3 Citations · Genome Announcements

Hydrogenovibrio sp. strain SC-1 was isolated from pyrrhotite incubated in situ in the marine surface sediment of Catalina Island, CA. Strain SC-1 has demonstrated autotrophic growth through the oxidation of thiosulfate and iron. Here, we present the 2.45-Mb genome sequence of SC-1, which contains 2,262 protein-coding genes.

Citations (7)


... This choice is consistent with recent LLM configuration findings [40], including ESM-2 [49]. To tackle the challenge of data scarcity, we leveraged the ColabFoldDB database [58], which focuses on metagenomic data sources such as BFD [1], MGnify [59], and specific eukaryotic and viral datasets including SMAG [23], MetaEuk [46], TOPAZ [3], MGV [61], and GPD [13]. We applied a stringent deduplication process with a maximum similarity threshold of 0.3 to preserve the diversity of the protein universe. ...

Reference:

Training Compute-Optimal Protein Language Models
Eukaryotic genomes from a global metagenomic data set illuminate trophic modes and biogeography of ocean plankton
mBio®

... The MMETSP database is composed of over 678 protistan transcriptomes from 405 unique strains and is widely used in current marine omic workflows (e.g., Lampe et al., 2018; Kolody et al., 2019; Groussman et al., 2021; Krinos et al., 2022). Oceanographic field expeditions such as Tara Oceans (Carradec et al., 2018) and bioGEOTRACES (Biller et al., 2018) are leveraging this database and other recently generated eukaryotic transcriptomes and genomes to uncover the taxonomic and functional roles of protists across ocean basins (Carradec et al., 2018; Alexander et al., 2021; Weissman et al., 2021; Blaxter et al., 2022; Delmont et al., 2022). These large field datasets describing the biographical and functional distribution of protists consist of sequence data, taxonomic and functional annotations, and contextualizing environmental metadata, and serve as invaluable community resources for researchers, students, and the public to use (Biller et al., 2018; Carradec et al., 2018; Villar et al., 2018). ...

Estimating the maximal growth rates of eukaryotic microbes from cultures and metagenomes via codon usage patterns

... First, we downloaded all genomes affiliated to the Ochrophyta phylum in Genbank, except for genomes larger than 400 Mb and considered only one genome per species (Supplementary Table S4). Among the 58 remaining, 42 which did not have a public annotation were annotated with the EukMetaSanity software (Neely et al. 2021). We then used Busco with the Eukaryotic lineage mode (protein; 255 single copy genes). ...

The high-throughput gene prediction of more than 1,700 eukaryote genomes using the software package EukMetaSanity

... One such example is a tool called Traitar, which utilizes Support Vector Machines (SVMs) to predict lifestyle and pathogenic traits in prokaryotes based on gene family abundance profiles (Weimann et al., 2016). Other recent approaches have used machine learning approaches to train models using eukaryote MAG and transcriptome data to classify trophic mode (autotroph, mixotroph, or heterotroph) based on gene family abundance profiles (Lambert et al., 2022; Alexander et al., 2021). To our knowledge, there are no existing tools that predict the presence/absence of KEGG metabolic modules via machine learning models trained on gene features of high-quality genomes. ...

Eukaryotic genomes from a global metagenomic dataset illuminate trophic modes and biogeography of ocean plankton

... (Parks et al., 2015) lineage-specific workflow (Supplementary File S3). Bin identities of MAGs (> 59% complete, < 10% contamination) were determined using MetaSanity v. 1.2.0 (Neely et al., 2020) PhyloSanity pipeline. MAG genome size and GC content were calculated using the RASTtk server (Aziz et al., 2008; Overbeek et al., 2014; Brettin et al., 2015) accessed on January 2021. ...

MetaSanity: An integrated microbial genome evaluation and annotation pipeline

Bioinformatics

... Tree formatting was performed using the Interactive Tree of Life [42]. Functional orthologies as defined by the KEGG database [43] were annotated using MetaSanity v1.2.0 [44] (Supplementary Methods). Genes related to iron acquisition, storage, and oxidation/reduction were annotated using the FeGenie tool [45]. ...

MetaSanity: An integrated, customizable microbial genome evaluation and annotation pipeline

... These organisms typically use reduced sulphur species as electron donors, with a few species capable of using molecular hydrogen [4, 10-12] or ferrous iron [10, 11, 13, 14]; reviewed in [1]. Molecular oxygen is the only electron acceptor supporting their growth, except in Tmr. ...

Genome Sequence of Hydrogenovibrio sp. Strain SC-1, a Chemolithoautotrophic Sulfur and Iron Oxidizer

Genome Announcements