David L Tabb

Institut Pasteur · Structural Biology and Chemistry

PhD

About

179
Publications
54,480
Reads
22,142
Citations
Additional affiliations
January 2021 - present
Institut Pasteur
Position
  • Researcher
Description
  • I am assisting the MS Bio Unit in characterizing and refining software pipelines for "bottom-up" proteomics and "top-down" proteomics.
December 2015 - December 2020
Stellenbosch University
Position
  • Professor
Description
  • I served in the South African Tuberculosis Bioinformatics Initiative. I was a co-founder of the Centre for Bioinformatics and Computational Biology at Stellenbosch University.
August 2005 - June 2015
Vanderbilt University
Position
  • Professor (Associate)
Education
September 1996 - June 2003
University of Washington, Seattle
Field of study
  • Molecular Biotechnology
August 1992 - May 1996
University of Arkansas
Field of study
  • Biology

Publications (179)
Preprint
Full-text available
Mass spectrometry is a powerful technique for analyzing molecules in complex biological samples. However, inter- and intra-laboratory variability and bias can affect the data due to various factors, including sample handling and preparation, instrument calibration and performance, and data acquisition and processing. To address this issue, the Qual...
Article
Although Top‐down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of t...
Preprint
Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science including the FAIR (Findability, Accessibility, I...
Article
Full-text available
Background: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer mi...
Article
Full-text available
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-s...
Article
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its lead...
Preprint
Full-text available
Generating top-down tandem mass spectra (MS/MS) for complex mixtures of proteoforms has become possible through improvements in fractionation, on-line separation, dissociation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and peak matching being paired with diverse m...
Preprint
Full-text available
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception twenty years ago. Here we describe the general operation of the PSI, including its...
Article
Full-text available
Background: Machine learning is used to process big data volumes with complex non-linear relationships between predictive variables and predictions. Research into the usefulness of machine learning in small data volumes remains limited. Aim: To compare conventional statistical methods and machine learning to predict angiogram outcomes in a small coh...
Article
Acquisition and homeostasis of essential metals during host colonization by bacterial pathogens rely on metal uptake, trafficking and storage proteins. How these factors have evolved within bacterial pathogens is poorly defined. Urease, a nickel enzyme, is essential for Helicobacter pylori to colonize the acidic stomach. Our previous data suggest t...
Chapter
The introduction of “open search” algorithms has enabled many researchers to discern which biological posttranslational modifications are hiding in their LC-MS/MS data. For chemical biologists, these tools are even more essential, since unanticipated side reactions and redox chemistry can dramatically expand the diversity of mass shifts present in...
Article
Full-text available
Vegetative desiccation tolerance, or the ability to survive the loss of ~95% relative water content (RWC), is rare in angiosperms, with these being commonly called resurrection plants. It is a complex multigenic and multi-factorial trait, with its understanding requiring a comprehensive systems biology approach. The aim of the current study was to...
Article
Full-text available
While proteomics has demonstrated its value for model organisms and for organisms with mature genome sequence annotations, proteomics has been of less value in nonmodel organisms that are unaccompanied by genome sequence annotations. This project sought to determine the value of RNA-Seq experiments as a basis for establishing a set of protein seque...
Article
The incomplete sampling of data in complex polarization measurements from radio telescopes negatively affects both the rotation measure (RM) transfer function and the Faraday depth spectra derived from these data. Such gaps in polarization data are mostly caused by flagging of radio frequency interference and their effects worsen as the percentage...
Preprint
Full-text available
Pathogenic mycobacteria, such as Mycobacterium tuberculosis, modulate the host immune system to evade clearance and promote long-term persistence, resulting in disease progression or latent infection. Understanding the mechanisms pathogenic mycobacteria use to escape elimination by the host immune system is critical to better understanding the mol...
Preprint
Full-text available
The incomplete sampling of data in complex polarization measurements from radio telescopes negatively affects both the rotation measure (RM) transfer function and the Faraday depth spectra derived from these data. Such gaps in polarization data are mostly caused by flagging of radio frequency interference and their effects worsen as the percentage...
Article
Chronic wounds are a serious and debilitating complication of diabetes. A better understanding of the dysregulated healing responses following injury will provide insight into the optimal time frame for therapeutic intervention. In this study a direct comparison was done between the healing dynamics and the proteome of acute and obese diabetic woun...
Article
Full-text available
Background: Female genital tract (FGT) inflammation is an important risk factor for HIV acquisition. The FGT microbiome is closely associated with inflammatory profile; however, the relative importance of microbial activities has not been established. Since proteins are key elements representing actual microbial functions, this study utilized metapr...
Article
Full-text available
The integration of mass spectrometry-based proteomics with next-generation DNA and RNA sequencing profiles tumors more comprehensively. Here this “proteogenomics” approach was applied to 122 treatment-naive primary breast cancers accrued to preserve post-translational modifications, including protein phosphorylation and acetylation. Proteogenomics...
Article
Full-text available
Pancreatic cancer accounts for 2.8% of new cancer cases worldwide and is projected to become the second leading cause of cancer-related deaths by 2030. Patients of African ancestry appear to be at an increased risk for pancreatic ductal adenocarcinoma (PDAC), with more severe disease and outcomes. The purpose of this study was to map the proteomic...
Preprint
Full-text available
Pancreatic cancer accounts for 2.8% of new cancer cases worldwide and is projected to become the second leading cause of cancer-related deaths by 2030. Patients of African ancestry appear to be at an increased risk for pancreatic ductal adenocarcinoma (PDAC), with worse severity and outcomes. The purpose of this study was to map the proteomic and g...
Article
Full-text available
Improved tuberculosis diagnostics and tools for monitoring treatment response are urgently needed. We developed a robust and simple, PCR-based host-blood transcriptomic signature, RISK6, for multiple applications: identifying individuals at risk of incident disease, as a screening test for subclinical or clinical tuberculosis, and for monitoring tu...
Article
The increasing amount of publicly available proteomics data creates opportunities for data scientists to investigate quality metrics in novel ways. QuaMeter IDFree was used to generate quality metrics from 665 RAW files and 97 WIFF files representing publicly available “shotgun” mass spectrometry datasets. These experiments were selected to represe...
Preprint
Background: Female genital tract (FGT) inflammation is an important risk factor for HIV acquisition. The FGT microbiome is closely associated with inflammatory profile; however, the relative importance of microbial activities has not been established. Since proteins are key elements representing actual microbial functions, this study utilized metapr...
Article
Full-text available
Background: The prevalence of Parkinson's disease (PD) is increasing in sub-Saharan Africa, but little is known about the genetics of PD in these populations. Due to their unique ancestry and diversity, sub-Saharan African populations have the potential to reveal novel insights into the pathobiology of PD. In this study, we aimed to characterise t...
Article
The application of database search algorithms with very wide precursor mass tolerances for the “Open Search” paradigm has brought new efforts at post-translational modification discovery in shotgun proteomes. This approach has motivated the acceleration of database search tools by incorporating fragment indexing features. In this report, we compare...
Preprint
Metagenome-driven microbiome research is providing important new insights in fields as diverse as the pathogenesis of human disease, the metabolic interactions of complex microbial ecosystems involved in agriculture, and climate change. However, poor correlations typically observed between RNA and protein expression datasets even for single organis...
Article
We performed the first proteogenomic study on a prospectively collected colon cancer cohort. Comparative proteomic and phosphoproteomic analysis of paired tumor and normal adjacent tissues produced a catalog of colon cancer-associated proteins and phosphosites, including known and putative new biomarkers, drug targets, and cancer/testis antigens. P...
Article
Full-text available
There remains a pressing need for biomarkers that can predict who will progress to active tuberculosis (TB) after exposure to Mycobacterium tuberculosis (MTB) bacterium. By analyzing cohorts of household contacts of TB index cases (HHCs) and a stringent non-human primate (NHP) challenge model, we evaluated whether integration of blood transcription...
Chapter
plantGlycoMS is a set of tools, implemented in R, which is used to assess and validate glycopeptide spectrum matches (gPSMs). Validity of gPSMs is based on characteristic fragmentation patterns of glycopeptides (gPSMvalidator), adherence of the glycan moiety to the known N-glycan biosynthesis pathway in plants (pGlycoFilter), and elution of the gly...
Article
Full-text available
Biomarkers that predict who among recently Mycobacterium tuberculosis (MTB)-exposed individuals will progress to active tuberculosis are urgently needed. Intracellular microRNAs (miRNAs) regulate the host response to MTB and circulating miRNAs (c-miRNAs) have been developed as biomarkers for other diseases. We performed machine-learning analysis of...
Article
Full-text available
The developing world is seeing rapid growth in the availability of biological mass spectrometry (MS), particularly through core facilities. As proteomics and metabolomics become locally feasible for investigators in these nations, application areas associated with high burden in these nations, such as infectious disease, will see greatly increased...
Data
DISEASE scores predict PET-CT resolution.
Data
DISEASE and FAILURE signatures were confirmed via qRT-PCR.
Data
ACS COR scores predict treatment failure.
Article
Full-text available
Clinical proteomics requires large-scale analysis of human specimens to achieve statistical significance. In this study, we evaluated the long-term reproducibility of an iTRAQ (isobaric tags for relative and absolute quantification)-based quantitative proteomics strategy using one channel for reference across all samples in different iTRAQ sets. A...
Article
Full-text available
Motivation: Complex microbial communities can be characterized by metagenomics and metaproteomics. However, metagenome assemblies often generate enormous, and yet incomplete, protein databases, which undermines the identification of peptides and proteins in metaproteomics. This challenge calls for increased discrimination of true identifications f...
Article
Full-text available
The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, co-chairs, and other leadership positions, the PSI working groups are tasked with the development and maintena...
Article
Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of utmost importance to fully underst...
Article
Full-text available
Mass spectrometry is a highly complex analytical technique and mass spectrometry-based proteomics experiments can be subject to a large variability, which forms an obstacle to obtaining accurate and reproducible results. Therefore, a comprehensive and systematic approach to quality control is an essential requirement to inspire confidence in the ge...
Article
Full-text available
Biomarkers for tuberculosis treatment outcome will assist in guiding individualized treatment and evaluation of new therapies. To identify candidate biomarkers, RNA sequencing of whole blood from a well-characterized TB treatment cohort was performed. Application of a validated transcriptional correlate of risk for TB revealed symmetry in host gene...
Article
In order to be confident of the results acquired during biological mass spectrometry experiments, a systematic approach to quality control is of vital importance. Nonetheless, until now only scattered initiatives have been undertaken to this end, and these individual efforts have often not been complementary. To address this issue, the Human Proteo...
Article
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average, 75% of spectra analyzed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. The Proteomics Identifications (PRIDE) Database Archive is one of the largest MS proteo...
Article
Full-text available
Lipid identification from data produced with high-throughput technologies is essential to the elucidation of the roles played by lipids in cellular function and disease. Software tools for identifying lipids from tandem mass (MS/MS) spectra have been developed, but they are often costly or lack the sophistication of their proteomics counterparts. W...
Article
Full-text available
Plant secretory (Class III) peroxidases are redox enzymes that rely on N-glycosylation for full enzyme activity and stability. Peroxidases from palm tree leaves comprise the most stable and active plant peroxidases characterized to date. Herein, site-specific glycosylation and microheterogeneity of windmill palm tree (Trachycarpus fortunei) peroxid...
Article
Full-text available
The NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) employed a pair of reference xenograft proteomes for initial platform validation and ongoing quality control of its data collection for The Cancer Genome Atlas (TCGA) tumors. These two xenografts, representing basal and luminal-B human breast cancer, were fractionated and analyzed on six...
Article
Full-text available
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools,...
Article
Full-text available
Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome and glob...
Article
Full-text available
Collagen IV is the main structural protein that provides a scaffold for assembly of basement membrane proteins. Posttranslational modifications such as hydroxylation of proline and lysine and glycosylation of lysine are essential for the functioning of collagen IV triple-helical molecules. These modifications are highly abundant, posing a difficult...
Article
N-Methyl-d-aspartate receptors (NMDARs) are major targets of both acute and chronic alcohol, as well as regulators of plasticity in a number of brain regions. Aberrant plasticity may contribute to the treatment resistance and high relapse rates observed in alcoholics. Recent work suggests that chronic alcohol treatment preferentially modulates both...
Article
Full-text available
Questions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. Developed as an open study, initia...
Article
Full-text available
Systematic bias in mass measurement adversely affects data quality and negates the advantages of high precision instruments. We introduce the mzRefinery tool for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen. After...
Article
Full-text available
Understanding proteomic differences underlying the different phenotypic classes of colon and rectal carcinoma is important and may eventually lead to a better assessment of clinical behavior of these cancers. We here present a comprehensive description of the proteomic data obtained from 90 colon and rectal carcinomas previously subjected to genomi...
Article
Aiming to improve the understanding of protein regulations in cancer, recent studies from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) have focused on analyzing cancer tissue using proteomic technologies and workflows. Although many proteogenomics approaches for the study of cancer samples have been proposed, serious methodological chal...
Article
Full-text available
Since its introduction in 1994, SEQUEST has gained many important new capabilities, and a host of successor algorithms have built upon its successes. This Account and Perspective maps the evolution of this important tool and charts the relationships among contributions to the SEQUEST legacy. Many of the changes represented improvements in computing...
Article
Full-text available
We report the implementation of high-quality signal processing algorithms into ProteoWizard, an efficient, open-source software package designed for analyzing proteomics tandem mass spectrometry data. Specifically, a new wavelet-based peak-picker (CantWaiT) and a precursor charge determination algorithm (Turbocharger) have been implemented. These a...
Article
Full-text available
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stan...
Article
Full-text available
Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Somatic variants displayed reduced...
Article
Full-text available
After raw data have been captured by mass spectrometers in biological LC-MS/MS experiments, they must be converted from vendor-specific binary files to open-format files for manipulation by most software. This protocol details the use of ProteoWizard software for this conversion, taking format features, coding options, and vendor particularities in...
Conference Paper
Full-text available
The National Institutes of Health (NIH), National Cancer Institute's Early Detection Research Network (EDRN) is a cross-institutional collaborative initiative seeking to accelerate the clinical application of cancer biomarker research. Over the past decade, it has been our role, as EDRN's Informatics Center (IC), to develop a comprehensive informat...
Article
Shotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, w...
Article
Full-text available
The blood-brain barrier (BBB) dynamically controls exchange between the brain and the body, but this interaction cannot be studied directly in the intact human brain or sufficiently represented by animal models. Most existing in vitro BBB models do not include neurons and glia with other BBB elements and do not adequately predict drug efficacy and...
Article
Full-text available
Both DNA and chromatin need to be duplicated during each cell division cycle. Replication happens in the context of defects in the DNA template and other forms of replication stress that present challenges to both genetic and epigenetic inheritance. The replication machinery is highly regulated by replication stress responses to accomplish this goa...
Article
Differentiating and quantifying protein differences in complex samples produces significant challenges in sensitivity and specificity. Label-free quantification can draw from two different information sources: precursor intensities and spectral counts. Intensities are accurate for calculating protein relative abundance, but values are often missing...
Article
Full-text available
Frequently, proteomic LC-MS/MS data may contain sets of modifications that evade identification during standard database search. For many laboratories, the standard technique to seek posttranslational modifications (PTMs) adds a short list of specified mass shifts to database search configuration. This technique provides information for only the sp...
Article
Full-text available
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accur...
Article
Peptide sequence matching algorithms used for peptide identification by tandem mass spectrometry (MS/MS) enumerate theoretical peptides from the database, predict their fragment ions, and match them to the experimental MS/MS spectra. Here, we present an approach for scoring MS/MS identifications based on the high mass accuracy matching of precursor...
Article
Proteomics has emerged from the labs of technologists to enter widespread application in clinical contexts. This transition, however, has been hindered by overstated early claims of accuracy, concerns about reproducibility, and the challenges of handling batch effects properly. New efforts have produced sets of performance metrics and measurements...
Article
Full-text available
Mass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples [1], identify pathways affected by endogenous and exogenous perturbations [2], and characterize protein complexes [3]. Despite successes, the int...
Article
To complement the recent genomic sequencing of Chinese hamster ovary (CHO) cells, proteomic analysis was performed on CHO cells including the cellular proteome, secretome, and glycoproteome using tandem mass spectrometry (MS/MS) of multiple fractions obtained from gel electrophoresis, multidimensional liquid chromatography, and solid phase extracti...
Article
The domestication of animals, plants, and microbes fundamentally transformed the lifestyle and demography of the human species [1]. Although the genetic and functional underpinnings of animal and plant domestication are well understood, little is known about microbe domestication [2–6]. Here, we systematically examined genome-wide sequence and fu...
Article
2-Amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) is a heterocyclic aromatic amine that is formed during the cooking of meats. PhIP is a potential human carcinogen: it undergoes metabolic activation to form electrophilic metabolites that bind to DNA and proteins, including serum albumin (SA). The structures of PhIP-SA adducts formed in vivo ar...
