David L Tabb, PhD
Institut Pasteur · Structural Biology and Chemistry
179 Publications
54,480 Reads
22,142 Citations
Additional affiliations
August 2005 - June 2015
Education
September 1996 - June 2003
August 1992 - May 1996
Publications (179)
Mass spectrometry is a powerful technique for analyzing molecules in complex biological samples. However, inter- and intra-laboratory variability and bias can affect the data due to various factors, including sample handling and preparation, instrument calibration and performance, and data acquisition and processing. To address this issue, the Qual...
Although Top‐down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of t...
Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science including the FAIR (Findability, Accessibility, I...
Background:
Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer mi...
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-s...
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its lead...
Generating top-down tandem mass spectra (MS/MS) for complex mixtures of proteoforms has become possible through improvements in fractionation, on-line separation, dissociation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and peak matching being paired with diverse m...
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception twenty years ago. Here we describe the general operation of the PSI, including its...
Background
Machine learning is used to process big data volumes with complex non-linear relationships between predictive variables and predictions. Research into the usefulness of machine learning in small data volumes remains limited.
Aim
To compare conventional statistical methods and machine learning to predict angiogram outcomes in a small coh...
Acquisition and homeostasis of essential metals during host colonization by bacterial pathogens rely on metal uptake, trafficking and storage proteins. How these factors have evolved within bacterial pathogens is poorly defined. Urease, a nickel enzyme, is essential for Helicobacter pylori to colonize the acidic stomach. Our previous data suggest t...
The introduction of “open search” algorithms has enabled many researchers to discern which biological posttranslational modifications are hiding in their LC-MS/MS data. For chemical biologists, these tools are even more essential, since unanticipated side reactions and redox chemistry can dramatically expand the diversity of mass shifts present in...
Vegetative desiccation tolerance, or the ability to survive the loss of ~95% relative water content (RWC), is rare in angiosperms; species that have it are commonly called resurrection plants. It is a complex multigenic and multi-factorial trait, and understanding it requires a comprehensive systems biology approach. The aim of the current study was to...
While proteomics has demonstrated its value for model organisms and for organisms with mature genome sequence annotations, proteomics has been of less value in nonmodel organisms that are unaccompanied by genome sequence annotations. This project sought to determine the value of RNA-Seq experiments as a basis for establishing a set of protein seque...
The incomplete sampling of data in complex polarization measurements from radio telescopes negatively affects both the rotation measure (RM) transfer function and the Faraday depth spectra derived from these data. Such gaps in polarization data are mostly caused by flagging of radio frequency interference and their effects worsen as the percentage...
Pathogenic mycobacteria, such as Mycobacterium tuberculosis, modulate the host immune system to evade clearance and promote long-term persistence, resulting in disease progression or latent infection. Understanding the mechanisms pathogenic mycobacteria use to escape elimination by the host immune system is critical to better understanding the mol...
Chronic wounds are a serious and debilitating complication of diabetes. A better understanding of the dysregulated healing responses following injury will provide insight into the optimal time frame for therapeutic intervention. In this study, a direct comparison was done between the healing dynamics and the proteome of acute and obese diabetic woun...
Background
Female genital tract (FGT) inflammation is an important risk factor for HIV acquisition. The FGT microbiome is closely associated with inflammatory profile; however, the relative importance of microbial activities has not been established. Since proteins are key elements representing actual microbial functions, this study utilized metapr...
The integration of mass spectrometry-based proteomics with next-generation DNA and RNA sequencing profiles tumors more comprehensively. Here this “proteogenomics” approach was applied to 122 treatment-naive primary breast cancers accrued to preserve post-translational modifications, including protein phosphorylation and acetylation. Proteogenomics...
Pancreatic cancer accounts for 2.8% of new cancer cases worldwide and is projected to become the second leading cause of cancer-related deaths by 2030. Patients of African ancestry appear to be at an increased risk for pancreatic ductal adenocarcinoma (PDAC), with more severe disease and outcomes. The purpose of this study was to map the proteomic...
Pancreatic cancer accounts for 2.8% of new cancer cases worldwide and is projected to become the second leading cause of cancer-related deaths by 2030. Patients of African ancestry appear to be at an increased risk for pancreatic ductal adenocarcinoma (PDAC), with greater severity and worse outcomes. The purpose of this study was to map the proteomic and g...
Improved tuberculosis diagnostics and tools for monitoring treatment response are urgently needed. We developed a robust and simple, PCR-based host-blood transcriptomic signature, RISK6, for multiple applications: identifying individuals at risk of incident disease, as a screening test for subclinical or clinical tuberculosis, and for monitoring tu...
The increasing amount of publicly available proteomics data creates opportunities for data scientists to investigate quality metrics in novel ways. QuaMeter IDFree was used to generate quality metrics from 665 RAW files and 97 WIFF files representing publicly available “shotgun” mass spectrometry datasets. These experiments were selected to represe...
Background
Female genital tract (FGT) inflammation is an important risk factor for HIV acquisition. The FGT microbiome is closely associated with inflammatory profile; however, the relative importance of microbial activities has not been established. Since proteins are key elements representing actual microbial functions, this study utilized metapr...
Background:
The prevalence of Parkinson's disease (PD) is increasing in sub-Saharan Africa, but little is known about the genetics of PD in these populations. Due to their unique ancestry and diversity, sub-Saharan African populations have the potential to reveal novel insights into the pathobiology of PD. In this study, we aimed to characterise t...
The application of database search algorithms with very wide precursor mass tolerances for the “Open Search” paradigm has brought new efforts at post-translational modification discovery in shotgun proteomes. This approach has motivated the acceleration of database search tools by incorporating fragment indexing features. In this report, we compare...
Metagenome-driven microbiome research is providing important new insights in fields as diverse as the pathogenesis of human disease, the metabolic interactions of complex microbial ecosystems involved in agriculture, and climate change. However, poor correlations typically observed between RNA and protein expression datasets even for single organis...
We performed the first proteogenomic study on a prospectively collected colon cancer cohort. Comparative proteomic and phosphoproteomic analysis of paired tumor and normal adjacent tissues produced a catalog of colon cancer-associated proteins and phosphosites, including known and putative new biomarkers, drug targets, and cancer/testis antigens. P...
There remains a pressing need for biomarkers that can predict who will progress to active tuberculosis (TB) after exposure to Mycobacterium tuberculosis (MTB). By analyzing cohorts of household contacts of TB index cases (HHCs) and a stringent non-human primate (NHP) challenge model, we evaluated whether integration of blood transcription...
plantGlycoMS is a set of tools, implemented in R, which is used to assess and validate glycopeptide spectrum matches (gPSMs). Validity of gPSMs is based on characteristic fragmentation patterns of glycopeptides (gPSMvalidator), adherence of the glycan moiety to the known N-glycan biosynthesis pathway in plants (pGlycoFilter), and elution of the gly...
Biomarkers that predict who among recently Mycobacterium tuberculosis (MTB)-exposed individuals will progress to active tuberculosis are urgently needed. Intracellular microRNAs (miRNAs) regulate the host response to MTB and circulating miRNAs (c-miRNAs) have been developed as biomarkers for other diseases. We performed machine-learning analysis of...
The developing world is seeing rapid growth in the availability of biological mass spectrometry (MS), particularly through core facilities. As proteomics and metabolomics become locally feasible for investigators in these nations, application areas associated with high burden in these nations, such as infectious disease, will see greatly increased...
DISEASE scores predict PET-CT resolution.
DISEASE and FAILURE signatures were confirmed via qRT-PCR.
ACS COR scores predict treatment failure.
Clinical proteomics requires large-scale analysis of human specimens to achieve statistical significance. In this study, we evaluated the long-term reproducibility of an iTRAQ (isobaric tags for relative and absolute quantification)-based quantitative proteomics strategy using one channel for reference across all samples in different iTRAQ sets. A...
Motivation:
Complex microbial communities can be characterized by metagenomics and metaproteomics. However, metagenome assemblies often generate enormous, and yet incomplete, protein databases, which undermines the identification of peptides and proteins in metaproteomics. This challenge calls for increased discrimination of true identifications f...
The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, co-chairs, and other leadership positions, the PSI working groups are tasked with the development and maintena...
Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of utmost importance to fully underst...
Mass spectrometry is a highly complex analytical technique, and mass spectrometry-based proteomics experiments can be subject to large variability, which is an obstacle to obtaining accurate and reproducible results. Therefore, a comprehensive and systematic approach to quality control is an essential requirement to inspire confidence in the ge...
Biomarkers for tuberculosis treatment outcome will assist in guiding individualized treatment and evaluation of new therapies. To identify candidate biomarkers, RNA sequencing of whole blood from a well-characterized TB treatment cohort was performed. Application of a validated transcriptional correlate of risk for TB revealed symmetry in host gene...
In order to be confident of the results acquired during biological mass spectrometry experiments, a systematic approach to quality control is of vital importance. Nonetheless, until now only scattered initiatives have been undertaken to this end, and these individual efforts have often not been complementary. To address this issue, the Human Proteo...
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average, 75% of spectra analyzed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. The Proteomics Identifications (PRIDE) Database Archive is one of the largest MS proteo...
Lipid identification from data produced with high-throughput technologies is essential to the elucidation of the roles played by lipids in cellular function and disease. Software tools for identifying lipids from tandem mass (MS/MS) spectra have been developed, but they are often costly or lack the sophistication of their proteomics counterparts. W...
Plant secretory (Class III) peroxidases are redox enzymes that rely on N-glycosylation for full enzyme activity and stability. Peroxidases from palm tree leaves comprise the most stable and active plant peroxidases characterized to date. Herein, site-specific glycosylation and microheterogeneity of windmill palm tree (Trachycarpus fortunei) peroxid...
The NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) employed a pair of reference xenograft proteomes for initial platform validation and ongoing quality control of its data collection for The Cancer Genome Atlas (TCGA) tumors. These two xenografts, representing basal and luminal-B human breast cancer, were fractionated and analyzed on six...
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools,...
Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome and glob...
Collagen IV is the main structural protein that provides a scaffold for assembly of basement membrane proteins. Posttranslational modifications such as hydroxylation of proline and lysine and glycosylation of lysine are essential for the functioning of collagen IV triple-helical molecules. These modifications are highly abundant, posing a difficult...
N-Methyl-d-aspartate receptors (NMDARs) are major targets of both acute and chronic alcohol, as well as regulators of plasticity in a number of brain regions. Aberrant plasticity may contribute to the treatment resistance and high relapse rates observed in alcoholics. Recent work suggests that chronic alcohol treatment preferentially modulates both...
Questions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. Developed as an open study, initia...
Systematic bias in mass measurement adversely affects data quality and negates the advantages of high precision instruments.
We introduce the mzRefinery tool for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen. After...
Understanding proteomic differences underlying the different phenotypic classes of colon and rectal carcinoma is important and may eventually lead to a better assessment of clinical behavior of these cancers. We here present a comprehensive description of the proteomic data obtained from 90 colon and rectal carcinomas previously subjected to genomi...
Aiming to improve the understanding of protein regulation in cancer, recent studies from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) have focused on analyzing cancer tissue using proteomic technologies and workflows. Although many proteogenomics approaches for the study of cancer samples have been proposed, serious methodological chal...
Since its introduction in 1994, SEQUEST has gained many important new capabilities, and a host of successor algorithms have built upon its successes. This Account and Perspective maps the evolution of this important tool and charts the relationships among contributions to the SEQUEST legacy. Many of the changes represented improvements in computing...
We report the implementation of high-quality signal processing algorithms into ProteoWizard, an efficient, open-source software package designed for analyzing proteomics tandem mass spectrometry data. Specifically, a new wavelet-based peak-picker (CantWaiT) and a precursor charge determination algorithm (Turbocharger) have been implemented. These a...
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stan...
Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and perform integrated proteogenomic analyses. Somatic variants displayed reduced...
After raw data have been captured by mass spectrometers in biological LC-MS/MS experiments, they must be converted from vendor-specific binary files to open-format files for manipulation by most software. This protocol details the use of ProteoWizard software for this conversion, taking format features, coding options, and vendor particularities in...
The National Institutes of Health (NIH), National Cancer Institute's Early Detection Research Network (EDRN) is a cross-institutional collaborative initiative seeking to accelerate the clinical application of cancer biomarker research. Over the past decade, it has been our role, as EDRN's Informatics Center (IC), to develop a comprehensive informat...
Shotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, w...
The blood-brain barrier (BBB) dynamically controls exchange between the brain and the body, but this interaction cannot be studied directly in the intact human brain or sufficiently represented by animal models. Most existing in vitro BBB models do not include neurons and glia with other BBB elements and do not adequately predict drug efficacy and...
Both DNA and chromatin need to be duplicated during each cell division cycle. Replication happens in the context of defects in the DNA template and other forms of replication stress that present challenges to both genetic and epigenetic inheritance. The replication machinery is highly regulated by replication stress responses to accomplish this goa...
Differentiating and quantifying protein differences in complex samples produces significant challenges in sensitivity and specificity. Label-free quantification can draw from two different information sources: precursor intensities and spectral counts. Intensities are accurate for calculating protein relative abundance, but values are often missing...
Frequently, proteomic LC-MS/MS data may contain sets of modifications that evade identification during standard database search. For many laboratories, the standard technique to seek posttranslational modifications (PTMs) adds a short list of specified mass shifts to database search configuration. This technique provides information for only the sp...
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accur...
Peptide sequence matching algorithms used for peptide identification by tandem mass spectrometry (MS/MS) enumerate theoretical peptides from the database, predict their fragment ions, and match them to the experimental MS/MS spectra. Here, we present an approach for scoring MS/MS identifications based on the high mass accuracy matching of precursor...
Proteomics has emerged from the labs of technologists to enter widespread application in clinical contexts. This transition, however, has been hindered by overstated early claims of accuracy, concerns about reproducibility, and the challenges of handling batch effects properly. New efforts have produced sets of performance metrics and measurements...
Mass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples [1], identify pathways affected by endogenous and exogenous perturbations [2], and characterize protein complexes [3]. Despite successes, the int...
To complement the recent genomic sequencing of Chinese hamster ovary (CHO) cells, proteomic analysis was performed on CHO cells including the cellular proteome, secretome, and glycoproteome using tandem mass spectrometry (MS/MS) of multiple fractions obtained from gel electrophoresis, multidimensional liquid chromatography, and solid phase extracti...
The domestication of animals, plants, and microbes fundamentally transformed the lifestyle and demography of the human species [1]. Although the genetic and functional underpinnings of animal and plant domestication are well understood, little is known about microbe domestication [2–6]. Here, we systematically examined genome-wide sequence and fu...
2-Amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) is a heterocyclic aromatic amine that is formed during the cooking of meats. PhIP is a potential human carcinogen: it undergoes metabolic activation to form electrophilic metabolites that bind to DNA and proteins, including serum albumin (SA). The structures of PhIP-SA adducts formed in vivo ar...