Figure 2 - uploaded by Akira Cortal
Cell-ID cell type prediction of human cord blood mononuclear cells using pre-established marker lists. (A) UMAP representation of 8005 cord blood mononuclear cells profiled by CITE-seq [3]. Dots representing cells are colored according to Cell-ID cell type predictions using pre-established immune cell signatures, as indicated by the labels in the figure. (B) Performance, measured as the F1 score, of Cell-ID, AUCell and SCINA cell type predictions for each of the blood cell types reported in the original publication [3]. Boxplots summarize the F1 scores for each method. The numbers above the boxplots denote the global performance (macro F1 score, upper digits) and its standard deviation (lower digits); the maximum and minimum values across methods are colored in black and grey, respectively. (C) Zoomed UMAP representation of erythrocytes and CD34+ cells, showing that the Cell-ID multi-class cell assignments capture transient cell states consistent with the cell-type hierarchy associated with immature hematopoietic stem cell differentiation. Cells are color-coded according to the -log10 enrichment p-value obtained by Cell-ID in tests of the association of their gene signature with the cell-type signatures of their precursor cell types: HSC, MPP, CMP, GMP, MEP and erythrocytes. The color scale extends from white (p-value = 1) to dark red (p-value = 1e-10), with p-values < 1e-10 fixed at this value. (D) Heatmap representing, for each individual cell (displayed in columns), the -log10 transformed p-value
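The color scale described for panel C can be illustrated with a short sketch. It is not taken from the Cell-ID code base; the function names `capped_neg_log10` and `color_intensity` are hypothetical, and only the two endpoints (p-value = 1 for white, p-value = 1e-10 for dark red) come from the caption.

```python
import math

# Floor taken from the caption: p-values below 1e-10 are fixed at 1e-10.
P_FLOOR = 1e-10


def capped_neg_log10(p_value: float, floor: float = P_FLOOR) -> float:
    """Return -log10(p), with p-values below `floor` fixed at `floor`."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    return -math.log10(max(p_value, floor))


def color_intensity(p_value: float) -> float:
    """Map a p-value to [0, 1]: 0 is white (p = 1), 1 is dark red (p <= 1e-10)."""
    return capped_neg_log10(p_value) / -math.log10(P_FLOOR)
```

Under these assumptions, any p-value at or below 1e-10 receives the same maximal intensity, so very small p-values do not compress the rest of the scale.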


Source publication
Preprint
The exhaustive exploration of human cell heterogeneity requires the unbiased identification of molecular signatures that can serve as unique cell identity cards for every cell in the body. However, the stochasticity associated with high-throughput single-cell RNA sequencing has made it necessary to use clustering-based computational approaches in w...

Contexts in source publication

Context 1
... this end, we used two independent sets of human blood mononuclear cells for which individual cells were confidently annotated with an actual cell type through concomitant measurements of single-cell protein marker levels: (i) cord blood mononuclear cells (CBMCs) profiled with a CITE-seq protocol [14]; and (ii) peripheral blood mononuclear cells (PBMCs) profiled with a REAP-seq protocol [15]. Cell-ID per-cell gene signatures were significantly enriched in the lists of markers associated with the corresponding cell type (Figure 2). This enrichment made it possible to recognize cell types with high rates of precision (87% and 90%) and recall (84% and 73%) for both datasets (multinomial p-value < 10^-16 for all figures), outperforming reference methods for cell-type classification on the basis of marker lists (AUCell [11] and SCINA [12]; Fig. 2B, Supplementary Fig. 3-4). ...
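For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of per-cell-type precision, recall and F1, together with the macro F1 score used in Figure 2B. The labels are toy values invented for illustration, the function names are hypothetical, and the use of the population standard deviation is an assumption (the text does not say which estimator was used).

```python
from statistics import mean, pstdev


def per_class_scores(true_labels, predicted_labels):
    """Return {cell_type: (precision, recall, f1)} for each annotated cell type."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cell_type in sorted(set(true_labels)):
        tp = sum(t == cell_type and p == cell_type for t, p in pairs)
        fp = sum(t != cell_type and p == cell_type for t, p in pairs)
        fn = sum(t == cell_type and p != cell_type for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[cell_type] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1 scores and their spread.

    Assumption: population standard deviation (pstdev).
    """
    f1_values = [f1 for _, _, f1 in scores.values()]
    return mean(f1_values), pstdev(f1_values)


# Toy annotation (hypothetical): protein-based labels vs. predicted labels.
true_labels = ["B", "B", "T", "T", "NK", "NK"]
predicted_labels = ["B", "T", "T", "T", "NK", "B"]
scores = per_class_scores(true_labels, predicted_labels)
macro, spread = macro_f1(scores)
```

Because the macro F1 averages over cell types rather than over cells, a rare cell type weighs as much as an abundant one.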
Context 2
... the corresponding cell type (Figure 2). This enrichment made it possible to recognize cell types with high rates of precision (87% and 90%) and recall (84% and 73%) for both datasets (multinomial p-value < 10^-16 for all figures), outperforming reference methods for cell-type classification on the basis of marker lists (AUCell [11] and SCINA [12]; Fig. 2B, Supplementary Fig. 3-4). In more challenging scenarios, Cell-ID was capable of non-disjoint multi-class cell-type assignments capturing smooth transitions between hematopoietic differentiation states from the most immature hematopoietic stem cell (HSC) to downstream myeloid (CMP/GMP) and erythroid progenitors (MEP) (Fig. 2, C-D). It was consistently able to ...
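One way to picture a non-disjoint multi-class assignment is to test a cell's gene signature for enrichment against every marker list and keep all significant cell types rather than only the best one. The sketch below assumes a one-sided hypergeometric test; the excerpt above only says "enrichment", so the choice of test, the threshold `alpha`, the absence of multiple-testing correction, the function names and the gene names are all assumptions made for illustration.

```python
from math import comb


def hypergeom_pvalue(overlap, signature_size, marker_count, universe_size):
    """P(X >= overlap) when `signature_size` genes are drawn without
    replacement from a universe containing `marker_count` marker genes."""
    total = comb(universe_size, signature_size)
    upper = min(signature_size, marker_count)
    tail = sum(
        comb(marker_count, k) * comb(universe_size - marker_count, signature_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enriched_cell_types(cell_signature, marker_lists, universe, alpha=0.01):
    """Return {cell_type: p_value} for every marker list enriched in the
    cell's signature; several cell types may pass (non-disjoint assignment)."""
    universe = set(universe)
    signature = set(cell_signature) & universe
    hits = {}
    for cell_type, markers in marker_lists.items():
        markers = set(markers) & universe
        p = hypergeom_pvalue(
            len(signature & markers), len(signature), len(markers), len(universe)
        )
        if p < alpha:
            hits[cell_type] = p
    return hits


# Toy data (hypothetical gene names g0..g19).
universe = [f"g{i}" for i in range(20)]
signature = ["g0", "g1", "g2", "g3", "g4"]
marker_lists = {
    "HSC": ["g0", "g1", "g2", "g3"],
    "MPP": ["g1", "g2", "g3", "g4"],
    "MEP": ["g10", "g11", "g12"],
}
hits = enriched_cell_types(signature, marker_lists, universe)
```

In this toy case the cell is assigned to both HSC and MPP, which is the kind of overlap one would expect for a cell in transition between two states.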
Context 3
... [11] and SCINA [12]; Fig. 2B, Supplementary Fig. 3-4). In more challenging scenarios, Cell-ID was capable of non-disjoint multi-class cell-type assignments capturing smooth transitions between hematopoietic differentiation states from the most immature hematopoietic stem cell (HSC) to downstream myeloid (CMP/GMP) and erythroid progenitors (MEP) (Fig. 2, C-D). It was consistently able to identify singleton cells, i.e. rare cell types represented by only one cell within a dataset (Supplementary Note 2). The capacity of Cell-ID to recover well-established cell types at single-cell resolution supports its use for automated cell-type annotation, even for extremely rare cells, without the need ...
Context 4
... then determined whether the per-cell gene rankings obtained with MCA were consistent with the gene expression values for neighboring cells in the MCA space. As expected, genes specific to a given cell had higher log-fold changes in expression in the 5% of cells closest to the target cell (n = 50) than in the other cells (Supplementary Fig. 2A). We then investigated how the ranking of genes with zero-counts in a cell related to the specificity of the genes concerned in neighboring cells (Supplementary Fig. 2A). ...
Context 5
... expected, genes specific to a given cell had higher log-fold changes in expression in the 5% of cells closest to the target cell (n = 50) than in the other cells (Supplementary Fig. 2A). We then investigated how the ranking of genes with zero-counts in a cell related to the specificity of the genes concerned in neighboring cells (Supplementary Fig. 2A). We found that genes not detected in a given cell were nevertheless attributed a high ranking in this cell if the surrounding cells displayed high levels of expression for these genes (Supplementary Fig. 2A). ...
Context 6
... then investigated how the ranking of genes with zero-counts in a cell related to the specificity of the genes concerned in neighboring cells (Supplementary Fig. 2A). We found that genes not detected in a given cell were nevertheless attributed a high ranking in this cell if the surrounding cells displayed high levels of expression for these genes (Supplementary Fig. 2A). This is an important result, highlighting the capacity of multivariate approaches to consider a gene to be specific to a cell in which it was not detected, provided that the gene concerned is specific to very similar cells. ...
Context 7
... MCA approach is thus robust to zero-count values that probably correspond to technical dropouts. These results could be generalized to all individual cells in a given dataset: Spearman's rank correlation coefficient between rank and log-fold change = 0.72, p-value < 10e-16 (Supplementary Fig. 2B). By contrast, the correlation was weaker for per-cell gene rankings Fig. 2, E-F, respectively). ...
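Spearman's rank correlation, the statistic reported above, is the Pearson correlation computed on ranks. A minimal self-contained sketch follows; ties receive average ranks, the function names are hypothetical, and the input values are invented. How the gene ranks and log-fold changes were oriented in the original analysis is not stated in this excerpt, so no sign convention is implied.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy values (hypothetical): a per-gene score and a log-fold change.
gene_scores = [5.0, 4.0, 3.0, 2.0, 1.0]
log_fold_changes = [2.1, 1.7, 0.2, 0.9, -0.3]
rho = spearman(gene_scores, log_fold_changes)
```

Because only ranks enter the calculation, any monotonic relationship gives a coefficient of 1 or -1 regardless of its shape.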
Context 8
... robust to zero-count values that probably correspond to technical dropouts. These results could be generalized to all individual cells in a given dataset: Spearman's rank correlation coefficient between rank and log-fold change = 0.72, p-value < 10e-16 (Supplementary Fig. 2B). By contrast, the correlation was weaker for per-cell gene rankings Fig. 2, E-F, ...
