Figure - available from: Genome Biology
Illustration of how the prior probabilities for differential expression/differential methylation (DE/DM) are specified under a cell type hierarchy. The hierarchy is illustrated with three cell types and a few features (genes or CpG sites). The three cell types form a simple tree (shown on the left). In the array of squares and circles, each column represents a feature. Circles represent root or internal nodes, and squares represent leaf nodes. Colors represent the differential state of each node (black: 1; gray: 0). The root node Dg{1, 2, 3}, internal node Dg{2, 3}, and leaf nodes Zg1, Zg2, and Zg3 are binary random variables representing the differential states of the g-th feature. π represents the marginal probability that a node is in state 1. p represents the conditional probability that a node is in state 1 given that its parent node is in state 1.
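The prior described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the node labels (`root`, `D23`, `Z1`, `Z2`, `Z3`), the numeric values of π and p, and the helper names `sample_feature` and `marginal` are all hypothetical. The sketch also assumes that a node whose parent is in state 0 is itself in state 0, which the caption does not state explicitly, and it does not enforce any constraint that a node in state 1 must have at least one child in state 1.

```python
import random

# Hypothetical prior parameters (illustrative values, not taken from the paper).
PI_ROOT = 0.3  # pi: marginal P(root = 1)
P_GIVEN_PARENT = {  # p: P(node = 1 | parent = 1)
    "D23": 0.6,
    "Z1": 0.5,
    "Z2": 0.8,
    "Z3": 0.7,
}
# Tree from the caption: the root covers cell types {1, 2, 3};
# the internal node D23 covers cell types {2, 3}.
PARENT = {"D23": "root", "Z1": "root", "Z2": "D23", "Z3": "D23"}


def sample_feature(rng):
    """Sample the differential states of one feature, top-down along the tree."""
    state = {"root": int(rng.random() < PI_ROOT)}
    for node in ("D23", "Z1", "Z2", "Z3"):  # parents are listed before children
        if state[PARENT[node]] == 1:
            state[node] = int(rng.random() < P_GIVEN_PARENT[node])
        else:
            state[node] = 0  # assumption: a child of a state-0 node is in state 0
    return state


def marginal(node):
    """Marginal P(node = 1): product of the conditionals on the path to the root."""
    if node == "root":
        return PI_ROOT
    return P_GIVEN_PARENT[node] * marginal(PARENT[node])


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_feature(rng) for _ in range(20000)]
    for leaf in ("Z1", "Z2", "Z3"):
        freq = sum(d[leaf] for d in draws) / len(draws)
        print(leaf, "empirical:", round(freq, 3), "marginal:", round(marginal(leaf), 3))
```

Under these assumptions, the marginal probability that a leaf is in state 1 is the product of π at the root and the conditional probabilities p along the path down to that leaf. Leaves that share an internal node (here cell types 2 and 3) therefore have correlated differential states.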

Source publication
Article
Bulk high-throughput omics data contain signals from a mixture of cell types. Recent developments of deconvolution methods facilitate cell type-specific inferences from bulk data. Our real data exploration suggests that differential expression or methylation status is often correlated among cell types. Based on this observation, we develop a novel...

Citations

... One future direction is to exploit the relations among the cell types, which may better capture the underlying phenotypic states of the subjects. A more recently developed method called CeDAR [51] uses a known cell-type hierarchy as a prior to infer CTS expression in bulk data, as opposed to our de novo sub-cell-type inference; we leave a more detailed comparison as future work. Moreover, we can also extend GTM-decon to model multi-omic single-cell data, identify multi-omic CTS topic distributions, and then use them to deconvolve multi-omic bulk data. ...
Article
Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.
Preprint
Bulk transcriptomics in tissue samples reflects the average expression levels across different cell types and is highly influenced by cellular fractions. As such, it is critical to estimate cellular fractions to both deconfound differential expression analyses and infer cell type specific differential expression. Since experimentally counting cells is infeasible in most tissues and studies, in silico cellular deconvolution methods have been developed as an alternative. However, existing methods are designed for tissues consisting of clearly distinguishable cell types and have difficulties estimating highly correlated or rare cell types. To address this challenge, we propose Hierarchical Deconvolution (HiDecon) that uses single-cell RNA sequencing references and a hierarchical cell type tree, which models the similarities among cell types and cell differentiation relationships, to estimate cellular fractions in bulk data. By coordinating cell fractions across layers of the hierarchical tree, cellular fraction information is passed up and down the tree, which helps correct estimation biases by pooling information across related cell types. The flexible hierarchical tree structure also enables estimating rare cell fractions by splitting the tree to higher resolutions. Through simulations and real data applications with the ground truth of measured cellular fractions, we demonstrate that HiDecon significantly outperforms existing methods and accurately estimates cellular fractions.
Article
Accounting for cell type compositions has been very successful in analyzing high-throughput data from heterogeneous tissues. Differential gene expression analysis at the cell type level is becoming increasingly popular, enabling biomarker discovery at a finer granularity within a particular cell type. Although several computational methods have been developed to identify cell type-specific differentially expressed genes (csDEG) from RNA-seq data, a systematic evaluation is yet to be performed. Here, we thoroughly benchmark six recently published methods: CellDMC, CARseq, TOAST, LRCDE, CeDAR and TCA, together with two classical methods, csSAM and DESeq2, for a comprehensive comparison. We aim to systematically evaluate the performance of popular csDEG detection methods and provide guidance to researchers. In simulation studies, we benchmark available methods under various scenarios of baseline expression levels, sample sizes, cell type compositions, expression level alterations, technical noise and biological dispersion. Real data analyses of three large datasets on inflammatory bowel disease, lung cancer and autism provide evaluation at both the gene level and the pathway level. We find that csDEG calling is strongly affected by effect size, baseline expression level and cell type compositions. The results imply that csDEG discovery is itself a challenging task, with room for improvement in handling low signal-to-noise ratios and lowly expressed genes.