Lin Hou's research while affiliated with Tsinghua University and other places

Publications (63)


Cofea: correlation-based feature selection for single-cell chromatin accessibility data
Article · December 2023 · 28 Reads · 1 Citation

Briefings in Bioinformatics

Keyi Li · Xiaoyang Chen · Shuang Song · [...]

Single-cell chromatin accessibility sequencing (scCAS) technologies have enabled characterizing the epigenomic heterogeneity of individual cells. However, the identification of features of scCAS data that are relevant to underlying biological processes remains a significant gap. Here, we introduce a novel method, Cofea, to fill this gap. Through comprehensive experiments on 5 simulated and 54 real datasets, Cofea demonstrates its superiority in capturing cellular heterogeneity and facilitating downstream analysis. Applying this method to the identification of cell type-specific peaks and candidate enhancers, as well as pathway enrichment analysis and partitioned heritability analysis, we illustrate the potential of Cofea to uncover functional biological processes.


Partitioning and aggregating cross-tissue and tissue-specific genetic effects in identifying gene-trait associations

December 2023 · 38 Reads

Transcriptome-wide association studies (TWAS) have shown great promise in extending GWAS loci to a functional understanding of disease mechanisms. In an effort to fully unleash the TWAS and GWAS information, we propose MTWAS, a statistical framework that partitions and aggregates cross-tissue and tissue-specific genetic effects in identifying gene-trait associations. Different from previous methods, we introduce a non-parametric imputation strategy to augment the inaccessible tissues, which allows for barren conditions, such as complex interactions and non-linear expression data structure across tissues. We further classify eQTLs into cross-tissue eQTLs (ct-eQTLs) and tissue-specific eQTLs (ts-eQTLs) via a step-wise procedure based on the extended Bayesian information criterion, which is consistent under high-dimensional settings. We have shown that MTWAS significantly improves the imputation accuracy across all 47 GTEx tissues compared with other single-tissue and multi-tissue methods, such as PrediXcan and UTMOST. MTWAS also identifies more predictable genes that can be replicated with independent studies. Applications to 84 UKBB GWAS studies have provided novel insights into disease etiology. The R package implementing MTWAS is available at https://github.com/szcf-weiya/MTWAS.


Cofea: correlation-based feature selection for single-cell chromatin accessibility data

June 2023 · 55 Reads

Single-cell sequencing technologies have revolutionized the understanding of cellular heterogeneity at an unprecedented resolution. However, the high-noise and high-dimensional nature of single-cell data poses challenges for downstream analysis, and thus increases the demand for selecting biologically informative features when processing and analyzing single-cell data. Such approaches are mature for single-cell RNA sequencing (scRNA-seq) data, while for single-cell chromatin accessibility sequencing (scCAS) data, the epigenomic profiles at the cellular level, there is a significant gap in the availability of effective methods. Here we present Cofea, a correlation-based framework that focuses on the correlation between accessible chromatin regions, to accurately select features of scCAS data that are highly relevant to biological processes. With various simulated datasets, we quantitatively demonstrate the advantages of Cofea for capturing cellular heterogeneity of imbalanced cell populations or differentiation trajectories. We further demonstrate that Cofea outperforms existing feature selection methods in facilitating downstream analysis, particularly in cell clustering, on a wide range of real scCAS datasets. Applying this method to the identification of cell type-specific peaks and candidate enhancers, pathway enrichment analysis and partitioned heritability analysis, we show the potential of Cofea to uncover functional biological processes and the genetic basis of cellular characteristics.


Global age-structured spatial modeling for emerging infectious diseases (EIDs) like COVID-19

April 2023 · 254 Reads

PNAS Nexus

Modeling the global dynamics of emerging infectious diseases (EIDs) like COVID-19 can provide important guidance in the preparation and mitigation of pandemic threats. While age-structured transmission models are widely used to simulate the evolution of EIDs, most of these studies focus on the analysis of specific countries and fail to characterize the spatial spread of EIDs across the world. Here, we developed a global pandemic simulator that integrates age-structured disease transmission models across 3,157 cities and explored its usage under several scenarios. We found that without mitigations, EIDs like COVID-19 are highly likely to cause profound global impacts. For pandemics seeded in most cities, the impacts are equally severe by the end of the first year. The result highlights the urgent need for strengthening global infectious disease monitoring capacity to provide early warnings of future outbreaks. Additionally, we found that the global mitigation efforts could be easily hampered if developed countries or countries near the seed origin take no control. The result indicates that successful pandemic mitigations require collective efforts across countries. The role of developed countries is vitally important as their passive responses may significantly impact other countries.


X-Wing workflow
X-Wing uses GWAS summary statistics and population-matched LD references as input. It first employs a scan statistic approach to detect genome segments showing local genetic correlation between populations. Next, it incorporates the local genetic correlation annotation into a Bayesian PRS model, amplifying SNP effects that are correlated between populations. Finally, it uses summary statistics-based repeated learning to combine multiple population-specific PRS and produce the final PRS with improved accuracy.
X-Wing achieves superior statistical power in identifying cross-population local genetic correlation
a, b Statistical power in simulations under a heritability enrichment framework. Power is defined as the proportion of simulation repeats that the true signal region is identified. Panels (a) and (b) illustrate results for continuous and binary trait outcomes, respectively. c Number of regions with significant cross-population genetic correlations identified by X-Wing and PESCA for 31 complex traits. d Proportion of total genetic covariance explained by significant local regions for 31 complex traits. Genetic covariance measures covariance of additive genetic component between two populations. In both panels (c) and (d), GWAS sample sizes are indicated by the color of each data point, and the diagonal line is highlighted in red.
X-Wing identifies genomic regions strongly enriched for correlated genetic effects between Europeans and East Asians
a Scatter plot shows the proportion of SNPs in regions identified by X-Wing and the proportion of cross-population genetic covariance explained by these SNPs. All data points are above the diagonal line highlighted in red, showing substantial enrichment. b Cross-population genetic correlation for 31 complex traits. Three bars denote the global genetic correlation estimated from genome-wide data (light green), genetic correlation in regions identified by X-Wing (brown), and genetic correlation outside regions identified by X-Wing (dark green). Results for a simulated uncorrelated trait are labeled as ‘Control’. All traits are ordered according to the global genetic correlation estimates. Error bars indicate 95% confidence interval. The centre for the error bars represents the point estimates for genetic correlation. A list of trait acronyms can be found in Supplementary Data 7. c Bar plot shows the number of significant regions identified only in discovery stage (purple), only in replication stage (orange), and in both stages (blue) for four lipid traits. HDL, LDL, TC, TG stand for HDL cholesterol, LDL cholesterol, total cholesterol, and triglycerides, respectively. d Cumulative proportion of genetic covariance explained by regions identified in the discovery stage for triglycerides. Analogous results for HDL cholesterol, LDL cholesterol, and total cholesterol are shown in Supplementary Fig. 6. Pink dashed line indicates FDR cutoff of 0.05. Red line represents the diagonal line of y = x. Genetic correlation and genetic covariance were calculated using XPASS.
Local genetic correlation annotation improves PRS prediction accuracy for 31 traits in East Asians
a The percentage relative increase in R² for prediction accuracy of annotation-informed European PRS over PRS-CSx European PRS. A list of trait acronyms can be found in Supplementary Data 7. b The percentage relative increase in R² for prediction accuracy of annotation-informed over PRS-CSx European PRS using only annotated and non-annotated SNPs (n = 31 traits). In the boxplot, the center line, box limits and whiskers denote the median, upper and lower quartiles, and 1.5 × interquartile range, respectively. c Comparison of R² between annotation-informed European PRS using only annotated and non-annotated SNPs. Each point represents a trait. X-axis is the R² for PRS based on non-annotated SNPs. Y-axis is the R² for PRS based on annotated SNPs.
Performance of X-Wing in combining population-specific PRS using GWAS summary statistics for 31 traits in East Asian samples
a The percentage relative increase in R² of X-Wing PRS over PRS-CSx. The dashed line represents the average increase. A list of trait acronyms can be found in Supplementary Data 5. b Comparison of R² for linearly combined PRS with mixing weights obtained using GWAS summary statistics and individual-level data. The X-axis represents the R² using weights estimated from individual-level data, while the Y-axis shows the R² using summary statistics-based weights. The dashed line represents the diagonal line of y = x. c The percentage relative increase in R² of X-Wing PRS over PRS-CSx using GWAS summary statistics. PRS-CSx PRS is calculated based on European posterior mean effects. The dashed line represents the average increase.
Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics

February 2023 · 72 Reads · 25 Citations

Nature Communications

Polygenic risk scores (PRS) calculated from genome-wide association studies (GWAS) of Europeans are known to have substantially reduced predictive accuracy in non-European populations, limiting their clinical utility and raising concerns about health disparities across ancestral populations. Here, we introduce a statistical framework named X-Wing to improve predictive performance in ancestrally diverse populations. X-Wing quantifies local genetic correlations for complex traits between populations, employs an annotation-dependent estimation procedure to amplify correlated genetic effects between populations, and combines multiple population-specific PRS into a unified score with GWAS summary statistics alone as input. Through extensive benchmarking, we demonstrate that X-Wing pinpoints portable genetic effects and substantially improves PRS performance in non-European populations, showing 14.1%–119.1% relative gain in predictive R² compared to state-of-the-art methods based on GWAS summary statistics. Overall, X-Wing addresses critical limitations in existing approaches and may have broad applications in cross-population polygenic risk prediction.
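
As a rough illustration of the final step described above (combining population-specific PRS into one score), the sketch below fits mixing weights for two scores by ordinary least squares on individual-level data. This is an assumption made for illustration: according to the abstract, X-Wing obtains the combination from GWAS summary statistics alone, so the code corresponds only to the individual-level baseline mentioned in the figure captions. All function names are hypothetical and are not part of the X-Wing software.

```python
from statistics import mean


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_mixing_weights(prs_a, prs_b, phenotype):
    """Least-squares weights (w_a, w_b) for phenotype ~ w_a*prs_a + w_b*prs_b.

    Hypothetical helper: solves the 2x2 normal equations in closed form.
    """
    a, b, y = center(prs_a), center(prs_b), center(phenotype)
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    say, sby = dot(a, y), dot(b, y)
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("the two scores are collinear; weights are not identifiable")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


def r_squared(prediction, phenotype):
    """Squared Pearson correlation between a score and the phenotype."""
    p, y = center(prediction), center(phenotype)
    return dot(p, y) ** 2 / (dot(p, p) * dot(y, y))


def relative_gain_percent(r2_new, r2_base):
    """Percentage relative increase in R² of a new score over a baseline."""
    return 100.0 * (r2_new - r2_base) / r2_base
```

For example, a score with R² = 0.15 compared against a baseline with R² = 0.10 corresponds to a relative gain of about 50%.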


CITEdb: a manually curated database of cell-cell interactions in human

October 2022 · 33 Reads

Motivation: The interactions among various types of cells play critical roles in cell functions and the maintenance of the entire organism. While cell-cell interactions are traditionally revealed from experimental studies, recent developments in single cell technologies combined with data mining methods have enabled computational prediction of cell-cell interactions, which have broadened our understanding of how cells work together, and have important implications in therapeutic interventions targeting cell-cell interactions for cancers and other diseases. Despite the importance, to our knowledge, there is no database for systematic documentation of high-quality cell-cell interactions at cell type level, which hinders the development of computational approaches to identify cell-cell interactions. Results: We develop a publicly accessible database, CITEdb (Cell-cell InTEraction database, https://citedb.cn/), which not only facilitates interactive exploration of cell-cell interactions in specific physiological contexts (e.g., a disease or an organ), but also provides a benchmark dataset to interpret and evaluate computationally derived cell-cell interactions from different tools. CITEdb contains 728 pairs of cell-cell interactions in human that are manually curated. Each interaction is equipped with structured annotations including the physiological context, the ligand-receptor pairs that mediate the interaction, etc. Our database provides a web interface to search, visualize, and download cell-cell interactions. Users can search for cell-cell interactions by selecting the physiological context of interest or specific cell types involved. CITEdb is the first attempt to catalogue cell-cell interactions at cell type level, which is beneficial to experimental, computational, and clinical studies of cell-cell interactions.


Fig. 1. Overview of the CITEdb website. (A) Users can search for cell-cell interactions by selecting the contexts or the cell types of interest, as well as selecting contexts and cell types simultaneously. (B) Graphical display of the search result. Nodes represent cell types, and edges represent cell-cell interactions. The thickness of edges indicates the number of pieces of evidence supporting the interaction.
Fig. 2. Precision-recall curves in evaluating various algorithms that predict cell-cell interactions from a human metastatic melanoma scRNA-seq dataset. CITEdb interactions are used as a benchmark dataset. The directional interactions are predicted by six algorithms and then summarized into cell level interactions by the sum of communication scores (A) and the count of active LR pairs (B). The unidirectional interactions are inferred by Bray-Curtis score and the enrichment score (C). The area under the curve (AUC) is presented with 95% confidence interval (CI).
CITEdb: a manually curated database of cell-cell interactions in human

September 2022 · 68 Reads · 4 Citations

Bioinformatics

Motivation: The interactions among various types of cells play critical roles in cell functions and the maintenance of the entire organism. While cell-cell interactions are traditionally revealed from experimental studies, recent developments in single cell technologies combined with data mining methods have enabled computational prediction of cell-cell interactions, which have broadened our understanding of how cells work together, and have important implications in therapeutic interventions targeting cell-cell interactions for cancers and other diseases. Despite the importance, to our knowledge, there is no database for systematic documentation of high-quality cell-cell interactions at cell type level, which hinders the development of computational approaches to identify cell-cell interactions. Results: We develop a publicly accessible database, CITEdb (Cell-cell InTEraction database, https://citedb.cn/), which not only facilitates interactive exploration of cell-cell interactions in specific physiological contexts (e.g., a disease or an organ), but also provides a benchmark dataset to interpret and evaluate computationally derived cell-cell interactions from different tools. CITEdb contains 728 pairs of cell-cell interactions in human that are manually curated. Each interaction is equipped with structured annotations including the physiological context, the ligand-receptor pairs that mediate the interaction, etc. Our database provides a web interface to search, visualize, and download cell-cell interactions. Users can search for cell-cell interactions by selecting the physiological context of interest or specific cell types involved. CITEdb is the first attempt to catalogue cell-cell interactions at cell type level, which is beneficial to experimental, computational, and clinical studies of cell-cell interactions.
Availability and implementation: CITEdb is freely available at https://citedb.cn/ and the R package implementing benchmark is available at https://github.com/shanny01/benchmark. Supplementary information: Supplementary data are available at Bioinformatics online.


Workflow of DiffScan
a Taking raw reactivities as input, DiffScan first normalizes them relative to one another in the Normalization module (b) to correct for systematic bias, and then identifies SVRs in the Scan module (c). b The Normalization module transforms raw reactivities into normalized reactivities to remove systematic bias. The raw reactivities are from the icSHAPE SRP vivo dataset which has no SVRs. The normalized reactivities are comparable as far as possible across different cellular conditions. c Taking normalized reactivities as input, the Scan module first calculates the significance of any differential signals for each nucleotide position with a two-sided Wilcoxon test, and then concatenates positional p values into a regional signal via a scan statistic. The significance of the scan statistic for each enumerated region is evaluated by Monte Carlo sampling, and those regions crossing a specified significance threshold are reported as SVRs.
Comparison of DiffScan and existing SVR detection methods in simulated datasets
Default search length of 5 nt is used for deltaSHAPE and minimum search length of 5 nt is used for dStruct. The empirical model in Sükösd et al.⁴⁰ was used to simulate reactivities. a Jaccard index between the top predicted nucleotides and the true SVRs at varying cutoffs. b Average distance between the top predicted nucleotides and the true SVRs at varying cutoffs. c Precision-Recall curves. Columns: three levels of strength of differential signals at simulated SVRs. Note deltaSHAPE does not allow external thresholding, and therefore it is represented as dots instead of curves.
Comparison of DiffScan and existing SVR detection methods with benchmark datasets
Default search length of 5 nt is used for deltaSHAPE and minimum search length of 5 nt is used for dStruct following the original article of the method. a Jaccard index between the top-20 ranked nucleotides and the annotated SVRs. b Average distance from the top-20 ranked nucleotides to annotated SVRs. “X” indicates that the corresponding method was not applicable for the dataset. “*” indicates that the average distance cannot be calculated since the corresponding method did not report any region.
Application of DiffScan to explore the roles of RNA structural variation
a in regulating mRNA abundance and b shaping human traits in an icSHAPE dataset mapping RNA structure across human cellular compartments. Ch chromatin, Np nucleoplasm, Cy cytoplasm. a Predicted SVRs between Ch and Np were enriched with protein binding sites and RNA modification sites. *p value (one-sided Fisher’s exact test) < 0.05, ***p value < 1e-6. P values of enrichment: m¹A = 5.85e-3, m⁶A = 4.77e-2, ψ = 2.87e-14, protein binding < 2.2e-16. b Enrichment of trait-associated SNPs in predicted SVRs. Proportion of trait-associated SNPs in SVR and non-SVR positions are plotted. P values are calculated by one-sided Fisher’s exact test.
Differential analysis of RNA structure probing experiments at nucleotide resolution: uncovering regulatory functions of RNA structure

July 2022 · 92 Reads · 5 Citations

Nature Communications

RNAs perform their function by forming specific structures, which can change across cellular conditions. Structure probing experiments combined with next generation sequencing technology have enabled transcriptome-wide analysis of RNA secondary structure in various cellular conditions. Differential analysis of structure probing data in different conditions can reveal the RNA structurally variable regions (SVRs), which is important for understanding RNA functions. Here, we propose DiffScan, a computational framework for normalization and differential analysis of structure probing data in high resolution. DiffScan preprocesses structure probing datasets to remove systematic bias, and then scans the transcripts to identify SVRs and adaptively determines their lengths and locations. The proposed approach is compatible with most structure probing platforms (e.g., icSHAPE, DMS-seq). When evaluated with simulated and benchmark datasets, DiffScan identifies structurally variable regions at nucleotide resolution, with substantial improvement in accuracy compared with existing SVR detection methods. Moreover, the improvement is robust when tested in multiple structure probing platforms. Application of DiffScan in a dataset of multi-subcellular RNA structurome and a subsequent motif enrichment analysis suggest potential links of RNA structural variation and mRNA abundance, possibly mediated by RNA binding proteins such as the serine/arginine rich splicing factors. This work provides an effective tool for differential analysis of RNA secondary structure, reinforcing the power of structure probing experiments in deciphering the dynamic RNA structurome. The authors present DiffScan, an advanced tool for normalization and differential analysis of RNA structure probing experiments, combining their power in deciphering the dynamic RNA structurome and facilitating the discovery of RNA regulatory functions.
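
The scan step described above (positional p values combined into a regional signal, with significance assessed by Monte Carlo sampling) can be sketched as follows. This is a minimal illustration under stated assumptions and not DiffScan's actual statistic: the regional score here is a length-normalized sum of centered -log p values, the null distribution assumes independent uniform p values, and the function names and default window lengths are hypothetical.

```python
import math
import random


def scan_regions(pvals, min_len=1, max_len=20):
    """Return (score, start, end) for the best-scoring window; end is exclusive."""
    # Under the null, -log(p) has mean 1 and variance 1, so centering by 1 and
    # dividing a window sum by sqrt(length) keeps windows of different lengths comparable.
    scores = [-math.log(max(p, 1e-300)) - 1.0 for p in pvals]
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    best = (float("-inf"), 0, 0)
    n = len(pvals)
    for start in range(n):
        for length in range(min_len, min(max_len, n - start) + 1):
            end = start + length
            score = (prefix[end] - prefix[start]) / math.sqrt(length)
            if score > best[0]:
                best = (score, start, end)
    return best


def monte_carlo_pvalue(pvals, n_sim=200, seed=0, min_len=1, max_len=20):
    """Estimate how often a null transcript yields a window at least as extreme."""
    rng = random.Random(seed)
    observed = scan_regions(pvals, min_len, max_len)
    exceed = 0
    for _ in range(n_sim):
        null_pvals = [rng.random() for _ in pvals]  # assumption: independent uniform p values
        if scan_regions(null_pvals, min_len, max_len)[0] >= observed[0]:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (exceed + 1) / (n_sim + 1)
```

Because the window length is searched rather than fixed, the reported region adapts to the extent of the signal, which is the property the abstract highlights.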


Figure 2. The statistical power of OWAS-joint, OWAS applied with each of the three cell types, and a union of single-cell-type methods with Bonferroni correction. Both phenotype effects with high (0.1%) and low (0.02%) heritability were considered. The simulation settings 1, 2, and 3 correspond to one (Th1), two (Th1 and GM12878), and three causal cell types.
Figure 3. Heritability (h²) explained by OWAS-joint segments and single-cell-type OWAS segments. Varying p-value thresholds were considered. The error bars correspond to the standard error of the heritability estimated by GCTA software. The heritability was evaluated with the WTCCC individual-level genotype data. The red dashed lines mark the heritability explained by OWAS-joint segments.
Figure 4. Replication rates of OWAS-joint results. OWAS-joint was performed with GWAS summary statistics on CD, RA, HT, PrCa, HD, and LDL from the discovery cohort (with larger sample sizes) and the replication cohort from UKBB and GERA. In the discovery cohort, GWAS SNPs were divided into five bins according to their p-values (I: (0, 5 × 10⁻⁶), II: [5 × 10⁻⁶, 5 × 10⁻⁵), III: [5 × 10⁻⁵, 5 × 10⁻⁴), IV: [5 × 10⁻⁴, 5 × 10⁻³), V: [5 × 10⁻³, 0.05)). In the replication cohort, GWAS significant SNPs were identified with a relaxed threshold (p < 0.05). In each bin, SNPs were broken down into prioritized and non-prioritized groups by the OWAS-joint results (p < 5 × 10⁻⁸). The p-values shown in the figure were derived from the binomial test.
The number of segments and genes identified by OWAS-joint, the union of single-cell-type OWAS with 12 cell types, and the average number of single-cell-type OWAS. The standard deviations across 12 cell types are shown in brackets. The p-value cutoffs for the union of segment-level association tests were determined by Bonferroni correction. The largest numbers of identified signals are highlighted in boldface.
Multi-Cell-Type Openness-Weighted Association Studies for Trait-Associated Genomic Segments Prioritization

July 2022 · 53 Reads

Genes

Openness-weighted association study (OWAS) is a method that leverages the in silico prediction of chromatin accessibility to prioritize genome-wide association studies (GWAS) signals, and can provide novel insights into the roles of non-coding variants in complex diseases. A prerequisite to apply OWAS is to choose a trait-related cell type beforehand. However, for most complex traits, the trait-relevant cell types remain elusive. In addition, many complex traits involve multiple related cell types. To address these issues, we develop OWAS-joint, an efficient framework that aggregates predicted chromatin accessibility across multiple cell types, to prioritize disease-associated genomic segments. In simulation studies, we demonstrate that OWAS-joint achieves a greater statistical power compared to OWAS. Moreover, the heritability explained by OWAS-joint segments is higher than or comparable to OWAS segments. OWAS-joint segments also have high replication rates in independent replication cohorts. Applying the method to six complex human traits, we demonstrate the advantages of OWAS-joint over a single-cell-type OWAS approach. We highlight that OWAS-joint enhances the biological interpretation of disease mechanisms, especially for non-coding regions.
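
The baseline that OWAS-joint is compared against in the figure captions, a union of single-cell-type results with Bonferroni correction, can be sketched as below. This is an illustrative assumption about how such a union might be formed, not code from the OWAS software: the function name and input layout are hypothetical, and only the number of cell types is corrected for (any genome-wide threshold is passed in through `alpha`).

```python
def bonferroni_union(pvalues_by_cell_type, alpha=0.05):
    """Union of segments significant in at least one cell type.

    pvalues_by_cell_type maps cell type -> {segment id -> p-value}.
    The per-test threshold is alpha divided by the number of cell types.
    """
    n_cell_types = len(pvalues_by_cell_type)
    threshold = alpha / n_cell_types
    hits = set()
    for segment_pvalues in pvalues_by_cell_type.values():
        for segment, p in segment_pvalues.items():
            if p < threshold:
                hits.add(segment)
    return sorted(hits)
```

Dividing the threshold by the number of cell types is what costs the union approach power as more cell types are added, which is the motivation for a joint test that aggregates across cell types.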


Quantifying concordant genetic effects of de novo mutations on multiple disorders

June 2022 · 57 Reads · 3 Citations

eLife

Exome sequencing on tens of thousands of parent-proband trios has identified numerous deleterious de novo mutations (DNMs) and implicated risk genes for many disorders. Recent studies have suggested shared genes and pathways are enriched for DNMs across multiple disorders. However, existing analytic strategies only focus on genes that reach statistical significance for multiple disorders and require large trio samples in each study. As a result, these methods are not able to characterize the full landscape of genetic sharing due to polygenicity and incomplete penetrance. In this work, we introduce EncoreDNM, a novel statistical framework to quantify shared genetic effects between two disorders characterized by concordant enrichment of DNMs in the exome. EncoreDNM makes use of exome-wide, summary-level DNM data, including genes that do not reach statistical significance in single-disorder analysis, to evaluate the overall and annotation-partitioned genetic sharing between two disorders. Applying EncoreDNM to DNM data of nine disorders, we identified abundant pairwise enrichment correlations, especially in genes intolerant to pathogenic mutations and genes highly expressed in fetal tissues. These results suggest that EncoreDNM improves current analytic approaches and may have broad applications in DNM studies.


Citations (38)


... Inference relying solely on summary statistics is widely used in the statistical genetics literature for practical reasons. Summary statistics-based methods have been developed for tasks such as variance component inference and polygenic risk prediction (Bulik-Sullivan et al., 2015b,a;Ruan et al., 2022;Miao et al., 2023a;Zhao et al., 2022). In contrast to our work, these applications do not leverage ML predictions, but instead focus on inference using summary statistics obtained from observed outcomes. ...

Reference:

Task-Agnostic Machine Learning-Assisted Inference
Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics

Nature Communications

... They compared all possible method-database combinations but also lacked a gold standard and, therefore, their report focused on the overlap between the highest ranked predictions (which is low), robustness to noise (adequate), and enrichment of CCIs among spatially adjacent cell types (present for some datasets only) [13]. Shan et al. created a gold standard of 728 pairs of CCIs [14] and performed a benchmark comparison of the tools from LIANA. However, their study was limited to predictions of "source cell-target cell" interactions, and it did not consider the "source cell-target cell-ligand-receptor" model. ...

CITEdb: a manually curated database of cell-cell interactions in human

Bioinformatics

... The binding preferences of RBPs depend not only on the RNA sequences but also on the structural features of these RBP binding sites [19][20][21][22][23][24][25]. For example, a G-rich internal loop in the lncRNA Braveheart is critical for binding to a zinc-finger protein CNBP, which is required for the cardiac specification in mice [26]. ...

Differential analysis of RNA structure probing experiments at nucleotide resolution: uncovering regulatory functions of RNA structure

Nature Communications

... While recent research has shown that certain genes and biological pathways are commonly affected by DNVs in various disorders, current methods tend to only consider genes that are statistically significant across multiple disorders and cannot fully capture the complexity of genetic associations due to the polygenic nature of diseases and incomplete penetrance. EncoreDNM is a novel statistical method that quantifies the overall genetic sharing of DNVs between two disorders for different variant types [52]. Instead of using the Bayesian framework, it constructs mixed-effects Poisson regression models to evaluate the correlation between two traits by providing the estimated correlation and p-values from statistical inference. ...

Quantifying concordant genetic effects of de novo mutations on multiple disorders

eLife

... Several methods have been developed to improve genetic correlation estimation using individual-level GWAS data [3,4], GWAS summary statistics [5,6], or both [7]. Recent studies have also expanded this concept to quantify genetic correlations in local genomic regions [8][9][10][11], between human ancestral populations [12,13], and using other types of genetic variations [14]. Overall, these methods have become a routine component of complex trait genetic studies and provided insights into the genetic basis of numerous human traits. ...

Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics

... Since s-LDSC uses χ² statistics of SNPs as the dependent variable, residual correlations are present for SNPs in LD, and this reduces the precision of the estimates of SNP heritability and functional enrichment [22,26,27]. In recognition of these limitations, we have developed a new approach, generalized LD score regression (g-LDSC), for estimating functional enrichments. The method uses information on the relation between χ² statistics and the squared LD matrix, and differs from s-LDSC in using feasible generalized least-squares (FGLS) estimation [28], which accounts for possible correlated error structure, instead of WLS. ...

Leveraging LD eigenvalue regression to improve the estimation of SNP heritability and confounding inflation
  • Citing Article
  • April 2022

The American Journal of Human Genetics

... The CASTLE framework. CASTLE takes as input the cell-by-region matrix that has undergone a series of preprocessing operations, including binarization, feature selection, TF-IDF transformation, and normalization (Fig. 1a). CASTLE first filters the binarized count matrix to retain only the peaks that are accessible in at least 1% of cells, which resembles existing methods [5,38,46-48]. Similar to other existing methods [2,37,38,46,47,49], CASTLE uses TF-IDF transformation on the filtered count matrix. ...

Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding

Nature Machine Intelligence

... patients, demonstrating its potential as a molecular marker for early breast cancer diagnosis. Studies have validated that genome methylation signatures are significantly associated with cancer immunotherapy response (Xu et al., 2021;Qin et al., 2024;Ressler et al., 2024). Consistently, we found that 23 abnormally methylated probes in BRCA blood and tissues were associated with patients' immune therapy response. ...

A Pan-Cancer Analysis of Predictive Methylation Signatures of Response to Cancer Immunotherapy

Frontiers in Immunology

... Recently, we have developed the openness-weighted association studies (OWAS) approach, a computational framework that leverages predicted chromatin accessibility for the prioritization of GWAS signals [13]. The first step in OWAS is to choose a trait-related cell type (e.g., the liver for low-density lipoprotein (LDL), and whole-blood for Crohn's disease (CD)). ...

Openness Weighted Association Studies: Leveraging Personal Genome Information to Prioritize Noncoding Variants

Bioinformatics

... Multi-omics approaches hold potential as a means of improving GRS prediction accuracy by capturing and incorporating additional biological information into predictive models [45]. A study by Shan et al. combined the GRS calculation tool LDpred with a transcriptional risk score (TRS), which showed improvements in AUCs after adding TRS [46]. Another way to achieve more complex models is by adding a machine learning approach to the GRS. ...

A novel transcriptional risk score for risk prediction of complex human diseases
  • Citing Article
  • July 2021

Genetic Epidemiology