Directed annotations partially explain gene expression variance in GTEx. The BAGEA model was fit using various GTEx eQTL data sets (supplemented with GEUVADIS eQTL data) and with Expecto-derived directed annotations on genes in the training set (chr1, ..., chr15) with a top nominal p-value $< 10^{-7}$. Expecto includes 2002 annotations in total, of which the histone and DNase1 annotations from Roadmap were used (1187 annotations). For each gene $j$ in the test set (chr16, ..., chr22 and top nominal p-value $< 10^{-7}$), we calculated an approximate version of $S_j$, the squared magnitude of the directed predictor $\hat{\mu}_j$, where the approximation uses external LD information. Further, we calculated an approximate version of $\mathrm{MSE}^{\mathrm{dir}}_j$, the mean squared error (MSE) when predicting gene expression $y_j$ from $\hat{\mu}_j$. a) Displayed is the average (approximated) $\mathrm{MSE}^{\mathrm{dir}}$
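The caption does not spell out how $S_j$ and $\mathrm{MSE}^{\mathrm{dir}}_j$ are approximated from external LD. Below is a minimal sketch of one common summary-statistic approach. It assumes a linear directed predictor $\hat{\mu}_j = X_j w_j$ and genotypes and expression standardized so that $y_j^\top y_j / n = 1$. Expanding $\frac{1}{n}\|y_j - X_j w_j\|^2$ gives $1 - 2 w_j^\top r_j + w_j^\top R_j w_j$, where $r_j = X_j^\top y_j / n$ and $R_j = X_j^\top X_j / n$, with $R_j$ replaced by an LD matrix from a reference panel. The sketch takes $S_j$ per sample as $w_j^\top R_j w_j$. The weights $w_j$, this scaling, and all function names are illustrative assumptions rather than BAGEA's actual implementation.

```python
# Sketch (assumption, not BAGEA's code): LD-based approximation of S_j and
# MSE_dir_j for a linear directed predictor mu_hat_j = X_j w_j, with
# genotypes and expression standardized so that y_j^T y_j / n = 1.


def quad_form(w, R):
    """Return w^T R w for weight vector w and square matrix R (lists)."""
    k = len(w)
    return sum(w[a] * R[a][b] * w[b] for a in range(k) for b in range(k))


def approx_S(w, R):
    """Per-sample squared magnitude ||X w||^2 / n, approximated as w^T R w.

    R is an LD (SNP correlation) matrix, e.g. from an external reference panel.
    """
    return quad_form(w, R)


def approx_mse_dir(w, r, R):
    """Approximate MSE_dir_j = (1/n) ||y - X w||^2 as 1 - 2 w^T r + w^T R w.

    r holds marginal SNP-expression correlations (X^T y / n); R is the LD
    matrix (X^T X / n). The identity is exact when y^T y / n = 1 and r and R
    come from the same samples; substituting an external R makes it an
    approximation.
    """
    cross = sum(wi * ri for wi, ri in zip(w, r))
    return 1.0 - 2.0 * cross + quad_form(w, R)
```

Under this scaling, an $\mathrm{MSE}^{\mathrm{dir}}_j$ of 1.0 (100%) means the predictor explains none of the expression variance. For example, `approx_mse_dir([0.1, 0.05], [0.3, 0.2], [[1.0, 0.5], [0.5, 1.0]])` evaluates to approximately 0.9375.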


Source publication
Preprint
A longstanding goal of regulatory genetics is to understand how variants in genome sequences lead to changes in gene expression. Here we present a method named Bayesian Annotation Guided eQTL Analysis (BAGEA), a variational Bayes framework to model cis-eQTLs using directed and undirected genomic annotations. In a use case, we integrated directed ge...

Contexts in source publication

Context 1
... again split genes into training and test sets, fitting BAGEA on the training set and building directed expression predictors $\hat{\mu}_j$ for all genes in the test set. We observed that the average $\mathrm{MSE}^{\mathrm{dir}}$ per data set varied across GTEx data sets, ranging from 100% to below 98.5% (Figure 5a). ...
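The chromosome-based split from the figure caption and the per-data-set averaging described here can be sketched as follows. This is a minimal illustration under stated assumptions. Each gene is taken to be a hypothetical `(gene_id, chrom, top_p)` tuple. Per-gene $\mathrm{MSE}^{\mathrm{dir}}_j$ values are assumed to be stored as fractions of expression variance. The data layout and the names `split_genes` and `average_mse_percent` are illustrative, not part of BAGEA.

```python
# Hypothetical sketch: chromosome-based train/test split of genes and
# per-data-set averaging of MSE_dir (expressed in percent).

TRAIN_CHROMS = {f"chr{i}" for i in range(1, 16)}   # chr1..chr15
TEST_CHROMS = {f"chr{i}" for i in range(16, 23)}   # chr16..chr22
P_THRESHOLD = 1e-7                                  # top nominal p-value cutoff


def split_genes(genes):
    """Split (gene_id, chrom, top_p) tuples into training and test gene ids.

    Genes whose top nominal p-value is not below P_THRESHOLD, or that lie on
    chromosomes outside chr1..chr22, are dropped.
    """
    train, test = [], []
    for gene_id, chrom, top_p in genes:
        if top_p >= P_THRESHOLD:
            continue
        if chrom in TRAIN_CHROMS:
            train.append(gene_id)
        elif chrom in TEST_CHROMS:
            test.append(gene_id)
    return train, test


def average_mse_percent(mse_by_dataset):
    """Map each data set name to the mean of its per-gene MSE_dir values.

    Input values are fractions of expression variance; output is in percent.
    Data sets without any test genes are skipped.
    """
    return {
        name: 100.0 * sum(values) / len(values)
        for name, values in mse_by_dataset.items()
        if values
    }
```

Fixing the split by chromosome, rather than sampling genes at random, keeps nearby genes (which can share LD and regulatory regions) on the same side of the split.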
Context 2
... again saw that increased directed predictor magnitude tended to decrease $\mathrm{MSE}^{\mathrm{dir}}$. For instance, in fibroblasts, the quarter of the genes with the highest directed predictor magnitude had an average $\mathrm{MSE}^{\mathrm{dir}}$ of 97.2%, whereas the quarter with the lowest directed predictor magnitude had an average $\mathrm{MSE}^{\mathrm{dir}}$ close to 100% (Figure 5b). ...
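The fibroblast comparison groups test genes by directed predictor magnitude and averages $\mathrm{MSE}^{\mathrm{dir}}$ within each group. Below is a minimal sketch of that grouping. It assumes parallel per-gene lists of $S_j$ and $\mathrm{MSE}^{\mathrm{dir}}_j$, with MSE values given as fractions. It also takes a "quarter" to mean one of four near-equal groups after sorting by $S_j$. Tie handling, group boundaries, and the function name are assumptions for illustration.

```python
def mse_by_magnitude_quartile(S, mse):
    """Average MSE_dir (percent) in four groups of genes ordered by S_j.

    S and mse are parallel lists (one entry per test gene). Genes are sorted
    by increasing S_j and cut into four near-equal groups, so the first
    returned value is the lowest-magnitude quarter and the last the highest.
    """
    if len(S) != len(mse):
        raise ValueError("S and mse must have the same length")
    n = len(S)
    if n < 4:
        raise ValueError("need at least four genes to form quartiles")
    # Indices of genes in order of increasing predictor magnitude.
    order = sorted(range(n), key=lambda i: S[i])
    averages = []
    for q in range(4):
        group = order[q * n // 4:(q + 1) * n // 4]
        averages.append(100.0 * sum(mse[i] for i in group) / len(group))
    return averages
```

A pattern like the one reported for fibroblasts would show up as the last returned value being clearly below the first.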
Context 3
... accommodate the wide range of tissues explored in GTEx, we expanded the number of directed annotations used in the fitting process to over a thousand. While for some tissues the analysis strategy was underpowered to derive a predictive model of gene expression from directed annotations, in others a significant fraction of gene expression variance was explained by directed annotations (Figure 5). Many of the directed annotations selected by BAGEA were derived from tissues biologically related to the tissue of origin of the eQTL study (Figure 6a). ...
