(A) Schematic representation of the cis-regulatory eQTL model in Equations 1 and 2. (B) Example of allelic expression associated with each of the alleles of a cis-eQTL (eVariant Chr 5: 96252589 T/C; eGene ERAP2) in GTEx adipose subcutaneous tissue. Each dot corresponds to allelic imbalance in one individual heterozygous for the eVariant, measured using reads that overlap heterozygous SNPs (aeSNPs) in the eGene. Phasing between the aeSNP and the eQTL SNP is used to associate the measured allelic expression with each of the eQTL alleles. (C, D) eGene expression for the same example eQTL. The green dashed line connects the median expression of the two homozygous classes. Expression is linear with the number of alternative alleles (C), but the linearity is lost after log transformation (D).
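
The contrast between panels C and D can be illustrated with a minimal sketch. It assumes a simple additive model in which each haplotype contributes its own allelic expression to the gene total; this may differ in detail from the Equations 1 and 2 referenced in the caption, which are not reproduced here. The function names and the values `e_ref` and `e_alt` are hypothetical and chosen only for illustration.

```python
import math

def total_expression(n_alt, e_ref=1.0, e_alt=3.0):
    """Total eGene expression for a genotype carrying n_alt alternative alleles.

    Assumption: the two haplotypes contribute independently and additively,
    with e_ref and e_alt as made-up per-allele expression levels.
    """
    return (2 - n_alt) * e_ref + n_alt * e_alt

def midpoint_gap(values):
    """Heterozygote value minus the mean of the two homozygote values.

    A gap of zero means the heterozygote lies on the straight line that
    connects the two homozygous classes.
    """
    return values[1] - (values[0] + values[2]) / 2

raw = [total_expression(n) for n in (0, 1, 2)]  # genotypes ref/ref, ref/alt, alt/alt
logged = [math.log2(v) for v in raw]

print(midpoint_gap(raw))     # 0.0: linear in the number of alternative alleles
print(midpoint_gap(logged))  # greater than 0: the heterozygote sits above the line
```

Under these assumptions the heterozygote sits exactly midway between the homozygotes on the raw scale, while on the log scale it is shifted toward the higher-expressing homozygote.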


Source publication
Article
Large-scale efforts like the ENCODE Project have made tremendous progress in cataloging the genomic binding patterns of DNA-associated proteins (DAPs), such as transcription factors (TFs). However, most chromatin immunoprecipitation-sequencing (ChIP-seq) analyses have focused on a few immortalized cell lines whose activities and physiology differ i...

Citations

... Additionally, we examined whether PhyeQTLs affect the binding sites of DNA-associated proteins (DAPs), such as transcription factors (TFs), which bind to genomic regulatory elements such as promoters, enhancers, silencers, and insulators [47]. We utilized ChIP-seq data obtained from liver tissue to quantify the binding of 17 different DAPs [48], including liver-specific factors (e.g., HNF4A and RXRA) and chromatin structure-related factors (e.g., CTCF and RAD21). Our analysis aimed to evaluate the overlap between QTLs and DAP binding sites and to compare the overlap patterns between PhyeQTLs and non-PhyeQTLs. ...
... We further ask whether the disruption of DAP binding sites by PhyeQTLs is tissue-specific. As reported by Ramaker et al., the data on DAP binding sites derived from liver tissue reflect liver biology more closely than those from the HepG2 cell line [48]. Given the unavailability of primary tissue-based data on DAP binding sites for other tissues analyzed in this study, we utilized liver tissue-derived DAP binding site data to assess tissue specificity and evaluate the overlap enrichment of PhyeQTLs from other tissues with DAP binding sites. ...
Preprint
Genome-wide association studies (GWAS) have linked thousands of genetic variants to various complex traits or diseases. However, most identified variants have weak individual effects, are correlated with nearby polymorphisms due to linkage disequilibrium (LD), and are located in non-coding cis-regulatory elements (CREs). These characteristics complicate the assessment of the direct impact of each variant on tissue specific gene expression and phenotype. To address this challenge, we have developed a novel algorithm that leverages polymer folding and 3D chromatin interactions to prioritize and identify putative causal variants and their target genes. From the millions of eQTL-Gene pairs identified by GTEx in human somatic tissues, we classify only ∼10-20% as putative functional eQTL-Gene pairs supported by phenotypic associations confirmed through CRISPR deletion experiments. Our findings show that unlike most variants, functional eQTL-Gene pairs predominantly reside within the same topologically associating domain (TAD) and have strong associations with cell-type specific cis-regulatory elements (CREs), enriched for binding sites of tissue-specific transcription factors. Unlike most approaches that rely on linear distance or other chromatin features (histone code, accessibility), our algorithm emphasizes the importance of physical interactions and 3D chromatin folding in gene regulation, as the identified eQTL-Gene pairs are all among the small fraction of physical chromatin interactions sufficient for chromatin locus folding. Overall, our algorithm reduces false positive associations between DNA variants and genes identified by eQTL analysis and uncovers novel variant-gene pair associations. These findings suggest a mechanism where a small number of regulatory variants control tissue specific gene expression via their physical association with target genes confined within the same TAD. 
Our approach provides new insights into the molecular mechanisms driving GWAS phenotypes.
... For ChIP-seq analysis, the ChIP-seq profiles were downloaded from ENCODE (83)(84)(85). Read mapping, filtering and peak track generation were done using the same strategy as ATAC-seq analysis. MACS2 was used for peak calling with the following settings, -g hs. ...
Preprint
Background and Aims: Hepatic organoid cultures are considered a powerful model system to study liver development and diseases in vitro. However, hepatocyte-like cells differentiated from such organoids remain immature compared to primary human hepatocytes. Therefore, a comprehensive understanding of differences in gene regulatory mechanisms between primary human hepatocytes and hepatic organoids is essential to obtain functional hepatocyte-like cells in vitro for fundamental and therapeutic applications. Methods: We obtained primary human hepatocytes at high purity from all zones of the liver lobule using an optimized two-step perfusion protocol. We captured the single-cell transcriptome and chromatin accessibility landscape using scRNA-seq and ATAC-seq, respectively. We identified key transcription factors and compared the gene regulatory mechanisms in primary human hepatocytes and (un)differentiated intrahepatic cholangiocyte organoids. Using siRNA-mediated perturbations, we showed the functional relevance of an organoid-enriched transcription factor during in vitro differentiation of hepatocyte-like cells. Results: Our integrative omics analysis revealed that Activator Protein 1 (AP-1) family members cooperate with hepatocyte-specific transcription factors, including HNF4A, in maintaining cellular functionality of mature human hepatocytes. Comparative analysis identified distinct transcription factor sets specifically active in human hepatocytes and organoids. Amongst these, ELF3 is unique to intrahepatic cholangiocyte organoids, and its expression level negatively correlates with expression of hepatic marker genes. Functional analysis of ELF3 furthermore revealed that ELF3 depletion optimizes the formation of hepatocyte-like cells from intrahepatic cholangiocyte organoids.
Conclusions: Collectively, our integrative analysis provides insights into the transcriptional regulatory networks of human hepatocytes and hepatic organoids, thereby informing future strategies for better establishment of urgently-needed hepatic model systems in vitro.
... For RNA-seq, total RNA was purified from patient samples using the Norgen Total RNA Purification Kit (Norgen, 35300). ChIP-seq using anti-GR-alpha (BD, 611227) and anti-H3K27ac (Active Motif, 39133) antibodies was performed as previously described [29,30]. Fast-ATAC data from primary ALL cells were downloaded from NCBI Gene Expression Omnibus (GSE161501). ...
Article
Glucocorticoids (GCs) are a mainstay of contemporary, multidrug chemotherapy in the treatment of childhood acute lymphoblastic leukemia (ALL), and resistance to GCs remains a major clinical concern. Resistance to GCs is predictive of ALL relapse and poor clinical outcome, and therefore represents a major hurdle limiting further improvements in survival rates. While advances have been made in identifying genes implicated in GC resistance, there remains an insufficient understanding of the impact of cis-regulatory disruptions in resistance. To address this, we mapped the gene regulatory response to GCs in two ALL cell lines using functional genomics and high-throughput reporter assays and identified thousands of GC-responsive changes to chromatin state, including the formation of over 250 GC-responsive super-enhancers and a depletion of AP-1 bound cis-regulatory elements implicated in cell proliferation and anti-apoptotic processes. By integrating our GC response maps with genetic and epigenetic datasets in primary ALL cells from patients, we further uncovered cis-regulatory disruptions at GC-responsive genes that impact GC resistance in childhood ALL. Overall, these data indicate that GCs initiate pervasive effects on the leukemia epigenome, and that alterations to the GC gene regulatory network contribute to GC resistance.
... For RNA-seq, total RNA was purified from patient samples using the Norgen Total RNA Purification Kit (Norgen, 35300). ChIP-seq using anti-GR-alpha (BD, 611227) and anti-H3K27ac (Active Motif, 39133) antibodies was performed as previously described [28,29]. Fast-ATAC data from primary ALL cells were downloaded from NCBI Gene Expression Omnibus (GSE161501). ...
Preprint
Glucocorticoids (GCs) are a mainstay of contemporary, multi-drug chemotherapy in the treatment of childhood acute lymphoblastic leukemia (ALL), and resistance to GCs remains a major clinical concern. Resistance to GCs is predictive of ALL relapse and poor clinical outcome, and therefore represents a major hurdle limiting further improvements in survival rates. While advances have been made in identifying genes implicated in GC resistance, there remains an insufficient understanding of the impact of cis-regulatory disruptions in resistance. To address this, we mapped the gene regulatory response to GCs in two ALL cells using functional genomics and high-throughput reporter assays and identified thousands of GC-responsive changes to chromatin state, including the formation of over 250 GC-responsive super-enhancers and a depletion of AP-1 bound cis-regulatory elements implicated in cell proliferation and anti-apoptotic processes. By integrating our GC response maps with genetic and epigenetic datasets in primary ALL cells from patients, we further uncovered cis-regulatory disruptions at GC-responsive genes that impact GC resistance in childhood ALL. Overall, these data indicate that GCs initiate pervasive effects on the leukemia epigenome, and that alterations to the GC gene regulatory network contribute to GC resistance.
... We found 90 TF motifs enriched in peaks (E-value < 1 × 10⁻¹⁰⁰; Table S8), including motifs for HNF4G (MIM: 605966), FOXA family members (HNF3), CEBPB [72] (MIM: 189965), the multifaceted protein CTCF [73] (MIM: 604167), and KLF family members, which regulate numerous processes in liver [74]. Of 17 TFs with ChIP-seq data in liver tissue [45], binding sites for all TFs were significantly enriched (permutation p < 1 × 10⁻³) in ATAC peaks (Table S9), and 11 TFs had over 90% of their binding sites within ATAC peaks (Table S9), similar to previous findings [15]. Taken together, ATAC peaks marked previously annotated transcriptional regulatory elements and TF binding sites in liver tissue. ...
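
A permutation test of the kind mentioned in this excerpt can be sketched as follows. The excerpt does not describe the actual permutation scheme, so this is a generic illustration under stated assumptions: a toy one-dimensional genome, binding sites reduced to single positions, and a null model that redraws site positions uniformly at random. All names and coordinates are hypothetical.

```python
import random

def overlaps_any(pos, peaks):
    """True if position pos falls inside any half-open (start, end) peak interval."""
    return any(start <= pos < end for start, end in peaks)

def permutation_p(sites, peaks, genome_len, n_perm=1000, seed=0):
    """One-sided permutation p-value for enrichment of sites within peaks.

    Assumption: the null redraws each site uniformly over the genome.
    Real analyses usually preserve chromosome, region size, or other covariates.
    """
    rng = random.Random(seed)
    observed = sum(overlaps_any(s, peaks) for s in sites)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_len) for _ in sites]
        if sum(overlaps_any(s, peaks) for s in shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

peaks = [(100, 200), (500, 600)]    # toy accessible regions
sites = [110, 150, 520, 590, 5000]  # toy binding-site positions
observed, p_value = permutation_p(sites, peaks, genome_len=10_000)
print(observed, p_value)
```

With 1,000 permutations and the add-one correction, the smallest reportable value is 1/1001, which is why a threshold such as p < 1 × 10⁻³ corresponds to no permuted set matching the observed overlap.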
Article
Identifying the molecular mechanisms by which genome-wide association study (GWAS) loci influence traits remains challenging. Chromatin accessibility quantitative trait loci (caQTLs) help identify GWAS loci that may alter GWAS traits by modulating chromatin structure, but caQTLs have been identified in a limited set of human tissues. Here we mapped caQTLs in human liver tissue in 20 liver samples and identified 3,123 caQTLs. The caQTL variants are enriched in liver tissue promoter and enhancer states and frequently disrupt binding motifs of transcription factors expressed in liver. We predicted target genes for 861 caQTL peaks using proximity, chromatin interactions, correlation with promoter accessibility or gene expression, and colocalization with expression QTLs. Using GWAS signals for 19 liver function and/or cardiometabolic traits, we identified 110 colocalized caQTLs and GWAS signals, 56 of which contained a predicted caPeak target gene. At the LITAF LDL-cholesterol GWAS locus, we validated that a caQTL variant showed allelic differences in protein binding and transcriptional activity. These caQTLs contribute to the epigenomic characterization of human liver and help identify molecular mechanisms and genes at GWAS loci.
... Haplotype phasing is the process of determining the sequences of genetic variants that co-occur along an intact maternal or paternal homologous chromosome [3,4]. Haplotype information is crucial for linkage analysis, association studies, and population and clinical genetic studies, as well as for assessing allele-specific impacts on gene expression [3,5]. Haplotype information is also critical for identifying heterozygous structural variants (SVs) [6]. ...
Article
Until recently, genome-scale phasing was limited by the short read lengths of sequence data. Although long-read sequencing can overcome this limitation, it requires extensive error correction. The emergence of technologies such as 10X Genomics linked-read sequencing and Hi-C, which use short-read sequencers along with library preparation protocols that facilitate long-read assemblies, has greatly reduced the complexity of genome-scale phasing. Moreover, it is possible to accurately assemble the phased genome of individual samples using these methods. Therefore, in this study, we compared three phasing strategies, which combined two sample preparation methods with the Long Ranger pipeline of 10X Genomics and the HapCut2 software, namely 10X-LG, 10X-HapCut2, and HiC-HapCut2, and assessed their performance and accuracy. We found that 10X-LG had the best phasing performance among the methods analyzed. It had the highest phasing rate (89.6%), longest adjusted N50 (1.24 Mb), and lowest switch error rate (0.07%). Moreover, the phasing accuracy and yield of 10X-LG stayed over 90% for distances up to 4 Mb and 550 Kb, respectively, which were considerably higher than those of 10X-HapCut2 and HiC-HapCut2. The results of this study will serve as a good reference for future benchmarking studies and also for reference-based imputation in Hanwoo.
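
Two of the metrics named in this abstract can be illustrated with a short sketch. It uses common textbook definitions, which are an assumption here: the study's "adjusted N50" and its exact switch error computation are not specified in the abstract and may differ. Haplotypes are represented as hypothetical strings of 0/1 alleles at heterozygous sites.

```python
def switch_error_rate(truth, predicted):
    """Fraction of adjacent heterozygous-site pairs whose phase relationship flips.

    truth and predicted give the allele ('0' or '1') assigned to the same
    haplotype at each heterozygous site, in genomic order.
    """
    if len(truth) != len(predicted) or len(truth) < 2:
        raise ValueError("need two equal-length haplotypes with at least two sites")
    agrees = [t == p for t, p in zip(truth, predicted)]
    # A switch occurs wherever agreement with the truth changes between neighbours.
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(agrees) - 1)

def n50(block_lengths):
    """Plain N50: the block length at which the running sum of blocks,
    taken from longest to shortest, first reaches half of the total length."""
    total = sum(block_lengths)
    running = 0
    for length in sorted(block_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

truth = "0101010101"
predicted = "0101001010"  # phase flips once, after the fifth site
print(switch_error_rate(truth, predicted))  # one switch over nine adjacent pairs
print(n50([10, 20, 30, 40]))                # 30
```

A single switch inverts the phase of every downstream site, which is why switch errors are counted at the transition rather than per mismatched site.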
... Furthermore, the allele-specificity of DAP associations is not considered by our analysis. Few allele-specific analyses have been conducted on a large number of DAPs in the same cell line or tissue, but some evidence exists that DAPs may favor a single allele in the context of allelic sequence variation (Ramaker et al. 2017;Reddy et al. 2012). ...
... Predicted mutation effects were determined using the lsgkm analysis suite in a manner previously described (Lee 2016;Ramaker et al. 2017). Briefly, genome sequence was obtained in FASTA format for each HepG2 DAP narrow peak using the bedtools getfasta command. ...
Article
DNA associated proteins (DAPs) regulate gene expression by binding to regulatory loci such as enhancers or promoters. An understanding of how DAPs cooperate at regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DNA-associated proteins assayed in three cell lines and integrated these data with an orthogonal dataset of 352 non-redundant, in vitro-derived motifs mapped to the genome within DNase hypersensitivity footprints in an effort to characterize regions of the genome that have exceptionally high numbers of DAP associations. We subsequently performed a massively parallel mutagenesis assay to discover the key sequence elements driving transcriptional activity at these loci and explored plausible biological mechanisms underlying their formation. We establish a generalizable definition for High Occupancy Target (HOT) loci and identify putative driver DAP motifs, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and exhibit sequence conservation at HOT loci. We also found the number of DAP associations is positively associated with evidence of regulatory activity and, by systematically mutating 245 HOT loci in our massively parallel reporter assay, localize regulatory activity in these loci to a central core region that is dependent on the motif sequences of our previously nominated driver DAPs. In sum, our work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.
... When analyzing data from rare disease cohorts, knowing if potentially pathogenic variants are in cis or trans is necessary for interpreting clinical impact. In addition, haplotype information is necessary for understanding allele-specific impacts on gene expression (Ramaker et al. 2017). ...
Article
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long-range information while maintaining the advantages of short reads. Starting from ∼1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules, producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
Article
The liver plays a unique role as a metabolic center of the body, and also performs other important functions such as detoxification and immune response. Here, we establish a cell type-resolved healthy human liver proteome including hepatocytes (HCs), hepatic stellate cells (HSCs), Kupffer cells (KCs), and liver sinusoidal endothelial cells (LSECs) by high-resolution mass spectrometry. Overall, we quantify a total of 8354 proteins across the four cell types and over 6000 proteins for each cell type. Analysis of this data set and of regulatory pathways reveals that the division of labor among cells in the human liver follows a pattern in which parenchymal cells make the main components of pathways, but nonparenchymal cells trigger these pathways. Human liver cells show some novel molecular features: HCs maintain KC and LSEC homeostasis by producing cholesterol and ketone bodies; HSCs participate in xenobiotics metabolism as an agent deliverer; KCs and LSECs mediate immune response through the MHC class II−TLRs and MHC class I−TGFβ cascades, respectively; and KCs play a central role in the regulation of diurnal rhythms through sensing diurnal IGF and temperature flux. Together, this work expands our understanding of liver physiology and provides a useful resource for future analyses of normal and diseased livers.