Article

Mapping the proteo-genomic convergence of human diseases

Authors:
  • Berlin Institute of Health (BIH) at Charité – Universitätsmedizin Berlin

Abstract

Detangling gene-disease connections

Many diseases are at least partially due to genetic causes that are not always understood or targetable with specific treatments. To provide insight into the biology of various human diseases as well as potential leads for therapeutic development, Pietzner et al. undertook detailed, genome-wide proteogenomic mapping. The authors analyzed thousands of connections between potential disease-associated mutations, specific proteins, and medical conditions, thereby providing a detailed map for use by future researchers. They also supplied some examples in which they applied their approach to medical contexts as varied as connective tissue disorders, gallstones, and COVID-19 infections, sometimes even identifying single genes that play roles in multiple clinical scenarios. —YN


... With the development of aptamer-based and immunoassay-based platforms, including SomaScan and Olink, that measure roughly 1000 to 7000 proteins, large-scale GWAS datasets for the plasma proteome, such as studies on 35,559 Icelanders, 10,708 Fenland participants, and 54,306 UK Biobank participants, have been released [8,16–18]. These protein biomarkers can be matched to coding genes in available GWAS datasets for transcriptomes, such as eQTLGen and GTEx [19,20]. ...
... This GWAS replicated 83% of reported pQTLs in the INTERVAL study (based on SomaScan) and 64% of the pQTLs from the SCALLOP consortium (based on Olink) [16]. (2) A total of 10,674 genetic associations (P < 1.004 × 10⁻¹¹) for 3892 plasma proteins were identified in 10,708 European-descent participants from Fenland using the SomaScan platform [17]. Conditional analysis was also utilized to detect sentinel (n = 8328) and secondary signals (n = 2346) for each genomic region identified by distance-based clumping with GCTA [17]. ...
... (2) A total of 10,674 genetic associations (P < 1.004 × 10⁻¹¹) for 3892 plasma proteins were identified in 10,708 European-descent participants from Fenland using the SomaScan platform [17]. Conditional analysis was also utilized to detect sentinel (n = 8328) and secondary signals (n = 2346) for each genomic region identified by distance-based clumping with GCTA [17]. This GWAS replicated 61% of pQTLs using the Olink technique, with a higher proportion for cis-pQTLs (81.2%) [17]. ...
Article
Full-text available
Background Chronic kidney disease (CKD) is a progressive disease for which there is no effective cure. We aimed to identify potential drug targets for CKD and kidney function by integrating plasma proteome and transcriptome. Methods We designed a comprehensive analysis pipeline involving two-sample Mendelian randomization (MR) (for proteins), summary-based MR (SMR) (for mRNA), and colocalization (for coding genes) to identify potential multi-omics biomarkers for CKD and combined the protein–protein interaction, Gene Ontology (GO), and single-cell annotation to explore the potential biological roles. The outcomes included CKD, extensive kidney function phenotypes, and different CKD clinical types (IgA nephropathy, chronic glomerulonephritis, chronic tubulointerstitial nephritis, membranous nephropathy, nephrotic syndrome, and diabetic nephropathy). Results Leveraging pQTLs of 3032 proteins from 3 large-scale GWASs and corresponding blood- and tissue-specific eQTLs, we identified 32 proteins associated with CKD, which were validated across diverse CKD datasets, kidney function indicators, and clinical types. Notably, 12 proteins with prior MR support, including fibroblast growth factor 5 (FGF5), isopentenyl-diphosphate delta-isomerase 2 (IDI2), inhibin beta C chain (INHBC), butyrophilin subfamily 3 member A2 (BTN3A2), BTN3A3, uromodulin (UMOD), complement component 4A (C4a), C4b, centrosomal protein of 170 kDa (CEP170), serologically defined colon cancer antigen 8 (SDCCAG8), MHC class I polypeptide-related sequence B (MICB), and liver-expressed antimicrobial peptide 2 (LEAP2), were confirmed. To our knowledge, 20 novel causal proteins have not been previously reported. Five novel proteins, namely, GCKR (OR 1.17, 95% CI 1.10–1.24), IGFBP-5 (OR 0.43, 95% CI 0.29–0.62), sRAGE (OR 1.14, 95% CI 1.07–1.22), GNPTG (OR 0.90, 95% CI 0.86–0.95), and YOD1 (OR 1.39, 95% CI 1.18–1.64), passed the MR, SMR, and colocalization analysis. The other 15 proteins were also candidate targets (GATM, AIF1L, DQA2, PFKFB2, NFATC1, activin AC, Apo A-IV, MFAP4, DJC10, C2CD2L, TCEA2, HLA-E, PLD3, AIF1, and GMPR1). These proteins interact with each other, and their coding genes were mainly enriched in immunity-related pathways or showed specificity across tissues, kidney-related tissue cells, and kidney single cells. Conclusions Our integrated analysis of plasma proteome and transcriptome data identifies 32 potential therapeutic targets for CKD, kidney function, and specific CKD clinical types, offering potential targets for the development of novel immunotherapies, combination therapies, or targeted interventions.
... The genetic variations of plasma proteins were selected by compiling results from seven large-scale proteomic studies (Pietzner et al. [13], 4,775 proteins; Ferkingstad et al. [14], 4,719 proteins; Sun_1 et al. [15], 2,995 proteins; Sun_2 et al. [16], 1,463 proteins; Suhre et al. [17], 1,124 proteins; Folkersen et al. [18], 90 proteins; Yao et al. [19], 71 proteins), of which three studies lacked complete protein summary data (Pietzner et al. [13], Sun_2 et al. [16], Yao et al. [19]). A detailed description of these studies is provided in Supplementary Table S18. ...
... Because the same platform was employed in all studies, the comprehensive protein dataset obtained from the study by Sun_3 et al. [24] superseded that of the Sun_2 et al. [25] study. Similarly, the complete protein data from the Ferkingstad et al. [14] study was replaced by the protein data from the Pietzner et al. [13] and Sun_1 et al. [15] studies. ...
Article
Full-text available
Background Ischemic heart disease is one of the leading causes of mortality worldwide, and thus calls for development of more effective therapeutic strategies. This study aimed to identify potential therapeutic targets for coronary heart disease (CHD) and myocardial infarction (MI) by investigating the causal relationship between plasma proteins and these conditions. Methods A two-sample Mendelian randomization (MR) study was performed to evaluate more than 1600 plasma proteins for their causal associations with CHD and MI. The MR findings were further confirmed through Bayesian colocalization, Summary-data-based Mendelian Randomization (SMR), and Transcriptome-Wide Association Studies (TWAS) analyses. Further analyses, including enrichment analysis, single-cell analysis, MR analysis of cardiovascular risk factors, phenome-wide Mendelian Randomization (Phe-MR), and protein-protein interaction (PPI) network construction were conducted to verify the roles of selected causal proteins. Results Thirteen proteins were causally associated with CHD, seven of which were also causal for MI. Among them, FES and PCSK9 were causal proteins for both diseases as determined by several analytical methods. PCSK9 was a risk factor of CHD (OR = 1.25, 95% CI: 1.13–1.38, P = 7.47E-06) and MI (OR = 1.36, 95% CI: 1.21–1.54, P = 2.30E-07), whereas FES was protective against CHD (OR = 0.68, 95% CI: 0.59–0.79, P = 6.40E-07) and MI (OR = 0.65, 95% CI: 0.54–0.77, P = 5.38E-07). Further validation through enrichment and single-cell analysis confirmed the causal effects of these proteins. Moreover, MR analysis of cardiovascular risk factors, Phe-MR, and PPI network provided insights into the potential drug development based on the proteins. Conclusions This study investigated the causal pathways associated with CHD and MI, highlighting the protective and risk roles of FES and PCSK9, respectively. Specifically, the results showed that these proteins are promising therapeutic targets for future drug development.
... Proteins are functional products of the genome that provide insight about the normal processes of organisms; in addition, alterations in their levels are indicators of changes in disease status 3. Recent technological advancements, including the development of multiplex immunoassays and aptamer assays, have provided opportunities for the measurement of thousands of plasma- and serum-based protein levels [4–8]. ...
... The genetic backgrounds of protein levels are uncovered through the linking of these levels to genetic variability via protein quantitative trait locus (pQTL) analysis. Many recent pQTL studies have been large-scale [4–8], with the largest of them including 54,306 individuals from the UK Biobank 9. Their primary focus has been the identification of common [minor allele frequency (MAF) > 0.01] variants affecting inter-individual protein variability, but Sun et al. 9 reported that approximately 5.6% (570/10,248) and 1.5% (155/10,248) of the variants with primary associations had MAFs < 0.01 and < 0.005, respectively. ...
... To provide a comparison with previous research, we compared our results with previously published data. From the Pietzner et al. study 7, 147 pQTLs (52.88%) were nominally significant (P < 0.05) and accessible for comparisons. After correcting for multiple testing, 147 pQTLs remained significant (Benjamini-Hochberg FDR < 0.05) and 91.84% (135/147) of pQTLs were directionally concordant with the current study (Supplementary Table S2 ...
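The replication check described in the excerpt above (Benjamini-Hochberg correction followed by a directional concordance count) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the cited authors' code; the function names and the toy p-values and effect sizes are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def directional_concordance(betas_a, betas_b):
    """Fraction of paired effect estimates that share the same sign."""
    same = sum(1 for a, b in zip(betas_a, betas_b) if a * b > 0)
    return same / len(betas_a)


if __name__ == "__main__":
    # Hypothetical p-values and effect sizes, for illustration only.
    q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print([round(x, 3) for x in q])
    print(directional_concordance([0.2, -0.1, 0.3], [0.1, -0.4, -0.2]))
    # The concordance reported in the excerpt: 135 of 147 pQTLs.
    print(round(135 / 147 * 100, 2))
```

A pQTL would count as replicated here when its adjusted p-value falls below 0.05; concordance is then computed on the signs of the paired effect estimates, which assumes both studies report effects for the same allele.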
Article
Full-text available
The proteome holds great potential as an intermediate layer between the genome and phenome. Previous protein quantitative trait locus studies have focused mainly on describing the effects of common genetic variations on the proteome. Here, we assessed the impact of the common and rare genetic variations as well as the copy number variants (CNVs) on 326 plasma proteins measured in up to 500 individuals. We identified 184 cis and 94 trans signals for 157 protein traits, which were further fine-mapped to credible sets for 101 cis and 87 trans signals for 151 proteins. Rare genetic variation contributed to the levels of 7 proteins, with 5 cis and 14 trans associations. CNVs were associated with the levels of 11 proteins (7 cis and 5 trans), examples including a 3q12.1 deletion acting as a hub for multiple trans associations; and a CNV overlapping NAIP, a sensor component of the NAIP-NLRC4 inflammasome which is affecting pro-inflammatory cytokine interleukin 18 levels. In summary, this work presents a comprehensive resource of genetic variation affecting the plasma protein levels and provides the interpretation of identified effects.
... The Fenland Study consisted of 10,708 genotyped participants of European ancestry who were recruited from general practice surgeries in the Cambridgeshire region of the UK from 2005 to 2015 [44]. Genotyping was conducted using three different arrays (Affymetrix UK Biobank Axiom array [Affymetrix, Santa Clara, CA, USA], Illumina Infinium Core Exome 24v1 [Illumina, San Diego, CA, USA], and Affymetrix SNP5.0 [Affymetrix, Santa Clara, CA, USA]), and levels for each protein target were measured using the rank-based inverse normal-transformed aptamer abundance method [44]. GWAS analysis was then performed using the transformed protein levels, with the residuals used as input for the genetic association analyses [45]. ...
... GWAS analysis was then performed using the transformed protein levels, with the residuals used as input for the genetic association analyses [45]. The beta coefficients for each protein target, representing one standard deviation (SD) change in normalized plasma abundance of protein per effect allele of the SNPs, were estimated, adjusting for age, sex, sample collection site, and the first ten principal components [44]. Our study selected SNPs associated with LPH levels at the genome-wide significant threshold of p < 5 × 10⁻⁸ [46]. ...
Article
Full-text available
Previous research has found that milk is associated with a decreased risk of colorectal cancer (CRC). However, it is unclear whether the milk digestion by the enzyme lactase-phlorizin hydrolase (LPH) plays a role in CRC susceptibility. Our study aims to investigate the direct causal relationship of CRC risk with LPH levels by applying a two-sample Mendelian Randomization (MR) strategy. Genetic instruments for LPH were derived from the Fenland Study, and CRC-associated summary statistics for these instruments were extracted from the FinnGen Study, PLCO Atlas Project, and Pan-UK Biobank. Primary MR analyses focused on a cis-variant (rs4988235) for LPH levels, with results integrated via meta-analysis. MR analyses using all variants were also undertaken. This analytical approach was further extended to assess CRC subtypes (colon and rectal). Meta-analysis across the three datasets illustrated an inverse association between genetically predicted LPH levels and CRC risk (OR: 0.92 [95% CI, 0.89–0.95]). Subtype analyses revealed associations of elevated LPH levels with reduced risks for both colon (OR: 0.92 [95% CI, 0.89–0.96]) and rectal cancer (OR: 0.92 [95% CI, 0.87–0.98]). Consistency was observed across varied analytical methods and datasets. Further exploration is warranted to unveil the underlying mechanisms and validate LPH’s potential role in CRC prevention.
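A single-variant two-sample MR analysis of the kind summarized in this abstract is commonly computed as a Wald ratio per outcome dataset, with the per-dataset estimates then pooled. The sketch below assumes a fixed-effect inverse-variance meta-analysis and a first-order standard error for the ratio; the abstract does not state which meta-analysis model was used, and all numeric inputs are hypothetical rather than taken from the study.

```python
import math


def wald_ratio(beta_outcome, se_outcome, beta_exposure):
    """Wald ratio: SNP-outcome effect divided by SNP-exposure effect.

    The standard error uses the first-order approximation that ignores
    uncertainty in the SNP-exposure estimate (an assumption of this sketch).
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def fixed_effect_meta(estimates, ses):
    """Inverse-variance weighted fixed-effect pooling of log-odds estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def to_odds_ratio(log_or, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% interval."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    beta_exposure = 0.5  # hypothetical SD change in protein per effect allele
    # Hypothetical (beta, SE) log-odds of the outcome in three datasets.
    outcomes = [(-0.04, 0.010), (-0.05, 0.020), (-0.03, 0.015)]
    ratios = [wald_ratio(b, s, beta_exposure) for b, s in outcomes]
    pooled, pooled_se = fixed_effect_meta([r[0] for r in ratios],
                                          [r[1] for r in ratios])
    print(to_odds_ratio(pooled, pooled_se))
```

An odds ratio below 1 from `to_odds_ratio` would correspond to an inverse association per SD increase in genetically predicted protein level, which is how the odds ratios in the abstract are read.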
... Circulating adiponectin has been shown to exhibit substantial heritability ranging from 30% to 70% [32–37]. Genome and exome-wide association scans have provided evidence for more than 30 loci robustly associated with plasma adiponectin levels [35,36,38–51]. The top prioritized gene in Caucasians is the adiponectin-encoding gene ADIPOQ, located on chromosome 3q27.3 ...
... and spanning 15.8 kb. ADIPOQ is composed of three exons and harbors numerous single nucleotide polymorphisms (SNPs) that have been extensively studied in different populations [52,53], with several of these variants showing associations with alterations in adiponectin levels and metabolic syndrome-related phenotypes [33,35–49,51,54–64]. The intricate effects of adiponectin in the nervous system have raised the hypothesis that functional variants in ADIPOQ could be involved in susceptibility to LOAD. ...
... Additional SNPs in our study, located in the upstream region, promoter, or intron 1, showed a clear trend of adiponectin levels increasing (rs822387, rs17300539) or decreasing (rs266729, rs182052, rs822393) with each copy of the minor allele. Despite not quite reaching the significance threshold, these effects are consistent in magnitude and direction with those reported for the five variants (or their proxies) in several candidate genes and GWA studies on circulating adiponectin [33,36–38,40–42,44,46–51,54–57,59–62,64,71–75,91–102]. On the other hand, we failed to replicate previous, albeit often inconsistent, findings in the literature for several variants in haploblocks 3 and 4, particularly rs17366568 in intron 1 [35,36,45,47,51,57,59,60,76], rs2241766 in exon 2 [33,40,55,60,64,89,98,99,103], rs1501299, rs3821799, rs3774261 in intron 2 [33,35–37,40,51,54,55,57,59,60,64,71,75,76,89–91,99,100,103–105], and rs1063539 in the 3′-untranslated region (UTR) [36,55]. ...
Article
Full-text available
Adiponectin, a hormone secreted by adipose tissue, plays a complex role in regulating metabolic homeostasis and has also garnered attention for its potential involvement in the pathogenesis of late-onset Alzheimer’s disease (LOAD). The objective of this study was to investigate the association of ADIPOQ variants with plasma adiponectin levels and LOAD risk in subjects from the Slovak Caucasian population. For this purpose, 385 LOAD patients and 533 controls without cognitive impairment were recruited and genotyped for a total of eighteen ADIPOQ single nucleotide polymorphisms (SNPs). Both single-locus and haplotype-based logistic regression analyses were employed to assess the association of SNPs with LOAD risk, while linear regression analysis was used to explore their influence on adiponectin levels in LOAD patients. ADIPOQ variants rs822395 and rs2036373 in intron 1 were found to significantly elevate total adiponectin levels after accounting for several potential confounders. Additional SNPs in the 5′ region and intron 1 exhibited a non-significant trend of association with adiponectin. However, none of the ADIPOQ SNPs showed an association with LOAD risk, neither in the whole-group analysis nor in subgroup analyses after stratification for sex or the APOE ε4 allele, a well-established LOAD risk factor. In summary, while adiponectin has emerged as a potential contributor to the development of LOAD, this study did not unveil any significant involvement of its gene variants in susceptibility to the disease.
... The advent of high-throughput platforms that can measure hundreds to thousands of biomarkers simultaneously using small sample volumes has enabled hypothesis-free discovery analyses, but costs remain prohibitively high. An alternative cost-effective approach, that also limits bias by confounding and reverse causation, is to use robust genetic proxies of blood biomarkers to evaluate their aetiological relevance along the lines of Mendelian randomisation (MR) 14,15 . Using such MR-based approaches facilitates simultaneously querying thousands of markers in relation to the risk of multiple cancers using genome-wide association data, which can identify risk markers and assess their association with one or multiple cancers. ...
... decode.com/summarydata/, and from Pietzner et al. 15 at https://omicscience.org. We obtained summary genetic association data on breast cancer risk from the Breast Cancer Association Consortium (https://bcac.ccge.medschl.cam.ac.uk/), ovarian cancer risk from the Ovarian Cancer Association Consortium (https://ocac.ccge.medschl. ...
Article
Full-text available
Circulating proteins can reveal key pathways to cancer and identify therapeutic targets for cancer prevention. We investigate 2,074 circulating proteins and risk of nine common cancers (bladder, breast, endometrium, head and neck, lung, ovary, pancreas, kidney, and malignant non-melanoma) using cis protein Mendelian randomisation and colocalization. We conduct additional analyses to identify adverse side-effects of altering risk proteins and map cancer risk proteins to drug targets. Here we find 40 proteins associated with common cancers, such as PLAUR and risk of breast cancer [odds ratio per standard deviation increment: 2.27, 1.88-2.74], and with high-mortality cancers, such as CTRB1 and pancreatic cancer [0.79, 0.73-0.85]. We also identify potential adverse effects of protein-altering interventions to reduce cancer risk, such as hypertension. Additionally, we report 18 proteins associated with cancer risk that map to existing drugs and 15 that are not currently under clinical investigation. In sum, we identify protein-cancer links that improve our understanding of cancer aetiology. We also demonstrate that the wider consequence of any protein-altering intervention on well-being and morbidity is required to interpret any utility of proteins as potential future targets for therapeutic prevention.
... MR analyses were conducted using GWAS summary statistics and large-scale pQTL statistics. We obtained pQTL data from published studies by Pietzner et al. [18] and Ferkingstad et al. [19] and collected GWAS statistics from UK Biobank, the FinnGen study, and GWAS Catalog. The present study utilized the single-cell transcriptome sequencing dataset GSE178833 obtained from the GEO database for analysis of AAU in HLA-B27-positive patients. ...
... The plasma pQTL data used in our preliminary analysis were obtained from a previous study by Pietzner et al. [18]. Their study measured protein targets from 10,708 participants of European ancestry using the SomaScan v4 assay and identified a total of 3,323 cis-pQTLs and 7,314 trans-pQTLs. ...
Article
Full-text available
Background Patients with spondyloarthritis (SpA)/HLA-B27-associated acute anterior uveitis (AAU) experience recurring acute flares, which pose significant visual and financial challenges. Despite established links between SpA and HLA-B27-associated AAU, the exact mechanism involved remains unclear, and further understanding is needed for effective prevention and treatment. Methods To investigate the acute pathogenesis of SpA/HLA-B27-associated AAU, Mendelian randomization (MR) and single-cell transcriptomic analyses were employed. The MR incorporated publicly available protein quantitative trait locus data from previous studies, along with genome-wide association study data from public databases. Causal relationships between plasma proteins and anterior uveitis were assessed using two-sample MR. Additionally, colocalization analysis was performed using Bayesian colocalization. Single-cell transcriptome analysis utilized the anterior uveitis dataset from the Gene Expression Omnibus (GEO) database. Dimensionality reduction, clustering, transcription factor analysis, pseudotime analysis, and cell communication analysis were subsequently conducted to explore the underlying mechanisms involved. Results Mendelian randomization analysis revealed that circulating levels of AIF1 and VARS were significantly associated with a reduced risk of developing SpA/HLA-B27-associated AAU, with AIF1 showing a robust correlation with anterior uveitis onset. Colocalization analysis supported these findings. Single-cell transcriptome analysis showed predominant AIF1 expression in myeloid cells, which was notably lower in the HLA-B27-positive group. Pseudotime analysis revealed dendritic cell terminal positions in differentiation branches, accompanied by gradual decreases in AIF1 expression. Based on cell communication analysis, CD141⁺CLEC9A⁺ classic dendritic cells (cDCs) and the APP pathway play crucial roles in cellular communication in the SpA/HLA-B27 group. 
Conclusions AIF1 is essential for the pathogenesis of SpA/HLA-B27-associated AAU. Myeloid cell differentiation into DCs and decreased AIF1 levels are also pivotal in this process.
... This could complement and address limitations of earlier studies of selected candidates, like insulin-like growth factor 1 or adiponectin 3,5 , or small molecule profiling 10 . Further, we 11,12 and others [13][14][15] have recently demonstrated how methodological approaches that integrate human proteomic and genomic data can identify novel pathways to diseases. This uniquely enabled us to combine understanding from proteomic changes driven by a standardized extreme caloric restriction intervention with causal understanding of the links between those same proteins with hundreds of diseases through independent, large-scale genomic studies. ...
... We next systematically tested whether any of the identified cis-pQTLs also affects complex phenotypes and diseases by performing phenome-wide co-localization screens similar to our previous work 11,12. Briefly, we first queried cis-pQTLs in the OpenGWAS database (using the R package ieugwas (v0.1.5)) ...
Article
Full-text available
Surviving long periods without food has shaped human evolution. In ancient and modern societies, prolonged fasting was/is practiced by billions of people globally for religious purposes, used to treat diseases such as epilepsy, and recently gained popularity as weight loss intervention, but we still have a very limited understanding of the systemic adaptations in humans to extreme caloric restriction of different durations. Here we show that a 7-day water-only fast leads to an average weight loss of 5.7 kg (±0.8 kg) among 12 volunteers (5 women, 7 men). We demonstrate nine distinct proteomic response profiles, with systemic changes evident only after 3 days of complete calorie restriction based on in-depth characterization of the temporal trajectories of ~3,000 plasma proteins measured before, daily during, and after fasting. The multi-organ response to complete caloric restriction shows distinct effects of fasting duration and weight loss and is remarkably conserved across volunteers with >1,000 significantly responding proteins. The fasting signature is strongly enriched for extracellular matrix proteins from various body sites, demonstrating profound non-metabolic adaptations, including extreme changes in the brain-specific extracellular matrix protein tenascin-R. Using proteogenomic approaches, we estimate the health consequences for 212 proteins that change during fasting across ~500 outcomes and identified putative beneficial (SWAP70 and rheumatoid arthritis or HYOU1 and heart disease), as well as adverse effects. Our results advance our understanding of prolonged fasting in humans beyond merely energy-centric adaptations towards a systemic response that can inform targeted therapeutic modulation.
... To identify tissues exhibiting a significant genome-wide enrichment, we used LD score regression applied to specifically expressed gene (LDSC-SEG) 71 approach, with eQTL datasets from cross-tissue meta-analysed GTEx eQTL v.7 72 , eQTLGen 73 and Brain-eMeta 74 . The same set of analyses were also applied to a protein quantitative trait locus (pQTL) dataset 75 . Finally, by integrating GWAS summary statistics with data from gene expression, biological pathway, and predicted protein-protein interaction, candidate genes were identified using the gene-level polygenic priority score (PoPS) method 76 . ...
Article
Full-text available
Mosaic loss of the X chromosome (mLOX) is the most common clonal somatic alteration in leukocytes of female individuals 1,2, but little is known about its genetic determinants or phenotypic consequences. Here, to address this, we used data from 883,574 female participants across 8 biobanks; 12% of participants exhibited detectable mLOX in approximately 2% of leukocytes. Female participants with mLOX had an increased risk of myeloid and lymphoid leukaemias. Genetic analyses identified 56 common variants associated with mLOX, implicating genes with roles in chromosomal missegregation, cancer predisposition and autoimmune diseases. Exome-sequence analyses identified rare missense variants in FBXO10 that confer a twofold increased risk of mLOX. Only a small fraction of associations was shared with mosaic Y chromosome loss, suggesting that distinct biological processes drive formation and clonal expansion of sex chromosome missegregation. Allelic shift analyses identified X chromosome alleles that are preferentially retained in mLOX, demonstrating variation at many loci under cellular selection. A polygenic score including 44 allelic shift loci correctly inferred the retained X chromosomes in 80.7% of mLOX cases in the top decile. Our results support a model in which germline variants predispose female individuals to acquiring mLOX, with the allelic content of the X chromosome possibly shaping the magnitude of clonal expansion.
... Nevertheless, they are appropriate for the examination of the transcriptome, epigenome, proteome, and metabolome in relevant, accessible tissues, and can potentially lead to candidate DNA variants and genes for further exploration in these experimental contexts and larger-scale studies. Thus, experimental studies on overfeeding or negative energy balance could contribute significantly to identifying the proteo-genomic convergence in obesity research [29]. ...
... Protein quantitative trait loci (pQTLs) are central in the drug discovery process as they provide supporting evidence for drug target identification and hypothesis generation on their modes of action 2. Most large-scale GWAS so far have been conducted using affinity proteomics platforms [3–12]. The largest studies to date are from deCODE using the SOMAscan platform with 4,907 aptamers in 35,559 ...
Preprint
Full-text available
Genome-wide association studies (GWAS) with proteomics are essential tools for drug discovery. To date, most studies have used affinity proteomics platforms, which have limited discovery to protein panels covered by the available affinity binders. Furthermore, it is not clear to which extent protein epitope changing variants interfere with the detection of protein quantitative trait loci (pQTLs). Mass spectrometry-based (MS) proteomics can overcome some of these limitations. Here we report a GWAS using the MS-based Seer Proteograph™ platform with blood samples from a discovery cohort of 1,260 American participants and a replication in 325 individuals from Asia, with diverse ethnic backgrounds. We analysed 1,980 proteins quantified in at least 80% of the samples, out of 5,753 proteins quantified across the discovery cohort. We identified 252 and replicated 90 pQTLs, where 30 of the replicated pQTLs have not been reported before. We further investigated 200 of the strongest associated cis-pQTLs previously identified using the SOMAscan and the Olink platforms and found that up to one third of the affinity proteomics pQTLs may be affected by epitope effects, while another third were confirmed by MS proteomics to be consistent with the hypothesis that genetic variants induce changes in protein expression. The present study demonstrates the complementarity of the different proteomics approaches and reports pQTLs not accessible to affinity proteomics, suggesting that many more pQTLs remain to be discovered using MS-based platforms. Graphical Abstract: Summarizing the approach taken to identify potential epitope effects.
... Future biomarker GWAS in non-European cohorts are imperative to facilitate trans-ethnic MR analyses, expected to yield more broadly applicable findings. Third, our study data did not include the latest protein GWASs [65][66][67] and MI GWAS [68]. Therefore, a further and more comprehensive exploration of potential drug targets for MI is still warranted. ...
... We hypothesise that the C allele leads to a higher binding affinity of CTCF, and the binding will repress the expression of GNLY. The A allele has been reported to have an increasing effect on the protein levels 37 and toxoplasma antibody IgG levels 38 (p-value = 9.1 × 10⁻⁴, not genome-wide significant given limited sample size = 557). This may suggest that individuals with an A allele tend to have higher levels of IgG antibodies against the toxoplasma parasite. ...
Preprint
Full-text available
Gene expression levels can vary substantially across cells, even in a seemingly homogeneous cell population. Identifying the relationships between genetic variation and gene expression is critical for understanding the mechanisms of genome regulation. However, the genetic control of gene expression variability among the cells within individuals has yet to be extensively examined. This is primarily due to statistical challenges, such as the need for sufficiently powered cohorts and for adjusting for mean-variance dependence. Here, we introduce MEOTIVE (Mapping genetic Effects On inTra-Individual Variability of gene Expression), a novel statistical framework to identify genetic effects on gene expression variability (sc-veQTL) accounting for the mean-variance dependence. Using single-cell RNA-seq data of 1.2 million peripheral blood mononuclear cells from 980 human donors, we identified 14 to 3,488 genes with significant sc-veQTLs (study-wide q-value < 0.05) across different blood cell types, 2,103 of which were shared across more than one cell type. We further detected 55 SNP-gene pairs (in 34 unique genes) by directly linking genetic variations with gene expression dispersion (sc-deQTL) regardless of mean-variance dependence, and these genes were enriched in biological processes relevant to immune response and viral infection. An example is rs1131017 (p < 9.08 × 10⁻⁵²), a sc-veQTL in the 5′ UTR of RPS26, which shows a ubiquitous dispersion effect across cell types, with higher dispersion levels associated with lower auto-immune disease risk, including rheumatoid arthritis and type 1 diabetes. Another example is LYZ, which is associated with antibacterial activity against bacterial species and was only detected with a monocyte-specific deQTL (rs1384) located in the 3′ UTR (p = 1.48 × 10⁻¹¹) and replicated in an independent cohort.
Our results demonstrate an efficient and robust statistical method to identify genetic effects on gene expression variability and how these associations and their involved pathways confer auto-immune disease risk. This analytical framework provides a new approach to unravelling the genetic regulation of gene expression at the single-cell resolution, advancing our understanding of complex biological processes.
... Several genome-wide significant results were notable, including our finding that PTCH1 gene was associated with variation in depression symptoms, as this gene has previously been reported to be associated with depression-related phenotypes, including neuroticism 21,22 , anxiety 23,24 , depression symptoms 21 , feeling emotionally hurt 25 and sensitivity to environmental stress and adversity 25 . C15orf38 gene (also known as ARPIN-AP3S2) was associated with variance in anxiety symptoms in our child samples and has previously been associated with type 2 diabetes in adults 22,26 and corticotropin-releasing factor protein levels 27 , which are involved in regulating anxiety, mood, eating, and inflammation 28 . Hypoglycaemia symptoms in Type 2 diabetes include rapid heartbeat, sweating, and nervousness, all of which are physical sensations associated with anxiety. ...
Preprint
Full-text available
Individual sensitivity to environmental exposures may be genetically influenced. This genotype-by-environment interplay implies differences in phenotypic variance across genotypes. However, environmental sensitivity genetic variants have proven challenging to detect. GWAS of monozygotic twin differences is a family-based variance analysis method, which is more robust to systemic biases that impact population-based methods. We combined data from up to 21,792 monozygotic twins (10,896 pairs) from 11 studies to conduct the largest GWAS meta-analysis of monozygotic phenotypic differences in children and adolescents/adults for seven psychiatric and neurodevelopmental phenotypes: attention deficit hyperactivity disorder (ADHD) symptoms, autistic traits, anxiety and depression symptoms, psychotic-like experiences, neuroticism, and wellbeing. SNP-heritability estimates for the variance in these phenotypes ranged from 0% to 18% (h²) but were imprecise. We identified a total of 13 genome-wide significant associations (SNP, gene, and gene-set), including genes related to stress-reactivity for depression, growth factor-related genes for autistic traits and catecholamine uptake-related genes for psychotic-like experiences. Monozygotic twins are an important new source of evidence about the genetics of environmental sensitivity.
... Pleiotropic variants may account for this finding. A missense variant in SH2B3 is both (a) a 'master regulator' influencing the concentration of over 50 plasma proteins (Ferkingstad et al., 2021; Pietzner et al., 2021; Sun et al., 2018), and (b) associated with a range of hematological measurements and disorders (Morris et al., 2021). The active form of vitamin D (1,25OHD) is a potent driver of cellular differentiation (in keeping with other steroid hormones) and in the presence of vitamin D deficiency, the hematological cell lines may be less differentiated, which in turn may explain the decrease in mature cell counts (Medrano et al., 2018). ...
Article
Full-text available
While it is known that vitamin D deficiency is associated with adverse bone outcomes, it remains unclear whether low vitamin D status may increase the risk of a wider range of health outcomes. We had the opportunity to explore the association between common genetic variants associated with both 25-hydroxyvitamin D (25OHD) and the vitamin D binding protein (DBP, encoded by the GC gene) with a comprehensive range of health disorders and laboratory tests in a large academic medical center. We used summary statistics for 25OHD and DBP to generate polygenic scores (PGS) for 66,482 participants with primarily European ancestry and 13,285 participants with primarily African ancestry from the Vanderbilt University Medical Center Biobank (BioVU). We examined the predictive properties of PGS 25OHD , and two scores related to DBP concentration with respect to 1322 health-related phenotypes and 315 laboratory-measured phenotypes from electronic health records. In those with European ancestry: (a) the PGS 25OHD and PGS DBP scores, and individual SNPs rs4588 and rs7041 were associated with both 25OHD concentration and 1,25-dihydroxyvitamin D concentrations; (b) higher PGS 25OHD was associated with decreased concentrations of triglycerides and cholesterol, and reduced risks of vitamin D deficiency, disorders of lipid metabolism, and diabetes. In general, the findings for the African ancestry group were consistent with findings from the European ancestry analyses. Our study confirms the utility of PGS and two key variants within the GC gene (rs4588 and rs7041) to predict the risk of vitamin D deficiency in clinical settings and highlights the shared biology between vitamin D-related genetic pathways and a range of health outcomes.
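The abstract above builds polygenic scores from GWAS summary statistics. In its simplest textbook form, a PGS is a weighted sum of effect-allele dosages; the sketch below shows only that basic form. The variant IDs, weights, and dosages are placeholders invented for illustration, and the study itself may have used a more elaborate weighting or shrinkage method that the abstract does not specify.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    dosages: variant ID -> effect-allele dosage for one person (0 to 2).
    weights: variant ID -> per-allele effect size from summary statistics.
    Variants missing from `dosages` are skipped (a simplifying assumption).
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Placeholder variants and weights; these are NOT values from the study.
example_weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": 1.0}
example_person = {"variant_a": 1.0, "variant_b": 2.0}  # variant_c not genotyped

print(polygenic_score(example_person, example_weights))
```

In practice, scores are usually standardized within an ancestry group before testing them against phenotypes, which is omitted here for brevity.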
... Genome-wide association studies (GWAS) of protein levels have identified genetic variants associated with proteins, referred to as "protein quantitative trait loci (pQTLs)" [9]. pQTLs provide valuable insights into the molecular basis of complex traits and diseases by mediating the relationship between genotype and phenotype [10]. ...
Preprint
Full-text available
Background Epidemiological evidence links inflammation to the etiology and pathophysiology of asthma. To assess the causal relationship between circulating inflammation-related proteins and asthma, we performed a two-sample Mendelian randomization (MR) analysis. Methods Protein quantitative trait loci (pQTLs) were derived from twelve genome-wide association study (GWAS) cohorts on the circulating inflammation-related proteome. Genetic associations with asthma were obtained from a large-scale GWAS, categorized into childhood-onset asthma (COA) and adult-onset asthma (AOA). Bidirectional MR analysis, Bayesian co-localization, and phenotype scanning were employed to confirm the robustness of MR results. Furthermore, pathway enrichment analysis, protein-protein interaction (PPI) network analysis, and molecular docking were conducted to evaluate the druggability of identified proteins and prioritize potential therapeutic targets. These results were further validated in eQTLGen, GTEx Consortium, and two dependent cohorts. Results Collectively, elevated MMP-1 and decreased levels of three proteins (ADA, CD40L, CST5) were associated with an increased risk of both COA and AOA. CXCL6 had an adverse effect specifically on COA. These associations were validated in sensitivity analyses. Apart from CST5, the other proteins interacted with therapeutic targets of asthma medications. Furthermore, therapeutic targeting of three proteins (ADA, CD40L, MMP1) is currently under evaluation, while CST5 and CXCL6 are considered druggable. Molecular docking showed excellent binding between drugs and proteins (ADA and MMP-1) with available structural data. Conclusions This study identified five circulating inflammatory-related protein biomarkers associated with asthma and provided novel insights into its etiology. Drugs targeting these proteins are expected to facilitate future prioritization of drug targets for asthma.
... PYCR3, on the other hand, lacks 40 amino acids at the C-terminus and is mainly situated in the cytoplasm. Additionally, PYCR3 prefers utilizing NADPH as a cofactor to catalyze the production of proline from ornithine [7]. Recent studies indicate that PYCRs are upregulated in various tumors and are associated with the development of certain cancers. ...
Preprint
Full-text available
Background Pyrroline-5-carboxylate reductase (PYCR) is pivotal in converting pyrroline-5-carboxylate (P5C) to proline, the final step in proline synthesis. Three isoforms, PYCR1, PYCR2, and PYCR3, existed and played significant regulatory roles in tumor initiation and progression. Methods In this study, we firstly assessed molecular and immune characteristics of PYCRs by a pan-cancer analysis, especially focusing on their prognostic relevance. Then, a kidney renal clear cell carcinoma (KIRC)-specific prognostic model was established, incorporating pathomics features to enhance predictive capabilities. The biological functions and regulatory mechanisms of PYCR1 and PYCR2 were investigated by in vitro experiments in renal cancer cells. Results The PYCRs’ expressions were elevated in diverse tumors, correlating with unfavorable clinical outcomes. PYCRs were enriched in cancer signaling pathways, significantly correlating with immune cell infiltration, tumor mutation burden (TMB), and microsatellite instability (MSI). In KIRC, a prognostic model based on PYCR1 and PYCR2 was independently validated statistically. Leveraging features from H&E-stained images, a pathomics feature model reliably predicted patient prognosis. In vitro experiments demonstrated that PYCR1 and PYCR2 enhanced the proliferation and migration of renal carcinoma cells by activating the mTOR pathway, at least in part. Conclusion This study underscores PYCRs' pivotal role in various tumors, positioning them as potential prognostic biomarkers and therapeutic targets, particularly in malignancies like KIRC. The findings emphasize the need for broader exploration of PYCRs' implications in pan-cancer contexts.
... Compared to gene expressions, proteins more directly affect various biological processes, often dysregulated in disease, and are important drug targets. While several recent protein quantitative trait loci (pQTL) studies that identified genetic variants associated with inter-individual protein variability have uncovered intermediate molecular pathways for disease outcomes, they have been restricted to circulating plasma proteins 4,5 . ...
Article
Full-text available
Comprehensive expression quantitative trait loci studies have been instrumental for understanding tissue-specific gene regulation and pinpointing functional genes for disease-associated loci in a tissue-specific manner. Compared to gene expressions, proteins more directly affect various biological processes, often dysregulated in disease, and are important drug targets. We previously performed and identified tissue-specific protein quantitative trait loci in brain, cerebrospinal fluid, and plasma. We now enhance this work by analyzing more proteins (1,300 versus 1,079) and an almost twofold increase in high quality imputed genetic variants (8.4 million versus 4.4 million) by using TOPMed reference panel. We identified 38 genomic regions associated with 43 proteins in brain, 150 regions associated with 247 proteins in cerebrospinal fluid, and 95 regions associated with 145 proteins in plasma. Compared to our previous study, this study newly identified 12 loci in brain, 30 loci in cerebrospinal fluid, and 22 loci in plasma. Our improved genomic atlas uncovers the genetic control of protein regulation across multiple tissues. These resources are accessible through the Online Neurodegenerative Trait Integrative Multi-Omics Explorer for use by the scientific community.
... Such variants have been captured by Karczewski and colleagues (2022) [61], who identified germline rare variants in cell death genes that associate with human traits via whole exome sequencing of large populations. Current data in large-scale biobanks preclude comparative analysis of gene expression/protein abundance associations, however, there is significant sharing of regulatory information between gene and protein-level expression QTLs [62,63], suggesting that our results may extrapolate to protein-level phenomena. Databases for interrogating pathogenic mutations of cell death genes and protein structures/post-translational modifications are described more deeply elsewhere [64]. ...
Article
Full-text available
Cell death mediated by genetically defined signaling pathways influences the health and dynamics of all tissues, however the tissue specificity of cell death pathways and the relationships between these pathways and human disease are not well understood. We analyzed the expression profiles of an array of 44 cell death genes involved in apoptosis, necroptosis, and pyroptosis cell death pathways across 49 human tissues from GTEx, to elucidate the landscape of cell death gene expression across human tissues, and the relationship between tissue-specific genetically determined expression and the human phenome. We uncovered unique cell death gene expression profiles across tissue types, suggesting there are physiologically distinct cell death programs in different tissues. Using summary statistics-based transcriptome wide association studies (TWAS) on human traits in the UK Biobank ( n ~ 500,000), we evaluated 513 traits encompassing ICD-10 defined diagnoses and laboratory-derived traits. Our analysis revealed hundreds of significant (FDR < 0.05) associations between genetically regulated cell death gene expression and an array of human phenotypes encompassing both clinical diagnoses and hematologic parameters, which were independently validated in another large-scale DNA biobank (BioVU) at Vanderbilt University Medical Center ( n = 94,474) with matching phenotypes. Cell death genes were highly enriched for significant associations with blood traits versus non-cell-death genes, with apoptosis-associated genes enriched for leukocyte and platelet traits. Our findings are also concordant with independently published studies (e.g. associations between BCL2L11 /BIM expression and platelet & lymphocyte counts). Overall, these results suggest that cell death genes play distinct roles in their contribution to human phenotypes, and that cell death genes influence a diverse array of human traits.
... Conversely, if a pQTL is situated far from the cognate gene (or on a different chromosome), it is termed a "trans-pQTL," with the assumption that it operates through an intermediate gene (9). Various studies have employed distinct distance cutoffs to distinguish cis-pQTLs from intrachromosomal trans-pQTLs, with common thresholds being 500 kb or 1,000 kb (16,17). In the proteome-wide MR study focused on drug targets, we opted for pQTLs as instrumental variables, applying specific selection criteria. ...
Article
Full-text available
Background Ankylosing Spondylitis (AS) is a chronic inflammatory disorder which can lead to considerable pain and disability. Mendelian randomization (MR) has been extensively applied for repurposing licensed drugs and uncovering new therapeutic targets. Our objective is to pinpoint innovative therapeutic protein targets for AS and assess the potential adverse effects of druggable proteins. Methods We conducted a comprehensive proteome-wide MR study to assess the causal relationships between plasma proteins and the risk of AS. The plasma proteins were sourced from the UK Biobank Pharma Proteomics Project (UKB-PPP) database, encompassing GWAS data for 2,940 plasma proteins. Additionally, GWAS data for AS were extracted from the R9 version of the Finnish database, including 2,860 patients and 270,964 controls. The colocalization analysis was executed to identify shared causal variants between plasma proteins and AS. Finally, we examined the potential adverse effects of druggable proteins for AS therapy by conducting a phenome-wide association study (PheWAS) utilizing the extensive Finnish database in version R9, encompassing 2,272 phenotypes categorized into 46 groups. Results The findings revealed a positive genetic association between the predicted plasma levels of six proteins and an elevated risk of AS, while two proteins exhibited an inverse association with AS risk (P fdr < 0.05). Among these eight plasma proteins, colocalization analysis identified AIF1, TNF, FKBPL, AGER, ALDH5A1, and ACOT13 as shared variation with AS(PPH3+PPH4>0.8), suggesting that they represent potential direct targets for AS intervention. Further phenotype-wide association studies have shown some potential side effects of these six targets (P fdr < 0.05). Conclusion Our investigation examined the causal connections between six plasma proteins and AS, providing a comprehensive understanding of potential therapeutic targets.
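The citing excerpt for this entry distinguishes cis- from trans-pQTLs by distance to the cognate gene, with 500 kb or 1,000 kb as common cutoffs. A minimal sketch of that rule is given below. The function name and coordinates are hypothetical, and measuring distance to the nearest gene boundary is an assumption; some studies measure from the transcription start site instead.

```python
def classify_pqtl(
    variant_chrom: str,
    variant_pos: int,
    gene_chrom: str,
    gene_start: int,
    gene_end: int,
    window_bp: int = 1_000_000,
) -> str:
    """Label a pQTL 'cis' or 'trans' relative to the protein-coding gene.

    A variant on another chromosome is always 'trans'. On the same
    chromosome, it is 'cis' when it lies within `window_bp` of the gene
    body (distance to the nearest gene boundary; 0 if inside the gene).
    """
    if variant_chrom != gene_chrom:
        return "trans"
    if variant_pos < gene_start:
        distance = gene_start - variant_pos
    elif variant_pos > gene_end:
        distance = variant_pos - gene_end
    else:
        distance = 0
    return "cis" if distance <= window_bp else "trans"


# Hypothetical coordinates: a gene at chr1:2,000,000-2,050,000.
print(classify_pqtl("1", 1_400_000, "1", 2_000_000, 2_050_000, window_bp=500_000))
print(classify_pqtl("1", 1_400_000, "1", 2_000_000, 2_050_000, window_bp=1_000_000))
```

The two calls show how the same variant, 600 kb upstream of the gene, changes label depending on which of the two common cutoffs is chosen.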
... 3.2 × 10⁻¹¹ 2.0 × 10⁻⁸ CADM2: Anxiety factor of neuroticism, general risk tolerance, adventurousness, cannabis use, smoking initiation, drinks per week, problematic alcohol use, and BMI (Clifton et al., 2018; Hill et al., 2020; Karlsson Linnér et al., 2019; Pasman et al., 2018; Xu et al., 2020; Zhou et al., 2020). ZNF180: Alzheimer's disease (Gouveia et al., 2022). REG1CP: Regenerating islet-derived protein 3-alpha levels (Pietzner et al., 2021). CTNNA2: see above. ...
... [12][13][14] These methods have also been applied among larger population cohorts, pairing plasma proteomics with genome-wide genotyping among studies including tens of thousands of participants. [15][16][17] These larger studies have provided important insights into the genetic regulation of the plasma proteome, providing a foundation for evaluating the causal effects of plasma proteins on the development and progression of various conditions. However, studies assessing the putative causal relationship between the proteome and the risk of adverse outcomes in human HF are not available. ...
Article
Background Identifying novel molecular drivers of disease progression in heart failure (HF) is a high‐priority goal that may provide new therapeutic targets to improve patient outcomes. The authors investigated the relationship between plasma proteins and adverse outcomes in HF and their putative causal role using Mendelian randomization. Methods and Results The authors measured 4776 plasma proteins among 1964 participants with HF with a reduced left ventricular ejection fraction enrolled in PHFS (Penn Heart Failure Study). Assessed were the observational relationship between plasma proteins and (1) all‐cause death or (2) death or HF‐related hospital admission (DHFA). The authors replicated nominally significant associations in the Washington University HF registry (N=1080). Proteins significantly associated with outcomes were the subject of 2‐sample Mendelian randomization and colocalization analyses. After correction for multiple testing, 243 and 126 proteins were found to be significantly associated with death and DHFA, respectively. These included small ubiquitin–like modifier 2 (standardized hazard ratio [sHR], 1.56; P <0.0001), growth differentiation factor‐15 (sHR, 1.68; P <0.0001) for death, A disintegrin and metalloproteinase with thrombospondin motifs–like protein (sHR, 1.40; P <0.0001), and pulmonary‐associated surfactant protein C (sHR, 1.24; P <0.0001) for DHFA. In pathway analyses, top canonical pathways associated with death and DHFA included fibrotic, inflammatory, and coagulation pathways. Genomic analyses provided evidence of nominally significant associations between levels of 6 genetically predicted proteins with DHFA and 11 genetically predicted proteins with death. Conclusions This study implicates multiple novel proteins in HF and provides preliminary evidence of associations between genetically predicted plasma levels of 17 candidate proteins and the risk for adverse outcomes in human HF.
... We tested a total of 5,699,237 high-quality imputed single nucleotide polymorphisms (SNPs) for associations with the cytokines induced in each stimulation, adjusting for age, sex, technical variables and major immune cell population counts (Supplementary Table 3) and report 44 response pQTLs (Table 1). The Somalogic and Olink databases are the main resources of plasma pQTLs, which have identified pQTLs for some of our tested cytokines at steady state [20][21][22][23][24]. However, to our knowledge, only one study, the 500FG study, tested for response pQTLs for some of our tested cytokines in whole blood 25 . ...
Article
Full-text available
Individuals differ widely in their immune responses, with age, sex and genetic factors having major roles in this inherent variability 1–6. However, the variables that drive such differences in cytokine secretion, a crucial component of the host response to immune challenges, remain poorly defined. Here we investigated 136 variables and identified smoking, cytomegalovirus latent infection and body mass index as major contributors to variability in cytokine response, with effects of comparable magnitudes with age, sex and genetics. We find that smoking influences both innate and adaptive immune responses. Notably, its effect on innate responses is quickly lost after smoking cessation and is specifically associated with plasma levels of CEACAM6, whereas its effect on adaptive responses persists long after individuals quit smoking and is associated with epigenetic memory. This is supported by the association of the past smoking effect on cytokine responses with DNA methylation at specific signal trans-activators and regulators of metabolism. Our findings identify three novel variables associated with cytokine secretion variability and reveal roles for smoking in the short- and long-term regulation of immune responses. These results have potential clinical implications for the risk of developing infections, cancers or autoimmune diseases.
... The eBFC-based gene network, which covers most genes in the human genome, has been derived through topological semantics extraction from A GPR and A GTR . It is essential to note that gene expression may be regulated through various mechanisms, resulting in one gene being associated with multiple diseases due to distinct regulatory pathways [65][66][67]. In other words, several common pathogenic genes can be identified across different diseases, with differential regulation of these genes being particularly prevalent among brain diseases [18,20]. ...
Article
Full-text available
Background Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks is necessary yet challenging to identify pathogenic genes for brain diseases. Results In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback–Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts whether a gene is associated with brain diseases while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our analysis supported by bioinformatics revealed CAMP as a gene significantly associated with Alzheimer's disease identified by M-GBBD. Conclusion Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
Article
Full-text available
Pubertal timing varies considerably and is associated with later health outcomes. We performed multi-ancestry genetic analyses on ~800,000 women, identifying 1,080 signals for age at menarche. Collectively, these explained 11% of trait variance in an independent sample. Women at the top and bottom 1% of polygenic risk exhibited ~11 and ~14-fold higher risks of delayed and precocious puberty, respectively. We identified several genes harboring rare loss-of-function variants in ~200,000 women, including variants in ZNF483, which abolished the impact of polygenic risk. Variant-to-gene mapping approaches and mouse gonadotropin-releasing hormone neuron RNA sequencing implicated 665 genes, including an uncharacterized G-protein-coupled receptor, GPR83, which amplified the signaling of MC3R, a key nutritional sensor. Shared signals with menopause timing at genes involved in DNA damage response suggest that the ovarian reserve might signal centrally to trigger puberty. We also highlight body size-dependent and independent mechanisms that potentially link reproductive timing to later life disease.
Preprint
Full-text available
Despite widespread use of drugs targeting traditional cardiovascular risk factors such as lipids and blood pressure, a high burden of coronary heart disease (CHD) remains, hence novel therapeutics are needed for people who harbor residual risk. Using transcriptomic and proteomic data to instrument 15,527 genes or proteins, we conducted systematic cis-Mendelian randomization (MR) and conditional colocalization analyses with a genetic meta-analysis involving nearly 300,000 CHD cases. We identified 567 targets with putative causal relevance to CHD, of which 69 were not identified in previous genetic discovery or MR studies and were the sole causal signal in that genomic region. To aid translation of our findings, we annotated results with up-to-date information on drugs acting on these targets. Our results revealed opportunities for drug repurposing and development prioritization. For example, we provide evidence that cilostazol, a drug that targets PDE3A and is currently used for claudication, could be repurposed for prevention of CHD.
Article
Importance Blood pressure response during acute exercise (exercise blood pressure [EBP]) is associated with the future risk of hypertension and cardiovascular disease (CVD). Biochemical characterization of EBP could inform disease biology and identify novel biomarkers of future hypertension. Objective To identify protein markers associated with EBP and test their association with incident hypertension. Design, Setting, and Participants This study assayed 4977 plasma proteins in 681 healthy participants (from 763 assessed) of the Health, Risk Factors, Exercise Training and Genetics (HERITAGE; data collection from January 1993 to December 1997 and plasma proteomics from January 2019 to January 2020) Family Study at rest who underwent 2 cardiopulmonary exercise tests. Individuals were free of CVD at the time of recruitment. Individuals with resting SBP ≥160 mm Hg or DBP ≥100 mm Hg or taking antihypertensive drug therapy were excluded from the study. The association between resting plasma protein levels to both resting BP and EBP was evaluated. Proteins associated with EBP were analyzed for their association with incident hypertension in the Framingham Heart Study (FHS; n = 1177) and validated in the Jackson Heart Study (JHS; n = 772) and Multi-Ethnic Study of Atherosclerosis (MESA; n = 1367). Proteins associated with incident hypertension were tested for putative causal links in approximately 700 000 individuals using cis-protein quantitative loci mendelian randomization (cis-MR). Data were analyzed from January 2023 to January 2024. Exposures Plasma proteins. Main Outcomes and Measures EBP was defined as the BP response during a fixed workload (50 W) on a cycle ergometer. Hypertension was defined as BP ≥140/90 mm Hg or taking antihypertensive medication. Results Among the 681 participants in the HERITAGE Family Study, the mean (SD) age was 34 (13) years; 366 participants (54%) were female; 238 (35%) were self-reported Black and 443 (65%) were self-reported White. 
Proteomic profiling of EBP revealed 34 proteins that would not have otherwise been identified through profiling of resting BP alone. Transforming growth factor β receptor 3 (TGFBR3) and prostaglandin D2 synthase (PTGDS) had the strongest association with exercise systolic BP (SBP) and diastolic BP (DBP), respectively (TGFBR3: exercise SBP, β estimate, −3.39; 95% CI, −4.79 to −2.00; P = 2.33 × 10⁻⁶; PTGDS: exercise DBP β estimate, −2.50; 95% CI, −3.29 to −1.70; P = 1.18 × 10⁻⁹). In fully adjusted models, TGFBR3 was inversely associated with incident hypertension in FHS, JHS, and MESA (hazard ratio [HR]: FHS, 0.86; 95% CI, 0.75-0.97; P = .01; JHS, 0.87; 95% CI, 0.77-0.97; P = .02; MESA, 0.84; 95% CI, 0.71-0.98; P = .03; pooled cohort, 0.86; 95% CI, 0.79-0.92; P = 6 × 10⁻⁵). Using cis-MR, genetically predicted levels of TGFBR3 were associated with SBP, hypertension, and CVD events (SBP: β, −0.38; 95% CI, −0.64 to −0.11; P = .006; hypertension: odds ratio [OR], 0.99; 95% CI, 0.98-0.99; P < .001; heart failure with hypertension: OR, 0.86; 95% CI, 0.77-0.97; P = .01; CVD: OR, 0.84; 95% CI, 0.77-0.92; P = 8 × 10⁻⁵; cerebrovascular events: OR, 0.77; 95% CI, 0.70-0.85; P = 5 × 10⁻⁷). Conclusions and Relevance Plasma proteomic profiling of EBP identified a novel protein, TGFBR3, which may protect against elevated BP and long-term CVD outcomes.
Preprint
Full-text available
Neurodegenerative pathologies such as Alzheimer’s disease, Parkinson’s disease, Huntington’s disease, Amyotrophic lateral sclerosis, Multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems. While there are no cures yet, researchers worldwide are actively working on the development of novel treatments that have the potential to slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. Huge volumes of new scientific information necessitate new analytical approaches for meaningful hypothesis generation. To enable the automatic analysis of biomedical data we introduced AGATHA, an effective AI-based literature mining tool that can navigate massive scientific literature databases, such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into multidimensional space where each gene and health condition are represented by specific metrics. We implemented advanced statistical analysis to reveal distinct clusters of scientific terms within the virtual space created using AGATHA-calculated parameters for selected health conditions and genes. Partial Least Squares Discriminant Analysis was employed for categorizing and predicting samples (122 diseases and 20889 genes) fitted to specific classes. Advanced statistics were employed to build a discrimination model and extract lists of genes specific to each disease class. Here we focus on drugs that can be repurposed for dementia treatment as an outcome of neurodegenerative diseases. Therefore, we determined dementia-associated genes statistically highly ranked in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. 
These sets of genes were classified based on their presence in biological pathways, aiding in selecting candidates and biological processes that are exploitable with drug repurposing. Author Summary This manuscript outlines our project involving the application of AGATHA, an AI-based literature mining tool, to discover drugs with the potential for repurposing in the context of neurocognitive disorders. The primary objective is to identify connections between approved medications and specific health conditions through advanced statistical analysis, including techniques like Partial Least Squares Discriminant Analysis (PLSDA) and unsupervised clustering. The methodology involves grouping scientific terms related to different health conditions and genes, followed by building discrimination models to extract lists of disease-specific genes. These genes are then analyzed through pathway analysis to select candidates for drug repurposing.
Article
Background Anticoagulants are routinely used by millions of patients worldwide to prevent blood clots. Yet, problems with anticoagulant therapy remain, including a persistent and cumulative bleeding risk in patients undergoing prolonged anticoagulation. New safer anticoagulant targets are needed. Methods To prioritize anticoagulant targets with the strongest efficacy (venous thromboembolism [VTE] prevention) and safety (low bleeding risk) profiles, we performed two-sample Mendelian randomization (MR) and genetic colocalization. We leveraged three large-scale plasma protein datasets (deCODE as discovery dataset and Fenland and ARIC as replication datasets) and one liver gene expression dataset (IUCPQ bariatric biobank) to evaluate evidence for a causal effect of 26 coagulation cascade proteins on VTE from a new genome-wide association meta-analysis of 44,232 VTE cases and 847,152 controls, stroke subtypes, bleeding outcomes and parental lifespan as an overall measure of efficacy/safety ratio. Results A 1-SD genetically predicted reduction in F2 blood levels was associated with lower risk of VTE (OR = 0.44, 95% CI = 0.38-0.51, p = 2.6E-28) and cardioembolic stroke risk (OR = 0.55, 95% CI = 0.39-0.76, p = 4.2e-04) but not with bleeding (OR = 1.13, 95% CI = 0.93-1.36, p = 2.2e-01). Genetically predicted F11 reduction was associated with lower risk of VTE (OR = 0.61, 95% CI = 0.58-0.64, p = 4.1e-85) and cardioembolic stroke (OR = 0.77, 95% CI = 0.69-0.86, p = 4.1e-06), but not with bleeding (OR = 1.01, 95% CI = 0.95-1.08, p = 7.5e-01) (Figure 3). These MR associations were concordant across the three blood protein datasets and the hepatic gene expression dataset as well as colocalization analyses. Conclusion These results provide strong genetic evidence that F2 and F11 may represent safe and efficacious therapeutic targets to prevent VTE and cardioembolic strokes without substantially increasing bleeding risk.
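The two-sample MR estimates quoted above (an odds ratio per 1-SD change in a genetically predicted protein level) can be illustrated with a minimal sketch of the fixed-effect inverse-variance-weighted (IVW) estimator, one commonly used MR estimator. The abstract does not state which estimator or software the authors used, so this is an assumption; the function names and the per-variant numbers are hypothetical and not taken from the study.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure (e.g. protein level, SD units)
    beta_outcome:  variant effects on the outcome (log odds)
    se_outcome:    standard errors of the outcome effects
    """
    # Weighted regression through the origin with weights 1 / se_outcome^2.
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

def to_odds_ratio(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical summary statistics for three instruments (illustrative only).
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.08]
se = [0.01, 0.01, 0.02]

beta, beta_se = ivw_mr(bx, by, se)
odds_ratio, lower, upper = to_odds_ratio(beta, beta_se)
print(f"OR per 1-SD increase = {odds_ratio:.2f} ({lower:.2f}-{upper:.2f})")
```

To express the effect per 1-SD reduction, as in the abstract, the sign of the log-odds estimate is flipped before exponentiating.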
Article
Advances in proteomic assay technologies have significantly increased coverage and throughput, enabling recent increases in the number of large-scale population-based proteomic studies of human plasma and serum. Improvements in multiplexed protein assays have facilitated the quantification of thousands of proteins over a large dynamic range, a key requirement for detecting the lowest-ranging, and potentially the most disease-relevant, blood-circulating proteins. In this perspective, we examine how populational proteomic datasets in conjunction with other concurrent omic measures can be leveraged to better understand the genomic and non-genomic correlates of the soluble proteome, constructing biomarker panels for disease prediction, among others. Mass spectrometry workflows are discussed as they are becoming increasingly competitive with affinity-based array platforms in terms of speed, cost, and proteome coverage due to advances in both instrumentation and workflows. Despite much success, there remain considerable challenges such as orthogonal validation and absolute quantification. We also highlight emergent challenges associated with study design, analytical considerations, and data integration as population-scale studies are run in batches and may involve longitudinal samples collated over many years. Lastly, we take a look at the future of what the nascent next-generation proteomic technologies might provide to the analysis of large sets of blood samples, as well as the difficulties in designing large-scale studies that will likely require participation from multiple and complex funding sources and where data sharing, study designs, and financing must be solved.
Article
Full-text available
Lysine acetyltransferase 8, also known as KAT8, is an enzyme involved in epigenetic regulation, primarily recognized for its ability to modulate histone acetylation. This review presents an overview of KAT8, emphasizing its biological functions, which impact many cellular processes and range from chromatin remodeling to genetic and epigenetic regulation. In many model systems, KAT8’s acetylation of histone H4 lysine 16 (H4K16) is critical for chromatin structure modification, which influences gene expression, cell proliferation, differentiation, and apoptosis. Furthermore, this review summarizes the observed genetic variability within the KAT8 gene, underscoring the implications of various single nucleotide polymorphisms (SNPs) that affect its functional efficacy and are linked to diverse phenotypic outcomes, ranging from metabolic traits to neurological disorders. Advanced insights into the structural biology of KAT8 reveal its interaction with multiprotein assemblies, such as the male-specific lethal (MSL) and non-specific lethal (NSL) complexes, which regulate a wide range of transcriptional activities and developmental functions. Additionally, this review focuses on KAT8’s roles in cellular homeostasis, stem cell identity, DNA damage repair, and immune response, highlighting its potential as a therapeutic target. The implications of KAT8 in health and disease, as evidenced by recent studies, affirm its importance in cellular physiology and human pathology.
Article
Full-text available
Introduction Insomnia, a common clinical disorder, significantly impacts the physical and mental well-being of patients. Currently available hypnotic medications are unsatisfactory due to adverse reactions and dependency, necessitating the identification of new drug targets for the treatment of insomnia. Methods In this study, we utilized 734 plasma proteins as genetic instruments obtained from genome-wide association studies to conduct a Mendelian randomization analysis, with insomnia as the outcome variable, to identify potential drug targets for insomnia. Additionally, we validated our results externally using other datasets. Sensitivity analyses entailed reverse Mendelian randomization analysis, Bayesian co-localization analysis, and phenotype scanning. Furthermore, we constructed a protein-protein interaction network to elucidate potential correlations between the identified proteins and existing targets. Results Mendelian randomization analysis indicated that elevated levels of TGFBI (OR = 1.01; 95% CI, 1.01–1.02) and PAM (OR = 1.01; 95% CI, 1.01–1.02) in plasma are associated with an increased risk of insomnia, with external validation supporting these findings. Moreover, there was no evidence of reverse causality for these two proteins. Co-localization analysis confirmed that PAM (coloc.abf-PPH4 = 0.823) shared the same variant with insomnia, further substantiating its potential role as a therapeutic target. There are interactive relationships between the potential proteins and existing targets of insomnia. Conclusion Overall, our findings suggested that elevated plasma levels of TGFBI and PAM are connected with an increased risk of insomnia and might be promising therapeutic targets, particularly PAM. However, further exploration is necessary to fully understand the underlying mechanisms involved.
Article
Motivation Colocalization analysis is commonly used to assess whether two or more traits share the same genetic signals identified in genome-wide association studies (GWAS), and is important for prioritizing targets for functional follow-up of GWAS results. Existing colocalization methods can have suboptimal performance when there are multiple causal variants in one genomic locus. Results We propose SharePro to extend the COLOC framework for colocalization analysis. SharePro integrates linkage disequilibrium (LD) modeling and colocalization assessment by grouping correlated variants into effect groups. With an efficient variational inference algorithm, posterior colocalization probabilities can be accurately estimated. In simulation studies, SharePro demonstrated increased power with a well-controlled false positive rate at a low computational cost. Compared to existing methods, SharePro provided stronger and more consistent colocalization evidence for known lipid-lowering drug target proteins and their corresponding lipid traits. Through an additional challenging case of the colocalization analysis of the circulating abundance of R-spondin 3 GWAS and estimated bone mineral density GWAS, we demonstrated the utility of SharePro in identifying biologically plausible colocalized signals. Availability and implementation SharePro for colocalization analysis is written in Python and openly available at https://github.com/zhwm/SharePro_coloc.
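For orientation, the baseline COLOC calculation that SharePro is described as extending can be sketched as follows. This is the standard single-causal-variant approximate Bayes factor approach, not SharePro's effect-group model or its variational inference. The prior values (p1, p2, p12, and the effect-size prior SD) are commonly used defaults assumed here, and the z-scores and variances are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))

def log_abf(z, variance, prior_sd=0.15):
    """Wakefield-style log approximate Bayes factor for one variant."""
    r = prior_sd ** 2 / (prior_sd ** 2 + variance)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)

def coloc_posteriors(z1, v1, z2, v2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one locus.

    Assumes at most one causal variant per trait and at least two variants.
    H3: distinct causal variants; H4: one shared causal variant.
    """
    l1 = [log_abf(z, v) for z, v in zip(z1, v1)]
    l2 = [log_abf(z, v) for z, v in zip(z2, v2)]
    l12 = [a + b for a, b in zip(l1, l2)]
    s1, s2, s12 = log_sum_exp(l1), log_sum_exp(l2), log_sum_exp(l12)
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        # All variant pairs minus the pairs where both traits share the variant.
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),
        math.log(p12) + s12,
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]

# Hypothetical locus with three variants; both traits peak at the first one.
variances = [0.0004, 0.0004, 0.0004]
pp = coloc_posteriors([6.0, 1.0, 0.5], variances, [5.0, 0.5, 1.0], variances)
print("PP.H4 =", round(pp[4], 3))
```

When each trait peaks at a different variant, the same function shifts posterior mass from H4 to H3. The single-causal-variant assumption made in this sketch is what the abstract identifies as a source of suboptimal performance at loci with multiple causal variants.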
Preprint
Full-text available
Background: Immune-mediated diseases (IMD) encompass a wide range of autoimmune and inflammatory disorders with aetiology related to immune system dysfunction, signifying a disease area with great potential for drug repurposing. In this study, we employed the genetically informed Mendelian Randomization (MR) method with two distinct exposure types: immune blood cell abundance and protein quantitative trait loci (pQTL) to validate and repurpose 834 drug targets which have been investigated for IMD treatment. Methods: Utilizing two-sample MR, we first established causal relationships between major peripheral immune cell types and 14 IMD. Robust associations, particularly with eosinophils, were confirmed across diseases such as asthma, eczema, sinusitis, and rheumatoid arthritis, revealing 59 high-confidence relationships. Intragenic variants associated with causal immune cell types were then extracted to create instruments for 371 existing IMD drug targets ("intermediate trait" MR). In parallel, we leveraged four large blood plasma protein QTL datasets to obtain complementary instruments for 361 targets ("pQTL" MR). Results: In the intermediate trait MR analysis, we identified 811 gene-IMD associations (p-value <0.05), 169 of which were supported by strong colocalisation evidence (PPH4 > 0.8). In the pQTL MR analysis, we similarly found 841 protein-IMD associations (p-value <0.05), 83 of which were confirmed with colocalization. Comparison with a list of approved drugs indicated low sensitivities across disease outcomes for both exposure types (intermediate trait MR: 0.49 +/- 0.23 SD, pQTL MR: 0.28 +/- 0.12 SD). Conclusions: Drug targets identified in the pQTL and intermediate trait MR analyses show limited overlap (13%), presenting a comprehensive source of drug repurposing opportunities when the two approaches are combined.
Article
The protein level of Histo-Blood Group ABO System Transferase (BGAT) has been reported to be associated with cardiometabolic diseases, but its effect on pregnancy-related outcomes remains unclear. Here we conducted a two-sample Mendelian randomization (MR) study to ascertain the putative causal roles of BGAT protein levels in pregnancy-related outcomes. Cis-acting protein quantitative trait loci (pQTLs) robustly associated with the protein level of BGAT (P < 5 ×10⁻⁸) were used as instruments to proxy the BGAT protein level (N = 35,559, data from deCODE), with two additional pQTL datasets from Fenland (N = 10,708) and INTERVAL (N = 3301) used as validation exposures. Ten pregnancy-related diseases and complications were selected as outcomes. We observed that a higher protein level of BGAT showed a putative causal effect on venous complications and haemorrhoids in pregnancy (VH) (odds ratio [OR]=1.19, 95% confidence interval [95% CI]=1.12–1.27, colocalization probability=91%), which was validated by using pQTLs from Fenland and INTERVAL. The Mendelian randomization results further showed effects of the BGAT protein on gestational hypertension (GH) (OR=0.97, 95% CI=0.96–0.99), despite little colocalization evidence to support it. Sensitivity analyses, including proteome-wide Mendelian randomization of the cis-acting BGAT pQTLs, showed little evidence of horizontal pleiotropy. Collectively, our study prioritised BGAT as a putative causal protein for venous complications and haemorrhoids in pregnancy. Future epidemiological and clinical studies are needed to investigate whether BGAT can be considered as a drug target to prevent adverse pregnancy outcomes.
Article
Background The genetic background of healthy or pathological aging and human lifespan is determined by joint gene interactions. Favorable combinations of antioxidant gene polymorphisms can result in a highly adaptive phenotype, providing a successful way to interact with external triggers. Our purpose was to identify the polygenic markers of survival and longevity in the antioxidant genes among elderly people with physiological and pathological aging. Methods In a 20-year follow-up study of 2350 individuals aged 18–114 years residing in the Volga-Ural region of Russia, sex-adjusted association analyses of MTHFR rs1801133, MSRA rs10098474, PON1 rs662, PON2 rs7493, SOD1 rs2070424, NQO1 rs1131341 and CAT rs1001179 polymorphic loci with longevity were carried out. Survival analysis was subsequently performed using the established single genes and gene–gene combinations as cofactors. Results The PON1 rs662*G allele was defined as the main longevity marker in women (OR = 1.44, p = 3E−04 in the log-additive model; HR = 0.77, p = 1.9E−04 in the Cox–survival model). The polymorphisms in the MTHFR, MSRA, PON2, SOD1, and CAT genes had an additive effect on longevity. A strong protective effect of combined MTHFR rs1801133*C, MSRA rs10098474*T, PON1 rs662*G, and PON2 rs7493*C alleles against mortality was obtained in women (HR = 0.81, p = 5E−03). The PON1 rs662*A allele had a meaningful impact on mortality for both long-lived men with cerebrovascular accidents (HR = 1.76, p = 0.027 for the PON1 rs662*AG genotype) and women with cardiovascular diseases (HR = 1.43, p = 0.002 for PON1 rs662*AA genotype). The MTHFR rs1801133*TT (HR = 1.91, p = 0.036), CAT rs1001179*TT (HR = 2.83, p = 0.031) and SOD1 rs2070424*AG (HR = 1.58, p = 0.018) genotypes were associated with cancer mortality. 
Conclusion In our longitudinal 20-year study, we found combinations of functional polymorphisms in antioxidant genes involved in longevity and survival in certain clinical phenotypes in advanced age.
Article
Background The severity of chronic histopathologic lesions on kidney biopsy is independently associated with higher risk of progressive chronic kidney disease (CKD). Because kidney biopsies are invasive, identification of blood markers that report on underlying kidney histopathology has the potential to enhance CKD care. Methods We examined the association between 6592 plasma protein levels measured by aptamers and the severity of interstitial fibrosis and tubular atrophy (IFTA), glomerulosclerosis, arteriolar sclerosis, and arterial sclerosis among 434 participants of the Boston Kidney Biopsy Cohort. For proteins significantly associated with at least one histologic lesion, we assessed renal arteriovenous protein gradients among 21 individuals who had undergone invasive catheterization and assessed the expression of the cognate gene among 47 individuals with single cell RNA sequencing data in the Kidney Precision Medicine Project. Results In models adjusted for estimated glomerular filtration rate (eGFR), proteinuria, and demographic factors, we identified 35 proteins associated with one or more chronic histologic lesions, including 20 specific for IFTA, 8 specific for glomerulosclerosis, and 1 specific for arteriolar sclerosis. In general, higher levels of these proteins were associated with more severe histologic score and lower eGFR. Exceptions included testican-2 and NELL1, which were associated with less glomerulosclerosis and IFTA, respectively, and higher eGFR; notably, both of these proteins demonstrated significantly higher levels from artery to renal vein, demonstrating net kidney release. In the Kidney Precision Medicine Project, 13 of the 35 protein hits had cognate gene expression enriched in one or more cell types in the kidney, including podocyte expression of select glomerulosclerosis markers (including testican-2) and tubular expression of several IFTA markers (including NELL1). 
Conclusions Proteomic analysis identified circulating proteins associated with chronic histopathologic lesions, some of which have concordant site-specific expression within the kidney.
Preprint
Full-text available
While clozapine is the most effective antipsychotic drug, its use is limited due to hematological side effects involving reduction of granulocyte counts with potential life-threatening agranulocytosis. It is not yet possible to predict or prevent the risk of agranulocytosis, and the mechanisms are unknown, but likely related to clozapine metabolism. Genome-wide association studies (GWASs) of clozapine metabolism and clozapine-induced agranulocytosis have identified few genetic loci. We used the largest available GWAS summary statistics of clozapine metabolism (clozapine-to-norclozapine ratio) and clozapine-induced agranulocytosis, applying the conditional false discovery rate (condFDR) method to increase power for genetic discovery by conditioning on granulocyte counts variants. To investigate potential causal effects of shared loci, we performed Mendelian Randomization analyses. After conditioning on granulocyte counts, we identified two novel loci associated with clozapine-to-norclozapine ratio. These loci were associated with clozapine metabolism in a validation sample of 392 clozapine-treated individuals. For clozapine-induced agranulocytosis, five loci were identified after conditioning on granulocyte counts. Genetic liability to slow clozapine metabolism (high clozapine-to-norclozapine ratio) showed evidence of a causal effect on reduced neutrophil counts, and genetic liability to low neutrophil counts exhibited weak evidence of a causal effect on clozapine-induced agranulocytosis. Our findings of shared genetic variants associated with clozapine metabolism and granulocyte counts may form the basis for developing prediction models for clozapine-induced agranulocytosis.
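The conditioning step can be illustrated with a naive empirical conditional FDR estimator: for each variant, the primary-trait p-value is divided by the empirical probability of seeing a primary p-value that small among variants whose conditional-trait p-value is at least as small. This is a sketch of the general idea only. The study's own pipeline is not described in the abstract, practical analyses typically add steps such as LD pruning that are omitted here, and the function name and p-values below are hypothetical.

```python
def conditional_fdr(p_primary, p_conditional):
    """Naive empirical conditional FDR for each variant.

    cFDR_i = p_i * #{k: q_k <= q_i} / #{k: p_k <= p_i and q_k <= q_i}
    where p is the primary-trait p-value and q the conditional-trait p-value.
    Quadratic in the number of variants; intended for illustration only.
    """
    pairs = list(zip(p_primary, p_conditional))
    estimates = []
    for p_i, q_i in pairs:
        n_cond = sum(1 for _, q in pairs if q <= q_i)
        # The variant itself always satisfies both conditions, so n_both >= 1.
        n_both = sum(1 for p, q in pairs if p <= p_i and q <= q_i)
        estimates.append(min(1.0, p_i * n_cond / n_both))
    return estimates

# Hypothetical p-values for four variants (primary trait, conditional trait).
primary = [0.001, 0.5, 0.2, 0.9]
conditional = [0.01, 0.02, 0.8, 0.6]
print(conditional_fdr(primary, conditional))
```

The estimate falls below the raw p-value only when variants that are strongly associated with the conditional trait are also enriched for small primary p-values, which is how conditioning can increase power.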
Article
Full-text available
Calcific aortic valve stenosis (CAVS) is characterized by increasing inflammation and progressive calcification in the aortic valve leaflets and is a major cause of death in the aging population. This study aimed to identify the inflammatory proteins involved in CAVS and provide potential therapeutic targets. We investigated the observational and causal associations of 92 inflammatory proteins, which were measured using affinity-based proteomic assays. First, a case–control cohort was used to identify differential proteins associated with the occurrence and progression of CAVS. Subsequently, we explored the causal impacts of these associated proteins through Mendelian randomization, using genetic instruments derived from cis-protein quantitative trait loci identified in genome-wide association studies encompassing over 400,000 individuals. Finally, we investigated the gene transcription and protein expression levels of inflammatory proteins by single-cell and immunohistochemistry analysis. Multivariate logistic regression and Spearman's correlation analysis indicated that five proteins showed a significant positive correlation with disease severity. Mendelian randomization showed that elevated levels of two proteins, namely, matrix metallopeptidase-1 (MMP1) and sirtuin 2 (SIRT2), were associated with an increased risk of CAVS. Immunohistochemistry and single-cell transcriptomes showed that expression levels of MMP1 and SIRT2 at the tissue and cell levels were significantly higher in calcified valves than in non-calcified control valves. These findings indicate that MMP1 and SIRT2 are causally related to CAVS and open up the possibility for identifying novel therapeutic targets.
Preprint
Full-text available
Background Interstitial lung disease (ILD) has exhibited limited overall treatment advancements, with scant exploration into circulating protein biomarkers causally linked to ILD and its subtypes beyond idiopathic pulmonary fibrosis (IPF). Therefore, our study aims to investigate potential drug targets and circulating protein biomarkers for ILD and its subtypes. Methods We utilized the most recent large-scale plasma protein quantitative trait loci (pQTL) data detected from the antibody-based method and ILD and its subtypes’ GWAS data from the updated FinnGen database for Mendelian randomization analysis. To enhance the reliability of causal associations, we conducted external validation and sensitivity analyses, including Bayesian colocalization, bidirectional Mendelian randomization analysis, and phenotype scanning. Results Genetically predicted levels of eight proteins were associated with the risk of ILD or its subtypes. Through a series of sensitivity analyses, three proteins were identified as priority proteins for circulating biomarkers and potential therapeutic targets. Specifically, CDH15 (Cadherin-15) increased the risk of ILD (OR = 1.32, 95% CI 1.16–1.49, P = 1.60×10⁻⁶), and LTBR (Lymphotoxin-beta receptor) increased the risk of sarcoidosis (OR = 1.39, 95% CI 1.20–1.61, P = 9.38×10⁻⁶), while ADAM15 (A disintegrin and metalloproteinase 15) was protective for ILD (OR = 0.86, 95% CI 0.81–0.92, P = 1.59×10⁻⁶) and IPF (OR = 0.81, 95% CI 0.75–0.89). Moreover, no causal proteins for other ILD subtypes were found. Conclusion This study identified several new circulating protein biomarkers associated with the risk of ILD and its subtypes. It offers a new perspective for future research on the diagnosis and treatment of ILD and its subtypes.
Article
Aging represents a multifaceted process culminating in the deterioration of biological functions. Despite the introduction of numerous anti‐aging strategies, their therapeutic outcomes have often been less than optimal. Consequently, discovering new targets to mitigate aging effects is of critical importance. We applied Mendelian randomization (MR) to identify potential pharmacological targets against aging, drawing upon summary statistics from both the deCODE and FinnGen cohorts, with further validation in an additional cohort. To address potential reverse causality, bidirectional MR analysis with Steiger filtering was utilized. Additionally, Bayesian co‐localization and phenotype scanning were implemented to investigate previous associations between genetic variants and traits. Summary‐data‐based Mendelian randomization (SMR) analysis was conducted to assess the impact of genetic variants on aging via their effects on protein expression. Mediation analysis was then performed to uncover potential intermediaries in these associations. Finally, we probed the systemic implications of drug‐target protein expression across diverse indications by MR‐PheWAS analysis. Utilizing a Bonferroni‐corrected threshold, our MR examination identified 10 protein‐aging associations. Among these proteins, MST1, LCT, GMPR2, PSMB4, ECM1, EFEMP1, and ISLR2 appear to exacerbate aging risks, while MAX, B3GNT8, and USP8 may exert protective influences. None of these proteins displayed reverse causality except EFEMP1. Bayesian co‐localization inferred shared variants between aging and proteins such as B3GNT8 (rs11670143), ECM1 (rs61819393), and others listed. Mediation analysis pinpointed 1,5‐anhydroglucitol as a partial intermediary in the influence of LCT on telomere length. Circulating proteins play a pivotal role in influencing the aging process, making them promising candidates for therapeutic intervention. 
The implications of these proteins in aging warrant further investigation in future clinical research.
Preprint
Full-text available
We used expression quantitative trait loci (eQTLs) and protein quantitative trait loci (pQTLs) to conduct genome-wide Mendelian randomization (MR) using 27,799 cases of heart failure (HF) with reduced ejection fraction (HFrEF), 27,579 cases of HF with preserved ejection fraction (HFpEF), and 367,267 control individuals from the Million Veteran Program (MVP). We identified 70 HFrEF and 10 HFpEF gene-hits, of which 58 are novel. In 14 known loci for unclassified HF, we identified HFrEF as the subtype responsible for the signal. HFrEF hits ZBTB17, MTSS1, PDLIM5, and MLIP and novel HFpEF hits NFATC2IP and PABPC4 showed robustness to MR assumptions, support from orthogonal sources, compelling evidence on mechanism of action needed for therapeutic efficacy, and no evidence of an unacceptable safety profile. We strengthen the value of pathways such as the ubiquitin-proteasome system, the small ubiquitin-related modifier pathway, inflammation, and mitochondrial metabolism as potential therapeutic targets for HF management. We identified IL6R, ADM, and EDNRA as suggestive hits for HFrEF and LPA for HFrEF and HFpEF, which enhances the odds of success for existing investigational cardiovascular drugs targeting them. These findings confirm the unique value of human genetic studies in HFrEF and HFpEF for discovery of novel targets and generation of therapeutic target profiles needed to initiate new validation programs in HFrEF and HFpEF preclinical models.
Article
Full-text available
Genetic mechanisms of blood pressure (BP) regulation remain poorly defined. Using kidney-specific epigenomic annotations and 3D genome information we generated and validated gene expression prediction models for the purpose of transcriptome-wide association studies in 700 human kidneys. We identified 889 kidney genes associated with BP of which 399 were prioritised as contributors to BP regulation. Imputation of kidney proteome and microRNAome uncovered 97 renal proteins and 11 miRNAs associated with BP. Integration with plasma proteomics and metabolomics illuminated circulating levels of myo-inositol, 4-guanidinobutanoate and angiotensinogen as downstream effectors of several kidney BP genes (SLC5A11, AGMAT, AGT, respectively). We showed that genetically determined reduction in renal expression may mimic the effects of rare loss-of-function variants on kidney mRNA/protein and lead to an increase in BP (e.g., ENPEP). We demonstrated a strong correlation (r = 0.81) in expression of protein-coding genes between cells harvested from urine and the kidney highlighting a diagnostic potential of urinary cell transcriptomics. We uncovered adenylyl cyclase activators as a repurposing opportunity for hypertension and illustrated examples of BP-elevating effects of anticancer drugs (e.g. tubulin polymerisation inhibitors). Collectively, our studies provide new biological insights into genetic regulation of BP with potential to drive clinical translation in hypertension.
Preprint
Full-text available
Early detection of high-risk individuals is crucial for healthcare systems to cope with changing demographics and an ever-increasing patient population. Images of the retinal fundus are a non-invasive, low-cost examination routinely collected and potentially scalable beyond ophthalmology. Prior work demonstrated the potential of retinal images for risk assessment for common cardiometabolic diseases, but it remains unclear whether this potential extends to a broader range of human diseases. Here, we extended a retinal foundation model (RETFound) to systematically explore the predictive potential of retinal images as a low-cost screening strategy for disease onset across >750 incident diseases in >60,000 individuals. For more than a third (n=308) of the diseases, we demonstrated improved discriminative performance compared to readily available patient characteristics. This included 281 diseases outside of ophthalmology, such as type 2 diabetes (Delta C-Index: UK Biobank +0.073 (0.068, 0.079)) or chronic obstructive pulmonary disease (Delta C-Index: UK Biobank +0.047 (0.039, 0.054)), showcasing the potential of retinal images to complement screening strategies more widely. Moreover, we externally validated these findings in 7,248 individuals from the EPIC-Norfolk Eye Study. Notably, retinal information did not improve the prediction for the onset of cardiovascular diseases compared to established primary prevention scores, demonstrating the need for rigorous benchmarking and disease-agnostic efforts to design cost-efficient screening strategies to improve population health. We demonstrated that predictive improvements were attributable to retinal vascularisation patterns and less obvious features, such as eye colour or lens morphology, by extracting image attributions from risk models and performing genome-wide association studies, respectively. 
Genetic findings further highlighted commonalities between eye-derived risk estimates and complex disorders, including novel loci, such as IMAP1, for iron homeostasis. In conclusion, we present the first comprehensive evaluation of predictive information derived from retinal fundus photographs, illustrating the potential and limitations of easily accessible and low-cost retinal images for risk assessment across common and rare diseases.
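The Delta C-Index values quoted above compare the discrimination of a risk model with and without retinal information. As a minimal sketch, Harrell's concordance index for right-censored data and the resulting difference can be computed as below. The abstract does not specify which concordance estimator the authors used, so Harrell's C is an assumption here, and the follow-up times, event indicators, and risk scores are hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and did so
    before subject j's follow-up time. The pair is concordant when the
    subject with the earlier event has the higher predicted risk; ties in
    predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical cohort of four individuals.
times = [2, 4, 6, 8]           # follow-up time
events = [1, 1, 0, 1]          # 1 = disease onset observed, 0 = censored
baseline = [0.4, 0.6, 0.3, 0.1]    # risk from patient characteristics only
augmented = [0.9, 0.6, 0.3, 0.1]   # risk after adding image-derived features

delta_c = harrell_c(times, events, augmented) - harrell_c(times, events, baseline)
print(f"Delta C-index = {delta_c:+.3f}")
```

Interval estimates such as those reported in the abstract would require a resampling procedure (for example, bootstrapping individuals), which is not shown here.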
Preprint
Robust and reliable proteome measurements provide mechanistic insights in biomedical research. SOMAmer (Slow Off-rate Modified Aptamer) reagents are modified, DNA-based, affinity reagents that measure defined target proteins with reproducibility and accuracy similar to monoclonal antibodies. Applying SOMAmer reagent technology, we developed SomaScan, a clinical proteome profiling platform with capability to measure 7,523 proteoforms for 6,594 human proteins by UniprotID in small sample volumes (e.g., 55μl plasma or serum). We evaluated the platform by profiling the proteome of a panel of well characterized Cell Line Encyclopedia (CCLE) cancer models. Unsupervised machine learning analyses demonstrate the SomaScan assay distinguishing cell lines on the basis of their proteome signatures, and identifying both tissue-specific and oncogenic pathways. The proteome measured by SomaScan correlates with published CCLE transcriptome at a level comparable to other published transcript to proteome studies. Taken together, we demonstrate that the SomaScan platform is a technically reproducible system suitable for biomedical and clinical applications that reliably illuminates underlying biomolecular mechanisms.
Article
Full-text available
The genetic makeup of an individual contributes to susceptibility and response to viral infection. While environmental, clinical and social factors play a role in exposure to SARS-CoV-2 and COVID-19 disease severity [1,2], host genetics may also be important. Identifying host-specific genetic factors may reveal biological mechanisms of therapeutic relevance and clarify causal relationships of modifiable environmental risk factors for SARS-CoV-2 infection and outcomes. We formed a global network of researchers to investigate the role of human genetics in SARS-CoV-2 infection and COVID-19 severity. We describe the results of three genome-wide association meta-analyses comprising up to 49,562 COVID-19 patients from 46 studies across 19 countries. We reported 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19. Several of these loci correspond to previously documented associations to lung or autoimmune and inflammatory diseases [3–7]. They also represent potentially actionable mechanisms in response to infection. Mendelian Randomization analyses support a causal role for smoking and body mass index for severe COVID-19 although not for type II diabetes. The identification of novel host genetic factors associated with COVID-19, with unprecedented speed, was made possible by the community of human genetic researchers coming together to prioritize sharing of data, results, resources and analytical frameworks. This working model of international collaboration underscores what is possible for future genetic discoveries in emerging pandemics, or indeed for any complex human disease.
Preprint
Full-text available
Discovery of protein quantitative trait loci (pQTLs) has been enabled by affinity-based proteomic techniques and is increasingly used to guide genetically informed drug target evaluation. Large-scale proteomic data are now being created, but systematic, bidirectional assessment of platform differences is lacking, restricting clinical translation. We compared genetic, technical, and phenotypic determinants of 871 protein targets measured using both aptamer- (SomaScan® Platform v4) and antibody-based (Olink) assays in up to 10,708 individuals. Correlation coefficients for overlapping protein targets varied widely (median 0.38, IQR: 0.08-0.64). We found that 64% of pQTLs were shared across both platforms among all identified 608 cis- and 1,315 trans-pQTLs with sufficient power for replication, but with correlations of effect estimates being lower than previously reported (cis: 0.41, trans: 0.34). We identified technical, protein, and variant characteristics that contributed significantly to platform differences and found contradicting phenotypic associations attributable to those. We demonstrate how integrating phenomic and gene expression data improves genetic prioritisation strategies, including platform-specific pQTLs.
Article
Full-text available
To identify circulating proteins influencing Coronavirus Disease 2019 (COVID-19) susceptibility and severity, we undertook a two-sample Mendelian randomization (MR) study, rapidly scanning hundreds of circulating proteins while reducing bias due to reverse causation and confounding. In up to 14,134 cases and 1.2 million controls, we found that an s.d. increase in OAS1 levels was associated with reduced COVID-19 death or ventilation (odds ratio (OR) = 0.54, P = 7 × 10⁻⁸), hospitalization (OR = 0.61, P = 8 × 10⁻⁸) and susceptibility (OR = 0.78, P = 8 × 10⁻⁶). Measuring OAS1 levels in 504 individuals, we found that higher plasma OAS1 levels in a non-infectious state were associated with reduced COVID-19 susceptibility and severity. Further analyses suggested that a Neanderthal isoform of OAS1 in individuals of European ancestry affords this protection. Thus, evidence from MR and a case–control study support a protective role for OAS1 in COVID-19 adverse outcomes. Available pharmacological agents that increase OAS1 levels could be prioritized for drug development. A variant of the OAS1 gene, which encodes an enzyme that is critical for the innate immune response to viral infections, is associated with decreased risk of death in patients with COVID-19.
Article
Full-text available
Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer’s disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.
Article
Full-text available
Genome-wide association studies (GWAS) have identified thousands of genomic regions affecting complex diseases. The next challenge is to elucidate the causal genes and mechanisms involved. One approach is to use statistical colocalization to assess shared genetic aetiology across multiple related traits (e.g. molecular traits, metabolic pathways and complex diseases) to identify causal pathways, prioritize causal variants and evaluate pleiotropy. We propose HyPrColoc (Hypothesis Prioritisation for multi-trait Colocalization), an efficient deterministic Bayesian algorithm using GWAS summary statistics that can detect colocalization across vast numbers of traits simultaneously (e.g. 100 traits can be jointly analysed in around 1 s). We perform a genome-wide multi-trait colocalization analysis of coronary heart disease (CHD) and fourteen related traits, identifying 43 regions in which CHD colocalized with ≥1 trait, including 5 previously unknown CHD loci. Across the 43 loci, we further integrate gene and protein expression quantitative trait loci to identify candidate causal genes.
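As a rough illustration of the kind of summary-statistic calculation that Bayesian colocalization methods build on, the sketch below computes Wakefield approximate Bayes factors from per-variant effect estimates and standard errors, then converts them into posterior probabilities under the assumption of a single causal variant with equal prior weight per variant. This is not the HyPrColoc algorithm itself; the prior effect-size variance `prior_var`, the function names and the toy numbers are assumptions made purely for illustration.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Log Wakefield approximate Bayes factor in favour of association.

    prior_var is an assumed prior variance of the true effect size.
    """
    v = se ** 2                          # sampling variance of the estimate
    z2 = (beta / se) ** 2                # squared z-score
    shrink = prior_var / (v + prior_var)
    return 0.5 * (math.log(1.0 - shrink) + shrink * z2)


def posterior_probs(betas, ses, prior_var=0.04):
    """Posterior probability that each variant is the single causal one."""
    logs = [log_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    top = max(logs)                      # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical summary statistics for three variants in one region.
toy_betas = [0.02, 0.15, 0.05]
toy_ses = [0.02, 0.02, 0.02]
toy_probs = posterior_probs(toy_betas, toy_ses)
print([round(p, 4) for p in toy_probs])
```

Multi-trait methods go further by comparing such per-variant evidence across traits; the sketch covers only the single-trait step.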
Article
Full-text available
We integrated ubiquity, mass and lifespan of all major cell types to achieve a comprehensive quantitative description of cellular turnover. We found a total cellular mass turnover of 80 ± 20 grams per day, dominated by blood cells and gut epithelial cells. In terms of cell numbers, close to 90% of the (0.33 ± 0.02) × 10¹² cells per day turnover was blood cells. A comprehensive census of the dynamics of death and regeneration of cells and tissues provides an estimation of the distribution of cellular turnover in the human body.
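The cell-number figures above can be turned into a quick back-of-the-envelope calculation. The sketch below treats "close to 90%" as exactly 0.90, which is an assumption, and scales the quoted uncertainty by the same fraction; the variable names are illustrative.

```python
import math

TOTAL_CELLS_PER_DAY = 0.33e12      # central estimate quoted in the abstract
TOTAL_UNCERTAINTY = 0.02e12        # quoted uncertainty on that estimate
BLOOD_FRACTION = 0.90              # assumption: "close to 90%" taken as exactly 0.90

# Approximate number of blood cells turned over per day.
blood_cells_per_day = BLOOD_FRACTION * TOTAL_CELLS_PER_DAY
blood_cells_uncertainty = BLOOD_FRACTION * TOTAL_UNCERTAINTY

# The same total turnover expressed per second.
cells_per_second = TOTAL_CELLS_PER_DAY / (24 * 60 * 60)

print(f"blood cells per day: {blood_cells_per_day:.3g} +/- {blood_cells_uncertainty:.1g}")
print(f"all cells per second: {cells_per_second:.3g}")
```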
Article
Full-text available
In cross-platform analyses of 174 metabolites, we identify 499 associations (P < 4.9 × 10⁻¹⁰) characterized by pleiotropy, allelic heterogeneity, large and nonlinear effects and enrichment for nonsynonymous variation. We identify a signal at GLP2R (p.Asp470Asn) shared among higher citrulline levels, body mass index, fasting glucose-dependent insulinotropic peptide and type 2 diabetes, with β-arrestin signaling as the underlying mechanism. Genetically higher serine levels are shown to reduce the likelihood (by 95%) and predict development of macular telangiectasia type 2, a rare degenerative retinal disease. Integration of genomic and small molecule data across platforms enables the discovery of regulators of human metabolism and translation into clinical insights. A large-scale genome-wide meta-analysis conducted across different platforms identifies genetic loci regulating levels of circulating metabolites.
Article
Full-text available
Understanding the genetic architecture of host proteins interacting with SARS-CoV-2 or mediating the maladaptive host response to COVID-19 can help to identify new or repurpose existing drugs targeting those proteins. We present a genetic discovery study of 179 such host proteins among 10,708 individuals using an aptamer-based technique. We identify 220 host DNA sequence variants acting in cis (MAF 0.01-49.9%) and explaining 0.3-70.9% of the variance of 97 of these proteins, including 45 with no previously known protein quantitative trait loci (pQTL) and 38 encoding current drug targets. Systematic characterization of pQTLs across the phenome identified protein-drug-disease links and evidence that putative viral interaction partners such as MARK3 affect immune response. Our results accelerate the evaluation and prioritization of new drug development programmes and repurposing of trials to prevent, treat or reduce adverse outcomes. Rapid sharing and detailed interrogation of results is facilitated through an interactive webserver (https://omicscience.org/apps/covidpgwas/).
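For readers unfamiliar with how allele frequency and effect size combine into "variance explained", a commonly used approximation is 2·MAF·(1−MAF)·β², which holds for a standardized phenotype under an additive model and Hardy-Weinberg equilibrium. The sketch below applies it to hypothetical values; the abstract does not state how its 0.3-70.9% figures were computed, so the formula, function name and inputs are assumptions for illustration.

```python
import math


def variance_explained(maf, beta):
    """Approximate fraction of variance explained by one additive variant.

    Assumes a standardized phenotype (unit variance), an additive genetic
    model and Hardy-Weinberg equilibrium. beta is the per-allele effect in
    standard-deviation units.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("minor allele frequency must lie in (0, 0.5]")
    return 2.0 * maf * (1.0 - maf) * beta ** 2


# Hypothetical variant: MAF of 30% and an effect of 0.5 s.d. per allele.
example = variance_explained(0.30, 0.5)
print(f"{100 * example:.1f}% of variance explained")
```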
Article
Full-text available
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
Article
Full-text available
The human proteome is a crucial intermediate between complex diseases and their genetic and environmental components, and an important source of drug development targets and biomarkers. Here, we comprehensively assess the genetic architecture of 257 circulating protein biomarkers of cardiometabolic relevance through high-depth (22.5×) whole-genome sequencing (WGS) in 1328 individuals. We discover 131 independent sequence variant associations (P < 7.45 × 10⁻¹¹) across the allele frequency spectrum, all of which replicate in an independent cohort (n = 1605, 18.4× WGS). We identify for the first time replicating evidence for rare-variant cis-acting protein quantitative trait loci for five genes, involving both coding and noncoding variation. We construct and validate polygenic scores that explain up to 45% of protein level variation. We find causal links between protein levels and disease risk, identifying high-value biomarkers and drug development targets.
Article
Full-text available
The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery based upon underlying evidence. It is publicly available and the underlying code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target–disease relationships from 20 different data sources. In addition, we have integrated new evidence from key datasets, including prioritised targets identified from genome-wide CRISPR knockout screens in 300 cancer models (Project Score), and GWAS/UK BioBank statistical genetic analysis evidence from the Open Targets Genetics Portal. We have evolved our evidence scoring framework to improve target identification. To aid the prioritisation of targets and inform on the potential impact of modulating a given target, we have added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety. We have also developed the user interface and backend technologies to improve performance and usability. In this article, we describe the latest enhancements to the Platform, to address the fundamental challenge that developing effective and safe drugs is difficult and expensive.
Article
Full-text available
Circulating proteins are vital in human health and disease and are frequently used as biomarkers for clinical decision-making or as targets for pharmacological intervention. Here, we map and replicate protein quantitative trait loci (pQTL) for 90 cardiovascular proteins in over 30,000 individuals, resulting in 451 pQTLs for 85 proteins. For each protein, we further perform pathway mapping to obtain trans-pQTL gene and regulatory designations. We substantiate these regulatory findings with orthogonal evidence for trans-pQTLs using mouse knockdown experiments (ABCA1 and TRIB1) and clinical trial results (chemokine receptors CCR2 and CCR5), with consistent regulation. Finally, we evaluate known drug targets, and suggest new target candidates or repositioning opportunities using Mendelian randomization. This identifies 11 proteins with causal evidence of involvement in human disease that have not previously been targeted, including EGF, IL-16, PAPPA, SPON1, F3, ADM, CASP-8, CHI3L1, CXCL16, GDF15 and MMP-12. Taken together, these findings demonstrate the utility of large-scale mapping of the genetics of the proteome and provide a resource for future precision studies of circulating proteins in human health.
Article
Full-text available
Fibulin-3 (also known as EGF-containing fibulin extracellular matrix protein 1 (EFEMP1)) is a secreted extracellular matrix glycoprotein, encoded by the EFEMP1 gene that belongs to the eight-membered fibulin protein family. It has emerged as a functionally unique member of this family, with a diverse array of pathophysiological associations predominantly centered on its role as a modulator of extracellular matrix (ECM) biology. Fibulin-3 is widely expressed in the human body, especially in elastic-fibre-rich tissues and ocular structures, and interacts with enzymatic ECM regulators, including tissue inhibitor of metalloproteinase-3 (TIMP-3). A point mutation in EFEMP1 causes an inherited early-onset form of macular degeneration called Malattia Leventinese/Doyne honeycomb retinal dystrophy (ML/DHRD). EFEMP1 genetic variants have also been associated in genome-wide association studies with numerous complex inherited phenotypes, both physiological (namely, developmental anthropometric traits) and pathological (many of which involve abnormalities of connective tissue function). Furthermore, EFEMP1 expression changes are implicated in the progression of numerous types of cancer, an area in which fibulin-3 has putative significance as a therapeutic target. Here we discuss the potential mechanistic roles of fibulin-3 in these pathologies and highlight how it may contribute to the development, structural integrity, and emergent functionality of the ECM and connective tissues across a range of anatomical locations. Its myriad of aetiological roles positions fibulin-3 as a molecule of interest across numerous research fields and may inform our future understanding and therapeutic approach to many human diseases in clinical settings.
Article
Full-text available
The human proteome is a major source of therapeutic targets. Recent genetic association analyses of the plasma proteome enable systematic evaluation of the causal consequences of variation in plasma protein levels. Here we estimated the effects of 1,002 proteins on 225 phenotypes using two-sample Mendelian randomization (MR) and colocalization. Of 413 associations supported by evidence from MR, 130 (31.5%) were not supported by results of colocalization analyses, suggesting that genetic confounding due to linkage disequilibrium is widespread in naïve phenome-wide association studies of proteins. Combining MR and colocalization evidence in cis-only analyses, we identified 111 putatively causal effects between 65 proteins and 52 disease-related phenotypes (https://www.epigraphdb.org/pqtl/). Evaluation of data from historic drug development programs showed that target-indication pairs with MR and colocalization support were more likely to be approved, evidencing the value of this approach in identifying and prioritizing potential therapeutic targets.
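The proportion quoted above can be checked directly from the two counts. The short sketch below recomputes it and also derives the number of MR-supported associations that were not flagged by colocalization; that second figure is derived here rather than stated in the abstract, and the variable names are illustrative.

```python
mr_supported = 413        # associations supported by MR evidence
not_colocalized = 130     # of those, not supported by colocalization

# Share of MR-supported associations lacking colocalization support.
share_not_colocalized = not_colocalized / mr_supported

# Derived count, not stated in the abstract.
remaining = mr_supported - not_colocalized

print(f"{100 * share_not_colocalized:.1f}% lacked colocalization support")
print(f"{remaining} associations remained")
```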
Article
Full-text available
Long-term evidence has confirmed the involvement of an inflammatory component in neurodegenerative disorders including Alzheimer’s disease (AD). This view is supported, in part, by data suggesting that selected non-steroidal anti-inflammatory drugs (NSAIDs) provide protection. Additionally, molecular players of the innate immune system have recently been proposed to contribute to these diseases. Toll-like receptors (TLRs) are transmembrane pattern-recognition receptors of the innate immune system that recognize different pathogen-derived and tissue damage-related ligands. TLR4 mediated signaling has been reported to contribute to the pathogenesis of age-related neurodegenerative diseases, including AD. Although the pathophysiology of AD is not clear, soluble aggregates (oligomers) of the amyloid β peptide (Aβo) have been proven to be key players in the pathology of AD. Among others, Aβo promote Ca²⁺ entry and mitochondrial Ca²⁺ overload leading to cell death in neurons. TLR4 has recently been found to be involved in AD but the mechanisms are unclear. Our group recently reported that lipopolysaccharide (LPS), a TLR4 receptor agonist, increases cytosolic Ca²⁺ concentration leading to apoptosis. Strikingly, this effect was only observed in long-term cultured primary neurons considered a model of aging neurons, but not in short-term cultured neurons resembling young neurons. These effects were significantly prevented by pharmacological blockade of TLR4 receptor signaling. Moreover, TLR4 expression in rat hippocampal neurons increased significantly in aged neurons in vitro. Therefore, molecular patterns associated with infection and/or brain cell damage may activate TLR4 and Ca²⁺ signaling, an effect exacerbated during neuronal aging. Here, we briefly review the data regarding the involvement of TLR4 in AD.
Preprint
Full-text available
Data generated by genome-wide association studies (GWAS) are growing fast with the linkage of biobank samples to health records, and expanding capture of high-dimensional molecular phenotypes. However, the utility of these efforts can only be fully realised if their complete results are collected from their heterogeneous sources and formats, harmonised and made programmatically accessible. Here we present the OpenGWAS database, an open source, open access, scalable and high-performance cloud-based data infrastructure that imports and publishes complete GWAS summary datasets and metadata for the scientific community. Our import pipeline harmonises these datasets against dbSNP and the human genome reference sequence, generates summary reports and standardises the format of results and metadata. Users can access the data via a website, an application programming interface, R and Python packages, and also as downloadable files that can be rapidly queried in high performance computing environments. OpenGWAS currently contains 126 billion genetic associations from 14,582 complete GWAS datasets representing a range of different human phenotypes and disease outcomes across different populations. We developed R and Python packages to serve as conduits between these GWAS data sources and a range of available analytical tools, enabling Mendelian randomization, genetic colocalisation analysis, fine mapping, genetic correlation and locus visualisation. OpenGWAS is freely accessible at https://gwas.mrcieu.ac.uk, and has been designed to facilitate integration with third party analytical tools.
Article
Full-text available
R-spondin1 (Rspo1) has been featured as a Wnt agonist, serving as a potent niche factor for stem cells in many tissues. Here we unveil a novel role of Rspo1 in promoting estrogen receptor alpha (Esr1) expression, hence regulating the output of steroid hormone signaling in the mouse mammary gland. This action of Rspo1 relies on the receptor Lgr4 and intracellular cAMP-PKA signaling, yet is independent of Wnt/β-catenin signaling. These mechanisms were reinforced by genetic evidence. Luminal cell-specific knockout of Rspo1 results in decreased Esr1 expression and reduced mammary side branches. In contrast, luminal cell-specific knockout of Wnt4, while attenuating basal cell Wnt/β-catenin signaling activities, enhances Esr1 expression. Our data reveal a novel Wnt-independent role of Rspo1, in which Rspo1 acts as a bona fide GPCR activator eliciting intracellular cAMP signaling. The identification of the Rspo1-ERα signaling axis may have broad implications in estrogen-associated diseases.
Article
Full-text available
The mitochondrial contact site and cristae junction (CJ) organizing system (MICOS) dynamically regulates mitochondrial membrane architecture. Through systematic proteomic analysis of human MICOS, we identified QIL1 (C19orf70) as a novel conserved MICOS subunit. QIL1 depletion disrupted CJ structure in cultured human cells and in Drosophila muscle and neuronal cells in vivo. In human cells, mitochondrial disruption correlated with impaired respiration. Moreover, increased mitochondrial fragmentation was observed upon QIL1 depletion in flies. Using quantitative proteomics, we show that loss of QIL1 resulted in MICOS disassembly with the accumulation of a MIC60-MIC19-MIC25 sub-complex and degradation of MIC10, MIC26, and MIC27. Additionally, we demonstrated that in QIL1-depleted cells, overexpressed MIC10 fails to significantly restore its interaction with other MICOS subunits and SAMM50. Collectively, our work uncovers a previously unrecognized subunit of the MICOS complex, necessary for CJ integrity, cristae morphology, and mitochondrial function, and provides a resource for further analysis of MICOS architecture.
Article
Full-text available
The timing of puberty is highly variable and is associated with long-term health outcomes. To date, understanding of the genetic control of puberty timing is based largely on studies in women. Here, we report a multi-trait genome-wide association study for male puberty timing with an effective sample size of 205,354 men. We find moderately strong genomic correlation in puberty timing between sexes (rg = 0.68) and identify 76 independent signals for male puberty timing. Implicated mechanisms include an unexpected link between puberty timing and natural hair colour, possibly reflecting common effects of pituitary hormones on puberty and pigmentation. Earlier male puberty timing is genetically correlated with several adverse health outcomes and Mendelian randomization analyses show a genetic association between male puberty timing and shorter lifespan. These findings highlight the relationships between puberty timing and health outcomes, and demonstrate the value of genetic studies of puberty timing in both sexes.
Article
Full-text available
Background: Physical activity (PA) plays a role in the prevention of a range of diseases including obesity and cardiometabolic disorders. Large population-based descriptive studies of PA, incorporating precise measurement, are needed to understand the relative burden of insufficient PA levels and to inform the tailoring of interventions. Combined heart and movement sensing enables the study of physical activity energy expenditure (PAEE) and intensity distribution. We aimed to describe the sociodemographic correlates of PAEE and moderate-to-vigorous physical activity (MVPA) in UK adults. Methods: The Fenland study is a population-based cohort study of 12,435 adults aged 29-64 years in Cambridgeshire, UK. Following individual calibration (treadmill), participants wore a combined heart rate and movement sensor continuously for 6 days in free-living, from which we derived PAEE (kJ·day⁻¹·kg⁻¹) and time in MVPA (> 3 and > 4 METs) in bouts greater than 1 min and 10 min. Socio-demographic information was self-reported. Stratum-specific summary statistics and multivariable analyses were performed. Results: Women accumulated a mean (sd) 50 (20) kJ·day⁻¹·kg⁻¹ of PAEE, and 83 (67) and 33 (39) minutes·day⁻¹ of 1-min bouted and 10-min bouted MVPA respectively. By contrast, men recorded 59 (23) kJ·day⁻¹·kg⁻¹, 124 (84) and 60 (58) minutes·day⁻¹. Age and BMI were also important correlates of PA. Association with age was inverse in both sexes, more strongly so for PAEE than MVPA. Obese individuals accumulated less PA than their normal-weight counterparts, whether considering PAEE or allometrically-scaled PAEE (−10 kJ·day⁻¹·kg⁻¹ or −15 kJ·day⁻¹·kg^(−2/3) in men). Higher income and manual work were associated with higher PA; manual workers recorded 13-16 kJ·kg⁻¹·day⁻¹ more PAEE than sedentary counterparts. Overall, 86% of women and 96% of men accumulated a daily average of MVPA (> 3 METs) corresponding to 150 min per week. These values were 49% and 74% if only considering bouts > 10 min (15% and 31% for > 4 METs). Conclusions: PA varied by age, sex and BMI, and was higher in manual workers and those with higher incomes. Light physical activity was the main driver of PAEE; a component of PA that is currently not quantified as a target in UK guidelines.
Article
Full-text available
Proteins are effector molecules that mediate the functions of genes [1,2] and modulate comorbidities [3–10], behaviors and drug treatments [11]. They represent an enormous potential resource for personalized, systemic and data-driven diagnosis, prevention, monitoring and treatment. However, the concept of using plasma proteins for individualized health assessment across many health conditions simultaneously has not been tested. Here, we show that plasma protein expression patterns strongly encode for multiple different health states, future disease risks and lifestyle behaviors. We developed and validated protein-phenotype models for 11 different health indicators: liver fat, kidney filtration, percentage body fat, visceral fat mass, lean body mass, cardiopulmonary fitness, physical activity, alcohol consumption, cigarette smoking, diabetes risk and primary cardiovascular event risk. The analyses were prospectively planned, documented and executed at scale on archived samples and clinical data, with a total of ~85 million protein measurements in 16,894 participants. Our proof-of-concept study demonstrates that protein expression patterns reliably encode for many different health issues, and that large-scale protein scanning [12–16] coupled with machine learning is viable for the development and future simultaneous delivery of multiple measures of health. We anticipate that, with further validation and the addition of more protein-phenotype models, this approach could enable a single-source, individualized so-called liquid health check.
Article
Full-text available
Significance: A sequence variant (I148M) in PNPLA3 is a major genetic risk factor for nonalcoholic fatty liver disease. Previously, we showed that PNPLA3(148M) evades ubiquitylation-mediated degradation and accumulates to high levels on lipid droplets (LDs). Here we address how this accumulation is related to steatosis. We generated an active, ubiquitylation-resistant isoform that accumulated on LDs and increased hepatic triglyceride levels when expressed in livers of mice. Conversely, depletion of PNPLA3 resolved the excess hepatic fat accumulation associated with expression of PNPLA3(148M). Our results provide direct evidence that accumulation of PNPLA3 per se causes fatty liver, and that depletion of the protein is a potential strategy for therapeutic intervention.
Article
Full-text available
Carpal tunnel syndrome (CTS) is a common and disabling condition of the hand caused by entrapment of the median nerve at the level of the wrist. It is the commonest entrapment neuropathy, with estimates of prevalence ranging between 5–10%. Here, we undertake a genome-wide association study (GWAS) of an entrapment neuropathy, using 12,312 CTS cases and 389,344 controls identified in UK Biobank. We discover 16 susceptibility loci for CTS with p < 5 × 10⁻⁸. We identify likely causal genes in the pathogenesis of CTS, including ADAMTS17, ADAMTS10 and EFEMP1, and using RNA sequencing demonstrate expression of these genes in surgically resected tenosynovium from CTS patients. We perform Mendelian randomisation and demonstrate a causal relationship between short stature and higher risk of CTS. We suggest that variants within genes implicated in growth and extracellular matrix architecture contribute to the genetic predisposition to CTS by altering the environment through which the median nerve transits.
Article
Full-text available
Reduced lung function predicts mortality and is key to the diagnosis of chronic obstructive pulmonary disease (COPD). In a genome-wide association study in 400,102 individuals of European ancestry, we define 279 lung function signals, 139 of which are new. In combination, these variants strongly predict COPD in independent populations. Furthermore, the combined effect of these variants showed generalizability across smokers and never smokers, and across ancestral groups. We highlight biological pathways, known and potential drug targets for COPD and, in phenome-wide association studies, autoimmune-related and other pleiotropic effects of lung function–associated variants. This new genetic evidence has potential to improve future preventive and therapeutic strategies for COPD.
Article
Full-text available
Alzheimer’s disease (AD) is highly heritable and recent studies have identified over 20 disease-associated genomic loci. Yet these only explain a small proportion of the genetic variance, indicating that undiscovered loci remain. Here, we performed a large genome-wide association study of clinically diagnosed AD and AD-by-proxy (71,880 cases, 383,378 controls). AD-by-proxy, based on parental diagnoses, showed strong genetic correlation with AD (rg = 0.81). Meta-analysis identified 29 risk loci, implicating 215 potential causative genes. Associated genes are strongly expressed in immune-related tissues and cell types (spleen, liver, and microglia). Gene-set analyses indicate biological mechanisms involved in lipid-related processes and degradation of amyloid precursor proteins. We show strong genetic correlations with multiple health-related outcomes, and Mendelian randomization results suggest a protective effect of cognitive ability on AD risk. These results are a step forward in identifying the genetic factors that contribute to AD risk and add novel insights into the neurobiology of AD.
Article
Full-text available
Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.
Article
Full-text available
The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.
Article
Full-text available
Identifying genetic variants associated with circulating protein concentrations (protein quantitative trait loci; pQTLs) and integrating them with variants from genome-wide association studies (GWAS) may illuminate the proteome's causal role in disease and bridge a knowledge gap regarding SNP-disease associations. We provide the results of GWAS of 71 high-value cardiovascular disease proteins in 6861 Framingham Heart Study participants and independent external replication. We report the mapping of over 16,000 pQTL variants and their functional relevance. We provide an integrated plasma protein-QTL database. Thirteen proteins harbor pQTL variants that match coronary disease-risk variants from GWAS or test causal for coronary disease by Mendelian randomization. Eight of these proteins predict new-onset cardiovascular disease events in Framingham participants. We demonstrate that identifying pQTLs, integrating them with GWAS results, employing Mendelian randomization, and prospectively testing protein-trait associations holds potential for elucidating causal genes, proteins, and pathways for cardiovascular disease and may identify targets for its prevention and treatment.
Article
Endometrial cancer is the most commonly diagnosed cancer of the female reproductive tract in developed countries. Through genome-wide association studies (GWAS), we have previously identified eight risk loci for endometrial cancer. Here, we present an expanded meta-analysis of 12,906 endometrial cancer cases and 108,979 controls (including new genotype data for 5624 cases) and identify nine novel genome-wide significant loci, including a locus on 12q24.12 previously identified by meta-GWAS of endometrial and colorectal cancer. At five loci, expression quantitative trait locus (eQTL) analyses identify candidate causal genes; risk alleles at two of these loci associate with decreased expression of genes, which encode negative regulators of oncogenic signal transduction proteins (SH2B3 (12q24.12) and NF1 (17q11.2)). In summary, this study has doubled the number of known endometrial cancer risk loci and revealed candidate causal genes for future study.
Article
Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
Article
Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.
Article
These updated guidelines on the management of abnormal liver blood tests have been commissioned by the Clinical Services and Standards Committee (CSSC) of the British Society of Gastroenterology (BSG) under the auspices of the liver section of the BSG. The original guidelines, which this document supersedes, were written in 2000 and have undergone extensive revision by members of the Guidelines Development Group (GDG). The GDG comprises representatives from patient/carer groups (British Liver Trust, Liver4life, PBC Foundation and PSC Support), elected members of the BSG liver section (including representatives from Scotland and Wales), British Association for the Study of the Liver (BASL), Specialist Advisory Committee in Clinical Biochemistry/Royal College of Pathology and Association for Clinical Biochemistry, British Society of Paediatric Gastroenterology, Hepatology and Nutrition (BSPGHAN), Public Health England (implementation and screening), Royal College of General Practice, British Society of Gastrointestinal and Abdominal Radiologists (BSGAR) and Society of Acute Medicine. The quality of evidence and grading of recommendations was appraised using the AGREE II tool. These guidelines deal specifically with the management of abnormal liver blood tests in children and adults in both primary and secondary care under the following subheadings: (1) What constitutes an abnormal liver blood test? (2) What constitutes a standard liver blood test panel? (3) When should liver blood tests be checked? (4) Does the extent and duration of abnormal liver blood tests determine subsequent investigation? (5) Response to abnormal liver blood tests. They are not designed to deal with the management of the underlying liver disease.
Article
The timing of puberty is a highly polygenic childhood trait that is epidemiologically associated with various adult diseases. Using 1000 Genomes Project-imputed genotype data in up to ∼370,000 women, we identify 389 independent signals (P < 5 × 10⁻⁸) for age at menarche, a milestone in female pubertal development. In Icelandic data, these signals explain ∼7.4% of the population variance in age at menarche, corresponding to ∼25% of the estimated heritability. We implicate ∼250 genes via coding variation or associated expression, demonstrating significant enrichment in neural tissues. Rare variants near the imprinted genes MKRN3 and DLK1 were identified, exhibiting large effects when paternally inherited. Mendelian randomization analyses suggest causal inverse associations, independent of body mass index (BMI), between puberty timing and risks for breast and endometrial cancers in women and prostate cancer in men. In aggregate, our findings highlight the complexity of the genetic regulation of puberty timing and support causal links with cancer susceptibility.
Article
Recent advances in highly multiplexed immunoassays have allowed systematic large-scale measurement of hundreds of plasma proteins in large cohort studies. In combination with genotyping, such studies offer the prospect to 1) identify mechanisms involved with regulation of protein expression in plasma, and 2) determine whether the plasma proteins are likely to be causally implicated in disease. We report here the results of genome-wide association (GWA) studies of 83 proteins considered relevant to cardiovascular disease (CVD), measured in 3,394 individuals with multiple CVD risk factors. We identified 79 genome-wide significant (P < 5 × 10⁻⁸) association signals, 55 of which replicated at P < 0.0007 in separate validation studies (n = 2,639 individuals). Using automated text mining, manual curation, and network-based methods incorporating information on expression quantitative trait loci (eQTL), we propose plausible causal mechanisms for 25 trans-acting loci, including a potential post-translational regulation of stem cell factor by matrix metalloproteinase 9 and receptor-ligand pairs such as RANK-RANK ligand. Using public GWA study data, we further evaluate all 79 loci for their causal effect on coronary artery disease, and highlight several potentially causal associations. Overall, a majority of the plasma proteins studied showed evidence of regulation at the genetic level. Our results enable future studies of the causal architecture of human disease, which in turn should aid discovery of new drug targets.
Article
Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.
Article
Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.
Article
Proteomic analysis of cells, tissues and body fluids has generated valuable insights into the complex processes influencing human biology. Proteins represent intermediate phenotypes for disease and provide insight into how genetic and non-genetic risk factors are mechanistically linked to clinical outcomes. Associations between protein levels and DNA sequence variants that colocalize with risk alleles for common diseases can expose disease-associated pathways, revealing novel drug targets and translational biomarkers. However, genome-wide, population-scale analyses of proteomic data are only now emerging. Here, we review current findings from studies of the plasma proteome and discuss their potential for advancing biomedical translation through the interpretation of genome-wide association analyses. We highlight the challenges faced by currently available technologies and provide perspectives relevant to their future application in large-scale biobank studies.
Article
Gene expression involves transcription, translation and the turnover of mRNAs and proteins. The degree to which protein abundances scale with mRNA levels and the implications in cases where this dependency breaks down remain an intensely debated topic. Here we review recent mRNA–protein correlation studies in the light of the quantitative parameters of the gene expression pathway, contextual confounders and buffering mechanisms. Although protein and mRNA levels typically show reasonable correlation, we describe how transcriptomics and proteomics provide useful non-redundant readouts. Integrating both types of data can reveal exciting biology and is an essential step in refining our understanding of the principles of gene expression control.
Article
Background: Coronavirus disease 2019 (COVID-19) is characterised by dyspnoea and abnormal coagulation parameters, including raised D-dimer. Data suggest a high incidence of pulmonary embolism (PE) in ventilated patients with COVID-19. Objectives: To determine the incidence of PE in hospitalised patients with COVID-19 and the diagnostic yield of Computed Tomography Pulmonary Angiography (CTPA) for PE. We also examined the utility of D-dimer and conventional pre-test probability for diagnosis of PE in COVID-19. Patients/methods: Retrospective review of single-centre data of all CTPA studies in patients with suspected or confirmed COVID-19 identified from Electronic Patient Records (EPR). Results: There were 1477 patients admitted with COVID-19 and 214 CTPA scans performed, of which n = 180 (84%) were requested outside of critical care. The diagnostic yield for PE was 37%. The overall proportion of PE in patients with COVID-19 was 5.4%. The proportion with a Wells score of ≥4 (‘PE likely’) was 33/134 (25%) in those without PE vs 20/80 (25%) in those with PE (P = 0.951). The median National Early Warning-2 (NEWS2) score (illness severity) was 5 (interquartile range [IQR] 3–9) in the PE group vs 4 (IQR 2–7) in those without PE (P = 0.133). D-dimer was higher in PE (median 8000 ng/mL; IQR 4665–8000 ng/mL) than non-PE (2060 ng/mL, IQR 1210–4410 ng/mL, P < 0.001). In the ‘low probability’ group, D-dimer was higher (P < 0.001) in those with PE but had a limited role in excluding PE. Conclusions: Even outside of the critical care environment, PE in hospitalised patients with COVID-19 is common. Of note, approaching half of PE events were diagnosed on hospital admission. More data are needed to identify an optimal diagnostic pathway in patients with COVID-19. Randomised controlled trials of intensified thromboprophylaxis are urgently needed.
Article
Background: Prostasin (PRSS8) is a stimulator of epithelial sodium transport. In this study, we evaluated alteration of prostasin expression in the inflamed mucosa of patients with inflammatory bowel disease (IBD) and investigated the role of prostasin in the gut inflammation. Methods: Prostasin expression was evaluated by immunohistochemical staining. Dextran sodium sulfate (DSS)-colitis was induced in mice lacking prostasin specifically in intestinal epithelial cells (PRSS8ΔIEC mice). Results: In colonic mucosa of healthy individuals, prostasin was strongly expressed at the apical surfaces of epithelial cells, and this was markedly decreased in active mucosa of both ulcerative colitis and Crohn's disease. DSS-colitis was exacerbated in PRSS8ΔIEC mice compared to control PRSS8lox/lox mice. Toll-like receptor 4 (TLR4) expression in colonic epithelial cells was stronger in DSS-treated PRSS8ΔIEC mice than in DSS-treated PRSS8lox/lox mice. NF-κB activation in colonic epithelial cells was more pronounced in DSS-treated PRSS8ΔIEC mice than in DSS-treated PRSS8lox/lox mice, and the mRNA expression of inflammatory cytokines was significantly higher in DSS-treated PRSS8ΔIEC mice. Broad-spectrum antibiotic treatment completely suppressed the exacerbation of DSS-colitis in PRSS8ΔIEC mice. The mRNA expression of tight junction proteins and mucosal permeability assessed using FITC-dextran were comparable between DSS-treated PRSS8lox/lox and DSS-treated PRSS8ΔIEC mice. Conclusion: Prostasin has an anti-inflammatory effect via downregulation of TLR4 expression in colonic epithelial cells. Reduced prostasin expression in IBD mucosa is linked to the deterioration of local anti-inflammatory activity and may contribute to the persistence of mucosal inflammation.
Article
Nearly all human complex traits and disease phenotypes exhibit some degree of sex differences, including differences in prevalence, age of onset, severity or disease progression. Until recently, the underlying genetic mechanisms of such sex differences have been largely unexplored. Advances in genomic technologies and analytical approaches are now enabling a deeper investigation into the effect of sex on human health traits. In this Review, we discuss recent insights into the genetic models and mechanisms that lead to sex differences in complex traits. This knowledge is critical for developing deeper insight into the fundamental biology of sex differences and disease processes, thus facilitating precision medicine.
Preprint
While many disease-associated variants have been identified through genome-wide association studies, their downstream molecular consequences remain unclear. To identify these effects, we performed cis- and trans-expression quantitative trait locus (eQTL) analysis in blood from 31,684 individuals through the eQTLGen Consortium. We observed that cis-eQTLs can be detected for 88% of the studied genes, but that they have a different genetic architecture compared to disease-associated variants, limiting our ability to use cis-eQTLs to pinpoint causal genes within susceptibility loci. In contrast, trans-eQTLs (detected for 37% of 10,317 studied trait-associated variants) were more informative. Multiple unlinked variants, associated to the same complex trait, often converged on trans-genes that are known to play central roles in disease etiology. We observed the same when ascertaining the effect of polygenic scores calculated for 1,263 genome-wide association study (GWAS) traits. Expression levels of 13% of the studied genes correlated with polygenic scores, and many resulting genes are known to drive these traits.
Article
The blood proteome in disease Understanding the function of human blood serum proteins in disease has been limited by difficulties in monitoring their production, accumulation, and distribution. Emilsson et al. investigated human serum proteins of more than 5000 Icelanders over the age of 65. The composition of blood serum includes a complex regulatory network of proteins that are globally coordinated across most or all tissues. The authors identified modules and functional groups associated with disease and health outcomes and were able to link genetic variants to complex diseases. Science, this issue p. 769
Article
We previously demonstrated that the WNT/β‐catenin pathway is present and active in platelets and established that the canonical WNT ligand, WNT‐3a, suppresses platelet adhesion and activation. In nucleated cells, β‐catenin, the key downstream effector of this pathway, is a dual function protein, regulating the coordination of gene transcription and cell–cell adhesion. The specific role of β‐catenin in the anucleate platelet however remains elusive. Here, we performed a label‐free quantitative proteomic analysis of β‐catenin immunoprecipitates from human platelets and identified 9 co‐immunoprecipitating proteins. Three of the co‐immunoprecipitating proteins (α‐catenin‐1, cadherin 6, and β‐catenin‐interacting protein 1) were common to both resting and activated conditions. Bioinformatics analysis of proteomics data revealed a strong association of our dataset with both cadherin adherens junctions and regulators of WNT signalling. We then verified that platelet β‐catenin and cadherin‐6 interact and that this interaction is regulated by the activation state of the platelet. Taken together our proteomics study suggests a novel role for β‐catenin in human platelets where it interacts with platelet cadherins and associated junctional proteins.
Article
Non-alcoholic fatty liver disease (NAFLD) is now recognised as the most common liver disease worldwide. It encompasses a broad spectrum of conditions, from simple steatosis, through non-alcoholic steatohepatitis, to fibrosis and ultimately cirrhosis and hepatocellular carcinoma. A hallmark of NAFLD is the substantial inter-patient variation in disease progression. NAFLD is considered a complex disease trait such that interactions between the environment and a susceptible polygenic host background determine disease phenotype and influence progression. Recent years have witnessed multiple genome-wide association and large candidate gene studies, which have enriched our understanding of the genetic basis of NAFLD. Notably, the I148M PNPLA3 variant has been identified as the major common genetic determinant of NAFLD. Variants with moderate effect size in TM6SF2, MBOAT7 and GCKR have also been shown to have a significant contribution. The premise for this review is to discuss the status of research into important genetic and epigenetic modifiers of NAFLD progression. The potential to translate the accumulating wealth of genetic data into the design of novel therapeutics and the clinical implementation of diagnostic/prognostic biomarkers will be explored. Finally, personalised medicine and the opportunities for future research and challenges in the immediate post genetics era will be illustrated and discussed.
Article
Target identification (determining the correct drug targets for a disease) and target validation (demonstrating an effect of target perturbation on disease biomarkers and disease end points) are important steps in drug development. Clinically relevant associations of variants in genes encoding drug targets model the effect of modifying the same targets pharmacologically. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies to an updated set of genes encoding druggable human proteins, to agents with bioactivity against these targets, and, where there were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, which will enable association studies of druggable genes for drug target selection and validation in human disease.
Article
Primary open-angle glaucoma (POAG), the most common optic neuropathy, is a heritable disease. Siblings of POAG cases have a ten-fold increased risk of developing the disease. Intraocular pressure (IOP) and optic nerve head characteristics are used clinically to predict POAG risk. We conducted a genome-wide association meta-analysis of IOP and optic disc parameters and validated our findings in multiple sets of POAG cases and controls. Using imputation to the 1000 Genomes (1000G) reference set, we identified 9 new genomic regions associated with vertical cup disc ratio (VCDR) and 1 new region associated with IOP. Additionally, we found 5 novel loci for optic nerve cup area and 6 for disc area. Previously it was assumed that genetic variation influenced POAG either through IOP or via changes to the optic nerve head; here we present evidence that some genomic regions affect both IOP and the disc parameters. We characterized the effect of the novel loci through pathway analysis and found that pathways involved are not entirely distinct as assumed so far. Further, we identified a novel association between CDKN1A and POAG. Using a zebrafish model we show that six6b (associated with POAG and optic nerve head variation) alters the expression of cdkn1a. In summary, we have identified several novel genes influencing the major clinical risk predictors of POAG and showed that genetic variation in CDKN1A is important in POAG risk.
Article
We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.