Figure - available from: Scientific Reports
This content is subject to copyright. Terms and conditions apply.
RNA-seq simulation results. Areas under the ROC curve (AUCs) of eight meta-analysis methods were compared across different numbers of associated studies and different proportions of significant genes. Of the 20 studies used in the meta-analysis, 2, 5, or 10 were designated as associated studies. Results with 10%, 30%, and 60% significant genes are shown in the first, second, and third rows, respectively.
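The AUC in this caption can be read as the probability that a truly significant gene receives a smaller combined p-value than a non-significant gene. A minimal sketch of that rank-based (Mann-Whitney) computation in Python, assuming each simulated gene has a label (1 = significant, 0 = null) and that the meta-analysis p-value is used as the ranking score; `auc_from_pvalues` is a hypothetical name, not code from the source publication:

```python
def auc_from_pvalues(pvalues, labels):
    """Rank-based AUC: fraction of (significant, null) gene pairs in which the
    significant gene has the smaller p-value; tied pairs count as 1/2."""
    pos = [p for p, y in zip(pvalues, labels) if y == 1]
    neg = [p for p, y in zip(pvalues, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one significant and one null gene")
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos < p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two significant genes and two null genes.
print(auc_from_pvalues([0.01, 0.2, 0.1, 0.5], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of genes; for genome-scale simulations a sort-based rank-sum version gives the same value faster.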

Source publication
Article
Meta-analyses increase statistical power by combining statistics from multiple studies. Meta-analysis methods have mostly been evaluated under the condition that all the data in each study have an association with the given phenotype. However, specific experimental conditions in each study or genetic heterogeneity can result in “unassociated statis...

Similar publications

Preprint
Autism spectrum disorder is a multifactorial neurodevelopmental disorder with high genetic heterogeneity. Studies of brain networks in autism can provide new insights into the dynamics of information processing in individuals who suffer from such a condition. This paper proposes a method for automatic diagnosis of autism based on fMRI time series a...
Article
Multiple myeloma (MM) is a complex hematologic malignancy characterized by the uncontrolled proliferation of clonal plasma cells in the bone marrow that secrete large amounts of immunoglobulins and other non-functional proteins. Despite decades of progress and several landmark therapeutic advancements, MM remains incurable in most cases. Standard o...
Article
Background Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder that is highly phenotypically and genetically heterogeneous. With the accumulation of biological sequencing data, more and more studies are shifting to a molecular subtype-first approach, from identifying molecular subtypes based on genetic and molecular data to linking molec...
Article
Background Until now, Mendelian randomization (MR) studies have investigated the causal association of risk factors with Alzheimer’s disease (AD) using large-scale AD genome-wide association studies (GWAS), GWAS by proxy (GWAX), and meta-analyses of GWAS and GWAX (GWAS+GWAX) datasets. However, it currently remains unclear about the consistency of M...
Article
Background Biofilms disperse in response to specific environmental cues, such as reduced oxygen concentration, changes in nutrient concentration and exposure to nitric oxide. Interestingly, biofilms do not completely disperse under these conditions, which is generally attributed to physiological heterogeneity of the biofilm. However, our results su...

Citations

... Chi-squared and Fisher's exact probability tests were used to analyze differences in qualitative data. The meta-analysis of TWAS statistics from the two discovery cohorts and the ADNI cohort was performed using the weighted Fisher's method (wFisher) in the metapro R package [49]. The p values from the TWAS analysis, the corresponding sample sizes, and the effect directions were input into this meta-analytical approach to calculate combined p values. ...
... The individual WGS data from the ADNI cohort (Table 1) underwent individual TWAS analysis using PrediXcan software, with the same reference as the TWAS summary analysis mentioned above, to predict the regulation of genes, including the SLC25 family genes. We next used the weighted Fisher's method (wFisher) to meta-analyze the TWAS statistics from the two discovery cohorts and the ADNI validation cohort, increasing the statistical power of the combined p values [49]. At a nominal p-value threshold of 0.01, three AD-associated genes, SLC25A10, SLC25A17, and SLC25A22, were identified as candidate genes for further gene-to-disease trait association analysis (Fig. 2 and Table 2). ...
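The wFisher combination described here can be sketched as follows, assuming the formulation attributed to Yoon et al. (2021) on this page: each p-value is mapped to an upper-tail Gamma quantile with scale 2 and shape w_i = m·n_i/n (proportional to the study's sample size), so the shapes sum to the number of studies m and the total is compared with a Gamma(m, 2) null. This is a minimal one-tailed Python/SciPy sketch, not metapro's code; the source also feeds effect directions into the method, which is omitted here, and `weighted_fisher` and the example numbers are hypothetical.

```python
import numpy as np
from scipy.stats import gamma


def weighted_fisher(pvalues, sample_sizes):
    """One-tailed weighted Fisher combination with sample-size weights.

    Shapes w_i = m * n_i / sum(n) sum to m, so under the null (independent
    studies) the statistic follows Gamma(shape=m, scale=2).
    """
    p = np.asarray(pvalues, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    m = p.size
    weights = m * n / n.sum()                       # normalized: sum(weights) == m
    stat = gamma.isf(p, a=weights, scale=2.0).sum() # upper-tail Gamma quantiles
    return float(gamma.sf(stat, a=m, scale=2.0))


# Hypothetical p-values for one gene in three cohorts, with their sample sizes.
print(weighted_fisher([0.004, 0.03, 0.2], [5000, 3000, 800]))
```

With equal sample sizes every shape equals 1, each term reduces to −2 ln p_i, and the result coincides with the unweighted Fisher's method.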
Article
The etiopathogenesis of late-onset Alzheimer’s disease (AD) is increasingly recognized as the result of the combination of the aging process, toxic proteins, brain dysmetabolism, and genetic risks. Although the role of mitochondrial dysfunction in the pathogenesis of AD has been well-appreciated, the interaction between mitochondrial function and genetic variability in promoting dementia is still poorly understood. In this study, by tissue-specific transcriptome-wide association study (TWAS) and further meta-analysis, we examined the genetic association between mitochondrial solute carrier family (SLC25) genes and AD in three independent cohorts and identified three AD-susceptibility genes, including SLC25A10, SLC25A17, and SLC25A22. Integrative analysis using neuroimaging data and hippocampal TWAS-predicted gene expression of the three susceptibility genes showed an inverse correlation of SLC25A22 with hippocampal atrophy rate in AD patients, which outweighed the impacts of sex, age, and apolipoprotein E4 (ApoE4). Furthermore, SLC25A22 downregulation demonstrated an association with AD onset, as compared with the other two transcriptome-wide significant genes. Pathway and network analysis related hippocampal SLC25A22 downregulation to defects in neuronal function and development, echoing the enrichment of SLC25A22 expression in human glutamatergic neurons. The most parsimonious interpretation of the results is that we have identified AD-susceptibility genes in the SLC25 family through the prediction of hippocampal gene expression. Moreover, our findings mechanistically yield insight into the mitochondrial cascade hypothesis of AD and pave the way for the future development of diagnostic tools for the early prevention of AD from a perspective of precision medicine by targeting the mitochondria-related genes.
... Yoon and colleagues reported that robust p-value combination approaches capable of detecting incomplete associations outperform most current approaches, including effect-size aggregation [16]. Raw p values for each disease-specific dataset, as derived with DESeq2, were loaded and meta-analyzed with Fisher's combined probability test using the MetaVolcanoR package (v1.14.0). ...
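Fisher's combined probability test mentioned here has a simple closed form: X = −2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom under the null for k independent p-values. A minimal Python sketch of that textbook statistic, assuming independent datasets; it is not MetaVolcanoR's code, and the p-values are made up. SciPy also offers this test as `scipy.stats.combine_pvalues(..., method='fisher')`.

```python
import math

from scipy.stats import chi2


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-squared with 2k df under H0."""
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return float(chi2.sf(stat, df=2 * k))


# Hypothetical DESeq2 p-values for one gene in three disease-specific datasets.
print(fisher_combined([0.03, 0.04, 0.6]))
```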
Article
Despite the abundance of epidemiological evidence for the high comorbid rate between psoriasis and obesity, systematic approaches to common inflammatory mechanisms have not been adequately explored. We performed a meta-analysis of publicly available RNA-sequencing datasets to unveil putative mechanisms that are postulated to exacerbate both diseases, utilizing both late-stage, disease-specific meta-analyses and consensus gene co-expression network (cWGCNA). Single-gene meta-analyses reported several common inflammatory mechanisms fostered by the perturbed expression profile of inflammatory cells. Assessment of gene overlaps between both diseases revealed significant overlaps between up- (n = 170, P value = 6.07 × 10⁻⁶⁵) and down-regulated (n = 49, P value = 7.1 × 10⁻⁷) genes, associated with increased T cell response and activated transcription factors. Our cWGCNA approach disentangled 48 consensus modules, associated with either the differentiation of leukocytes or metabolic pathways with similar correlation signals in both diseases. Notably, all our analyses confirmed the association of the perturbed T helper (Th)17 differentiation pathway in both diseases. Our novel findings through whole transcriptomic analyses characterize the inflammatory commonalities between psoriasis and obesity implying the assessment of several expression profiles that could serve as putative comorbid disease progression biomarkers and therapeutic interventions.
... Lancaster (1961) [3] suggested using the sum of chi-squared quantiles, $\sum_{i=1}^{m} \chi^2_{d_i}(1-P_i)$, where $\chi^2_{d_i}(1-P_i)$ is the $(1-P_i)$ quantile of the chi-squared distribution with $d_i$ degrees of freedom, and $d_i$ could correspond to the sample size of the $i$th study. Reference [4] proposed a method based on the Gamma transform, $\sum_{i=1}^{m} G^{-1}(1-P_i, 1/P_i, 1)$, where $G^{-1}(1-P_i, \xi, \eta)$ is the $(1-P_i)$ quantile of the Gamma distribution with shape parameter $\xi$ and scale parameter $\eta$ (here $\xi = 1/P_i$ and $\eta = 1$). Yoon et al. (2021) [5] developed the statistic $\sum_{i=1}^{m} G^{-1}\left(1-P_i, \frac{m n_i}{n}, 2\right)$, where $n_1, \dots, n_m$ are the sample sizes of the $m$ studies and $n = \sum_{i=1}^{m} n_i$. Zhang and Wu (2022) [6] constructed a general Fisher-type statistic, $\sum_{i=1}^{m} w_i \chi^2_{d_i}(1-P_i)$. ...
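Lancaster's statistic can be computed directly from chi-squared quantiles. Each term is the upper-tail quantile chi2_{d_i}(1 − P_i), so for independent studies the sum is chi-squared with Σ d_i degrees of freedom under the null. A minimal Python/SciPy sketch; `lancaster` is a hypothetical name and the degrees of freedom in the example are illustrative, not values from any cited study:

```python
import numpy as np
from scipy.stats import chi2


def lancaster(pvalues, dfs):
    """Lancaster's combination: T = sum_i chi2_{d_i}^{-1}(1 - P_i).

    For independent studies, T ~ chi-squared with sum(d_i) df under H0.
    """
    p = np.asarray(pvalues, dtype=float)
    d = np.asarray(dfs, dtype=float)
    stat = chi2.isf(p, df=d).sum()   # (1 - P_i) quantile == inverse survival at P_i
    return float(chi2.sf(stat, df=d.sum()))


# Hypothetical p-values with degrees of freedom set to study sample sizes.
print(lancaster([0.01, 0.2, 0.5], [20, 35, 50]))
```

Setting every d_i = 2 makes each term −2 ln P_i, so the method reduces to Fisher's; larger d_i give a study more influence on the combined result.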
Article
Combining p-values is a well-known issue in statistical inference. When faced with a study involving m p-values, determining how to effectively combine them to arrive at a comprehensive and reliable conclusion becomes a significant concern in various fields, including genetics, genomics, and economics, among others. The literature offers a range of combination strategies tailored to different research objectives and data characteristics. In this work, we aim to provide users with a systematic exploration of the p-value combination problem. We present theoretical results for combining p-values using a logarithmic transformation, which highlight the benefits of this approach. Additionally, we propose a combination strategy based on the golden-section method, establish its statistical properties, and showcase its performance through extensive computer simulations. To further illustrate its effectiveness, we apply this approach to a real-world scenario.
... To overcome the variability of sample-specific transcriptomes, the meta-p-values for each stress, tissue and timepoint were computed with the generalized weighted Fisher's method with sample-size correction (66). This method was selected according to the meta-analysis decision scheme proposed in (24), considering the source data (different platforms) and the heterogeneity of the dataset. ...
Preprint
Understanding how plants adapt their physiology to overcome severe stress conditions is vital in light of the current climate crisis. This remains a challenge given the complex nature of the underlying molecular mechanisms. To provide a full picture of stress mitigation mechanisms, an exhaustive analysis of publicly available stress-related transcriptomic data was conducted. We combined a meta-analysis with an unsupervised machine learning algorithm to identify a core of stress-related genes. To ensure robustness and biological significance of the output, often lacking in meta-analyses, a three-layered biovalidation was incorporated. Our results present a ‘stress gene core’, a set of key genes involved in plant tolerance to a multitude of adverse environmental conditions rather than specific ones. In addition, we provide a biologically validated database to assist in design of multi-stress resilience. Taken together, our results pave the way towards future-proof sustainable agriculture. Teaser: Using a machine learning-driven meta-analysis, a plant ‘stress gene core’ was identified as a hub mediating multi-stress regulation.
... For the current study, a novel cross-omic meta-analysis was conducted by combining the DGE and gene-based GWAS p-values using Fisher's method as implemented in the R package metapro 48. This DGE-GWAS meta-analysis was performed to detect loci that may not be robustly significant in a single domain (DGE or GWAS) but for which the combined evidence from the independent analyses is robust. ...
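With exactly two sources (DGE and GWAS), Fisher's statistic −2 ln(p_DGE · p_GWAS) has 4 degrees of freedom, and the combined p-value has the closed form q(1 − ln q) with q = p_DGE · p_GWAS. A stdlib-only Python sketch, assuming independent gene-level p-values from the two domains; it does not call metapro, and the gene names and values are placeholders:

```python
import math


def fisher_two(p1, p2):
    """Fisher's method for exactly two independent p-values.

    X = -2 ln(p1 * p2) is chi-squared with 4 df; its survival function is
    exp(-x/2) * (1 + x/2), which equals q * (1 - ln q) for q = p1 * p2.
    """
    q = p1 * p2
    return q * (1.0 - math.log(q))


def cross_omic_meta(dge_p, gwas_p):
    """Combine per-gene p-values from two domains; keep genes present in both."""
    shared = sorted(set(dge_p) & set(gwas_p))
    return {gene: fisher_two(dge_p[gene], gwas_p[gene]) for gene in shared}


# Hypothetical gene-level p-values (placeholder gene names).
dge = {"GENE_A": 0.01, "GENE_B": 0.30, "GENE_C": 0.04}
gwas = {"GENE_A": 0.02, "GENE_C": 0.05, "GENE_D": 0.001}
print(cross_omic_meta(dge, gwas))
```

Genes present in only one domain (GENE_B and GENE_D here) are dropped rather than imputed; the source does not state how such genes were handled.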
Preprint
Genes influencing opioid use disorder (OUD) biology have been identified via genome-wide association studies (GWAS), gene expression, and network analyses. These discoveries provide opportunities to identify existing compounds targeting these genes for drug repurposing studies. However, systematically integrating discovery results and identifying relevant available pharmacotherapies for OUD repurposing studies is challenging. To address this, we constructed a framework that leverages existing results and drug databases to identify candidate pharmacotherapies. For this study, two independent OUD-related meta-analyses were used: a GWAS and a differential gene expression (DGE) study of post-mortem human brain. Protein-Protein Interaction (PPI) sub-networks enriched for GWAS risk loci were identified via network analyses. The drug databases Pharos, Open Targets, Therapeutic Target Database (TTD), and DrugBank were queried for clinical status and target selectivity. Cross-omic and drug query results were then integrated to identify candidate compounds. GWAS and DGE analyses revealed 3 and 335 target genes (FDR q<0.05), respectively, while network analysis detected 70 genes in 22 enriched PPI networks. Four selection strategies with different statistical thresholds were implemented, which yielded between 72 and 676 genes with statistically significant support and 110 to 683 drugs targeting these genes, respectively. After filtering out less specific compounds or those targeting well-established psychiatric-related receptors (OPRM1 and DRD2), between 2 and 329 approved drugs remained across the four strategies. By leveraging multiple lines of biological evidence and resources, we identified many FDA-approved drugs that target genes associated with OUD.
This approach a) allows high-throughput querying of OUD-related genes, b) detects OUD-related genes and compounds not identified using a single domain or resource, and c) produces a succinct summary of FDA approved compounds eligible for efficient expert review. Identifying larger pools of candidate pharmacotherapies and summarizing the supporting evidence bridges the gap between discovery and drug repurposing studies.
... Afterward, the meta-analysis was performed with the Amanida method. Amanida uses a combination of weighted p values 25, a modification of Fisher's method 26, to evaluate the significance of a statistical result from its p value. The gamma distribution is used to assign non-integer weights, proportional to study size, to each p value. ...
Article
Primary glomerulonephritis diseases (PGDs) are known as the top causes of chronic kidney disease worldwide. Renal biopsy, an invasive method, is the main approach to diagnose PGDs. Studying the metabolome profiles of kidney diseases is an inclusive approach to identify the disease’s underlying pathways and discover novel non-invasive biomarkers. So far, different experiments have explored the metabolome profiles in different PGDs, but the inconsistencies might hinder their clinical translation. The main goal of this meta-analysis study was to achieve consensus panels of dysregulated metabolites in PGD sub-types. The PGD-related metabolome profiles from human urine samples were selected in a comprehensive search. The Amanida package in R was used to perform the meta-analysis. Through sub-type analyses, the consensus list of metabolites in each category was obtained. To identify the most affected pathways, functional enrichment analysis was performed. Also, a gene-metabolite network was constructed to identify the key metabolites and their connected proteins. After a rigorous search, among the 11 selected studies (15 metabolite profiles), 270 dysregulated metabolites were recognized in the urine of 1154 PGD and control samples. Through sub-type analyses by the Amanida package, the consensus list of metabolites in each category was obtained. Top dysregulated metabolites (vote score ≥ 4 or ≤ −4) in PGD urine were selected as the main panel of meta-metabolites, including glucose, leucine, choline, betaine, dimethylamine, fumaric acid, citric acid, 3-hydroxyisovaleric acid, pyruvic acid, isobutyric acid, and hippuric acid. The enrichment analyses revealed the involvement of different biological pathways, such as the TCA cycle and amino acid metabolism, in the pathogenesis of PGDs. The constructed metabolite-gene interaction network revealed high centralities for several metabolites, including pyruvic acid, leucine, and choline.
The identified metabolite panels could shed light on the underlying pathological pathways and be considered as non-invasive biomarkers for the diagnosis of PGD sub-types.
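The vote-score cut-off of ≥ 4 or ≤ −4 mentioned above can be illustrated with simple vote counting. A minimal sketch, assuming each study casts +1 for a metabolite it reports as up-regulated and −1 for one it reports as down-regulated, and that the vote score is the sum over studies; this is a reading of vote counting in general, not the Amanida package's code, and `vote_scores`, `consensus_panel`, and the metabolite labels are hypothetical:

```python
from collections import defaultdict


def vote_scores(study_reports):
    """Sum +1 (up) / -1 (down) votes per metabolite across studies.

    study_reports: list of dicts mapping metabolite -> "up" or "down".
    """
    scores = defaultdict(int)
    for report in study_reports:
        for metabolite, direction in report.items():
            if direction == "up":
                scores[metabolite] += 1
            elif direction == "down":
                scores[metabolite] -= 1
            else:
                raise ValueError(f"unknown direction {direction!r} for {metabolite}")
    return dict(scores)


def consensus_panel(scores, threshold=4):
    """Keep metabolites whose vote score is >= threshold or <= -threshold."""
    return sorted(m for m, s in scores.items() if abs(s) >= threshold)


# Hypothetical reports from five studies (placeholder metabolite labels).
reports = [{"M1": "up", "M2": "down", "M3": "up"}] * 3 + [{"M1": "up", "M2": "down"}, {"M1": "up"}]
print(consensus_panel(vote_scores(reports)))
```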
... The Fisher's combined probability test, which was used to assess the utility of using a combination of parameters, has traditionally been used in meta-analysis to combine data from independent studies to evaluate the statistical power of a hypothesis [42]. The same methodology has been adopted to combine the data from the different modalities in our experiment to assess the statistical power of a combined analysis. ...
Article
Background Technologies for quick and label-free diagnosis of malignancies from breast tissues have the potential to be a significant adjunct to routine diagnostics. The biophysical phenotypes of breast tissues, such as their electrical, thermal, and mechanical properties (ETM), have the potential to serve as novel markers to differentiate between normal, benign, and malignant tissue. Results We report a system-of-biochips (SoB) integrated into a semi-automated mechatronic system that can characterize breast biopsy tissues using electro-thermo-mechanical sensing. The SoB, fabricated on silicon using microfabrication techniques, can measure the electrical impedance (Z), thermal conductivity (K), mechanical stiffness (k), and viscoelastic stress relaxation (%R) of the samples. The key sensing elements of the biochips include interdigitated electrodes, resistance temperature detectors, microheaters, and a micromachined diaphragm with piezoresistive bridges. Multi-modal ETM measurements performed on formalin-fixed tumour and adjacent normal breast biopsy samples from N = 14 subjects were able to differentiate between invasive ductal carcinoma (malignant), fibroadenoma (benign), and adjacent normal (healthy) tissues with a root mean square error of 0.2419 using a Gaussian process classifier. Carcinoma tissues were observed to have the highest mean impedance (110018.8 ± 20293.8 Ω) and stiffness (0.076 ± 0.009 kNm⁻¹) and the lowest thermal conductivity (0.189 ± 0.019 Wm⁻¹ K⁻¹) amongst the three groups, while the fibroadenoma samples had the highest percentage relaxation in normalized load (47.8 ± 5.12%). Conclusions The work presents a novel strategy to characterize the multi-modal biophysical phenotype of breast biopsy tissues to aid in cancer diagnosis from small-sized tumour samples. The methodology aims to fill the existing technology gap in the analysis of breast tissue samples in pathology laboratories and to support the diagnostic workflow.
... Here, we used a Bonferroni threshold for combining the p-values to obtain comparable power for the univariate and multivariate techniques. However, there are other methods for combining p-values that can detect partial association, such as the Fisher and weighted Fisher methods, which are more relevant in the case of many comparisons 27. ...
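The trade-off between a Bonferroni combination and Fisher's method can be illustrated on two hypothetical patterns of three p-values: consistent moderate evidence versus a single strong signal. A Python sketch assuming independent tests; the Bonferroni combination here is the global-null p-value min(1, m · min p_i), and the p-values are made up for illustration, not taken from the study:

```python
import math

from scipy.stats import chi2


def bonferroni_combined(pvalues):
    """Global-null p-value from the smallest of m p-values: min(1, m * min p_i)."""
    return min(1.0, len(pvalues) * min(pvalues))


def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(ln p_i) ~ chi-squared with 2m df under H0."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return float(chi2.sf(stat, df=2 * len(pvalues)))


# Hypothetical per-modality p-values for one voxel.
patterns = {
    "weak_everywhere": [0.04, 0.05, 0.06],
    "one_strong": [0.001, 0.8, 0.9],
}
for name, ps in patterns.items():
    print(name, bonferroni_combined(ps), fisher_combined(ps))
```

For the consistent pattern Fisher's combined p-value is the smaller of the two, while for the single strong signal the ordering reverses: Fisher-type methods pool evidence spread over several tests, whereas the Bonferroni combination is driven entirely by the smallest p-value.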
Preprint
This study utilized multivariate ANOVA analysis to investigate age-related microstructural changes in the brain tissues driven primarily by myelin, iron, and water content. Voxel-wise analyses were performed on gray matter (GM) and white matter (WM), in addition to region of interest (ROI) analyses. The multivariate approach identified brain regions showing coordinated alterations in multiple tissue properties and demonstrated bidirectional correlations between age and all examined modalities in various brain regions, including the caudate nucleus, putamen, insula, cerebellum, lingual gyri, hippocampus, and olfactory bulb. The multivariate model was more sensitive than univariate analyses as evidenced by detecting a larger number of significant voxels within clusters in the supplementary motor area, frontal cortex, hippocampus, amygdala, occipital cortex, and cerebellum bilaterally. The examination of normalized, smoothed, and z-transformed maps within the ROIs revealed age-dependent differences in myelin, iron, and water content. These findings contribute to our understanding of age-related brain differences and provide insights into the underlying mechanisms of aging. The study emphasizes the importance of multivariate analysis for detecting subtle microstructural changes associated with aging that may motivate interventions to mitigate cognitive decline in older adults.
... Since differential expression effects due to age were identified, subsequent differential gene expression (DGE) analyses included age as a factor when fitting the linear model. To further explore the effects of age on endometrial gene expression, n = 87 subjects from GSE141549 were also analysed, and a meta-analysis was performed using the weighted Fisher's method for combining p-values implemented in the metapro R package 39, with weights proportional to each study's sample size. Ensembl IDs from the current data were matched with the Illumina probe IDs from GSE141549, resulting in 12,868 genes common to the two data sets. ...
Article
Natural variability in menstrual cycle length, coupled with rapid changes in endometrial gene expression, makes it difficult to accurately define and compare different stages of the endometrial cycle. Here we develop and validate a method for precisely determining endometrial cycle stage based on global gene expression. Our ‘molecular staging model’ reveals significant and remarkably synchronised daily changes in expression for over 3400 endometrial genes throughout the cycle, with the most dramatic changes occurring during the secretory phase. Our study significantly extends existing data on the endometrial transcriptome, and for the first time enables identification of differentially expressed endometrial genes with increasing age and different ethnicities. It also allows reinterpretation of all endometrial RNA-seq and array data that has been published to date. Our molecular staging model will significantly advance understanding of endometrial-related disorders that affect nearly all women at some stage of their lives, such as heavy menstrual bleeding, endometriosis, adenomyosis, and recurrent implantation failure.
... or other scores or ranks to a given DMR is often useful in order to narrow down focus to specific regions that may have biological significance. While a number of approaches are available to aggregate consecutive p values 34, the use of test statistics that contribute to p values in identifying a given region, together with the lack of independence of methylation calls across nearby cytosines, makes any p value associated with these changepoint-identified regions dubious. As an alternative, we developed a score that integrates the size of a DMR and the mean Z-statistic of the cytosines within it (see Methods, Data analysis). ...
Article
Epigenetic variation in plant populations is an important factor in determining phenotype and adaptation to the environment. However, while advances have been made in the molecular and computational methods to analyze the methylation status of a given sample of DNA, tools to profile and compare the methylomes of multiple individual plants or groups of plants at high resolution and low cost are lacking. Here, we describe a computational approach and R package (sounDMR) that leverages the benefits of long read nanopore sequencing to enable robust identification of differential methylation from complex experimental designs, as well as assess the variability within treatment groups and identify individual plants of interest. We demonstrate the utility of this approach by profiling a population of Arabidopsis thaliana exposed to a demethylating agent and identify genomic regions of high epigenetic variability between individuals. Given the low cost of nanopore sequencing devices and the ease of sample preparation, these results show that high resolution epigenetic profiling of plant populations can be made more broadly accessible in plant breeding and biotechnology.