Figure 9 - uploaded by Mitra Ebrahimpoor
Shiny App: Active Volcano Plot. A classic volcano plot is made by selecting P = 0.0008 (3.1 in log-scale) and τ = 1. The table above the plot presents the mFDP estimate along with the corresponding bounds for the selected features.
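The log-scale value quoted in the caption can be checked by hand. This assumes, as is conventional for volcano plots, that the vertical axis shows the negative base-10 logarithm of the P-value:

```latex
-\log_{10}(0.0008) = 4 - \log_{10}(8) \approx 4 - 0.903 = 3.097 \approx 3.1
```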

Source publication
Article
Full-text available
Motivation Volcano plots are used to select the most interesting discoveries when too many discoveries remain after application of Benjamini–Hochberg’s procedure (BH). The volcano plot suggests a double filtering procedure that selects features with both small adjusted $P$-value and large estimated effect size. Despite its popularity, this type of...

Context in source publication

Context 1
... that the BH procedure at level 0.045 was applied; however, the classic volcano plot selection resulted in an FDP of 0.17. The screenshot in Figure 9 shows the same volcano plot, where the median estimate mFDP is 0.12, much closer to the expected value, and the bounds for the FDP are (0, 0.76), which include the true value of the FDP. ...

Similar publications

Preprint
Full-text available
I respond to Goeman’s (2022) review of my recent article: “When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing” (Rubin, 2021b). Goeman argues that I misdefine studywise error rates as only entailing weak error control. He argues that studywise error rates should entail strong error contr...

Citations

... The utilization of a volcano plot, a type of scatterplot, has been employed as a means of visualizing the dissimilarities in gene expression between two distinct groups [14]. The x-axis represents the log2 fold change, while the y-axis depicts the -log10 (p-value). ...
Article
Full-text available
Background Anti-melanoma differentiation-associated gene five antibody positive (MDA5⁺) dermatomyositis (DM) is significantly associated with rapidly progressive interstitial lung disease (RP-ILD). Early detection of RP-ILD remains a major challenge. This study aims to identify and validate prognostic factors for RP-ILD in MDA5⁺ DM patients. Methods Plasma samples from 20 MDA5⁺ DM patients and 10 healthy controls (HC) were collected for proteomic analysis using liquid chromatography-tandem mass spectrometry (LC–MS/MS) analysis. The proteins of interest were validated in independent samples (20 HC, 20 MDA5⁺ DM with RP-ILD, and 20 non-RP-ILD patients) with enzyme-linked immunosorbent assay (ELISA). Results A total of 413 differentially expressed proteins (DEPs) were detected between the MDA5⁺ DM patients and HC. When comparing DEPs between RP-ILD and non-RP-ILD patients, 79 proteins were changed in RP-ILD patients, implicating acute inflammatory response, coagulation, and complement cascades. Six candidate biomarkers were confirmed with ELISA. Secreted phosphoprotein 1 (SPP1), serum amyloid A1 (SAA1), and Kininogen 1 (KNG1) concentrations were significantly elevated in RP-ILD patients than those in non-RP-ILD patients and HC. In the different clinical subgroups, SPP1 was particularly elevated in the high-risk RP-ILD subgroup of MDA5⁺ DM. Conclusion This study provides novel insights into the pathogenesis of RP-ILD development in MDA5⁺ DM and suggests the plasma protein SPP1 could serve as a potential blood biomarker for RP-ILD early warning.
... A volcano plot is a type of scatter plot used to explore the most interesting genes within large datasets. Typically, the x-axis of a volcano plot represents the log2 of the fold change (FC), and the y-axis represents the -log10 of the adjusted P-value; selecting on both is called the "double filtering" criterion [24,25]. As depicted in Figure 3 ... Therefore, researchers should be aware that while exploratory data analysis is useful for an initial overview, it does not provide sufficient information to derive meaningful conclusions. ...
... The statistically principled way is to formally test the null hypothesis $|\log_2 \mathrm{FC}| \le t_{\mathrm{null}}$, where $t_{\mathrm{null}}$ is the chosen minimum significant threshold [32]. However, many practitioners instead use an alternative approach that we will designate as post hoc thresholding (also called double filtering in [33]). In this approach, only the null hypothesis of a zero fold change is formally tested, followed by a filtering step in which the significant genes whose estimated fold change is below a chosen cutoff are removed. ...
Preprint
Full-text available
The high-dimensional and heterogeneous nature of transcriptomics data from RNA sequencing (RNA-Seq) experiments poses a challenge to routine down-stream analysis steps, such as differential expression analysis and enrichment analysis. Additionally, due to practical and financial constraints, RNA-Seq experiments are often limited to a small number of biological replicates; three replicates is a commonly employed minimum cohort size. In light of recent studies on the low replicability of preclinical cancer research, it is essential to understand how the combination of population heterogeneity and underpowered cohort sizes affects the replicability of RNA-Seq research. Using 7’000 simulated RNA-Seq experiments based on real gene expression data from seven different cancer types, we find that the analysis results from underpowered experiments exhibit inflated effect sizes and are unlikely to replicate well. However, the ground-truth results obtained by analyzing large cohorts show that the precision of differentially expressed genes can be high even for small cohort sizes. The poor replicability of underpowered experiments is thus a direct consequence of their low recall (sensitivity). In other words, the low replicability of underpowered RNA-Seq cancer studies does not necessarily indicate a high prevalence of false positives. Instead, the results obtained from such studies are limited to small and mostly random subsets of a larger ground truth. We conclude with a set of practical recommendations to alleviate problems with underpowered RNA-Seq studies.
Author Summary Transcriptomics data from RNA sequencing (RNA-Seq) experiments are complex and challenging to analyze due to their high dimensionality and variability. These experiments often involve limited biological replicates due to practical and financial constraints. Recent concerns about the replicability of cancer research highlight the need to explore how this combination of limited cohort sizes and population heterogeneity impacts the reliability of RNA-Seq studies. To investigate these issues, we conducted 7’000 simulated RNA-Seq experiments based on real gene expression data from seven different cancer types. We show that experiments with small cohort sizes tend to produce results with exaggerated effects that can be difficult to replicate. We further found that while underpowered studies with few replicates indeed lead to little-replicable results, the identified differentially expressed genes are reliable as shown by low rates of false positives. Each underpowered study thus discovers a small subset of the ground truth. Our study concludes with practical recommendations for RNA-Seq studies with small cohort sizes.
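The post hoc thresholding ("double filtering") described in the excerpts above can be sketched in a few lines. This is a minimal illustration, not code from any of the cited papers: the function names `bh_adjust` and `double_filter`, the default level `alpha=0.05`, and the cutoff `fc_cutoff=1.0` are all assumptions chosen for the example.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted p-values
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

def double_filter(pvals, log2_fc, alpha=0.05, fc_cutoff=1.0):
    """Boolean mask of features passing both the BH and fold-change filters.

    Hypothetical helper: alpha and fc_cutoff are illustrative defaults.
    """
    significant = bh_adjust(pvals) <= alpha
    large_effect = np.abs(np.asarray(log2_fc, dtype=float)) >= fc_cutoff
    return significant & large_effect

# Toy data (made up for illustration only)
pvals = [0.001, 0.01, 0.03, 0.5]
log2_fc = [2.0, 0.3, -1.5, 3.0]
mask = double_filter(pvals, log2_fc)
```

Only the zero-fold-change null hypothesis is tested here; the fold-change cutoff is applied afterwards as a plain filter, which is exactly the step the excerpts report as inflating the false discovery rate of the final selection.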
... likely to be identified as DEGs by the two methods from the permuted datasets ( Fig. 1D and Additional file 1: Fig. S2). This finding is consistent with a recent paper, which also reported that selecting the genes with the largest estimated differences between the two conditions would inflate the FDR [12]. As biologists tend to believe that these large-foldchange genes are more likely true DEGs (which is not necessarily true because a dataset may contain no true DEGs at all), the fact that these genes are false positives would likely waste experimental validation efforts. ...
Article
Full-text available
When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found a phenomenon by permutation analysis: two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates. Expanding the analysis to limma-voom, NOISeq, dearseq, and Wilcoxon rank-sum test, we found that FDR control is often failed except for the Wilcoxon rank-sum test. Particularly, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, for population-level RNA-seq studies with large sample sizes, we recommend the Wilcoxon rank-sum test.
... However, a common practice is to manually curate this list by adding or subtracting genes, based on some external or priori knowledge (such as the knowledge of gene sets or pathways). A typical example is the case of volcano plots (Cui and Churchill, 2003), where one selects those genes passing both a significance threshold and a threshold on the fold change (difference of average gene expression on the log scale), see Figure 3. Ebrahimpoor and Goeman (2021) have recently shown in an extensive simulation study that this type of double filtering strategy yields inflated FDRs. ...
... In practice, Simes post hoc bounds are recommended in Goeman et al. (2019), as they are valid under the PRDS assumption and can be calculated efficiently. Simes post hoc bounds have recently been popularized in genomics by Ebrahimpoor and Goeman (2021) but also in neuroimaging studies by Rosenblatt et al. (2018), where this approach has been called 'All-resolutions inference' (ARI). ...
... As noted by Ebrahimpoor and Goeman (2021), post hoc inference makes it possible to select genes of interest based on both fold change and significance, without compromising the validity of the corresponding bounds. Moreover, even if Wilcoxon tests have been performed for the calibration of the post hoc bounds, our proposed post hoc bounds are still valid when relying on other statistics for the selection genes of interest. ...
Article
Motivation: The standard approach for statistical inference in differential expression (DE) analyses is to control the False Discovery Rate (FDR). However, controlling the FDR does not in fact imply that the proportion of false discoveries is upper bounded. Moreover, no statistical guarantee can be given on subsets of genes selected by FDR thresholding. These known limitations are overcome by post hoc inference, which provides guarantees on the number or proportion of false discoveries among arbitrary gene selections. However, post hoc inference methods are not yet widely used for DE studies. Results: In this paper, we demonstrate the relevance and illustrate the performance of adaptive interpolation-based post hoc methods for two-group DE studies. First, we formalize the use of permutation-based methods to obtain sharp confidence bounds that are adaptive to the dependence between genes. Then, we introduce a generic linear time algorithm for computing post hoc bounds, making these bounds applicable to large-scale two-group DE studies. The use of the resulting Adaptive Simes bound is illustrated on an RNA sequencing study. Comprehensive numerical experiments based on real microarray and RNA sequencing data demonstrate the statistical performance of the method. Availability and implementation: A cross-platform open source implementation within the R package sanssouci is available at https://sanssouci-org.github.io/sanssouci/. Supplementary information: Supplementary data are available at Bioinformatics online. Rmarkdown vignettes for the differential analysis of microarray and RNAseq data are available from the package.
... Applying a cut-off on the fold-change is a common practice in proteomics, however, if improperly achieved, it can hinder the statistical rigor. Notably, it has been demonstrated that applying such a cut-off downstream of the FDR is invalid [10]. It remains possible to incorporate the fold-change cut-off in the statistical test [11], but the resulting p-value interpretation can be a bit more difficult. ...
Preprint
Full-text available
In discovery proteomics, as well as many other "omic" approaches, more than two biological conditions or group treatments can be compared in the one-way Analysis of Variance (OW-ANOVA) framework. The subsequent possibility to test for the differential abundance of hundreds (or thousands) of features simultaneously is appealing, despite requiring specific statistical safeguards, among which controlling for the False Discovery Rate (FDR) has become standard. However, using an FDR control procedure in a context where other multiple test corrections may also be involved because of the number of compared groups raises practical difficulties. This article surveys means to circumvent them, thanks to various practical data processing scenarios.
... However, a common practice is to manually curate this list by adding or subtracting genes, based on some external or priori knowledge (such as the knowledge of gene sets or pathways). A typical example is the case of volcano plots (Cui and Churchill, 2003), where one selects those genes passing both a significance threshold and a threshold on the fold change (difference of average gene expression on the log scale), see Figure 3. Ebrahimpoor and Goeman (2021) have recently shown in an extensive simulation study that this type of double filtering strategy yields inflated false discovery rates. ...
... In practice, Simes post hoc bounds are recommended in Goeman et al. (2019), as they are valid under the PRDS assumption and can be calculated efficiently. Simes post hoc bounds have recently been popularized in genomics by Ebrahimpoor and Goeman (2021), but also in neuroimaging studies by Rosenblatt et al. (2018), where this approach has been called "All-resolutions inference" (ARI). ...
... As noted by Ebrahimpoor and Goeman (2021), post hoc inference makes it possible to select genes of interest based on both fold change and significance, without compromising the validity of the corresponding bounds. Moreover, even if Wilcoxon tests have been performed for the calibration of the post hoc bounds, it is possible to rely on other statistics for the selection genes of interest. ...
Preprint
Full-text available
Motivation The standard approach for statistical inference in differential expression (DE) analyses is to control the False Discovery Rate (FDR). However, controlling the FDR does not in fact imply that the proportion of false discoveries is upper bounded. Moreover, no statistical guarantee can be given on subsets of genes selected by FDR thresholding. These known limitations are overcome by post hoc inference, which provides guarantees on the number or proportion of false discoveries among arbitrary gene selections. However, post hoc inference methods are not yet widely used for DE studies. Results In this paper, we demonstrate the relevance and illustrate the performance of adaptive interpolation-based post hoc methods for DE studies. First, we formalize the use of permutation-based methods to obtain sharp confidence bounds that are adaptive to the dependence between genes. Then, we introduce a generic linear time algorithm for computing post hoc bounds, making these bounds applicable to large-scale DE studies. The use of the resulting Adaptive Simes bound is illustrated on an RNA sequencing study. Comprehensive numerical experiments based on real microarray and RNA sequencing data demonstrate the statistical performance of the method. Availability A cross-platform open source implementation within the R package sanssouci is available at https://pneuvial.github.io/sanssouci/ .
Article
Full-text available
Inorganic fertilizers are routinely used in large scale crop production for the supplementation of nitrogen, phosphorus, and potassium in nutrient poor soil. To explore metabolic changes in tomato plants grown...
Article
Medical science has often used adult males as the standard to establish pathological conditions, their transitions, diagnostic methods, and treatment methods. However, it has recently become clear that sex differences exist in how risk factors contribute to the same disease, and these differences also exist in the efficacy of the same drug. Furthermore, the elderly and children have lower metabolic functions than adult males, and the results of clinical trials on adult males cannot be directly applied to these patients. Spontaneous reporting systems have become an important source of information for safety assessment, thereby reflecting drugs’ actual use in specific populations and clinical settings. However, spontaneous reporting systems only register drug-related adverse events (AEs); thus, they cannot accurately capture the total number of patients using these drugs. Therefore, although various algorithms have been developed to exploit disproportionality and search for AE signals, there is no systematic literature on how to detect AE signals specific to the elderly and children or sex-specific signals. This review describes signal detection using data mining, considering traditional methods and the latest knowledge, and their limitations.
Article
Full-text available
Selecting omic biomarkers using both their effect size and their differential status significance (i.e., selecting the “volcano-plot outer spray”) has long been equally biologically relevant and statistically troublesome. However, recent proposals are paving the way to resolving this dilemma.