Comment

https://doi.org/10.1038/s41467-024-45834-7

Fudging the volcano-plot without dredging the data

Thomas Burger

Selecting omic biomarkers using both their effect size and their differential status significance (i.e., selecting the "volcano-plot outer spray") has long been equally biologically relevant and statistically troublesome. However, recent proposals are paving the way to resolving this dilemma.
In their recent Nature Communications article, Bayer et al. present the tool CurveCurator [1] to select biomarkers according to their dose-response profiles, with well-established statistical guarantees. To conveniently blend the effect size and the significance of the dose-response curve into a single relevance score, they revisit the so-called fudge factor introduced in the SAM test [2]. Moreover, to overcome the risk of involuntary data dredging inherent to "fudging" the differential analysis [3], they propose a new approach inspired by the target-decoy competition framework (TDC [4]). The principle of TDC is to add counterfactual amino acid sequences (termed decoys) to a (target) database of real amino acid sequences, so as to mimic erroneous matches in a peptide identification task. Despite its originally empirical-only justification (peptide matches involving decoy sequences should be as probable as mismatches involving target sequences), TDC has long been used in mass spectrometry-based proteomics to validate peptide identifications according to a False Discovery Rate (FDR [5]) threshold. Accordingly, Bayer et al. claim FDR control guarantees regardless of the fudge factor tuning. Several recent works in selective inference (a subfield of high-dimensional statistics) have provided theoretical support to their intuition [6,7], which justifies its generalization to a variety of similar situations. Concretely, this comment asserts that essentially any omics data analysis involving a volcano-plot is concerned, be it transcriptomics, metabolomics, proteomics or any other, either at bulk or single-cell resolution. Therefore, elaborating on Bayer et al.'s visionary proposal should lead to new user-tailored computational omic tools, with sweeping consequences from the application standpoint.
Issues pertaining to the fudge factor
While the fudge factor was originally introduced as a small positive constant (denoted s0) to improve the independence of the test statistic variance and of the omic feature expression, tuning it to a larger value has been observed to yield a user-defined weighting of the significance and of the effect size. Concomitantly, the permutation-based procedure of the SAM test has sometimes been replaced by classical p-value adjustment as prescribed in the Benjamini-Hochberg (BH) procedure for FDR control [5]. Applying these two tricks simultaneously enhances volcano-plot interpretation: the selected biomarkers are located in the outer spray of the volcano-plot, with selection boundaries following hyperbolic contours (see Fig. 1). Unfortunately, doing so jeopardizes the statistical guarantees: briefly, a too-large s0 value distorts the p-values as well as the subsequent adjusted p-values calculated in the BH procedure. To cope with this, it is either necessary to constrain the tuning of s0 (at the cost of a less flexible selection of the outer spray) or to replace the BH procedure by another FDR control method that does not require any p-value adjustment. Although the permutation-based procedure associated with the SAM test is an option, it does not strictly control the FDR (see Table 1). Bayer et al. have thus explored another option inspired by TDC, which emerged nearly twenty years ago in proteomics in the absence of p-values to assess the significance of peptide identification.
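To make the role of s0 concrete, here is a minimal Python sketch (with hypothetical variable and function names, not taken from any particular implementation) of a SAM-style moderated statistic: adding s0 to the denominator deflates the scores of low-variance features with negligible effect sizes, which is what bends the selection boundary into the hyperbolic contours just described.

```python
import numpy as np

def sam_statistic(mean_diff, pooled_se, s0=0.0):
    """SAM-style moderated statistic: the fudge factor s0 is added to the
    denominator so that features with tiny standard errors cannot reach
    huge scores on negligible effect sizes alone."""
    return mean_diff / (pooled_se + s0)

# With s0 = 0 this is an ordinary t-like statistic; a larger s0 shrinks
# every score, but proportionally much more for low-variance features.
d_plain = sam_statistic(2.0, 1.0, s0=0.0)   # 2.0
d_fudged = sam_statistic(2.0, 1.0, s0=1.0)  # 1.0
```

Tuned as a small constant, s0 merely stabilizes the variance estimate; tuned larger, it effectively reweights significance against effect size, which is the practice whose statistical validity is at stake here.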
Competition-based alternatives to control for the FDR
Although published a decade later, the most convincing theoretical support of TDC to date has been knockoff filters (or KO) [6,7]. In spite of minor discrepancies with TDC [8], KO mathematically justifies TDC's general approach to FDR control, as well as its main computational steps. Notably, it demonstrates that FDR can be controlled on a biomarker selection task by thresholding a contrast of relevance scores, which results from a pairwise competition between the real putative
Fig. 1 | A typical volcano-plot. A significance measure is depicted on the Y-axis (here, -log10(p-value)) and an effect size is depicted on the X-axis (here, the logarithmized fold-change). The blue lines represent the contours of the relevance score and the points highlighted in red are those selected according to a knockoff procedure.
nature communications (2024) 15:1392 | 1
biomarkers and other, fictionalized ones, respectively referred to as decoys and knockoffs in the proteomic and statistical parlances. Intuitively, the proportion of fictionalized features selected should be a decent proxy of the ratio of false discoveries [Nota bene: in KO theory, this proportion is corrected by adding 1 to the ratio numerator to cope with a bias issue. Although this bias is still investigated [9], this suggests correcting Eq. 16 in ref. 1 by adding 1 to the numerator too.], as long as the decision is made symmetrically (i.e., their relevance score is attributed regardless of their real/fictional status). However, despite conceptual similarities, the problems solvable by TDC and KO differ: for the former, features are classically amino acid sequences, while for the latter, a quantitative dataset describing biomolecular expression levels in response to various experimental conditions is classically considered. In this context, the TDC extension proposed in CurveCurator to process quantitative dose-response curves constitutes a nice bridge between the TDC and KO kingdoms.
Generalizing the CurveCurator approach
With this in mind, the pragmatic fallouts of Bayer et al. become striking. Any data analyst wishing to select omic biomarkers with a relevance score picturing hyperbolic contours on a volcano-plot (see Fig. 1) can easily adapt the CurveCurator approach to their own case, by following the procedure below:
(1) Perform statistical tests to obtain a p-value for each putative biomarker that assesses the significance of its differential status,
(2) Likewise, compute the biomarker fold-change, as a measure of the effect size, and construct the volcano-plot,
(3) Tune s0 to blend the significance of the differential status and the effect size into a single relevance score,
(4) Acknowledge that the relevance score looks like a p-value even though it may not be valid to use it as such, depending on the s0 chosen,
(5) Rely on the KO framework (e.g., using the knockoff R package (https://cran.r-project.org/web/packages/knockoff/index.html)) as well as on the numerous tutorials available (https://web.stanford.edu/group/candes/knockoffs/software/knockoffs/) to control the FDR on the biomarkers selected according to the relevance score, in a way similar to that of CurveCurator.
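The last step boils down to the competition-based thresholding shared by TDC and KO. As a hedged illustration (a sketch of the general estimator, not CurveCurator's actual code, and with hypothetical names), the following Python snippet selects the least stringent score threshold whose estimated FDR, computed with the "+1" numerator correction mentioned earlier, stays below the target level:

```python
import numpy as np

def competition_fdr_select(target_scores, decoy_scores, alpha=0.05):
    """Knockoff/TDC-style selection: for a threshold t, the decoys scoring
    at least t act as a proxy for the false target discoveries, and the +1
    in the numerator is the usual finite-sample correction."""
    target_scores = np.asarray(target_scores, dtype=float)
    decoy_scores = np.asarray(decoy_scores, dtype=float)
    best_t = None
    # Scan candidate thresholds from the most to the least stringent.
    for t in np.sort(target_scores)[::-1]:
        n_targets = np.count_nonzero(target_scores >= t)
        n_decoys = np.count_nonzero(decoy_scores >= t)
        fdr_hat = (1 + n_decoys) / n_targets
        if fdr_hat <= alpha:
            best_t = t  # keep the smallest threshold meeting the level
    if best_t is None:
        return np.zeros(target_scores.shape, dtype=bool)
    return target_scores >= best_t

# Low-scoring decoys let many targets pass; high-scoring decoys block all.
selected = competition_fdr_select([10, 9, 8, 7, 6], [1, 1, 1, 1, 1], alpha=0.4)
```

Note that the knockoff package operates on signed contrast statistics (each target competing against its own knockoff) rather than on two separate score lists, but the FDR estimator it thresholds is the same in spirit.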
Different FDR control frameworks for different situations
An important and possibly troublesome feature of Fig. 1 is that some unselected (black) points are surrounded by selected (red) ones. In other words, some putative biomarkers may not be retained while other ones with smaller effect size and larger raw p-value are. This is a classical drawback of competition-based FDR control methods: whether each putative biomarker is retained depends not only on its features, but also on those of its fictionalized counterpart, whose generation is subject to randomness. Although this weakness can be addressed too, it requires less straightforward tools [10]. Another still open problem in KO theory lies in the KO/decoy generation, which can be difficult depending on the dataset. In this respect, the approach of CurveCurator is worthwhile. More generally, no method is perfect: KO filters, like p-value adjustment or permutation-based control, have pros and cons (see Table 1). Therefore, depending on the data analyst's needs, the preferred method should change. Considering this need for multiple off-the-shelf tools, it is important to notice that KO filters have hardly spread beyond the theoretical community so far, and that their applications to enhance data analysis in biology-centered investigations are unfortunately still scarce. In this context, the seminal proposal of Bayer et al. can be expected to foster the translation of these fast-evolving theories into practical and efficient software with growing importance in biomarker discoveries, and they must be acknowledged for this.
Thomas Burger
Univ. Grenoble Alpes, INSERM, CEA, UA13 BGE, CNRS, FR2048 ProFI, 38000 Grenoble, France. e-mail: thomas.burger@cea.fr
Received: 21 December 2023; Accepted: 2 February 2024;
References
1. Bayer, F. P., Gander, M., Kuster, B. & The, M. CurveCurator: a recalibrated F-statistic to assess, classify, and explore significance of dose-response curves. Nat. Commun. 14, 7902 (2023).
2. Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. 98, 5116–5121 (2001).
3. Giai Gianetto, Q., Couté, Y., Bruley, C. & Burger, T. Uses and misuses of the fudge factor in quantitative discovery proteomics. Proteomics 16, 1955–1960 (2016).
4. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
5. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995).
6. Barber, R. F. & Candès, E. J. Controlling the false discovery rate via knockoffs. Ann. Stat. 43, 2055–2085 (2015).
7. Candès, E., Fan, Y., Janson, L. & Lv, J. Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection. J. R. Stat. Soc. Ser. B: Stat. Methodol. 80, 551–577 (2018).
Table 1 | Pros and cons of the various approaches to FDR control with respect to selecting biomarkers on the outer spray of the volcano-plot

P-value adjustment / q-value
  Advantages: Standard, easy to apply and computationally efficient.
  Disadvantages: Requires well-calibrated p-values. Issue with FC filtering [3,11,12], either following hyperbolic contours or not.

Empirical Bayes / null
  Advantages: Can cope with most of the drawbacks of the above methods (p-value calibration and FC interaction).
  Disadvantages: Requires the capability to tune the priors. Does not have a frequentist interpretation, which may hinder objective significance assessment [13].

Permutations
  Advantages: The multiple-test correction is non-parametric. No calibration issue.
  Disadvantages: Related works based on FDP bounding authorize double-dipping [14]. Strictly speaking, does not control the FDR; instead, it provides a probabilistic upper bound to the FDP [15]. The fudge factor should not be tuned in contradiction to the statistical guidelines [2].

Knockoffs / decoys
  Advantages: Flexibility of the relevance score.
  Disadvantages: Unstable w.r.t. KO generation [10]. Difficulty of assessing the KO generation (which can lead to overly conservative FDR control).
8. Etourneau, L. & Burger, T. Challenging targets or describing mismatches? A comment on 'Common decoy distribution' by Madej et al. J. Proteome Res. 21, 2840–2845 (2022).
9. Rajchert, A. & Keich, U. Controlling the false discovery rate via competition: Is the +1 needed? Stat. Probab. Lett. 197, 109819 (2023).
10. Nguyen, T. B., Chevalier, J. A., Thirion, B. & Arlot, S. Aggregation of multiple knockoffs. In International Conference on Machine Learning 7283–7293 (PMLR, 2020).
11. McCarthy, D. J. & Smyth, G. K. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25, 765–771 (2009).
12. Ebrahimpoor, M. & Goeman, J. J. Inflated false discovery rate due to volcano plots: problem and solutions. Brief. Bioinform. 22, bbab053 (2021).
13. Burger, T. Can omics biology go subjective because of artificial intelligence? A comment on 'Challenges and opportunities for Bayesian statistics in proteomics' by Crook et al. J. Proteome Res. 21, 1783–1786 (2022).
14. Enjalbert-Courrech, N. & Neuvial, P. Powerful and interpretable control of false discoveries in two-group differential expression studies. Bioinformatics 38, 5214–5221 (2022).
15. Hemerik, J. & Goeman, J. J. False discovery proportion estimation by permutations: confidence for significance analysis of microarrays. J. R. Stat. Soc. Ser. B: Stat. Methodol. 80, 137–155 (2018).
Acknowledgements
This work was supported by grants from the French National Research Agency: ProFI project (ANR-10-INBS-08), GRAL CBH project (ANR-17-EURE-0003) and MIAI@Grenoble Alpes (ANR-19-P3IA-0003).

Author contributions
Conceptualization (TB), bibliography (TB), analysis (TB), manuscript writing (TB).

Competing interests
The author declares no competing interests.

Additional information
Correspondence and requests for materials should be addressed to Thomas Burger.

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2024
In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR)---the expected fraction of false discoveries among all discoveries---is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This paper introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, and the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. As the name suggests, the method operates by manufacturing knockoff variables that are cheap---their construction does not require any new data---and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. We test the method in combination with statistics from the Lasso for sparse regression, and obtain empirical results showing that the resulting method has far more power than existing selection rules when the proportion of null variables is high. We also apply the knockoff filter to HIV data with the goal of identifying those mutations associated with a form of resistance to treatment plans.