Figure - available from: Scientific Reports
(A) Representative kernel density plots for the original features and after applying OPNested ComBat. (B) Representative kernel density plots for the original features and after applying OPNested + GMM ComBat. (C) Representative kernel density plots for the original features and after harmonizing with OPNested − GMM ComBat. Kernel density plots show ComBat results separated by the batch variable manufacturer; representative features whose distributions best visually demonstrate the effects of GMM ComBat were selected by screening all feature distributions before and after harmonization. Harmonization should result in more similar feature distributions.

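As a rough illustration of the screening described in the caption, the sketch below draws per-manufacturer kernel density estimates for a single feature before and after harmonization. All arrays are simulated placeholders (the "harmonization" here is a simple location shift), not the study's data or method.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Simulated feature values for two manufacturers, with a batch shift in B.
manufacturer = np.repeat(["A", "B"], 200)
feature_raw = np.concatenate([rng.normal(0.0, 1.0, 200),
                              rng.normal(1.5, 1.0, 200)])
# Placeholder "harmonization": remove the known location shift.
feature_harm = feature_raw - np.where(manufacturer == "B", 1.5, 0.0)

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, values, title in [(axes[0], feature_raw, "Original"),
                          (axes[1], feature_harm, "Harmonized")]:
    grid = np.linspace(values.min(), values.max(), 200)
    for group in np.unique(manufacturer):
        kde = gaussian_kde(values[manufacturer == group])
        ax.plot(grid, kde(grid), label=f"manufacturer {group}")
    ax.set(title=title, xlabel="feature value")
axes[0].set_ylabel("density")
axes[0].legend()
plt.tight_layout()
plt.show()
```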

Source publication
Article
Full-text available
Radiomic approaches in precision medicine are promising, but variation associated with image acquisition factors can result in severe biases and low generalizability. Multicenter datasets used in these studies are often heterogeneous in multiple imaging parameters and/or have missing information, resulting in multimodal radiomic feature distributions...

Citations

... The effective implementation of tools for correcting batch effects has led to improved harmonization methods, and a growing number of studies assess their effectiveness across a wider range of datasets 11,16 . Some of these advancements include generalizing ComBat to multiple batch variables and to scenarios where batch variables are unknown, as well as applying ComBat to covariance structures and longitudinal data [17][18][19][20] . However, the methods used to evaluate when harmonization is necessary and to assess the performance of harmonization methods are varied and inconsistent. ...
... The KS test enables testing for more general differences in distribution, but can only be used for testing between two groups and not when batch variables contain three or more groups. The AD test, which is available for k-sample testing, enables assessment of differences in distribution for three or more groups but is sensitive to cases where the bulk of the distributions are aligned due to the heavier weight it places on the tails of the distribution 18 . These weaknesses indicate that the most commonly used statistical tests for detecting batch effects are inadequate for the complex distributions often observed in radiomic features. ...
... Univariate statistical tests of the null hypothesis of no differences in distribution across batch groups are a common approach for detecting batch effects and assessing harmonization performance in radiomic datasets. We thus use these to benchmark batch effect detection performance relative to RESI and PERMANOVA 14,15,18 . In this work, we use the Wilcoxon Rank-Sum (WRS) test, the Kolmogorov-Smirnov (KS) test, and the Anderson-Darling (AD) test. ...
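A minimal sketch of the three univariate tests named in the excerpt, applied to one feature split by a two-group batch variable; the data are simulated and scipy is assumed available.

```python
import numpy as np
from scipy.stats import ranksums, ks_2samp, anderson_ksamp

rng = np.random.default_rng(1)
batch_a = rng.normal(0.0, 1.0, 150)  # feature values from batch group A
batch_b = rng.normal(0.4, 1.2, 150)  # feature values from batch group B

wrs = ranksums(batch_a, batch_b)         # Wilcoxon Rank-Sum: location shifts
ks = ks_2samp(batch_a, batch_b)          # KS: general two-sample differences
ad = anderson_ksamp([batch_a, batch_b])  # AD: k-sample, tail-weighted

print(f"WRS p={wrs.pvalue:.4f}  KS p={ks.pvalue:.4f}  "
      f"AD stat={ad.statistic:.3f} (approx. sig. level={ad.significance_level:.3f})")
```

Note that `ks_2samp` is limited to two groups, while `anderson_ksamp` accepts three or more, mirroring the trade-offs described in the excerpts above.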
Article
Full-text available
While precision medicine applications of radiomics analysis are promising, differences in image acquisition can cause “batch effects” that reduce reproducibility and affect downstream predictive analyses. Harmonization methods such as ComBat have been developed to correct these effects, but evaluation methods for quantifying batch effects are inconsistent. In this study, we propose the use of the multivariate statistical test PERMANOVA and the Robust Effect Size Index (RESI) to better quantify and characterize batch effects in radiomics data. We evaluate these methods in both simulated and real radiomics features extracted from full-field digital mammography (FFDM) data. PERMANOVA demonstrated higher power than standard univariate statistical testing, and RESI was able to interpretably quantify the effect size of site at extremely large sample sizes. These methods show promise as more powerful and interpretable methods for the detection and quantification of batch effects in radiomics studies.
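A hedged sketch of the multivariate test described in this abstract, assuming scikit-bio's `permanova` on a Euclidean distance matrix over simulated radiomic features; the feature matrix and site labels are stand-ins, and the RESI computation is not shown.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from skbio.stats.distance import DistanceMatrix, permanova

rng = np.random.default_rng(2)
# Simulated 10-feature radiomic matrix for two sites, with a site shift.
features = np.vstack([rng.normal(0.0, 1.0, (40, 10)),
                      rng.normal(0.5, 1.0, (40, 10))])
site = ["site1"] * 40 + ["site2"] * 40
ids = [f"subj{i}" for i in range(len(site))]

dm = DistanceMatrix(squareform(pdist(features, metric="euclidean")), ids)
result = permanova(dm, grouping=site, permutations=999)
print(result)  # pandas Series with the pseudo-F statistic and p-value
```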
... The ComBat (and ReComBat) algorithm applied twice showed variable results when changing the order of application. This inconsistency is not surprising, as it motivated the development of OPNested ComBat in the first place 21,22 . In fact, despite the slight algorithmic differences between ComBat and OPNested, OPNested performed very similarly to ComBat-scanner-center. ...
... As a matter of fact, applying ComBat or ReComBat in cascade to capture and remove the linear variability from more than one confounding factor may cause instabilities depending on the specific order of the harmonization steps. Very recently, Horng et al. 21,22 proposed an optimized procedure for sequentially harmonizing data affected by multiple batch effects, namely OPNested ComBat. Besides ComBat and ReComBat, we included OPNested as a benchmark model, to be tested in terms of both deconfusion and predictive power. ...
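The order dependence noted in these excerpts can be reproduced with a toy example. The `center_by` function below is a deliberately simplified, location-only stand-in for a single harmonization step (real ComBat also adjusts scale via empirical Bayes):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({"scanner": rng.choice(["A", "B"], n),
                   "center": rng.choice(["X", "Y"], n)})
df["feature"] = (rng.normal(0, 1, n)
                 + (df["scanner"] == "B") * 1.0   # scanner batch effect
                 + (df["center"] == "Y") * 0.5)   # center batch effect

def center_by(values, batch):
    """Remove per-batch mean shifts (location-only stand-in for ComBat)."""
    return values - values.groupby(batch).transform("mean") + values.mean()

for order in (["scanner", "center"], ["center", "scanner"]):
    harmonized = df["feature"]
    for batch_var in order:
        harmonized = center_by(harmonized, df[batch_var])
    gap = harmonized.groupby(df["scanner"]).mean().diff().iloc[-1]
    print(f"order {order}: residual scanner gap = {gap:.4f}")
```

Because the two batch variables are not perfectly balanced against each other, correcting them in different orders leaves different residual shifts, which is the instability that motivated selecting an order in OPNested ComBat.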
Article
Full-text available
Medical imaging represents the primary tool for investigating and monitoring several diseases, including cancer. Advances in quantitative image analysis have moved towards the extraction of biomarkers able to support clinical decisions. To produce robust results, multi-center studies are often set up. However, the imaging information must be denoised from confounding factors—known as batch effects—such as scanner-specific and center-specific influences. Moreover, in non-solid cancers, like lymphomas, effective biomarkers require an imaging-based representation of the disease that accounts for its multi-site spreading over the patient's body. In this work, we address the dual-factor deconfusion problem and propose a deconfusion algorithm to harmonize the imaging information of patients affected by Hodgkin Lymphoma in a multi-center setting. We show that the proposed model successfully denoises data from domain-specific variability (p-value < 0.001) while coherently preserving the spatial relationship between imaging descriptions of peer lesions (p-value = 0), which is a strong prognostic biomarker for tumor heterogeneity assessment. This harmonization step significantly improves the performance of prognostic models with respect to state-of-the-art methods, enabling the construction of exhaustive patient representations and delivering more accurate analyses (p-values < 0.001 in training, p-values < 0.05 in testing). This work lays the groundwork for performing large-scale and reproducible analyses on multi-center data, which are urgently needed to translate imaging-based biomarkers into clinical practice as effective prognostic tools. The code is available on GitHub at https://github.com/LaraCavinato/Dual-ADAE.
... Indeed, to remove multiple confounders, they must be applied repeatedly, one factor at a time. As radiomics studies often involve multiple confounders, Nested ComBat (20) and its improved evolution from the same authors, OPNested ComBat (21), were recently proposed specifically to tackle multi-factor deconfusion. The latter algorithm applies ComBat iteratively on confounder-associated subsets of features, identifying the optimal order of factors to correct for. ...
... The latter algorithm applies ComBat iteratively on confounder-associated subsets of features, identifying the optimal order of factors to correct for. Notably, irrespective of the number of confounders removed from the data, ComBat-based methods rely on the hypothesis of normality of the features' errors, which might be unrealistic for radiomics data (21). Moreover, none of the above methods perform dimensionality reduction, and they are thus typically followed by Principal Component Analysis (PCA) before the analysis. ...
... For the sake of comparison with state-of-the-art approaches, we tested three major ComBat implementations, namely ComBat (17), ReComBat (19), and OPNested ComBat (21), comparing the results to quantify the improvements of our solution. Specifically, single-factor ComBat was applied twice in cascade (in both orders of the confounders). ...
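The order-selection idea attributed to OPNested ComBat in these excerpts can be sketched as a search over permutations of the batch variables, keeping the order that best removes batch effects. The location-only `center_by` stand-in and the KS-based score below are assumptions for illustration, not the package's actual adjustment or selection criterion:

```python
from itertools import permutations
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({"scanner": rng.choice(["A", "B"], n),
                   "center": rng.choice(["X", "Y"], n)})
df["feature"] = (rng.normal(0, 1, n) + (df["scanner"] == "B") * 0.8
                 + (df["center"] == "Y") * 0.4)

def center_by(values, batch):
    # Location-only stand-in for one single-batch harmonization step.
    return values - values.groupby(batch).transform("mean") + values.mean()

def ks_score(values, labels):
    # Two-sample KS statistic between the two levels of a batch variable.
    a, b = sorted(labels.unique())
    return ks_2samp(values[labels == a], values[labels == b]).statistic

scores = {}
for order in permutations(["scanner", "center"]):
    harmonized = df["feature"]
    for batch_var in order:
        harmonized = center_by(harmonized, df[batch_var])
    scores[order] = sum(ks_score(harmonized, df[b]) for b in order)

best = min(scores, key=scores.get)
print("selected order:", best, "| residual KS score:", round(scores[best], 4))
```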
Preprint
Full-text available
Medical imaging represents the primary tool for investigating and monitoring several diseases, including cancer. Advances in quantitative image analysis have moved towards the extraction of biomarkers able to support clinical decisions. To produce robust results, multi-center studies are often set up. However, the imaging information must be denoised from confounding factors—known as batch effects—such as scanner-specific and center-specific influences. Moreover, in non-solid cancers, like lymphomas, effective biomarkers require an imaging-based representation of the disease that accounts for its multi-site spreading over the patient's body. In this work, we address the dual-factor deconfusion problem and propose a deconfusion algorithm to harmonize the imaging information of patients affected by Hodgkin Lymphoma in a multi-center setting. We show that the proposed model successfully denoises data from domain-specific variability while coherently preserving the spatial relationship between imaging descriptions of peer lesions, which is a strong prognostic biomarker for tumor heterogeneity assessment. This harmonization step significantly improves the performance of prognostic models, enabling the construction of exhaustive patient representations and delivering more accurate analyses. This work lays the groundwork for performing large-scale and reproducible analyses on multi-center data, which are urgently needed to translate imaging-based biomarkers into clinical practice as effective prognostic tools. The code is available on GitHub (https://github.com/LaraCavinato/Dual-ADAE).
... While ComBat is fast and easy to use, current implementations of ComBat can only harmonize by a single batch effect at a time and are therefore unable to adequately harmonize datasets that are heterogeneous in more than one batch effect. The OPNested ComBat approach used in our study enables harmonization by multiple batch effects by implementing sequential harmonization [29][30][31] . The approach was initialized with the radiomic features as input data and a list of batch variables (Breast I-SPY1: Table 1; NSCLC IO: ...). ...
... We note that although the statistical significance of the phenotypes obtained with heterogeneity mitigation is improved, the prognostic performance of the models does not improve substantially. One possible reason for this has been discussed in the paper describing the harmonization method used in our analysis: "A possible explanation is that because imaging parameters were generally associated with outcome as a consequence of study design, the removal of variation associated with those imaging parameters reduced predictive performance" 30 . However, we would like to point out that an improvement in the reproducibility of the radiomic signatures does not necessarily correlate with an improvement in prognostic performance. ...
Article
Full-text available
Our study investigates the effects of heterogeneity in image parameters on the reproducibility of the prognostic performance of models built using radiomic biomarkers. We compare the prognostic performance of models derived from the heterogeneity-mitigated features with that of models obtained from raw features, to assess whether the reproducibility of prognostic scores improves upon application of our methods. We used two datasets: the Breast I-SPY1 dataset—baseline DCE-MRI scans of 156 women with locally advanced breast cancer, treated with neoadjuvant chemotherapy, publicly available via The Cancer Imaging Archive (TCIA); and the NSCLC IO dataset—baseline CT scans of 107 patients with stage 4 non-small cell lung cancer (NSCLC), treated with pembrolizumab immunotherapy at our institution. Radiomic features (n = 102) were extracted from the tumor ROIs. We use a variety of resampling and harmonization scenarios to mitigate the heterogeneity in image parameters. The patients were divided into groups based on batch variables. For each group, the radiomic phenotypes are combined with the clinical covariates into a prognostic model. The performance of the groups is assessed using the c-statistic, derived from a Cox proportional hazards model fitted on all patients within a group. The heterogeneity-mitigation scenario (radiomic features derived from images resampled to minimum voxel spacing and harmonized using the image acquisition parameters as batch variables) gave the models with the highest prognostic scores (e.g., IO dataset; batch variable: high kernel resolution—c-score: 0.66). The prognostic performance of patient groups is not comparable for models built using non-heterogeneity-mitigated features (e.g., I-SPY1 dataset; batch variable: small pixel spacing—c-score: 0.54, large pixel spacing—c-score: 0.65). The prognostic performance of patient groups is closer for heterogeneity-mitigated scenarios (e.g., scenario—harmonize by voxel spacing parameters: IO dataset; thin slice—c-score: 0.62, thick slice—c-score: 0.60). Our results indicate that accounting for heterogeneity in image parameters is important to obtain more reproducible prognostic scores, irrespective of image site or modality. For non-heterogeneity-mitigated models, the prognostic scores are not comparable across patient groups divided based on batch variables. This study can be a step in the direction of constructing reproducible radiomic biomarkers, thus increasing their application in clinical decision making.
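A hedged sketch of the evaluation step described in this abstract, assuming the lifelines library: fit a Cox proportional hazards model on one patient group and read off its c-statistic. The phenotype and clinical covariate below are simulated stand-ins, not the study's data.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 120
df = pd.DataFrame({
    "phenotype": rng.normal(size=n),      # radiomic phenotype score (stand-in)
    "age": rng.normal(60, 10, n),         # clinical covariate (stand-in)
})
# Simulated survival: higher risk shortens the expected time to event.
risk = np.exp(0.6 * df["phenotype"] + 0.02 * (df["age"] - 60))
df["duration"] = rng.exponential(1.0 / risk)
df["event"] = rng.uniform(size=n) < 0.8   # roughly 20% censoring

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
print("c-statistic:", round(cph.concordance_index_, 3))
```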
Article
Purpose: In this work, we investigate how texture information may contribute to the response of a blur measure (BM), with motivation rooted in mammography. This is vital, as the interpretation of a BM is typically not evaluated with respect to the texture present in an image. We are particularly concerned with lower scales of blur (≤1 mm), as this blur is least likely to be detected but can still have a detrimental effect on the detectability of microcalcifications. Approach: Three sets of linear models, in which the BM response was modeled as a linear combination of texture information determined by texture measures (TMs), were constructed from three different datasets of equal-blur-level images: one of computer-generated mammogram-like clustered lumpy background (CLB) images, and two image sets derived from the Brodatz texture images. The linear models were refined by removing those TMs whose coefficients were not significantly non-zero across all three datasets for each BM. We use five levels of Gaussian blur to blur the CLB images and assess the ability of the BMs and TMs to separate the images based on blur level. Results: We found that many TMs used frequently in the reduced linear models mimicked the structure of the BMs that they modeled. Surprisingly, while none of the BMs could separate the CLB images across all levels of blur, a group of TMs could. These TMs occurred infrequently in the reduced linear models, meaning that they rely on different information compared with that used by the BMs. Conclusion: These results confirm our hypothesis that BMs can be influenced by texture information in an image. That a subset of TMs performed better than all BMs on the blur classification problem with the CLB images further shows that conventional BMs may not be the optimal tool for blur classification in mammogram images.
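The model-refinement step described in the Approach can be sketched as an ordinary least-squares fit of the BM response on TM values, dropping TMs whose coefficients are not significantly non-zero; the data and the 0.05 threshold below are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_images, n_tms = 200, 6
tms = rng.normal(size=(n_images, n_tms))       # texture measure values
# Simulated BM response driven by only two of the TMs plus noise.
bm = 1.0 + 0.8 * tms[:, 0] - 0.5 * tms[:, 2] + rng.normal(0, 0.3, n_images)

model = sm.OLS(bm, sm.add_constant(tms)).fit()
# Index 0 of the parameter vector is the intercept; TMs start at index 1.
kept = [i for i in range(n_tms) if model.pvalues[i + 1] < 0.05]
print("TMs retained in the reduced model:", kept)
```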
Article
Magnetic resonance imaging and computed tomography data from multiple batches (e.g., sites, scanners, or datasets) are increasingly used alongside complex downstream analyses to obtain new insights into the human brain. However, significant confounding due to batch-related technical variation, called batch effects, is present in these data; direct application of downstream analyses may therefore lead to biased results. Image harmonization methods seek to remove these batch effects and enable increased generalizability and reproducibility of downstream results. In this review, we describe and categorize current approaches among statistical and deep learning harmonization methods. We also describe current evaluation metrics used to assess harmonization methods and provide a standardized framework to evaluate newly proposed methods for effective harmonization and preservation of biological information. Finally, we provide recommendations to end-users to advocate for more effective use of current methods, and to methodologists to direct future efforts and accelerate development of the field.