Figure 7 - available from: Scientific Reports
Stabilisation performance [% variance reduction] in 10-fold CV (blue) and SSD (red) in the phantom study.

Source publication
Article
Full-text available
The goal of radiomics is to convert medical images into a minable data space by extraction of quantitative imaging features for clinically relevant analyses, e.g. survival time prediction of a patient. One problem of radiomics from computed tomography is the impact of technical variation such as reconstruction kernel variation within a study. Addit...
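As an illustration of the feature-extraction step described here, the following sketch uses the open-source pyradiomics package to turn a CT volume and a lesion mask into a table of quantitative features. The file names are placeholders and default extraction settings are assumed, so this is a generic example rather than the pipeline used in the article.

```python
# Minimal pyradiomics sketch (file paths are hypothetical placeholders).
from radiomics import featureextractor

# Default settings: original image type with all enabled feature classes.
extractor = featureextractor.RadiomicsFeatureExtractor()

# CT volume and lesion segmentation in any format SimpleITK can read.
features = extractor.execute("ct_volume.nrrd", "lesion_mask.nrrd")

for name, value in features.items():
    if not name.startswith("diagnostics"):  # skip extraction metadata
        print(name, value)
```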

Context in source publication

Context 1
... dataset - feature stabilisation. The results are shown in Fig. 7. The technome stabilisation mode gives a stabilisation performance [% variance reduction on test set] in a 10-fold CV (SSD) of 90.4% (91.8%). The RAVEL-like calibration yields 79.3% (76.7%) and the naive approach via GLM 74.7% (79.3%) or via random forest 76.7% ...
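One plausible reading of the "% variance reduction" metric reported above is the relative decrease in feature variance on held-out data after stabilisation. A minimal sketch of that computation, under the assumption that raw and stabilised values of the same feature are available for the same test samples, could look like this:

```python
import numpy as np

def percent_variance_reduction(raw, stabilised):
    """Relative reduction in feature variance after stabilisation, in percent."""
    return 100.0 * (1.0 - np.var(stabilised) / np.var(raw))

# Toy example: technical noise inflates the raw feature values.
rng = np.random.default_rng(0)
biology = rng.normal(0.0, 1.0, 200)
raw = biology + rng.normal(0.0, 2.0, 200)         # biology + technical variation
stabilised = biology + rng.normal(0.0, 0.5, 200)  # most technical variation removed
print(f"{percent_variance_reduction(raw, stabilised):.1f}% variance reduction")
```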

Similar publications

Article
Full-text available
Multicenter studies are needed to demonstrate the clinical potential value of radiomics as a prognostic tool. However, variability in scanner models, acquisition protocols and reconstruction settings are unavoidable and radiomic features are notoriously sensitive to these factors, which hinders pooling them in a statistical analysis. A statistical...

Citations

... Moreover, radiomics features are subject to patient variabilities, such as geometry, which can introduce noise and artifacts. To address this issue, a recent study aimed to quantify these "non-reducible technical variations" and stabilize the radiomics features accordingly [27]. However, this and other previous works have only suggested theoretical solutions and lack robust practical approaches for implementing radiomics in prostate cancer classification. ...
Article
Full-text available
This paper focuses on enhancing machine learning (ML)-based diagnosis and clinical decision-making by leveraging radiomics data, which provides a quantitative description of grayscale medical images such as MRI, CT, PET, or X-Ray. Extracted using advanced mathematical and statistical analysis methods, this data comprises hundreds of relevant and irrelevant radiomics features. The study underscores the critical importance of selecting the most relevant and efficient features to enhance ML-based diagnosis and clinical decision-making processes. To address this challenge, the paper introduces an accurate binary prostate cancer classification algorithm that integrates linear support vector machines (SVM) and ridge regression-based four-feature selection algorithms. The algorithm’s performance was evaluated using the PROSTATEx dataset. Notably, when trained on feature subsets selected through importance coefficient, forward- and backward-sequential, and correlation coefficient-based feature selectors, the algorithm achieved classification accuracy exceeding 90%. However, when trained on the full set of features, the algorithm achieved 43.64% classification accuracy. These findings underscore the pivotal role of feature selection in achieving higher accuracy and speed during the training and testing of ML algorithms. Overall, the results indicate that the proposed algorithm can substantially improve the accuracy of prostate cancer classification. Furthermore, the findings have broader implications for the development of more efficient ML-based diagnosis and clinical decision-making systems in the field of gray-scale medical imaging analysis.
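As a rough illustration of the kind of pipeline the abstract describes, the sketch below combines one of the named selectors (forward sequential selection driven by ridge regression) with a linear SVM in scikit-learn. The synthetic data and parameter values are placeholders, not the authors' implementation.

```python
# Forward sequential feature selection with ridge regression, feeding a
# linear SVM. Synthetic data stands in for a radiomics feature matrix X
# and binary labels y; hyper-parameters are illustrative.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 120))                    # 120 radiomics features
y = (X[:, :4].sum(axis=1) + rng.normal(size=200) > 0).astype(int)

selector = SequentialFeatureSelector(
    Ridge(alpha=1.0), n_features_to_select=4, direction="forward"
)
clf = make_pipeline(StandardScaler(), selector, LinearSVC(C=1.0, max_iter=10_000))
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```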
... However, since most studies are retrospective, the uneven use of acquisition protocols and filters between institutions makes it difficult to report radiomics data. We should also mention "non-reducible technical variations" that involve patient variabilities that cause image noise or artifacts that are not dependent on the scanner settings and can affect the quality of the radiomics data [30,33,34]. ...
Article
Full-text available
Artificial intelligence (AI) and in particular radiomics has opened new horizons by extracting data from medical imaging that could be used not only to improve diagnostic accuracy, but also to be included in predictive models contributing to treatment stratification of cancer. Head and neck cancers (HNC) are associated with higher recurrence rates, especially in advanced stages of disease. It is considered that approximately 50% of cases will evolve with loco-regional recurrence, even if they will benefit from a current standard treatment consisting of definitive chemo-radiotherapy. Radiotherapy, the cornerstone treatment in locally advanced HNC, could be delivered either by the simultaneous integrated boost (SIB) technique or by the sequential boost technique, the decision often being a subjective one. The principles of radiobiology could be the basis of an optimal decision between the two methods of radiation dose delivery, but the heterogeneity of HNC radio-sensitivity makes this approach difficult. Radiomics has demonstrated the ability to non-invasively predict radio-sensitivity and the risk of relapse in HNC. Tumor heterogeneity evaluated with radiomics, the inclusion of coarseness, entropy and other first order features extracted from gross tumor volume (GTV) in multivariate models could identify pre-treatment cases that will benefit from one of the approaches (SIB or sequential boost radio-chemotherapy) considered the current standard of care for locally advanced HNC. Computer tomography (CT) simulation and daily cone beam CT (CBCT) could be chosen as imaging source for radiomic analysis.
... This idea seems promising, as in the current training pipeline only reconstructions with a softer kernel are included. An additional possibility to increase robustness might be to transfer algorithms aiming to disentangle biological and technical information [16] into the deep learning world. ...
Article
Full-text available
Recently, algorithms capable of assessing the severity of Coronary Artery Disease (CAD) in form of the Coronary Artery Disease-Reporting and Data System (CAD-RADS) grade from Coronary Computed Tomography Angiography (CCTA) scans using Deep Learning (DL) were proposed. Before considering to apply these algorithms in clinical practice, their robustness regarding different commonly used Computed Tomography (CT)-specific image formation parameters—including denoising strength, slab combination, and reconstruction kernel—needs to be evaluated. For this study, we reconstructed a data set of 500 patient CCTA scans under seven image formation parameter configurations. We select one default configuration and evaluate how varying individual parameters impacts the performance and stability of a typical algorithm for automated CAD assessment from CCTA. This algorithm consists of multiple preprocessing and a DL prediction step. We evaluate the influence of the parameter changes on the entire pipeline and additionally on only the DL step by propagating the centerline extraction results of the default configuration to all others. We consider the standard deviation of the CAD severity prediction grade difference between the default and variation configurations to assess the stability w.r.t. parameter changes. For the full pipeline we observe slight instability (± 0.226 CAD-RADS) for all variations. Predictions are more stable with centerlines propagated from the default to the variation configurations (± 0.122 CAD-RADS), especially for differing denoising strengths (± 0.046 CAD-RADS). However, stacking slabs with sharp boundaries instead of mixing slabs in overlapping regions (called true stack ± 0.313 CAD-RADS) and increasing the sharpness of the reconstruction kernel (± 0.150 CAD-RADS) leads to unstable predictions. Regarding the clinically relevant tasks of excluding CAD (called rule-out; AUC default 0.957, min 0.937) and excluding obstructive CAD (called hold-out; AUC default 0.971, min 0.964) the performance remains on a high level for all variations. Concluding, an influence of reconstruction parameters on the predictions is observed. Especially, scans reconstructed with the true stack parameter need to be treated with caution when using a DL-based method. Also, reconstruction kernels which are underrepresented in the training data increase the prediction uncertainty.
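The stability measure described above (the standard deviation of the CAD-RADS grade difference between the default and a variation configuration) can be written down compactly; a minimal sketch with hypothetical grades follows.

```python
import numpy as np

def prediction_stability(default_grades, variation_grades):
    """Standard deviation of the per-patient CAD-RADS grade difference
    between the default and a variation reconstruction configuration."""
    return float(np.std(np.asarray(variation_grades) - np.asarray(default_grades)))

# Hypothetical grades (0-5) for five patients.
default = [1.0, 3.0, 0.0, 4.0, 2.0]
sharp_kernel = [1.2, 3.1, 0.0, 3.8, 2.3]
print(f"stability: +/- {prediction_stability(default, sharp_kernel):.3f} CAD-RADS")
```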
... Test-retest analysis is helpful to evaluate the inherent variation of radiomic features [51, 56-58], and there are publicly available datasets to support these analyses, including the popular RIDER dataset [59]. In addition, post-reconstruction image processing may further reduce batch effects, particularly those associated with scene characteristics of the subject being imaged, which remain even when imaging protocols are held constant [60]. Normalization of the signal intensity is a useful post-reconstruction tool that may increase the efficacy of downstream radiomic analysis steps. ...
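A common form of the signal-intensity normalisation mentioned here is a simple z-score rescaling computed inside a body or organ mask. The sketch below is one such post-reconstruction option, not a specific method from the cited studies.

```python
import numpy as np

def zscore_normalise(image, mask=None):
    """Rescale voxel intensities to zero mean / unit variance, optionally
    using only voxels inside a mask so background air does not dominate."""
    voxels = image[mask > 0] if mask is not None else image
    return (image - voxels.mean()) / voxels.std()

# Usage: normalised = zscore_normalise(ct_array, mask=body_mask)
```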
Article
Full-text available
Radiomics is a high-throughput approach to image phenotyping. It uses computer algorithms to extract and analyze a large number of quantitative features from radiological images. These radiomic features collectively describe unique patterns that can serve as digital fingerprints of disease. They may also capture imaging characteristics that are difficult or impossible to characterize by the human eye. The rapid development of this field is motivated by systems biology, facilitated by data analytics, and powered by artificial intelligence. Here, as part of Abdominal Radiology’s special issue on Quantitative Imaging, we provide an introduction to the field of radiomics. The technique is formally introduced as an advanced application of data analytics, with illustrating examples in abdominal radiology. Artificial intelligence is then presented as the main driving force of radiomics, and common techniques are defined and briefly compared. The complete step-by-step process of radiomic phenotyping is then broken down into five key phases. Potential pitfalls of each phase are highlighted, and recommendations are provided to reduce sources of variation, non-reproducibility, and error associated with radiomics.
... Similarly to the batch effect in genomics [15], technical variation in CT scans occurs for a variety of reasons, becoming especially problematic when it is correlated with the predictive task, for instance due to prior knowledge of the clinician and/or patient of a likely diagnosis, or site-specific differences in patient selection and acquisition protocols within multi-center data sets [16]. ...
... Notably, a resampling preprocessing approach accounts for less frequent permutations of image properties - acquired with a variational autoencoder (VAE) [13] - by sampling the associated images more frequently [2]. In classical ML, RAVEL regresses unwanted features per voxel using control regions [8], and the Technome [16] combines debiasing and training in order to avoid the risk of removing informative biological information when stabilizing during preprocessing. Our work aims to transfer such ideas to the field of DL. ...
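A minimal sketch of the control-region calibration idea referenced here (regress each target-region feature on surrogates measured in a control region and keep the residuals) is given below; it is a simplification for illustration, not the published RAVEL or Technome code.

```python
# Remove the component of each target-region feature that is linearly
# explained by control-region measurements, then restore the original mean.
import numpy as np
from sklearn.linear_model import LinearRegression

def residualise(features, control):
    """features: (n_samples, n_features) target-region values.
    control:  (n_samples, n_surrogates) control-region values assumed to
    carry only technical variation."""
    reg = LinearRegression().fit(control, features)
    return features - reg.predict(control) + features.mean(axis=0)
```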
Preprint
Full-text available
Reliably detecting diseases using relevant biological information is crucial for real-world applicability of deep learning techniques in medical imaging. We debias deep learning models during training against unknown bias - without preprocessing/filtering the input beforehand or assuming specific knowledge about its distribution or precise nature in the dataset. We use control regions as surrogates that carry information regarding the bias, employ the classifier model to extract features, and suppress biased intermediate features with our custom, modular DecorreLayer. We evaluate our method on a dataset of 952 lung computed tomography scans by introducing simulated biases w.r.t. reconstruction kernel and noise level and propose including an adversarial test set in evaluations of bias reduction techniques. In a moderately sized model architecture, applying the proposed method to learn from data exhibiting a strong bias, it near-perfectly recovers the classification performance observed when training with corresponding unbiased data.
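The DecorreLayer itself is not reproduced here; the following PyTorch sketch only illustrates the underlying principle of penalising correlation between intermediate features and a bias surrogate derived from control regions, with names and the exact penalty form chosen for illustration.

```python
import torch

def decorrelation_penalty(features, bias):
    """Mean squared Pearson correlation between each intermediate feature
    (features: batch x n_features) and a scalar bias surrogate
    (bias: batch), e.g. a noise estimate from a control ROI."""
    f = features - features.mean(dim=0, keepdim=True)
    b = bias - bias.mean()
    cov = (f * b.unsqueeze(1)).mean(dim=0)
    corr = cov / (f.std(dim=0, unbiased=False) * b.std(unbiased=False) + 1e-8)
    return (corr ** 2).mean()

# Training objective (sketch): loss = task_loss + weight * decorrelation_penalty(h, bias)
```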
... These features may be helpful for two reasons, either by providing relevant biological information or, indirectly, by internally calibrating the measurements within the lung by features in NL-ROIs [40,41]. For these meta-configurations, the mRMR algorithm is always used to slightly reduce the impact of overfitting. ...
Article
Full-text available
Purpose In the literature on automated phenotyping of chronic obstructive pulmonary disease (COPD), there is a multitude of isolated classical machine learning and deep learning techniques, mostly investigating individual phenotypes, with small study cohorts and heterogeneous meta-parameters, e.g., different scan protocols or segmented regions. The objective is to compare the impact of different experimental setups, i.e., varying meta-parameters related to image formation and data representation, with the impact of the learning technique for subtyping automation for a variety of phenotypes. The identified associations of these parameters with automation performance and their interactions might be a first step towards a determination of optimal meta-parameters, i.e., a meta-strategy. Methods A clinical cohort of 981 patients (53.8 ± 15.1 years, 554 male) was examined. The inspiratory CT images were analyzed to automate the diagnosis of 13 COPD phenotypes given by two radiologists. A benchmark feature set that integrates many quantitative criteria was extracted from the lung and used to train a variety of learning algorithms on the first 654 patients (two thirds); the respective algorithms then retrospectively assessed the remaining 327 patients (one third). The automation performance was evaluated by the area under the receiver operating characteristic curve (AUC). 1717 experiments were conducted with varying meta-parameters such as reconstruction kernel, segmented regions and input dimensionality, i.e., number of extracted features. The association of the meta-parameters with the automation performance was analyzed by multivariable general linear model decomposition of the automation performance into the contributions of meta-parameters and the learning technique. Results The automation performance varied strongly for varying meta-parameters. For emphysema-predominant phenotypes, an AUC of 93%–95% could be achieved for the best meta-configuration. The airways-predominant phenotypes led to a lower performance of 65%–85%, while smooth kernel configurations on average were unexpectedly superior to those with sharp kernels. The performance impact of meta-parameters, even that of often neglected ones like the missing-data imputation, was in general larger than that of the learning technique. Advanced learning techniques like 3D deep learning or automated machine learning yielded inferior automation performance for non-optimal meta-configurations in comparison to simple techniques with suitable meta-configurations. The best automation performance was achieved by a combination of modern learning techniques and a suitable meta-configuration. Conclusions Our results indicate that for COPD phenotype automation, study design parameters such as reconstruction kernel and the model input dimensionality should be adapted to the learning technique and may be more important than the technique itself. To achieve optimal automation and prediction results, the interaction between those input meta-parameters and the learning technique should be considered. This might be particularly relevant for the development of specific scan protocols for novel learning algorithms, and towards an understanding of good study design for automated phenotyping.
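The decomposition described in the Methods (a multivariable general linear model explaining automation performance by meta-parameters and learning technique) could be set up along the following lines; the column names and factors are hypothetical stand-ins for the experiment table.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per experiment: its AUC plus the meta-parameters and the technique.
experiments = pd.read_csv("experiments.csv")  # hypothetical file and columns

model = smf.ols(
    "auc ~ C(kernel) + C(region) + C(n_features) + C(imputation) + C(technique)",
    data=experiments,
).fit()
print(sm.stats.anova_lm(model, typ=2))  # contribution (sum of squares) per factor
```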
... Differences in slice thickness, voxel sizes and convolutional kernels can be normalised using a range of approaches such as voxel-size resampling, batch effect correction, and grey-level normalisation [109][110][111]. A predictive internal calibration approach was shown to improve performance of emphysema prediction in a COPD study [112]. Moving to an ML-based automated approach for segmentation has higher accuracy and reduced variability compared to manual segmentation [113]. ...
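As one example of the voxel-size resampling mentioned here, the sketch below resamples a CT volume to a common isotropic spacing with SimpleITK; the 1 mm target spacing and the file name are assumptions.

```python
import SimpleITK as sitk

def resample_to_spacing(image, spacing=(1.0, 1.0, 1.0)):
    """Resample a volume to the given voxel spacing (identity transform,
    B-spline interpolation)."""
    new_size = [
        int(round(size * old / new))
        for size, old, new in zip(image.GetSize(), image.GetSpacing(), spacing)
    ]
    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing(spacing)
    resampler.SetSize(new_size)
    resampler.SetOutputOrigin(image.GetOrigin())
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetInterpolator(sitk.sitkBSpline)
    resampler.SetDefaultPixelValue(-1024)  # air, for CT
    return resampler.Execute(image)

ct_iso = resample_to_spacing(sitk.ReadImage("ct_volume.nii.gz"))
```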
Article
Full-text available
Accurate phenotyping of patients with pulmonary hypertension (PH) is an integral part of informing disease classification, treatment, and prognosis. The impact of lung disease on PH outcomes and response to treatment remains a challenging area with limited progress. Imaging with computed tomography (CT) plays an important role in patients with suspected PH when assessing for parenchymal lung disease, however, current assessments are limited by their semi-qualitative nature. Quantitative chest-CT (QCT) allows numerical quantification of lung parenchymal disease beyond subjective visual assessment. This has facilitated advances in radiological assessment and clinical correlation of a range of lung diseases including emphysema, interstitial lung disease, and coronavirus disease 2019 (COVID-19). Artificial Intelligence approaches have the potential to facilitate rapid quantitative assessments. Benefits of cross-sectional imaging include ease and speed of scan acquisition, repeatability and the potential for novel insights beyond visual assessment alone. Potential clinical benefits include improved phenotyping and prediction of treatment response and survival. Artificial intelligence approaches also have the potential to aid more focused study of pulmonary arterial hypertension (PAH) therapies by identifying more homogeneous subgroups of patients with lung disease. This state-of-the-art review summarizes recent QCT developments and potential applications in patients with PH with a focus on lung disease.
... Apart from the variations in scanners and settings, radiomic feature values are also influenced by patient variabilities, e.g., geometry, which impact the levels of noise and presence of artifacts in an image. Therefore, the aim of a recent study was to quantify these so-called "non-reducible technical variations" and stabilize the radiomic features accordingly [33]. The next sections summarize the studies that assessed radiomic feature robustness for different acquisition and reconstruction settings of CT, PET, and MRI, as well as for ROI delineation and image preprocessing steps. ...
Article
Full-text available
Radiomics is a quantitative approach to medical imaging, which aims at enhancing the existing data available to clinicians by means of advanced mathematical analysis. Through mathematical extraction of the spatial distribution of signal intensities and pixel interrelationships, radiomics quantifies textural information by using analysis methods from the field of artificial intelligence. Various studies from different fields in imaging have been published so far, highlighting the potential of radiomics to enhance clinical decision-making. However, the field faces several important challenges, which are mainly caused by the various technical factors influencing the extracted radiomic features. The aim of the present review is twofold: first, we present the typical workflow of a radiomics analysis and deliver a practical “how-to” guide for a typical radiomics analysis. Second, we discuss the current limitations of radiomics, suggest potential improvements, and summarize relevant literature on the subject.
... This is particularly noteworthy as we used 26 different CT scanner types in 103 patients with baseline scans due to referrals from external physicians. Although there exist approaches to calibrate texture to technical variation [43,44], a complete absence of the influence of technical variation could fundamentally increase confidence in AI-supported systems. The GMS model showed good predictive and statistical performance and can trivially be interpreted as a machine learning extension of TBS to integrate more fine-grained and non-linear patterns regarding metastasis distribution. ...
Article
Objectives To investigate the prediction of 1-year survival (1-YS) in patients with metastatic colorectal cancer with use of a systematic comparative analysis of quantitative imaging biomarkers (QIBs) based on the geometric and radiomics analysis of whole liver tumor burden (WLTB), in comparison to predictions based on the tumor burden score (TBS), WLTB volume alone, and a clinical model. Methods A total of 103 patients (mean age: 61.0 ± 11.2 years) with colorectal liver metastases were analyzed in this retrospective study. Automatic segmentations of WLTB from baseline contrast-enhanced CT images were used. Established biomarkers as well as a standard radiomics model building were used to derive 3 prognostic models. The benefits of a geometric metastatic spread (GMS) model, the Aerts radiomics prior model of the WLTB, and the performance of TBS and WLTB volume alone were assessed. All models were analyzed in both statistical and predictive machine learning settings in terms of AUC. Results TBS showed the best discriminative performance in a statistical setting to discriminate 1-YS (AUC = 0.70, CI: [0.56, 0.90]). For the machine learning-based prediction for unseen patients, both a model of the GMS of WLTB (0.73, CI: [0.60, 0.84]) and the Aerts radiomics prior model (0.76, CI: [0.65, 0.86]) applied on the WLTB showed a numerically higher predictive performance than TBS (0.68, CI: [0.54, 0.79]), radiomics (0.65, CI: [0.55, 0.78]), WLTB volume alone (0.53, CI: [0.40, 0.66]), or the clinical model (0.56, CI: [0.43, 0.67]). Conclusions The imaging-based GMS model may be a first step towards a more fine-grained machine learning extension of the TBS concept for risk stratification in mCRC patients without the vulnerability to technical variance of radiomics. Key Points • CT-based geometric distribution and radiomics analysis of whole liver tumor burden in metastatic colorectal cancer patients yield prognostic information. • Differences in survival are possibly attributable to the spatial distribution of metastatic lesions and the geometric metastatic spread analysis of all liver metastases may serve as robust imaging biomarker invariant to technical variation. • Imaging-based prediction models outperform clinical models for 1-year survival prediction in metastatic colorectal cancer patients with liver metastases.
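For context, the tumor burden score (TBS) is commonly defined in the literature as the Euclidean distance from the origin in a plane spanned by the maximum lesion diameter and the number of lesions; the small helper below follows that common definition, which should be checked against the exact variant used in the cited article.

```python
import math

def tumour_burden_score(max_diameter_cm, n_lesions):
    """Euclidean distance from the origin in the (maximum lesion diameter,
    number of lesions) plane - the commonly cited TBS definition."""
    return math.hypot(max_diameter_cm, n_lesions)

print(tumour_burden_score(3.2, 5))  # e.g. largest lesion 3.2 cm, 5 lesions
```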
Chapter
Harmonization in the feature domain involves standardizing the extraction and quantification of imaging features from medical images, ensuring consistency across different platforms, institutions, and studies. Two primary approaches are discussed: identifying stable radiomic variables and employing normalization techniques. Reproducibility is a key concern, as radiomic features must be robust to variations in imaging protocols, patient characteristics, and scanners. Various methods and considerations are explored to minimize these challenges, encompassing imaging data reproducibility, segmentation, and post-processing techniques. Normalization techniques play a pivotal role in ensuring the comparability of radiomic features across different modalities and centers. Methods like statistical normalization, ComBat, and deep learning approaches are examined. These techniques are essential for addressing variations that can introduce bias into radiomics models, and this chapter underscores the critical role of harmonization in radiomics, emphasizing the need for reproducibility and the use of normalization techniques to ensure reliable and comparable results.
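To make the normalization idea concrete, the sketch below implements only the location/scale core of ComBat-style batch harmonisation (aligning per-scanner feature means and variances to the pooled reference); the full ComBat method additionally uses empirical-Bayes shrinkage and can preserve biological covariates, which this simplified version does not.

```python
import numpy as np

def location_scale_harmonise(features, batch):
    """features: (n_samples, n_features) radiomics matrix;
    batch: (n_samples,) scanner/site labels. Aligns each batch's feature
    means and standard deviations to the pooled reference."""
    features = np.asarray(features, dtype=float)
    batch = np.asarray(batch)
    out = features.copy()
    grand_mean = features.mean(axis=0)
    grand_std = features.std(axis=0)
    for b in np.unique(batch):
        idx = batch == b
        m = features[idx].mean(axis=0)
        s = features[idx].std(axis=0)
        out[idx] = (features[idx] - m) / (s + 1e-8) * grand_std + grand_mean
    return out
```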