Article

Receiver Operating Characteristic Curves and Their Use in Radiology


Abstract

Sensitivity and specificity are the basic measures of accuracy of a diagnostic test; however, they depend on the cut point used to define "positive" and "negative" test results. As the cut point shifts, sensitivity and specificity shift. The receiver operating characteristic (ROC) curve is a plot of the sensitivity of a test versus its false-positive rate for all possible cut points. The advantages of the ROC curve as a means of defining the accuracy of a test, construction of the ROC, and identification of the optimal cut point on the ROC curve are discussed. Several summary measures of the accuracy of a test, including the commonly used percentage of correct diagnoses and area under the ROC curve, are described and compared. Two examples of ROC curve application in radiologic research are presented.
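The cut-point sweep described in this abstract can be illustrated with a short, self-contained Python sketch (not part of the original article; the scores and labels below are simulated): every distinct test score is tried as a cut point, each giving one (false-positive rate, sensitivity) pair, and the trapezoidal area under the resulting curve is the AUC.

import numpy as np

def roc_points(labels, scores):
    # Sweep every distinct score as a cut point, from strictest to loosest.
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    fpr = [np.mean(scores[~labels] >= t) for t in thresholds]   # 1 - specificity
    tpr = [np.mean(scores[labels] >= t) for t in thresholds]    # sensitivity
    return np.array(fpr), np.array(tpr)

# Simulated diagnostic test: higher scores are more suspicious for disease.
rng = np.random.default_rng(0)
scores = np.r_[rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 50)]
labels = np.r_[np.ones(50, dtype=bool), np.zeros(50, dtype=bool)]

fpr, tpr = roc_points(labels, scores)
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)   # trapezoidal area under the ROC curve
print(f"empirical AUC = {auc:.3f}")

Shifting the cut point trades sensitivity against specificity, which is exactly why a single sensitivity/specificity pair understates what the whole curve conveys.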


... Features calculated based on WoE values for the target in this study are summarized in Table 3, which lists the binning intervals used for each feature (e.g., total awake sleep time, in minutes, during the initial 90 min). This study used a logistic regression model [35] to generate the primary sleep habit score. The reasons for this are: (1) ease of interpretation of regression coefficients; (2) since the model can estimate the probability of belonging to a class, it is often used for the risk and credibility analyses that require probability calculation; and (3) it can be used as a base model. ...
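For readers unfamiliar with the weight-of-evidence (WoE) features mentioned above, the following Python sketch shows the standard WoE and information-value calculation used in credit-style scoring; the feature name, bin edges, and simulated target below are illustrative assumptions and are not taken from the cited study.

import numpy as np

def woe_per_bin(values, target, bin_edges):
    # WoE per bin = ln(share of non-events in the bin / share of events in the bin).
    bin_idx = np.digitize(values, bin_edges)
    events, non_events = target.sum(), len(target) - target.sum()
    woe, diff = [], []
    for b in np.unique(bin_idx):
        in_bin = bin_idx == b
        d_good = (in_bin & (target == 0)).sum() / non_events
        d_bad = (in_bin & (target == 1)).sum() / events
        woe.append(np.log(d_good / d_bad))
        diff.append(d_good - d_bad)
    iv = float(np.sum(np.array(diff) * np.array(woe)))   # information value of the feature
    return np.array(woe), iv

rng = np.random.default_rng(1)
awake_minutes = rng.integers(1, 61, 500)                                 # hypothetical feature
poor_sleep = (rng.random(500) < 0.2 + 0.01 * awake_minutes).astype(int)  # hypothetical target
woe, iv = woe_per_bin(awake_minutes, poor_sleep, bin_edges=[8, 16, 30])
print(woe, f"IV = {iv:.3f}")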
... To evaluate the performance of the model, the data were first randomly split into training and verification sets at a ratio of 7:3, and then three verification metrics commonly used in credit evaluation models were applied: the area under the ROC (receiver operating characteristic) curve [37], the K-S (Kolmogorov-Smirnov) statistic [38], and the Gini coefficient [39]. The AUROC (area under the ROC curve) approaches 1 as sensitivity and specificity increase, so a value close to 1 indicates a good classification model. ...
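The three metrics named in this snippet are closely related, and a small Python sketch (simulated scores, not the cited study's data) makes the relationships concrete: the AUROC is the trapezoidal area under the ROC curve, the K-S statistic is the maximum vertical distance between the curve and the chance diagonal (equivalently, between the score distributions of the two classes), and the Gini coefficient equals 2 × AUC − 1.

import numpy as np

def auc_ks_gini(labels, scores):
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float))   # sort by descending score
    hits = labels[order]
    tpr = np.r_[0.0, np.cumsum(hits) / hits.sum()]          # sensitivity as the threshold loosens
    fpr = np.r_[0.0, np.cumsum(~hits) / (~hits).sum()]      # 1 - specificity
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)   # trapezoidal AUROC
    ks = np.max(tpr - fpr)                                   # Kolmogorov-Smirnov statistic
    gini = 2 * auc - 1                                       # Gini coefficient
    return auc, ks, gini

rng = np.random.default_rng(0)
scores = np.r_[rng.normal(1.0, 1.0, 300), rng.normal(0.0, 1.0, 700)]   # simulated model scores
labels = np.r_[np.ones(300, dtype=bool), np.zeros(700, dtype=bool)]
auc, ks, gini = auc_ks_gini(labels, scores)
print(f"AUROC = {auc:.3f}, K-S = {ks:.3f}, Gini = {gini:.3f}")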
Article
Full-text available
The rate of people suffering from sleep disorders has been continuously increasing in recent years, such that interest in healthy sleep is also naturally increasing. Although there are many health-care industries and services related to sleep, specific and objective evaluation of sleep habits is still lacking. Most of the sleep scores presented in wearable-based sleep health services are calculated based only on the sleep stage ratio, which is not sufficient for studies considering the sleep dimension. In addition, most score generation techniques use weighted expert evaluation models, which are often selected based on experience instead of objective weights. Therefore, this study proposes an objective daily sleep habit score calculation method that considers various sleep factors based on user sleep data and gait data collected from wearable devices. A credit rating model built as a logistic regression model is adapted to generate sleep habit scores for good and bad sleep. Ensemble machine learning is designed to generate sleep habit scores for the intermediate sleep remainder. The sleep habit score and evaluation model of this study are expected to be in demand not only in health-care and health-service applications but also in the financial and insurance sectors.
... Regarding diagnostic thresholds for diaphragmatic pillar thickness on CT, for right-sided diaphragmatic paralysis, ROC analysis identified a threshold of 3.0 mm at the level of the celiac artery and a threshold of 4.5 mm at the level of L1, both demonstrating good diagnostic performance [20]. Both thresholds exhibited good sensitivity, specificity, and PPV. ...
... In the case of right diaphragmatic paralysis or weakness, ROC curve analysis determined thresholds for the height difference between the two domes of 4.4 cm and 3.5 cm, respectively, with perfect diagnostic performance of 100% [20]. This implies that the diagnosis of right diaphragmatic paralysis or weakness is certain if the height difference exceeds the respective threshold. ...
Article
Full-text available
Introduction Computed tomography (CT) is routinely employed in the evaluation of dyspnea, yet limited data exist on its assessment of the diaphragmatic muscle. This study aimed to determine the capability of CT in identifying structural changes in the diaphragm among patients with ultrasound-confirmed diaphragmatic dysfunction. Methods Diaphragmatic ultrasounds conducted between 2018 and 2021 at our center in Marseille, France, were retrospectively collected. Diaphragmatic pillars were measured on CT scans at the L1 level and the celiac artery. Additionally, the difference in height between the two diaphragmatic domes in both diaphragmatic dysfunction cases and controls was measured and compared. Results A total of 65 patients were included, comprising 24 with diaphragmatic paralysis, 13 with diaphragmatic weakness, and 28 controls. In the case group (paralysis and weakness) with left dysfunctions (n = 24), the CT thickness of the pillars at the level of L1 and the celiac artery was significantly thinner compared with controls (2.0 mm vs. 7.4 mm and 1.8 mm vs. 3.1 mm, p < 0.001 respectively). Significantly different values were observed for paralysis (but not weakness) in the right dysfunction subgroup (n = 15) (2.6 mm vs. 7.4 mm and 2.2 mm vs. 3.8 mm, p < 0.001 respectively, for paralysis vs. controls). Regardless of the side of dysfunction, a significant difference in diaphragmatic height was observed between cases and controls (7.70 cm vs. 1.16 cm and 5.51 cm vs. 1.16 cm, p < 0.001 for right and left dysfunctions, respectively). Threshold values determined through ROC curve analyses for height differences between the two diaphragmatic domes, indicative of paralysis or weakness in the right dysfunctions, were 4.44 cm and 3.51 cm, respectively. Similarly, for left dysfunctions, the thresholds were 2.70 cm and 2.48 cm, respectively, demonstrating good performance (area under the curve of 1.00, 1.00, 0.98, and 0.79, respectively). Conclusion In cases of left diaphragmatic dysfunction, as well as in paralysis associated with right diaphragmatic dysfunction, CT revealed thinner pillars. Additionally, a notable increase in the difference in diaphragmatic height demonstrated a strong potential to identify diaphragmatic dysfunction, with specific threshold values.
... With regard to diagnostic thresholds for diaphragmatic pillar thickness on CT, for right-sided diaphragmatic paralysis, ROC analysis identified a threshold of 3.0 mm at the level of the celiac artery and a threshold of 4.5 mm at the level of L1, with good diagnostic performance [20]. Both thresholds showed good sensitivity, specificity, and PPV, but the pillar measurement at the L1 level was more sensitive than at the celiac artery level, with a higher area under the curve. ...
... In the case of right diaphragmatic paralysis or weakness, ROC curve analysis determined thresholds for the height difference between the two domes of 4.4 cm and 3.5 cm, respectively, with perfect diagnostic performance of 100% [20], making the diagnosis of right diaphragmatic paralysis or weakness certain if the height difference is greater than the threshold. ...
Preprint
Full-text available
Introduction: Computed tomography (CT) is routinely performed to assess dyspnea, but few data exist on evaluating the diaphragmatic muscle using CT. This study aimed to assess CT in the diagnosis of diaphragmatic dysfunction. Methods: We retrospectively collected diaphragmatic ultrasounds performed between 2018 and 2021 at our center (Marseille, France). We measured diaphragmatic pillars on CT at the level of L1 and the celiac artery, as well as the difference in height between the two diaphragmatic domes, in diaphragmatic dysfunctions and controls, and compared these with ultrasound measurements. Results: 65 patients were included, 24 with diaphragmatic paralysis, 13 with diaphragmatic weakness, and 28 controls. The CT thickness of the pillars in the case group (paralysis and weakness) with left dysfunctions (n = 24) was significantly thinner at the level of L1 and the celiac artery compared with controls (2.0 mm vs. 7.4 mm and 1.8 mm vs. 3.1 mm, p < 0.001 respectively), and significantly different for paralysis (but not weakness) with right dysfunction (n = 15) (2.6 mm vs. 7.4 mm and 2.2 mm vs. 3.8 mm, p < 0.001 respectively, for paralysis vs. controls). Whatever the side of dysfunction, there was a significant difference in diaphragmatic height between cases and controls (7.70 cm vs. 1.16 cm and 5.51 cm vs. 1.16 cm, p < 0.001 for right and left dysfunction, respectively). The threshold values (ROC curve analyses) for height differences between the two domes in favor of paralysis or weakness were 4.44 cm and 3.51 cm, respectively, for right dysfunctions, and 2.70 cm and 2.48 cm, respectively, for left dysfunctions, with good performances. Conclusion: The thickness of the pillars on CT was thinner in left diaphragmatic dysfunction and in paralysis in right diaphragmatic dysfunction. An increase in the difference in diaphragmatic height may strongly identify diaphragmatic dysfunction with precise thresholds.
... The concept of the ROC curve, which originated in signal detection theory, is based on the recognition of a signal in the presence of noise. (20,23) The main parameters obtained from ROC curves, A_E and A_CI, provide general accuracy measures of predictors, whereas the sensitivity, specificity, and likelihood ratios of cut-off points provide information for clinical decisions based on parameter data. (23) Owing to the high A_E and A_CI obtained from the OARI ROC curve, as well as the likelihood ratio of 4 related to the cut-off point of 0.56, OARI was established as a factor. ...
... (20,23) The main parameters obtained from ROC curves, A_E and A_CI, provide general accuracy measures of predictors, whereas the sensitivity, specificity, and likelihood ratios of cut-off points provide information for clinical decisions based on parameter data. (23) Owing to the high A_E and A_CI obtained from the OARI ROC curve, as well as the likelihood ratio of 4 related to the cut-off point of 0.56, OARI was established as a factor. Predictors were introduced in their categorized forms in OARI-unfactored models, and as continuous, transformed variables in OARI-factored models to better assess their relevance in the context of reduced vascular resistance. ...
Article
Full-text available
Objective Vascular findings in preeclampsia are usually attributed to increased vascular tone. Recently, however, important studies have improved the understanding of the main pathophysiological events in this condition, especially vascular brain remodeling, impaired autoregulation, and damage of the blood-brain barrier, which are well recognized features of cerebral overperfusion. Methods In this study, the association between choriocapillaris ischemia and ophthalmic artery blood flow parameters on orbital Doppler ultrasound is reported for the first time using multivariate logistic models. Multivariate logistic models with ophthalmic artery blood flow parameters, as well as major clinical and laboratory predictive variables, were established for choriocapillaris ischemia and choriocapillaris ischemia with retinal detachment. Results In a series of 165 patients, 46 (28%) presented choriocapillaris ischemia; among them, 20 (12%) presented associated retinal detachment. The ophthalmic artery resistive index was the main predictor for choriocapillaris ischemia and choriocapillaris ischemia with retinal detachment in multivariate logistic models. Ophthalmic artery resistance lower than 0.56 was associated with a significantly high incidence of both outcomes. Conclusion This study supports that the branching pattern of choroidal arterioles and the lobular organization of the choriocapillaris are the major morphological aspects underlying endothelial damage and lobular ischemia in the context of choroidal overperfusion. Overperfused lobules bordering areas of choriocapillaris ischemia produce a perfusion pressure gradient, with lobular reperfusion, leakage from the reperfused choriocapillaris, and retinal detachment. An ophthalmic artery resistive index lower than 0.56 is proposed as a major predictor of overperfusion-related choriocapillaris ischemia and choriocapillaris ischemia with retinal detachment in preeclampsia. Keywords: Choroid; Retina; Ultrasonography, Doppler; Pre-eclampsia; Endothelium; Ischemia
... The curve is commonly used to evaluate the quality of predictive models, particularly in the medical field. One can use the bootstrap method to construct a 95% confidence interval for the sensitivity at a given specificity or vice versa [38][39][40][41]. The area under the curve (AUC) is an overall summary of diagnostic accuracy. ...
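A minimal sketch of the bootstrap mentioned in this snippet, with simulated scores rather than any study's data: cases and controls are resampled with replacement, the sensitivity at a fixed 90% specificity is recomputed for each resample, and the 2.5th and 97.5th percentiles of the resampled values give an approximate 95% confidence interval.

import numpy as np

def sens_at_spec(case_scores, control_scores, spec=0.90):
    cutoff = np.quantile(control_scores, spec)   # threshold that yields ~90% specificity
    return np.mean(case_scores > cutoff)

rng = np.random.default_rng(0)
cases = rng.normal(1.5, 1.0, 80)      # simulated scores of diseased subjects
controls = rng.normal(0.0, 1.0, 120)  # simulated scores of non-diseased subjects

boot = [sens_at_spec(rng.choice(cases, cases.size, replace=True),
                     rng.choice(controls, controls.size, replace=True))
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"sensitivity at 90% specificity: {sens_at_spec(cases, controls):.2f} (95% CI {lo:.2f}-{hi:.2f})")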
... Finally, 63 participants (64.81% females) were selected for the experimental group, and 63 (58.33% females) for the control group. The gender-wise classification by age groups of 40-50, 50-60, 60-70, and 70-80 years for both groups is shown in Figure 2. ...
Article
Full-text available
Background: This study aimed to develop a new diagnostic tool for identifying risk factors in knee osteoarthritis (KOA) by utilizing radiological images graded according to the Kellgren-Lawrence scale (KLS) and assessing abnormal clinical outcome measures, deranged lower extremities, and increased biochemical parameters such as 4-hydroxyproline and collagen oligomeric matrix protein (COMP). Methods: The study collected baseline data from 63 OA patients and 63 healthy controls, confirming the results with radiological imaging. Separate analyses were performed for participants with outcome measures, lower extremities, and biochemical parameters such as 4-hydroxyproline and COMP for those with and without OA. Results: The areas under the receiver operating characteristic curves of the studied outcome measures, lower extremities, biomarkers, and experimental cohorts were found to be within the range of 0.997-0.915, with ideal cutoff points revealing normal values that increased significantly (p<0.0001), indicating a successful diagnostic strategy. Conclusion: Monitoring these risk factors could help develop a cost-effective and safe diagnostic protocol for patients with acute KOA. The study suggests that the clinical and radiologic findings used to diagnose KOA are insufficiently sensitive to track the disease's development and that these risk factors could be useful in developing a better diagnostic protocol for KOA patients.
... An ROC curve is generated by plotting the TPR and FPR across varying thresholds ( Figure 2). 32 The area under the curve (AUC) summarizes the area underneath the entire ROC curve across all possible thresholds and thus provides an overall and combined measure of sensitivity and specificity. The AUC ranges between 0 and 1, where .5 indicates that the model does not perform better than chance. ...
... A single, normally distributed biomarker requires a Cohen's d of at least 1.25 to reach an AUC of .80 ( Figure 4C), considered as the lower limit of good discrimination. 32,40 Biomarkers with both a sensitivity and specificity of .8 or 80%-instead of combined in the AUC-require an even greater effect size, namely a Cohen's d of 1.66 ( Figure S1). Effect sizes of 1.25 and 1.66 correspond to odds ratios (OR) of 9.65 (Table S2) and 20.31, respectively. ...
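The effect-size figures in these two snippets follow from the equal-variance binormal model, under which AUC = Φ(d/√2) and the sensitivity and specificity at the midpoint cut point both equal Φ(d/2); the Python check below is an illustration under that assumption, not the cited study's computation, and it also reproduces the quoted odds ratios via the standard logistic conversion OR = exp(d·π/√3).

from math import sqrt, pi, exp
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal CDF

for d in (1.25, 1.66):
    auc = phi(d / sqrt(2))              # binormal AUC
    sens_spec = phi(d / 2)              # sensitivity = specificity at the midpoint cut point
    odds_ratio = exp(d * pi / sqrt(3))  # logistic-distribution conversion of d to an OR
    print(f"d = {d}: AUC ~ {auc:.2f}, sens = spec ~ {sens_spec:.2f}, OR ~ {odds_ratio:.2f}")

Running this gives AUC ≈ 0.81 and OR ≈ 9.65 for d = 1.25, and sensitivity = specificity ≈ 0.80 with OR ≈ 20.3 for d = 1.66, consistent with the values quoted above.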
Article
Full-text available
Objective The quest for epilepsy biomarkers is on the rise. Variables with statistically significant group‐level differences are often misinterpreted as biomarkers with sufficient discriminative power. This study aimed to demonstrate the relationship between significant group‐level differences and a variable's power to discriminate between individuals. Methods We simulated normal‐distributed datasets from hypothetical populations with varying sample sizes (25–800), effect sizes (Cohen's d: .25–2.50), and variability (standard deviation: 10–35) to assess the impact of these parameters on significance and discriminative power. The simulation data were illustrated by assessing the discriminative power of a potential real‐case biomarker—the EEG beta band power—to diagnose generalized epilepsy, using data from 66 children with generalized epilepsy and 385 controls. Additionally, we evaluated recently reported epilepsy biomarkers by comparing their effect sizes to our simulation‐derived effect size criterion. Results Group size affects significance but not discriminative power. Discriminative power is much more related to variability and effect size. Our real data example supported these simulation results by demonstrating that group‐level significance does not translate, one to one, into discriminative power. Although we found a significant difference in the beta band power between children with and without epilepsy, the discriminative power was poor due to a small effect size. A Cohen's d of at least 1.25 is required to reach good discriminative power in univariable prediction modeling. Slightly over 60% of the biomarkers in our literature search met this criterion. Significance Rather than statistical significance of group‐level differences, effect size should be used as an indicator of a variable's biomarker potential. The minimal required effects size for individual biomarkers—a Cohen's d of 1.25—is large. This calls for multivariable approaches, in which combining multiple variables with smaller effect sizes could increase the overall effect size and discriminative power.
... For inter-algorithm agreement, Bland-Altman plots with the limits of agreement (LoA) set at 1.96 standard deviations (SDs), which results in a 95 % confidence interval (CI), were evaluated [18]. The ability to discriminate healthy eyes from disease-affected eyes (DR, RVO, Uveitis, and AMD) in full retina and choriocapillaris slabs was evaluated with receiver operating characteristic (ROC) curves and area under the curve (AUC) values [19,20]. ...
Article
Full-text available
(1) Background: Calculation of vessel density in optical coherence tomography angiography (OCTA) images with thresholding algorithms varies in clinical routine. The ability to discriminate healthy from diseased eyes based on perfusion of the posterior pole is critical and may depend on the algorithm applied. This study assessed comparability, reliability, and ability in the discrimination of commonly used automated thresholding algorithms. (2) Methods: Vessel density in full retina and choriocapillaris slabs were calculated with five previously published automated thresholding algorithms (Default, Huang, ISODATA, Mean, and Otsu) for healthy and diseased eyes. The algorithms were investigated with LD-F2-analysis for intra-algorithm reliability, agreement, and the ability to discriminate between physiological and pathological conditions. (3) Results: LD-F2-analyses revealed significant differences in estimated vessel densities for the algorithms (p < 0.001). For full retina and choriocapillaris slabs, intra-algorithm values range from excellent to poor, depending on the applied algorithm; the inter-algorithm agreement was low. Discrimination was good for the full retina slabs, but poor when applied to the choriocapillaris slabs. The Mean algorithm demonstrated an overall good performance. (4) Conclusions: Automated threshold algorithms are not interchangeable. The ability for discrimination depends on the analyzed layer. Concerning the full retina slab, all of the five evaluated automated algorithms had an overall good ability for discrimination. When analyzing the choriocapillaris, it might be useful to consider another algorithm.
... This should include other data-driven methods (including AI) such as logistic regression (e.g., Carranza and Hale, 2001; Nykänen et al., 2008; Porwal and Kreuzer, 2010), neural networks (e.g., Singer and Kouda, 1999; Porwal et al., 2003; Nykänen, 2008), random forests (e.g., Carranza and Laborte, 2016; Ford et al., 2015; Ford, 2020; Roshanravan et al., 2023), and knowledge-driven methods such as fuzzy logic (e.g., Tangestani and Moore, 2003; Porwal et al., 2003; González-Álvarez et al., 2010; Yousefi and Carranza, 2015; Nykänen et al., 2023). Validation techniques should also be available, including the area-frequency tool used in this study (Behnia et al., 2023) and the receiver operating characteristics (ROC) technique (Obuchowski, 2003; Fawcett, 2006; Nykänen et al., 2017, 2023). ...
... However, ROC optimization (e.g., C-index, sensitivity, specificity) is more appropriate for discriminating between already-existing conditions [28] than it is for predicting future events, which are subject to stochasticity and thus are more appropriately described as probabilities (e.g., partial likelihoods via logistic regression or COX proportional hazards, or by probability mass functions) than binary sets [29,30]. However, such probabilistic functions are not readily generable as outcomes from pure mechanistic models. ...
Preprint
Full-text available
We present a study where predictive mechanistic modeling is used in combination with deep learning methods to predict individual patient survival probabilities under immune checkpoint inhibitor (ICI) therapy. This hybrid approach enables prediction based on both measures that are calculable from mechanistic models (but may not be directly measurable in the clinic) and easily measurable quantities or characteristics (that are not always readily incorporated into predictive mechanistic models). The mechanistic model we have applied here can predict tumor response from CT or MRI imaging based on key mechanisms underlying checkpoint inhibitor therapy, and in the present work, its parameters were combined with readily-available clinical measures from 93 patients into a hybrid training set for a deep learning time-to-event predictive model. Analysis revealed that training an artificial neural network with both mechanistic modeling-derived and clinical measures achieved higher per-patient predictive accuracy based on event-time concordance, Brier score, and negative binomial log-likelihood-based criteria than when only mechanistic model-derived values or only clinical data were used. Feature importance analysis revealed that both clinical and model-derived parameters play prominent roles in neural network decision making, and in increasing prediction accuracy, further supporting the advantage of our hybrid approach. We anticipate that many existing mechanistic models may be hybridized with deep learning methods in a similar manner to improve predictive accuracy through addition of additional data that may not be readily implemented in mechanistic descriptions.
... To assess the performance of the predictive models, Area under the Curve (AUC) values were calculated. Combinations that yielded AUC values greater than 0.7 were considered to have fair performance based on established criteria [26][27][28]. The mean AUC value for each variation was calculated to examine the relative performance associated with different feature datasets. ...
Article
Purpose: To investigate the impact of clinical features on model performance in CT-based Non-Small Cell Lung Cancer (NSCLC) and the potential uncertainty regarding their application in machine learning. Methods: Clinical and radiomic features were retrospectively retrieved from EMR and CT images of 496 NSCLC patients. Five feature datasets were constructed: radiomic features-only (Rad), clinical features-only (Clin), shape features-only (Shape), radiomic and clinical features (RaClin), shape and clinical features (ShClin). Five feature selection methods and seven predictive models, along with different cohort sizes, number of input features and validation methods were included for the uncertainty analysis, with two-year survival as the study endpoint. AUC values were calculated for comparisons and Kruskal-Wallis testing was performed to determine significant differences. Results: A total of 19740 distinct combinations of feature sets, feature selection methods, predictive models, cohort sizes and validation techniques are examined. Of those, 25 combinations produce an AUC > 0.7. The clinical-only feature dataset generally outperforms both the radiomic-only feature dataset and the hybrid (clinical and radiomic) feature dataset (P<0.01), which is primarily determined by the endpoint. The combination of different feature selection methods and predictive models, along with the variations in cohort size, number of input features and validation methods generate inconsistent results. Conclusion: Clinical features are a source of data that can improve machine learning model performance. However, its impact strongly depends on various factors that may lead to inconsistent results. A clear approach to incorporate clinical features to generate reliable results requires further investigation.
... The “receiver operating characteristics” (ROC) technique (Obuchowski, 2003; Fawcett, 2006; Nykänen et al., 2017) was used to statistically validate the final prospectivity models and intermediate modeling results. ...
Article
Full-text available
This paper describes mineral prospectivity research conducted in Finland to predict favorable areas for cobalt exploration using the “fuzzy logic overlay” method in a GIS platform and public geodata of the Geological Survey of Finland. Cobalt occurs infrequently as a core product in mineral deposits. Therefore, we decided to construct separate conceptual mineral prospectivity models within the Northern Fennoscandian Shield, Finland, for four deposit types: (1) “Orthomagmatic Ni–Cu–Co sulfide deposits,” (2) “Outokumpu-type mantle peridotite-associated volcanogenic massive sulfide (VMS)-style Cu–Co–Zn–Ni–Ag–Au deposits,” (3) “Talvivaara black shale-hosted Ni–Zn–Cu–Co-type deposits” and (4) “Kuusamo-type (orogenic gold with atypical metal association) Au–Co–Cu–U–LREE deposits”. In addition, we created a model combining till geochemical data with data derived from bedrock drilling and mineral indications, including boulders and outcrops. The mineral prospectivity models were statistically tested with the “receiver operating characteristics” method using exploration drilling data from known mineral deposits as validation sites. In addition, the predictive performance of the models was evaluated by using success rate curves, where the number of previously identified deposits was compared with the area coverage of the predicted highly favorable areas. These results indicate that the knowledge-driven mineral prospectivity method using parameters derived from mineral systems models is effective in defining favorable exploration target areas at the regional scale. This study's innovation lies in its comprehension of the process of evaluating mineral prospectivity when the commodity of interest is not the primary commodity within the mineral system.
... and "failed" for 0.5-0.6. 20 The diagnostic accuracy of the Blue value to detect dehydration based on USG and uOsm were presented as positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (+LR) and negative likelihood ratio (−LR). ...
Article
Full-text available
Objective Direct urine color assessment has been shown to correlate with hydration status. However, this method is subject to inter- and intra-observer variability. Digital image colorimetry provides a more objective method. This study evaluated the diagnostic accuracy of urine photo colorimetry using different smartphones under different lighting conditions, and determined the optimal cut-off value to predict clinical dehydration. Methods The urine samples were photographed in a customized photo box, under five simulated lighting conditions, using five smartphones. The images were analyzed using Adobe Photoshop to obtain Red, Green, and Blue (RGB) values. The correlation between RGB values and urine laboratory parameters were determined. The optimal cut-off value to predict dehydration was determined using area under the receiver operating characteristic curve. Results A total of 56 patients were included in the data analysis. Images captured using five different smartphones under five lighting conditions produced a dataset of 1400 images. The study found a statistically significant correlation between Blue and Green values with urine osmolality, sodium, urine specific gravity, protein, and ketones. The diagnostic accuracy of the Blue value for predicting dehydration were “good” to “excellent” across all phones under all lighting conditions with sensitivity >90% at cut-off Blue value of 170. Conclusions Smartphone-based urine colorimetry is a highly sensitive tool in predicting dehydration.
... An X-ray classification task was selected given that SDT can be applied to diagnostic accuracy [44]. Further, detection of viral pneumonia within X-ray images can be applied online as a real world application of a Yes-No task [22]. ...
Article
Full-text available
Appropriately calibrated human trust is essential for successful Human-Agent collaboration. Probabilistic frameworks using a partially observable Markov decision process (POMDP) have been previously employed to model the trust dynamics of human behavior, optimising the outcomes of a task completed with a collaborative recommender system. A POMDP model utilising signal detection theory to account for latent user trust is presented, with the model working to calibrate user trust via the implementation of three distinct agent features: disclaimer message, request for additional information, and no additional feature. A simulation experiment is run to investigate the efficacy of the proposed POMDP model compared against a random feature model and a control model. Evidence demonstrates that the proposed POMDP model can appropriately adapt agent features in-task based on human trust belief estimates in order to achieve trust calibration. Specifically, task accuracy is highest with the POMDP model, followed by the control and then the random model. This emphasises the importance of trust calibration, as agents that lack considered design to implement features in an appropriate way can be more detrimental to task outcome compared to an agent with no additional features.
... A common choice is receiver operating characteristic (ROC) analysis which characterizes performance for all possible operating points of the classifier. An ROC curve plots TPR as a function of the false-positive rate (FPR = 1-TNR) when the threshold on the classifier output is varied over the complete range of possible output scores [54][55][56]. An example of an ROC curve is shown in Fig. 3. ...
Chapter
Full-text available
This chapter presents a regulatory science perspective on the assessment of machine learning algorithms in diagnostic imaging applications. Most of the topics are generally applicable to many medical imaging applications, while brain disease-specific examples are provided when possible. The chapter begins with an overview of US FDA’s regulatory framework followed by assessment methodologies related to ML devices in medical imaging. Rationale, methods, and issues are discussed for the study design and data collection, the algorithm documentation, and the reference standard. Finally, study design and statistical analysis methods are overviewed for the assessment of standalone performance of ML algorithms as well as their impact on clinicians (i.e., reader studies). We believe that assessment methodologies and regulatory science play a critical role in fully realizing the great potential of ML in medical imaging, in facilitating ML device innovation, and in accelerating the translation of these technologies from bench to bedside to the benefit of patients.
... The explanations in this article are applicable when the sensitivity and specificity values are known. Therefore, the selection of optimal thresholds to convert continuous raw AI output into categorical decisions was not the focus of this article, which can be found elsewhere [5,6]. ...
... 14 In our sample, the discriminative power of PRE-DELIRIC in critically ill patients shows a good performance in a moderate-risk group of ICU-onset delirium. Although validation findings are independent of the prevalence of ICU delirium, 21,22 the different validation results between the countries may have several explanations. They are attributable to differences in risk factors between populations and varying analgosedation protocols applied in clinical practice. ...
Article
Background: To predict delirium in intensive care unit (ICU) patients, the Prediction of Delirium in ICU Patients (PRE-DELIRIC) score may be used. This model may help nurses to predict delirium in high-risk ICU patients. Objectives: The aims of this study were to externally validate the PRE-DELIRIC model and to identify predictive factors and outcomes for ICU delirium. Method: All patients underwent delirium risk assessment by the PRE-DELIRIC model at admission. We used the Intensive Care Delirium Screening Check List to identify patients with delirium. The receiver operating characteristic curve measured discrimination capacity among patients with or without ICU delirium. Calibration ability was determined by slope and intercept. Results: The prevalence of ICU delirium was 55.8%. Discrimination capacity (Intensive Care Delirium Screening Check List score ≥4) expressed by the area under the receiver operating characteristic curve was 0.81 (95% confidence interval, 0.75-0.88), whereas sensitivity was 91.3% and specificity was 64.4%. The best cut-off was 27%, obtained by the max Youden index. Calibration of the model was adequate, with a slope of 1.03 and intercept of 8.14. The onset of ICU delirium was associated with an increase in ICU length of stay (P < .0001), higher ICU mortality (P = .008), increased duration of mechanical ventilation (P < .0001), and more prolonged respiratory weaning (P < .0001) compared with patients without delirium. Discussion: The PRE-DELIRIC score is a sensitive measure that may be useful in early detection of patients at high risk for developing delirium. The baseline PRE-DELIRIC score could be useful to trigger use of standardized protocols, including nonpharmacologic interventions.
... Second, previous studies have mainly assessed the discriminative ability of the nomogram model through the concordance index (C-index) and the area under the receiver operating characteristic curve (AUC). 12,13 Net reclassification index (NRI) and integrated discrimination improvement (IDI) have infrequently been used to compare the predictive performance of the nomogram model and the staging system. 14,15 Taken together, these studies underscore the need to generate a novel nomogram for a more accurate prediction of the outcomes of ESCC patients undergoing definitive radiotherapy. ...
Article
Full-text available
Background At present, there is no objective prognostic index available for patients with esophageal squamous cell carcinoma (ESCC) who underwent intensity-modulated radiotherapy (IMRT). This study is to develop a nomogram based on hematologic inflammatory indices for ESCC patients treated with IMRT. Methods 581 patients with ESCC receiving definitive IMRT were enrolled in our retrospective study. Of which, 434 patients with treatment-naïve ESCC in Fujian Cancer Hospital were defined as the training cohort. Additional 147 newly diagnosed ESCC patients were used as the validation cohort. Independent predictors of overall survival (OS) were employed to establish a nomogram model. The predictive ability was evaluated by time-dependent receiver operating characteristic curves, the concordance index (C-index), net reclassification index (NRI), and integrated discrimination improvement (IDI). Decision curve analysis (DCA) was performed to assess the clinical benefits of the nomogram model. The entire series was divided into 3 risk subgroups stratified by the total nomogram scores. Results Clinical TNM staging, primary gross tumor volume, chemotherapy, neutrophil-to-lymphocyte ratio and platelet lymphocyte ratio were independent predictors of OS. Nomogram was developed incorporating these factors. Compared with the 8th American Joint Committee on Cancer (AJCC) staging, the C-index for 5-year OS (.627 and .629) and the AUC value of 5-year OS (.706 and .719) in the training and validation cohorts (respectively) were superior. Furthermore, the nomogram model presented higher NRI and IDI. DCA also demonstrated that the nomogram model provided greater clinical benefit. Finally, patients with <84.8, 84.8-151.4, and >151.4 points were categorized into low-risk, intermediate-risk, and high-risk groups. Their 5-year OS rates were 44.0%, 23.6%, and 8.9%, respectively. The C-index was .625, which was higher than the 8 th AJCC staging. Conclusions We have developed a nomogram model that enables risk-stratification of patients with ESCC receiving definitive IMRT. Our findings may serve as a reference for personalized treatment.
... We did not calculate c statistic or area under the receiver operating characteristic (ROC) curves to evaluate model performance as part of this study. The c-statistic is a measure of discrimination and is of most interest when classification into groups with or without the outcome is the goal, such as in diagnostic testing 73 . Since the goal of the study was to estimate the probability that a patient has a cUTI with a non-susceptible organism based on the number of model covariates present in that individual at clinical presentation, we relied on calibration (i.e., a measure of how well predicted probabilities agree with actual observed risk) to assess the accuracy of the models. ...
Article
Full-text available
Background Clinical risk scores were developed to estimate the risk of adult outpatients having a complicated urinary tract infection (cUTI) that was non-susceptible to trimethoprim-sulfamethoxazole (TMP-SMX), fluoroquinolone, nitrofurantoin, or third-generation cephalosporins (3-GC) based on variables available on clinical presentation. Methods A retrospective cohort study (12/1/2017-12/31/2020) was performed among adult Kaiser Permanente Southern California members with an outpatient cUTI. Separate risk scores were developed for TMP-SMX, fluoroquinolone, nitrofurantoin, and 3-GC. The models were translated into risk scores to quantify the likelihood of non-susceptibility based on the presence of final model covariates in a given cUTI outpatient. Results A total of 30,450 cUTIs (26,326 patients) met the study criteria. Non-susceptibility to TMP-SMX, fluoroquinolones, nitrofurantoin, and 3-GC were 37%, 20%, 27%, and 24%, respectively. Receipt of prior antibiotics was the most important predictor across all models. The risk of non-susceptibility in the TMP-SMX model exceeded 20% in the absence of any risk factors, suggesting that empiric use of TMP-SMX may not be advisable. For fluoroquinolone, nitrofurantoin, or 3-GC, clinical risk scores of 10, 7, and 11, respectively, predicted a ≥20% estimated probability of non-susceptibility in the models that included cumulative number of prior antibiotics at model entry. This finding suggests caution should be used when considering these agents empirically in patients who have several risk factors present in a given model(s) at presentation. Conclusions We developed high-performing parsimonious risk scores to facilitate empiric treatment selection for adult cUTI outpatients in the critical period between infection presentation and availability of susceptibility results.
... shown that the values of AUC above 0.6 are meaningful [40][41][42]. Our results showed that the AUCs of ROC in 1, 3, and 5 years were all greater than 0.7, indicating that our prognostic model is reliable and accurate for predicting CRC survival. ...
Preprint
Full-text available
Backgrounds: Colorectal cancer (CRC) is a global health issue that requires innovative prognostic signatures to improve patient outcomes. Alternative splicing (AS) of RNA is a crucial modification process involved in cancer progression, and zinc finger proteins (ZNFs), the largest family of DNA binding proteins, have been implicated in various aspects of cancer development. However, the role of ZNF AS events in cancer remains poorly understood. Methods: To address this, we investigated the relationship between ZNF AS and CRC development using clinical samples and bioinformatics approaches to identify a prognostic signature. Results: We identified 227 differentially expressed genes (DEGs) and 98 survival-related genes among ZNFs. We also identified 29 differentially expressed AS (DEAS) events and 93 survival-related AS events in CRC patients. Using these results, we developed a thirteen-AS signature that showed excellent predictive ability, with a 3-year area under the receiver operating characteristic (ROC) curve (AUC) value of 0.80, outperforming the commonly used tumor-node-metastasis (TNM) staging-based model (AUC = 0.73). Gene Set Enrichment Analysis (GSEA) showed that the risk score of our model was associated with various cancer-related pathways, including PI3K AKT MTOR, CELL CYCLE, APOPTOSIS, and more. We also validated our findings through qPCR and explored the correlations between splicing factors (SFs) and DEAS events. Conclusions: Our study provides new insights into the role of ZNFs in cancer and highlights their potential as prognostic biomarkers for CRC progression.
... Sample size calculations were based on the number of cases required to identify a statistically significant receiver operator characteristic area-under-the-curve (ROC AUC) using methods described by Obuchowski and McClish [36,37]. ROC plots are used to assess discrimination performance for binary diagnoses -here presence or absence of cardiometabolic risk factors. ...
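As a rough illustration of how such a sample-size calculation works, the sketch below uses the simpler Hanley-McNeil (1982) variance approximation with equal numbers of cases and controls; it is an assumption-laden stand-in, not the Obuchowski-McClish method the study actually cites. It searches for the smallest group size at which a test of H0: AUC = auc0 against an anticipated AUC = auc1 achieves the desired power.

from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf

def hanley_mcneil_var(auc, n_cases, n_controls):
    # Approximate variance of the empirical AUC (Hanley & McNeil, 1982).
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    return (auc * (1 - auc)
            + (n_cases - 1) * (q1 - auc ** 2)
            + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)

def n_per_group(auc0, auc1, alpha=0.05, power=0.80):
    za, zb = z(1 - alpha / 2), z(power)
    n = 2
    # Grow n until the two-sided test of auc0 vs. auc1 reaches the requested power.
    while za * sqrt(hanley_mcneil_var(auc0, n, n)) + zb * sqrt(hanley_mcneil_var(auc1, n, n)) > auc1 - auc0:
        n += 1
    return n

print(n_per_group(auc0=0.5, auc1=0.75))   # cases (and controls) needed, approximately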
Article
Full-text available
To examine the (i) relationships between various body mass index (BMI)-derived metrics for measuring severe obesity (SO) over time based on the Centers for Disease Control and Prevention (CDC) and World Health Organization (WHO) references and (ii) ability of these metrics to discriminate children and adolescents based on the presence of cardiometabolic risk factors. In this cohort study completed from 2013 to 2021, we examined data from 3- to 18-year-olds enrolled in the CANadian Pediatric Weight management Registry. Anthropometric data were used to create nine BMI-derived metrics based on the CDC and WHO references. Cardiometabolic risk factors were examined, including dysglycemia, dyslipidemia, and elevated blood pressure. Analyses included Pearson correlations, intraclass correlation coefficients (ICC), and receiver operator characteristic area-under-the-curve (ROC AUC). Our sample included 1,288 participants (n = 666 [52%] girls; n = 874 [68%] white). The prevalence of SO varied from 60–67%, depending on the definition. Most BMI-derived metrics were positively and significantly related to one another (r = 0.45–1.00); ICCs revealed high tracking (0.90–0.94). ROC AUC analyses showed CDC and WHO metrics had a modest ability to discriminate the presence of cardiometabolic risk factors, which improved slightly with increasing numbers of risk factors. Overall, most BMI-derived metrics rated poorly in identifying the presence of cardiometabolic risk factors. Conclusion: CDC BMI percent of the 95th percentile and WHO BMIz performed similarly as measures of SO, although neither showed particularly impressive discrimination. They appear to be interchangeable in clinical care and research in pediatrics, but there is a need for a universal standard. WHO BMIz may be useful for clinicians and researchers from countries that recommend using the WHO growth reference. What is Known: • Severe obesity in pediatrics is a global health issue. • Few reports have evaluated body mass index (BMI)-derived metrics based on the World Health Organization growth reference. What is New: • Our analyses showed that the Centers for Disease Control and Prevention BMI percent of the 95th percentile and World Health Organization (WHO) BMI z-score (BMIz) performed similarly as measures of severe obesity in pediatrics. • WHO BMIz should be a useful metric to measure severe obesity for clinicians and researchers from countries that recommend using the WHO growth reference.
... This allows the assessment of the overall performance of each ML model. The AUC, on the other hand, provides a measure of the diagnostic test's performance by calculating the average sensitivity across all possible values of specificity [7]. The AUC is a commonly reported parameter in radiomics studies. ...
Article
Full-text available
Background Radiomics is the process of converting radiological images into high-dimensional data that may be used to create machine learning models capable of predicting clinical outcomes, such as disease progression, treatment response and survival. Pediatric central nervous system (CNS) tumors differ from adult CNS tumors in terms of their tissue morphology, molecular subtype and textural features. We set out to appraise the current impact of this technology in clinical pediatric neuro-oncology practice. Objectives The aims of the study were to assess radiomics’ current impact and potential utility in pediatric neuro-oncology practice; to evaluate the accuracy of radiomics-based machine learning models and compare this to the current standard, which is stereotactic brain biopsy; and finally, to identify the current limitations of radiomics applications in pediatric neuro-oncology. Materials and methods Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards, a systematic review of the literature was carried out with protocol number CRD42022372485 in the prospective register of systematic reviews (PROSPERO). We performed a systematic literature search via PubMed, Embase, Web of Science and Google Scholar. Studies involving CNS tumors, studies that utilized radiomics and studies involving pediatric patients (age < 18 years) were included. Several parameters were collected including imaging modality, sample size, image segmentation technique, machine learning model used, tumor type, radiomics utility, model accuracy, radiomics quality score and reported limitations. Results The study included a total of 17 articles that underwent full-text review, after excluding duplicates, conference abstracts and studies that did not meet the inclusion criteria. The most commonly used machine learning models were support vector machines (n = 7) and random forests (n = 6), with an area under the curve (AUC) range of 0.60–0.94. The included studies investigated several pediatric CNS tumors, with ependymoma and medulloblastoma being the most frequently studied. Radiomics was primarily used for lesion identification, molecular subtyping, survival prognostication and metastasis prediction in pediatric neuro-oncology. The low sample size of studies was a commonly reported limitation. Conclusion The current state of radiomics in pediatric neuro-oncology is promising in terms of distinguishing between tumor types; however, its utility in response assessment requires further evaluation, which, given the relatively low number of pediatric tumors, calls for multicenter collaboration.
... An AUROC of 0.70-0.80 indicates good performance, an AUROC of > 0.8 implies good accuracy, and an AUROC > 0.9 implies very good accuracy of a model. 23 To validate our model, we used the Hosmer-Lemeshow chi-square statistic (calibration statistics), which compares the predicted to the observed outcome probabilities. ...
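A minimal Python sketch of the Hosmer-Lemeshow check described here, on simulated data rather than the study's: patients are grouped by deciles of predicted risk, observed and expected event counts are compared within each group, and the resulting chi-square statistic (conventionally referred to g − 2 degrees of freedom) is small when the predicted probabilities track the observed risk.

import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, y_prob, groups=10):
    order = np.argsort(y_prob)
    y_true, y_prob = np.asarray(y_true)[order], np.asarray(y_prob)[order]
    stat = 0.0
    for idx in np.array_split(np.arange(y_true.size), groups):   # risk deciles
        n_g = idx.size
        observed = y_true[idx].sum()
        expected = y_prob[idx].sum()
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1 - p_bar))
    return stat, chi2.sf(stat, groups - 2)   # statistic and p-value

rng = np.random.default_rng(0)
risk = rng.uniform(0.05, 0.90, 400)              # simulated predicted probabilities
outcome = (rng.random(400) < risk).astype(int)   # outcomes generated to match the risks
print(hosmer_lemeshow(outcome, risk))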
Article
Full-text available
Background: Most patients who present to South African state hospitals with advanced stage oesophageal squamous cell cancer (OSCC) disease receive palliative treatment. This study aimed to assess the factors that influence survival in patients with OSCC who received palliative management and to develop a prognostic score to aid clinicians in decision-making. Methods: Analysis of a prospectively collected database assessed factors influencing survival of patients diagnosed with OSCC receiving palliative treatment. Factors assessed included patient demographics, clinical and laboratory data and tumour factors. A multivariable logistic regression model was used to assess for significant factors associated with survival time and a prognostic score was developed and internally validated based on these factors. Results: There were 384 patients with a male-to-female ratio of 1.3:1. The median survival of the cohort was 3.7 months. Factors that influenced survival on multivariate analysis included area of residence (aOR 1.82, 95% CI 1.02–3.24), performance status (aOR 2.56, 95% CI 1.50–4.35), body mass index (aOR 1.87, 95% CI 1.14–3.06) and serum albumin (aOR 3.06, 95% CI 1.46–6.42). The final prognostic score contained three of the four independent variables based on the regression coefficient for each variable. After internal validation, the risk score maintained fair discrimination and good calibration. Conclusion: The prognostic scoring system based on patient performance status, body mass index and serum albumin, if validated on an independent cohort, would allow more objective decisions on whether to stage or not prior to embarking on palliative treatment, streamlining care and improving quality of life.
... The ROC is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The area under the ROC curve, called ROC AUC, or simply AUC, is routinely employed for classifier quality [92]. ...
Article
Full-text available
Post-traumatic stress disorder (PTSD) is a severe mental illness with grave social, political, economic , and humanitarian implications. To apply the principles of personalized omics-based medicine to this psychiatric problem, we implemented our previously introduced drug efficiency index (DEI) to the PTSD gene expression datasets. Generally, omics-based personalized medicine evaluates individual drug action using two classes of data: (1) gene expression, mutation, and Big Data profiles, and (2) molecular pathway graphs that reflect the protein-protein interaction. In the particular case of the DEI metric, we evaluate the drug action according to the drug's ability to restore healthy (control) activation levels of molecular pathways. We have curated five PTSD and one TRD (treatment-resistant depression) cohorts of next-generation sequencing (NGS) and microarray hybridization (MH) gene expression profiles, which, in total, comprise 791 samples, including 379 cases and 413 controls. To check the applicability of our DEI metrics, we have performed three differential studies with gene expression and pathway activation data: (1) case samples vs. control samples, (2) case samples after treatment or/and observation vs. before treatment, and (3) samples from patients positively responding to the treatment vs. those responding negatively or non-responding patients. We found that the DEI values that use the signaling pathway impact activation (SPIA) metric were better than those that used the Oncobox pathway activation level (Oncobox PAL) approach. However, SPIA, Oncobox PAL, and DEI evaluations were reliable only if there were differential genes between case and control, or treated and untreated, samples.
... Here, we provided different predictive models for PCOS based on the variables whose diagnostic performances, as measured by the AUC of ROC curve analyses, ranged from 0.801 (80.1%) when considering only miR-142-3p and miR-598-3p to 0.895 (89.5%) when adding WHR and LH/FSH ratio. Notably, AUCs of ROC curve analyses between 0.80 and 0.90, such as those reported here, are considered indicative of good diagnostic performance (Metz, 1978;Hosmer and Lemeshow, 2000;Obuchowski, 2003;Ludemann et al., 2006). The clinical relevance of these figures is exemplified by the comparison with a marker that is routinely used to discriminate menopausal from premenopausal women, namely, circulating FSH concentrations, which showed an AUC of ROC curve analysis of only 0.783 (78.3%) in a large study (Li and Ng, 2021). ...
Article
STUDY QUESTION Circulating miRNAs previously associated with androgen excess in women might be used as diagnostic biomarkers for polycystic ovary syndrome (PCOS). SUMMARY ANSWER Models based on circulating miR-142-3p and miR-598-3p expression show good discrimination among women with and without PCOS, particularly when coupled with easily available measurements such as waist-to-hip ratio (WHR) and circulating LH-to-FSH (LH/FSH) ratios. WHAT IS KNOWN ALREADY The lack of standardization of the signs, methods, and threshold values used to establish the presence of the diagnostic criteria (hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology) complicates the diagnosis of PCOS. Certain biomarkers may help with such a diagnosis. We conducted a validation study to check the diagnostic accuracy for PCOS of several miRNAs that were associated with the syndrome in a small pilot study that had been previously carried out by our research group. STUDY DESIGN, SIZE, DURATION This was a diagnostic test study involving 140 premenopausal women. PARTICIPANTS/MATERIALS, SETTING, METHODS We included 71 women with PCOS and 69 healthy control women in the study. Both groups were selected as to be similar in terms of body mass index. We used miRCURY LNA™ Universal RT microRNA PCR to analyse the five miRNAs that had shown the strongest associations with PCOS in a much smaller pilot study previously conducted by our group. We studied diagnostic accuracy using receiver operating characteristics (ROC) curve analysis. MAIN RESULTS AND THE ROLE OF CHANCE Only the expression of two miRNAs, miR-142-3p and miR-598-3p, of the five studied, was different between the women with PCOS and the non-hyperandrogenic controls. The diagnostic accuracy of the combination of these circulating miRNAs was good (area under the ROC curve (AUC) 0.801; 95% CI: 0.72–0.88) and was further improved when adding WHR (AUC 0.834, 95% CI: 0.756–0.912), LH/FSH ratio (AUC = 0.869, 95% CI: 0.804–0.934) or both (AUC = 0.895, 95% CI: 0.835–0.954). We developed several models by selecting different threshold values for these variables favouring either sensitivity or specificity, with positive and negative predictive values as high as 88% or 85%, respectively. LIMITATIONS, REASONS FOR CAUTION Patients included here had the classic PCOS phenotype, consisting of hyperandrogenism and ovulatory dysfunction; hence, the present results might not apply to milder phenotypes lacking androgen excess. WIDER IMPLICATIONS OF THE FINDINGS If confirmed in larger studies addressing different populations and PCOS phenotypes, these biomarkers may be useful to simplify the clinical diagnosis of this prevalent syndrome.
... Researchers offered benchmarks for classifying area under the curve (AUC) results, suggesting that the values ≥0.9, ≥0.80, ≥0.70, ≥0.60, and <0.60 indicate excellent, good, fair, poor, and unacceptable predictive performance, respectively. However, these are likely to be appropriate for engineering and some applications in biomedicine but less so for mental health diagnoses [57][58][59]. Our deep learning models gave fair to good results with respect to the AUC values. ...
Article
Full-text available
Positron emission tomography and computed tomography with 18F-fluorodeoxyglucose (18F-FDG PET-CT) have been used to predict outcomes after liver transplantation in patients with hepatocellular carcinoma (HCC). However, few prediction approaches based on 18F-FDG PET-CT images that leverage automatic liver segmentation and deep learning have been proposed. This study evaluated the performance of deep learning from 18F-FDG PET-CT images to predict overall survival in HCC patients before liver transplantation (LT). We retrospectively included 304 patients with HCC who underwent 18F-FDG PET/CT before LT between January 2010 and December 2016. The hepatic areas of 273 of the patients were segmented by software, while the other 31 were delineated manually. We analyzed the predictive value of the deep learning model from both FDG PET/CT images and CT images alone. The prognostic model achieved an AUC of 0.807 when built from combined FDG PET-CT images versus 0.743 from CT images alone. The model based on FDG PET-CT images also achieved somewhat better sensitivity than the model based on CT images alone (0.571 vs. 0.432). Automatic liver segmentation from 18F-FDG PET-CT images is feasible and can be utilized to train deep-learning models. The proposed predictive tool can effectively determine prognosis (i.e., overall survival) and thereby select optimal LT candidates among patients with HCC.
... For the evaluation of a predictive model, the area under the curve (AUC) was analyzed. In addition, the AUC value was classified as excellent (0.9-1.0), good (0.8-0.9), fair (0.7-0.8), poor (0.6-0.7), and failed (0.5-0.6) [24]. ...
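The banding quoted above is a simple lookup. As a minimal illustration only (not code from the cited study), a small helper assuming exactly those thresholds might look like this:

```python
# A minimal sketch (not from the cited study): mapping an AUC value to the
# qualitative bands quoted above -- excellent (0.9-1.0), good (0.8-0.9),
# fair (0.7-0.8), poor (0.6-0.7), failed (0.5-0.6).
def auc_band(auc: float) -> str:
    """Return the qualitative label for an AUC value, per the bands above."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    if auc >= 0.6:
        return "poor"
    if auc >= 0.5:
        return "failed"
    return "worse than chance"

print(auc_band(0.84))  # -> "good"
```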
Article
Full-text available
This study aimed to identify predictors for successful post-treatment outcomes in early orthopedic class III malocclusion treatment with a facemask and hyrax expander appliance. The study was performed on lateral cephalograms from 37 patients at the start of treatment (T0), post-treatment (T1), and a minimum of three years after treatment (T2). The patients were grouped as stable or unstable according to the existence of a 2-mm overjet at T2. For statistical analysis, independent t-tests were used to compare the baseline characteristics and measurements of the two groups, considering a significance level of < 0.05. Thirty variables of pretreatment cephalograms were considered during logistic regression analysis to identify predictors. A discriminant equation was established using a stepwise method. The success rate and area under the curve were calculated, with AB to the mandibular plane, ANB, ODI, APDI, and A–B plane angles as predictors. The A–B plane angle was the most significantly different between the stable and unstable groups. In terms of the A–B plane angle, the success rate of early class III treatment with a facemask and hyrax expander appliance was 70.3%, and the area under the curve indicated a fair grade.
... fair for AUC values between 0.7-0.8, poor for AUC values between 0.6-0.7, and failed for AUC values between 0.5-0.6 (11,12). The Youden index was used to decide the cut-off point [the Youden index is calculated as max(sensitivity + specificity − 1)] (13). ...
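As a brief sketch of the cut-point selection described above, the Youden index can be computed directly from an empirical ROC curve; the arrays below are made-up placeholders, not data from the cited study:

```python
# A minimal sketch of choosing a cut-off with the Youden index
# J = max(sensitivity + specificity - 1); y_true/y_score are placeholders.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.6, 0.65, 0.7, 0.8, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
j = tpr - fpr                      # Youden's J at each candidate threshold
best = np.argmax(j)
print(f"optimal cut-off = {thresholds[best]:.2f}, "
      f"sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```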
Article
Full-text available
Aim: The study aims to assess the correlation of PET-CT with adenocarcinoma subtypes and to propose a scoring system for mediastinal lymph node staging. Material and Method: The patient cohort is a multicenter, retrospective series of 268 patients who underwent surgery for NSCLC adenocarcinoma. Preoperative PET-CT results for mediastinal lymph node staging were pathologically confirmed on tissue specimens obtained at anatomical resection. Statistical evaluation of PET-CT, radiological, and pathological outcomes was performed on all subgroups. Results: The low FDG affinity in the lepidic pattern was statistically significant in the study (p
... He preferred this technique because it provided the investigator with all possible combinations of sensitivity and specificity. ROC analysis has been used in the field of radiology (Metz & Obuchowski, 2003) and has been applied to biomedical informatics (Lasko, Bhagwat, Zou, & Ohno-Machado, 2005; Brown & Davis, 2006; Hand & Till, 2001). It is rooted in Signal Detection Theory (Green & Swets, 1966), which provides a precise language and graphic notation for analyzing decision-making in the presence of uncertainty. ...
... and the Keras platform. To compare different experiments, ROC curves [Obuchowski, 2003] were computed on the same test dataset. Moreover, several experiments were designed to investigate the performance of the proposed method. ...
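A minimal sketch of that kind of comparison, assuming two hypothetical models scored on the same held-out test set (the labels and scores below are simulated, not the authors' data or code):

```python
# A hedged sketch: comparing two models' ROC curves computed on the same
# held-out test set; y_test, scores_a, scores_b are simulated placeholders.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=200)
scores_a = y_test * 0.8 + rng.normal(0, 0.6, size=200)   # stronger model
scores_b = y_test * 0.4 + rng.normal(0, 0.6, size=200)   # weaker model

for name, scores in [("model A", scores_a), ("model B", scores_b)]:
    fpr, tpr, _ = roc_curve(y_test, scores)
    print(f"{name}: AUC = {roc_auc_score(y_test, scores):.3f}, "
          f"{len(fpr)} operating points")
```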
Article
Recently, Deep Convolutional Neural Networks (DCNNs) have made their way into various medical image processing practices such as Computer-Aided Diagnosis (CAD) systems. Despite significant developments in CAD systems based on deep models, designing an efficient model, as well as a training strategy to cope with the shortage of medical images, has yet to be addressed. To address these challenges, this paper presents a model comprising a hybrid DCNN, which takes advantage of feature maps from different deep models, together with an incremental training algorithm. A weighted Test Time Augmentation strategy is also presented. In addition, the proposed work extends Mask-RCNN not only to detect mass and calcification in mammography images but also to classify normal images. Moreover, this work compares the performance of the proposed method with that of a radiology specialist. Illustrating the region of interest to explain how the model makes decisions is a further aim of the study, addressing a remaining challenge among state-of-the-art research works. The wide range of quantitative and qualitative experiments conducted suggests that the proposed method can classify breast X-ray images of the INbreast dataset into normal, mass, and calcification.
Article
Background: To minimize adverse events of peripheral blood stem cell (PBSC) collection in healthy donors, it is reasonable to limit the total dose of granulocyte colony-stimulating factor (G-CSF) and/or the number of apheresis days without decreasing the PBSC yield. Therefore, we have started to collect G-CSF-induced PBSCs on day 4 instead of day 5, and we retrospectively aimed to investigate the results of this 4-day G-CSF administration. Study Design and Methods: Seventy-six healthy donors who underwent G-CSF-induced PBSC donation consecutively between January 2020 and July 2022 were included in this study. G-CSF (filgrastim) at 2 × 5 µg/kg/day was administered subcutaneously. Apheresis started on day 4. Results: Sixty-nine (90.8%) of 76 donors provided enough PBSCs in the day 4 apheresis session. Younger age (p = 0.004), higher PB CD34+ cell count on the 4th day of G-CSF (p < 0.001), and male sex (p = 0.010) were correlated with an increased PBSC yield. Univariate and multivariate logistic regression analyses were performed to predict very good mobilizers (collected PBSCs ≥8 × 10⁶/kg after the first apheresis). In the multivariate analysis, male sex (p = 0.004), PB CD34+ cell count ≥100/µL on the 4th day of G-CSF (p < 0.001), and glomerular filtration rate ≥115 mL/min (p = 0.031) were independent factors predicting very good mobilization. Conclusion: Starting apheresis on the 4th day of G-CSF administration appears effective and provides minimal G-CSF exposure in healthy donors.
Book
Preface Disease early detection and prevention offer numerous benefits to both our health and society. Often, the earlier a disease is detected, the higher the likelihood of successful cure or management. Managing a disease in its early stages can significantly reduce its impact on a patient’s quality of life and decrease healthcare costs. To detect a disease early, disease screening has become a popular tool. This method aims to determine the likelihood of a given patient having a particular disease by applying medical procedures or tests to check the major risk factors, even in patients without obvious symptoms of the disease. While disease screening primarily focuses on individual patients, disease surveillance is for detecting disease outbreaks early within a given population. For example, our society faces constant threats from bioterrorist attacks and pandemic influenza. It is thus important to monitor the incidence of infectious diseases continuously and detect their outbreaks promptly. This allows governments and individuals to implement timely disease control and prevention measures, minimizing the impact of these diseases. This book introduces some recent analytic methodologies and software packages developed for effective disease screening and disease surveillance. My exploration into disease screening was motivated by an experience around 2010 when I analyzed a dataset from the Framingham Heart Study (FHS). The FHS primarily aims to identify major risk factors for cardiovascular diseases (CVDs), and numerous CVD risk factors have been recognized since the study's inception in 1948, including smoking, high blood pressure, obesity, high cholesterol levels, physical inactivity, and more. During my data analysis, a pivotal question emerged: Could the identified CVD risk factors be utilized to predict the likelihood of a severe CVD, such as stroke, for individual patients? Statistically, this translates into a sequential decision-making problem, where the relevant statistical tool is the statistical process control (SPC) charts. However, traditional SPC charts, developed primarily for monitoring production lines in manufacturing, assume independence and identical distribution of process observations when the process is in-control (IC), and are designed for monitoring a single sequential process. In the context of disease screening, observed data of a patient's disease risk factors would rarely be independent and identically distributed over time and treating a patient's observed data as a process introduces numerous processes of different patients, making traditional SPC charts unsuitable to use. Recognizing the importance of the disease screening problem, I dedicated much of the past decade to addressing this issue. This endeavor led to the development of a series of new concepts and methods by my research team. The central methodology, termed the Dynamic Screening System (DySS), operates as follows: firstly, the regular longitudinal pattern of disease risk factors is estimated from a pre-collected dataset representing the population without the target disease. Subsequently, a patient's observed pattern of disease risk factors is cross-sectionally compared with the estimated regular longitudinal pattern at each observation time. The cumulative difference between the two patterns up to the current time is then employed to determine the patient's disease status at that time. 
DySS utilizes all historical data of the patient in its decision-making, and effectively accommodates the complex data structure, including time-varying data distribution. In the summer of 2013, upon joining the University of Florida (UF), I started to work on the pressing issue of disease surveillance due to its paramount importance in public health. Disease incidence data are typically collected sequentially over time and across multiple locations or regions, constituting spatio-temporal data. Similar to disease screening, disease surveillance is a sequential decision-making problem. However, its complexity arises from the intricate spatio-temporal data structure, encompassing seasonality, temporal/spatial variation, data correlation, and intricate data distribution. Common disease reporting and surveillance systems incorporate conventional SPC charts such as the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) charts. Additionally, retrospective methods like scan tests and generalized linear modeling approaches are employed for routine surveillance. Unfortunately, these methods often prove ineffective or unreliable due to their inability to handle the sequential nature of the problem or their restrictive model assumptions (cf., Section 2.7 and Chapters 7 and 8). Over the past decade, my research team has devoted significant effort to this domain, resulting in the development of several novel analytic methods for disease surveillance. Our initial method operates as follows: First, a nonparametric spatio-temporal modeling approach is employed to estimate the regular spatio-temporal pattern of disease incidence rates from observed data in a baseline time interval (e.g., a previous year without outbreaks). Second, the new spatial data collected at the current time are compared with the estimated regular pattern and decorrelated with all previous data. Third, an SPC chart is then applied to the decorrelated data to determine the occurrence of a disease outbreak by the current time. Modified versions of this method have been crafted to incorporate covariate information and accommodate specific spatial features of disease outbreaks. These methods adeptly handle the complex structure of observed data and have demonstrated effectiveness in disease surveillance. As discussed earlier, both disease screening and disease surveillance pose challenges as sequential decision-making problems, and traditional SPC charts prove unreliable in addressing them adequately. Consequently, disease screening and disease surveillance emerge as crucial applications of SPC, demanding the development of new methods tailored to their specific requirements. Fortuitously, my research journey in SPC began in 1998, allowing me to contribute significantly to several key areas within the field. Notable contributions include advancements in nonparametric process monitoring (e.g., Qiu and Hawkins 2001, Qiu 2018), monitoring correlated data (e.g., Qiu et al. 2020a, Xue and Qiu 2021), dynamic process monitoring (e.g., Qiu and Xiang 2014, Xie and Qiu 2023a), profile monitoring (e.g., Qiu et al. 2010, Zhou and Qiu 2022), and more. For a comprehensive description of SPC and some SPC charts developed by my research group, see the book Qiu (2014). This extensive experience has proven invaluable in my exploration of disease screening and disease surveillance, providing a robust foundation to innovate and tailor SPC methodologies to the distinctive challenges presented in these critical areas of public health. 
The book comprises nine chapters. In Chapter 1, a concise introduction sets the stage for understanding the challenges posed by disease screening and surveillance problems. Chapter 2 delves into fundamental statistical concepts and methods commonly employed in data modeling and analysis. Given that disease screening and surveillance involve sequential decision-making, Chapter 3 is dedicated to introducing essential SPC concepts and methods -- a major statistical tool for such problems. Chapters 4-6 focus on recent developments in DySS methods tailored for effective disease screening. Chapter 4 covers univariate and multivariate DySS methods based on direct monitoring of observed disease risk factors, while Chapter 5 introduces methods based on disease risk quantification and sequential monitoring of quantified disease risks. The practical implementation of DySS methods by the R package DySS is detailed in Chapter 6. Chapters 7-9 shift the focus to disease surveillance. Chapter 7 explores traditional methods utilizing the Knox test, scan statistics, and generalized linear modeling. Chapter 8 presents recent methods developed by my research team based on nonparametric spatio-temporal data modeling and monitoring. The implementation of these methods is demonstrated using the R package SpTe2M in Chapter 9. This book serves as an ideal primary textbook for a one-semester course focused on disease screening and/or disease surveillance, tailored for graduate students in biostatistics, bioinformatics, health data science, and related disciplines. Additionally, the book can be utilized as a supplementary textbook for courses covering analytic methods and tools relevant to medical and public health studies. Its content is designed to be accessible and beneficial for medical and public health researchers and practitioners. By introducing recent analytic tools for disease screening and surveillance, the book equips readers with valuable insights that can be easily implemented using the accompanying R packages DySS and SpTe2M. I extend my sincere gratitude to my current and former students and collaborators, Drs. Jun Li, Dongdong Xiang, Kai Yang, Lu You, and Jingnan Zhang, whose dedicated efforts, stimulating discussions, and constructive comments have played an invaluable role in the completion of this book. Their patience and insights have been indispensable. I express my deep appreciation to Dr. Xiulin Xie and Mr. Zibo Tian, who generously dedicated their time to reading the entire book manuscript and diligently corrected numerous typos and mistakes. Completing this book has been a three-year journey, and I owe a debt of gratitude to my wife, Yan, for providing unwavering help and support. Her efforts in managing household responsibilities and caring for our two sons, Andrew and Alan, allowed me to focus on this project. I extend my heartfelt thanks to my family for their love and constant support throughout this endeavor. Peihua Qiu Gainesville, Florida November 2023
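The preface above refers to conventional SPC charts such as the cumulative sum (CUSUM) chart. As a toy illustration only (not the book's DySS or surveillance methodology), a one-sided CUSUM for detecting an upward shift might be sketched as follows; the target mean mu0, the allowance k, and the control limit h are illustrative choices:

```python
# A toy illustration (not the book's methods): a one-sided CUSUM chart
# monitoring a sequence for an upward mean shift.
import numpy as np

def cusum_upper(x, mu0=0.0, k=0.5, h=5.0):
    """Return the upper CUSUM statistic C_t = max(0, C_{t-1} + x_t - mu0 - k)
    and the first index (if any) where it exceeds the control limit h."""
    c, stats, signal = 0.0, [], None
    for t, xt in enumerate(x):
        c = max(0.0, c + xt - mu0 - k)
        stats.append(c)
        if signal is None and c > h:
            signal = t
    return np.array(stats), signal

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 50), rng.normal(1.5, 1, 30)])  # shift at t = 50
stats, alarm = cusum_upper(data)
print("first alarm at index:", alarm)
```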
Article
The day-to-day variability of electroencephalogram (EEG) poses a significant challenge to decode human brain activity in EEG-based passive brain-computer interfaces (pBCIs). Conventionally, a time-consuming calibration process is required to collect data from users on a new day to ensure the performance of the machine learning-based decoding model, which hinders the application of pBCIs to monitor mental workload (MWL) states in real-world settings. This study investigated the day-to-day stability of the raw power spectral density (PSD) and their periodic and aperiodic components decomposed by the Fitting Oscillations and One-Over-F algorithm. In addition, we validated the feasibility of using periodic components to improve cross-day MWL classification performance. Compared to the raw PSD (69.9%±18.5%) and the aperiodic component (69.4%±19.2%), the periodic component had better day-to-day stability and significantly higher cross-day classification accuracy (84.2%±11.0%). This finding not only enhances the practicality of pBCIs for MWL estimation but also unlocks the potential for decoding various brain states in future applications.
Article
Background Because of the global increase in the incidence of nonalcoholic fatty liver disease, the development of noninvasive, widely available, and highly accurate methods for assessing hepatic steatosis is necessary. Purpose To evaluate the performance of models with different combinations of quantitative US parameters for their ability to predict at least 5% steatosis in patients with chronic liver disease (CLD) as defined using MRI proton density fat fraction (PDFF). Materials and Methods Patients with CLD were enrolled in this prospective multicenter study between February 2020 and April 2021. Integrated backscatter coefficient (IBSC), signal-to-noise ratio (SNR), and US-guided attenuation parameter (UGAP) were measured in all participants. Participant MRI PDFF value was used to define at least 5% steatosis. Four models based on different combinations of US parameters were created: model 1 (UGAP alone), model 2 (UGAP with IBSC), model 3 (UGAP with SNR), and model 4 (UGAP with IBSC and SNR). Diagnostic performance of all models was assessed using area under the receiver operating characteristic curve (AUC). The model was internally validated using 1000 bootstrap samples. Results A total of 582 participants were included in this study (median age, 64 years; IQR, 52-72 years; 274 female participants). There were 364 participants in the steatosis group and 218 in the nonsteatosis group. The AUC values for steatosis diagnosis in models 1-4 were 0.92, 0.93, 0.95, and 0.96, respectively. The C-indexes of models adjusted by the bootstrap method were 0.92, 0.93, 0.95, and 0.96, respectively. Compared with other models, models 3 and 4 demonstrated improved discrimination of at least 5% steatosis (P < .01). Conclusion A model built using the quantitative US parameters UGAP, IBSC, and SNR could accurately discriminate at least 5% steatosis in patients with CLD. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Han in this issue.
Article
Introduction An understanding of how frequently individual fluoroquinolone (FQ) adverse events of interest (FQAEIs) occur within specific infection types is imperative to coordinate appropriate use of FQ and potential avoidance in certain disease states and/or patient populations. Objectives Study objectives were to i) quantify the incidence of three concerning FQAEI (i.e., adverse tendon event (TE), Clostridioides difficile infection (CDI), and aortic aneurysm/dissection (AAD)), ii) identify the patient‐level factors that predict these events, and iii) develop clinical risk scores to estimate the predicted probabilities of each FQAEI based on patient‐level covariates available on clinical presentation. Methods A retrospective cohort study was performed among hospitalized patients with community‐acquired pneumonia receiving care in the Upstate New York Veterans’ Healthcare Administration from 2011‐2016. The outcomes of interest for this study were the occurrence of TE, CDI, and AAD. We also evaluated a composite of these three outcomes, FQAEI. Results The study population consisted of 1,071 patients. The overall incidence of FQAEI, TE, AAD, and CDI were 6.5%, 1.8%, 4.5%, and 0.3%, respectively. For each outcome evaluated, the probability of the event of interest was predicted by the presence of certain comorbidities, previous health care exposure, choice of specific FQ antibiotic, or therapy duration. Concomitant steroids, pneumonia in preceding 180 days, and creatinine clearance <30 mL/min predicted FQAEI. Conclusions Individual frequencies of three important FQAEIs were quantified and risk scores were developed to estimate the probabilities of experiencing these events to help clinicians individualize treatment decisions for patients and reduce the potential risks of select FQAEIs.
Article
Objectives: Limited data exist regarding association between physical performance and in-hospital falls. This study was performed to investigate the association between physical performance and in-hospital falls in a high-risk population. Design: Retrospective cohort study. Setting and participants: The study population consisted of 1200 consecutive patients with a median age of 74 years (50.8% men) admitted to a ward with high incidence rates of falls, primarily in the departments of geriatrics and neurology, in a university hospital between January 2019 and December 2021. Methods: Short Physical Performance Battery (SPPB) was measured after treatment in the acute phase. As the primary end point of the study, the incidence of in-hospital falls was examined prospectively based on data from mandatory standardized incident report forms and electronic patient records. Results: SPPB assessment was performed at a median of 3 days after admission, and the study population had a median SPPB score of 3 points. Falls occurred in 101 patients (8.4%) over a median hospital stay of 15 days. SPPB score showed a significant inverse association with the incidence of in-hospital falls after adjusting for possible confounders (adjusted odds ratio for each 1-point decrease in SPPB: 1.19, 95% CI 1.10-1.28; P < .001), and an SPPB score ≤6 was significantly associated with increased risk of in-hospital falls. Inclusion of SPPB with previously identified risk factors significantly increased the area under the curve for in-hospital falls (0.683 vs. 0.740, P = .003). Conclusion and implications: This study demonstrated an inverse association of SPPB score with risk of in-hospital falls in a high-risk population and showed that SPPB assessment is useful for accurate risk stratification in a hospital setting.
Article
Objective: We investigated the association between altered levels of inflammatory proteins in the cervicovaginal fluid (CVF) and acute histologic chorioamnionitis (HCA) and funisitis in women with preterm labor (PTL). Methods: In this study, a total of 134 consecutive singleton pregnant women with PTL (at 23+0-34+0 weeks) who delivered preterm (at < 37 weeks) and from whom CVF samples were collected at admission were retrospectively enrolled. The CVF levels of haptoglobin, interleukin-6/8, kallistatin, lipocalin-2, matrix metalloproteinase (MMP)-8, resistin, S100 calcium-binding protein A8, and serpin A1 were determined using enzyme-linked immunosorbent assay. The placentas were histologically analyzed after delivery. Results: Multiple logistic regression analyses showed significant associations between elevated CVF interleukin-8 and resistin levels and acute HCA after adjusting for baseline covariates (e.g., gestational age at sampling). CVF haptoglobin, interleukin-6/8, kallistatin, MMP-8, and resistin levels were significantly higher in women with funisitis than in those without, whereas the baseline covariates were similar between the two groups (P > 0.1). The area under the receiver operating characteristic curves of the aforementioned biomarkers ranged from 0.61 to 0.77 regarding each outcome. Notably, HCA risk significantly increased with increasing CVF levels of interleukin-8 and resistin (P for trend < 0.05). Conclusions: Haptoglobin, interleukin-6/8, kallistatin, MMP-8, and resistin were identified as potential inflammatory CVF biomarkers predictive of acute HCA and funisitis in women with PTL. Moreover, the risk severity of acute HCA may be associated with the degree of the inflammatory response in the CVF (particularly based on interleukin-8 levels).
Article
Background: Pre-transplant inflammatory and nutritional status has not been widely explored in terms of its impact on autologous hematopoietic stem cell transplantation (auto-HSCT) outcomes in lymphoma patients. We aimed to evaluate the impact of body mass index (BMI), prognostic nutritional index (PNI), and C-reactive protein to albumin ratio (CAR) on auto-HSCT outcomes. Method: We retrospectively analyzed 87 consecutive lymphoma patients who underwent their first auto-HSCT at the Adult Hematopoietic Stem Cell Transplantation Unit at Akdeniz University Hospital. Result: CAR had no impact on post-transplant outcomes. PNI≤50 was an independent prognostic factor for both shorter progression-free survival (PFS) (hazard ratio [HR]=2.43, P = .025) and worse overall survival (OS) (HR=2.93, P = .021). The 5-year PFS rate was significantly lower in patients with PNI≤50 than in patients with PNI>50 (37.3% vs. 59.9%, P = .003). The 5-year OS rate was also significantly lower in patients with PNI≤50 than in patients with PNI>50 (45.5% vs. 67.2%, P = .011). Patients with BMI<25 had higher 100-day TRM compared with patients with BMI≥25 (14.7% vs. 1.9%, P = .020). BMI<25 was an independent prognostic factor associated with shorter PFS and OS (HR=2.98, P = .003 and HR=5.06, P < .001, respectively). The 5-year PFS rate was significantly lower in patients with BMI<25 than in patients with BMI≥25 (40.2% vs. 53.7%, P = .037). Similarly, the 5-year OS rate was significantly inferior in patients with BMI<25 compared to patients with BMI≥25 (42.7% vs. 64.7%, P = .002). Conclusion: Our study confirms that lower BMI and CAR have negative impacts on auto-HSCT outcomes in lymphoma patients. Furthermore, higher BMI should not be considered an obstacle for lymphoma patients who need auto-HSCT; conversely, it could be an advantage for post-transplant outcomes.
Article
Currently, obtaining accurate medical annotations requires high labor and time effort, which largely limits the development of supervised learning-based tumor detection tasks. In this work, we investigated a weakly supervised learning model for detecting breast lesions in dynamic contrast-enhanced MRI (DCE-MRI) with only image-level labels. Two hundred fifty-four normal and 398 abnormal cases with pathologically confirmed lesions were retrospectively enrolled into the breast dataset, which was divided into the training set (80%), validation set (10%), and testing set (10%) at the patient level. First, the second image series S2 after the injection of a contrast agent was acquired from the 3.0-T, T1-weighted dynamic enhanced MR imaging sequences. Second, a feature pyramid network (FPN) with convolutional block attention module (CBAM) was proposed to extract multi-scale feature maps of the modified classification network VGG16. Then, initial location information was obtained from the heatmaps generated using the layer class activation mapping algorithm (Layer-CAM). Finally, the detection results of breast lesions were refined by the conditional random field (CRF). Accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) were utilized for evaluation of image-level classification. Average precision (AP) was estimated for breast lesion localization. DeLong's test was used to compare the AUCs of different models for significance. The proposed model was effective, with an accuracy of 95.2%, sensitivity of 91.6%, specificity of 99.2%, and AUC of 0.986. The AP for breast lesion detection was 84.1% using weakly supervised learning. Weakly supervised learning based on FPN combined with Layer-CAM facilitated automatic detection of breast lesions.
Article
Full-text available
Background: The study aimed to develop a nomogram model to predict overall survival (OS) and to construct a risk stratification system for upper thoracic esophageal squamous cell carcinoma (ESCC). Methods: A total of 568 newly diagnosed patients with upper ESCC at Fujian Medical University Cancer Hospital were taken as the training cohort, and an additional 155 patients with upper ESCC from Sichuan Cancer Hospital Institute were used as the validation cohort. A nomogram was established using Cox proportional hazards regression to identify prognostic factors for OS. The predictive power of the nomogram model was evaluated using four indices: the concordance statistic (C-index), time-dependent ROC (ROCt) curve, net reclassification index (NRI), and integrated discrimination improvement (IDI). Results: Multivariate analysis revealed that gender, clinical T stage, clinical N stage, and primary gross tumor volume were independent prognostic factors for OS in the training cohort. The nomogram based on these factors presented favorable prognostic efficacy in both the training and validation cohorts, with C-indices of 0.622 and 0.713 and area under the curve (AUC) values of 0.709 and 0.739, respectively, which appeared superior to those of the American Joint Committee on Cancer (AJCC) staging system. Additionally, the NRI and IDI of the nomogram presented better discrimination ability for predicting survival than those of AJCC staging. Furthermore, decision curve analysis (DCA) of the nomogram exhibited greater clinical performance than that of AJCC staging. Finally, the nomogram fairly distinguished the OS rates among low-, moderate-, and high-risk groups, whereas the OS curves could not be well separated among clinical AJCC stages. Conclusion: We built an effective nomogram model for predicting OS in upper ESCC, which may improve clinicians' ability to predict individualized survival and facilitate further risk-stratified management of patients.
Article
Purpose: Diagnosis of congenital optic nerve hypoplasia (CONH) can be challenging in children or uncooperative individuals. Misdiagnosis can lead to inappropriate treatment; thus, it is important to identify an objective and reliable measurement. The purpose of this study was to evaluate whether Cirrus spectral domain optical coherence tomography (SD-OCT) is a valid test for diagnosing CONH by comparing it to the disc-macula distance to disc diameter (DM:DD) ratio. Methods: A total of 93 participants (64 controls and 29 CONH) underwent comprehensive eye examinations, fundus photography and Cirrus SD-OCT. Receiver operating characteristic (ROC) curves for the DM:DD ratio and OCT disc area were constructed for CONH and control eyes. Results: Mean (±SD) OCT disc area was 1.46 (±0.42) mm2 and 1.89 (±0.38) mm2 for CONH and control eyes, respectively (p < 0.0001). The area under the curve for the DM:DD ratio was 0.97 (95% confidence interval: 0.91-0.99) and 0.79 for OCT disc area (95% confidence interval: 0.70-0.86), which were significantly different (p = 0.0005). The optimal cut-off value for OCT disc area was 1.66 mm2 (76% sensitivity, 70% specificity), while the optimal cut-off for DM:DD ratio was 3.10 (85% sensitivity and 95% specificity). The Cirrus SD-OCT showed a tendency to overestimate disc size, especially in cases with no light perception (NLP) or segmental CONH. Conclusions: Although the DM:DD ratio is superior to OCT in diagnosing CONH with a higher sensitivity and specificity, the ratio is subject to inter-examiner variability and can be challenging to obtain. We found the Cirrus SD-OCT to be a valid objective test for diagnosing CONH. Caution is advised when using SD-OCT in segmental CONH or in an eye with NLP. We suggest 1.66 mm2 as the optimal cut-off value for Cirrus SD-OCT disc area to differentiate a hypoplastic from a normal optic disc.
Preprint
Full-text available
Purpose: To examine cross-sectional and longitudinal relationships between body mass index (BMI)-derived metrics for measuring severe obesity (SO) using the Centers for Disease Control and Prevention (CDC) and World Health Organization (WHO) references and cardiometabolic risk factors in children and adolescents. Methods: In this cohort study completed from 2013 to 2021, we examined data from 3- to 18-year-olds enrolled in the CANadian Pediatric Weight management Registry. Anthropometric data were used to create nine BMI-derived metrics based on the CDC and WHO references. Cardiometabolic risk factors were examined, including dysglycemia, dyslipidemia, and elevated blood pressure. Analyses included intraclass correlation coefficients (ICC) and receiver operator characteristic area-under-the-curve (ROC AUC). Results: Our sample included 1,288 participants (n=666 [51.7%] girls; n=874 [67.9%] white), with SO of 59.9–67.0%. ICCs revealed high tracking (0.90–0.94) for most BMI-derived metrics. ROC AUC analyses showed CDC and WHO metrics discriminated the presence of cardiometabolic risk factors, which improved with increasing numbers of risk factors. Overall, most BMI-derived metrics rated poorly in identifying presence of cardiometabolic risk factors. Conclusion: CDC BMI percent of the 95th percentile and WHO BMIz performed similarly as measures of SO, suggesting both can be used for clinical care and research in pediatrics. The latter definition may be particularly useful for clinicians and researchers from countries that recommend using the WHO growth reference.
Article
Problem: To investigate whether altered expression of various inflammation-, angiogenesis-, and extracellular matrix-related mediators in cervicovaginal fluid (CVF) could be independently associated with acute histological chorioamnionitis (HCA), microbial-associated HCA, and funisitis in women with preterm premature rupture of membranes (PPROM). Method of study: Clinical data of 102 consecutive singleton pregnant women with PPROM at 23+0 to 34+0 weeks were retrospectively analyzed. CVF samples were collected upon admission. Levels of APRIL, DKK-3, IGFBP-1/2, IL-6/8, lipocalin-2, M-CSF, MIP-1α, MMP-8/9, S100A8A9, TGFBI, TIMP-1, TNFR2, uPA, and VDBP were determined by ELISA. Placentas were histologically examined after birth. Results: Multivariate logistic regression analyses showed that: (1) elevated CVF levels of IL-8 and TNFR2 were independently associated with acute HCA; (2) elevated CVF levels of IL-6, IL-8, M-CSF, MMP-8, and TNFR2 were independently associated with microbial-associated HCA; and (3) elevated CVF IL-8 and MMP-8 levels were independently associated with funisitis when adjusted for gestational age. Areas under the curves of the aforementioned CVF biomarkers ranged within 0.61-0.77, thereby demonstrating poor to fair diagnostic capacity for these clinical endpoints. HCA risk significantly increased as the CVF levels of each inflammatory mediator increased (P for trend < 0.05). Conclusions: Herein, we identified several inflammatory biomarkers (IL-6/8, M-CSF, MMP-8, and TNFR2) in the CVF that are independently associated with acute HCA, microbial-associated HCA, and funisitis in women with PPROM. Furthermore, the degree of inflammatory response in the CVF, based on the levels of these proteins, demonstrated a direct relationship with HCA risk (especially risk severity). This article is protected by copyright. All rights reserved.
Article
Full-text available
Background Exploring the human microbiome in multiple body niches is beneficial for clinicians to determine which microbial dysbiosis should be targeted first. We aimed to study whether both the fecal and vaginal microbiomes are disrupted in SLE patients and whether they are correlated, as well as their associations with immunological features. Methods A group of 30 SLE patients and 30 BMI-age-matched healthy controls were recruited. Fecal and vaginal samples were collected, the 16S rRNA gene was sequenced to profile microbiomes, and immunological features were examined. Results Distinct fecal and vaginal bacterial communities and decreased microbial diversity in feces compared with the vagina were found in SLE patients and controls. Altered bacterial communities were found in the feces and vaginas of patients. Compared with the controls, the SLE group had slightly lower gut bacterial diversity, which was accompanied by significantly higher bacterial diversity in their vaginas. The most predominant bacteria differed between feces and the vagina in all groups. Eleven genera differed in patients’ feces; for example, Gardnerella and Lactobacillus increased, whereas Faecalibacterium decreased. Almost all the 13 genera differed in SLE patients’ vaginas, showing higher abundances except for Lactobacillus. Three genera in feces and 11 genera in the vagina were biomarkers for SLE patients. The distinct immunological features were only associated with patients’ vaginal microbiomes; for example, Escherichia−Shigella was negatively associated with serum C4. Conclusions Although SLE patients had fecal and vaginal dysbiosis, dysbiosis in the vagina was more obvious than that in feces. Additionally, only the vaginal microbiome interacted with patients’ immunological features.
Article
Background: Current predictive tools for TKA focus on clinicians rather than patients as the intended user. The purpose of this study was to develop a patient-focused model to predict health-related quality of life outcomes at 1-year post-TKA. Methods: Patients who underwent primary TKA for osteoarthritis from a tertiary institutional registry after January 2006 were analysed. The primary outcome was improvement after TKA defined by the minimal clinically important difference in utility score at 1-year post-surgery. Potential predictors included demographic information, comorbidities, lifestyle factors, and patient-reported outcome measures. Four models were developed, including both conventional statistics and machine learning (artificial intelligence) methods: logistic regression, classification tree, extreme gradient boosted trees, and random forest models. Models were evaluated using discrimination and calibration metrics. Results: A total of 3755 patients were included in the study. The logistic regression model performed the best with respect to both discrimination (AUC = 0.712) and calibration (intercept = -0.083, slope = 1.123, Brier score = 0.202). Less than 2% (n = 52) of the data were missing and therefore removed for complete case analysis. The final model used age (categorical), sex, baseline utility score, and baseline Veterans-RAND 12 responses as predictors. Conclusion: The logistic regression model performed better than machine learning algorithms with respect to AUC and calibration plot. The logistic regression model was well calibrated enough to stratify patients into risk deciles based on their likelihood of improvement after surgery. Further research is required to evaluate the performance of predictive tools through pragmatic clinical trials. Level of evidence: Level II, decision analysis.
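As a hedged sketch of the calibration metrics reported above (Brier score and calibration intercept/slope), assuming simulated predicted probabilities rather than the study's data; the very large C value is only a workaround to approximate an unpenalized recalibration fit:

```python
# A hedged sketch of Brier score and calibration intercept/slope; p_hat stands
# in for a model's predicted probabilities on a validation set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(2)
p_hat = rng.uniform(0.05, 0.95, size=500)   # simulated predicted probabilities
y = rng.binomial(1, p_hat)                  # outcomes drawn consistently with p_hat

print("Brier score:", round(brier_score_loss(y, p_hat), 3))

# Calibration intercept/slope: regress the outcome on the logit of p_hat.
# A perfectly calibrated model gives intercept ~0 and slope ~1.
logit_p = np.log(p_hat / (1 - p_hat)).reshape(-1, 1)
recal = LogisticRegression(C=1e6).fit(logit_p, y)   # large C ~ unpenalized fit
print("calibration intercept:", round(recal.intercept_[0], 3),
      "slope:", round(recal.coef_[0][0], 3))
```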
Article
Optimal performance of collaborative tasks requires consideration of the interactions between intelligent agents and their human counterparts. The functionality and success of these agents lie in their ability to maintain user trust; with too much or too little trust leading to over-reliance and under-utilisation, respectively. This problem highlights the need for an appropriate trust calibration methodology with an ability to vary user trust and decision making in-task. An online experiment was run to investigate whether stimulus difficulty and the implementation of agent features by a collaborative recommender system interact to influence user perception, trust and decision making. Agent features are changes to the Human-Agent interface and interaction style, and include presentation of a disclaimer message, a request for more information from the user and no additional feature. Signal detection theory is utilised to interpret decision making, with this applied to assess decision making on the task, as well as with the collaborative agent. The results demonstrate that decision change occurs more for hard stimuli, with participants choosing to change their initial decision across all features to follow the agent recommendation. Furthermore, agent features can be utilised to mediate user decision making and trust in-task, though the direction and extent of this influence is dependent on the implemented feature and difficulty of the task. The results emphasise the complexity of user trust in Human-Agent collaboration, highlighting the importance of considering task context in the wider perspective of trust calibration.
Article
Full-text available
We aim to determine whether combined thermal and ultrasound (CTUS) imaging can identify rheumatoid arthritis (RA) patients with at least moderate disease activity (DAS28 > 3.2). Temperature differences of maximum (Tmax), average (Tavg) and minimum (Tmin) temperatures from a control temperature at 22 joints (bilateral hands) were summed up to derive the respective MAX, AVG and MIN per patient. MAX (PD), AVG (PD) and MIN (PD) are CTUS results derived by multiplying MAX, AVG and MIN by a factor of 2 when a patient's total ultrasound power Doppler (PD) joint inflammation score > median score, which otherwise remained unchanged. Receiver operating characteristic (ROC) analysis was used to determine whether CTUS imaging can identify patients with DAS28 > 3.2. In this cross-sectional study, 814 joints were imaged among 37 RA patients (mean disease duration, 31 months). CTUS (but not single modality) imaging parameters were all significantly greater comparing patients with DAS28 > 3.2 versus those with DAS28 ≤ 3.2 (all P < 0.01). Areas under the ROC curves (AUCs) using cut-off levels of ≥ 94.5, ≥ 64.6 and ≥ 42.3 in identifying patients with DAS28 > 3.2 were 0.73, 0.76 and 0.76 for MAX (PD), AVG (PD) and MIN (PD), respectively (with sensitivity ranging from 58 to 61% and specificity all 100%). The use of CTUS in detecting a greater severity of joint inflammation among patients with at least moderate disease activity (DAS28 > 3.2) appears promising and will require further validation in independent RA cohorts.
Article
Background: Among the main methods used to identify an altered flexion relaxation phenomenon (FRP) in nonspecific chronic low back pain (NSCLBP), it has been previously demonstrated that flexion relaxation ratio (FRR) and extension relaxation ratio (ERR) are more objective than the visual reference method. Objective: To determine the sensitivity and specificity of the different methods used to calculate the ratios in terms of their ability to identify an altered FRP in NSCLBP. Methods: Forty-four NSCLBP patients performed a standing maximal trunk flexion task. Surface electromyography (sEMG) was recorded along the erector spinae longissimus (ESL) and multifidus (MF) muscles. Altered FRP based on sEMG was visually identified by three experts (current standard). Six FRR methods and five ERR methods were used both for the ESL and MF muscles. ROC curves (with areas under the curve (AUC) and sensitivity/specificity) were generated for each ratio. Results: All methods used to calculate these ratios had an AUC higher than 0.9, excellent sensitivity (>90 %), and good specificity (80-100 %) for both ESL and MF muscles. Conclusion: Both FRP ratios (FRR and ERR) for MF and ESL muscles, appear to be an objective, sensitive and specific method for identifying altered FRP in NSCLBP patients.
Article
Full-text available
The area under the ROC curve is a common index summarizing the information contained in the curve. When comparing two ROC curves, though, problems arise when interest does not lie in the entire range of false-positive rates (and hence the entire area). Numerical integration is suggested for evaluating the area under a portion of the ROC curve. Variance estimates are derived. The method is applicable for either continuous or rating scale binormal data, from independent or dependent samples. An example is presented which looks at rating scale data of computed tomographic scans of the head with and without concomitant use of clinical history. The areas under the two ROC curves over an a priori range of false-positive rates are examined, as well as the areas under the two curves at a specific point.
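A minimal sketch of the partial-area idea, integrating an empirical ROC curve over a pre-specified false-positive-rate range with the trapezoidal rule; the data are simulated placeholders, not the paper's rating-scale example:

```python
# A minimal sketch: numerically integrate the empirical ROC curve over a
# pre-specified FPR range (here 0 to 0.2), using simple linear interpolation
# of the curve onto a fine grid as an approximation.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=300)
score = y + rng.normal(0, 1, size=300)

fpr, tpr, _ = roc_curve(y, score)
lo, hi = 0.0, 0.2                                 # FPR range of interest
grid = np.linspace(lo, hi, 201)
tpr_on_grid = np.interp(grid, fpr, tpr)           # approximate the empirical curve
partial_auc = np.sum(np.diff(grid) * (tpr_on_grid[1:] + tpr_on_grid[:-1]) / 2)
print(f"partial AUC over FPR in [{lo}, {hi}]: {partial_auc:.3f} "
      f"(maximum possible = {hi - lo:.2f})")
```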
Article
Full-text available
A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
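The two relationships described above can be checked numerically. The sketch below (not the paper's code) estimates the AUC as the probability of a correct ranking, which equals the normalized Mann-Whitney U statistic, and applies the commonly quoted closed-form standard error with Q1 = A/(2 - A) and Q2 = 2A²/(1 + A); the scores are simulated stand-ins for real ratings:

```python
# A sketch: AUC as P(diseased score > non-diseased score), with ties counted
# one-half, plus the closed-form approximate standard error.
import numpy as np

rng = np.random.default_rng(4)
diseased = rng.normal(1.0, 1.0, size=60)     # scores of diseased subjects
healthy = rng.normal(0.0, 1.0, size=80)      # scores of non-diseased subjects

# Probability-of-correct-ranking estimate (== Mann-Whitney U / (m * n)).
gt = (diseased[:, None] > healthy[None, :]).mean()
eq = (diseased[:, None] == healthy[None, :]).mean()
auc = gt + 0.5 * eq

# Closed-form approximate standard error of the AUC.
m, n = len(diseased), len(healthy)
q1 = auc / (2 - auc)
q2 = 2 * auc**2 / (1 + auc)
se = np.sqrt((auc * (1 - auc) + (m - 1) * (q1 - auc**2)
              + (n - 1) * (q2 - auc**2)) / (m * n))
print(f"AUC = {auc:.3f}, approximate SE = {se:.3f}")
```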
Article
Objective. —To design and implement a methodologically rigorous study to examine the accuracy of magnetic resonance imaging (MRI) in a patient population clinically suspected of having multiple sclerosis (MS).Design and Setting. —Three hundred three patients, who were referred to two university medical centers because of the suspicion of MS, underwent MRI of the head and double-dose, contrast-enhanced computed tomography (CT) of the head. The images were read by two observers individually and without knowledge of the clinical course or final diagnosis. Patients were followed up for at least 6 months and reevaluated clinically with subsequent neurological examination. Final diagnosis (MS or not MS) was made by a panel of neurologists on the basis of the clinical findings at presentation, those that developed during follow-up, and other diagnostic tests. The results of the imaging procedures were excluded to avoid incorporation bias. Diagnostic accuracy was assessed using receiver-operating characteristic analysis and likelihood ratios.Results. —Magnetic resonance imaging of the head was considerably more accurate than CT in diagnosing MS. The area under the receiver-operating characteristic curve for MS was 0.82 (compared with 0.52 for CT) indicating that MRI was a good but not definitively accurate test for MS. A "definite MS" reading on an MRI of the head was specific for MS (likelihood ratio, 24.9) and essentially established the diagnosis, especially in patients clinically designated as "probable MS" before testing. However, MRI of the head was negative for MS in 25% and equivocal in 40% of the patients considered to have MS by the diagnostic review committee (sensitivity, 58%).Conclusions. —Magnetic resonance imaging of the head provided assistance in the diagnosis of MS when lesions were visualized. Its ability far exceeded imaging with double-contrast CT. The sensitivity and, therefore, the predictive value of a negative MRI result for MS were, however, not sufficiently high for a normal MRI to be used to conclusively exclude the diagnosis of MS.(JAMA. 1993;269:3146-3151)
Article
Receiver operating characteristic graphs are shown to be a variant form of ordinal dominance graphs. The area above the latter graph and the area below the former graph are useful measures of both the size or importance of a difference between two populations and/or the accuracy of discrimination performance. The usual estimator for this area is closely related to the Mann-Whitney U statistic. Statistical literature on this area estimator is reviewed. For large sample sizes, the area estimator is approximately normally distributed. Formulas for the variance and the maximum variance of the area estimator are given. Several different methods of constructing confidence intervals for the area measure are presented and the strengths and weaknesses of each of these methods are discussed. Finally, the Appendix presents the derivation of a new mathematical result, the maximum variance of the area estimator over convex ordinal dominance graphs.
Article
Ogilvie and Creelman have recently attempted to develop maximum likelihood estimates of the parameters of signal-detection theory from the data of yes-no ROC curves. Their method involved the assumption of a logistic distribution rather than the normal distribution in order to make the mathematics more tractable. The present paper presents a method of obtaining maximum likelihood estimates of these parameters using the assumption of underlying normal distributions.
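A rough illustration of the binormal model underlying that estimation problem; this is a crude least-squares fit on normal-deviate axes, not the maximum likelihood procedure described in the paper, and the operating points are invented:

```python
# Under the binormal model the ROC curve is a straight line on normal-deviate
# axes, z(TPR) = a + b * z(FPR), so a least-squares fit to empirical operating
# points gives rough estimates of (a, b) and the binormal area A_z.
import numpy as np
from scipy.stats import norm

# Hypothetical (FPR, TPR) operating points from a rating experiment.
fpr = np.array([0.02, 0.08, 0.20, 0.45, 0.75])
tpr = np.array([0.20, 0.45, 0.70, 0.88, 0.97])

z_fpr, z_tpr = norm.ppf(fpr), norm.ppf(tpr)
b, a = np.polyfit(z_fpr, z_tpr, 1)          # slope b, intercept a
az = norm.cdf(a / np.sqrt(1 + b**2))        # binormal area under the curve
print(f"a = {a:.2f}, b = {b:.2f}, binormal AUC = {az:.3f}")
```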
Article
The limitations of diagnostic "accuracy" as a measure of decision performance require introduction of the concepts of the "sensitivity" and "specificity" of a diagnostic test. These measures and the related indices, "true positive fraction" and "false positive fraction," are more meaningful than "accuracy," yet do not provide a unique description of diagnostic performance because they depend on the arbitrary selection of a decision threshold. The receiver operating characteristic (ROC) curve is shown to be a simple yet complete empirical description of this decision threshold effect, indicating all possible combinations of the relative frequencies of the various kinds of correct and incorrect decisions. Practical experimental techniques for measuring ROC curves are described, and the issues of case selection and curve-fitting are discussed briefly. Possible generalizations of conventional ROC analysis to account for decision performance in complex diagnostic tasks are indicated. ROC analysis is shown to be related in a direct and natural way to cost/benefit analysis of diagnostic decision making. The concepts of "average diagnostic cost" and "average net benefit" are developed and used to identify the optimal compromise among various kinds of diagnostic error. Finally, the way in which ROC analysis can be employed to optimize diagnostic strategies is suggested.
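As a small sketch of the cost/benefit idea, an optimal operating point can be found by scanning the empirical ROC for the threshold that minimizes the average diagnostic cost; the prevalence and costs below are illustrative assumptions, not values from the paper:

```python
# A minimal sketch: pick the threshold minimizing the average diagnostic cost
#   cost = P(D) * [C_TP * TPR + C_FN * (1 - TPR)]
#        + (1 - P(D)) * [C_FP * FPR + C_TN * (1 - FPR)]
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(5)
y = rng.integers(0, 2, size=400)
score = y + rng.normal(0, 1, size=400)

prevalence = 0.10                               # assumed pre-test probability
c_fn, c_fp, c_tp, c_tn = 10.0, 1.0, 0.0, 0.0    # assumed relative costs

fpr, tpr, thresholds = roc_curve(y, score)
cost = (prevalence * (c_tp * tpr + c_fn * (1 - tpr))
        + (1 - prevalence) * (c_fp * fpr + c_tn * (1 - fpr)))
best = np.argmin(cost)
print(f"optimal threshold = {thresholds[best]:.2f}, "
      f"sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```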
Article
Receiver operating characteristic (ROC) analysis has been used in a broad variety of medical imaging studies during the past 15 years, and its advantages over more traditional measures of diagnostic performance are now clearly established. But despite the essential simplicity of the approach, workers in the field often find--sometimes only after an ROC study is under way--that a number of subtle issues related to experimental design and data analysis must be confronted in practice. Many of these issues have not been discussed in the literature in detail, and most are not well known. The purposes of this paper are to make users of ROC methodology in medical imaging aware of potential problems that should be confronted before an ROC study is begun and to indicate, at least broadly, how those problems may be dealt with, given the present state of the art. Some of the issues raised here can be addressed adequately by easily prescribed techniques, whereas others remain difficult and will be resolved fully only by new methodologic developments.
Article
If the performance of a diagnostic imaging system is to be evaluated objectively and meaningfully, one must compare radiologists' image-based diagnoses with actual states of disease and health in a way that distinguishes between the inherent diagnostic capacity of the radiologists' interpretations of the images, and any tendencies to "under-read" or "over-read". ROC methodology provides the only known basis for distinguishing between these two aspects of diagnostic performance. After identifying the fundamental issues that motivate ROC analysis, this article develops ROC concepts in an intuitive way. The requirements of a valid ROC study and practical techniques for ROC data collection and data analysis are sketched briefly. A survey of the radiologic literature indicates the broad variety of evaluation studies in which ROC analysis has been employed.
Article
Methods of evaluating and comparing the performance of diagnostic tests are of increasing importance as new tests are developed and marketed. When a test is based on an observed variable that lies on a continuous or graded scale, an assessment of the overall value of the test can be made through the use of a receiver operating characteristic (ROC) curve. The curve is constructed by varying the cutpoint used to determine which values of the observed variable will be considered abnormal and then plotting the resulting sensitivities against the corresponding false positive rates. When two or more empirical curves are constructed based on tests performed on the same individuals, statistical analysis on differences between curves must take into account the correlated nature of the data. This paper presents a nonparametric approach to the analysis of areas under correlated ROC curves, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
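A compact sketch of this nonparametric comparison of correlated areas, as it is commonly implemented from the U-statistic components (not the authors' original code); the two tests below are simulated scores on the same subjects:

```python
# A sketch of comparing two correlated AUCs via U-statistic components:
# per-case "placement" values, their covariance across tests, and a z-test
# on the difference of areas.
import numpy as np
from scipy.stats import norm

def components(pos, neg):
    """Return (AUC, per-positive placements V10, per-negative placements V01)."""
    psi = ((pos[:, None] > neg[None, :]).astype(float)
           + 0.5 * (pos[:, None] == neg[None, :]))
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)

def correlated_auc_test(y, scores_a, scores_b):
    pos_mask, neg_mask = y == 1, y == 0
    aucs, v10, v01 = [], [], []
    for s in (scores_a, scores_b):
        auc, v10_s, v01_s = components(s[pos_mask], s[neg_mask])
        aucs.append(auc)
        v10.append(v10_s)
        v01.append(v01_s)
    m, n = pos_mask.sum(), neg_mask.sum()
    s10 = np.cov(np.vstack(v10))              # 2x2 covariance over positives
    s01 = np.cov(np.vstack(v01))              # 2x2 covariance over negatives
    var_diff = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m
                + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n)
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs, z, 2 * norm.sf(abs(z))

rng = np.random.default_rng(6)
y = rng.integers(0, 2, size=300)
scores_a = y + rng.normal(0, 1, size=300)          # test A
scores_b = 0.6 * y + rng.normal(0, 1, size=300)    # test B, same subjects
aucs, z, p = correlated_auc_test(y, scores_a, scores_b)
print(f"AUC_A = {aucs[0]:.3f}, AUC_B = {aucs[1]:.3f}, z = {z:.2f}, p = {p:.3f}")
```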
Article
Signal detectability studies help radiologists evaluate equipment systems and performance of assistants.
Article
The ROC plot is a useful tool in the evaluation of the performance of medical tests for separating two populations. For a two-state decision rule based on such a test, the ROC plot is the graph of all observed (1-specificity, sensitivity) pairs. Each point on this empirical plot can be represented by a 2 × 2 contingency table. The non-parametric statistics of Mann-Whitney and Kolmogorov-Smirnov can be immediately identified on this plot. Local non-parametric confidence interval procedures related to the theoretical ROC curve are briefly reviewed. For continuous data, two new simultaneous confidence regions associated with the ROC curve are presented, one based on Kolmogorov-Smirnov confidence bands for distribution functions and the other based on bootstrapping. Two different tests on the same patients can be compared on the ROC scale. For continuous data, one important problem concerns the comparison of two ROC plots (as would arise from two correlated diagnostic tests on each patient) using a sup norm (this metric can detect differences that the ROC area cannot). The distribution of a statistic based on this norm is studied, using the bootstrap. A biomedical example illustrates the methodologies.
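A rough sketch of the bootstrap comparison is given below; the resampling scheme, the grid of false-positive fractions, and all names in the fragment (roc_at, sup_norm_diff, the simulated paired scores) are simplifying assumptions for illustration rather than the authors' exact procedure. Patients are resampled with replacement so that the pairing of the two tests is preserved, and the sup-norm (the maximum vertical distance between the two empirical curves) is recomputed on each resample.

    import numpy as np

    rng = np.random.default_rng(0)

    def roc_at(fpf_grid, scores, labels):
        # True-positive fraction of the empirical ROC curve at the requested
        # false-positive fractions, using thresholds taken from the negatives.
        scores, labels = np.asarray(scores, float), np.asarray(labels, int)
        neg, pos = scores[labels == 0], scores[labels == 1]
        thresholds = np.quantile(neg, 1.0 - fpf_grid, method="higher")
        return np.array([(pos >= t).mean() for t in thresholds])

    def sup_norm_diff(s1, s2, labels, grid):
        # Maximum vertical distance between the two empirical ROC curves.
        return np.max(np.abs(roc_at(grid, s1, labels) - roc_at(grid, s2, labels)))

    # Simulated paired data: two tests scored on the same 40 patients.
    labels = np.repeat([0, 1], 20)
    test1 = rng.normal(labels * 1.5, 1.0)
    test2 = rng.normal(labels * 1.0, 1.0)
    grid = np.linspace(0.05, 0.95, 19)

    observed = sup_norm_diff(test1, test2, labels, grid)
    boot = []
    for _ in range(500):
        idx = rng.integers(0, labels.size, labels.size)   # resample patients, keeping the pairing
        boot.append(sup_norm_diff(test1[idx], test2[idx], labels[idx], grid))
    print(f"observed sup-norm difference: {observed:.3f}")
    print(f"bootstrap 95% interval: [{np.quantile(boot, 0.025):.3f}, {np.quantile(boot, 0.975):.3f}]")

Resampling whole patients rather than individual test results keeps the within-patient correlation between the two tests intact, which is the point of treating the curves as correlated.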
Article
The clinical performance of a laboratory test can be described in terms of diagnostic accuracy, or the ability to correctly classify subjects into clinically relevant subgroups. Diagnostic accuracy refers to the quality of the information provided by the classification device and should be distinguished from the usefulness, or actual practical value, of the information. Receiver-operating characteristic (ROC) plots provide a pure index of accuracy by demonstrating the limits of a test's ability to discriminate between alternative states of health over the complete spectrum of operating conditions. Furthermore, ROC plots occupy a central or unifying position in the process of assessing and using diagnostic tools. Once the plot is generated, a user can readily go on to many other activities such as performing quantitative ROC analysis and comparisons of tests, using likelihood ratios to revise the probability of disease in individual subjects, selecting decision thresholds, using logistic-regression analysis, using discriminant-function analysis, or incorporating the tool into a clinical strategy by using decision analysis.
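Using a likelihood ratio to revise the probability of disease follows the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio. A minimal sketch, with purely illustrative numbers:

    def post_test_probability(pre_test_prob, likelihood_ratio):
        # Convert probability to odds, apply the likelihood ratio, convert back.
        pre_odds = pre_test_prob / (1.0 - pre_test_prob)
        post_odds = pre_odds * likelihood_ratio
        return post_odds / (1.0 + post_odds)

    # Illustrative numbers: a 30% pre-test probability and a positive result with LR = 10.
    print(f"{post_test_probability(0.30, 10.0):.2f}")   # prints 0.81

The larger the likelihood ratio of a positive result, the further a given pre-test probability is revised upward.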
Article
To design and implement a methodologically rigorous study to examine the accuracy of magnetic resonance imaging (MRI) in a patient population clinically suspected of having multiple sclerosis (MS). Three hundred three patients, who were referred to two university medical centers because of the suspicion of MS, underwent MRI of the head and double-dose, contrast-enhanced computed tomography (CT) of the head. The images were read by two observers individually and without knowledge of the clinical course or final diagnosis. Patients were followed up for at least 6 months and reevaluated clinically with subsequent neurological examination. Final diagnosis (MS or not MS) was made by a panel of neurologists on the basis of the clinical findings at presentation, those that developed during follow-up, and other diagnostic tests. The results of the imaging procedures were excluded to avoid incorporation bias. Diagnostic accuracy was assessed using receiver-operating characteristic analysis and likelihood ratios. Magnetic resonance imaging of the head was considerably more accurate than CT in diagnosing MS. The area under the receiver-operating characteristic curve for MS was 0.82 (compared with 0.52 for CT), indicating that MRI was a good but not definitively accurate test for MS. A "definite MS" reading on an MRI of the head was specific for MS (likelihood ratio, 24.9) and essentially established the diagnosis, especially in patients clinically designated as "probable MS" before testing. However, MRI of the head was negative for MS in 25% and equivocal in 40% of the patients considered to have MS by the diagnostic review committee (sensitivity, 58%). Magnetic resonance imaging of the head provided assistance in the diagnosis of MS when lesions were visualized. Its performance far exceeded that of double-dose, contrast-enhanced CT. The sensitivity and, therefore, the predictive value of a negative MRI result for MS were, however, not sufficiently high for a normal MRI to be used to conclusively exclude the diagnosis of MS.
Article
Area under a receiver operating characteristic (ROC) curve (Az) is widely used as an index of diagnostic performance. However, Az is not a meaningful summary of clinical diagnostic performance when high sensitivity must be maintained clinically. The authors developed a new ROC partial area index, which measures clinical diagnostic performance more meaningfully in such situations, to summarize an ROC curve in only a high-sensitivity region. The mathematical formulation of the partial area index was derived from the conventional binormal model. Statistical tests of apparent differences in this index were formulated analogously to those for Az. One common statistical test involving the partial area index was validated by computer simulations under realistic conditions. An example in mammography illustrates a situation in which the partial area index is more meaningful than Az in measuring clinical diagnostic performance. The partial area index can be used as a more meaningful alternative to the conventional Az index for highly sensitive diagnostic tests.
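A rough numerical sketch of such a high-sensitivity partial-area summary is shown below. The binormal parameters, the sensitivity threshold tpf0 = 0.90, and the midpoint-rule integration are assumptions chosen for illustration; the article derives its index analytically from the binormal model.

    from statistics import NormalDist

    phi = NormalDist().cdf          # standard normal distribution function
    phi_inv = NormalDist().inv_cdf  # its inverse

    def partial_area_index(a, b, tpf0, steps=10_000):
        # Average specificity (1 - FPF) over sensitivities above tpf0 for a
        # binormal ROC curve TPF = phi(a + b * phi_inv(FPF)); this equals the
        # partial area above tpf0 divided by its maximum possible value (1 - tpf0).
        total = 0.0
        for k in range(steps):
            tpf = tpf0 + (1.0 - tpf0) * (k + 0.5) / steps   # midpoint rule over TPF
            fpf = phi((phi_inv(tpf) - a) / b)               # FPF at this sensitivity
            total += 1.0 - fpf
        return total / steps

    # Illustrative binormal parameters and a 90% sensitivity threshold.
    print(f"{partial_area_index(a=1.5, b=1.0, tpf0=0.90):.3f}")

Reading the normalized quantity as the average specificity achieved over the clinically required range of sensitivities is what makes this summary more relevant than Az when high sensitivity must be maintained.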
Article
To evaluate the differences in accuracy and observer performance at conventional radiography and at digital radiography with a 4 million-pixel charge-coupled device (CCD) for the diagnosis of gastric cancers. A prospective study was performed in 225 patients with suspected gastric cancer who were referred to our hospital from January 1997 through February 1997. One hundred twelve patients were examined at conventional radiography and 113 were examined at digital radiography, and 24 and 27 patients had gastric cancer, respectively. Six radiologists interpreted the images, with attention to tumor findings. They were blinded to the clinical details, and their interpretations were rated against those of three other radiologists who examined the patients and who were aware of the clinical information such as endoscopic features and/or histopathologic findings in biopsy specimens. Receiver operating characteristic (ROC) analysis was used to compare the differences in observer performance for the diagnosis of gastric cancers at conventional radiography and at digital radiography. The overall sensitivity was 64.6% at conventional radiography versus 75.3% at digital radiography (P = .287); specificities were 84.5% and 90.5%, respectively (P = .011); and the positive predictive values were 53.1% and 71.3%, respectively (P = .036). ROC analysis clearly showed higher diagnostic performance at digital radiography than at conventional radiography. The data demonstrate the high diagnostic value of digital radiography with a 4 million-pixel CCD for gastric cancers. The technique has considerable potential as an alternative to conventional gastrointestinal radiography.