Article

Receiver Operating Characteristic Curves and Their Use in Radiology

November 2003
Radiology 229(1):3-8

DOI:10.1148/radiol.2291010898

Source
PubMed

Authors:

Nancy A. Obuchowski

Cleveland Clinic

Sensitivity and specificity are the basic measures of accuracy of a diagnostic test; however, they depend on the cut point used to define "positive" and "negative" test results. As the cut point shifts, sensitivity and specificity shift. The receiver operating characteristic (ROC) curve is a plot of the sensitivity of a test versus its false-positive rate for all possible cut points. The advantages of the ROC curve as a means of defining the accuracy of a test, construction of the ROC, and identification of the optimal cut point on the ROC curve are discussed. Several summary measures of the accuracy of a test, including the commonly used percentage of correct diagnoses and area under the ROC curve, are described and compared. Two examples of ROC curve application in radiologic research are presented.
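
The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the article's own procedure: the function names `roc_points` and `auc_trapezoid` and the ten rated cases are hypothetical, and the area is estimated with the simple trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, sensitivity) pairs, one per cut point.

    A case is called "positive" when its score is >= the cut point.
    labels: 1 = disease present, 0 = disease absent.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Start with a cut point above every score: nothing is called positive.
    points = [(0.0, 0.0)]
    # Lower the cut point step by step through every observed score.
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the empirical ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical 5-point confidence ratings for 10 cases (illustrative only).
scores = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

points = roc_points(scores, labels)
auc = auc_trapezoid(points)
print(points)
print(round(auc, 2))
```

Each cut point contributes one point to the curve, so shifting the cut point trades sensitivity against the false-positive rate, while the area summarizes accuracy across all cut points.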

A Study on ML-Based Sleep Score Model Using Lifelog Data

Article

Jan 2023

The rate of people suffering from sleep disorders has been continuously increasing in recent years, such that interest in healthy sleep is also naturally increasing. Although there are many health-care industries and services related to sleep, specific and objective evaluation of sleep habits is still lacking. Most of the sleep scores presented in wearable-based sleep health services are calculated based only on the sleep stage ratio, which is not sufficient for studies considering the sleep dimension. In addition, most score generation techniques use weighted expert evaluation models, which are often selected based on experience instead of objective weights. Therefore, this study proposes an objective daily sleep habit score calculation method that considers various sleep factors based on user sleep data and gait data collected from wearable devices. A credit rating model built as a logistic regression model is adapted to generate sleep habit scores for good and bad sleep. Ensemble machine learning is designed to generate sleep habit scores for the intermediate sleep remainder. The sleep habit score and evaluation model of this study are expected to be in demand not only in health-care and health-service applications but also in the financial and insurance sectors.

Evaluation of computed tomography in the diagnosis of ultrasound-proven diaphragm dysfunction

Article

Mar 2024
RESP RES

Introduction: Computed tomography (CT) is routinely employed in the evaluation of dyspnea, yet limited data exist on its assessment of the diaphragmatic muscle. This study aimed to determine the capability of CT in identifying structural changes in the diaphragm among patients with ultrasound-confirmed diaphragmatic dysfunction. Methods: Diaphragmatic ultrasounds conducted between 2018 and 2021 at our center in Marseille, France, were retrospectively collected. Diaphragmatic pillars were measured on CT scans at the L1 level and the celiac artery. Additionally, the difference in height between the two diaphragmatic domes in both diaphragmatic dysfunction cases and controls was measured and compared. Results: A total of 65 patients were included, comprising 24 with diaphragmatic paralysis, 13 with diaphragmatic weakness, and 28 controls. In the case group (paralysis and weakness) with left dysfunctions (n = 24), the pillars measured on CT at the level of L1 and the celiac artery were significantly thinner than in controls (2.0 mm vs. 7.4 mm and 1.8 mm vs. 3.1 mm, p < 0.001 respectively). Significantly different values were observed for paralysis (but not weakness) in the right dysfunction subgroup (n = 15) (2.6 mm vs. 7.4 mm and 2.2 mm vs. 3.8 mm, p < 0.001 respectively, for paralysis vs. controls). Regardless of the side of dysfunction, a significant difference in diaphragmatic height was observed between cases and controls (7.70 cm vs. 1.16 cm and 5.51 cm vs. 1.16 cm, p < 0.001 for right and left dysfunctions, respectively). Threshold values determined through ROC curve analyses for height differences between the two diaphragmatic domes, indicative of paralysis or weakness in the right dysfunctions, were 4.44 cm and 3.51 cm, respectively. Similarly for left dysfunctions, the thresholds were 2.70 cm and 2.48 cm, respectively, demonstrating good performance (area under the curve of 1.00, 1.00, 0.98, and 0.79, respectively).
Conclusion: In cases of left diaphragmatic dysfunction, as well as in paralysis associated with right diaphragmatic dysfunction, CT revealed thinner pillars. Additionally, a notable increase in the difference in diaphragmatic height demonstrated a strong potential to identify diaphragmatic dysfunction, with specific threshold values.

Evaluation of computed tomography in the diagnosis of ultrasound-proven diaphragm dysfunction

Preprint

Dec 2023

Introduction: Computed tomography (CT) is routinely performed to assess dyspnea, but few data exist on evaluation of the diaphragmatic muscle using CT. This study aimed to assess CT in the diagnosis of diaphragmatic dysfunction. Methods: We retrospectively collected diaphragmatic ultrasounds performed between 2018 and 2021 at our center (Marseille, France). We measured diaphragmatic pillars on CT at the level of L1 and the celiac artery, as well as the difference in height between the two diaphragmatic domes in diaphragmatic dysfunctions and controls, and compared with ultrasound measurements. Results: A total of 65 patients were included: 24 with diaphragmatic paralysis, 13 with diaphragmatic weakness, and 28 controls. The pillars on CT in the case group (paralysis and weakness) with left dysfunction (n=24) were significantly thinner at the level of L1 and the celiac artery than in controls (2.0mm vs. 7.4mm and 1.8mm vs. 3.1mm, p<0.001 respectively), and significantly different for paralysis (but not weakness) in right dysfunction (n=15) (2.6mm vs. 7.4mm and 2.2mm vs. 3.8mm, p<0.001 respectively for paralysis vs controls). Whatever the side of dysfunction, there was a significant difference in diaphragmatic height between cases and controls (7.70cm vs. 1.16cm and 5.51cm vs. 1.16cm, p<0.001 for right and left dysfunction respectively). The threshold values (ROC curve analyses) for height differences between the two domes in favor of paralysis or weakness in right dysfunctions were 4.44cm and 3.51cm respectively, and 2.70cm and 2.48cm in left dysfunctions respectively, with good performance. Conclusion: The pillars on CT were thinner in left diaphragmatic dysfunction and in paralysis in right diaphragmatic dysfunction. An increase in the difference in diaphragmatic height may strongly identify diaphragmatic dysfunction with precise thresholds.

The ophthalmic artery resistive index as a predictor of choriocapillaris ischemia in multivariate logistic models

Article

Jun 2023

Objective: Vascular findings in preeclampsia are usually attributed to increased vascular tone. Recently, however, important studies have improved the understanding of the main pathophysiological events in this condition, especially vascular brain remodeling, impaired autoregulation, and damage of the blood-brain barrier, which are well recognized features of cerebral overperfusion. Methods: In this study, the association of choriocapillaris ischemia with ophthalmic artery blood flow parameters on orbital Doppler ultrasound is reported for the first time using multivariate logistic models. Multivariate logistic models with ophthalmic artery blood flow parameters, as well as major clinical and laboratory predictive variables, were established for choriocapillaris ischemia and choriocapillaris ischemia with retinal detachment. Results: In a series of 165 patients, 46 (28%) presented choriocapillaris ischemia; among them, 20 (12%) presented associated retinal detachment. The ophthalmic artery resistive index was the main predictor for choriocapillaris ischemia and choriocapillaris ischemia with retinal detachment in multivariate logistic models. An ophthalmic artery resistive index lower than 0.56 was associated with a significantly high incidence of both outcomes. Conclusion: This study supports that the branching pattern of choroidal arterioles and the lobular organization of choriocapillaris are the major morphological aspects underlying endothelial damage and lobular ischemia in the context of choroidal overperfusion. Overperfused lobules bordering areas of choriocapillaris ischemia produce a perfusion pressure gradient, with lobular reperfusion, leakage from reperfused choriocapillaris, and retinal detachment. An ophthalmic artery resistive index lower than 0.56 is proposed as a major predictor of the overperfusion-related choriocapillaris ischemia and choriocapillaris ischemia with retinal detachment in preeclampsia.
Keywords: Choroid; Retina; Ultrasonography, Doppler; Pre-eclampsia; Endothelium; Ischemia

ANALYSIS OF RECEIVER OPERATING CHARACTERISTIC CURVE FOR BIOMARKERS AND LOWER EXTREMITIES AS PREDICTORS OF OSTEOARTHRITIS RISK

Article

May 2023

Background: This study aimed to develop a new diagnostic tool for identifying risk factors in knee osteoarthritis (KOA) by utilizing radiological images graded according to the Kellgren-Lawrence scale (KLS) and assessing abnormal clinical outcome measures, deranged lower extremities, and increased biochemical parameters such as 4-hydroxyproline and cartilage oligomeric matrix protein (COMP). Methods: The study collected baseline data from 63 OA patients and 63 healthy controls, confirming the results with radiological imaging. Separate analyses were performed for participants with and without OA on outcome measures, lower extremities, and biochemical parameters such as 4-hydroxyproline and COMP. Results: The areas under the receiver operating characteristic curves of the studied outcome measures, lower extremities, biomarkers, and experimental cohorts were found to be within the range of 0.915-0.997, with ideal cutoff points revealing normal values that increased significantly (p<0.0001), indicating a successful diagnostic strategy. Conclusion: Monitoring these risk factors could help develop a cost-effective and safe diagnostic protocol for patients with acute KOA. The study suggests that the clinical and radiologic findings used to diagnose KOA are insufficiently sensitive to track the disease's development and that these risk factors could be useful in developing a better diagnostic protocol for KOA patients.

The importance of discriminative power rather than significance when evaluating potential clinical biomarkers in epilepsy research

Article

May 2023
EPILEPTIC DISORD

Objective: The quest for epilepsy biomarkers is on the rise. Variables with statistically significant group‐level differences are often misinterpreted as biomarkers with sufficient discriminative power. This study aimed to demonstrate the relationship between significant group‐level differences and a variable's power to discriminate between individuals. Methods: We simulated normally distributed datasets from hypothetical populations with varying sample sizes (25–800), effect sizes (Cohen's d: .25–2.50), and variability (standard deviation: 10–35) to assess the impact of these parameters on significance and discriminative power. The simulation data were illustrated by assessing the discriminative power of a potential real‐case biomarker, the EEG beta band power, to diagnose generalized epilepsy, using data from 66 children with generalized epilepsy and 385 controls. Additionally, we evaluated recently reported epilepsy biomarkers by comparing their effect sizes to our simulation‐derived effect size criterion. Results: Group size affects significance but not discriminative power. Discriminative power is much more related to variability and effect size. Our real data example supported these simulation results by demonstrating that group‐level significance does not translate, one to one, into discriminative power. Although we found a significant difference in the beta band power between children with and without epilepsy, the discriminative power was poor due to a small effect size. A Cohen's d of at least 1.25 is required to reach good discriminative power in univariable prediction modeling. Slightly over 60% of the biomarkers in our literature search met this criterion. Significance: Rather than statistical significance of group‐level differences, effect size should be used as an indicator of a variable's biomarker potential. The minimal required effect size for individual biomarkers, a Cohen's d of 1.25, is large. This calls for multivariable approaches, in which combining multiple variables with smaller effect sizes could increase the overall effect size and discriminative power.
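
The link between effect size and discriminative power can be made concrete with a short sketch. It assumes two normally distributed groups with equal standard deviation, for which the area under the ROC curve is known to equal Φ(d/√2), with Φ the standard normal cumulative distribution function. The helper name `auc_from_cohens_d` is hypothetical, and this is an illustration of that textbook relation rather than the simulation code used in the study.

```python
import math


def auc_from_cohens_d(d: float) -> float:
    """AUC for two equal-variance normal groups separated by Cohen's d.

    AUC = Phi(d / sqrt(2)), and Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    so the expression simplifies to 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))


# Effect sizes spanning the range simulated in the abstract (.25 to 2.50).
for d in (0.25, 0.50, 1.25, 2.50):
    print(f"d = {d:.2f} -> AUC = {auc_from_cohens_d(d):.3f}")
```

Under these assumptions a Cohen's d of 1.25 corresponds to an AUC of roughly 0.81, and sample size does not enter the formula at all, which is in line with the abstract's point that group size affects significance but not discriminative power.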

Comparison of Automated Thresholding Algorithms in Optical Coherence Tomography Angiography Image Analysis

Article

Mar 2023

(1) Background: Calculation of vessel density in optical coherence tomography angiography (OCTA) images with thresholding algorithms varies in clinical routine. The ability to discriminate healthy from diseased eyes based on perfusion of the posterior pole is critical and may depend on the algorithm applied. This study assessed comparability, reliability, and ability in the discrimination of commonly used automated thresholding algorithms. (2) Methods: Vessel density in full retina and choriocapillaris slabs were calculated with five previously published automated thresholding algorithms (Default, Huang, ISODATA, Mean, and Otsu) for healthy and diseased eyes. The algorithms were investigated with LD-F2-analysis for intra-algorithm reliability, agreement, and the ability to discriminate between physiological and pathological conditions. (3) Results: LD-F2-analyses revealed significant differences in estimated vessel densities for the algorithms (p < 0.001). For full retina and choriocapillaris slabs, intra-algorithm values range from excellent to poor, depending on the applied algorithm; the inter-algorithm agreement was low. Discrimination was good for the full retina slabs, but poor when applied to the choriocapillaris slabs. The Mean algorithm demonstrated an overall good performance. (4) Conclusions: Automated threshold algorithms are not interchangeable. The ability for discrimination depends on the analyzed layer. Concerning the full retina slab, all of the five evaluated automated algorithms had an overall good ability for discrimination. When analyzing the choriocapillaris, it might be useful to consider another algorithm.

Ranking Mineral Exploration Targets in Support of Commercial Decision Making: A Key Component for Inclusion in an Exploration Information System.

Article

Apr 2024
APPL GEOCHEM

Hybridizing mechanistic mathematical modeling with deep learning methods to predict individual cancer patient survival after immune checkpoint inhibitor therapy

Preprint

Mar 2024

We present a study in which predictive mechanistic modeling is used in combination with deep learning methods to predict individual patient survival probabilities under immune checkpoint inhibitor (ICI) therapy. This hybrid approach enables prediction based on both measures that are calculable from mechanistic models (but may not be directly measurable in the clinic) and easily measurable quantities or characteristics (that are not always readily incorporated into predictive mechanistic models). The mechanistic model we have applied here can predict tumor response from CT or MRI imaging based on key mechanisms underlying checkpoint inhibitor therapy, and in the present work, its parameters were combined with readily available clinical measures from 93 patients into a hybrid training set for a deep learning time-to-event predictive model. Analysis revealed that training an artificial neural network with both mechanistic modeling-derived and clinical measures achieved higher per-patient predictive accuracy based on event-time concordance, Brier score, and negative binomial log-likelihood-based criteria than when only mechanistic model-derived values or only clinical data were used. Feature importance analysis revealed that both clinical and model-derived parameters play prominent roles in neural network decision making and in increasing prediction accuracy, further supporting the advantage of our hybrid approach. We anticipate that many existing mechanistic models may be hybridized with deep learning methods in a similar manner to improve predictive accuracy through the inclusion of additional data that may not be readily implemented in mechanistic descriptions.

The Impact of Clinical Features in Radiomics of CT Non-Small Cell Lung Cancer

Article

Jan 2023

Gary G

Purpose: To investigate the impact of clinical features on model performance in CT-based Non-Small Cell Lung Cancer (NSCLC) and the potential uncertainty regarding their application in machine learning. Methods: Clinical and radiomic features were retrospectively retrieved from EMR and CT images of 496 NSCLC patients. Five feature datasets were constructed: radiomic features-only (Rad), clinical features-only (Clin), shape features-only (Shape), radiomic and clinical features (RaClin), shape and clinical features (ShClin). Five feature selection methods and seven predictive models, along with different cohort sizes, numbers of input features and validation methods, were included for the uncertainty analysis, with two-year survival as the study endpoint. AUC values were calculated for comparisons and Kruskal-Wallis testing was performed to determine significant differences. Results: A total of 19740 distinct combinations of feature sets, feature selection methods, predictive models, cohort sizes and validation techniques were examined. Of those, 25 combinations produced an AUC > 0.7. The clinical-only feature dataset generally outperformed both the radiomic-only feature dataset and the hybrid (clinical and radiomic) feature dataset (P<0.01), which is primarily determined by the endpoint. The combination of different feature selection methods and predictive models, along with the variations in cohort size, number of input features and validation methods, generated inconsistent results. Conclusion: Clinical features are a source of data that can improve machine learning model performance. However, their impact strongly depends on various factors that may lead to inconsistent results. A clear approach to incorporating clinical features to generate reliable results requires further investigation.

Cobalt Prospectivity Using a Conceptual Fuzzy Logic Overlay Method Enhanced with the Mineral Systems Approach

Article

Aug 2023
Nat Resour Res

This paper describes mineral prospectivity research conducted in Finland to predict favorable areas for cobalt exploration using the “fuzzy logic overlay” method in a GIS platform and public geodata of the Geological Survey of Finland. Cobalt occurs infrequently as a core product in mineral deposits. Therefore, we decided to construct separate conceptual mineral prospectivity models within the Northern Fennoscandian Shield, Finland, for four deposit types: (1) “Orthomagmatic Ni–Cu–Co sulfide deposits,” (2) “Outokumpu-type mantle peridotite-associated volcanogenic massive sulfide (VMS)-style Cu–Co–Zn–Ni–Ag–Au deposits,” (3) “Talvivaara black shale-hosted Ni–Zn–Cu–Co-type deposits” and (4) “Kuusamo-type (orogenic gold with atypical metal association) Au–Co–Cu–U–LREE deposits”. In addition, we created a model combining till geochemical data with data derived from bedrock drilling and mineral indications, including boulders and outcrops. The mineral prospectivity models were statistically tested with the “receiver operating characteristics” method using exploration drilling data from known mineral deposits as validation sites. In addition, the predictive performance of the models was evaluated by using success rate curves, where the number of previously identified deposits was compared with the area coverage of the predicted highly favorable areas. These results indicate that the knowledge-driven mineral prospectivity method using parameters derived from mineral systems models is effective in defining favorable exploration target areas at the regional scale. This study's innovation lies in its comprehension of the process of evaluating mineral prospectivity when the commodity of interest is not the primary commodity within the mineral system.

Accuracy of smartphone camera urine photo colorimetry as indicators of dehydration

Article

Aug 2023

Objective: Direct urine color assessment has been shown to correlate with hydration status. However, this method is subject to inter- and intra-observer variability. Digital image colorimetry provides a more objective method. This study evaluated the diagnostic accuracy of urine photo colorimetry using different smartphones under different lighting conditions, and determined the optimal cut-off value to predict clinical dehydration. Methods: The urine samples were photographed in a customized photo box, under five simulated lighting conditions, using five smartphones. The images were analyzed using Adobe Photoshop to obtain Red, Green, and Blue (RGB) values. The correlation between RGB values and urine laboratory parameters was determined. The optimal cut-off value to predict dehydration was determined using the area under the receiver operating characteristic curve. Results: A total of 56 patients were included in the data analysis. Images captured using five different smartphones under five lighting conditions produced a dataset of 1400 images. The study found a statistically significant correlation of Blue and Green values with urine osmolality, sodium, urine specific gravity, protein, and ketones. The diagnostic accuracy of the Blue value for predicting dehydration was “good” to “excellent” across all phones under all lighting conditions, with sensitivity >90% at a cut-off Blue value of 170. Conclusions: Smartphone-based urine colorimetry is a highly sensitive tool in predicting dehydration.

Simulation Evidence of Trust Calibration: Using POMDP with Signal Detection Theory to Adapt Agent Features for Optimised Task Outcome During Human-Agent Collaboration

Article

Aug 2023

Appropriately calibrated human trust is essential for successful Human-Agent collaboration. Probabilistic frameworks using a partially observable Markov decision process (POMDP) have been previously employed to model the trust dynamics of human behavior, optimising the outcomes of a task completed with a collaborative recommender system. A POMDP model utilising signal detection theory to account for latent user trust is presented, with the model working to calibrate user trust via the implementation of three distinct agent features: disclaimer message, request for additional information, and no additional feature. A simulation experiment is run to investigate the efficacy of the proposed POMDP model compared against a random feature model and a control model. Evidence demonstrates that the proposed POMDP model can appropriately adapt agent features in-task based on human trust belief estimates in order to achieve trust calibration. Specifically, task accuracy is highest with the POMDP model, followed by the control and then the random model. This emphasises the importance of trust calibration, as agents that lack considered design to implement features in an appropriate way can be more detrimental to task outcome compared to an agent with no additional features.

A Regulatory Science Perspective on Performance Assessment of Machine Learning Algorithms in Imaging

Chapter

Jul 2023

This chapter presents a regulatory science perspective on the assessment of machine learning algorithms in diagnostic imaging applications. Most of the topics are generally applicable to many medical imaging applications, while brain disease-specific examples are provided when possible. The chapter begins with an overview of US FDA’s regulatory framework followed by assessment methodologies related to ML devices in medical imaging. Rationale, methods, and issues are discussed for the study design and data collection, the algorithm documentation, and the reference standard. Finally, study design and statistical analysis methods are overviewed for the assessment of standalone performance of ML algorithms as well as their impact on clinicians (i.e., reader studies). We believe that assessment methodologies and regulatory science play a critical role in fully realizing the great potential of ML in medical imaging, in facilitating ML device innovation, and in accelerating the translation of these technologies from bench to bedside to the benefit of patients.

How to Determine If One Diagnostic Method, Such as an Artificial Intelligence Model, is Superior to Another: Beyond Performance Metrics

Article

Jul 2023

Calibration of the PREdiction of DELIRium in ICu Patients (PRE-DELIRIC) Score in a Cohort of Critically Ill Patients: A Retrospective Cohort Study

Article

Jul 2023
Dimens Crit Care Nurs

Background: To predict delirium in intensive care unit (ICU) patients, the Prediction of Delirium in ICU Patients (PRE-DELIRIC) score may be used. This model may help nurses to predict delirium in high-risk ICU patients. Objectives: The aims of this study were to externally validate the PRE-DELIRIC model and to identify predictive factors and outcomes for ICU delirium. Method: All patients underwent delirium risk assessment by the PRE-DELIRIC model at admission. We used the Intensive Care Delirium Screening Check List to identify patients with delirium. The receiver operating characteristic curve measured discrimination capacity among patients with or without ICU delirium. Calibration ability was determined by slope and intercept. Results: The prevalence of ICU delirium was 55.8%. Discrimination capacity (Intensive Care Delirium Screening Check List score ≥4) expressed by the area under the receiver operating characteristic curve was 0.81 (95% confidence interval, 0.75-0.88), whereas sensitivity was 91.3% and specificity was 64.4%. The best cut-off was 27%, obtained by the max Youden index. Calibration of the model was adequate, with a slope of 1.03 and intercept of 8.14. The onset of ICU delirium was associated with an increase in ICU length of stay (P < .0001), higher ICU mortality (P = .008), increased duration of mechanical ventilation (P < .0001), and more prolonged respiratory weaning (P < .0001) compared with patients without delirium. Discussion: The PRE-DELIRIC score is a sensitive measure that may be useful in early detection of patients at high risk for developing delirium. The baseline PRE-DELIRIC score could be useful to trigger use of standardized protocols, including nonpharmacologic interventions.
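
Cut-off selection by the maximum Youden index, as mentioned in the abstract, can be sketched as follows. The index is J = sensitivity + specificity - 1; for the reported sensitivity of 91.3% and specificity of 64.4% this gives J of about 0.557. The function names and the ten risk scores below are hypothetical and purely illustrative; they are not the study's data or code.

```python
from typing import List, Tuple


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(risks: List[float], outcomes: List[int]) -> Tuple[float, float]:
    """Return (cut-off, J) maximising Youden's J.

    A patient is classified as high risk when risk >= cut-off.
    outcomes: 1 = event occurred, 0 = no event.
    """
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    best_cut, best_j = None, float("-inf")
    # Try every observed risk value as a candidate cut-off.
    for cut in sorted(set(risks)):
        tp = sum(1 for r, y in zip(risks, outcomes) if r >= cut and y == 1)
        tn = sum(1 for r, y in zip(risks, outcomes) if r < cut and y == 0)
        j = youden_index(tp / pos, tn / neg)
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# J for the sensitivity and specificity quoted in the abstract.
reported_j = youden_index(0.913, 0.644)

# Hypothetical predicted risks (%) and observed outcomes, illustrative only.
risks = [5, 12, 18, 25, 33, 41, 50, 62, 74, 90]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
cutoff, j = best_cutoff(risks, outcomes)
print(round(reported_j, 3), cutoff, j)
```

Maximising J weights sensitivity and specificity equally; a different cut-off may be preferable when missed cases and false alarms carry unequal costs.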

The Prognostic Significance of Nomogram-Based Pretreatment Inflammatory Indicators in Patients With Esophageal Squamous Cell Carcinoma Receiving Intensity-Modulated Radiotherapy

Article

Jun 2023

Background: At present, there is no objective prognostic index available for patients with esophageal squamous cell carcinoma (ESCC) who underwent intensity-modulated radiotherapy (IMRT). This study aimed to develop a nomogram based on hematologic inflammatory indices for ESCC patients treated with IMRT. Methods: A total of 581 patients with ESCC receiving definitive IMRT were enrolled in our retrospective study. Of these, 434 patients with treatment-naïve ESCC in Fujian Cancer Hospital were defined as the training cohort. An additional 147 newly diagnosed ESCC patients were used as the validation cohort. Independent predictors of overall survival (OS) were employed to establish a nomogram model. The predictive ability was evaluated by time-dependent receiver operating characteristic curves, the concordance index (C-index), net reclassification index (NRI), and integrated discrimination improvement (IDI). Decision curve analysis (DCA) was performed to assess the clinical benefits of the nomogram model. The entire series was divided into 3 risk subgroups stratified by the total nomogram scores. Results: Clinical TNM staging, primary gross tumor volume, chemotherapy, neutrophil-to-lymphocyte ratio and platelet-to-lymphocyte ratio were independent predictors of OS. A nomogram was developed incorporating these factors. Compared with the 8th American Joint Committee on Cancer (AJCC) staging, the C-index for 5-year OS (.627 and .629) and the AUC value of 5-year OS (.706 and .719) in the training and validation cohorts (respectively) were superior. Furthermore, the nomogram model presented higher NRI and IDI. DCA also demonstrated that the nomogram model provided greater clinical benefit. Finally, patients with <84.8, 84.8-151.4, and >151.4 points were categorized into low-risk, intermediate-risk, and high-risk groups. Their 5-year OS rates were 44.0%, 23.6%, and 8.9%, respectively. The C-index was .625, which was higher than that of the 8th AJCC staging.
Conclusions: We have developed a nomogram model that enables risk-stratification of patients with ESCC receiving definitive IMRT. Our findings may serve as a reference for personalized treatment.

Clinical Risk Scores to Predict Trimethoprim-Sulfamethoxazole, Fluoroquinolone, Nitrofurantoin, and Third-Generation Cephalosporin Non-Susceptibility among Outpatient Episodes of Complicated Urinary Tract Infections among Adults

Article

Jun 2023

Background Clinical risk scores were developed to estimate the risk of adult outpatients having a complicated urinary tract infection (cUTI) that was non-susceptible to trimethoprim-sulfamethoxazole (TMP-SMX), fluoroquinolone, nitrofurantoin, or third-generation cephalosporins (3-GC) based on variables available on clinical presentation. Methods A retrospective cohort study (12/1/2017-12/31/2020) was performed among adult Kaiser Permanente Southern California members with an outpatient cUTI. Separate risk scores were developed for TMP-SMX, fluoroquinolone, nitrofurantoin, and 3-GC. The models were translated into risk scores to quantify the likelihood of non-susceptibility based on the presence of final model covariates in a given cUTI outpatient. Results A total of 30,450 cUTIs (26,326 patients) met the study criteria. Non-susceptibility to TMP-SMX, fluoroquinolones, nitrofurantoin, and 3-GC were 37%, 20%, 27%, and 24%, respectively. Receipt of prior antibiotics was the most important predictor across all models. The risk of non-susceptibility in the TMP-SMX model exceeded 20% in the absence of any risk factors, suggesting that empiric use of TMP-SMX may not be advisable. For fluoroquinolone, nitrofurantoin, or 3-GC, clinical risk scores of 10, 7, and 11, respectively, predicted a ≥20% estimated probability of non-susceptibility in the models that included cumulative number of prior antibiotics at model entry. This finding suggests caution should be used when considering these agents empirically in patients who have several risk factors present in a given model(s) at presentation. Conclusions We developed high-performing parsimonious risk scores to facilitate empiric treatment selection for adult cUTI outpatients in the critical period between infection presentation and availability of susceptibility results.

Systematic profiling of alternative splicing of ZNF family in Colorectal cancer

Preprint

May 2023

Background: Colorectal cancer (CRC) is a global health issue that requires innovative prognostic signatures to improve patient outcomes. Alternative splicing (AS) of RNA is a crucial modification process involved in cancer progression, and zinc finger proteins (ZNFs), the largest family of DNA binding proteins, have been implicated in various aspects of cancer development. However, the role of ZNF AS events in cancer remains poorly understood. Methods: To address this, we investigated the relationship between ZNF AS and CRC development using clinical samples and bioinformatics approaches to identify a prognostic signature. Results: We identified 227 differentially expressed genes (DEGs) and 98 survival-related genes among ZNFs. We also identified 29 differentially expressed AS (DEAS) events and 93 survival-related AS events in CRC patients. Using these results, we developed a thirteen-AS signature that showed excellent predictive ability, with a 3-year area under the receiver operating characteristic (ROC) curve (AUC) value of 0.80, outperforming the commonly used tumor-node-metastasis (TNM) staging-based model (AUC = 0.73). Gene Set Enrichment Analysis (GSEA) showed that the risk score of our model was associated with various cancer-related pathways, including PI3K AKT MTOR, CELL CYCLE, APOPTOSIS, and more. We also validated our findings through qPCR and explored the correlations between splicing factors (SFs) and DEAS events. Conclusions: Our study provides new insights into the role of ZNFs in cancer and highlights their potential as prognostic biomarkers for CRC progression.

Measuring severe obesity in pediatrics using body mass index-derived metrics from the Centers for Disease Control and Prevention and World Health Organization: a secondary analysis of CANadian Pediatric Weight management Registry (CANPWR) data

Article

Full-text available

Jun 2023
Eur J Pediatr

To examine the (i) relationships between various body mass index (BMI)-derived metrics for measuring severe obesity (SO) over time based on the Centers for Disease Control and Prevention (CDC) and World Health Organization (WHO) references and (ii) ability of these metrics to discriminate children and adolescents based on the presence of cardiometabolic risk factors. In this cohort study completed from 2013 to 2021, we examined data from 3- to 18-year-olds enrolled in the CANadian Pediatric Weight management Registry. Anthropometric data were used to create nine BMI-derived metrics based on the CDC and WHO references. Cardiometabolic risk factors were examined, including dysglycemia, dyslipidemia, and elevated blood pressure. Analyses included Pearson correlations, intraclass correlation coefficients (ICC), and receiver operator characteristic area-under-the-curve (ROC AUC). Our sample included 1,288 participants (n = 666 [52%] girls; n = 874 [68%] white). The prevalence of SO varied from 60–67%, depending on the definition. Most BMI-derived metrics were positively and significantly related to one another (r = 0.45–1.00); ICCs revealed high tracking (0.90–0.94). ROC AUC analyses showed CDC and WHO metrics had a modest ability to discriminate the presence of cardiometabolic risk factors, which improved slightly with increasing numbers of risk factors. Overall, most BMI-derived metrics rated poorly in identifying presence of cardiometabolic risk factors. Conclusion: CDC BMI percent of the 95th percentile and WHO BMIz performed similarly as measures of SO, although neither showed particularly impressive discrimination. They appear to be interchangeable in clinical care and research in pediatrics, but there is a need for a universal standard. WHO BMIz may be useful for clinicians and researchers from countries that recommend using the WHO growth reference.
What is Known:
• Severe obesity in pediatrics is a global health issue.
• Few reports have evaluated body mass index (BMI)-derived metrics based on the World Health Organization growth reference.
What is New:
• Our analyses showed that the Centers for Disease Control and Prevention BMI percent of the 95th percentile and World Health Organization (WHO) BMI z-score (BMIz) performed similarly as measures of severe obesity in pediatrics.
• WHO BMIz should be a useful metric to measure severe obesity for clinicians and researchers from countries that recommend using the WHO growth reference.

Current state of radiomics in pediatric neuro-oncology practice: a systematic review

Article

Full-text available

May 2023
PEDIATR RADIOL

Background Radiomics is the process of converting radiological images into high-dimensional data that may be used to create machine learning models capable of predicting clinical outcomes, such as disease progression, treatment response and survival. Pediatric central nervous system (CNS) tumors differ from adult CNS tumors in terms of their tissue morphology, molecular subtype and textural features. We set out to appraise the current impact of this technology in clinical pediatric neuro-oncology practice. Objectives The aims of the study were to assess radiomics’ current impact and potential utility in pediatric neuro-oncology practice; to evaluate the accuracy of radiomics-based machine learning models and compare this to the current standard which is stereotactic brain biopsy; and finally, to identify the current limitations of radiomics applications in pediatric neuro-oncology. Materials and methods Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards, a systematic review of the literature was carried out with protocol number CRD42022372485 in the prospective register of systematic reviews (PROSPERO). We performed a systematic literature search via PubMed, Embase, Web of Science and Google Scholar. Studies involving CNS tumors, studies that utilized radiomics and studies involving pediatric patients (age<18 years) were included. Several parameters were collected including imaging modality, sample size, image segmentation technique, machine learning model used, tumor type, radiomics utility, model accuracy, radiomics quality score and reported limitations. Results The study included a total of 17 articles that underwent full-text review, after excluding duplicates, conference abstracts and studies that did not meet the inclusion criteria. The most commonly used machine learning models were support vector machines (n=7) and random forests (n=6), with an area under the curve (AUC) range of 0.60–0.94.
The included studies investigated several pediatric CNS tumors, with ependymoma and medulloblastoma being the most frequently studied. Radiomics was primarily used for lesion identification, molecular subtyping, survival prognostication and metastasis prediction in pediatric neuro-oncology. The low sample size of studies was a commonly reported limitation. Conclusion The current state of radiomics in pediatric neuro-oncology is promising, in terms of distinguishing between tumor types; however, its utility in response assessment requires further evaluation which, given the relatively low number of pediatric tumors, calls for multicenter collaboration.

Development and internal validation of the survival time risk score in patients treated for oesophageal cancer with palliative intent in South Africa

Article

Full-text available

Feb 2023
S AFR J SURG

Background: Most patients who present to South African state hospitals with advanced stage oesophageal squamous cell cancer (OSCC) disease receive palliative treatment. This study aimed to assess the factors that influence survival in patients with OSCC who received palliative management and to develop a prognostic score to aid clinicians in decision-making. Methods: Analysis of a prospectively collected database assessed factors influencing survival of patients diagnosed with OSCC receiving palliative treatment. Factors assessed included patient demographics, clinical and laboratory data and tumour factors. A multivariable logistic regression model was used to assess for significant factors associated with survival time and a prognostic score was developed and internally validated based on these factors. Results: There were 384 patients with a male-to-female ratio of 1.3:1. The median survival of the cohort was 3.7 months. Factors that influenced survival on multivariate analysis included area of residence (aOR 1.82, 95% CI 1.02–3.24), performance status (aOR 2.56, 95% CI 1.50–4.35), body mass index (aOR 1.87, 95% CI 1.14–3.06) and serum albumin (aOR 3.06, 95% CI 1.46–6.42). The final prognostic score contained three of the four independent variables based on the regression coefficient for each variable. After internal validation, the risk score maintained fair discrimination and good calibration. Conclusion: The prognostic scoring system based on patient performance status, body mass index and serum albumin, if validated on an independent cohort, would allow more objective decisions on whether to stage or not prior to embarking on palliative treatment, streamlining care and improving quality of life.

Application of Drug Efficiency Index Metric for Analysis of Post-Traumatic Stress Disorder and Treatment Resistant Depression Gene Expression Profiles

Article

Full-text available

Mar 2023

Post-traumatic stress disorder (PTSD) is a severe mental illness with grave social, political, economic, and humanitarian implications. To apply the principles of personalized omics-based medicine to this psychiatric problem, we applied our previously introduced drug efficiency index (DEI) to PTSD gene expression datasets. Generally, omics-based personalized medicine evaluates individual drug action using two classes of data: (1) gene expression, mutation, and Big Data profiles, and (2) molecular pathway graphs that reflect protein-protein interactions. In the particular case of the DEI metric, we evaluate the drug action according to the drug's ability to restore healthy (control) activation levels of molecular pathways. We have curated five PTSD and one TRD (treatment-resistant depression) cohorts of next-generation sequencing (NGS) and microarray hybridization (MH) gene expression profiles, which, in total, comprise 791 samples, including 379 cases and 413 controls. To check the applicability of our DEI metrics, we have performed three differential studies with gene expression and pathway activation data: (1) case samples vs. control samples, (2) case samples after treatment and/or observation vs. before treatment, and (3) samples from patients positively responding to the treatment vs. those responding negatively or non-responding patients. We found that the DEI values that use the signaling pathway impact activation (SPIA) metric were better than those that used the Oncobox pathway activation level (Oncobox PAL) approach. However, SPIA, Oncobox PAL, and DEI evaluations were reliable only if there were differential genes between case and control, or treated and untreated, samples.

Validation of circulating microRNAs miR-142-3p and miR-598-3p in women with polycystic ovary syndrome as potential diagnostic markers

Article

Mar 2023
HUM REPROD

STUDY QUESTION Circulating miRNAs previously associated with androgen excess in women might be used as diagnostic biomarkers for polycystic ovary syndrome (PCOS). SUMMARY ANSWER Models based on circulating miR-142-3p and miR-598-3p expression show good discrimination among women with and without PCOS, particularly when coupled with easily available measurements such as waist-to-hip ratio (WHR) and circulating LH-to-FSH (LH/FSH) ratios. WHAT IS KNOWN ALREADY The lack of standardization of the signs, methods, and threshold values used to establish the presence of the diagnostic criteria (hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology) complicates the diagnosis of PCOS. Certain biomarkers may help with such a diagnosis. We conducted a validation study to check the diagnostic accuracy for PCOS of several miRNAs that were associated with the syndrome in a small pilot study that had been previously carried out by our research group. STUDY DESIGN, SIZE, DURATION This was a diagnostic test study involving 140 premenopausal women. PARTICIPANTS/MATERIALS, SETTING, METHODS We included 71 women with PCOS and 69 healthy control women in the study. Both groups were selected as to be similar in terms of body mass index. We used miRCURY LNA™ Universal RT microRNA PCR to analyse the five miRNAs that had shown the strongest associations with PCOS in a much smaller pilot study previously conducted by our group. We studied diagnostic accuracy using receiver operating characteristics (ROC) curve analysis. MAIN RESULTS AND THE ROLE OF CHANCE Only the expression of two miRNAs, miR-142-3p and miR-598-3p, of the five studied, was different between the women with PCOS and the non-hyperandrogenic controls. 
The diagnostic accuracy of the combination of these circulating miRNAs was good (area under the ROC curve (AUC) 0.801; 95% CI: 0.72–0.88) and was further improved when adding WHR (AUC 0.834, 95% CI: 0.756–0.912), LH/FSH ratio (AUC = 0.869, 95% CI: 0.804–0.934) or both (AUC = 0.895, 95% CI: 0.835–0.954). We developed several models by selecting different threshold values for these variables favouring either sensitivity or specificity, with positive and negative predictive values as high as 88% or 85%, respectively. LIMITATIONS, REASONS FOR CAUTION Patients included here had the classic PCOS phenotype, consisting of hyperandrogenism and ovulatory dysfunction; hence, the present results might not apply to milder phenotypes lacking androgen excess. WIDER IMPLICATIONS OF THE FINDINGS If confirmed in larger studies addressing different populations and PCOS phenotypes, these biomarkers may be useful to simplify the clinical diagnosis of this prevalent syndrome.
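The two quantities this abstract reports, an area under the ROC curve and threshold-dependent predictive values, can be illustrated with a minimal sketch. It assumes a single continuous score per woman where higher means more PCOS-like; the function names and the scores are made up for illustration and are not the study's model or data.

```python
from typing import Dict, Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def confusion_metrics(scores_pos: Sequence[float],
                      scores_neg: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when 'score >= threshold' is called positive."""
    tp = sum(s >= threshold for s in scores_pos)
    fn = len(scores_pos) - tp
    fp = sum(s >= threshold for s in scores_neg)
    tn = len(scores_neg) - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


# Hypothetical scores, for illustration only (not data from the study).
cases = [0.9, 0.8, 0.7, 0.6, 0.4]
controls = [0.5, 0.3, 0.3, 0.2, 0.1]

auc = roc_auc(cases, controls)
low_cut = confusion_metrics(cases, controls, threshold=0.35)   # favours sensitivity
high_cut = confusion_metrics(cases, controls, threshold=0.65)  # favours specificity
```

Lowering the threshold catches every case at the cost of a false positive, while raising it removes false positives but misses cases. The AUC summarizes the score over all thresholds, and predictive values also depend on how common the condition is in the sample.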

Predicting Overall Survival with Deep Learning from 18F-FDG PET-CT Images in Patients with Hepatocellular Carcinoma before Liver Transplantation

Article

Full-text available

Mar 2023

Positron emission tomography and computed tomography with 18F-fluorodeoxyglucose (18F-FDG PET-CT) were used to predict outcomes after liver transplantation in patients with hepatocellular carcinoma (HCC). However, few approaches for prediction based on 18F-FDG PET-CT images that leverage automatic liver segmentation and deep learning were proposed. This study evaluated the performance of deep learning from 18F-FDG PET-CT images to predict overall survival in HCC patients before liver transplantation (LT). We retrospectively included 304 patients with HCC who underwent 18F-FDG PET/CT before LT between January 2010 and December 2016. The hepatic areas of 273 of the patients were segmented by software, while the other 31 were delineated manually. We analyzed the predictive value of the deep learning model from both FDG PET/CT images and CT images alone. The results of the developed prognostic model were obtained by combining FDG PET-CT images and combining FDG CT images (0.807 AUC vs. 0.743 AUC). The model based on FDG PET-CT images achieved somewhat better sensitivity than the model based on CT images alone (0.571 SEN vs. 0.432 SEN). Automatic liver segmentation from 18F-FDG PET-CT images is feasible and can be utilized to train deep-learning models. The proposed predictive tool can effectively determine prognosis (i.e., overall survival) and, thereby, select an optimal candidate of LT for patients with HCC.

Prediction Model for Future Success of Early Orthopedic Treatment of Class III Malocclusion

Article

Full-text available

Feb 2023

This study aimed to identify predictors for successful post-treatment outcomes in early orthopedic class III malocclusion treatment with a facemask and hyrax expander appliance. The study was performed on lateral cephalograms from 37 patients at the start of treatment (T0), post-treatment (T1), and a minimum of three years after treatment (T2). The patients were grouped as stable or unstable according to the existence of a 2-mm overjet at T2. For statistical analysis, independent t-tests were used to compare the baseline characteristics and measurements of the two groups, considering a significance level of < 0.05. Thirty variables of pretreatment cephalograms were considered during logistic regression analysis to identify predictors. A discriminant equation was established using a stepwise method. The success rate and area under the curve were calculated, with AB to the mandibular plane, ANB, ODI, APDI, and A–B plane angles as predictors. The A–B plane angle was the most significantly different between the stable and unstable groups. In terms of the A–B plane angle, the success rate of early class III treatment with a facemask and hyrax expander appliance was 70.3%, and the area under the curve indicated a fair grade.

Mediastinal lymphnode positivity clinical scoring system for lung adenocarsinoma-mediastinal lymph node evaluation and staging

Article

Full-text available

May 2022

Aim: The study aims to assess the correlation of PET-CT with adenocarcinoma subtypes and to propose a scoring system for mediastinal lymph node staging. Material and Method: The patient cohort is a multicenter, retrospective analysis of 268 patients who underwent surgery for NSCLC adenocarcinoma. Preoperative PET-CT results for mediastinal lymph node staging were pathologically confirmed on tissue specimens obtained at anatomical resection. Statistical evaluation of PET-CT, radiological and pathological outcomes was performed on all subgroups. Results: The low FDG affinity in the lepidic pattern was statistically significant in the study (p

Establishment of Confidence Thresholds for Interactive Voice Response Systems Using ROC Analysis

Article

Jun 2014

Oredola A. Soluade

Incremental deep learning training approach for lesion detection and classification in mammograms

Article

Dec 2022

Recently, Deep Convolutional Neural Networks (DCNNs) have found their way into various medical image processing practices such as Computer-Aided Diagnosis (CAD) systems. Despite significant developments in CAD systems based on deep models, designing an efficient model, as well as a training strategy to cope with the shortage of medical images, has yet to be addressed. To address current challenges, this paper presents a model including a hybrid DCNN, which takes advantage of various feature maps of different deep models and an incremental training algorithm. Also, a weighting Test Time Augmentation strategy is presented. Besides, the proposed work develops the Mask-RCNN to not only detect mass and calcification in mammography images, but also to classify normal images. Moreover, this work compares the performance of the proposed method with that of a radiology specialist. Illustrating the region of interest to explain how the model makes decisions is the other aim of the study, to cover existing challenges among state-of-the-art research works. The wide range of quantitative and qualitative experiments conducted suggests that the proposed method can classify breast X-ray images of the INbreast dataset as normal, mass, or calcification.

The traditional Chinese version of the Geriatric Anxiety Inventory: Psychometric properties and cutoff point for detecting anxiety

Article

Jun 2024

Single Apheresis Session on the 4th Day of Granulocyte Colony-Stimulating Factor Administration Seems Convenient to Collect Enough Peripheral Blood Stem Cells from Healthy Donors

Article

Jun 2024

Background: To minimize adverse events of peripheral blood stem cell (PBSC) collection in healthy donors, it is reasonable to limit the total dose of granulocyte colony-stimulating factor (G-CSF) and/or the number of apheresis days without decreasing PBSC yield. Therefore, we have started to collect G-CSF-induced PBSCs on day 4 instead of on day 5. We retrospectively investigated the results of this 4-day G-CSF administration. Study Design and Methods: Seventy-six healthy donors who consecutively underwent G-CSF-induced PBSC donation between January 2020 and July 2022 were included in this study. G-CSF (filgrastim) at 2 × 5 µg/kg/day was administered subcutaneously. Apheresis started on day 4. Results: Sixty-nine (90.8%) of 76 donors provided enough PBSCs in the day 4 apheresis session. Younger age (p = 0.004), higher PB CD34+ cell count on the 4th day of G-CSF (p < 0.001), and male donor (p = 0.010) were correlated with increased PBSC yield. Univariate and multivariate logistic regression analyses to predict very good mobilizers (collected PBSCs ≥8 × 10⁶/kg after the first apheresis) were performed. In multivariate logistic regression analyses, male sex (p = 0.004), PB CD34+ cell count ≥100/µL on the 4th day of G-CSF (p < 0.001), and glomerular filtration rate ≥115 mL/min (p = 0.031) were found to be independent predictors of a very good mobilizer. Conclusion: Starting apheresis on the 4th day of G-CSF administration appears to be effective and to minimize G-CSF exposure in healthy donors.

Predictive modelling on the effects of the critical parameters in grain storage systems: A case study in the Philippines

Article

Jun 2024
J STORED PROD RES

Statistical Methods for Dynamic Disease Screening and Spatio-Temporal Disease Surveillance

Book

May 2024

Peihua Qiu

Preface Disease early detection and prevention offer numerous benefits to both our health and society. Often, the earlier a disease is detected, the higher the likelihood of successful cure or management. Managing a disease in its early stages can significantly reduce its impact on a patient’s quality of life and decrease healthcare costs. To detect a disease early, disease screening has become a popular tool. This method aims to determine the likelihood of a given patient having a particular disease by applying medical procedures or tests to check the major risk factors, even in patients without obvious symptoms of the disease. While disease screening primarily focuses on individual patients, disease surveillance is for detecting disease outbreaks early within a given population. For example, our society faces constant threats from bioterrorist attacks and pandemic influenza. It is thus important to monitor the incidence of infectious diseases continuously and detect their outbreaks promptly. This allows governments and individuals to implement timely disease control and prevention measures, minimizing the impact of these diseases. This book introduces some recent analytic methodologies and software packages developed for effective disease screening and disease surveillance. My exploration into disease screening was motivated by an experience around 2010 when I analyzed a dataset from the Framingham Heart Study (FHS). The FHS primarily aims to identify major risk factors for cardiovascular diseases (CVDs), and numerous CVD risk factors have been recognized since the study's inception in 1948, including smoking, high blood pressure, obesity, high cholesterol levels, physical inactivity, and more. During my data analysis, a pivotal question emerged: Could the identified CVD risk factors be utilized to predict the likelihood of a severe CVD, such as stroke, for individual patients? 
Statistically, this translates into a sequential decision-making problem, where the relevant statistical tool is the statistical process control (SPC) charts. However, traditional SPC charts, developed primarily for monitoring production lines in manufacturing, assume independence and identical distribution of process observations when the process is in-control (IC), and are designed for monitoring a single sequential process. In the context of disease screening, observed data of a patient's disease risk factors would rarely be independent and identically distributed over time and treating a patient's observed data as a process introduces numerous processes of different patients, making traditional SPC charts unsuitable to use. Recognizing the importance of the disease screening problem, I dedicated much of the past decade to addressing this issue. This endeavor led to the development of a series of new concepts and methods by my research team. The central methodology, termed the Dynamic Screening System (DySS), operates as follows: firstly, the regular longitudinal pattern of disease risk factors is estimated from a pre-collected dataset representing the population without the target disease. Subsequently, a patient's observed pattern of disease risk factors is cross-sectionally compared with the estimated regular longitudinal pattern at each observation time. The cumulative difference between the two patterns up to the current time is then employed to determine the patient's disease status at that time. DySS utilizes all historical data of the patient in its decision-making, and effectively accommodates the complex data structure, including time-varying data distribution. In the summer of 2013, upon joining the University of Florida (UF), I started to work on the pressing issue of disease surveillance due to its paramount importance in public health. 
Disease incidence data are typically collected sequentially over time and across multiple locations or regions, constituting spatio-temporal data. Similar to disease screening, disease surveillance is a sequential decision-making problem. However, its complexity arises from the intricate spatio-temporal data structure, encompassing seasonality, temporal/spatial variation, data correlation, and intricate data distribution. Common disease reporting and surveillance systems incorporate conventional SPC charts such as the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) charts. Additionally, retrospective methods like scan tests and generalized linear modeling approaches are employed for routine surveillance. Unfortunately, these methods often prove ineffective or unreliable due to their inability to handle the sequential nature of the problem or their restrictive model assumptions (cf., Section 2.7 and Chapters 7 and 8). Over the past decade, my research team has devoted significant effort to this domain, resulting in the development of several novel analytic methods for disease surveillance. Our initial method operates as follows: First, a nonparametric spatio-temporal modeling approach is employed to estimate the regular spatio-temporal pattern of disease incidence rates from observed data in a baseline time interval (e.g., a previous year without outbreaks). Second, the new spatial data collected at the current time are compared with the estimated regular pattern and decorrelated with all previous data. Third, an SPC chart is then applied to the decorrelated data to determine the occurrence of a disease outbreak by the current time. Modified versions of this method have been crafted to incorporate covariate information and accommodate specific spatial features of disease outbreaks. These methods adeptly handle the complex structure of observed data and have demonstrated effectiveness in disease surveillance. 
As discussed earlier, both disease screening and disease surveillance pose challenges as sequential decision-making problems, and traditional SPC charts prove unreliable in addressing them adequately. Consequently, disease screening and disease surveillance emerge as crucial applications of SPC, demanding the development of new methods tailored to their specific requirements. Fortuitously, my research journey in SPC began in 1998, allowing me to contribute significantly to several key areas within the field. Notable contributions include advancements in nonparametric process monitoring (e.g., Qiu and Hawkins 2001, Qiu 2018), monitoring correlated data (e.g., Qiu et al. 2020a, Xue and Qiu 2021), dynamic process monitoring (e.g., Qiu and Xiang 2014, Xie and Qiu 2023a), profile monitoring (e.g., Qiu et al. 2010, Zhou and Qiu 2022), and more. For a comprehensive description of SPC and some SPC charts developed by my research group, see the book Qiu (2014). This extensive experience has proven invaluable in my exploration of disease screening and disease surveillance, providing a robust foundation to innovate and tailor SPC methodologies to the distinctive challenges presented in these critical areas of public health. The book comprises nine chapters. In Chapter 1, a concise introduction sets the stage for understanding the challenges posed by disease screening and surveillance problems. Chapter 2 delves into fundamental statistical concepts and methods commonly employed in data modeling and analysis. Given that disease screening and surveillance involve sequential decision-making, Chapter 3 is dedicated to introducing essential SPC concepts and methods -- a major statistical tool for such problems. Chapters 4-6 focus on recent developments in DySS methods tailored for effective disease screening. 
Chapter 4 covers univariate and multivariate DySS methods based on direct monitoring of observed disease risk factors, while Chapter 5 introduces methods based on disease risk quantification and sequential monitoring of quantified disease risks. The practical implementation of DySS methods by the R package DySS is detailed in Chapter 6. Chapters 7-9 shift the focus to disease surveillance. Chapter 7 explores traditional methods utilizing the Knox test, scan statistics, and generalized linear modeling. Chapter 8 presents recent methods developed by my research team based on nonparametric spatio-temporal data modeling and monitoring. The implementation of these methods is demonstrated using the R package SpTe2M in Chapter 9. This book serves as an ideal primary textbook for a one-semester course focused on disease screening and/or disease surveillance, tailored for graduate students in biostatistics, bioinformatics, health data science, and related disciplines. Additionally, the book can be utilized as a supplementary textbook for courses covering analytic methods and tools relevant to medical and public health studies. Its content is designed to be accessible and beneficial for medical and public health researchers and practitioners. By introducing recent analytic tools for disease screening and surveillance, the book equips readers with valuable insights that can be easily implemented using the accompanying R packages DySS and SpTe2M. I extend my sincere gratitude to my current and former students and collaborators, Drs. Jun Li, Dongdong Xiang, Kai Yang, Lu You, and Jingnan Zhang, whose dedicated efforts, stimulating discussions, and constructive comments have played an invaluable role in the completion of this book. Their patience and insights have been indispensable. I express my deep appreciation to Dr. Xiulin Xie and Mr. Zibo Tian, who generously dedicated their time to reading the entire book manuscript and diligently corrected numerous typos and mistakes. 
Completing this book has been a three-year journey, and I owe a debt of gratitude to my wife, Yan, for providing unwavering help and support. Her efforts in managing household responsibilities and caring for our two sons, Andrew and Alan, allowed me to focus on this project. I extend my heartfelt thanks to my family for their love and constant support throughout this endeavor. Peihua Qiu Gainesville, Florida November 2023
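The screening idea outlined in the preface above (compare a patient's observations with the regular pattern at each visit, then accumulate the differences) can be sketched with a textbook upward CUSUM on standardized deviations. This is a simplified illustration under stated assumptions: the regular mean and standard deviation at each visit are taken as already known, and the function name, allowance and control limit are hypothetical. It is not the DySS method itself or the API of the R package DySS.

```python
from typing import List, Sequence


def upward_cusum(observed: Sequence[float],
                 regular_mean: Sequence[float],
                 regular_sd: Sequence[float],
                 allowance: float = 0.5,
                 control_limit: float = 3.0) -> List[float]:
    """Return the running CUSUM statistic, stopping at the first signal."""
    stat = 0.0
    path: List[float] = []
    for y, mu, sd in zip(observed, regular_mean, regular_sd):
        z = (y - mu) / sd                      # compare with the regular pattern at this visit
        stat = max(0.0, stat + z - allowance)  # accumulate evidence of an upward shift
        path.append(stat)
        if stat > control_limit:               # signal: cumulative deviation is too large
            break
    return path


# Hypothetical systolic blood pressure readings over six visits (illustration only).
regular_mean = [120.0, 121.0, 122.0, 123.0, 124.0, 125.0]  # assumed regular pattern
regular_sd = [10.0] * 6                                    # assumed spread at each visit
patient = [118.0, 125.0, 135.0, 140.0, 146.0, 150.0]

path = upward_cusum(patient, regular_mean, regular_sd)
signal_visit = len(path) if path[-1] > 3.0 else None  # 1-based visit index of the signal
```

With these made-up numbers no single reading is extreme on its own (the largest standardized deviation before the signal is about 2.2), yet the accumulated statistic crosses the limit at the fifth visit, which is the point of using all of a patient's history rather than only the latest observation.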

Noninvasive Assessment of Kidney Injury by Combining Structure and Function Using Artificial Intelligence-Based Manganese-Enhanced Magnetic Resonance Imaging

Article

Jan 2024
ACS APPL MATER INTER

Enhancing EEG-based cross-day mental workload classification using periodic component of power spectrum

Article

Nov 2023

The day-to-day variability of electroencephalogram (EEG) poses a significant challenge to decode human brain activity in EEG-based passive brain-computer interfaces (pBCIs). Conventionally, a time-consuming calibration process is required to collect data from users on a new day to ensure the performance of the machine learning-based decoding model, which hinders the application of pBCIs to monitor mental workload (MWL) states in real-world settings. This study investigated the day-to-day stability of the raw power spectral density (PSD) and their periodic and aperiodic components decomposed by the Fitting Oscillations and One-Over-F algorithm. In addition, we validated the feasibility of using periodic components to improve cross-day MWL classification performance. Compared to the raw PSD (69.9%±18.5%) and the aperiodic component (69.4%±19.2%), the periodic component had better day-to-day stability and significantly higher cross-day classification accuracy (84.2%±11.0%). This finding not only enhances the practicality of pBCIs for MWL estimation but also unlocks the potential for decoding various brain states in future applications.

Multivariable Quantitative US Parameters for Assessing Hepatic Steatosis

Article

Oct 2023

Background Because of the global increase in the incidence of nonalcoholic fatty liver disease, the development of noninvasive, widely available, and highly accurate methods for assessing hepatic steatosis is necessary. Purpose To evaluate the performance of models with different combinations of quantitative US parameters for their ability to predict at least 5% steatosis in patients with chronic liver disease (CLD) as defined using MRI proton density fat fraction (PDFF). Materials and Methods Patients with CLD were enrolled in this prospective multicenter study between February 2020 and April 2021. Integrated backscatter coefficient (IBSC), signal-to-noise ratio (SNR), and US-guided attenuation parameter (UGAP) were measured in all participants. Participant MRI PDFF value was used to define at least 5% steatosis. Four models based on different combinations of US parameters were created: model 1 (UGAP alone), model 2 (UGAP with IBSC), model 3 (UGAP with SNR), and model 4 (UGAP with IBSC and SNR). Diagnostic performance of all models was assessed using area under the receiver operating characteristic curve (AUC). The model was internally validated using 1000 bootstrap samples. Results A total of 582 participants were included in this study (median age, 64 years; IQR, 52-72 years; 274 female participants). There were 364 participants in the steatosis group and 218 in the nonsteatosis group. The AUC values for steatosis diagnosis in models 1-4 were 0.92, 0.93, 0.95, and 0.96, respectively. The C-indexes of models adjusted by the bootstrap method were 0.92, 0.93, 0.95, and 0.96, respectively. Compared with other models, models 3 and 4 demonstrated improved discrimination of at least 5% steatosis (P < .01). Conclusion A model built using the quantitative US parameters UGAP, IBSC, and SNR could accurately discriminate at least 5% steatosis in patients with CLD. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Han in this issue.
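Bootstrap resampling of an AUC, as mentioned in this abstract, can be sketched as follows. The abstract does not describe the exact procedure, so this sketch swaps in the simplest variant: a percentile interval from resampling cases and controls separately, not an optimism-corrected C-index. The function names and the parameter values are hypothetical and made up for illustration.

```python
import random
from typing import Sequence, Tuple


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(pos: Sequence[float],
                           neg: Sequence[float],
                           n_boot: int = 1000,
                           alpha: float = 0.05,
                           seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC, resampling each group with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Hypothetical parameter values (illustration only, not study data).
steatosis = [0.82, 0.79, 0.75, 0.71, 0.68, 0.66, 0.61, 0.58]
no_steatosis = [0.70, 0.62, 0.57, 0.55, 0.52, 0.49, 0.47, 0.44]

point = auc(steatosis, no_steatosis)
lower, upper = bootstrap_auc_interval(steatosis, no_steatosis)
```

Resampling within each group keeps the case-to-control ratio fixed in every replicate, which is one common design choice. Optimism correction would instead refit the model on each bootstrap sample and compare its apparent and test performance.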

Fluoroquinolone‐Associated Adverse Events of Interest among Hospitalized Veterans Affairs Patients with Community‐Acquired Pneumonia Who Were Treated with a Fluoroquinolone: A Focus on Tendonitis, Clostridioides difficile Infection, and Aortic Aneurysm

Article

Sep 2023

Introduction An understanding of how frequently individual fluoroquinolone (FQ) adverse events of interest (FQAEIs) occur within specific infection types is imperative to coordinate appropriate use of FQ and potential avoidance in certain disease states and/or patient populations. Objectives Study objectives were to i) quantify the incidence of three concerning FQAEIs (i.e., adverse tendon event (TE), Clostridioides difficile infection (CDI), and aortic aneurysm/dissection (AAD)), ii) identify the patient‐level factors that predict these events, and iii) develop clinical risk scores to estimate the predicted probabilities of each FQAEI based on patient‐level covariates available on clinical presentation. Methods A retrospective cohort study was performed among hospitalized patients with community‐acquired pneumonia receiving care in the Upstate New York Veterans’ Healthcare Administration from 2011 to 2016. The outcomes of interest for this study were the occurrence of TE, CDI, and AAD. We also evaluated a composite of these three outcomes, FQAEI. Results The study population consisted of 1,071 patients. The overall incidences of FQAEI, TE, AAD, and CDI were 6.5%, 1.8%, 4.5%, and 0.3%, respectively. For each outcome evaluated, the probability of the event of interest was predicted by the presence of certain comorbidities, previous health care exposure, choice of specific FQ antibiotic, or therapy duration. Concomitant steroids, pneumonia in preceding 180 days, and creatinine clearance <30 mL/min predicted FQAEI. Conclusions Individual frequencies of three important FQAEIs were quantified and risk scores were developed to estimate the probabilities of experiencing these events to help clinicians individualize treatment decisions for patients and reduce the potential risks of select FQAEIs.

In-Hospital Fall Risk Prediction by Objective Measurement of Lower Extremity Function in a High-Risk Population

Article

Aug 2023
J AM MED DIR ASSOC

Objectives: Limited data exist regarding association between physical performance and in-hospital falls. This study was performed to investigate the association between physical performance and in-hospital falls in a high-risk population. Design: Retrospective cohort study. Setting and participants: The study population consisted of 1200 consecutive patients with a median age of 74 years (50.8% men) admitted to a ward with high incidence rates of falls, primarily in the departments of geriatrics and neurology, in a university hospital between January 2019 and December 2021. Methods: Short Physical Performance Battery (SPPB) was measured after treatment in the acute phase. As the primary end point of the study, the incidence of in-hospital falls was examined prospectively based on data from mandatory standardized incident report forms and electronic patient records. Results: SPPB assessment was performed at a median of 3 days after admission, and the study population had a median SPPB score of 3 points. Falls occurred in 101 patients (8.4%) over a median hospital stay of 15 days. SPPB score showed a significant inverse association with the incidence of in-hospital falls after adjusting for possible confounders (adjusted odds ratio for each 1-point decrease in SPPB: 1.19, 95% CI 1.10-1.28; P < .001), and an SPPB score ≤6 was significantly associated with increased risk of in-hospital falls. Inclusion of SPPB with previously identified risk factors significantly increased the area under the curve for in-hospital falls (0.683 vs. 0.740, P = .003). Conclusion and implications: This study demonstrated an inverse association of SPPB score with risk of in-hospital falls in a high-risk population and showed that SPPB assessment is useful for accurate risk stratification in a hospital setting.

Inflammatory biomarkers in the cervicovaginal fluid to identify histologic chorioamnionitis and funisitis in women with preterm labor

Article

Aug 2023
CYTOKINE

Objective: We investigated the association between altered levels of inflammatory proteins in the cervicovaginal fluid (CVF) and acute histologic chorioamnionitis (HCA) and funisitis in women with preterm labor (PTL). Methods: In this study, a total of 134 consecutive singleton pregnant women with PTL (at 23+0-34+0 weeks) who delivered preterm (at < 37 weeks) and from whom CVF samples were collected at admission were retrospectively enrolled. The CVF levels of haptoglobin, interleukin-6/8, kallistatin, lipocalin-2, matrix metalloproteinase (MMP)-8, resistin, S100 calcium-binding protein A8, and serpin A1 were determined using enzyme-linked immunosorbent assay. The placentas were histologically analyzed after delivery. Results: Multiple logistic regression analyses showed significant associations between elevated CVF interleukin-8 and resistin levels and acute HCA after adjusting for baseline covariates (e.g., gestational age at sampling). CVF haptoglobin, interleukin-6/8, kallistatin, MMP-8, and resistin levels were significantly higher in women with funisitis than in those without, whereas the baseline covariates were similar between the two groups (P > 0.1). The area under the receiver operating characteristic curves of the aforementioned biomarkers ranged from 0.61 to 0.77 regarding each outcome. Notably, HCA risk significantly increased with increasing CVF levels of interleukin-8 and resistin (P for trend < 0.05). Conclusions: Haptoglobin, interleukin-6/8, kallistatin, MMP-8, and resistin were identified as potential inflammatory CVF biomarkers predictive of acute HCA and funisitis in women with PTL. Moreover, the risk severity of acute HCA may be associated with the degree of the inflammatory response in the CVF (particularly based on interleukin-8 levels).

Lower Body Mass Index and Prognostic Nutritional Index Are Associated With Poor Post-transplant Outcomes in Lymphoma Patients Undergoing Autologous Stem Cell Transplantation

Article

Jun 2023

Background: Pre-transplant inflammatory and nutritional status has not been widely explored in terms of its impact on autologous hematopoietic stem cell transplantation (auto-HSCT) outcomes in lymphoma patients. We aimed to evaluate the impact of body mass index (BMI), prognostic nutritional index (PNI), and C-reactive protein to albumin ratio (CAR) on auto-HSCT outcomes. Method: We retrospectively analyzed 87 consecutive lymphoma patients who underwent their first auto-HSCT at the Adult Hematopoietic Stem Cell Transplantation Unit at Akdeniz University Hospital. Result: CAR had no impact on post-transplant outcomes. PNI≤50 was an independent prognostic factor for both shorter progression-free survival (PFS) (hazard ratio [HR]=2.43, P = .025) and worse overall survival (OS) (HR=2.93, P = .021). The 5-year PFS rate was significantly lower in patients with PNI≤50 than in patients with PNI>50 (37.3% vs. 59.9%, P = .003). The 5-year OS rate was also significantly lower in patients with PNI≤50 than in patients with PNI>50 (45.5% vs. 67.2%, P = .011). Patients with BMI<25 had higher 100-day transplant-related mortality (TRM) than patients with BMI≥25 (14.7% vs. 1.9%, P = .020). BMI<25 was an independent prognostic factor associated with shorter PFS and OS (HR=2.98, P = .003, and HR=5.06, P < .001, respectively). The 5-year PFS rate was significantly lower in patients with BMI<25 than in patients with BMI≥25 (40.2% vs. 53.7%, P = .037). Similarly, the 5-year OS rate was significantly lower in patients with BMI<25 than in patients with BMI≥25 (42.7% vs. 64.7%, P = .002). Conclusion: Our study confirms that lower BMI and PNI have negative impacts on auto-HSCT outcomes in lymphoma patients. Furthermore, higher BMI should not be considered an obstacle for lymphoma patients who need auto-HSCT; on the contrary, it could be an advantage for post-transplant outcomes.

Weakly Supervised Breast Lesion Detection in Dynamic Contrast-Enhanced MRI

Article

May 2023
J DIGIT IMAGING

Currently, obtaining accurate medical annotations requires considerable labor and time, which largely limits the development of supervised learning-based tumor detection tasks. In this work, we investigated a weakly supervised learning model for detecting breast lesions in dynamic contrast-enhanced MRI (DCE-MRI) with only image-level labels. Two hundred fifty-four normal and 398 abnormal cases with pathologically confirmed lesions were retrospectively enrolled into the breast dataset, which was divided into the training set (80%), validation set (10%), and testing set (10%) at the patient level. First, the second image series S2 after the injection of a contrast agent was acquired from the 3.0-T, T1-weighted dynamic enhanced MR imaging sequences. Second, a feature pyramid network (FPN) with convolutional block attention module (CBAM) was proposed to extract multi-scale feature maps of the modified classification network VGG16. Then, initial location information was obtained from the heatmaps generated using the layer class activation mapping algorithm (Layer-CAM). Finally, the breast lesion detection results were refined by the conditional random field (CRF). Accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) were utilized for evaluation of image-level classification. Average precision (AP) was estimated for breast lesion localization. DeLong's test was used to compare the AUCs of different models for significance. The proposed model was effective, with accuracy of 95.2%, sensitivity of 91.6%, specificity of 99.2%, and AUC of 0.986. The AP for breast lesion detection was 84.1% using weakly supervised learning. Weakly supervised learning based on FPN combined with Layer-CAM facilitated automatic detection of breast lesions.

Development of a prognostic nomogram and risk stratification system for upper thoracic esophageal squamous cell carcinoma

Article

Full-text available

Apr 2023

Background: The study aimed to develop a nomogram model to predict overall survival (OS) and construct a risk stratification system for upper thoracic esophageal squamous cell carcinoma (ESCC). Methods: A total of 568 newly diagnosed patients with upper ESCC at Fujian Medical University Cancer Hospital were taken as a training cohort, and an additional 155 patients with upper ESCC from Sichuan Cancer Hospital Institute were used as a validation cohort. A nomogram was established using Cox proportional hazard regression to identify prognostic factors for OS. The predictive power of the nomogram model was evaluated using 4 indices: concordance statistics (C-index), time-dependent ROC (ROCt) curve, net reclassification index (NRI) and integrated discrimination improvement (IDI). Results: In this study, multivariate analysis revealed that gender, clinical T stage, clinical N stage and primary gross tumor volume were independent prognostic factors for OS in the training cohort. The nomogram based on these factors presented favorable prognostic efficacy in both the training and validation cohorts, with C-indexes of 0.622 and 0.713 and area under the curve (AUC) values of 0.709 and 0.739, respectively, which appeared superior to those of the American Joint Committee on Cancer (AJCC) staging system. Additionally, the NRI and IDI of the nomogram indicated better discrimination ability to predict survival than those of AJCC staging. Furthermore, decision curve analysis (DCA) of the nomogram exhibited greater clinical performance than that of AJCC staging. Finally, the nomogram fairly distinguished the OS rates among low, moderate, and high risk groups, whereas the OS curves were not well separated across clinical AJCC stages. Conclusion: We built an effective nomogram model for predicting OS of upper ESCC, which may improve clinicians' abilities to predict individualized survival and facilitate further stratification of the management of patients at risk.

Comparison of Cirrus spectral domain OCT with disc-macula distance to disc diameter ratio in diagnosing congenital optic nerve hypoplasia

Article

Apr 2023
OPHTHAL PHYSL OPT

Purpose: Diagnosis of congenital optic nerve hypoplasia (CONH) can be challenging in children or uncooperative individuals. Misdiagnosis can lead to inappropriate treatment; thus, it is important to identify an objective and reliable measurement. The purpose of this study was to evaluate whether Cirrus spectral domain optical coherence tomography (SD-OCT) is a valid test for diagnosing CONH by comparing it to the disc-macula distance to disc diameter (DM:DD) ratio. Methods: A total of 93 participants (64 controls and 29 CONH) underwent comprehensive eye examinations, fundus photography and Cirrus SD-OCT. Receiver operating characteristic (ROC) curves for the DM:DD ratio and OCT disc area were constructed for CONH and control eyes. Results: Mean (±SD) OCT disc area was 1.46 (±0.42) mm² and 1.89 (±0.38) mm² for CONH and control eyes, respectively (p < 0.0001). The area under the curve for the DM:DD ratio was 0.97 (95% confidence interval: 0.91-0.99) and 0.79 for OCT disc area (95% confidence interval: 0.70-0.86), which were significantly different (p = 0.0005). The optimal cut-off value for OCT disc area was 1.66 mm² (76% sensitivity, 70% specificity), while the optimal cut-off for DM:DD ratio was 3.10 (85% sensitivity and 95% specificity). The Cirrus SD-OCT showed a tendency to overestimate disc size, especially in cases with no light perception (NLP) or segmental CONH. Conclusions: Although the DM:DD ratio is superior to OCT in diagnosing CONH with a higher sensitivity and specificity, the ratio is subject to inter-examiner variability and can be challenging to obtain. We found the Cirrus SD-OCT to be a valid objective test for diagnosing CONH. Caution is advised when using SD-OCT in segmental CONH or in an eye with NLP. We suggest 1.66 mm² as the optimal cut-off value for Cirrus SD-OCT disc area to differentiate a hypoplastic from a normal optic disc.
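
The abstract above reports an "optimal cut-off" with its sensitivity and specificity but does not say which criterion was used to choose it. As a rough illustration only, the sketch below assumes one common choice, the Youden index (sensitivity + specificity - 1), and a rule in which a value at or above the cut-off counts as positive. The function names and the toy measurements are hypothetical and are not taken from the study.

```python
def sens_spec(positives, negatives, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' is called positive."""
    sensitivity = sum(v >= cutoff for v in positives) / len(positives)
    specificity = sum(v < cutoff for v in negatives) / len(negatives)
    return sensitivity, specificity


def youden_cutoff(positives, negatives):
    """Return (cutoff, sensitivity, specificity) maximising the Youden index.

    Assumption: every observed value is tried as a candidate cut-off;
    ties are resolved in favour of the smallest candidate.
    """
    best = None
    for cutoff in sorted(set(positives) | set(negatives)):
        sensitivity, specificity = sens_spec(positives, negatives, cutoff)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Made-up toy measurements (not study data).
diseased = [3.2, 3.5, 2.7, 3.4]
healthy = [2.5, 2.8, 3.0, 2.6]
cutoff, sensitivity, specificity = youden_cutoff(diseased, healthy)
print(cutoff, sensitivity, specificity)
```

For a measurement where smaller values indicate disease, such as a disc area, the comparison in `sens_spec` would be reversed.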

Measuring severe obesity in pediatrics: A cohort study

Preprint

Full-text available

Mar 2023

Purpose: To examine cross-sectional and longitudinal relationships between body mass index (BMI)-derived metrics for measuring severe obesity (SO) using the Centers for Disease Control and Prevention (CDC) and World Health Organization (WHO) references and cardiometabolic risk factors in children and adolescents. Methods: In this cohort study completed from 2013 to 2021, we examined data from 3- to 18-year-olds enrolled in the CANadian Pediatric Weight management Registry. Anthropometric data were used to create nine BMI-derived metrics based on the CDC and WHO references. Cardiometabolic risk factors were examined, including dysglycemia, dyslipidemia, and elevated blood pressure. Analyses included intraclass correlation coefficients (ICC) and receiver operator characteristic area-under-the-curve (ROC AUC). Results: Our sample included 1,288 participants (n=666 [51.7%] girls; n=874 [67.9%] white), with SO of 59.9–67.0%. ICCs revealed high tracking (0.90–0.94) for most BMI-derived metrics. ROC AUC analyses showed CDC and WHO metrics discriminated the presence of cardiometabolic risk factors, which improved with increasing numbers of risk factors. Overall, most BMI-derived metrics rated poorly in identifying presence of cardiometabolic risk factors. Conclusion: CDC BMI percent of the 95th percentile and WHO BMIz performed similarly as measures of SO, suggesting both can be used for clinical care and research in pediatrics. The latter definition may be particularly useful for clinicians and researchers from countries that recommend using the WHO growth reference.

Expression of inflammatory, angiogenic, and extracellular matrix‐related mediators in the cervicovaginal fluid of women with preterm premature rupture of membranes: Relationship with acute histological chorioamnionitis

Article

Mar 2023
AM J REPROD IMMUNOL

Problem: To investigate whether altered expression of various inflammation-, angiogenesis-, and extracellular matrix-related mediators in cervicovaginal fluid (CVF) could be independently associated with acute histological chorioamnionitis (HCA), microbial-associated HCA, and funisitis in women with preterm premature rupture of membranes (PPROM). Method of study: Clinical data of 102 consecutive singleton pregnant women with PPROM at 23+0 to 34+0 weeks were retrospectively analyzed. CVF samples were collected upon admission. Levels of APRIL, DKK-3, IGFBP-1/2, IL-6/8, lipocalin-2, M-CSF, MIP-1α, MMP-8/9, S100A8A9, TGFBI, TIMP-1, TNFR2, uPA, and VDBP were determined by ELISA. Placentas were histologically examined after birth. Results: Multivariate logistic regression analyses showed that: (1) elevated CVF levels of IL-8 and TNFR2 were independently associated with acute HCA; (2) elevated CVF levels of IL-6, IL-8, M-CSF, MMP-8, and TNFR2 were independently associated with microbial-associated HCA; and (3) elevated CVF IL-8 and MMP-8 levels were independently associated with funisitis when adjusted for gestational age. Areas under the curves of the aforementioned CVF biomarkers ranged within 0.61-0.77, thereby demonstrating poor to fair diagnostic capacity for these clinical endpoints. HCA risk significantly increased as the CVF levels of each inflammatory mediator increased (P for trend < 0.05). Conclusions: Herein, we identified several inflammatory biomarkers (IL-6/8, M-CSF, MMP-8, and TNFR2) in the CVF that are independently associated with acute HCA, microbial-associated HCA, and funisitis in women with PPROM. Furthermore, the degree of inflammatory response in the CVF, based on the levels of these proteins, demonstrated a direct relationship with HCA risk (especially risk severity).

Alterations of the fecal and vaginal microbiomes in patients with systemic lupus erythematosus and their associations with immunological profiles

Article

Full-text available

Mar 2023

Background Exploring the human microbiome in multiple body niches is beneficial for clinicians to determine which microbial dysbiosis should be targeted first. We aimed to study whether both the fecal and vaginal microbiomes are disrupted in SLE patients and whether they are correlated, as well as their associations with immunological features. Methods A group of 30 SLE patients and 30 BMI-age-matched healthy controls were recruited. Fecal and vaginal samples were collected, the 16S rRNA gene was sequenced to profile microbiomes, and immunological features were examined. Results Distinct fecal and vaginal bacterial communities and decreased microbial diversity in feces compared with the vagina were found in SLE patients and controls. Altered bacterial communities were found in the feces and vaginas of patients. Compared with the controls, the SLE group had slightly lower gut bacterial diversity, which was accompanied by significantly higher bacterial diversity in their vaginas. The most predominant bacteria differed between feces and the vagina in all groups. Eleven genera differed in patients’ feces; for example, Gardnerella and Lactobacillus increased, whereas Faecalibacterium decreased. Almost all the 13 genera differed in SLE patients’ vaginas, showing higher abundances except for Lactobacillus. Three genera in feces and 11 genera in the vagina were biomarkers for SLE patients. The distinct immunological features were only associated with patients’ vaginal microbiomes; for example, Escherichia−Shigella was negatively associated with serum C4. Conclusions Although SLE patients had fecal and vaginal dysbiosis, dysbiosis in the vagina was more obvious than that in feces. Additionally, only the vaginal microbiome interacted with patients’ immunological features.

SMART choice (knee) tool: a patient-focused predictive model to predict improvement in health-related quality of life after total knee arthroplasty

Article

Jan 2023
ANZ J SURG

Background: Current predictive tools for TKA focus on clinicians rather than patients as the intended user. The purpose of this study was to develop a patient-focused model to predict health-related quality of life outcomes at 1-year post-TKA. Methods: Patients who underwent primary TKA for osteoarthritis from a tertiary institutional registry after January 2006 were analysed. The primary outcome was improvement after TKA defined by the minimal clinically important difference in utility score at 1-year post-surgery. Potential predictors included demographic information, comorbidities, lifestyle factors, and patient-reported outcome measures. Four models were developed, including both conventional statistics and machine learning (artificial intelligence) methods: logistic regression, classification tree, extreme gradient boosted trees, and random forest models. Models were evaluated using discrimination and calibration metrics. Results: A total of 3755 patients were included in the study. The logistic regression model performed the best with respect to both discrimination (AUC = 0.712) and calibration (intercept = -0.083, slope = 1.123, Brier score = 0.202). Less than 2% (n = 52) of the data were missing and therefore removed for complete case analysis. The final model used age (categorical), sex, baseline utility score, and baseline Veterans-RAND 12 responses as predictors. Conclusion: The logistic regression model performed better than machine learning algorithms with respect to AUC and calibration plot. The logistic regression model was well calibrated enough to stratify patients into risk deciles based on their likelihood of improvement after surgery. Further research is required to evaluate the performance of predictive tools through pragmatic clinical trials. Level of evidence: Level II, decision analysis.

Using Agent Features to Influence User Trust, Decision Making and Task Outcome during Human-Agent Collaboration

Article

Jan 2023

Optimal performance of collaborative tasks requires consideration of the interactions between intelligent agents and their human counterparts. The functionality and success of these agents lie in their ability to maintain user trust; with too much or too little trust leading to over-reliance and under-utilisation, respectively. This problem highlights the need for an appropriate trust calibration methodology with an ability to vary user trust and decision making in-task. An online experiment was run to investigate whether stimulus difficulty and the implementation of agent features by a collaborative recommender system interact to influence user perception, trust and decision making. Agent features are changes to the Human-Agent interface and interaction style, and include presentation of a disclaimer message, a request for more information from the user and no additional feature. Signal detection theory is utilised to interpret decision making, with this applied to assess decision making on the task, as well as with the collaborative agent. The results demonstrate that decision change occurs more for hard stimuli, with participants choosing to change their initial decision across all features to follow the agent recommendation. Furthermore, agent features can be utilised to mediate user decision making and trust in-task, though the direction and extent of this influence is dependent on the implemented feature and difficulty of the task. The results emphasise the complexity of user trust in Human-Agent collaboration, highlighting the importance of considering task context in the wider perspective of trust calibration.

Receiver operating characteristic analysis using a novel combined thermal and ultrasound imaging for assessment of disease activity in rheumatoid arthritis

Article

Full-text available

Dec 2022

We aim to determine whether combined thermal and ultrasound (CTUS) imaging can identify rheumatoid arthritis (RA) patients with at least moderate disease activity (DAS28 > 3.2). Temperature differences of maximum (Tmax), average (Tavg) and minimum (Tmin) temperatures from a control temperature at 22 joints (bilateral hands) were summed up to derive the respective MAX, AVG and MIN per patient. MAX (PD), AVG (PD) and MIN (PD) are CTUS results derived by multiplying MAX, AVG and MIN by a factor of 2 when a patient’s total ultrasound power Doppler (PD) joint inflammation score exceeded the median score; otherwise the values remained unchanged. Receiver operating characteristic (ROC) analysis was used to determine whether CTUS imaging can identify patients with DAS28 > 3.2. In this cross-sectional study, 814 joints were imaged among 37 RA patients (mean disease duration, 31 months). CTUS (but not single modality) imaging parameters were all significantly greater in patients with DAS28 > 3.2 than in those with DAS28 ≤ 3.2 (all P < 0.01). Areas under the ROC curves (AUCs) using cut-off levels of ≥ 94.5, ≥ 64.6 and ≥ 42.3 in identifying patients with DAS28 > 3.2 were 0.73, 0.76 and 0.76 for MAX (PD), AVG (PD) and MIN (PD), respectively (with sensitivity ranging from 58 to 61% and specificity all 100%). The use of CTUS in detecting a greater severity of joint inflammation among patients with at least moderate disease activity (DAS28 > 3.2) appears promising and will require further validation in independent RA cohorts.
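
The doubling rule described in the abstract above can be written out in a few lines. This is a minimal sketch, assuming that "median score" means the median PD score across the cohort; the function name and the toy values are hypothetical and do not come from the study.

```python
from statistics import median


def ctus_score(thermal_sum, pd_score, pd_median):
    """Combined score: double the summed thermal value when the patient's
    total PD score is above the cohort median, otherwise leave it unchanged."""
    return thermal_sum * 2 if pd_score > pd_median else thermal_sum


# Made-up toy cohort: (summed thermal value, total PD score) per patient.
patients = [(40.0, 2), (55.0, 9), (30.0, 5), (60.0, 12)]

# Assumption: the median is taken over all patients in the cohort.
pd_median = median(pd_score for _, pd_score in patients)
scores = [ctus_score(t, pd_score, pd_median) for t, pd_score in patients]
print(pd_median, scores)
```

A patient would then be flagged when the combined score reaches the reported cut-off for the corresponding parameter (for example, ≥ 94.5 for MAX (PD)).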

Sensitivity and specificity of the flexion and extension relaxation ratios to identify altered paraspinal muscles’ flexion relaxation phenomenon in nonspecific chronic low back pain patients

Article

Dec 2022
J ELECTROMYOGR KINES

Background: Among the main methods used to identify an altered flexion relaxation phenomenon (FRP) in nonspecific chronic low back pain (NSCLBP), it has been previously demonstrated that flexion relaxation ratio (FRR) and extension relaxation ratio (ERR) are more objective than the visual reference method. Objective: To determine the sensitivity and specificity of the different methods used to calculate the ratios in terms of their ability to identify an altered FRP in NSCLBP. Methods: Forty-four NSCLBP patients performed a standing maximal trunk flexion task. Surface electromyography (sEMG) was recorded along the erector spinae longissimus (ESL) and multifidus (MF) muscles. Altered FRP based on sEMG was visually identified by three experts (current standard). Six FRR methods and five ERR methods were used both for the ESL and MF muscles. ROC curves (with areas under the curve (AUC) and sensitivity/specificity) were generated for each ratio. Results: All methods used to calculate these ratios had an AUC higher than 0.9, excellent sensitivity (>90 %), and good specificity (80-100 %) for both ESL and MF muscles. Conclusion: Both FRP ratios (FRR and ERR) for MF and ESL muscles, appear to be an objective, sensitive and specific method for identifying altered FRP in NSCLBP patients.

Analyzing a portion of the ROC Curve

Article

Full-text available

Aug 1989

Donna McClish

The area under the ROC curve is a common index summarizing the information contained in the curve. When comparing two ROC curves, though, problems arise when interest does not lie in the entire range of false-positive rates (and hence the entire area). Numerical integration is suggested for evaluating the area under a portion of the ROC curve. Variance estimates are derived. The method is applicable for either continuous or rating scale binormal data, from independent or dependent samples. An example is presented which looks at rating scale data of computed tomographic scans of the head with and without concomitant use of clinical history. The areas under the two ROC curves over an a priori range of false-positive rates are examined, as well as the areas under the two curves at a specific point.
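
To make the idea of a partial area concrete, the sketch below numerically integrates a binormal ROC curve over a chosen range of false-positive rates. It assumes the usual binormal parameterisation, TPR = Φ(a + b·Φ⁻¹(FPR)), and uses a simple midpoint rule; the function names, the integration scheme and the parameter values are illustrative assumptions, not the estimator or variance formulas derived in the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def binormal_tpr(fpr, a, b):
    """True-positive rate on a binormal ROC curve at a given FPR (0 < fpr < 1)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpr))


def partial_auc(a, b, fpr_lo, fpr_hi, n=10_000):
    """Area under the binormal ROC curve between fpr_lo and fpr_hi.

    Midpoint rule: the curve is sampled at interval midpoints, so the
    endpoints 0 and 1 (where inv_cdf is undefined) are never evaluated.
    """
    if not 0.0 <= fpr_lo < fpr_hi <= 1.0:
        raise ValueError("need 0 <= fpr_lo < fpr_hi <= 1")
    h = (fpr_hi - fpr_lo) / n
    total = sum(binormal_tpr(fpr_lo + (i + 0.5) * h, a, b) for i in range(n))
    return total * h


# Chance-level curve (a=0, b=1) is the diagonal, so the area over
# FPR in [0, 0.2] is 0.2**2 / 2 = 0.02.
chance_area = partial_auc(0.0, 1.0, 0.0, 0.2)

# Hypothetical test with a=1, b=1, restricted to the same FPR range.
test_area = partial_auc(1.0, 1.0, 0.0, 0.2)
print(chance_area, test_area)
```

Integrating over the full range [0, 1] should recover the ordinary area under the binormal curve, Φ(a / √(1 + b²)), which gives a quick sanity check on the integration.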

The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve

Article

Full-text available

May 1982

A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
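
The probabilistic reading of the area given above can be checked directly on small data. The sketch below estimates the area as the fraction of diseased/non-diseased pairs that are ranked correctly, counting ties as one half, which is the Wilcoxon (Mann-Whitney) form. It also includes the closed-form standard error approximation commonly quoted from this paper, written here from memory as an assumption; the function names and the toy ratings are hypothetical.

```python
from math import sqrt


def auc_pairwise(diseased, healthy):
    """Probability that a random diseased score exceeds a random healthy
    score, with ties counted as 1/2 (Wilcoxon / Mann-Whitney estimate)."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def hanley_mcneil_se(auc, n_diseased, n_healthy):
    """Approximate standard error of the area (assumed form):
    Q1 = A / (2 - A), Q2 = 2 A^2 / (1 + A)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_diseased - 1) * (q1 - auc * auc)
        + (n_healthy - 1) * (q2 - auc * auc)
    ) / (n_diseased * n_healthy)
    return sqrt(variance)


# Made-up ratings on a 1-5 suspicion scale.
diseased_ratings = [3, 4, 5]
healthy_ratings = [1, 2, 3]
area = auc_pairwise(diseased_ratings, healthy_ratings)
se = hanley_mcneil_se(area, len(diseased_ratings), len(healthy_ratings))
print(area, se)
```

With so few cases the standard error is large, which is the point of using it for sample-size planning: the same formula can be evaluated for candidate sample sizes before data are collected.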

The Accuracy of Magnetic Resonance Imaging in Patients With Suspected Multiple Sclerosis

Article

Jun 1993

Alvin I Mushlin

Objective: To design and implement a methodologically rigorous study to examine the accuracy of magnetic resonance imaging (MRI) in a patient population clinically suspected of having multiple sclerosis (MS). Design and Setting: Three hundred three patients, who were referred to two university medical centers because of the suspicion of MS, underwent MRI of the head and double-dose, contrast-enhanced computed tomography (CT) of the head. The images were read by two observers individually and without knowledge of the clinical course or final diagnosis. Patients were followed up for at least 6 months and reevaluated clinically with subsequent neurological examination. Final diagnosis (MS or not MS) was made by a panel of neurologists on the basis of the clinical findings at presentation, those that developed during follow-up, and other diagnostic tests. The results of the imaging procedures were excluded to avoid incorporation bias. Diagnostic accuracy was assessed using receiver-operating characteristic analysis and likelihood ratios. Results: Magnetic resonance imaging of the head was considerably more accurate than CT in diagnosing MS. The area under the receiver-operating characteristic curve for MS was 0.82 (compared with 0.52 for CT), indicating that MRI was a good but not definitively accurate test for MS. A "definite MS" reading on an MRI of the head was specific for MS (likelihood ratio, 24.9) and essentially established the diagnosis, especially in patients clinically designated as "probable MS" before testing. However, MRI of the head was negative for MS in 25% and equivocal in 40% of the patients considered to have MS by the diagnostic review committee (sensitivity, 58%). Conclusions: Magnetic resonance imaging of the head provided assistance in the diagnosis of MS when lesions were visualized. Its ability far exceeded imaging with double-contrast CT. The sensitivity and, therefore, the predictive value of a negative MRI result for MS were, however, not sufficiently high for a normal MRI to be used to conclusively exclude the diagnosis of MS. (JAMA. 1993;269:3146-3151)
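
To illustrate how a likelihood ratio such as the 24.9 reported for a "definite MS" reading revises the probability of disease, the sketch below converts a pretest probability to odds, multiplies by the likelihood ratio, and converts back to a probability. The function name and the pretest probabilities are assumptions chosen for illustration; they are not taken from the study.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Revise a probability of disease with a likelihood ratio
    (the odds form of Bayes' rule)."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


LR_DEFINITE_MS = 24.9  # likelihood ratio quoted in the abstract

# Hypothetical pretest probabilities, for illustration only.
for pretest in (0.2, 0.5, 0.8):
    revised = post_test_probability(pretest, LR_DEFINITE_MS)
    print(pretest, round(revised, 3))
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any implementation of this conversion.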

The Area Above the Ordinal Dominance Graph and the Area Below the Receiver Operating Characteristic Graph

Article

Nov 1975

Donald Bamber

Receiver operating characteristic graphs are shown to be a variant form of ordinal dominance graphs. The area above the latter graph and the area below the former graph are useful measures of both the size or importance of a difference between two populations and/or the accuracy of discrimination performance. The usual estimator for this area is closely related to the Mann-Whitney U statistic. Statistical literature on this area estimator is reviewed. For large sample sizes, the area estimator is approximately normally distributed. Formulas for the variance and the maximum variance of the area estimator are given. Several different methods of constructing confidence intervals for the area measure are presented and the strengths and weaknesses of each of these methods are discussed. Finally, the Appendix presents the derivation of a new mathematical result, the maximum variance of the area estimator over convex ordinal dominance graphs.
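
The link between the area estimator and the Mann-Whitney U statistic can be sketched in a few lines: the estimated area is the fraction of (diseased, healthy) pairs in which the diseased case has the higher score, with ties counted as one half. The function name and the scores below are made up for illustration and are not from the paper.

```python
def auc_mann_whitney(diseased, healthy):
    """Estimate the ROC area as U / (m * n), where U counts the
    (diseased, healthy) pairs won by the diseased case; ties count 0.5."""
    u_statistic = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                u_statistic += 1.0
            elif d == h:
                u_statistic += 0.5
    return u_statistic / (len(diseased) * len(healthy))


# Hypothetical ordinal ratings (higher = more suspicious).
diseased_scores = [3, 4, 4, 5]
healthy_scores = [1, 2, 3, 4]

auc = auc_mann_whitney(diseased_scores, healthy_scores)
print(auc)  # 13.5 winning pairs out of 16 -> 0.84375
```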

Maximum likelihood estimation of parameters of signal detection theory: a direct solution [Binormal ROC curve-dichotomous diagnostic test]

Article

Feb 1968
PSYCHOMETRIKA

Ogilvie and Creelman have recently attempted to develop maximum likelihood estimates of the parameters of signal-detection theory from the data of yes-no ROC curves. Their method involved the assumption of a logistic distribution rather than the normal distribution in order to make the mathematics more tractable. The present paper presents a method of obtaining maximum likelihood estimates of these parameters using the assumption of underlying normal distributions.

Basic Principles of ROC analysis

Article

Nov 1978
SEMIN NUCL MED

Charles E. Metz

The limitations of diagnostic "accuracy" as a measure of decision performance require introduction of the concepts of the "sensitivity" and "specificity" of a diagnostic test. These measures and the related indices, "true positive fraction" and "false positive fraction," are more meaningful than "accuracy," yet do not provide a unique description of diagnostic performance because they depend on the arbitrary selection of a decision threshold. The receiver operating characteristic (ROC) curve is shown to be a simple yet complete empirical description of this decision threshold effect, indicating all possible combinations of the relative frequencies of the various kinds of correct and incorrect decisions. Practical experimental techniques for measuring ROC curves are described, and the issues of case selection and curve-fitting are discussed briefly. Possible generalizations of conventional ROC analysis to account for decision performance in complex diagnostic tasks are indicated. ROC analysis is shown to be related in a direct and natural way to cost/benefit analysis of diagnostic decision making. The concepts of "average diagnostic cost" and "average net benefit" are developed and used to identify the optimal compromise among various kinds of diagnostic error. Finally, the way in which ROC analysis can be employed to optimize diagnostic strategies is suggested.
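
The relation between ROC analysis and cost/benefit analysis mentioned above can be made concrete with a short derivation. The notation here is an assumption introduced for illustration: P(D+) and P(D-) are the prevalences of disease and non-disease, C_TP, C_FN, C_FP and C_TN are the average costs of the four decision outcomes, and C_0 is a fixed overhead cost. Minimizing the average cost along the ROC curve gives the slope of the curve at the optimal operating point.

```latex
\begin{align*}
\bar{C} &= C_0
  + P(D^{+})\,\bigl[\,C_{TP}\,\mathrm{TPF} + C_{FN}\,(1-\mathrm{TPF})\,\bigr]
  + P(D^{-})\,\bigl[\,C_{FP}\,\mathrm{FPF} + C_{TN}\,(1-\mathrm{FPF})\,\bigr] \\[4pt]
\frac{d\bar{C}}{d\,\mathrm{FPF}} &= 0
\;\;\Longrightarrow\;\;
\frac{d\,\mathrm{TPF}}{d\,\mathrm{FPF}}
  = \frac{P(D^{-})}{P(D^{+})}\cdot\frac{C_{FP}-C_{TN}}{C_{FN}-C_{TP}}
\end{align*}
```

Under these assumptions, a rare disease or a costly false positive pushes the optimal point toward the steep, lower-left part of the curve, while a costly false negative pushes it toward the flatter, upper-right part.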

Some Practical Issues of Experimental Design and Data Analysis in Radiological ROC Studies

Article

Apr 1989

Charles E. Metz

Receiver operating characteristic (ROC) analysis has been used in a broad variety of medical imaging studies during the past 15 years, and its advantages over more traditional measures of diagnostic performance are now clearly established. But despite the essential simplicity of the approach, workers in the field often find, sometimes only after an ROC study is under way, that a number of subtle issues related to experimental design and data analysis must be confronted in practice. Many of these issues have not been discussed in the literature in detail, and most are not well known. The purposes of this paper are to make users of ROC methodology in medical imaging aware of potential problems that should be confronted before an ROC study is begun and to indicate, at least broadly, how those problems may be dealt with, given the present state of the art. Some of the issues raised here can be addressed adequately by easily prescribed techniques, whereas others remain difficult and will be resolved fully only by new methodologic developments.

ROC Methodology in Radiologic Imaging

Article

Oct 1986

Charles E. Metz

If the performance of a diagnostic imaging system is to be evaluated objectively and meaningfully, one must compare radiologists' image-based diagnoses with actual states of disease and health in a way that distinguishes between the inherent diagnostic capacity of the radiologists' interpretations of the images, and any tendencies to "under-read" or "over-read". ROC methodology provides the only known basis for distinguishing between these two aspects of diagnostic performance. After identifying the fundamental issues that motivate ROC analysis, this article develops ROC concepts in an intuitive way. The requirements of a valid ROC study and practical techniques for ROC data collection and data analysis are sketched briefly. A survey of the radiologic literature indicates the broad variety of evaluation studies in which ROC analysis has been employed.

Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach

Article

Oct 1988

Methods of evaluating and comparing the performance of diagnostic tests are of increasing importance as new tests are developed and marketed. When a test is based on an observed variable that lies on a continuous or graded scale, an assessment of the overall value of the test can be made through the use of a receiver operating characteristic (ROC) curve. The curve is constructed by varying the cutpoint used to determine which values of the observed variable will be considered abnormal and then plotting the resulting sensitivities against the corresponding false positive rates. When two or more empirical curves are constructed based on tests performed on the same individuals, statistical analysis on differences between curves must take into account the correlated nature of the data. This paper presents a nonparametric approach to the analysis of areas under correlated ROC curves, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
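
The curve construction described above (vary the cutpoint, then plot sensitivity against the false positive rate) can be sketched as follows. This covers only the construction of a single empirical curve and its trapezoidal area, not the paper's covariance estimate for correlated curves. The function names and scores are assumptions for illustration.

```python
def empirical_roc(diseased, healthy):
    """Return (false positive rate, sensitivity) pairs, one per cutpoint.
    A case is called abnormal when its score is >= the cutpoint."""
    cutpoints = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cutpoint above every score: nothing is abnormal
    for cutpoint in cutpoints:
        sensitivity = sum(d >= cutpoint for d in diseased) / len(diseased)
        false_positive_rate = sum(h >= cutpoint for h in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points


def trapezoid_area(points):
    """Area under the piecewise-linear curve through the given points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ordinal ratings (higher = more suspicious).
diseased_scores = [3, 4, 4, 5]
healthy_scores = [1, 2, 3, 4]

roc_points = empirical_roc(diseased_scores, healthy_scores)
roc_area = trapezoid_area(roc_points)
print(roc_points)
print(roc_area)
```

The trapezoidal area of the empirical curve coincides with the Mann-Whitney estimate of the area for the same data, which is why nonparametric methods can work directly with pairwise comparisons of scores.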

Signal Detectability and Medical Decision-Making

Article

Apr 1971

Lee B. Lusted

Signal detectability studies help radiologists evaluate equipment systems and performance of assistants.

Advances in statistical methodology for the evaluation of diagnostic and laboratory tests

Article

Mar 1994

Gregory Campbell

The ROC plot is a useful tool in the evaluation of the performance of medical tests for separating two populations. For a two-state decision rule based on such a test, the ROC plot is the graph of all observed (1-specificity, sensitivity) pairs. Each point on this empirical plot can be represented by a 2 × 2 contingency table. The non-parametric statistics of Mann-Whitney and Kolmogorov-Smirnov can be immediately identified on this plot. Local non-parametric confidence interval procedures related to the theoretical ROC curve are briefly reviewed. For continuous data, two new simultaneous confidence regions associated with the ROC curve are presented, one based on Kolmogorov-Smirnov confidence bands for distribution functions and the other based on bootstrapping. Two different tests on the same patients can be compared on the ROC scale. For continuous data, one important problem concerns the comparison of two ROC plots (as would arise from two correlated diagnostic tests on each patient) using a sup norm (this metric can detect differences that the ROC area cannot). The distribution of a statistic based on this norm is studied, using the bootstrap. A biomedical example illustrates the methodologies.
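
Two of the statements above lend themselves to a small sketch: each cutpoint yields a 2 × 2 table, and the two-sample Kolmogorov-Smirnov statistic appears on the plot as the largest vertical distance between the empirical ROC points and the chance diagonal. The function names and scores are assumptions for illustration, not the paper's notation.

```python
def two_by_two(diseased, healthy, cutpoint):
    """2 x 2 table for the rule 'call abnormal when score >= cutpoint'."""
    tp = sum(d >= cutpoint for d in diseased)
    fp = sum(h >= cutpoint for h in healthy)
    return {"TP": tp, "FN": len(diseased) - tp,
            "FP": fp, "TN": len(healthy) - fp}


def ks_from_roc(diseased, healthy):
    """Largest |sensitivity - false positive rate| over all cutpoints,
    which equals the two-sample Kolmogorov-Smirnov statistic."""
    largest_gap = 0.0
    for cutpoint in sorted(set(diseased) | set(healthy)):
        table = two_by_two(diseased, healthy, cutpoint)
        sensitivity = table["TP"] / len(diseased)
        false_positive_rate = table["FP"] / len(healthy)
        largest_gap = max(largest_gap, abs(sensitivity - false_positive_rate))
    return largest_gap


# Hypothetical ordinal ratings (higher = more suspicious).
diseased_scores = [3, 4, 4, 5]
healthy_scores = [1, 2, 3, 4]

table_at_4 = two_by_two(diseased_scores, healthy_scores, 4)
ks_statistic = ks_from_roc(diseased_scores, healthy_scores)
print(table_at_4)
print(ks_statistic)
```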

Receiver-Operating Characteristic (ROC) Plots: A Fundamental Evaluation Tool in Clinical Medicine

Article

May 1993

The clinical performance of a laboratory test can be described in terms of diagnostic accuracy, or the ability to correctly classify subjects into clinically relevant subgroups. Diagnostic accuracy refers to the quality of the information provided by the classification device and should be distinguished from the usefulness, or actual practical value, of the information. Receiver-operating characteristic (ROC) plots provide a pure index of accuracy by demonstrating the limits of a test's ability to discriminate between alternative states of health over the complete spectrum of operating conditions. Furthermore, ROC plots occupy a central or unifying position in the process of assessing and using diagnostic tools. Once the plot is generated, a user can readily go on to many other activities such as performing quantitative ROC analysis and comparisons of tests, using likelihood ratio to revise the probability of disease in individual subjects, selecting decision thresholds, using logistic-regression analysis, using discriminant-function analysis, or incorporating the tool into a clinical strategy by using decision analysis.

A receiver operating characteristic partial area index for highly sensitive diagnostic tests

Article

Jan 1997

Area under a receiver operating characteristic (ROC) curve (Az) is widely used as an index of diagnostic performance. However, Az is not a meaningful summary of clinical diagnostic performance when high sensitivity must be maintained clinically. The authors developed a new ROC partial area index, which measures clinical diagnostic performance more meaningfully in such situations, to summarize an ROC curve in only a high-sensitivity region. The mathematical formulation of the partial area index was derived from the conventional binormal model. Statistical tests of apparent differences in this index were formulated analogously to those for Az. One common statistical test involving the partial area index was validated by computer simulations under realistic conditions. An example in mammography illustrates a situation in which the partial area index is more meaningful than Az in measuring clinical diagnostic performance. The partial area index can be used as a more meaningful alternative to the conventional Az index for highly sensitive diagnostic tests.
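
The idea of summarizing an ROC curve only in a high-sensitivity region can be sketched numerically. The code below assumes one plausible reading of such an index: the area lying under the curve and above a sensitivity floor, divided by (1 - floor) so that a perfect test scores 1. It works on a piecewise-linear empirical curve and does not reproduce the authors' binormal derivation; the function name, the floor, and the example points are all assumptions.

```python
def high_sensitivity_partial_area(points, sensitivity_floor):
    """Area between the piecewise-linear ROC curve and the horizontal line
    sensitivity = floor, normalized by (1 - floor).
    `points` are (false positive rate, sensitivity) pairs sorted from
    (0, 0) to (1, 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= sensitivity_floor:
            continue  # segment lies entirely at or below the floor
        if y0 >= sensitivity_floor:
            # segment lies entirely above the floor: trapezoid of the excess
            area += (x1 - x0) * ((y0 - sensitivity_floor) + (y1 - sensitivity_floor)) / 2.0
        else:
            # segment crosses the floor: keep only the triangle above it
            x_cross = x0 + (sensitivity_floor - y0) / (y1 - y0) * (x1 - x0)
            area += (x1 - x_cross) * (y1 - sensitivity_floor) / 2.0
    return area / (1.0 - sensitivity_floor)


# Hypothetical empirical ROC points (false positive rate, sensitivity).
example_points = [(0.0, 0.0), (0.0, 0.25), (0.25, 0.75),
                  (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]

partial_index = high_sensitivity_partial_area(example_points, 0.75)
print(partial_index)
```

With this normalization a perfect curve gives 1, and the chance diagonal gives (1 - floor) / 2, so values are comparable only between tests evaluated at the same floor.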

In pursuit of a piece of the ROC

Article

Jan 1997

A.J. Dwyer

Diagnosis of Gastric Cancers: Comparison of Conventional Radiography and Digital Radiography with a 4 Million-Pixel Charge-coupled Device

Article

Mar 2000

To evaluate the differences in accuracy and observer performance at conventional radiography and at digital radiography with a 4 million-pixel charge-coupled device (CCD) for the diagnosis of gastric cancers. A prospective study was performed of 225 patients with suspected gastric cancer who were referred to our hospital from January 1997 through February 1997. One hundred twelve patients were examined at conventional radiography and 113 were examined at digital radiography, and 24 and 27 patients had gastric cancer, respectively. Six radiologists interpreted the images, with attention to tumor findings. They were blinded to the clinical details, and their interpretations were rated against those of three other radiologists who examined the patients and who were aware of the clinical information such as endoscopic features and/or histopathologic findings in biopsy specimens. Receiver operating characteristic (ROC) analysis was used to compare the differences in observer performance for the diagnosis of gastric cancers at conventional radiography and at digital radiography. The overall sensitivity was 64.6% at conventional radiography versus 75.3% at digital radiography (P = .287); specificities were 84.5% and 90.5%, respectively (P = .011); and the positive predictive values were 53.1% and 71.3%, respectively (P = .036). ROC analysis clearly showed higher diagnostic performance at digital radiography than at conventional radiography. The data demonstrate the high diagnostic value of digital radiography with a 4 million-pixel CCD for gastric cancers. The technique has considerable potential as an alternative to conventional gastrointestinal radiography.
