Article

Detection and Classification of Myocardial Delayed Enhancement Patterns on MR Images with Deep Neural Networks: A Feasibility Study


Abstract

Purpose: To evaluate whether deep neural networks trained on a similar number of images to that required during physician training in the American College of Cardiology Core Cardiovascular Training Statement can acquire the capability to detect and classify myocardial delayed enhancement (MDE) patterns. Materials and methods: The authors retrospectively evaluated 1995 MDE images for training and validation of a deep neural network. Images were from 200 consecutive patients who underwent cardiovascular MRI and were obtained from the institutional database. Experienced cardiac MR image readers classified the images as showing the following MDE patterns: no pattern, epicardial enhancement, subendocardial enhancement, midwall enhancement, focal enhancement, transmural enhancement, and nondiagnostic. Data were divided into training and validation datasets by using a fourfold cross-validation method. Three untrained deep neural network architectures using the convolutional neural network (CNN) technique were trained with the training dataset images. The detection and classification accuracies of the trained CNNs were calculated with validation data. Results: The 1995 MDE images were classified by human readers as follows: no pattern, 926; epicardial enhancement, 91; subendocardial enhancement, 458; midwall enhancement, 118; focal enhancement, 141; transmural enhancement, 190; and nondiagnostic, 71. GoogLeNet, AlexNet, and ResNet-152 CNNs demonstrated accuracies of 79.5% (1592 of 1995 images), 78.9% (1574 of 1995 images), and 82.1% (1637 of 1995 images), respectively. Conclusion: Deep learning with CNNs using a limited amount of training data, less than that required during physician training, achieved high diagnostic performance in the detection of MDE on MR images. © RSNA, 2019. Supplemental material is available for this article.
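To make the study design in the abstract concrete, the sketch below shows one way the fourfold cross-validation of untrained GoogLeNet, AlexNet, and ResNet-152 classifiers over the seven MDE pattern classes could be set up in PyTorch. This is an illustrative reconstruction, not the authors' code; the `images` and `labels` tensors, the optimizer, and all hyperparameters are assumptions.

```python
# Minimal sketch of the study design: train untrained GoogLeNet, AlexNet, and
# ResNet-152 on 7 MDE pattern classes with 4-fold cross-validation and report
# accuracy pooled over the validation folds.
# Assumes `images` (N x 3 x 224 x 224 tensor) and `labels` (N,) already exist.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset
from torchvision import models
from sklearn.model_selection import KFold

NUM_CLASSES = 7  # no pattern, epicardial, subendocardial, midwall, focal, transmural, nondiagnostic

def build(name):
    # weights=None -> randomly initialized ("untrained") networks, as in the study
    if name == "googlenet":
        return models.googlenet(weights=None, num_classes=NUM_CLASSES, aux_logits=False)
    if name == "alexnet":
        return models.alexnet(weights=None, num_classes=NUM_CLASSES)
    return models.resnet152(weights=None, num_classes=NUM_CLASSES)

def cross_validate(name, images, labels, epochs=30, device="cpu"):
    dataset = TensorDataset(images, labels)
    correct = 0
    for train_idx, val_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(images):
        model = build(name).to(device)
        opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
        loss_fn = nn.CrossEntropyLoss()
        train_loader = DataLoader(Subset(dataset, train_idx), batch_size=32, shuffle=True)
        for _ in range(epochs):
            model.train()
            for x, y in train_loader:
                opt.zero_grad()
                loss_fn(model(x.to(device)), y.to(device)).backward()
                opt.step()
        model.eval()
        with torch.no_grad():
            for x, y in DataLoader(Subset(dataset, val_idx), batch_size=64):
                correct += (model(x.to(device)).argmax(1).cpu() == y).sum().item()
    return correct / len(dataset)  # accuracy pooled over the 4 validation folds
```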


... LGE is an established technique in clinical practice and LGE patterns on MR images play an essential role in diagnosing cardiomyopathies and guiding therapy [5,12,13]. The presence and distribution of contrast agent can reveal focal pathologic changes in the myocardium, such as necrosis, fibrosis, amyloid deposition, and edema, with high spatial resolution [3,5]. ...
... Automated diagnostic systems such as CNNs can help professionals detect diseases earlier and more accurately, while being less time-consuming and costly [2,17]. In particular, inexperienced physicians can benefit from a reference finding in their decision-making process [12]. ...
... Ohta et al. [12] cropped image regions outside the heart, risking information loss. In contrast, our model was trained on whole images. ...
Article
Full-text available
Background A deep learning (DL) model that automatically detects cardiac pathologies on cardiac MRI may help streamline the diagnostic workflow. To develop a DL model to detect cardiac pathologies on cardiac MRI, T1-mapping and late gadolinium phase-sensitive inversion recovery (PSIR) sequences were used. Methods Subjects in this study were either diagnosed with cardiac pathology (n = 137) including acute and chronic myocardial infarction, myocarditis, dilated cardiomyopathy, and hypertrophic cardiomyopathy or classified as normal (n = 63). Cardiac MR imaging included T1-mapping and PSIR sequences. Subjects were split 65/15/20% for training, validation, and hold-out testing. The DL models were based on an ImageNet pretrained DenseNet-161 and implemented using PyTorch and fastai. Data augmentation with random rotation and mixup was applied. Categorical cross entropy was used as the loss function with a cyclic learning rate (1e-3). DL models for both sequences were developed separately using similar training parameters. The final model was chosen based on its performance on the validation set. Gradient-weighted class activation maps (Grad-CAMs) visualized the decision-making process of the DL model. Results The DL model achieved a sensitivity, specificity, and accuracy of 100%, 38%, and 88% on PSIR images and 78%, 54%, and 70% on T1-mapping images. Grad-CAMs demonstrated that the DL model focused its attention on myocardium and cardiac pathology when evaluating MR images. Conclusions The developed DL models were able to reliably detect cardiac pathologies on cardiac MR images. The diagnostic performance of T1 mapping alone is particularly of note since it does not require a contrast agent and can be acquired quickly.
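For readers unfamiliar with the training recipe summarized above (ImageNet-pretrained DenseNet-161, mixup augmentation, categorical cross entropy, cyclic learning rate), the following plain-PyTorch sketch approximates it. The original work used fastai; the `train_loader`, the 20-epoch budget, the Beta(0.4, 0.4) mixup parameter, and the binary normal-versus-pathology head are assumptions made for illustration.

```python
# Hedged plain-PyTorch approximation of the described recipe (not the authors' fastai code).
import torch
import torch.nn as nn
from torchvision import models

def build_model(num_classes=2):
    # ImageNet-pretrained DenseNet-161 with a new normal-vs-pathology head
    m = models.densenet161(weights=models.DenseNet161_Weights.IMAGENET1K_V1)
    m.classifier = nn.Linear(m.classifier.in_features, num_classes)
    return m

def train(model, train_loader, epochs=20, max_lr=1e-3, device="cpu"):
    model = model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=max_lr)
    sched = torch.optim.lr_scheduler.OneCycleLR(              # cyclic learning rate
        opt, max_lr=max_lr, total_steps=epochs * len(train_loader))
    loss_fn = nn.CrossEntropyLoss()                            # categorical cross entropy
    for _ in range(epochs):
        for x, y in train_loader:                              # x: images, y: class indices
            x, y = x.to(device), y.to(device)
            lam = torch.distributions.Beta(0.4, 0.4).sample().item()
            perm = torch.randperm(x.size(0), device=device)
            x_mix = lam * x + (1 - lam) * x[perm]              # mixup: blend pairs of images
            logits = model(x_mix)
            loss = lam * loss_fn(logits, y) + (1 - lam) * loss_fn(logits, y[perm])
            opt.zero_grad()
            loss.backward()
            opt.step()
            sched.step()
    return model
```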
... They analyzed a cohort of 200 patients, obtaining an accuracy ranging between 78.9% and 82.1% [26]. Though these results are not sufficient for introduction into daily clinical practice, they represent promising applications that could be further developed with the availability of multi-institutional and larger datasets. ...
... Ohta et al. [26] used a convolutional neural network to detect and classify myocardial delayed enhancement patterns (reported accuracy: 87.2%–88.9%). ...
Article
Full-text available
Machine learning (ML) is a software solution with the ability to make predictions without prior explicit programming, aiding in the analysis of large amounts of data. These algorithms can be trained through supervised or unsupervised learning. Cardiology is one of the fields of medicine with the highest interest in its applications. They can facilitate every step of patient care, reducing the margin of error and contributing to precision medicine. In particular, ML has been proposed for cardiac imaging applications such as automated computation of scores, differentiation of prognostic phenotypes, quantification of heart function and segmentation of the heart. These tools have also demonstrated the capability of performing early and accurate detection of anomalies in electrocardiographic exams. ML algorithms can also contribute to cardiovascular risk assessment in different settings and perform predictions of cardiovascular events. Another interesting research avenue in this field is represented by genomic assessment of cardiovascular diseases. Therefore, ML could aid in making earlier diagnosis of disease, develop patient-tailored therapies and identify predictive characteristics in different pathologic conditions, leading to precision cardiology.
... This makes it possible to exclude ischemia and to ascertain whether myocardial dysfunction is present. Accuracy among a sample of 200 patients studied by the researchers varied from 78.9% to 82.1% [31]. Heart segmentation automation is also possible with machine learning. ...
Article
Background The integration of artificial intelligence and machine learning holds great promise for enhancing healthcare institutions and providing fresh perspectives on the origins and advancement of long-term illnesses. In the healthcare sector, artificial intelligence and machine learning are used to address supply and demand concerns, genomic applications, and new advancements in drug development, cancer, and heart disease. Objective The article explores the ways that machine learning, AI, precision medicine, and genomics are changing healthcare. The essay also discusses how AI's examination of various patient data could enhance healthcare institutions, provide fresh insights into chronic conditions, and advance precision medicine. The potential uses of machine learning for genome analysis are also examined in the paper, particularly about genetic biomarker-based disease risk and symptom prediction. Discussion The challenges posed by the phenotype-genotype relationship are examined, as well as the significance of comprehending disease pathways in order to create tailored treatments. Moreover, it offers a streamlined and modularized method that predicts how genotypes affect cell properties using machine-learning models, enabling the development of personalized drugs. The collective feedback highlights the rapid interdisciplinary growth of medical genomics following the completion of the Human Genome Project. It also emphasizes how important genomic data is for improving healthcare outcomes and facilitating personalized medicine. Conclusion The study's conclusions point to a revolutionary shift in healthcare: the application of AI/ML to illness control. Even though these innovations have a lot of potential benefits, problems like algorithm interpretability and ethical issues need to be worked out before they can be successfully incorporated into routine medical practice. Using machine learning in medicine has enormous potential benefits for the biotech industry. Further research, ongoing regulatory frameworks, and collaboration between medical professionals and data analysts are necessary to fully utilize machine learning as well as artificial intelligence in disease management.
... Herein, we used ResNet50 because it exhibited the best diagnostic performance in a certain task [26,27]. Fifth, the time interval between the DXA and CT examinations did not appear to match between the test data set (87 [14–185] days) and the training and validation data set (49 [9–121] days). Although this difference was not statistically significant, we could not rule out the possibility that it affected the performance of the prediction of BMD and TBS with deep learning in this study. ...
Article
Full-text available
Objectives: We evaluated the feasibility of using deep learning with a convolutional neural network for predicting bone mineral density (BMD) and bone microarchitecture from conventional computed tomography (CT) images acquired by multivendor scanners. Methods: We enrolled 402 patients who underwent noncontrast CT examinations, including L1-L4 vertebrae, and dual-energy x-ray absorptiometry (DXA) examination. Among these, 280 patients (3360 sagittal vertebral images), 70 patients (280 sagittal vertebral images), and 52 patients (208 sagittal vertebral images) were assigned to the training data set for deep learning model development, the validation, and the test data set, respectively. Bone mineral density and the trabecular bone score (TBS), an index of bone microarchitecture, were assessed by DXA. BMD(DL) and TBS(DL) were predicted by deep learning with a convolutional neural network (ResNet50). Pearson correlation tests assessed the correlation between BMD(DL) and BMD, and TBS(DL) and TBS. The diagnostic performance of BMD(DL) for osteopenia/osteoporosis and that of TBS(DL) for bone microarchitecture impairment were evaluated using receiver operating characteristic curve analysis. Results: BMD(DL) and BMD correlated strongly (r = 0.81, P < 0.01), whereas TBS(DL) and TBS correlated moderately (r = 0.54, P < 0.01). The sensitivity and specificity of BMD(DL) for identifying osteopenia or osteoporosis were 93% and 90%, and 100% and 94%, respectively. The sensitivity and specificity of TBS(DL) for identifying patients with bone microarchitecture impairment were 73% for all values. Conclusions: The BMD(DL) and TBS(DL) derived from conventional CT images could identify patients who should undergo DXA, which could be a gatekeeper tool for detecting latent osteoporosis/osteopenia or bone microarchitecture impairment.
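As a rough illustration of the approach described above (and not the authors' code), the sketch below adapts a ResNet50 to regress a continuous value such as BMD from CT slices and compares predictions with DXA values using a Pearson correlation; the grayscale input convolution, the data loader, and the training details (e.g., a mean-squared-error loss) are assumptions.

```python
# Hedged sketch: ResNet50 adapted for regression of BMD from single-channel CT slices.
import torch
import torch.nn as nn
from torchvision import models
from scipy.stats import pearsonr

def build_regressor():
    m = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    m.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)  # 1-channel CT input
    m.fc = nn.Linear(m.fc.in_features, 1)     # single continuous output (predicted BMD)
    return m                                  # training would typically use nn.MSELoss()

def pearson_r(model, loader, device="cpu"):
    """Pearson correlation between deep-learning-predicted BMD and DXA-measured BMD."""
    model = model.to(device).eval()
    preds, targets = [], []
    with torch.no_grad():
        for x, y in loader:                    # y: DXA-derived BMD values
            preds += model(x.to(device)).squeeze(1).cpu().tolist()
            targets += y.tolist()
    r, p = pearsonr(preds, targets)
    return r, p
```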
... This feature can help distinguish between ischemic and non-ischemic cardiomyopathy and reveal myocardial dysfunction. Researchers investigated a group of 200 patients and found that their accuracy ranged from 78.9% to 82.1% [160]. ...
Article
Full-text available
The advancement of precision medicine in medical care has left behind the conventional symptom-driven treatment process by allowing early risk prediction of disease through improved diagnostics and customization of more effective treatments. It is necessary to scrutinize overall patient data alongside broad factors to observe and differentiate between ill and relatively healthy people to take the most appropriate path toward precision medicine, resulting in an improved vision of biological indicators that can signal health changes. Precision and genomic medicine combined with artificial intelligence have the potential to improve patient healthcare. Patients with less common therapeutic responses or unique healthcare demands are using genomic medicine technologies. AI provides insights through advanced computation and inference, enabling the system to reason and learn while enhancing physician decision making. Many cell characteristics, including gene up-regulation, proteins binding to nucleic acids, and splicing, can be measured at high throughput and used as training objectives for predictive models. Researchers can create a new era of effective genomic medicine with the improved availability of a broad range of datasets and modern computer techniques such as machine learning. This review article has elucidated the contributions of ML algorithms in precision and genome medicine.
... One of the biggest obstacles to using data for a broader variety of ML applications is that data are usually stored in diverse repositories, which are not readily usable for cardiovascular research, due to various data quality challenges [2]. Where the data are readily available, different ML algorithms have been successfully used, such as Wasserstein generative adversarial networks [21], convolutional NNs [22,23], deep NNs [24], and boosted decision trees [25]. Some authors have tested multiple models, such as RFs, artificial NNs, SVMs, and Bayesian networks [26], or a combination of J48, naive Bayes, KNNs, SVMs, RFs, bagging, and boosting [27]. ...
Article
Full-text available
Background Cardiovascular disorders in general are responsible for 30% of deaths worldwide. Among them, hypertrophic cardiomyopathy (HCM) is a genetic cardiac disease that is present in about 1 of 500 young adults and can cause sudden cardiac death (SCD). Objective Although the current state-of-the-art methods model the risk of SCD for patients, to the best of our knowledge, no methods are available for modeling the patient's clinical status up to 10 years ahead. In this paper, we propose a novel machine learning (ML)-based tool for predicting disease progression for patients diagnosed with HCM in terms of adverse remodeling of the heart during a 10-year period. Methods The method consisted of 6 predictive regression models that independently predict future values of 6 clinical characteristics: left atrial size, left atrial volume, left ventricular ejection fraction, New York Heart Association functional classification, left ventricular internal diastolic diameter, and left ventricular internal systolic diameter. We supplemented each prediction with the explanation that is generated using the Shapley additive explanation method. Results The final experiments showed that predictive error is lower on 5 of the 6 constructed models in comparison to experts (on average, by 0.34) or a consortium of experts (on average, by 0.22). The experiments revealed that semisupervised learning and the artificial data from virtual patients help improve predictive accuracies. The best-performing random forest model improved R2 from 0.3 to 0.6. Conclusions By engaging medical experts to provide interpretation and validation of the results, we determined the models' favorable performance compared to the performance of experts for 5 of 6 targets.
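The modelling pattern described above, a separate regressor per clinical target explained with Shapley additive explanations, can be sketched as follows. The random-forest choice matches the best-performing model mentioned in the abstract, but the feature matrix, target, and hyperparameters here are synthetic placeholders rather than the study's data.

```python
# Minimal sketch (illustrative data only): one random-forest regressor per clinical target,
# with SHAP values explaining each individual prediction.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X = np.random.rand(300, 12)                 # placeholder baseline features
y = X[:, 0] * 2 + np.random.rand(300)       # placeholder future value of one target (e.g., LA size)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))

# SHAP additive explanations: one contribution per feature per patient
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
print("Contribution of each feature to the first prediction:", shap_values[0])
```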
... This feature can help distinguish between ischemic and non-ischemic cardiomyopathy and reveal myocardial dysfunction. Researchers investigated a group of 200 patients and found that their accuracy ranged from 78.9% to 82.1% [161]. ...
Preprint
Full-text available
The advancement of precision medicine in medical care has left behind the conventional symptom-driven treatment process by allowing early risk prediction of disease through improved diagnostics and customization of more effective treatments. It is necessary to scrutinize overall patient data alongside broad factors to observe and differentiate between ill and relatively healthy people to take the most appropriate path toward precision medicine, resulting in an improved vision of biological indicators that can signal health changes. Precision and genomic medicine combined with artificial intelligence have the potential to improve patient healthcare. Patients with less common therapeutic responses or unique healthcare demands are using genomic medicine technologies. AI provides insights through advanced computation and inference, enabling the system to reason and learn while enhancing physician decision-making. Many cell characteristics, including gene up-regulation, proteins binding to nucleic acids, and splicing, can be measured at high throughput and used as training objectives for predictive models. Researchers can create a new era of effective genomic medicine with the improved availability of a broad range of data sets and modern computer techniques such as machine learning. This review article has elucidated the contributions of ML algorithms in precision and genome medicine.
Article
Full-text available
This paper aims to thoroughly discuss the impact of artificial intelligence (AI) on clinical practice in interventional cardiology (IC) with special recognition of its most recent advancements. Thus, recent years have been exceptionally abundant in advancements in computational tools, including the development of AI. The application of AI development is currently in its early stages; nevertheless, new technologies have proven to be a promising concept, particularly considering IC showing great impact on patient safety, risk stratification and outcomes during the whole therapeutic process. The primary goal is to achieve the integration of multiple cardiac imaging modalities, establish online decision support systems and platforms based on augmented and/or virtual realities, and finally to create automatic medical systems, providing electronic health data on patients. In a simplified way, two main areas of AI utilization in IC may be distinguished, namely, virtual and physical [1]. Consequently, numerous studies have provided data regarding AI utilization in terms of automated interpretation and analysis from various cardiac modalities, including electrocardiogram, echocardiography, angiography, cardiac magnetic resonance imaging, and computed tomography as well as data collected during robotic-assisted percutaneous coronary intervention procedures. Thus, this paper aims to thoroughly discuss the impact of AI on clinical practice in IC with special recognition of its most recent advancements.
Article
In recent years, cardiovascular diseases (CVDs) have become one of the leading causes of mortality globally. At early stages, CVDs appear with minor symptoms and progressively get worse. The majority of people experience symptoms such as exhaustion, shortness of breath, ankle swelling, fluid retention, and other symptoms when starting CVD. Coronary artery disease (CAD), arrhythmia, cardiomyopathy, congenital heart defect (CHD), mitral regurgitation, and angina are the most common CVDs. Clinical methods such as blood tests, electrocardiography (ECG) signals, and medical imaging are the most effective methods used for the detection of CVDs. Among the diagnostic methods, cardiac magnetic resonance imaging (CMRI) is increasingly used to diagnose, monitor the disease, plan treatment and predict CVDs. Coupled with all the advantages of CMR data, CVD diagnosis is challenging for physicians because each scan has many slices of data and their contrast might be low. To address these issues, deep learning (DL) techniques have been employed in the diagnosis of CVDs using CMR data, and much research is currently being conducted in this field. This review provides an overview of the studies performed in CVD detection using CMR images and DL techniques. The introduction section examines CVD types, diagnostic methods, and the most important medical imaging techniques. The following sections present research to detect CVDs using CMR images and the most significant DL methods. Another section discusses the challenges in diagnosing CVDs from CMRI data. Next, the discussion section discusses the results of this review, and future work in CVD diagnosis from CMR images and DL techniques is outlined. Finally, the most important findings of this study are presented in the conclusion section.
Article
Introduction: Artificial Intelligence-based Medical Devices (AI-based MDs) are experiencing exponential growth in healthcare. This study aimed to investigate whether current studies assessing AI contain the information required for health technology assessment (HTA) by HTA bodies. Methods: We conducted a systematic literature review based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses methodology to extract articles published between 2016 and 2021 related to the assessment of AI-based MDs. Data extraction focused on study characteristics, technology, algorithms, comparators, and results. AI quality assessment and HTA scores were calculated to evaluate whether the items present in the included studies were concordant with the HTA requirements. We performed a linear regression for the HTA and AI scores with the explanatory variables of the impact factor, publication date, and medical specialty. We conducted a univariate analysis of the HTA score and a multivariate analysis of the AI score with an alpha risk of 5 %. Results: Of 5578 retrieved records, 56 were included. The mean AI quality assessment score was 67 %; 32 % of articles had an AI quality score ≥ 70 %, 50 % had a score between 50 % and 70 %, and 18 % had a score under 50 %. The highest quality scores were observed for the study design (82 %) and optimisation (69 %) categories, whereas the scores were lowest in the clinical practice category (23 %). The mean HTA score was 52 % for all seven domains. 100 % of the studies assessed clinical effectiveness, whereas only 9 % evaluated safety, and 20 % evaluated economic issues. There was a statistically significant relationship between the impact factor and the HTA and AI scores (both p = 0.046). Discussion: Clinical studies on AI-based MDs have limitations and often lack adapted, robust, and complete evidence. High-quality datasets are also required because the output data can only be trusted if the inputs are reliable. The existing assessment frameworks are not specifically designed to assess AI-based MDs. From the perspective of regulatory authorities, we suggest that these frameworks should be adapted to assess the interpretability, explainability, cybersecurity, and safety of ongoing updates. From the perspective of HTA agencies, we highlight that transparency, professional and patient acceptance, ethical issues, and organizational changes are required for the implementation of these devices. Economic assessments of AI should rely on a robust methodology (business impact or health economic models) to provide decision-makers with more reliable evidence. Conclusion: Currently, AI studies are insufficient to cover HTA prerequisites. HTA processes also need to be adapted because they do not consider the important specificities of AI-based MDs. Specific HTA workflows and accurate assessment tools should be designed to standardise evaluations, generate reliable evidence, and create confidence.
Article
Objectives To determine the optimal inversion time (TI) from Look-Locker scout images using a convolutional neural network (CNN) and to investigate the feasibility of correcting TI using a smartphone. Methods In this retrospective study, TI-scout images were extracted using a Look-Locker approach from 1113 consecutive cardiac MR examinations performed between 2017 and 2020 with myocardial late gadolinium enhancement. Reference TI null points were independently determined visually by an experienced radiologist and an experienced cardiologist, and quantitatively measured. A CNN was developed to evaluate deviation of TI from the null point and then implemented in PC and smartphone applications. Images on 4K or 3-megapixel monitors were captured by a smartphone, and CNN performance on each monitor was determined. Optimal, undercorrection, and overcorrection rates using deep learning on the PC and smartphone were calculated. For patient analysis, TI category differences in pre- and post-correction were evaluated using the TI null point used in late gadolinium enhancement imaging. Results For PC, 96.4% (722/749) of images were classified as optimal, with under- and overcorrection rates of 1.2% (9/749) and 2.4% (18/749), respectively. For 4K images, 93.5% (700/749) of images were classified as optimal, with under- and overcorrection rates of 3.9% (29/749) and 2.7% (20/749), respectively. For 3-megapixel images, 89.6% (671/749) of images were classified as optimal, with under- and overcorrection rates of 3.3% (25/749) and 7.0% (53/749), respectively. On patient-based evaluations, subjects classified as within the optimal range increased from 72.0% (77/107) to 91.6% (98/107) using the CNN. Conclusions Optimizing TI on Look-Locker images was feasible using deep learning and a smartphone. Key Points • A deep learning model corrected TI-scout images to within the optimal null point for LGE imaging. • By capturing the TI-scout image on the monitor with a smartphone, the deviation of the TI from the null point can be immediately determined. • Using this model, TI null points can be set to the same degree as that by an experienced radiological technologist.
Preprint
Full-text available
In recent years, cardiovascular diseases (CVDs) have become one of the leading causes of mortality globally. CVDs appear with minor symptoms and progressively get worse. The majority of people experience symptoms such as exhaustion, shortness of breath, ankle swelling, fluid retention, and other symptoms when starting CVD. Coronary artery disease (CAD), arrhythmia, cardiomyopathy, congenital heart defect (CHD), mitral regurgitation, and angina are the most common CVDs. Clinical methods such as blood tests, electrocardiography (ECG) signals, and medical imaging are the most effective methods used for the detection of CVDs. Among the diagnostic methods, cardiac magnetic resonance imaging (CMR) is increasingly used to diagnose, monitor the disease, plan treatment and predict CVDs. Coupled with all the advantages of CMR data, CVD diagnosis is challenging for physicians due to the many slices of data, low contrast, etc. To address these issues, deep learning (DL) techniques have been employed in the diagnosis of CVDs using CMR data, and much research is currently being conducted in this field. This review provides an overview of the studies performed in CVD detection using CMR images and DL techniques. The introduction section examines CVD types, diagnostic methods, and the most important medical imaging techniques. In the following, investigations to detect CVDs using CMR images and the most significant DL methods are presented. Another section discusses the challenges in diagnosing CVDs from CMR data. Next, the discussion section discusses the results of this review, and future work in CVD diagnosis from CMR images and DL techniques is outlined. The most important findings of this study are presented in the conclusion section.
Chapter
Myocarditis is a cardiovascular disease caused by infectious agents, especially viruses. Compared to other cardiovascular diseases, myocarditis is very rare, presenting mainly with chest pain or heart failure. Cardiac magnetic resonance (CMR) imaging is a popular technique for diagnosis of myocarditis. Factors such as low contrast, various noise types, and the large number of CMR slices per patient cause many challenges when diagnosing myocarditis by specialist physicians. Therefore, it is necessary to introduce new artificial intelligence (AI) techniques for diagnosis of myocarditis from CMR images. This paper presents a new method to detect myocarditis in CMR images using deep learning (DL) models. First, the Z-Alizadeh Sani myocarditis dataset was used for simulations, which included CMR images of normal subjects and myocardial infarction patients. Next, preprocessing is performed on CMR images. CMR images are created with the help of the cycle generative adversarial network (GAN) model at this step. Finally, pretrained models including EfficientNet B3, EfficientNet V2, HrNet, ResNetrs50, ResNest50d, and ResNet 50d have been used to classify the input data. Among pretrained methods, the EfficientNet V2 model achieved 99.33% accuracy. Keywords: Myocarditis, Diagnosis, Deep learning, Pretrained, Cycle GAN
Conference Paper
The purpose of this paper is to present a method to predict the radio frequency (RF) induced heating for passive implantable medical devices under magnetic resonance imaging (MRI) using a convolutional neural network (CNN). A total of 576 generic solid plate devices are constructed as study examples. Numerical simulations were conducted at both 1.5 T and 3 T using a full-wave electromagnetic solver based on the finite-difference time-domain (FDTD) method to simulate the RF-induced heating for the solid devices in the ASTM phantom. Then the solid plate devices are characterized by using three-dimensional (3D) point cloud data (PCD) representations and used as the input of the CNN. The extracted RF-induced heating from the numerical simulation, in terms of peak 10 gram (g) averaged specific absorption rate (psSAR10g), is related to the 3D PCD by using a CNN. Seventy percent of the configurations and the corresponding psSAR10g from the simulation results were randomly selected and used as the training set of the CNN, while the remainder were used as the test set. The results show that the test error under the 1.5 T system was very small, with a mean absolute error of less than 2.56 W/kg against a mean psSAR10g of 34.52 W/kg. The test error under the 3 T system was smaller than that of the 1.5 T system, with a mean absolute error of less than 1.14 W/kg.
Article
Full-text available
Importance Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency. Objective Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin–stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists’ diagnoses in a diagnostic setting. Design, Setting, and Participants Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining were provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC). Exposures Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation. Main Outcomes and Measures The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor. Results The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC). Conclusions and Relevance In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.
Article
Full-text available
Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and the revival of deep CNNs. CNNs enable learning data-driven, highly representative, layered hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully employ CNNs for medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained on a natural image dataset to medical image tasks. In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters, and vary in numbers of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve the state-of-the-art performance on the mediastinal LN detection, with 85% sensitivity at 3 false positives per patient, and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis and valuable insights can be extended to the design of high performance CAD systems for other medical imaging tasks.
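The strategies discussed above, training from scratch, off-the-shelf pretrained features, and transfer learning via fine-tuning, can be contrasted in a few lines of PyTorch. This is a generic sketch with an assumed two-class detection head, not the configuration used in the paper.

```python
# Hedged sketch contrasting "off-the-shelf features" with "fine-tuning" for a CADe task.
import torch
import torch.nn as nn
from torchvision import models

def off_the_shelf(num_classes=2):
    """Frozen ImageNet features: train only the new classification head."""
    m = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    for p in m.parameters():
        p.requires_grad = False
    m.fc = nn.Linear(m.fc.in_features, num_classes)   # the new head stays trainable
    return m

def fine_tuned(num_classes=2):
    """Transfer learning: start from ImageNet weights and fine-tune everything."""
    m = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    m.fc = nn.Linear(m.fc.in_features, num_classes)
    opt = torch.optim.SGD([
        {"params": [p for n, p in m.named_parameters() if not n.startswith("fc")], "lr": 1e-4},
        {"params": m.fc.parameters(), "lr": 1e-3},     # larger step for the new head
    ], momentum=0.9)
    return m, opt

# Training from scratch (the third strategy) would simply pass weights=None.
```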
Article
Full-text available
Lung cancer has a poor prognosis when not diagnosed early and unresectable lesions are present. The management of small lung nodules noted on computed tomography scan is controversial due to uncertain tumor characteristics. A conventional computer-aided diagnosis (CAD) scheme requires several image processing and pattern recognition steps to accomplish a quantitative tumor differentiation result. In such an ad hoc image analysis pipeline, every step depends heavily on the performance of the previous step. Accordingly, tuning of classification performance in a conventional CAD scheme is very complicated and arduous. Deep learning techniques, on the other hand, have the intrinsic advantage of an automatic exploitation feature and tuning of performance in a seamless fashion. In this study, we attempted to simplify the image analysis pipeline of conventional CAD with deep learning techniques. Specifically, we introduced models of a deep belief network and a convolutional neural network in the context of nodule classification in computed tomography images. Two baseline methods with feature computing steps were implemented for comparison. The experimental results suggest that deep learning methods could achieve better discriminative results and hold promise in the CAD application domain.
Article
Full-text available
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
Article
Full-text available
With mounting data on its accuracy and prognostic value, cardiovascular magnetic resonance (CMR) is becoming an increasingly important diagnostic tool with growing utility in clinical routine. Given its versatility and wide range of quantitative parameters, however, agreement on specific standards for the interpretation and post-processing of CMR studies is required to ensure consistent quality and reproducibility of CMR reports. This document addresses this need by providing consensus recommendations developed by the Task Force for Post Processing of the Society for Cardiovascular MR (SCMR). The aim of the task force is to recommend requirements and standards for image interpretation and post processing enabling qualitative and quantitative evaluation of CMR images. Furthermore, pitfalls of CMR image analysis are discussed where appropriate.
Article
Full-text available
Although there are many commercially available statistical software packages, only a few implement a competing risk analysis or a proportional hazards regression model with time-dependent covariates, which are necessary in studies on hematopoietic SCT. In addition, most packages are not clinician friendly, as they require that commands be written based on statistical languages. This report describes the statistical software 'EZR' (Easy R), which is based on R and R commander. EZR enables the application of statistical functions that are frequently used in clinical studies, such as survival analyses, including competing risk analyses and the use of time-dependent covariates, receiver operating characteristics analyses, meta-analyses, sample size calculation and so on, by point-and-click access. EZR is freely available on our website (http://www.jichi.ac.jp/saitama-sct/SaitamaHP.files/statmed.html) and runs on both Windows (Microsoft Corporation, USA) and Mac OS X (Apple, USA). This report provides instructions for the installation and operation of EZR. Bone Marrow Transplantation advance online publication, 3 December 2012; doi:10.1038/bmt.2012.244.
Chapter
Full-text available
The generalization performance of a learning method relates to its prediction capability on independent test data. Assessment of this performance is extremely important in practice, since it guides the choice of learning method or model, and gives us a measure of the quality of the ultimately chosen model.
Article
Full-text available
Non-ischaemic cardiomyopathies (NICMs) are chronic, progressive myocardial diseases with distinct patterns of morphological, functional, and electrophysiological changes. In the setting of cardiomyopathy (CM), determining the exact aetiology is important because the aetiology is directly related to treatment and patient survival. Determining the exact aetiology, however, can be difficult using currently available imaging techniques, such as echocardiography, radionuclide imaging or X-ray coronary angiography, since overlap of features between CMs may be encountered. Cardiovascular magnetic resonance (CMR) imaging has recently emerged as a new non-invasive imaging modality capable of providing high-resolution images of the heart in any desired plane. Delayed contrast enhanced CMR (DE-CMR) can be used for non-invasive tissue characterization and may hold promise in differentiating ischaemic from NICMs, as the typical pattern of hyperenhancement can be classified as ‘ischaemic-type’ or ‘non-ischaemic type’ on the basis of pathophysiology of ischaemia. This article reviews the potential of DE-CMR to distinguish between ischaemic and NICM as well as to differentiate non-ischaemic aetiologies. Rather than simply describing various hyperenhancement patterns that may occur in different disease states, our goal will be (i) to provide an overall imaging approach for the diagnosis of CM and (ii) to demonstrate how this approach is based on the underlying relationships between contrast enhancement and myocardial pathophysiology.
Article
Purpose To compare the performance of a deep-learning bone age assessment model based on hand radiographs with that of expert radiologists and that of existing automated models. Materials and Methods The institutional review board approved the study. A total of 14 036 clinical hand radiographs and corresponding reports were obtained from two children's hospitals to train and validate the model. For the first test set, composed of 200 examinations, the mean of bone age estimates from the clinical report and three additional human reviewers was used as the reference standard. Overall model performance was assessed by comparing the root mean square (RMS) and mean absolute difference (MAD) between the model estimates and the reference standard bone ages. Ninety-five percent limits of agreement were calculated in a pairwise fashion for all reviewers and the model. The RMS of a second test set composed of 1377 examinations from the publicly available Digital Hand Atlas was compared with published reports of an existing automated model. Results The mean difference between bone age estimates of the model and of the reviewers was 0 years, with a mean RMS and MAD of 0.63 and 0.50 years, respectively. The estimates of the model, the clinical report, and the three reviewers were within the 95% limits of agreement. RMS for the Digital Hand Atlas data set was 0.73 years, compared with 0.61 years of a previously reported model. Conclusion A deep-learning convolutional neural network model can estimate skeletal maturity with accuracy similar to that of an expert radiologist and to that of existing automated models. © RSNA, 2017.
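For clarity, the agreement metrics used in this study (RMS and MAD between model and reference bone ages, plus Bland-Altman style 95% limits of agreement) can be computed as below; the five bone-age values are made-up numbers for illustration only.

```python
# Worked example (illustrative numbers) of RMS, MAD, and 95% limits of agreement.
import numpy as np

model_age = np.array([10.2, 7.9, 13.1, 5.5, 15.0])      # model estimates, years
reference = np.array([10.0, 8.4, 12.5, 5.9, 14.6])      # mean of reviewers, years

diff = model_age - reference
rms = np.sqrt(np.mean(diff ** 2))                        # root mean square difference
mad = np.mean(np.abs(diff))                              # mean absolute difference
loa = (diff.mean() - 1.96 * diff.std(ddof=1),            # 95% limits of agreement
       diff.mean() + 1.96 * diff.std(ddof=1))
print(f"RMS={rms:.2f} y, MAD={mad:.2f} y, LoA={loa[0]:.2f} to {loa[1]:.2f} y")
```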
Article
Purpose: To investigate diagnostic performance by using a deep learning method with a convolutional neural network (CNN) for the differentiation of liver masses at dynamic contrast agent-enhanced computed tomography (CT). Materials and Methods: This clinical retrospective study used CT image sets of liver masses over three phases (noncontrast-agent enhanced, arterial, and delayed). Masses were diagnosed according to five categories (category A, classic hepatocellular carcinomas [HCCs]; category B, malignant liver tumors other than classic and early HCCs; category C, indeterminate masses or mass-like lesions [including early HCCs and dysplastic nodules] and rare benign liver masses other than hemangiomas and cysts; category D, hemangiomas; and category E, cysts). Supervised training was performed by using 55,536 image sets obtained in 2013 (from 460 patients, 1068 sets were obtained and they were augmented by a factor of 52 [rotated, parallel-shifted, strongly enlarged, and noise-added images were generated from the original images]). The CNN was composed of six convolutional, three maximum pooling, and three fully connected layers. The CNN was tested with 100 liver mass image sets obtained in 2016 (74 men and 26 women; mean age, 66.4 years ± 10.6 [standard deviation]; mean mass size, 26.9 mm ± 25.9; 21, nine, 35, 20, and 15 liver masses for categories A, B, C, D, and E, respectively). Training and testing were performed five times. Accuracy for categorizing liver masses with the CNN model and the area under the receiver operating characteristic curve for differentiating categories A-B versus categories C-E were calculated. Results: Median accuracy of differential diagnosis of liver masses for test data was 0.84. Median area under the receiver operating characteristic curve for differentiating categories A-B from C-E was 0.92. Conclusion: Deep learning with CNN showed high diagnostic performance in differentiation of liver masses at dynamic CT.
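The 52-fold augmentation described above (rotated, parallel-shifted, strongly enlarged, and noise-added copies) could be approximated with torchvision transforms as in the sketch below; the specific rotation, shift, zoom, and noise magnitudes are assumptions, since the abstract does not report them.

```python
# Hedged sketch of augmentation by a fixed factor; parameter values are illustrative.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                       # rotated copies
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1),     # parallel-shifted copies
                            scale=(1.2, 1.8)),                   # strongly enlarged copies
    transforms.ToTensor(),
    transforms.Lambda(lambda t: t + 0.02 * torch.randn_like(t)), # noise-added copies
])

def augment_by_factor(pil_image, factor=52):
    """Return `factor` randomly perturbed variants of one image."""
    return [augment(pil_image) for _ in range(factor)]
```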
Book
We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images. On one of the most challenging computer vision benchmarks, the ImageNet classification challenge, our system has achieved the best result to date, with a top-5 error rate of 5.33%, a relative 20.0% improvement over the previous best result.
Conference Paper
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
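The two ideas in this paper, the Parametric ReLU and the He ("kaiming") initialization derived for rectifier networks, are both exposed directly in PyTorch; the small block below is a generic illustration rather than the architecture evaluated in the paper.

```python
# Generic sketch: PReLU activations plus He (kaiming) initialization of the convolutions.
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.PReLU(num_parameters=64),            # one learnable negative slope per channel
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.PReLU(num_parameters=64),
)

for m in block.modules():
    if isinstance(m, nn.Conv2d):
        # He initialization: variance scaled for rectifier nonlinearities
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)
```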
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
Purpose To evaluate the efficacy of deep convolutional neural networks (DCNNs) for detecting tuberculosis (TB) on chest radiographs. Materials and Methods Four deidentified HIPAA-compliant datasets were used in this study that were exempted from review by the institutional review board, which consisted of 1007 posteroanterior chest radiographs. The datasets were split into training (68.0%), validation (17.1%), and test (14.9%). Two different DCNNs, AlexNet and GoogLeNet, were used to classify the images as having manifestations of pulmonary TB or as healthy. Both untrained networks and networks pretrained on ImageNet were used, as was augmentation with multiple preprocessing techniques. Ensembles were performed on the best-performing algorithms. For cases where the classifiers were in disagreement, an independent board-certified cardiothoracic radiologist blindly interpreted the images to evaluate a potential radiologist-augmented workflow. Receiver operating characteristic curves and areas under the curve (AUCs) were used to assess model performance by using the DeLong method for statistical comparison of receiver operating characteristic curves. Results The best-performing classifier had an AUC of 0.99, which was an ensemble of the AlexNet and GoogLeNet DCNNs. The AUCs of the pretrained models were greater than those of the untrained models (P < .001). Augmenting the dataset further increased accuracy (P values for AlexNet and GoogLeNet were .03 and .02, respectively). The DCNNs had disagreement in 13 of the 150 test cases, which were blindly reviewed by a cardiothoracic radiologist, who correctly interpreted all 13 cases (100%). This radiologist-augmented approach resulted in a sensitivity of 97.3% and specificity of 100%. Conclusion Deep learning with DCNNs can accurately classify TB at chest radiography with an AUC of 0.99. A radiologist-augmented approach for cases where there was disagreement among the classifiers further improved accuracy. © RSNA, 2017.
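A minimal sketch of the evaluation pattern described above: average the TB probabilities of the two classifiers, compute AUCs, and flag the cases where the networks disagree for radiologist review. The probabilities and labels are invented placeholders, and the real study compared ROC curves with the DeLong method rather than this toy example.

```python
# Illustrative ensemble-and-review sketch (synthetic numbers).
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])                 # 1 = pulmonary TB
p_alexnet = np.array([0.10, 0.40, 0.80, 0.95, 0.60, 0.20])
p_googlenet = np.array([0.05, 0.30, 0.90, 0.85, 0.70, 0.35])

p_ensemble = (p_alexnet + p_googlenet) / 2            # simple probability averaging
print("AlexNet AUC:  ", roc_auc_score(y_true, p_alexnet))
print("GoogLeNet AUC:", roc_auc_score(y_true, p_googlenet))
print("Ensemble AUC: ", roc_auc_score(y_true, p_ensemble))

# Cases where the two classifiers disagree at a 0.5 threshold would be flagged
# for the radiologist in the augmented workflow described above.
disagree = (p_alexnet >= 0.5) != (p_googlenet >= 0.5)
print("Cases for radiologist review:", np.where(disagree)[0])
```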
Conference Paper
In this work, we examine the strength of deep learning approaches for pathology detection in chest radiograph data. Convolutional neural networks (CNN) deep architecture classification approaches have gained popularity due to their ability to learn mid and high level image representations. We explore the ability of a CNN to identify different types of pathologies in chest x-ray images. Moreover, since very large training sets are generally not available in the medical domain, we explore the feasibility of using a deep learning approach based on non-medical learning. We tested our algorithm on a dataset of 93 images. We use a CNN that was trained with ImageNet, a well-known large scale nonmedical image database. The best performance was achieved using a combination of features extracted from the CNN and a set of low-level features. We obtained an area under curve (AUC) of 0.93 for Right Pleural Effusion detection, 0.89 for Enlarged heart detection and 0.79 for classification between healthy and abnormal chest x-ray, where all pathologies are combined into one large class. This is a first-of-its-kind experiment that shows that deep learning with large scale non-medical image databases may be sufficient for general medical image recognition tasks.
Article
For interval estimation of a proportion, coverage probabilities tend to be too large for "exact" confidence intervals based on inverting the binomial test and too small for the interval based on inverting the Wald large-sample normal test (i.e., sample proportion ± z-score × estimated standard error). Wilson's suggestion of inverting the related score test with null rather than estimated standard error yields coverage probabilities close to nominal confidence levels, even for very small sample sizes. The 95% score interval has similar behavior as the adjusted Wald interval obtained after adding two "successes" and two "failures" to the sample. In elementary courses, with the score and adjusted Wald methods it is unnecessary to provide students with awkward sample size guidelines.
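A worked example of the intervals being compared above, assuming a 95% confidence level; the Wilson score interval keeps its coverage close to nominal even for small samples, while the Wald interval can misbehave.

```python
# Wald vs Wilson score intervals for a binomial proportion (z = 1.96 for 95% confidence).
import math

def wald_interval(successes, n, z=1.96):
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def wilson_interval(successes, n, z=1.96):
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# e.g. 1637 of 1995 images classified correctly (ResNet-152 in the study above)
print(wald_interval(1637, 1995))
print(wilson_interval(1637, 1995))
```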
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
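As a concrete illustration of the normalization step described above, the snippet below inserts batch normalization between a convolution and its nonlinearity; it is a generic example, not the Inception-style model used in the paper.

```python
# Generic example of batch normalization between a convolution and its activation.
import torch
import torch.nn as nn

layer = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),   # normalize per channel over the mini-batch, then scale and shift
    nn.ReLU(),
)

x = torch.randn(8, 3, 32, 32)        # a mini-batch of 8 images
out = layer(x)                        # training mode: uses batch statistics
layer.eval()
out = layer(x)                        # eval mode: uses running mean/variance
```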
Article
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
Article
This review provides a conceptual framework for sample size calculations in studies of diagnostic test accuracy under various conditions and test outcomes. The formulae for sample size calculation, for estimating an adequate sensitivity/specificity, likelihood ratio, and AUC as an overall index of accuracy, and also for testing a single modality and comparing two diagnostic tasks, are presented for a desired confidence interval. The required sample sizes were calculated and tabulated for different levels of accuracy and marginal error at a 95% confidence level for estimation purposes, and for various effect sizes at 80% power for testing purposes. The results show how the sample size varies with the accuracy index and the effect size of interest. This should help clinicians designing diagnostic test studies to choose an adequate sample size based on statistical principles, in order to guarantee the reliability of the study.
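One commonly used form of the sample-size calculation summarized above, for estimating a sensitivity to within a marginal error d at 95% confidence, is sketched below; the expected sensitivity, marginal error, and prevalence values are illustrative assumptions.

```python
# Worked example: subjects needed to estimate a sensitivity within marginal error d.
import math

def n_for_sensitivity(expected_sens, d, prevalence, z=1.96):
    # diseased subjects needed: z^2 * p(1-p) / d^2, then divided by prevalence
    n_diseased = (z**2) * expected_sens * (1 - expected_sens) / d**2
    return math.ceil(n_diseased / prevalence)   # total subjects to recruit

# e.g. expected sensitivity 0.90, marginal error 0.05, disease prevalence 0.20
print(n_for_sensitivity(0.90, 0.05, 0.20))      # -> 692 subjects
```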
Article
Although acute myocardial infarction (AMI) is still one of the main causes of high morbidity in Western countries, the rate of mortality has decreased significantly. The main cause of this drop appears to be the decline of the incidence of ST-segment elevation myocardial infarction (STEMI) along with an absolute reduction in case fatality rate once STEMI has occurred. Myocardial ischaemia progresses with the duration of coronary occlusion and the delay in time to reperfusion determines the extent of irreversible necrosis from subendocardial layers towards the epicardium in accordance with the so-called 'wave-front phenomenon'. Coronary artery recanalization, either by thrombolytic therapy or primary percutaneous intervention, may prevent myocardial cell necrosis, increasing salvage of damaged, but still viable, myocardium within the area at risk. Magnetic resonance imaging (MRI) can provide a wide range of clinically useful information in AMI by detecting not only location of transmural necrosis, infarct size and myocardial oedema, but also showing in vivo important microvascular pathophysiological processes associated with AMI in the reperfusion era, such as intramyocardial haemorrhage and no-reflow. The focus of this review will be on the impact of cardiac MRI in the characterization of AMI pathophysiology in vivo in the current reperfusion era, concentrating also on clinical applications and future perspectives for specific therapeutic strategies.
Article
This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.
Article
Statistical measures are described that are used in diagnostic imaging for expressing observer agreement in regard to categorical data. The measures are used to characterize the reliability of imaging methods and the reproducibility of disease classifications and, occasionally with great care, as the surrogate for accuracy. The review concentrates on the chance-corrected indices, kappa and weighted kappa. Examples from the imaging literature illustrate the method of calculation and the effects of both disease prevalence and the number of rating categories. Other measures of agreement that are used less frequently, including multiple-rater kappa, are referenced and described briefly.
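For readers who want to reproduce the chance-corrected indices discussed above, scikit-learn computes both Cohen's kappa and its weighted form directly; the two observers' ratings below are invented for illustration.

```python
# Cohen's kappa and quadratically weighted kappa on made-up ordinal ratings.
from sklearn.metrics import cohen_kappa_score

observer_a = [3, 2, 1, 3, 0, 2, 1, 1, 3, 0]   # ordinal disease categories
observer_b = [3, 1, 1, 2, 0, 2, 1, 2, 3, 0]

print("kappa:         ", cohen_kappa_score(observer_a, observer_b))
print("weighted kappa:", cohen_kappa_score(observer_a, observer_b, weights="quadratic"))
```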
Article
Sensitivity and specificity are the basic measures of accuracy of a diagnostic test; however, they depend on the cut point used to define "positive" and "negative" test results. As the cut point shifts, sensitivity and specificity shift. The receiver operating characteristic (ROC) curve is a plot of the sensitivity of a test versus its false-positive rate for all possible cut points. The advantages of the ROC curve as a means of defining the accuracy of a test, construction of the ROC, and identification of the optimal cut point on the ROC curve are discussed. Several summary measures of the accuracy of a test, including the commonly used percentage of correct diagnoses and area under the ROC curve, are described and compared. Two examples of ROC curve application in radiologic research are presented.
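The ROC concepts summarized above can be demonstrated in a few lines: sweep the cut point, obtain sensitivity and false-positive rate, compute the AUC, and select the cut point that maximizes the Youden index. The scores and labels below are synthetic.

```python
# ROC curve, AUC, and Youden-index optimal cut point on synthetic data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
scores = np.array([0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9, 0.65, 0.2])

fpr, tpr, thresholds = roc_curve(y_true, scores)   # one point per candidate cut point
auc = roc_auc_score(y_true, scores)
best = np.argmax(tpr - fpr)                        # Youden index J = sensitivity + specificity - 1
print(f"AUC = {auc:.2f}, optimal cut point = {thresholds[best]:.2f} "
      f"(sensitivity {tpr[best]:.2f}, specificity {1 - fpr[best]:.2f})")
```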