Figure 3. An example of a tree of 4 diseases: A, B, C, and D.

Source publication
Article
Chest radiography is one of the most common types of diagnostic radiology exams and is critical for the screening and diagnosis of many different thoracic diseases. Specialized algorithms have been developed to detect several specific pathologies such as lung nodules or lung cancer. However, accurately detecting the presence of multiple diseases fro...

Similar publications

Article
Airborne diseases are severe diseases that spread exponentially worldwide. An Immunochromatography Test (IT) and a Biological Aerosol Particles (BAP) test with a disposable instrument, as recorded in the literature, are standard diagnostic procedures for respiratory infections such as influenza and TB, in the investigation of causes of su...

Citations

... Explainable AI (XAI) models are being used more, especially in safety-critical applications such as automatic medical diagnosis [1,2,3]. An explanation of a decision should be understandable to humans [4] and include the objects or features responsible for the decision made by the model, i.e., be faithful to the model decision [5,6,7]. ...
Preprint
In recent years, model explanation methods have been designed to interpret model decisions faithfully and intuitively so that users can easily understand them. In this paper, we propose a framework, Faithful Attention Explainer (FAE), capable of generating faithful textual explanations regarding the attended-to features. Towards this goal, we deploy an attention module that takes the visual feature maps from the classifier for sentence generation. Furthermore, our method successfully learns the association between features and words, which allows a novel attention enforcement module for attention explanation. Our model achieves promising performance in caption quality metrics and a faithful decision-relevance metric on two datasets (CUB and ACT-X). In addition, we show that FAE can interpret gaze-based human attention, as human gaze indicates the discriminative features that humans use for decision-making, demonstrating the potential of deploying human gaze for advanced human-AI interaction.
... Chest X-rays are used by radiologists to detect a wide range of other conditions in addition to diseases. These ailments include fractures, infiltrations, atelectasis, pericarditis, bronchitis, and several more (Pham et al. 2021). ...
... This can be useful when there is a lack of annotated data available for the task at hand, as it allows the model to leverage the knowledge it has already learned in the source domain. Transfer learning has been shown to be effective in a range of medical tasks, including lung disease classification, and can help to reduce the time and effort required to train a model from scratch [47,58-62]. For instance, a study by Wang et al. [4] used a ResNet model pre-trained on the ImageNet dataset and fine-tuned it on a chest X-ray dataset for the classification of lung diseases. ...
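As a concrete illustration of this transfer-learning recipe, the sketch below (in PyTorch, assuming torchvision is available) replaces the ImageNet classification head of a pretrained ResNet-50 with a multi-label output layer; the 14-label output size and the optimizer settings are placeholder choices, not the configuration of any specific study cited above.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone and swap in a new head for
# multi-label chest X-ray classification (14 labels is a placeholder,
# e.g. the ChestX-ray14 label set).
NUM_LABELS = 14
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_LABELS)

# Multi-label targets are independent binary indicators, so BCE-with-logits
# replaces the usual softmax cross-entropy.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def finetune_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    """One fine-tuning step; images: (B, 3, H, W), targets: (B, NUM_LABELS) in {0, 1}."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), targets.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```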
... The use of an ensemble, which refers to the combination of multiple models during classification, is another technique that can be employed to improve the performance of a deep learning model. By combining the predictions of multiple models, it is possible to achieve better performance and reduce overfitting [47,59,60,62]. Ensemble techniques have been applied to a variety of deep learning architectures, including AlexNet [60,63,64], VGGNet [63-70], GoogleNet [37,60,63,64,71], MobileNet [71-74], ResNet [4,11,14,41,50,64,74-78], and DenseNet [5,6,12,15,38,78-86], with promising results in the task of lung disease classification. ...
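The ensembling discussed here usually happens at the prediction level. A minimal sketch, assuming every member model emits logits for the same label set:

```python
import torch

def ensemble_predict(members, images):
    """Average per-label probabilities over an iterable of trained models.

    Each member maps a batch of images to logits of shape (B, num_labels);
    averaging is done in probability space after a sigmoid.
    """
    probs = []
    with torch.no_grad():
        for m in members:
            m.eval()
            probs.append(torch.sigmoid(m(images)))
    return torch.stack(probs).mean(dim=0)  # (B, num_labels)
```

Probability-space averaging is one common choice; logit averaging or (weighted) voting are straightforward variants.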
... A study by Islam et al. [59] used an ensemble of five different CNN models for the classification of lung cancer and achieved an accuracy of 95.6%. Similarly, a study by Pham et al. [62] used an ensemble of three CNN models for the classification of COVID-19 and achieved an accuracy of 97.9%. ...
Article
The purpose of this survey is to provide a comprehensive review of the most recent publications on lung disease classification from chest X-ray images using deep learning algorithms. Methods: This research aims to present several common chest radiography datasets and to introduce briefly the general image preprocessing procedures that are applied to chest X-ray images. Then, the classification of specific and multiple lung diseases is described, focusing on the methods and datasets used in the selected studies, the evaluation measures, and the results. In addition, the problems and future directions of lung disease classification are discussed to provide an important research base for researchers in the future. As the most common examination tool, chest X-ray (CXR) is crucial in the medical field for disease diagnosis. Thus, the classification of chest diseases based on chest X-rays has gained significant attention from researchers. In recent years, deep learning methods have been used and have emerged as powerful techniques in medical imaging. One hundred ten articles published from 2016 to 2023 were reviewed and summarized, confirming that this research area is very important and has great potential for future research.
... In Table 2 we show results for = 5 and observe that GANDALF gives the best performance as per the lowest Ranking loss and highest LRAP values. There are clear improvements over the baseline DenseNet-121, and over the results of GESTALT (Mahapatra et al., 2022a; Pham et al., 2020), which is the second-best ranked method for the CheXpert dataset. This demonstrates that our approach of using graph transformers identifies more informative multi-label samples because of the self-attention module of transformers. ...
... Numerous techniques have been proposed to enhance the resilience of medical image classifiers in the presence of noisy labels. These methods encompass a variety of strategies, including the utilization of label smoothing (Pham et al., 2021) for conditions like thoracic diseases, pancreatic and skin cancers, breast tumors, and retinal diseases, as well as architectural modifications like the incorporation of a noise layer (Dgani et al., 2018). Additionally, some approaches involve sample re-weighting techniques (Le et al., 2019;Xue et al., 2019), uncertainty-based methodologies (Ju et al., 2022), and various mathematical techniques such as PCA, low-rank representation, graph regularization, among others (Ying et al., 2023). ...
Preprint
Noisy labels can significantly impact medical image classification, particularly in deep learning, by corrupting learned features. Self-supervised pretraining, which doesn't rely on labeled data, can enhance robustness against noisy labels. However, this robustness varies based on factors like the number of classes, dataset complexity, and training size. In medical images, subtle inter-class differences and modality-specific characteristics add complexity. Previous research hasn't comprehensively explored the interplay between self-supervised learning and robustness against noisy labels in medical image classification, considering all these factors. In this study, we address three key questions: i) How does label noise impact various medical image classification datasets? ii) Which types of medical image datasets are more challenging to learn and more affected by label noise? iii) How do different self-supervised pretraining methods enhance robustness across various medical image datasets? Our results show that DermNet, among five datasets (Fetal plane, DermNet, COVID-DU-Ex, MURA, NCT-CRC-HE-100K), is the most challenging but exhibits greater robustness against noisy labels. Additionally, contrastive learning stands out among the eight self-supervised methods as the most effective approach to enhance robustness against noisy labels.
... A chest X-ray provides images of the entire chest, including the lungs, heart, blood vessels, airways, and the bones of the chest and spine [1,2]. Today, X-rays are among the most common and widely used radiographic methods for identifying chest disorders [3]. This minimally invasive method can help physicians observe the interior of the body. ...
Article
The existing multilabel X-ray image learning tasks generally contain much information on pathology co-occurrence and interdependency, which is very important for clinical diagnosis. However, the challenging part of this subject is to accurately diagnose multiple diseases that occur in a single X-ray image, since multiple levels of features are generated in the images and create different features than in single-label detection. Various works have addressed this challenge by proposing deep learning architectures that improve classification performance and enrich diagnosis results with multi-probability disease detection. The objective is to produce accurate results and a faster inference system to support quick diagnosis in the medical system. To contribute to this state of the art, we designed a fusion architecture of CheXNet and a Feature Pyramid Network (FPN) to classify and discriminate multiple thoracic diseases from chest X-rays. This design enables the model to extract a pyramid of feature maps with different spatial resolutions that capture low-level and high-level semantic information, accommodating multiple feature scales. The model's effectiveness is evaluated on the NIH ChestX-ray14 dataset, with the Area Under the Curve (AUC) and accuracy metrics used to compare the results against other cutting-edge approaches. The overall results demonstrate that our method outperforms other approaches and is promising for multilabel disease classification in chest X-rays, with potential applications in clinical practice. We achieved an average AUC of 0.846 and an accuracy of 0.914. Further, our proposed architecture diagnoses an image in 0.013 s, faster than the latest approaches.
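The fusion idea described in this abstract can be approximated with off-the-shelf components: a DenseNet-121 backbone (as in CheXNet) whose dense-block outputs feed torchvision's generic FeaturePyramidNetwork, followed by pooling and a multi-label head. The tap points, channel sizes, and pooling scheme below are illustrative assumptions, not the authors' exact architecture.

```python
from collections import OrderedDict

import torch
import torch.nn as nn
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.ops import FeaturePyramidNetwork

class DenseNetFPN(nn.Module):
    """Illustrative DenseNet-121 + FPN fusion for multi-label CXR classification."""

    def __init__(self, num_labels: int = 14, fpn_channels: int = 256):
        super().__init__()
        backbone = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        # Tap the four dense-block outputs (256, 512, 1024, 1024 channels).
        self.body = create_feature_extractor(
            backbone,
            return_nodes={
                "features.denseblock1": "p2",
                "features.denseblock2": "p3",
                "features.denseblock3": "p4",
                "features.denseblock4": "p5",
            },
        )
        self.fpn = FeaturePyramidNetwork([256, 512, 1024, 1024], fpn_channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(4 * fpn_channels, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.body(x)                      # multi-scale feature maps
        feats = self.fpn(OrderedDict(feats))      # fuse scales via lateral connections
        pooled = [self.pool(f).flatten(1) for f in feats.values()]
        return self.classifier(torch.cat(pooled, dim=1))  # multi-label logits
```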
... To this end, we have trained 28 different models for a medical image interpretation task (chest X-ray (CXR) classification) and study their robustness under common image perturbations. Specifically, the selected 28 models cover most popular ImageNet models such as ResNet models [9] and DenseNet models [10] that are widely used in CXR classification [1], [21], [23], [30], and recently proposed EfficientNetV2 models [11] (see Table I for all 28 models). The selected models are pretrained on ImageNet, so they have different ImageNet Top-1 accuracies. ...
... Each single model is fine-tuned on the CheXpert training set for 10 epochs with all parameters unfrozen. The uncertain labels in the training set are handled by label smoothing regularization [30]; that is, the uncertain labels are substituted by random scalars generated from a uniform distribution U(0.55, 0.85). ...
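A small sketch of this uncertain-label handling, assuming the CheXpert convention of encoding uncertain findings as -1 in the label tensor; the U(0.55, 0.85) bounds follow the quoted setup and may differ elsewhere:

```python
import torch

def smooth_uncertain_labels(labels: torch.Tensor, low: float = 0.55, high: float = 0.85) -> torch.Tensor:
    """Replace uncertain labels (-1) by random scalars drawn from U(low, high).

    `labels` is a float tensor containing {1.0, 0.0, -1.0}; positives and
    negatives are left untouched, mimicking label smoothing regularization
    for the uncertainty class.
    """
    random_targets = torch.empty_like(labels).uniform_(low, high)
    return torch.where(labels == -1, random_targets, labels)
```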
Conference Paper
The robustness of medical image interpretation deep learning models to common image perturbations is crucial, as medical images in clinical applications may come from different institutions and contain various perturbations that did not appear in the training data, decreasing interpretation performance. In this paper, we investigate the correlations of the robustness of 28 ImageNet models under 6 image perturbation types over 10 severity levels on the CheXpert chest X-ray (CXR) classification dataset. The results demonstrate that: (1) If a model has a higher ImageNet accuracy, after fine-tuning it on CheXpert for CXR classification, it tends to be more robust on perturbed CXRs. (2) If a model has a higher CXR classification performance after fine-tuning on CheXpert, it is not necessarily more robust on perturbed CXRs, depending on the severity levels of the perturbations. Under stronger perturbations, lower CXR performance models tend to be more robust instead. (3) The model architecture may be a key factor in robustness. For instance, no matter how large the models are, EfficientNet and EfficientNetV2 models tend to be more robust, while ResNet models tend to be more vulnerable. Our work can help select or design robust models for medical image interpretation to improve their capability for clinical applications.
... Baseline is trained on the training set for 10 epochs. We follow the previous state-of-the-art [31,44] to treat unknown labels as negative (Negative mode) and treat uncertain labels as positive with label smoothing [31]. Images are rescaled to [0, 1]. ...
... Baseline achieves mAUC 89.6% on the validation set (as reported in Table 1), which is already very high for a single CNN. E.g., the single CNN in 2nd place on the competition leaderboard achieves mAUC 89.4% [31]. ...
Chapter
Image multi-label classification datasets are often partially labeled (for each sample, only the labels of some categories are known). One popular solution for training convolutional neural networks is to treat all unknown labels as negative labels, named Negative mode. But it produces wrong labels unevenly over categories, decreasing the binary classification performance on different categories to varying degrees. On the other hand, although Ignore mode, which ignores the contributions of unknown labels, may be less effective than Negative mode, it ensures the data have no additional wrong labels, which is what Negative mode lacks. In this paper, we propose Category-wise Fine-Tuning (CFT), a new post-training method that can be applied to a model trained with Negative mode to improve its performance on each category independently. Specifically, CFT uses Ignore mode to fine-tune the logistic regressions (LRs) in the classification layer one by one. The use of Ignore mode reduces the performance decreases caused by the wrong labels of Negative mode during training. In particular, a Genetic Algorithm (GA) and binary cross-entropy are used in CFT for fine-tuning the LRs. The effectiveness of our methods was evaluated on the CheXpert competition dataset and achieves state-of-the-art results, to our knowledge. A single model submitted to the competition server for the official evaluation achieves mAUC 91.82% on the test set, which is the highest single-model score in the leaderboard and literature. Moreover, our ensemble achieves mAUC 93.33% on the test set (the competition was recently closed; we evaluated the ensemble on a local machine after the test set was released and could be downloaded), superior to the best in the leaderboard and literature (93.05%). Besides, the effectiveness of our methods is also evaluated on partially labeled versions of the MS-COCO dataset.
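The key ingredient of Ignore mode is a loss that simply skips unknown labels. A minimal sketch in PyTorch, assuming unknown entries are encoded as NaN (an encoding chosen here for illustration); CFT would apply such a loss per category while fine-tuning only the corresponding logistic regression in the final layer:

```python
import torch
import torch.nn.functional as F

def ignore_mode_bce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy computed over known labels only.

    logits, targets: (B, num_categories); unknown targets are NaN and are
    excluded, so no wrong negative labels are introduced (unlike Negative mode,
    which would replace NaN with 0).
    """
    known = ~torch.isnan(targets)
    if known.sum() == 0:
        return logits.new_zeros(())
    return F.binary_cross_entropy_with_logits(logits[known], targets[known])
```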
... A recent review evaluates relevant methods in medical imaging [10]. This includes a method used in chest X-rays based on label smoothing to prevent overconfident predictions on training samples that contain mislabeled data [11]. Prior work on noisy labels captures the label noise of individual annotators [12]. ...
... After training, the label distribution of the panel of experts can be modeled using multiple forward passes through the classifier p_θ(y|x, z), conditioned on random samples z, following (5). When trained correctly, different locations in the latent space encode different label variants (see section III-A), such that multiple forward passes through the network result in a distribution of predictions (11). To do so, z can be sampled directly from the prior distribution, eliminating the need for a support set S_d during inference, following (4). ...
... We refer back to Fig. 1 for an overview of the steps involved during inference. Note that we reuse the output of the feature extractor h(x) when drawing samples from p_θ(y|x, z), following (11). As a result, only the forward pass through the upsampling layer u_θ and the feed-forward network f_θ needs to be recomputed several times. ...
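Put together, the inference procedure sketched in these excerpts amounts to Monte Carlo sampling over the latent space while caching the expensive feature map. The following is a rough sketch; feature_extractor, upsample, head, and prior are placeholder names for h, u_θ, f_θ, and the latent prior, and their exact signatures are assumptions of this illustration.

```python
import torch

@torch.no_grad()
def sample_label_distribution(x, feature_extractor, upsample, head, prior, num_samples=32):
    """Draw a distribution of predictions by sampling latent codes from the prior.

    The feature map h(x) is computed once and reused; only the cheaper
    upsampling layer and feed-forward head are re-run per latent sample.
    Returns a tensor of shape (num_samples, batch, num_classes).
    """
    features = feature_extractor(x)              # h(x), computed once
    samples = []
    for _ in range(num_samples):
        z = prior.sample()                       # z ~ p(z)
        logits = head(upsample(features, z))     # recompute only u_theta and f_theta
        samples.append(torch.softmax(logits, dim=-1))
    return torch.stack(samples)
```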
Article
Many inherently ambiguous tasks in medical imaging suffer from inter-observer variability, resulting in a reference standard defined by a distribution of labels with high variance. Training only on a consensus or majority vote label, as is common in medical imaging, discards valuable information on uncertainty amongst a panel of experts. In this work, we propose to train on the full label distribution to predict the uncertainty within a panel of experts and the most likely ground-truth label. To do so, we propose a new stochastic classification framework based on the conditional variational auto-encoder, which we refer to as the Latent Doctor Model (LDM). In an extensive comparative analysis, we compare the LDM with a model trained on the majority vote label and other methods capable of learning a distribution of labels. We show that the LDM is able to reproduce the reference-standard distribution significantly better than the majority vote baseline. Compared to the other baseline methods, we demonstrate that the LDM performs best at modeling the label distribution and its corresponding uncertainty in two prostate tumor grading tasks. Furthermore, we show competitive performance of the LDM with the more computationally demanding deep ensembles on a tumor budding classification task.
... First Phase: Initialization under Conditional Training. In the first phase of entity training, the HR system is trained on data under the condition that its parent class is positive. This follows what has been done in other work on hierarchical classification (Redmon and Farhadi, 2017; Roy et al., 2020; Yan et al., 2015; Chen et al., 2019; Pham et al., 2021). The intention behind this training regime is that it directly models the conditional probabilities of the entities by learning the dependent relationships between parent and child entities and concentrating on distinguishing lower-level labels, in particular the leaf entities. ...
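To connect this with the disease tree in Figure 3, the toy example below shows how per-node conditional probabilities (each head trained only on samples whose parent is positive) combine multiplicatively into unconditional probabilities; the specific parent-child layout of A, B, C, and D is an assumption for illustration.

```python
# Assumed toy hierarchy: B and C are children of A, D is a child of C.
PARENTS = {"A": None, "B": "A", "C": "A", "D": "C"}

# Model outputs interpreted as P(node is positive | its parent is positive).
conditional = {"A": 0.9, "B": 0.3, "C": 0.7, "D": 0.5}

def unconditional_probability(node: str) -> float:
    """Multiply conditional probabilities along the path from the node to the root."""
    p = conditional[node]
    parent = PARENTS[node]
    while parent is not None:
        p *= conditional[parent]
        parent = PARENTS[parent]
    return p

print(round(unconditional_probability("D"), 3))  # 0.5 * 0.7 * 0.9 = 0.315
```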
Preprint
We present RadGraph2, a novel dataset for extracting information from radiology reports that focuses on capturing changes in disease state and device placement over time. We introduce a hierarchical schema that organizes entities based on their relationships and show that using this hierarchy during training improves the performance of an information extraction model. Specifically, we propose a modification to the DyGIE++ framework, resulting in our model HGIE, which outperforms previous models in entity and relation extraction tasks. We demonstrate that RadGraph2 enables models to capture a wider variety of findings and perform better at relation extraction compared to those trained on the original RadGraph dataset. Our work provides the foundation for developing automated systems that can track disease progression over time and develop information extraction models that leverage the natural hierarchy of labels in the medical domain.