Figure 1. An example of an OpenI [2] chest x-ray image, report, and annotations.

Contexts in source publication

Context 1
... publicly available radiology dataset is exploited which contains chest x-ray images and reports published on the Web as a part of the OpenI [2] open source literature and biomedical image collections. An example of a chest x-ray image, report, and annotations available on OpenI is shown in Figure 1. ...
Context 2
... However, a few findings have been rendered uninterpretable. More details about the dataset and the anonymization procedure can be found in [11], and an example case of the dataset is shown in Figure 1. ...
Context 3
... report is structured as comparison, indication, findings, and impression sections, in line with a common radiology reporting format for diagnostic chest x-rays. In the example shown in Figure 1, we observe an error resulting from the aggressive automated de-identification scheme. A word possibly indicating a disease was falsely detected as personal information, and was thereby "anonymized" as "XXXX". ...
Context 4
... radiology reports contain comprehensive information about the image and the patient, they may also contain information that cannot be inferred from the image content. For instance, in the example shown in Figure 1, it is probably impossible to determine that the image is of a Burmese male. ...
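
The contexts above describe the sectioned structure of the reports (comparison, indication, findings, impression) and the "XXXX" placeholders introduced by the automated de-identification. As a rough illustration only, the sketch below splits a report of that shape into sections and counts anonymized tokens; the example report text, function names, and header regex are assumptions, not part of the OpenI tooling.

```python
# A minimal sketch (assumed report text and conventions) of splitting an
# OpenI-style report into its sections and flagging "XXXX" placeholders.
import re

SECTIONS = ("comparison", "indication", "findings", "impression")

def parse_report(text: str) -> dict:
    """Split a report into named sections keyed by lower-cased header."""
    # Assumes headers appear at the start of a line, e.g. "FINDINGS:".
    pattern = re.compile(r"^(%s)\s*:" % "|".join(SECTIONS), re.IGNORECASE | re.MULTILINE)
    matches = list(pattern.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).lower()] = text[m.end():end].strip()
    return sections

def count_anonymized_tokens(section_text: str) -> int:
    """Count the 'XXXX' placeholders left by automated de-identification."""
    return len(re.findall(r"\bXXXX\b", section_text))

example = (
    "COMPARISON: None.\n"
    "INDICATION: XXXX-year-old male, chest pain.\n"
    "FINDINGS: The lungs are clear. No XXXX identified.\n"
    "IMPRESSION: No acute disease.\n"
)
for name, body in parse_report(example).items():
    print(f"{name}: {count_anonymized_tokens(body)} anonymized token(s)")
```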

Citations

... One of the first applications used for detection was in 1995 to detect nodules in X-rays of the lungs [37]. Another object detection algorithm was developed to detect and classify several entities in chest X-rays, such as cardiomegaly, calcified granulomas, catheters, surgical instruments, or thoracic vertebrae [38]. The emergence of convolutional neural networks / deep learning more than a decade ago opened up completely new possibilities [39]. ...
Article
Full-text available
Objectives Tooth extraction is one of the most frequently performed medical procedures. The indication is based on the combination of clinical and radiological examination and individual patient parameters and should be made with great care. However, determining whether a tooth should be extracted is not always a straightforward decision. Moreover, visual and cognitive pitfalls in the analysis of radiographs may lead to incorrect decisions. Artificial intelligence (AI) could be used as a decision support tool to provide a score of tooth extractability. Material and methods Using 26,956 single-tooth images from 1,184 panoramic radiographs (PANs), we trained a ResNet50 network to classify teeth as either extraction-worthy or preservable. For this purpose, teeth were cropped with different margins from PANs and annotated. The usefulness of the AI-based classification, as well as that of dentists, was evaluated on a test dataset. In addition, the explainability of the best AI model was visualized via class activation mapping using CAMERAS. Results The ROC-AUC for the best AI model to discriminate teeth worthy of preservation was 0.901 with a 2% margin on dental images. In contrast, the average ROC-AUC for dentists was only 0.797. With a tooth extraction prevalence of 19.1%, the AI model's PR-AUC was 0.749, while the dentist evaluation only reached 0.589. Conclusion AI models outperform dentists/specialists in predicting tooth extraction based solely on X-ray images, while the AI performance improves with increasing contextual information. Clinical relevance AI could help monitor at-risk teeth and reduce errors in indications for extractions.
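
The abstract above fine-tunes a ResNet50 to classify cropped tooth images as extraction-worthy or preservable. Below is a minimal, hedged sketch of that kind of binary fine-tuning in PyTorch; it is not the authors' code, and the batch size, learning rate, and dummy data are placeholders.

```python
# A minimal sketch (not the authors' implementation) of fine-tuning ResNet50
# as a binary "extraction-worthy vs. preservable" classifier for tooth crops.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)   # two classes

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of cropped tooth images.
images = torch.randn(8, 3, 224, 224)            # stand-in for cropped PAN regions
labels = torch.randint(0, 2, (8,))              # 0 = preservable, 1 = extraction-worthy
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```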
... For instance, deep learning has been applied to segmentation tasks in brain images, 42,43 image registration, 44 and image fusion for Alzheimer's Disease/Mild Cognitive Impairment (AD/MCI) diagnosis. 45 Additionally, it has been utilized for image annotation in chest X-rays, 46 diagnosis of brain disorders, 47 segmentation of brain tumors, 48 and analysis of microscopic images. 49 These applications highlight the versatility and effectiveness of deep learning in addressing various challenges in medical image analysis, paving the way for improved diagnosis and treatment in healthcare. ...
Article
Full-text available
Objective Assessing pain in individuals with neurological conditions like cerebral palsy is challenging due to limited self-reporting and expression abilities. Current methods lack sensitivity and specificity, underlining the need for a reliable evaluation protocol. An automated facial recognition system could revolutionize pain assessment for such patients. The research focuses on two primary goals: developing a dataset of facial pain expressions for individuals with cerebral palsy and creating a deep learning-based automated system for pain assessment tailored to this group. Methods The study trained ten neural networks using three pain image databases and a newly curated CP-PAIN Dataset of 109 images from cerebral palsy patients, classified by experts using the Facial Action Coding System. Results The InceptionV3 model demonstrated promising results, achieving 62.67% accuracy and a 61.12% F1 score on the CP-PAIN dataset. Explainable AI techniques confirmed the consistency of crucial features for pain identification across models. Conclusion The study underscores the potential of deep learning in developing reliable pain detection systems using facial recognition for individuals with communication impairments due to neurological conditions. A more extensive and diverse dataset could further enhance the models’ sensitivity to subtle pain expressions in cerebral palsy patients and possibly extend to other complex neurological disorders. This research marks a significant step toward more empathetic and accurate pain management for vulnerable populations.
... Medical caption generation transforms visual information from radiological images into coherent, clinically valuable language descriptions. This process is inherently challenging due to the complexity and diversity of medical images, the need for precise and context-aware descriptions, and the necessity to incorporate domain-specific knowledge [3] [4] [5]. ...
Preprint
Full-text available
Purpose: Our study presents an enhanced approach to medical image caption generation by integrating concept detection into attention mechanisms. Method: This method utilizes sophisticated models to identify critical concepts within medical images, which are then refined and incorporated into the caption generation process. Results: Our concept detection task, which employed the Swin-V2 model, achieved an F1 score of 0.58944 on the validation set and 0.61998 on the private test set, securing the third position. For the caption prediction task, our BEiT+BioBart model, enhanced with concept integration and post-processing techniques, attained a BERTScore of 0.60589 on the validation set and 0.5794 on the private test set, placing ninth. Conclusion: These results underscore the efficacy of concept-aware algorithms in generating precise and contextually appropriate medical descriptions. The findings demonstrate that our approach significantly improves the quality of medical image captions, highlighting its potential to enhance medical image interpretation and documentation, thereby contributing to improved healthcare outcomes.
... Caption prediction, in other words diagnostic captioning, remains a challenging research problem, designed to support the diagnostic process by providing a preliminary report rather than replacing the physicians and human factors involved [2]. It is designed as a tool to assist in generating an initial diagnostic report of a patient's condition, helping doctors focus on important areas of the image [4] and assisting them in making diagnoses more accurately and quickly [5]. ...
... For the task of medical image captioning, various methods have been developed, with pioneering work by Shin et al. [4] in applying the CNN-RNN encoder-decoder approach to generate captions from medical images. They utilized either the Network-in-Network or GoogLeNet architectures as encoding models, followed by an LSTM [18] or GRU [19] as the decoding RNN to translate the encoded images into descriptive captions. ...
Preprint
Full-text available
Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.
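
The snippet above credits Shin et al. with the CNN-RNN encoder-decoder approach: a CNN (Network-in-Network or GoogLeNet) encodes the image and an LSTM/GRU decodes the annotation. The sketch below is an illustrative simplification of that idea, not the published model; the vocabulary size, embedding and hidden dimensions, and the teacher-forcing setup are assumptions.

```python
# A minimal sketch of a CNN-RNN encoder-decoder captioner: a GoogLeNet
# backbone encodes the image, and an LSTM decodes annotation tokens.
import torch
import torch.nn as nn
from torchvision import models

class CnnRnnCaptioner(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        # CNN encoder: pretrained GoogLeNet with its classifier replaced by a projection.
        backbone = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.encoder = backbone
        # RNN decoder: token embedding + LSTM + projection back to the vocabulary.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # The image feature acts as the first "token" fed to the LSTM.
        img_feat = self.encoder(images).unsqueeze(1)      # (B, 1, E)
        tok_embs = self.embed(captions[:, :-1])           # teacher forcing
        inputs = torch.cat([img_feat, tok_embs], dim=1)   # (B, T, E)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                           # (B, T, vocab)

# Illustrative forward pass on dummy data.
model = CnnRnnCaptioner(vocab_size=1000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 1000, (2, 12))
logits = model(images, captions)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), captions.reshape(-1))
```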
... regions [15]. Current report generation methods rely on global visual features, leading to insufficient structured descriptions of body parts. ...
Preprint
Full-text available
In response to the worldwide COVID-19 pandemic, advanced automated technologies have emerged as valuable tools to aid healthcare professionals in managing an increased workload by improving radiology report generation and prognostic analysis. This study proposes the Multi-modality Regional Alignment Network (MRANet), an explainable model for radiology report generation and survival prediction that focuses on high-risk regions. By learning spatial correlation in the detector, MRANet visually grounds region-specific descriptions, providing robust anatomical regions with a completion strategy. The visual features of each region are embedded using a novel survival attention mechanism, offering spatially and risk-aware features for sentence encoding while maintaining global coherence across tasks. A cross-LLM alignment is employed to enhance the image-to-text transfer process, resulting in sentences rich with clinical detail and improved explainability for radiologists. Multi-center experiments validate both MRANet's overall performance and each module's composition within the model, encouraging further advancements in radiology report generation research emphasizing clinical interpretation and trustworthiness in AI models applied to medical studies. The code is available at https://github.com/zzs95/MRANet.
... Radiology and pathology medical images are widely used for the detection and treatment of different diseases [4,5]. Generation of medical reports in multiple sentences from these images is a tedious task, as it requires comprehensive examination; e.g., X-ray images necessitate a detailed interpretation of visible information, including the airway, lung, cardiovascular system, and disability. ...
Article
Full-text available
Medical Image Captioning (MIC) is a developing area of artificial intelligence that combines two main research areas, computer vision and natural language processing. In order to support clinical workflows and decision-making, MIC is used in a variety of applications pertaining to diagnosis, therapy, report production, and computer-aided diagnosis. The generation of long and coherent reports highlighting correct abnormalities is a challenging task. Therefore, in this direction, this paper presents an efficient FDT-Dr²T framework for the generation of coherent radiology reports with efficient exploitation of medical content. The proposed framework leverages the fusion of texture features and deep features in the first stage by incorporating an ISCM-LBP + PCA-HOG feature extraction algorithm and a Convolutional Triple Attention-based Efficient XceptionNet (C-TaXNet). Further, fused features from the FDT module are utilized by the Dense Radiology Report Generation Transformer (Dr²T) model with modified multi-head attention, generating dense radiology reports that highlight specific crucial abnormalities. To evaluate the performance of the proposed FDT-Dr²T, extensive experiments are conducted on the publicly available IU Chest X-ray dataset, and the best performance of the work is observed as 0.531 BLEU@1, 0.398 BLEU@2, 0.322 BLEU@3, 0.251 BLEU@4, 0.384 CIDEr, 0.506 ROUGE-L, and 0.277 METEOR. An ablation study is carried out to support the experiments. Overall, the results obtained demonstrate the efficiency and efficacy of the proposed framework.
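
Several of the entries above, including this one, report BLEU@1-4 scores for generated reports. For readers unfamiliar with the metric, the sketch below shows how such n-gram overlap scores are typically computed with NLTK; the reference and candidate sentences are invented for illustration and are not taken from any of the cited datasets.

```python
# A minimal sketch of computing BLEU@1..4 for a generated report sentence.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the lungs are clear with no acute abnormality".split()]
candidate = "the lungs are clear without acute disease".split()

smooth = SmoothingFunction().method1
for n in range(1, 5):
    # BLEU@n uses uniform weights over the 1..n-gram precisions.
    weights = tuple(1.0 / n for _ in range(n))
    score = sentence_bleu(reference, candidate, weights=weights, smoothing_function=smooth)
    print(f"BLEU@{n}: {score:.3f}")
```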
... Early ARRG research coincided with the 2016 release of the Indiana University X-ray Dataset [41], the first dataset that contained both radiology images and their associated radiology reports. Shin et al. [42] used this dataset to develop a method to automatically annotate medical images using a convolutional neural network (CNN) for their encoder and a recurrent neural network (RNN) for their decoder. In 2018, research into ARRG started gaining popularity [43], [44]; however, the generated reports from these CNN-RNN methodologies are overly rigid [9] and often repeat phrases from the training set at inappropriate times, leading to summaries that appear less human-like. ...
Preprint
Full-text available
Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.
... Notable efforts in this domain include Schlegl et al. [15], who proposed a convolutional network (CNN) for classifying tissue patterns in tomographies, utilizing semantic descriptions in reports as labels. Building on this success, subsequent neural network models were explored for X-rays, such as Shin et al. [16], who introduced a CNN for chest X-ray images and a recurrent network (RNN) for annotations, jointly trained to annotate diseases, anatomy, and severity. Other approaches, like that of Moradi et al. [17], focused on annotation through the concatenation of a CNN and an RNN block to identify regions of interest. ...
Article
Full-text available
Deep learning is revolutionizing radiology report generation (RRG) with the adoption of vision encoder–decoder (VED) frameworks, which transform radiographs into detailed medical reports. Traditional methods, however, often generate reports of limited diversity and struggle with generalization. Our research introduces reinforcement learning and text augmentation to tackle these issues, significantly improving report quality and variability. By employing RadGraph as a reward metric and innovating in text augmentation, we surpass existing benchmarks like BLEU4, ROUGE-L, F1CheXbert, and RadGraph, setting new standards for report accuracy and diversity on MIMIC-CXR and Open-i datasets. Our VED model achieves F1-scores of 66.2 for CheXbert and 37.8 for RadGraph on the MIMIC-CXR dataset, and 54.7 and 45.6, respectively, on Open-i. These outcomes represent a significant breakthrough in the RRG field. The findings and implementation of the proposed approach, aimed at enhancing diagnostic precision and radiological interpretations in clinical settings, are publicly available on GitHub to encourage further advancements in the field.
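
The abstract above trains its report generator with reinforcement learning, using RadGraph as the reward metric. The sketch below illustrates the general REINFORCE-style update such approaches rely on: a sampled report is scored, and its token log-probabilities are weighted by the advantage over a baseline. The reward function, baseline, and dummy tensors here are placeholders, not the authors' RadGraph scorer.

```python
# A minimal sketch of a REINFORCE-style update for report generation.
import torch

def reward_fn(generated: str, reference: str) -> float:
    # Placeholder word-overlap reward; the cited work uses a RadGraph-based score.
    gen, ref = set(generated.split()), set(reference.split())
    return len(gen & ref) / max(len(ref), 1)

def reinforce_loss(token_logprobs: torch.Tensor, reward: float, baseline: float) -> torch.Tensor:
    # token_logprobs: (T,) log-probabilities of the sampled report's tokens.
    # Advantage = reward - baseline (e.g., the reward of a greedy-decoded report).
    advantage = reward - baseline
    return -(advantage * token_logprobs.sum())

# Illustrative usage with dummy values standing in for decoder outputs.
sampled_logprobs = torch.log(torch.rand(10, requires_grad=True))
r = reward_fn("no acute cardiopulmonary disease", "no acute disease")
loss = reinforce_loss(sampled_logprobs, reward=r, baseline=0.5)
loss.backward()   # in practice, gradients flow into the report generator's parameters
```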
... By leveraging the hierarchical structure of CNNs, the system effectively learns and discriminates between normal lung anatomy and various disease manifestations. Additionally, the incorporation of image retrieval techniques enhances the system's utility by allowing medical practitioners to swiftly access relevant images for comparative analysis and reference [12][13][14][15][16]. ...
... Initially introduced by Cho et al. [9] for machine translation, this framework found applications in various fields, including medical report generation. Shin et al. [10] pioneered the adoption of the encoder-decoder framework for CXR report generation in 2016. They leveraged transfer learning and a pipelined CNN-RNN model to generate improved annotations. ...
Article
Full-text available
Deep neural networks have facilitated radiologists to a large extent by automating the process of radiological report generation. The majority of researchers have focused on improving the learning focus of the model using attention mechanisms, reinforcement learning, and other techniques. Most of them have not considered the textual information present in the ground-truth radiological reports. In downstream language tasks like text classification, word embedding has played a vital role in extracting textual features. Inspired by this, we empirically study the impact of different word embedding techniques on radiological report generation tasks. In this work, we have used a convolutional neural network and a large language model to extract visual and textual features, respectively. A recurrent neural network is used to generate the reports. The proposed method outperforms most of the state-of-the-art methods by achieving the following evaluation metric scores: BLEU-1 = 0.612, BLEU-2 = 0.610, BLEU-3 = 0.608, BLEU-4 = 0.606, ROUGE = 0.811, and CIDEr = 0.317. This work confirms that a pre-trained large language model gives significantly better results than other word embedding techniques.
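
The study above compares word embedding techniques for the report generator. As a rough illustration, the sketch below initializes the decoder's embedding layer from pre-trained word vectors rather than random weights; the toy vocabulary and the pretrained_vectors lookup are hypothetical stand-ins for whichever embedding technique (GloVe, word2vec, LLM-derived, etc.) is being compared.

```python
# A minimal sketch of seeding a report generator's embedding layer with
# pre-trained word vectors instead of random initialization.
import torch
import torch.nn as nn

vocab = ["<pad>", "<unk>", "lungs", "clear", "effusion"]
embed_dim = 300
# Hypothetical lookup table; in practice these would come from GloVe, word2vec, etc.
pretrained_vectors = {w: torch.randn(embed_dim) for w in vocab}

weights = torch.stack([pretrained_vectors.get(w, torch.zeros(embed_dim)) for w in vocab])
embedding = nn.Embedding.from_pretrained(weights, freeze=False, padding_idx=0)

# The embedding layer then feeds the RNN decoder alongside CNN visual features.
token_ids = torch.tensor([[2, 3]])      # "lungs clear"
print(embedding(token_ids).shape)       # torch.Size([1, 2, 300])
```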