DenUnet: enhancing dental image segmentation through edge and body fusion

Multimedia Tools and Applications
https://doi.org/10.1007/s11042-024-19513-0
Omid Nejati Manzari¹ · Farhad Bayrami² · Hooman Khaloo³ · Zahra Khodakaramimaghsoud⁴ · Shahriar B. Shokouhi¹
Received: 11 September 2023 / Revised: 12 March 2024 / Accepted: 26 May 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024
Abstract
Accurate tooth segmentation is of paramount importance in oral healthcare because it provides critical positional data for clinical diagnosis, orthodontic treatment, and surgical procedures. Despite the widespread use of convolutional neural networks (CNNs) in image segmentation, their limitations in capturing complete global context and long-range interactions are well acknowledged. Vision transformers, while promising for modeling larger contextual information, struggle to manage the spatial complexities of medical images. To tackle these issues, the proposed DenUnet leverages the strengths of both CNNs and transformers. It introduces a dual-branch encoder that simultaneously extracts edge and body information, together with a double-level fusion module for integrating multi-scale features. To ensure fine-grained fusion of the edge and body information produced by the two encoders, we propose a local cross-attention feature fusion module that enhances feature fusion with accurate blending losses. Experimental results underscore the superior efficacy of DenUnet compared to existing methods: it achieves 95.4% accuracy and a 92.7% F1-score on the DNS dataset, and it is particularly adept at handling the irregular boundaries common in dental datasets. Code is available at https://github.com/Omid-Nejati/DenUnet
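The core fusion idea in the abstract — edge and body features exchanged through local cross-attention before merging — can be illustrated with a toy NumPy sketch. The token count, channel width, and the symmetric query/key arrangement here are assumptions for illustration only, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(edge_feats, body_feats):
    """Fuse edge and body token features (N tokens x C channels):
    edge tokens attend over body tokens and vice versa, then the two
    attended representations are summed."""
    n, c = edge_feats.shape
    scale = 1.0 / np.sqrt(c)
    # edge queries, body keys/values
    e2b = softmax(edge_feats @ body_feats.T * scale) @ body_feats
    # body queries, edge keys/values
    b2e = softmax(body_feats @ edge_feats.T * scale) @ edge_feats
    return e2b + b2e  # fused features, same shape as each input

edge = np.random.default_rng(0).standard_normal((16, 8))
body = np.random.default_rng(1).standard_normal((16, 8))
fused = cross_attention_fuse(edge, body)
print(fused.shape)  # (16, 8)
```

In the real network the attention would be computed over local windows of a 2-D feature map and wrapped with learned projections; this sketch only shows the bidirectional query/key exchange.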
Keywords Teeth segmentation · Deep learning · Transformer · Panoramic dental X-ray
1 Introduction
Computer-assisted decisions are vital in dentistry for diagnosis and treatment planning, facilitated by dental imaging that provides insights beyond clinical tests [1]. Dental X-rays provide a two-dimensional perspective of the entire mouth [2]. Meanwhile, the multifaceted significance of oral health extends beyond mere dimensions, encompassing vital functions like
✉ Omid Nejati Manzari
omid_nejaty@alumni.iust.ac.ir
¹ School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
² Department of Computer Science and Engineering (DISI), University of Bologna, Bologna, Italy
³ Independent Researcher, Tehran, Iran
⁴ Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
Article
Dental caries is one of the most common chronic diseases, affecting the majority of the population during their lifetime. Caries lesions are typically diagnosed by general dentists relying only on visual inspection of dental X-rays. In many cases, dental caries is hard to identify in X-rays and can be misinterpreted as shadows due to low image quality. In this research study, we propose an automatic diagnosis system to detect dental caries in panoramic images, which benefits from various deep pretrained models through transfer learning to extract relevant features and uses a capsule network to produce predictions. Using a dataset of 470 panoramic images, our model achieved an accuracy of 86.05% on the test set. The obtained score demonstrates acceptable detection performance and an increase in caries detection speed, as long as the challenges of using panoramic X-rays are taken into account. Among carious samples, our model achieved recall scores of 69.44% and 90.52% for mild and severe cases respectively, confirming that severe caries spots are more straightforward to detect and that efficient mild caries detection needs a larger dataset. Given the novelty of the current study in using panoramic images, this work is a step towards developing a fully automated system to assist domain experts.
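The per-class recall figures quoted above (69.44% mild, 90.52% severe) follow the standard definition: true positives of a class divided by all actual samples of that class. A minimal sketch with toy labels — the class ids and example arrays are assumptions, not data from the study:

```python
import numpy as np

def per_class_recall(y_true, y_pred, cls):
    """Recall for one class: correctly predicted members of `cls`
    divided by all true members of `cls`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    actual = y_true == cls
    if actual.sum() == 0:
        return float("nan")  # class absent from ground truth
    return float(((y_pred == cls) & actual).sum() / actual.sum())

# toy labels: 0 = healthy, 1 = mild caries, 2 = severe caries
y_true = [1, 1, 1, 2, 2, 0, 0, 1]
y_pred = [1, 0, 1, 2, 2, 0, 1, 2]
print(per_class_recall(y_true, y_pred, 1))  # mild: 2/4 -> 0.5
print(per_class_recall(y_true, y_pred, 2))  # severe: 2/2 -> 1.0
```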
Article
Brain tumor segmentation in multimodal MRI has great significance in clinical diagnosis and treatment. The utilization of multimodal information plays a crucial role in brain tumor segmentation. However, most existing methods focus on the extraction and selection of deep semantic features, while ignoring some features with specific meaning and importance to the segmentation problem. In this paper, we propose a brain tumor segmentation method based on the fusion of deep semantics and edge information in multimodal MRI, aiming to achieve a more sufficient utilization of multimodal information for accurate segmentation. The proposed method mainly consists of a semantic segmentation module, an edge detection module and a feature fusion module. In the semantic segmentation module, the Swin Transformer is adopted to extract semantic features and a shifted patch tokenization strategy is introduced for better training. The edge detection module is designed based on convolutional neural networks (CNNs) and an edge spatial attention block (ESAB) is presented for feature enhancement. The feature fusion module aims to fuse the extracted semantic and edge features, and we design a multi-feature inference block (MFIB) based on graph convolution to perform feature reasoning and information dissemination for effective feature fusion. The proposed method is validated on the popular BraTS benchmarks. The experimental results verify that the proposed method outperforms a number of state-of-the-art brain tumor segmentation methods. The source code of the proposed method is available at https://github.com/HXY-99/brats.
Chapter
Medical image segmentation plays an essential role in developing computer-assisted diagnosis and treatment systems, yet it still faces numerous challenges. In the past few years, Convolutional Neural Networks (CNNs) have been successfully applied to the task of medical image segmentation. Regrettably, due to the locality of convolution operations, these CNN-based architectures have their limitations in learning global context information in images, which might be crucial to the success of medical image segmentation. Meanwhile, the vision Transformer (ViT) architectures own the remarkable ability to extract long-range semantic features with the shortcoming of their computation complexity. To make medical image segmentation more efficient and accurate, we present a novel light-weight architecture named LeViT-UNet, which integrates multi-stage Transformer blocks in the encoder via LeViT, aiming to explore the effectiveness of fusing local and global features. Our experiments on two challenging segmentation benchmarks indicate that the proposed LeViT-UNet achieved competitive performance compared with various state-of-the-art methods in terms of efficiency and accuracy, suggesting that LeViT can be a faster feature encoder for medical image segmentation. LeViT-UNet-384, for instance, achieves Dice similarity coefficients (DSC) of 78.53% and 90.32% with a segmentation speed of 85 frames per second (FPS) on the Synapse and ACDC datasets, respectively. Therefore, the proposed architecture could be beneficial for prospective clinical trials conducted by radiologists. Our source codes are publicly available at https://github.com/apple1986/LeViT_UNet.
Article
Medical image segmentation is indispensable for diagnosis and prognosis of many diseases. To improve the segmentation performance, this study proposes a new 2D body and edge aware network with multi-scale short-term concatenation for medical image segmentation. Multi-scale short-term concatenation modules which concatenate successive convolution layers with different receptive fields, are proposed for capturing multi-scale representations with fewer parameters. Body generation modules with feature adjustment based on weight map computing via enlarging the receptive fields, and edge generation modules with multi-scale convolutions using Sobel kernels for edge detection, are proposed to separately learn body and edge features from convolutional features in decoders, making the proposed network be body and edge aware. Based on the body and edge modules, we design parallel body and edge decoders whose outputs are fused to achieve the final segmentation. Besides, deep supervision from the body and edge decoders is applied to ensure the effectiveness of the generated body and edge features and further improve the final segmentation. The proposed method is trained and evaluated on six public medical image segmentation datasets to show its effectiveness and generality. Experimental results show that the proposed method achieves better average Dice similarity coefficient and 95% Hausdorff distance than several benchmarks on all used datasets. Ablation studies validate the effectiveness of the proposed multi-scale representation learning modules, body and edge generation modules and deep supervision. The code is available at https://github.com/hulinkuang/BEA-Net .
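The edge generation modules above use Sobel kernels, a classic pair of 3x3 gradient filters. As a rough illustration (not BEA-Net's actual multi-scale module), the gradient-magnitude edge map of a toy step image can be computed directly in NumPy; the image and the naive valid-mode convolution are assumptions for the sketch:

```python
import numpy as np

# horizontal-gradient Sobel kernel; its transpose is the vertical one
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d(img, kernel):
    """'Valid' 2-D correlation with a 3x3 kernel (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i+3, j:j+3] * kernel).sum()
    return out

def sobel_edges(img):
    gx, gy = conv2d(img, SOBEL_X), conv2d(img, SOBEL_Y)
    return np.hypot(gx, gy)  # gradient magnitude ~ edge strength

# vertical step edge: responses appear only along the boundary
img = np.zeros((5, 5)); img[:, 3:] = 1.0
edges = sobel_edges(img)
print(edges)
```

In a network the same kernels are applied as fixed (non-learned) depthwise convolutions, which is what makes the edge branch differentiable end to end.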
Article
Convolutional Neural Networks (CNNs) have advanced existing medical systems for automatic disease diagnosis. However, there are still concerns about the reliability of deep medical diagnosis systems against the potential threats of adversarial attacks since inaccurate diagnosis could lead to disastrous consequences in the safety realm. In this study, we propose a highly robust yet efficient CNN-Transformer hybrid model which is equipped with the locality of CNNs as well as the global connectivity of vision Transformers. To mitigate the high quadratic complexity of the self-attention mechanism while jointly attending to information in various representation subspaces, we construct our attention mechanism by means of an efficient convolution operation. Moreover, to alleviate the fragility of our Transformer model against adversarial attacks, we attempt to learn smoother decision boundaries. To this end, we augment the shape information of an image in the high-level feature space by permuting the feature mean and variance within mini-batches. With less computational complexity, our proposed hybrid model demonstrates its high robustness and generalization ability compared to the state-of-the-art studies on a large-scale collection of standardized MedMNIST-2D datasets.
Chapter
In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. In particular, deep neural networks based on U-shaped architecture and skip-connections have been widely applied in various medical image tasks. However, although CNN has achieved excellent performance, it cannot learn global semantic information interaction well due to the locality of convolution operation. In this paper, we propose Swin-Unet, which is an Unet-like pure Transformer for medical image segmentation. The tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture with skip-connections for local-global semantic feature learning. Specifically, we use a hierarchical Swin Transformer with shifted windows as the encoder to extract context features. And a symmetric Swin Transformer-based decoder with a patch expanding layer is designed to perform the up-sampling operation to restore the spatial resolution of the feature maps. Under the direct down-sampling and up-sampling of the inputs and outputs by 4×, experiments on multi-organ and cardiac segmentation tasks demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms those methods with full-convolution or the combination of transformer and convolution. The codes have been publicly available at the link (https://github.com/HuCaoFighting/Swin-Unet).
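The shifted-window mechanism Swin-Unet inherits from the Swin Transformer amounts to two array operations: partitioning the feature map into non-overlapping windows (attention is computed within each), and cyclically shifting the map by half a window between layers so that information crosses window boundaries. A NumPy sketch, with feature-map size, window size, and channel count chosen arbitrarily for illustration:

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w, w, C)  # (num_windows, w, w, C)

def cyclic_shift(x, w):
    """Shift by half a window so the next attention layer mixes
    tokens across the previous layer's window boundaries."""
    return np.roll(x, shift=(-(w // 2), -(w // 2)), axis=(0, 1))

feat = np.arange(16 * 16 * 4, dtype=float).reshape(16, 16, 4)
wins = window_partition(feat, 4)
print(wins.shape)      # (16, 4, 4, 4): sixteen 4x4 windows
shifted = cyclic_shift(feat, 4)
print(shifted.shape)   # (16, 16, 4)
```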
Article
Accurate and automatic segmentation of individual tooth and root canal from cone-beam computed tomography (CBCT) images is an essential but challenging step for dental surgical planning. In this paper, we propose a novel framework, which consists of two neural networks, DentalNet and PulpNet, for efficient, precise, and fully automatic tooth instance segmentation and root canal segmentation from CBCT images. We first use the proposed DentalNet to achieve tooth instance segmentation and identification. Then, the region of interest (ROI) of the affected tooth is extracted and fed into the PulpNet to obtain precise segmentation of the pulp chamber and the root canal space. These two networks are trained by multi-task feature learning and evaluated on two clinical datasets respectively and achieve superior performances to several comparing methods. In addition, we incorporate our method into an efficient clinical workflow to improve the surgical planning process. In two clinical case studies, our workflow took only 2 min instead of 6 h to obtain the 3D model of tooth and root canal effectively for the surgical planning, resulting in satisfying outcomes in difficult root canal treatments.
Article
Background and objective: In orthodontics and restorative dentistry, it is very important that teeth be segmented from dental panoramic X-ray images. However, panoramic X-ray images of teeth present several problems, such as blurred interdental boundaries and low contrast between teeth and alveolar bone. Methods: This paper proposes the Teeth U-Net model to resolve these problems and makes the following contributions. First, a Squeeze-Excitation Module is utilized in both the encoder and the decoder, and a dense skip connection between encoder and decoder is proposed to reduce the semantic gap. Second, because teeth have irregular shapes and dental panoramic X-ray images have low contrast, a Multi-scale Aggregation attention Block (MAB) is designed in the bottleneck layer, which can effectively extract tooth shape features and fuse multi-scale features adaptively. Third, to capture dental feature information over a larger receptive field, a Dilated Hybrid self-Attentive Block (DHAB) is designed at the bottleneck layer; this module effectively suppresses task-irrelevant background information without increasing the network parameters. Finally, the effectiveness of the algorithm is validated on a clinical dental panoramic X-ray image dataset. Results: Across three comparison experiments, the proposed model achieves Accuracy, Precision, Recall, Dice, Volumetric Overlap Error, and Relative Volume Difference of 98.53%, 95.62%, 94.51%, 94.28%, 88.92%, and 95.97%, respectively, for dental panoramic X-ray teeth segmentation. Conclusion: The proposed modules complement one another in processing the details of dental panoramic X-ray images, which can effectively improve the efficiency of preoperative preparation and postoperative evaluation and promote the application of dental panoramic X-rays in medical image segmentation. Teeth U-Net is more accurate than other models for dental panoramic X-ray teeth segmentation, which is important for clinicians treating patients in orthodontics and restorative dentistry.
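The volumetric metrics reported by segmentation papers like this one have standard definitions: Dice is twice the intersection over the sum of mask sizes, Volumetric Overlap Error is one minus the Jaccard index, and Relative Volume Difference compares predicted and ground-truth volumes. A minimal sketch on toy binary masks (the masks themselves are assumptions):

```python
import numpy as np

def seg_metrics(pred, gt):
    """Overlap metrics for binary masks: Dice, Volumetric Overlap
    Error (VOE = 1 - Jaccard), and Relative Volume Difference."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = (pred & gt).sum()
    union = (pred | gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum())
    voe = 1 - inter / union
    rvd = (pred.sum() - gt.sum()) / gt.sum()
    return dice, voe, rvd

gt = np.zeros((8, 8), dtype=int); gt[2:6, 2:6] = 1      # 16-px square
pred = np.zeros((8, 8), dtype=int); pred[2:6, 3:7] = 1  # same size, shifted
dice, voe, rvd = seg_metrics(pred, gt)
print(round(dice, 3), round(voe, 3), round(rvd, 3))  # 0.75 0.4 0.0
```

Note that RVD is zero here even though the masks disagree — it only compares volumes, which is why it is reported alongside overlap-sensitive metrics like Dice.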